## Monday, February 2, 2015

### Finding mass spectra with PostgreSQL: Spectral contrast angle in SQL

The spectral contrast angle of two spectra is another, according to literature, one of the methods for comparing spectra by similarity.

The spectral contrast angle S is calculated by building a intensity vector in N-dimensional space for each spectrum, where N is the number of m/z peaks in the spectrum, and finding the cosine of the angle between the two vectors. If both are the same, S is 1.0 if the two are orthogonal, S is 0.0.

The formula is S = sum(a.intensity*b.intensity) / sqrt(sum(a.intensity^2)*sum(b.intensity^2)) with a and b being the spectra to compare.

On a table like

CREATE TABLE <spectra_table>
(
id integer NOT NULL,
"m/z" numeric NOT NULL,
intensity numeric NOT NULL

)

this can easily be done in PostgtreSQL with a common table expression:

WITH SQ AS (select "m/z", intensity from <spectra_table> where id = 1), ST AS (select  "m/z", intensity from <spectra_table> where id = 2),
qc as (select count(1) as qp from sq), tc as (select count(1) as tp from st)
select sum(sq.intensity*st.intensity)/sqrt(sum(sq.intensity^2)*sum(st.intensity^2)) as spectral_contrast from SQ,ST,qc,tc where qc.qp = tc.tp and SQ."m/z" = ST."m/z"

One interesting property of S is, that it evaluates to 1.0 if the intensities of the two spectra are different, but only by a multiple of an integer. If I understood it right, this is correct, because this means that you have measured the same composition at different concentrations.

Put into a function:

CREATE OR REPLACE FUNCTION spectral_contrast(int, int) RETURNS numeric
AS ' WITH SQ AS (select "m/z", intensity from <spectra_table> where id = \$1), ST AS (select  "m/z", intensity from <spectra_table> where id = \$2),
qc as (select count(1) as qp from sq), tc as (select count(1) as tp from st)
select coalesce(sum(sq.intensity*st.intensity)/sqrt(sum(sq.intensity^2)*sum(st.intensity^2)) ,0.0) as spectral_contrast from SQ,ST,qc,tc where qc.qp = tc.tp and SQ."m/z" = ST."m/z"'

LANGUAGE SQL
IMMUTABLE
RETURNS NULL ON NULL INPUT;

Please not the additional coalesce() to return 0.0 when the SQL evaluates to NULL because of the join.

In a real application, the calculation of S can of course be accelerated, e.g. by prefiltering spectra by their number of peaks or their m/z range.

That's it for today.