*the*methods for comparing spectra by similarity.

The spectral contrast angle

*S*is calculated by building a intensity vector in

*N*-dimensional space for each spectrum, where

*N*is the number of m/z peaks in the spectrum, and finding the cosine of the angle between the two vectors. If both are the same,

*S*is 1.0 if the two are orthogonal,

*S*is 0.0.

The formula is

*S = sum(a.intensity*b.intensity) / sqrt(sum(a.intensity^2)*sum(b.intensity^2))*with

*a*and

*b*being the spectra to compare.

On a table like

CREATE TABLE <spectra_table>

(

id integer NOT NULL,

"m/z" numeric NOT NULL,

intensity numeric NOT NULL

)

this can easily be done in PostgtreSQL with a common table expression:

WITH SQ AS (select "m/z", intensity from <spectra_table> where id = 1), ST AS (select "m/z", intensity from <spectra_table> where id = 2),

qc as (select count(1) as qp from sq), tc as (select count(1) as tp from st)

select sum(sq.intensity*st.intensity)/sqrt(sum(sq.intensity^2)*sum(st.intensity^2)) as spectral_contrast from SQ,ST,qc,tc where qc.qp = tc.tp and SQ."m/z" = ST."m/z"

One interesting property of

*S*is, that it evaluates to 1.0 if the intensities of the two spectra are different, but only by a multiple of an integer. If I understood it right, this is correct, because this means that you have measured the same composition at different concentrations.

Put into a function:

CREATE OR REPLACE FUNCTION spectral_contrast(int, int) RETURNS numeric

AS ' WITH SQ AS (select "m/z", intensity from <spectra_table> where id = $1), ST AS (select "m/z", intensity from <spectra_table> where id = $2),

qc as (select count(1) as qp from sq), tc as (select count(1) as tp from st)

select coalesce(sum(sq.intensity*st.intensity)/sqrt(sum(sq.intensity^2)*sum(st.intensity^2)) ,0.0) as spectral_contrast from SQ,ST,qc,tc where qc.qp = tc.tp and SQ."m/z" = ST."m/z"'

LANGUAGE SQL

IMMUTABLE

RETURNS NULL ON NULL INPUT;

Please not the additional

*coalesce()*to return 0.0 when the SQL evaluates to NULL because of the join.

In a real application, the calculation of

*S*can of course be accelerated, e.g. by prefiltering spectra by their number of peaks or their m/z range.

That's it for today.

## No comments:

## Post a Comment