The plate is bad: Deriving the elemental composition from a molformula in pgchem::tigress

Thursday, July 23, 2015


pgchem::tigress can generate molecular formulae like C3H6NO2- from chemical structures.

But what if we need access to the elemental composition as a relation, e.g.:

element	count
C	3
N	1
O	2

Fortunately, PostgreSQL is awesome:

CREATE OR REPLACE FUNCTION elemental_composition(molformula TEXT)
RETURNS TABLE(element TEXT, count INTEGER) AS
$BODY$
DECLARE
    token TEXT[];
    elements TEXT[] := ARRAY['C','N','O','P','S','Cl']; --expand as needed
BEGIN
    -- strip the charge sign, e.g. 'C3H6NO2-' becomes 'C3H6NO2'
    molformula := REPLACE(REPLACE(molformula, '-', ''), '+', '');

    FOREACH element IN ARRAY elements LOOP
        -- match the symbol only if no lowercase letter follows it, so 'C'
        -- does not match the 'C' of 'Cl'; capture the (possibly empty) count
        token := REGEXP_MATCHES(molformula, element || '(?![a-z])([0-9]*)');

        IF token IS NOT NULL THEN
            -- no explicit count means a single atom
            count := COALESCE(NULLIF(token[1], ''), '1')::INTEGER;
            RETURN NEXT;
        END IF;
    END LOOP;
    RETURN;
END;
$BODY$
LANGUAGE plpgsql IMMUTABLE STRICT
COST 1000;

SELECT * FROM elemental_composition('C3H6NO2-');
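If maintaining the element list is a nuisance, a set-based variant could report every element, hydrogen included, with a single `regexp_matches` call using the `g` flag, which returns one row per match. This is only a sketch: the function name `elemental_composition_all` is hypothetical, and it assumes Hill-notation formulae in which each element symbol (one uppercase letter, optionally followed by one lowercase letter) appears at most once.

```sql
-- Hypothetical variant: no hard-coded element list.
-- Assumes each element symbol occurs at most once (Hill notation).
CREATE OR REPLACE FUNCTION elemental_composition_all(molformula TEXT)
RETURNS TABLE(element TEXT, count INTEGER) AS
$BODY$
    -- m[1] is the element symbol, m[2] the (possibly empty) count;
    -- charge signs such as '-' or '+' simply never match the pattern
    SELECT m[1],
           COALESCE(NULLIF(m[2], ''), '1')::INTEGER
    FROM regexp_matches(molformula, '([A-Z][a-z]?)([0-9]*)', 'g') AS t(m);
$BODY$
LANGUAGE sql IMMUTABLE STRICT;
```

Called as `SELECT * FROM elemental_composition_all('C3H6NO2-');`, it returns rows for C 3, H 6, N 1 and O 2.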

And that's it. Did I already mention that PostgreSQL is awesome? :-)

5 comments:

David Fetter, July 23, 2015 at 3:08 PM
You can do this without writing a function, or you could wrap the SQL in a function without PL/pgSQL. The original was indented, but I don't know what Blogspot will do to it.

SELECT
    (regexp_split_to_array(pair, ' '))[1] AS element,
    (regexp_split_to_array(pair, ' '))[2] AS count
FROM
    regexp_split_to_table(          /* Split elements and counts into a table */
        regexp_replace(             /* Inject spaces between element and count */
            regexp_replace(         /* Inject commas between elements */
                regexp_replace(     /* Normalize element counts with 1s */
                    'C3H6NO2',
                    '([A-Za-z])([A-Z])',
                    $$\11\2$$,
                    'g'
                ),
                '([[:digit:]]+)(.)',
                $$\1,\2$$,
                'g'
            ),
            '([^[:digit:]])([[:digit:]])',
            $$\1 \2$$,
            'g'
        ),
        ','
    ) AS elements(pair);