tag:blogger.com,1999:blog-8333362953216176367.post8386814505817292623..comments2022-07-24T21:57:29.200+02:00Comments on The plate is bad: Deriving the elemental composition from a molformula in pgchem::tigressergohttp://www.blogger.com/profile/06303496169445217331noreply@blogger.comBlogger5125tag:blogger.com,1999:blog-8333362953216176367.post-84956935768844001742015-08-04T09:07:13.056+02:002015-08-04T09:07:13.056+02:00"Damn blogger ate my reply. :(" And I sh..."Damn blogger ate my reply. :(" And I should check my waiting comments more often. :-( <br /><br />"To fix the bug, replace"<br /><br />Indeed it does. Unfortunately it also does this now:<br /><br />'C3H6NO2' -><br /><br />C 13<br />H 16<br />N 1<br />O 2<br /><br />which is wrong.<br /><br />It also counts elements that do not exist:<br /><br />'C3H6NO2X' -><br /><br />C 13<br />H 16<br />N 1<br />O 2<br />X 1<br /><br />So, having a whitelist of valid elements might come in handy. (In this particular case, it was also an application requirement.)<br /><br />And then, chemists might show up with something like this: 'CH3COOH'<br /><br />Or 'D2O' should be recognized as 'H2O' anyway.<br /><br />The tigress does not generate such formula but since they are perfectly valid, users might. This is easier to intercept in a function I guess.ergohttps://www.blogger.com/profile/06303496169445217331noreply@blogger.comtag:blogger.com,1999:blog-8333362953216176367.post-55387288883762260592015-07-29T23:56:07.007+02:002015-07-29T23:56:07.007+02:00Damn blogger ate my reply. :(
If you indent David...Damn blogger ate my reply. :(<br /><br />If you indent David's code properly and have a small understanding of regex then it's really not hard to figure out what the code is doing.<br /><br />To fix the bug, replace<br /><br /> '([A-Za-z])([A-Z])',<br /><br />with<br /><br /> '([A-Za-z])([A-Z])?',Anonymoushttps://www.blogger.com/profile/10225165980320590149noreply@blogger.comtag:blogger.com,1999:blog-8333362953216176367.post-15762326850155887252015-07-29T23:54:34.623+02:002015-07-29T23:54:34.623+02:00To fix that problem, replace
'([A-Za-z])([A-...To fix that problem, replace<br /><br /> '([A-Za-z])([A-Z])',<br /><br />with<br /><br /> '([A-Za-z])([A-Z])?',<br /><br />If you want to stick with enumerating every element, you should at least include all known elements... but of course then it breaks when a new one is discovered. With proper indentation and trivial knowledge of regex it's really not hard to figure out what David's code is doing...Anonymoushttps://www.blogger.com/profile/10225165980320590149noreply@blogger.comtag:blogger.com,1999:blog-8333362953216176367.post-51643269501885489712015-07-24T10:18:40.386+02:002015-07-24T10:18:40.386+02:00Whoa, cool.
This approach has the benefit of reco...Whoa, cool.<br /><br />This approach has the benefit of recognizing all element types, without needing a list of 'allowed' atoms. However it does not remove charges (easy to fix), if the last atom has no count it evaluates the count to NULL instead of 1, and speed wise I can measure no difference between the two.<br /><br />So since some poor guys in operations may have to take a look at this too, I stay with the function for readability, albeit the Kung-Fu factor of your pure SQL solution is way higher of course. :-)ergohttps://www.blogger.com/profile/06303496169445217331noreply@blogger.comtag:blogger.com,1999:blog-8333362953216176367.post-69604465305256160032015-07-23T15:08:27.692+02:002015-07-23T15:08:27.692+02:00You can do this without writing a function, or you...You can do this without writing a function, or you could wrap the SQL in a function without PL/pgsql. The orignal was indented, but I don't know what blogspot will do to it.<br /><br />SELECT<br /> (regexp_split_to_array(pair,' '))[1] AS element,<br /> (regexp_split_to_array(pair,' '))[2] AS count<br />FROM<br /> regexp_split_to_table( /* Split elemens and counts into a table */<br /> regexp_replace( /* Inject spaces between element and count */<br /> regexp_replace( /* Inject commas between elements */<br /> regexp_replace( /* Normalize element counts with 1s */<br /> 'C3H6NO2',<br /> '([A-Za-z])([A-Z])',<br /> $$\11\2$$,<br /> 'g'<br /> ),<br /> '([[:digit:]]+)(.)',<br /> $$\1,\2$$,<br /> 'g'<br /> ),<br /> '([^[:digit:]])([[:digit:]])',<br /> $$\1 \2$$,<br /> 'g'<br /> ),<br /> ','<br /> ) AS elements(pair);David Fetterhttps://www.blogger.com/profile/06408194859600882631noreply@blogger.com