Friday, July 23, 2010

Thesis abstract

The storage and retrieval of chemical graph datatypes such as structures and reactions in relational database systems is a common technique in academia and industry alike. Because the algorithms used for (sub)graph isomorphism detection are computationally intensive, such systems commonly use faster screening mechanisms to reduce the set of potential matches before applying these algorithms.

Widely used screening mechanisms are based on numerical or binary vectors, called fingerprints. Binary fingerprints clearly dominate, because bitwise operations are very fast and the vectors are compact to store. The two most commonly used types of binary fingerprints are path-generated and substructure-generated; both have specific shortcomings, especially blind spots.
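The subset test behind this kind of screening fits in a few lines of Python. This is a minimal sketch, assuming fingerprints are held as plain Python integers used as bit vectors (an illustrative choice, not how Pgchem::Tigress stores them): a structure can only contain the query if every bit set in the query's fingerprint is also set in its own, so one AND plus one comparison discards most non-matches before any isomorphism test runs.

```python
from typing import Dict, List


def passes_screen(query_fp: int, target_fp: int) -> bool:
    """True if every bit set in the query fingerprint is also set in the target."""
    return (query_fp & target_fp) == query_fp


def screen(query_fp: int, targets: Dict[str, int]) -> List[str]:
    """Ids of the structures that survive the screen.

    Only these candidates still need the expensive (sub)graph isomorphism test.
    """
    return [tid for tid, fp in targets.items() if passes_screen(query_fp, fp)]


# Example: B lacks bit 0 of the query, so it is discarded without a graph match.
print(screen(0b0011, {"A": 0b1111, "B": 0b1010, "C": 0b0111}))  # ['A', 'C']
```

Structures that pass are only candidates; the exact match still has to confirm them. Roughly speaking, a blind spot is a feature the fingerprint cannot encode: queries that depend on it set few bits, and the screen lets almost everything through.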

To overcome these shortcomings, the Pgchem::Tigress chemistry extension to the PostgreSQL object-relational database management system uses a hybrid binary fingerprint, consisting of an invariant path-generated part and a substructure-generated part that is externally configurable through a dictionary of substructure patterns.
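To make the idea of a hybrid fingerprint concrete, here is a toy Python sketch. Everything in it is an assumption for illustration: molecules are plain lists of element symbols and bonds, the path-generated part hashes all linear paths of up to max_len atoms into PATH_BITS bits, and the dictionary part reserves one bit per pattern after those. To stay self-contained, dictionary patterns are limited to linear atom chains written like "C-O" and matched against the same path set; a real substructure-generated part would run a proper substructure match for each pattern.

```python
import zlib
from typing import Dict, List, Sequence, Set, Tuple

PATH_BITS = 64  # assumed size of the invariant path-generated part


def canonical(labels: Sequence[str]) -> str:
    """Canonical string for a linear chain, independent of reading direction."""
    fwd = "-".join(labels)
    rev = "-".join(reversed(labels))
    return min(fwd, rev)


def linear_paths(atoms: Sequence[str], bonds: Sequence[Tuple[int, int]],
                 max_len: int) -> Set[str]:
    """All simple paths of 1..max_len atoms, as canonical element strings."""
    adj: Dict[int, List[int]] = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    found: Set[str] = set()

    def walk(path: List[int]) -> None:
        found.add(canonical([atoms[i] for i in path]))
        if len(path) < max_len:
            for nxt in adj[path[-1]]:
                if nxt not in path:  # simple paths only, no revisits
                    walk(path + [nxt])

    for start in range(len(atoms)):
        walk([start])
    return found


def hybrid_fingerprint(atoms: Sequence[str], bonds: Sequence[Tuple[int, int]],
                       dictionary: Sequence[str], max_len: int = 4) -> int:
    """Path-generated bits [0, PATH_BITS) plus one dictionary bit per pattern."""
    paths = linear_paths(atoms, bonds, max_len)
    fp = 0
    for p in paths:
        # crc32 rather than hash(): str hashing is randomized per process.
        fp |= 1 << (zlib.crc32(p.encode()) % PATH_BITS)
    for i, pattern in enumerate(dictionary):
        # Simplification: a pattern "matches" if it occurs as a linear path.
        if canonical(pattern.split("-")) in paths:
            fp |= 1 << (PATH_BITS + i)
    return fp
```

Changing the dictionary only changes the bits above PATH_BITS, which is what makes that part tunable per data set while the path part stays invariant.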

This thesis presents a novel approach that uses dynamic discrete optimization to find an optimized dictionary configuration for the substructure-generated part of the fingerprint, for arbitrary sets of structural data.

By applying the method developed in this thesis, the computational power necessary to run a chemical information system can be reduced by 42 percent on average. The improved query throughput makes it possible to avoid upgrading the server hardware to the next level of computational power, thereby realizing savings in operating costs.
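As a rough worked example, assuming the server is CPU-bound so that throughput scales inversely with the work per query (an assumption, not a figure from the thesis), a 42 percent reduction in work translates into the following throughput gain on unchanged hardware:

```latex
\[
\frac{\text{throughput}_{\text{new}}}{\text{throughput}_{\text{old}}}
  = \frac{1}{1 - 0.42} = \frac{1}{0.58} \approx 1.72
\]
```

That is roughly 72 percent more queries per second from the same machine.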

Update: It is now readable online.

Monday, July 19, 2010

Thesis defence

The date of my defence is 21.07.2010, 10:00-11:15 UTC+2.

If somebody's willing to cross their fingers for me, it'll be appreciated. :-)

Friday, July 9, 2010

ChemSpider Web API, anyone?

I'm trying to do a structure search using the published ChemSpider Web API:

I thought it would be sufficient to take the form they provide, put a molfile into the designated area, add a submit button, and it would work.

But all I get back are blank pages with no content. No error messages, nothing useful.

How is the darn thing meant to be used?

UPDATE: Adding method="post" to the form did the trick, but apparently only substructure searches with an unlimited result set are supported. This results in very poor performance for structures of low selectivity. Still not good.


  1. I had an extra line in my molfile.
  2. There really is a bug in the API: substructure search triggers exact search and vice versa. Since the API is heavily used with workarounds for this bug, fixing it would break many applications.
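Since one of the two culprits was a stray line in the molfile, a cheap sanity check before submitting can help. A minimal Python sketch, based on the MDL CTfile layout: the first three lines of a molfile are the header block and line 4 is the counts line, which ends in the version tag V2000 or V3000, so a stray line in the header shifts the counts line and the check fails. The function name is hypothetical, and it does not catch an extra line further down in the file.

```python
def counts_line_ok(molfile: str) -> bool:
    """Hypothetical pre-submit check: is line 4 a V2000/V3000 counts line?

    In the MDL CTfile layout, lines 1-3 are the header block (name,
    program/timestamp, comment) and line 4 is the counts line, which ends
    with the version tag. A stray line in the header shifts it and fails here.
    """
    lines = molfile.splitlines()
    return len(lines) >= 4 and lines[3].rstrip().endswith(("V2000", "V3000"))
```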

Monday, July 5, 2010

The curious case of the infinite canvas

After some fruitless attempts to extend the usable canvas of DCE ChemPad with some kind of self-made viewport, in order to allow drawing of structures larger than the screen, I think I have now found a less painful way.

Simply wrapping the DCE Control in a HorizontalScrollView was not enough, because the ScrollView intercepts too many MotionEvents, but after subclassing HorizontalScrollView and overriding onInterceptTouchEvent(), it seems to work as expected. The code still needs some polishing before I'd call it finished, though.

As you can see in the screenshot, the structure can now be scrolled left and right as needed to draw more than would fit on the screen. The thin line at the bottom separates the drawing area from the scroll-sensitive area, because otherwise the whole screen would be scroll-sensitive, and drawing while scrolling is not very precise. The screenshot is from a Nexus One with Android 2.2.
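The routing rule behind the separator line can be modelled independently of Android. This is a hypothetical, platform-neutral Python sketch, not the actual ChemPad code: a gesture starting below the line goes to the scroll view, everything above it goes to the drawing control. Screen y coordinates grow downwards, hence the >= comparison. In the Android subclass, the same decision would sit in onInterceptTouchEvent(), where returning true lets the HorizontalScrollView take over the gesture.

```python
from enum import Enum


class Target(Enum):
    DRAW = "draw"      # event goes to the drawing control
    SCROLL = "scroll"  # event is intercepted by the scroll view


def route_touch(y: float, separator_y: float) -> Target:
    """Decide which component handles a gesture starting at height y.

    Only the strip at or below the separator line scrolls; everything
    above it stays drawing area, so strokes are never turned into scrolls.
    """
    return Target.SCROLL if y >= separator_y else Target.DRAW
```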