Friday, November 26, 2010

pgchem::tigress sets new world record!

The IPB has now successfully loaded ~28 million structures from PubChem into a pgchem::tigress / PostgreSQL database.

This is the largest installation I know of and therefore sets a new world record - for pgchem. :-)

Friday, November 12, 2010

Development of pgchem::tigress 2.0 has started

I finally managed to get OpenBabel 2.3.0 and PostgreSQL 9.x to talk to each other on Windows. This only works when building with the MinGW Makefiles, not with MSYS, as PostgreSQL does not like mixing with the MSYS runtime libraries. And still a static build of libopenbabel and libinchi is required.

The overall good news is that the Makefile could be simplified a lot with help from Steffen Neumann from the Leibniz Institute of Plant Biochemistry and all current functions are still working with OB 2.3. The build process on Windows is still a bitch compared to building on Posix systems, though.

The roadmap for 2.0:

  • A lot of code needs cleanup, especially the make_molecule() function, which is unreadable and error prone. Mr. Neumann has also identified various possible memory leaks that need to be fixed.
  • Allow 2D coordinate generation for Molfile output
  • Using OBQuery for non-SMARTS substructure searching
  • Add Spectrophore(TM) output
  • Add support for exporting Andrew Dalke's chemfp formats (at least FPS)
  • Move SVG output from Dingo to OB, having one library less to care for
  • Add the fingerprint linear optimizer in a generally usable implementation
  • See if the new stereochemistry inside OB fixes various issues that were reported
  • Basic reaction support
  • Win64 builds ?

This will take some time, so don't hold your breath. Since there are already some bugs reported for OB 2.3.0, I guess I'll have to wait for those to be fixed before a production release of pgchem::tigress 2.0 becomes available.

Tuesday, November 9, 2010

GCC6 Day Three (09.11)

So, I rediscovered my notes from day three. Unfortunately there are only two lectures I can really write something about:

  1. Noel O'Boyle's lecture about the in-silico design of polymers with optimal properties for organic solar cells. As far as I understood this, the predicted efficiency for organic solar cells is about 13 %. Experimentally, 6 % has been reached so far. By using OpenBabel together with cclib and Gaussian, synthetically accessible polymers with 11 % predicted efficiency could be designed.
  2. Roger Sayle from NextMove showed his recent work on a chemistry-aware spell checker. It is based on the observation that the current problems with chemical text mining do not come from poor OCR or poor name2structure conversion, but mainly from bad input (typos etc.) in the source texts, which common spell checking software cannot cope with.

Monday, November 8, 2010

GCC6 Day Two (08.11)

So, I rediscovered my notes from day two. Unfortunately there are only two lectures I can really write something about:
  1. The WizePairZ algorithm was quite interesting. It aims to solve the problem that unwanted properties of drug candidates often scale in correlation with wanted properties. The idea behind WizePairZ is to automatically find transformations from one molecule to another that reduce unwanted properties (ideally while improving wanted properties). Such transformations are then applied to molecules which are close to the boundary between 'unwanted' and 'wanted' space in order to push them over the boundary.
  2. The in-silico prediction of phototoxicity aims to find molecular descriptors and associated models that allow predicting the potential phototoxicity of substances, especially their UV absorption between 290 and 450 nm wavelength, where the human skin doesn't filter anymore and the ozone layer doesn't filter yet.

GCC6 Day One (07.11)

Arrived 14:00 in Goslar

What's new in Knime presentation:
Mainly usability improvements, like conditional paths through workflows and annotations. This is really getting somewhere. Release planned for December 6th, 2010.

My presentation:
I was not booed from the stage.

MOSGrid presentation:
It's a molecular simulation grid driven by a consortium of academia and industry, led by the University of Cologne.

The planned usage is that you specify a task through a web interface in MSML (molecular simulation markup language), which describes a task for the grid; the MSML is then translated into input specific to a program (e.g. Gaussian).
The program is then run on the grid and the program specific output is translated again into MSML and returned to the user.


  • Gaussian
  • Turbomole

are supported.

Tasks can be chained into workflows. Lots of expertise and manpower, and if bureaucracy doesn't get in the way, this could work and is worth a look. Unfortunately, the licensing of the used programs in a grid seems to be an unsolved legal problem.

Demonstration of MOSGrid:
A basic web interface (to Gaussian I guess) was shown and how to run quantum calculations on the grid from there. The interface is designed for humans. Currently there are no plans to offer this as a service to machines. Licensing issues of using commercial software in a grid must be ironed out to make it widely available, but beta testing will start in December for a limited number of users.

Friday, November 5, 2010

Thursday, November 4, 2010

The SMARTSViewer

This is a nifty service for visualizing SMARTS patterns from the University of Hamburg:

Probably useful when your SMARTS pattern does not match what you think it should...

Wednesday, October 27, 2010

Nasty thoughts

Typically reading the FDA Warning Letters is pretty dry stuff, but this looks like a fun product.

"The effects of ... will promote a thrilling energy, more stamina, constant readiness, nasty thoughts, prolonged arousal, feelings of well being, romantic and sensual experience."

"Nasty thoughts" - those guys from Natural Wellness, LLC really should try Mefloquine...

Wednesday, August 11, 2010

Searching Structures in ChemSpider from the Android Browser... now implemented as empirical formula search.

Not what was ultimately desired, but at least it works.

Friday, August 6, 2010

Searching Structures in ChemSpider from the Android Browser...

...does not work at the moment.

Having found out how to use their web API, with the friendly help of ChemSpider support, I tried to call it from the Android browser.

The HTML form works at least in Firefox, IE 6.x and Chrome, but from the Android browser it either opens the ChemSpider start page - or sometimes only a blank page.

It seems that I'll have to monitor what the Android browser really sends to ChemSpider with Wireshark to find out what the problem is.

Friday, July 23, 2010

Thesis abstract

The storage and retrieval of chemical graphical datatypes such as structures and reactions in relational database systems is a common technique used in academia and industry alike. Due to the computationally intensive algorithms used for (sub)graph isomorphism detection, such systems commonly use faster screening mechanisms in order to reduce the set of potential positive matches before applying the aforementioned algorithms.

Widely used screening mechanisms are based on numerical and binary vectors, called fingerprints, with a clear dominance of binary fingerprints due to the raw speed advantage of bitwise operations and compactness in storage. The two most commonly used types of binary fingerprints are path-generated and substructure-generated, both of which have specific shortcomings, especially blind spots.

To overcome these shortcomings, the Pgchem::Tigress chemistry extension to the PostgreSQL object-relational database management system uses a hybrid binary fingerprint, consisting of an invariant path-generated part and a substructure-generated part which is externally configurable through a dictionary of substructure patterns.

This thesis presents a novel approach of using dynamic discrete optimization to find an optimized dictionary configuration for the substructure-generated part of the fingerprint for arbitrary sets of structural data.

By means of applying the method developed in this thesis, the computational power necessary to run a chemical information system can be reduced by 42 percent on average. By improving the query throughput, upgrading the server hardware to the next level of computational power can be avoided and thus opportunity revenues of the operating costs are realized.

Update: It is now readable online.
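
To illustrate the screening idea from the abstract, here is a minimal Python sketch. It assumes fingerprints are held as plain Python integers and relies on the usual screening rule that a candidate can only contain the query as a substructure if every bit set in the query fingerprint is also set in the candidate's fingerprint. The function name screen_candidates and the sample data are hypothetical and not taken from pgchem::tigress.

```python
def screen_candidates(query_fp, fingerprints):
    """Return the keys whose fingerprint has every bit of query_fp set.

    Survivors are only *potential* matches; they still need the expensive
    (sub)graph isomorphism test. Rejected entries can never match.
    """
    return [
        key
        for key, fp in fingerprints.items()
        if (fp & query_fp) == query_fp
    ]


# Hypothetical 6-bit fingerprints, for illustration only.
database = {"mol_a": 0b101101, "mol_b": 0b001001, "mol_c": 0b111111}
survivors = screen_candidates(0b100101, database)
```

Here mol_b is rejected by the bitwise test alone, so only mol_a and mol_c would go on to the isomorphism check.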

Monday, July 19, 2010

Thesis defence

The date of my defence is 21.07.2010, 10:00-11:15 UTC+2.

If somebody's willing to cross their fingers for me, it'll be appreciated. :-)

Friday, July 9, 2010

ChemSpider Web API, anyone?

I'm trying to do a structure search using the published ChemSpider Web API:

I thought it would be sufficient to get the form they provide, put a molfile into the designated area, add a submit button, and it would work.

But all I get back are blank pages with no content. No error messages, nothing useful.

How is the darn thing meant to be used?

UPDATE: Adding a method="post" to the form did the trick, but apparently only substructure searches with unlimited result set are supported. This results in very poor performance with structures of low selectivity. Still not good.


  1. I had an extra line in my molfile.
  2. There is really a bug in the API. Substructure search triggers exact search and vice versa. Since the API is heavily in use with workarounds to this bug, fixing this would break many applications.

Monday, July 5, 2010

The curious case of the infinite canvas

After some fruitless attempts to extend the usable canvas of DCE ChemPad with some kind of self-made view port, in order to allow drawing of larger-than-screen structures, I think I have now found a more painless way.

Simply wrapping the DCE Control in a HorizontalScrollView was not enough, because the ScrollView intercepts too many MotionEvents, but after subclassing HorizontalScrollView and overriding onInterceptTouchEvent(), it seems to work as expected. The code needs some more polishing before I can call it working, though.

As you can see on the screenshot, the structure now can be scrolled left and right as needed to draw more than would fit on the screen. The thin line on the bottom separates the drawing from the scrolling sensitive area, because otherwise the whole screen would be scroll sensitive and drawing while scrolling is not very precise. The screenshot is from a NexusOne with Android 2.2.

Wednesday, June 2, 2010

DCE ChemPad Update 1.1

A few bugs have been fixed (sorry, no autolayout yet :-)), all controls have been moved into the menu and the title bar has been removed to save screen real estate. The vibration feedback for fusing atoms now works as expected and you can send the molfile via the phone's messaging systems. Since it seems that attachment handling is not fully implemented on Android, the molfile is embedded as text in the message itself. A localized help function was added to the application.

Wednesday, May 26, 2010

DCE ChemPad 1.0 is out in the wild

DCE ChemPad is released for free on the Android Market!

It shows the capabilities of the DCE Chemistry Editor Control to add a chemical editor to arbitrary Android applications.

It was tested on the HTC Nexus One, ACER Liquid S 100, Motorola Milestone/Droid and the Emulator and should work with all Android versions >= 1.5. It reportedly does not work on the Motorola Cliq.

If you have access to the Market, please try it and tell me what you think...

Ah, screenshots:

Thursday, April 22, 2010

Solving the 'big finger vs. tiny screen' problem

  • Learn how to make custom View controls - check
  • Learn to make compound controls - check
  • Learn how to paint on the canvas - check
  • Design an effective 2D rendering pipeline for undirected graphs - check
  • Remember basic planar trigonometry - check
  • Design a fuzzy lock-on selection method for the touchscreen - check
  • Design a robust backing model - check
  • Design an effective UI for the touchscreen - check
  • Implement all the nasty details - check

Wednesday, April 14, 2010

NexusOne: Big finger vs. tiny screen

I now have a NexusOne at hand.

And while it is much faster than the Emulator, a very profane problem has come up. It is impossible to precisely draw a chemical structure with an editor designed for the mouse! It is just too sensitive to use with a finger on a tiny screen. And a pen won't work with a capacitive touchscreen...

Maybe I have to take an intensive look at the package.

Thursday, April 1, 2010

JavaScript molecule editor roundup

Next to JsDraw, which had its 1.0 release recently, I've found two more pure JavaScript molecule editors:

WebCME has a lot of features, notably a large library of molecules, but is quite painful to use. This is because its developers have chosen a system of 'select two atoms, add bond, deselect them, select another two atoms, add bond, oh wrong one, delete bond, add correct bond...' for drawing.

The ChemDoodle web components on the other hand have a totally minimalistic, yet powerful UI. Atoms are drawn by mouseclick, bonds drawn by mouse drag. Atom types can be changed via keyboard, bond types by mouseclick and delete is done by the Backspace or Delete key.

Unfortunately none of the three works on mobile browsers. Either they don't work at all or only by half.

In contrast, the jsMolEditor does work even in Android WebViews, but seems to be not under development anymore. I suspect that it was abandoned in favour of JsDraw.

What a pity. Having a working JavaScript molecule editor, even a simple one, that runs in smartphone WebViews would open a whole new world of applications for those devices. They now have the computational power to handle chemical data, but who wants to enter SMILES strings by hand...

pgchem::tigress 1.2 is out

Built and tested against PostgreSQL 8.4.2 with OpenBabel 2.2.3 on XP 32 bit, Windows 7 64 bit and Ubuntu 8.04 LTS 32 bit.

MACCS166 binary fingerprints.

Dice and Tversky similarity.

Small bug fixes.

Tuesday, March 16, 2010

How to run mx on Android

mx runs on Android!

To make it work, you need to build from sources and remove all references to javax.swing (which doesn't seem to break the rest of the code btw.), since Android does not contain AWT or Swing.

Then repackage the jar and it can be used in Android applications.

Chemistry on the smartphone, yay. :-)

Wednesday, March 10, 2010

Changing the public API between minor versions of PostgreSQL

I hate when they do this.

Between 8.3 and 8.4, the API for CREATE OPERATOR CLASS has changed. Now the RECHECK flag is obsolete, letting the index dynamically decide if it is lossy or not.

While this in itself is an improvement, it generates an incompatibility between GiST C code and scripts written for 8.3 and 8.4. Fortunately, the fix seems to be an easy one...
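
One plausible way to handle the script side of this is sketched below in Python. It assumes the 8.3 install script marks lossy operators with a trailing RECHECK keyword on its OPERATOR entries and simply drops that keyword for 8.4. The helper name strip_recheck, the operator class name and the type name are hypothetical, and the sketch does not touch the GiST C code.

```python
import re

# A RECHECK keyword (with its leading whitespace) that closes an OPERATOR entry,
# i.e. is directly followed by a comma or the terminating semicolon.
_RECHECK = re.compile(r"\s+RECHECK\b(?=\s*[,;])", re.IGNORECASE)


def strip_recheck(sql):
    """Return the script text with trailing RECHECK flags removed."""
    return _RECHECK.sub("", sql)


# Hypothetical 8.3-style fragment, for illustration only.
script_83 = """CREATE OPERATOR CLASS gist_example_ops
DEFAULT FOR TYPE example USING gist AS
    OPERATOR 1 <@ RECHECK,
    OPERATOR 2 @> RECHECK;
"""
script_84 = strip_recheck(script_83)
```

Keeping the 8.3 script as the master copy and deriving the 8.4 variant from it would avoid maintaining two nearly identical files.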

Wednesday, March 3, 2010

Wrapping native libraries with JNA: Dingo

Even with the advent of pure Java chemoinformatics toolkits like MX or the CDK, there is a lot of interesting native code floating around on the net. Unfortunately, wrapping native code with JNI is no real fun.

JNA comes to the rescue. It does all the necessary loading and marshalling stuff dynamically in the background for you. All you need is a declaration of the native interface, the rest is magic.

Here's an incomplete but working example for Dingo 1.0:

package your_package_here;

import com.sun.jna.Native;

public class NativeDingoWrapper {

  static {
    // Bind the native methods below via JNA direct mapping.
    // The library name "dingo" is an assumption; use the name of your Dingo shared library.
    Native.register("dingo");
  }

  public static native int dingoSetOutputFormat(String anOutputFormat);
  public static native int dingoSetColoring(int aColoringFlag);
  public static native int dingoSetHighlightColorEnabled(int aHighlightFlag);
  public static native int dingoSetHighlightThicknessEnabled(int aHighlightThicknessFlag);
  public static native int dingoSetStereoOldStyle(int aStereoFlag);
  public static native int dingoSetImageSize(int aWidth, int aHeight);
  public static native int dingoLoadMolFromString(String aMol);
  public static native int dingoLoadMolFromFile(String aFile);
  public static native int dingoSetOutputFile(String anOutputFile);
  public static native int dingoRender();
  public static native int dingoMoleculeIsEmpty();
}

And that's it.

The only drawback of JNA is that it needs a glue DLL specific to the operating system, so theoretically it is more platform limited than JNI.

But since "JNA has been built and tested on OSX (ppc, x86, x86_64), linux (x86, amd64), FreeBSD/OpenBSD (x86, amd64), Solaris (x86, amd64, sparc, sparcv9) and Windows (x86, amd64). It has also been built for windows/mobile and Linux/ppc64, although those platforms are not included in the distribution." this is a quite limited limitation for most cases.

I have successfully wrapped Dingo and Barsoi with JNA and up to now it just works as advertised.

Monday, February 15, 2010

Chemoinformatics in the browser: Fingerprint similarity calculation

Well, there are other things that can be done in JavaScript beyond substructure search. For example, Tanimoto binary fingerprint similarity calculation needs just two short functions:

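function popcount(b) {
    // 0xE994 packs the bit counts of the values 0..7 as 2-bit entries.
    var c = 0, bi3b = 0xE994;
    // One pass of the table lookup covers 8 bits, so walk all four bytes of a 32-bit block.
    for (; b !== 0; b >>>= 8) {
        c += 3 & (bi3b >> ((b << 1) & 14));
        c += 3 & (bi3b >> ((b >> 2) & 14));
        c += 3 & (bi3b >> ((b >> 5) & 6));
    }
    return c;
}

function tanimoto(fp1, fp2) {
    var a = 0; // bits set in fp1
    var b = 0; // bits set in fp2
    var c = 0; // bits set in both

    for (var i = fp1.length - 1; i >= 0; i--) {
        var block_fp1 = fp1[i];
        var block_fp2 = fp2[i];
        a += popcount(block_fp1);
        b += popcount(block_fp2);
        c += popcount(block_fp1 & block_fp2);
    }
    return c / (a + b - c);
}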

The fingerprints have to be converted into JavaScript arrays of equal length containing signed numbers:

onclick="alert(tanimoto(new Array(1,-1073741825),new Array(3,2147483647)));"
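
For cross-checking results outside the browser, the same calculation can be sketched in Python. This is a minimal sketch, assuming each array element is a signed 32-bit block as in the example call above; the names popcount32 and tanimoto are illustrative and it is not the code used on the page.

```python
def popcount32(block):
    """Count the set bits of a signed or unsigned 32-bit block."""
    # Masking maps negative values onto their 32-bit two's complement pattern.
    return bin(block & 0xFFFFFFFF).count("1")


def tanimoto(fp1, fp2):
    """Tanimoto similarity c / (a + b - c) of two equal-length block lists."""
    if len(fp1) != len(fp2):
        raise ValueError("fingerprints must have the same number of blocks")
    a = sum(popcount32(block) for block in fp1)  # bits set in fp1
    b = sum(popcount32(block) for block in fp2)  # bits set in fp2
    c = sum(popcount32(x & y) for x, y in zip(fp1, fp2))  # bits set in both
    return c / (a + b - c)


similarity = tanimoto([1, -1073741825], [3, 2147483647])
```

For the two example fingerprints this gives a = 32, b = 33 and c = 31, i.e. a similarity of 31/34, or roughly 0.91.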


Friday, February 12, 2010

Chemoinformatics in the browser: Firefox catches up

>That's a big difference. Which version of Firefox? If 3.5, have you tried 3.6?

Yes, today. Chrome 4 is not faster than Chrome 3 but Firefox 3.6 now allows jobs of about 50 structures.

Browser          max. job size
Chrome 3.x       100
Chrome 4.x       100
Firefox 3.5.x    25
Firefox 3.6.x    50
IE 6.x           5
IE 8.x           10

Those batch sizes allow for script execution times of about 1 second. The idea behind this is that a running job does not interfere with other scripts on a page if it is embedded, e.g. in an invisible iframe.

If the page is dedicated, much larger jobs might be possible up to the limit of the browser that triggers the 'A script is not responding' error message.

Update: IE 8.x is twice as fast as IE 6.x, but still slow compared to the competitors.

Thursday, February 11, 2010

Chemoinformatics in the browser: Chrome finishes first

While developing my little demo in the previous article, I found that different browsers could handle different job sizes depending on how fast their JavaScript engines are.

Chrome 3.x finishes first, ahead of Firefox 3.x, and ye olde IE 6.x is almost unusably slow for substructure searching with JavaScript.

The possible job sizes are:

Browser          max. job size
Chrome 3.x       100
Firefox 3.x      25
IE 6.x           5

Thus, the server sizes jobs according to the user-agent header sent:
if (uatype.find('Firefox/3') != -1):
    timeout = 500
    maxsize = 25
elif (uatype.find('Chrome/3') != -1):
    timeout = 200
    maxsize = 100
elif (uatype.find('MSIE') != -1):
    maxsize = 5
    timeout = 1000

While I knew that Chrome's JavaScript engine is fast, I didn't expect it to be that dominant.

Monday, February 8, 2010

Browsers of the world: Map! Reduce! Map! Reduce!

This article about the idea of collaborative map/reduce in the browser and this one on Depth-First gave me the idea to try something other than distributed word counting: distributed substructure matching.

The server was quickly written in Python. The backend in this case is PostgreSQL, with a table holding the structures as V2000 molfiles in plain text format. No magic so far.

Here's the code of the server.

More interesting might be how the substructure matching itself is done with 100% JavaScript. This is doable now thanks to JSDraw, a pure JavaScript structure editor, which on closer inspection has some more interesting tricks up its sleeve, notably a substructure matching capability.

The server schedules a job of maxsize random molecules from the database and constructs a page containing those molecules as molfiles. After the page has completely loaded in the browser, the matching is done and the page is posted back to the server which parses the result. Once manually started by opening http://:8080/get, the pages keep reloading automatically by means of a meta http-equiv="refresh" in the result page.

Of course, the server is very basic. It notably lacks keeping track of the results and housekeeping to restart broken jobs and uses a hardcoded substructure as search argument.

But it can be done.
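
The scheduling step described above could be sketched roughly as follows. This is not the server linked in the post, only a minimal Python illustration under stated assumptions: the function names, the textarea markup and the /result path are hypothetical, the molfiles are passed in as a plain list instead of being read from PostgreSQL, and the client-side matching script is left out.

```python
import html
import random


def build_job_page(molfiles, maxsize):
    """Pick up to maxsize random molfiles and embed them in an HTML job page."""
    job = random.sample(molfiles, min(maxsize, len(molfiles)))
    blocks = "\n".join(
        '<textarea name="mol">{}</textarea>'.format(html.escape(mol))
        for mol in job
    )
    # The browser-side matching script would go into this page as well;
    # it is deliberately omitted from this sketch.
    return (
        "<html><body>\n"
        '<form method="post" action="/result">\n'
        "{}\n"
        "</form>\n"
        "</body></html>"
    ).format(blocks)


def build_result_page(refresh_seconds=1):
    """Result page that sends the browser back to /get for the next job."""
    return (
        '<html><head><meta http-equiv="refresh" content="{};url=/get"></head>'
        "<body>Result received.</body></html>"
    ).format(refresh_seconds)


page = build_job_page(["molfile A", "molfile B", "molfile C"], maxsize=2)
```

Sampling without replacement keeps a single job free of duplicates, but like the original server this sketch keeps no record of which molecules have already been handed out.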