The plate is bad: November 2010

Friday, November 26, 2010

pgchem::tigress sets new world record!

The IPB has now successfully loaded ~28 million structures from PubChem into a pgchem::tigress / PostgreSQL database.

This is the largest installation I know of and therefore sets a new world record - for pgchem. :-)

Friday, November 12, 2010

Development of pgchem::tigress 2.0 has started

I finally managed to get OpenBabel 2.3.0 and PostgreSQL 9.x to talk to each other on Windows. This only works when building with the MinGW Makefiles, not with MSYS, as PostgreSQL does not like mixing with the MSYS runtime libraries. A static build of libopenbabel and libinchi is still required.

The overall good news is that the Makefile could be simplified a lot with help from Steffen Neumann from the Leibniz Institute of Plant Biochemistry, and all current functions are still working with OB 2.3. The build process on Windows is still a bitch compared to building on POSIX systems, though.

The roadmap for 2.0:

A lot of code needs cleanup, ~~especially the make_molecule() function, which is unreadable and error-prone~~. Mr. Neumann has also identified various possible memory leaks that need to be fixed.
~~Allow 2D coordinate generation for Molfile output~~
~~Using OBQuery for non-SMARTS substructure searching~~
~~Add Spectrophore(TM) output~~
~~Add support for exporting Andrew Dalke's chemfp formats (at least FPS)~~
~~Move SVG output from Dingo to OB, having one library less to care for~~
Add the fingerprint linear optimizer in a generally usable implementation
See if the new stereochemistry inside OB fixes various issues that were reported
~~Basic reaction support~~
Win64 builds ?
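
For the chemfp item on the roadmap, here is a minimal sketch of what writing FPS could look like. It assumes the FPS layout as I understand it: a `#FPS1` first line, `#key=value` header lines, then one line per record with the hex-encoded fingerprint, a tab, and the identifier, where bit 0 is the lowest bit of the first byte. The function names `to_fps_line` and `write_fps` are made up for illustration and are not part of pgchem::tigress or chemfp.

```python
def to_fps_line(bits_on, num_bits, identifier):
    """Encode the positions of the set bits as one FPS data line."""
    raw = bytearray((num_bits + 7) // 8)
    for bit in bits_on:
        if not 0 <= bit < num_bits:
            raise ValueError(f"bit {bit} outside 0..{num_bits - 1}")
        # Assumption: bit 0 is the least significant bit of byte 0.
        raw[bit // 8] |= 1 << (bit % 8)
    return f"{raw.hex()}\t{identifier}\n"


def write_fps(records, num_bits, out, fp_type=None):
    """Write (identifier, bits_on) records to a text stream in FPS form."""
    out.write("#FPS1\n")
    out.write(f"#num_bits={num_bits}\n")
    if fp_type:
        out.write(f"#type={fp_type}\n")
    for identifier, bits_on in records:
        out.write(to_fps_line(bits_on, num_bits, identifier))
```

A call such as `write_fps([("mol1", {0, 9})], 16, sys.stdout)` would emit the header followed by one tab-separated record per molecule.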

This will take some time, so don't hold your breath. Since some bugs have already been reported for OB 2.3.0, I guess I'll have to wait for those to be fixed before a production release of pgchem::tigress 2.0 becomes available.

Tuesday, November 9, 2010

GCC6 Day Three (09.11)

So, I rediscovered my notes from day three. Unfortunately there are only two lectures I can really write something about:

Noel O'Boyle's lecture about the in-silico design of polymers with optimal properties for organic solar cells. As far as I understood this, the predicted efficiency for organic solar cells is about 13 %. Experimentally, 6 % has been reached so far. By using OpenBabel together with cclib and Gaussian, synthetically accessible polymers with 11 % predicted efficiency could be designed.
Roger Sayle from NextMove showed his recent work on a chemistry-aware spell checker. It is based on the observation that the current problems with chemical text mining do not come from poor OCR or poor name2structure conversion, but mainly from bad input caused by typos etc. in the source texts, which common spell checking software cannot cope with.
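
To illustrate the general idea of correcting chemical terms before they reach name2structure conversion, here is a toy sketch. It is not the method shown in the lecture: it simply matches a word against a small, hypothetical lexicon using fuzzy string matching from Python's standard library. `LEXICON` and `correct_term` are names invented for this example.

```python
import difflib

# Hypothetical mini-lexicon; a real tool would need a large chemical vocabulary.
LEXICON = ["benzene", "toluene", "ethanol", "methanol",
           "acetone", "pyridine", "chloroform"]


def correct_term(word, lexicon=LEXICON, cutoff=0.8):
    """Return the closest lexicon entry, or the word unchanged if none is close."""
    lowered = word.lower()
    if lowered in lexicon:
        return lowered
    matches = difflib.get_close_matches(lowered, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else word
```

With this lexicon, a typo like "benzen" is mapped to "benzene", while a word with no close entry is passed through untouched.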

Monday, November 8, 2010

GCC6 Day Two (08.11)

So, I rediscovered my notes from day two. Unfortunately there are only two lectures I can really write something about:

The WizePairZ algorithm was quite interesting. It aims to solve the problem that unwanted properties of drug candidates often scale in correlation with wanted properties. The idea behind WizePairZ is to automatically find transformations from one molecule to another that reduce unwanted properties (ideally while improving wanted properties). Such transformations are then applied to molecules which are close to the boundary between 'unwanted' and 'wanted' space in order to push them over the boundary.
The in-silico prediction of phototoxicity aims to find molecular descriptors and associated models that make it possible to predict the potential phototoxicity of substances, looking especially at their UV absorption between 290 and 450 nm, the wavelength range where human skin no longer filters and the ozone layer does not yet filter.

GCC6 Day One (07.11)

Arrived 14:00 in Goslar

What's new in Knime presentation:
Mainly usability improvements, like conditional paths through workflows and annotations. This is really getting somewhere. Release planned for December 6th, 2010.

My presentation:
I was not booed off the stage.

MOSGrid presentation:
It's a molecular simulation grid driven by a consortium of academia and industry, led by the University of Cologne.

The planned usage is that you specify a task through a web interface in MSML (molecular simulation markup language), which describes a task for the grid. The MSML is then translated into input specific to a program (e.g. Gaussian).
The program is then run on the grid and the program specific output is translated again into MSML and returned to the user.
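
The round trip described above can be sketched roughly as follows. I don't know the real MSML schema, so the `<task>`, `<atom>`, `<result>` and `<scf_energy>` elements and their attributes are invented placeholders, and the function names are hypothetical too. Only the Gaussian side (route line, title, charge and multiplicity, coordinates, and the "SCF Done" line in the log) follows the usual Gaussian conventions.

```python
import re
import xml.etree.ElementTree as ET


def task_to_gaussian_input(task_xml):
    """Translate a (hypothetical) MSML-like task into Gaussian input text."""
    task = ET.fromstring(task_xml)
    route = f"# {task.get('method')}/{task.get('basis')} {task.get('job')}"
    title = task.findtext("title", default="untitled").strip()
    charge_mult = f"{task.get('charge', '0')} {task.get('multiplicity', '1')}"
    atoms = [
        f"{a.get('element')} {float(a.get('x')):.6f} "
        f"{float(a.get('y')):.6f} {float(a.get('z')):.6f}"
        for a in task.findall("atom")
    ]
    # Gaussian expects blank lines around the title and after the geometry.
    return "\n".join([route, "", title, "", charge_mult, *atoms, "", ""])


SCF_RE = re.compile(r"SCF Done:\s+E\((\w+)\)\s+=\s+(-?\d+\.\d+)")


def gaussian_output_to_result(log_text):
    """Wrap the last SCF energy of a Gaussian log in a (hypothetical) result element."""
    result = ET.Element("result", program="gaussian")
    matches = SCF_RE.findall(log_text)
    if matches:
        method, energy = matches[-1]
        node = ET.SubElement(result, "scf_energy", method=method, unit="hartree")
        node.text = energy
    return ET.tostring(result, encoding="unicode")
```

Supporting another program would then mean adding another pair of translators, while the task description on the user's side stays the same.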

Currently, Gaussian and Turbomole are supported.

Tasks can be chained into workflows. There is lots of expertise and manpower behind this, and if bureaucracy doesn't get in the way, this could work and is worth a look. Unfortunately, licensing the programs for use in a grid seems to be an unsolved legal problem.

Demonstration of MOSGrid:
A basic web interface (to Gaussian, I guess) was shown, along with how to run quantum calculations on the grid from there. The interface is designed for humans. Currently there are no plans to offer this as a service to machines. Licensing issues of using commercial software in a grid must be ironed out to make it widely available, but beta testing with a limited number of users will start in December.

Friday, November 5, 2010

GCC6 in Goslar

I'll be at the 6th German Conference on Chemoinformatics in Goslar from November 7-9.

Thursday, November 4, 2010

The SMARTSViewer

This is a nifty service for visualizing SMARTS patterns from the University of Hamburg:

SMARTSViewer

Probably useful when your SMARTS pattern does not match what you think it should...