Wednesday, November 30, 2011

What's up, doc?


OK, I have now a working (i.e. not crashing & giving meaningful results) hybrid version of pgchem::tigress with OpenBabel and Indigo, where the OpenBabel functions in the search engine have been replaced by Indigo functionality.

Why?

  • To see if it can be done :-)
  • To learn more about programming with Indigo
  • To see which is faster
  • Indigo has better reaction support as it seems

Interestingly I could keep the external interface almost stable. As only a dozen or so functions outside the search engine are affected by the change, I think with some #IFDEFs I can make a common codebase that compiles either to 100% OpenBabel or the hybrid version.

Monday, November 14, 2011

R-Matrix implementation with Indigo

Long time no see.

I have been trying programming with Indigo recently and here is my first, straightforward (to avoid the term 'naive') implementation of an Ugi like R-Matrix (Gasteiger/Engel, Chemoinformatics, 2003, Wiley-VCH, p. 185ff.) for adding make/break/change bond detection to its reaction mapper:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include "indigo.h"

static inline int mp(const int column, const int row, const int maxRows)
{
    return column-1+((row-1)*maxRows);
}

int main(int argc, char *argv[])
{

    int index, reaction, reactant, product, atom, bond, atomIter, reactantIter, productIter, neighborIter, totalAtoms, mappingNumber, targetMappingNumber, bondMakeBreak;
    int *bMatrix, *eMatrix;//, *rMatrix;

    totalAtoms=0;

    qword session = indigoAllocSessionId();

    indigoSetSessionId(session);

    printf("%s\n",indigoVersion());

    reaction = indigoLoadReactionFromFile("c:/temp/testrxn.rxn");

    //indigoFoldHydrogens(handle);

    indigoDearomatize(reaction);

    //printf("1:%s\n",indigoGetLastError());

    indigoAutomap(reaction,"discard");

    reactantIter = indigoIterateReactants(reaction);

    while((reactant = indigoNext(reactantIter)))
    {
        totalAtoms+=indigoCountAtoms(reactant);
    }

    indigoFree(reactantIter);

    bMatrix = (int*) calloc(totalAtoms*totalAtoms,sizeof(int));
    eMatrix = (int*) calloc(totalAtoms*totalAtoms,sizeof(int));
    //rMatrix = (int*) calloc(totalAtoms*totalAtoms,sizeof(int));

    reactantIter = indigoIterateReactants(reaction);

    while((reactant = indigoNext(reactantIter)))
    {
        atomIter = indigoIterateAtoms(reactant);
        while((atom = indigoNext(atomIter)))
        {
            mappingNumber = indigoGetAtomMappingNumber(reaction,atom);
            //printf("%i\n",mappingNumber);
            //indigoGetExplicitValence(atom, &valence);
            //printf("%i\n",valence);
            //bMatrix[mp(mappingNumber, mappingNumber, totalAtoms)]=valence;

            neighborIter = indigoIterateNeighbors(atom);

            while((atom = indigoNext(neighborIter)))
            {
                targetMappingNumber = indigoGetAtomMappingNumber(reaction,atom);

                index = mp(mappingNumber, targetMappingNumber, totalAtoms);

                if(bMatrix[index]==0)
                {
                    bond=indigoBond(atom);
                    bMatrix[index]=indigoBondOrder(bond);
                }
            }

            //printf("\n");

            indigoFree(neighborIter);
        }
        indigoFree(atomIter);
    }

    indigoFree(reactantIter);

    productIter = indigoIterateProducts(reaction);

    while((product = indigoNext(productIter)))
    {
        atomIter = indigoIterateAtoms(product);
        while((atom = indigoNext(atomIter)))
        {
            mappingNumber = indigoGetAtomMappingNumber(reaction,atom);
            //printf("%i\n",mappingNumber);
            //indigoGetExplicitValence(atom, &valence);
            //printf("%i\n",valence);
            //eMatrix[mp(mappingNumber, mappingNumber, totalAtoms)]=valence;

            neighborIter = indigoIterateNeighbors(atom);

            while((atom = indigoNext(neighborIter)))
            {
                targetMappingNumber = indigoGetAtomMappingNumber(reaction,atom);

                index = mp(mappingNumber, targetMappingNumber, totalAtoms);

                //printf("%i",indigoBondOrder(bond));
                if(eMatrix[index]==0)
                {
                    bond=indigoBond(atom);
                    eMatrix[index]=indigoBondOrder(bond);
                }
            }

            //printf("\n");

            indigoFree(neighborIter);
        }
        indigoFree(atomIter);
    }

    indigoFree(productIter);

    for(int i=0; i<(totalAtoms*totalAtoms); i++)
    {
        bondMakeBreak=eMatrix[i]-bMatrix[i];

        //rMatrix[i]=bondMakeBreak;

        if(bondMakeBreak>0)
        {
            if(bMatrix[i]>0)
                printf("change: %i to %i\n",(i / totalAtoms)+1,(i % totalAtoms)+1);
            else
                printf("make: %i to %i\n",(i / totalAtoms)+1,(i % totalAtoms)+1);
        }
        else if(bondMakeBreak<0)
        {
            if(bMatrix[i]>1)
                printf("change: %i to %i\n",(i / totalAtoms)+1,(i % totalAtoms)+1);
            else
                printf("break: %i to %i\n",(i / totalAtoms)+1,(i % totalAtoms)+1);
        }
    }

    //printf("2:%s\n",indigoGetLastError());

    indigoSaveRxnfileToFile(reaction,"c:/temp/testrxn_mapped.rxn");

    //printf("3:%s\n",indigoGetLastError());

    indigoFree(reaction);
    indigoReleaseSessionId(session);
    free(bMatrix);
    free(eMatrix);
    //free(rMatrix);

    return 0;
}
At the moment it has the following shortcomings:
  1. 50% too much memory consumption, since only half of the R-Matrix would suffice
  2. Each bond is found twice (Atom1 to Atom2 and Atom2 to Atom1)
  3. The make/break/change info is not added to the mapped reaction file

Have fun!