Monday, February 15, 2010

Chemoinformatics in the browser: Fingerprint similarity calculation

Well, there are other things that can be done in JavaScript beyond substructure search. For example, calculating the Tanimoto similarity of binary fingerprints needs just two short functions:

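function popcount(b) {
    // 0xE994 packs the bit counts of 0..7 as 2-bit entries (0,1,1,2,1,2,2,3),
    // so each lookup below counts the set bits of a 3-bit (or 2-bit) slice.
    var c = 0, bi3b = 0xE994;
    b = b >>> 0;                             // treat the signed 32-bit block as unsigned
    for (var k = 0; k < 4; k++, b >>>= 8) {  // one byte per iteration
        c += 3 & (bi3b >> ((b << 1) & 14));  // bits 0-2 of the byte
        c += 3 & (bi3b >> ((b >> 2) & 14));  // bits 3-5
        c += 3 & (bi3b >> ((b >> 5) & 6));   // bits 6-7
    }
    return c;
}

function tanimoto(fp1, fp2) {
    // a, b: bits set in each fingerprint; c: bits set in both
    var a = 0, b = 0, c = 0;
    for (var i = fp1.length - 1; i >= 0; i--) {
        var block_fp1 = fp1[i];
        var block_fp2 = fp2[i];
        a += popcount(block_fp1);
        b += popcount(block_fp2);
        c += popcount(block_fp1 & block_fp2);
    }
    return c / (a + b - c);  // |A and B| / |A or B|
}

The fingerprints have to be converted into JavaScript arrays of equal length containing signed 32-bit integers:

onclick="alert(tanimoto([1, -1073741825], [3, 2147483647]));"

This displays 31/34, about 0.912.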
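Since the server side of these experiments is written in Python, one way to produce those arrays is to cut the fingerprint into 32-bit words there and emit them as signed integers. This is a minimal sketch assuming the fingerprint is available as a hex string whose length is a multiple of 8; the post does not say what format the fingerprints come in, and `hex_to_signed_words`, `tanimoto_ref` and `js_array` are hypothetical names. The reference Tanimoto is handy for cross-checking the value computed in the browser.

```python
def hex_to_signed_words(hexfp):
    """Split a hex fingerprint into 32-bit words, returned as signed ints."""
    if len(hexfp) % 8:
        raise ValueError("fingerprint length must be a multiple of 8 hex digits")
    words = []
    for i in range(0, len(hexfp), 8):
        w = int(hexfp[i:i + 8], 16)                           # unsigned 0 .. 2**32 - 1
        words.append(w - (1 << 32) if w & 0x80000000 else w)  # two's complement
    return words


def tanimoto_ref(fp1, fp2):
    """Reference Tanimoto coefficient c / (a + b - c) over 32-bit words."""
    a = b = c = 0
    for w1, w2 in zip(fp1, fp2):
        u1, u2 = w1 & 0xFFFFFFFF, w2 & 0xFFFFFFFF  # back to unsigned bit patterns
        a += bin(u1).count("1")
        b += bin(u2).count("1")
        c += bin(u1 & u2).count("1")
    return c / (a + b - c)


def js_array(words):
    """Render the words as a JavaScript array literal for embedding in a page."""
    return "[" + ",".join(str(w) for w in words) + "]"
```

For instance, `hex_to_signed_words("00000001BFFFFFFF")` gives `[1, -1073741825]`, the first array in the example above.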

Friday, February 12, 2010

Chemoinformatics in the browser: Firefox catches up

>That's a big difference. Which version of Firefox? If 3.5, have you tried 3.6?

Yes, today. Chrome 4 is no faster than Chrome 3, but Firefox 3.6 now allows jobs of about 50 structures.

Browser         Max. job size
Chrome 3.x      100
Chrome 4.x      100
Firefox 3.5.x   25
Firefox 3.6.x   50
IE 6.x          5
IE 8.x          10

These batch sizes keep the script execution time at about 1 second. The idea behind this is that a job this short does not interfere with other scripts on a page when it runs embedded, e.g. in an invisible iframe.

If the page is dedicated to the job, much larger jobs might be possible, up to the limit at which the browser shows its 'A script is not responding' warning.

Update: IE 8.x is twice as fast as IE 6.x, but still slow compared to its competitors.

Thursday, February 11, 2010

Chemoinformatics in the browser: Chrome finishes first

While developing my little demo in the previous article, I found that different browsers could handle different job sizes, depending on how fast their JavaScript engines are.

Chrome 3.x finishes first, ahead of Firefox 3.x, while ye olde IE 6.x is almost unusably slow for substructure searching in JavaScript.

The possible job sizes are:

Browser         Max. job size
Chrome 3.x      100
Firefox 3.x     25
IE 6.x          5

Thus, the server sizes each job according to the User-Agent header the browser sends:

    # inside the request handler; uatype holds the User-Agent header
    if uatype.find('Firefox/3') != -1:
        timeout = 500
        maxsize = 25
    elif uatype.find('Chrome/3') != -1:
        timeout = 200
        maxsize = 100
    elif uatype.find('MSIE') != -1:
        maxsize = 5
        timeout = 1000
    else:
        return  # unsupported browser: no job is handed out
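The same rules could also be kept in a table, so that supporting another browser means adding a row rather than another branch. This is a sketch, not the server's actual code: `JOB_RULES` and `job_params` are hypothetical names, the values are copied from the if/elif chain, and the first matching substring wins, as in the chain.

```python
# Hypothetical rule table: (User-Agent substring, timeout, maxsize).
# Rules are checked in order and the first match wins, mirroring the if/elif chain.
JOB_RULES = [
    ("Firefox/3", 500, 25),
    ("Chrome/3", 200, 100),
    ("MSIE", 1000, 5),
]


def job_params(uatype):
    """Return (timeout, maxsize) for a User-Agent string, or None if unsupported."""
    for needle, timeout, maxsize in JOB_RULES:
        if needle in uatype:
            return timeout, maxsize
    return None  # unknown browser: no job is handed out
```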

While I knew that Chrome's JavaScript engine is fast, I didn't expect it to be that dominant.

Monday, February 8, 2010

Browsers of the world: Map! Reduce! Map! Reduce!

This article about the idea of collaborative map/reduce in the browser and this one on Depth-First gave me the idea to try something other than distributed word counting: distributed substructure matching.
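In map/reduce terms, each browser job is a map task that tests one batch of molecules and returns the ids of the hits, and the reduce step merges the hit lists. The sketch below simulates this in a single Python process. `match` stands in for the substructure test that actually runs in the browser, and `split_jobs`, `map_job` and `reduce_hits` are hypothetical names. The server described below picks random batches instead; the list is simply partitioned here to keep the sketch deterministic.

```python
def split_jobs(ids, maxsize):
    """Cut the list of molecule ids into jobs of at most maxsize ids each."""
    return [ids[i:i + maxsize] for i in range(0, len(ids), maxsize)]


def map_job(job, molecules, match):
    """Map step (one browser): ids of the molecules in the job that match."""
    return [mol_id for mol_id in job if match(molecules[mol_id])]


def reduce_hits(partial_results):
    """Reduce step (the server): merge all per-job hit lists."""
    hits = set()
    for part in partial_results:
        hits.update(part)
    return sorted(hits)
```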

The server was quickly written in Python; the backend in this case is PostgreSQL, with a table holding the structures as V2000 molfiles in plain text format. No magic so far.

Here's the code of the server.

More interesting might be how the substructure matching itself is done in 100% JavaScript. This is now doable thanks to JSDraw, a pure JavaScript structure editor which, on closer inspection, has some more interesting tricks up its sleeve, notably a substructure matching capability.

The server schedules a job of maxsize random molecules from the database and constructs a page containing those molecules as molfiles. After the page has completely loaded in the browser, the matching is done and the page is posted back to the server, which parses the result. Once started manually by opening http://:8080/get, the pages keep reloading automatically by means of a meta http-equiv="refresh" tag in the result page.
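A minimal sketch of the two pages, under several assumptions: a hypothetical table `structures(id, molfile)`, a hypothetical `/result` URL for the post-back, hidden `<textarea>` elements to carry the molfiles, and a zero-second refresh delay. The in-browser matcher itself is not shown. `sqlite3` stands in for PostgreSQL so that the sketch runs on its own; `ORDER BY random() LIMIT n` picks random rows in both databases, although with psycopg2 the placeholder would be `%s` instead of `?`.

```python
import html
import sqlite3


def fetch_job(conn, maxsize):
    """Pick maxsize random (id, molfile) rows from the hypothetical structures table."""
    cur = conn.execute(
        "SELECT id, molfile FROM structures ORDER BY random() LIMIT ?", (maxsize,))
    return cur.fetchall()


def build_job_page(rows):
    """Job page: one hidden textarea per molfile plus a form for posting the hits."""
    out = ["<html><body>"]
    for mol_id, molfile in rows:
        out.append('<textarea id="mol%d" style="display:none">%s</textarea>'
                   % (mol_id, html.escape(molfile)))
    # The in-browser matcher (not shown) would fill "hits" and submit the form on load.
    out.append('<form method="post" action="/result">'
               '<input type="hidden" name="hits" value=""></form>')
    out.append("</body></html>")
    return "\n".join(out)


def build_result_page(next_url="/get"):
    """Result page: the meta refresh makes the browser fetch the next job."""
    return ('<html><head><meta http-equiv="refresh" content="0; url=%s">'
            "</head><body></body></html>" % html.escape(next_url))
```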

Of course, the server is very basic. Notably, it does not keep track of the results, has no housekeeping to restart broken jobs, and uses a hardcoded substructure as the search query.
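The missing housekeeping could be sketched as a small job registry. Every handed-out job gets a deadline, results are stored when they come back, and jobs whose deadline has passed go back into the queue. This is one possible design under stated assumptions, not the server's code. `JobRegistry` and its methods are hypothetical, and timestamps are passed in explicitly to keep the sketch testable.

```python
import collections


class JobRegistry:
    """Track handed-out jobs and re-queue those that never report back."""

    def __init__(self, jobs, lease_seconds):
        self.pending = collections.deque(enumerate(jobs))  # (job_id, molecule ids)
        self.leased = {}    # job_id -> (molecule ids, deadline)
        self.results = {}   # job_id -> hit ids reported by a browser
        self.lease_seconds = lease_seconds

    def reclaim(self, now):
        """Put jobs whose deadline has passed back at the front of the queue."""
        for job_id, (mols, deadline) in list(self.leased.items()):
            if deadline <= now:
                del self.leased[job_id]
                self.pending.appendleft((job_id, mols))

    def hand_out(self, now):
        """Return the next (job_id, molecule ids), or None when nothing is pending."""
        self.reclaim(now)
        if not self.pending:
            return None
        job_id, mols = self.pending.popleft()
        self.leased[job_id] = (mols, now + self.lease_seconds)
        return job_id, mols

    def report(self, job_id, hits):
        """Store a result; reports for jobs not currently leased are ignored."""
        if job_id in self.leased:
            del self.leased[job_id]
            self.results[job_id] = list(hits)

    def done(self):
        """True once every job has reported a result."""
        return not self.pending and not self.leased
```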

But it can be done.