The nseq datatype stores DNA and RNA sequences, consisting of the letters AGCT or AGCU respectively, in PostgreSQL.
By encoding four bases per byte (two bits per base), it uses 75 percent less space on disk than text. While this idea is obvious and far from novel, it is still one of the most efficient compression schemes for DNA/RNA, if not the most efficient.
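To illustrate the four-bases-per-byte idea, here is a minimal Python sketch of 2-bit packing. It is not nseq's actual code: the bit assignment (A=0, C=1, G=2, T/U=3), the function names pack and unpack, and the choice to pass the sequence length separately are all assumptions made for this example.

```python
# Hypothetical 2-bit codes; nseq's real bit assignment may differ.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3, "U": 3}


def pack(seq: str) -> bytes:
    """Pack a DNA/RNA string into bytes, four bases per byte."""
    out = bytearray((len(seq) + 3) // 4)  # round up to whole bytes
    for i, base in enumerate(seq):
        # The first base of each group of four goes into the two highest bits.
        out[i // 4] |= CODES[base] << (6 - 2 * (i % 4))
    return bytes(out)


def unpack(data: bytes, length: int, rna: bool = False) -> str:
    """Expand packed bytes back into text; length is the number of bases."""
    letters = "ACGU" if rna else "ACGT"
    return "".join(
        letters[(data[i // 4] >> (6 - 2 * (i % 4))) & 0b11]
        for i in range(length)
    )


if __name__ == "__main__":
    packed = pack("GATTACA")
    print(len(packed), "bytes instead of", len("GATTACA"))
    print(unpack(packed, 7))
```

The length has to be kept alongside the packed bytes, because the last byte may contain padding bits that would otherwise decode as extra A's.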
As of now, nseq supports only very basic native operations. The main shortcoming is that it has to be decompressed into text, expanding by 4x, for substring operations and the like.
This will change.
Enough said - here it is...
Check out PostBIS as well; it already has many more features.
Thursday, June 25, 2015
Wednesday, June 3, 2015
Update to pgchem::tigress isotope pattern generation code
The isotope_pattern() function now contains data for the stable isotopes of 82 elements.
Thus, it fully supports HMDB, UNPD and ChEBI (except the transuranics) and is available here.
If you want to update your installation in place, the affected files are obwrapper.cpp and libmercury++.h.