The nseq datatype stores DNA and RNA sequences, consisting of the letters AGCT or AGCU respectively, in PostgreSQL.
By encoding four bases per byte (two bits per base), it uses 75 percent less space on disk than text. The idea is obvious and far from novel, but it is still one of the most efficient compression schemes for DNA/RNA, if not the most efficient.
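To make the four-bases-per-byte idea concrete, here is a minimal Python sketch of 2-bit packing. It is only an illustration: the code table, the bit order, and the names `pack` and `unpack` are assumptions made for this example, not nseq's actual on-disk format or API. It handles DNA only; RNA would work the same way with U in place of T.

```python
# Hypothetical 2-bit code table; the mapping nseq really uses may differ.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq: str) -> bytes:
    """Pack a DNA string into bytes, four bases per byte.

    The first base of each group of four goes into the two highest bits.
    The sequence length must be stored separately, because a trailing
    partial byte is padded with zero bits (which would decode as 'A').
    """
    out = bytearray((len(seq) + 3) // 4)  # round up to whole bytes
    for i, base in enumerate(seq):
        shift = 6 - 2 * (i % 4)           # 6, 4, 2, 0 within each byte
        out[i // 4] |= CODES[base] << shift
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Expand packed bytes back into a DNA string of the given length."""
    return "".join(
        BASES[(data[i // 4] >> (6 - 2 * (i % 4))) & 0b11]
        for i in range(length)
    )


seq = "GATTACAGATTACA"
packed = pack(seq)
restored = unpack(packed, len(seq))
```

Here the 14 bases fit into 4 bytes instead of 14, and a sequence whose length is a multiple of four takes exactly a quarter of the text size.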
As of now, nseq supports only very basic native operations. The main shortcoming is that a sequence has to be decompressed into text, expanding it fourfold, for operations such as substring extraction.
This will change.
Enough said - here it is...
Check out PostBIS as well; it already has many more features.