The plate is bad: Timeseries: How long can the elephant remember?

Thursday, August 25, 2016

Timeseries: How long can the elephant remember?

Frankly, I don't know where the practical limit for the number of rows in a single PostgreSQL table is from experience, but the interwebs seems to agree on 10^9 for narrow tables.

After a lively discussion with a NoSQL afficionado yesterday about the (in)ability to effectively store timeseries data in a RDBMS I made a quick calculation.

Timeseries data is usually a triple of the form key timestamp value, so it can be stored in a pretty narrow table, hence I stick to the 10^9 rows limit.

If we get a data point every second, we can store 10^9 seconds worth of data. 10^9 seconds is 16666666.6667 minutes, which is 277777.777778 hours, which is 11574.0740741 days, which is good for about 31 years of recording.

Every second of 31 years. Per table.

12 comments:

vdpAugust 25, 2016 at 6:51 PM
Most people storing time series would laugh at the idea of storing only one data point per second. A typical usecase is performance data, were a single app on a single node can generate hundreds of points each minute without being seen as wasteful. On the other hand, you probably don't care about going back more than 1 year with 1 second granularity; you can either delete or aggregate old data.

Postgres might still be a good tool for that, but you'll have to use many tables, probably with inheritance.
ReplyDelete
Replies
AnonymousAugust 26, 2016 at 5:20 AM
True....
But, I believe if use PostgreSQL in conjunction with the Pg_pathman extension, you essentially get unlimited time-series storage while maintaining good performance.
ReplyDelete
Replies
WiddershinsAugust 26, 2016 at 7:35 AM
Postgres can hold far more than 2^31 rows in a single table, and not just impractically. Currently the only major limitation on the number of rows a table can hold is that the table itself cannot take up more than 32TB of hard drive space; even with rows 128 bytes wide, that's rather more than 100x your estimate.
ReplyDelete
Replies
jerdavisAugust 28, 2016 at 6:58 PM
I typically denormalize and use an Array datatype to store a range of timeseries per row.
Add a view to get the classic row per sample data back out. A stored procedure or two to handle updates/inserts.

ReplyDelete
Replies
ColinSeptember 1, 2016 at 7:20 PM
in a tick database use case, there can easily be hundreds of events per millisecond. partitioning can address this use case at least partially.
ReplyDelete
Replies

Add comment

Thursday, August 25, 2016

Timeseries: How long can the elephant remember?

12 comments:

Blog Archive

Blog Shortlist