Thursday, August 25, 2016

Timeseries: How long can the elephant remember?

Frankly, I don't know where the practical limit for the number of rows in a single PostgreSQL table is from experience, but the interwebs seems to agree on 10^9 for narrow tables.

After a lively discussion with a NoSQL afficionado yesterday about the (in)ability to effectively store timeseries data in a RDBMS I made a quick calculation.

Timeseries data is usually a triple of the form key timestamp value, so it can be stored in a pretty narrow table, hence I stick to the 10^9 rows limit.

If we get a data point every second, we can store 10^9 seconds worth of data. 10^9 seconds is 16666666.6667 minutes, which is 277777.777778 hours, which is 11574.0740741 days, which is good for about 31 years of recording.

Every second of 31 years. Per table.

Thursday, August 18, 2016

Hexastores are easy

Did you know that you can make a Hexastore from a RDF triple in just one line of SQL? (This needs PostgreSQL 9.4 or better, because of the multi-array unnest)

CREATE OR REPLACE FUNCTION hexify(
    IN sub text,
    IN pred text,
    IN obj text)
  RETURNS TABLE(ord text, a text, b text, c text) AS
$$select A.t || B.t || C.t as ord, A.v, B.v, C.v from (select * from unnest(ARRAY[sub, pred, obj],ARRAY['s', 'p', 'o'])) as A(v, t) cross join (select * from unnest(ARRAY[sub, pred, obj],ARRAY['s', 'p', 'o'])) as B(v, t) cross join (select * from unnest(ARRAY[sub, pred, obj],ARRAY['s', 'p', 'o'])) as C(v, t) where a.v != b.v and a.v != c.v and b.v != c.v order by ord desc$$
  LANGUAGE sql IMMUTABLE STRICT
  COST 100
  ROWS 6;

SELECT * FROM hexify('subject','predicate','object');


Sometimes, PostgreSQL SQL is just awesome...

More on Hexastores here and here.