So, for the test, I changed the location to San Francisco, and after about one hour the table tstest.tstream contained 3705 tweets.
select tag, count(*) as tag_count from tstest.tagstream
group by tag order by tag_count desc limit 10;
produces:
"tag";"tag_count"
"MTVHottest";376
"Hiring";210
"CareerArc";172
"Job";158
"job";157
"Jobs";145
"SanFrancisco";79
"hiring";69
"Sales";40
"SanJose";39
So something is apparently going on with MTV in the Bay Area. :-)
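The tag counts above are case-sensitive, so "Job"/"job" and "Hiring"/"hiring" appear as separate rows. As a sketch, this variant of the same query folds tags to lower case before grouping so that such pairs are merged. It assumes only the tstest.tagstream table and its tag column already used above:

```sql
-- Same top-10 query, but "Job" and "job" are counted as one tag.
select lower(tag) as tag, count(*) as tag_count
from tstest.tagstream
group by lower(tag)
order by tag_count desc
limit 10;
```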
In a production system, I would probably write a PostgreSQL background worker for stream injection instead of using an external job.
And, following the discussions on Database Soup, it could indeed be interesting to configure the stream server for maximum throughput, sacrificing durability for speed, and then pull any data that should be preserved over a postgres_fdw link into another server with a more conservative setup.
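A rough sketch of what that split could look like, assuming PostgreSQL 9.5 or later (for ALTER TABLE ... SET UNLOGGED and IMPORT FOREIGN SCHEMA) and assuming tstest.tstream is the ring buffer table. The server name stream_srv, host, database name, credentials and the schemas stream_remote and keep are hypothetical placeholders, not part of the setup described above:

```sql
-- === On the stream server: trade durability for speed ===
-- Commits no longer wait for the WAL flush; a crash may lose the most
-- recent transactions, but the database stays consistent.
ALTER SYSTEM SET synchronous_commit = off;
-- ALTER SYSTEM SET fsync = off;  -- goes further, but an OS crash can corrupt the cluster
SELECT pg_reload_conf();

-- Skip WAL for the (assumed) ring buffer table entirely; an unlogged
-- table is truncated after a crash and is not replicated to standbys.
ALTER TABLE tstest.tstream SET UNLOGGED;

-- === On the archive server: pull what should be kept over postgres_fdw ===
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

-- Hypothetical connection details for the stream server.
CREATE SERVER stream_srv
  FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host 'stream.example.com', dbname 'tsdb');

CREATE USER MAPPING FOR CURRENT_USER
  SERVER stream_srv
  OPTIONS (user 'archiver', password 'secret');

-- Expose the remote ring buffer table as a local foreign table.
CREATE SCHEMA IF NOT EXISTS stream_remote;
IMPORT FOREIGN SCHEMA tstest LIMIT TO (tstream)
  FROM SERVER stream_srv INTO stream_remote;

-- Regular (logged) local table with the same columns.
CREATE SCHEMA IF NOT EXISTS keep;
CREATE TABLE IF NOT EXISTS keep.tstream (LIKE stream_remote.tstream);

-- Copy a snapshot. Add a WHERE clause for whatever should be preserved;
-- without one, re-running this copies the same rows again.
INSERT INTO keep.tstream
SELECT * FROM stream_remote.tstream;
```

Running the INSERT periodically (e.g. from cron) keeps the archive server's own durability settings conservative while the stream server stays tuned for throughput.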
BTW: if you plan to keep much more data in the ringbuffer so you can occasionally dig further into history, this would be a perfect case for a long-tailed table.