Wednesday, December 9, 2015

The MongoDB BI analytics connector ... PostgreSQL FDW

Now this gets interesting. Apparently MongoDB will get a BI connector - which seems to be a Multicorn foreign data wrapper for PostgreSQL!

While the step is logical in some way, given MongoDB's limited built-in analytical capabilities vs. what PostgreSQL can do declaratively, e.g. with CTEs, window functions or TABLESAMPLE, it could also backfire badly. Well, I'm almost convinced it will backfire.
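To make "declaratively" a bit more concrete, here is a minimal sketch of a CTE combined with a window function. It is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (TABLESAMPLE is PostgreSQL-specific and is left out), it assumes a bundled SQLite of version 3.25 or newer, and the `orders` table and its rows are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; table and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 10), ("alice", 30), ("bob", 20), ("bob", 5)],
)

query = """
WITH totals AS (                          -- CTE: one total per customer
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
)
SELECT customer,
       total,
       RANK() OVER (ORDER BY total DESC) AS rnk   -- window function
FROM totals
ORDER BY rnk
"""

rows = conn.execute(query).fetchall()
for customer, total, rnk in rows:
    print(rnk, customer, total)
```

The same statement should run unchanged on PostgreSQL; the ranking is expressed in a single query instead of in application code.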

PostgreSQL already has 'NoSQL' capabilities like native JSON and HSTORE k/v, and there is ToroDB, which emulates a wire-protocol-compatible MongoDB on top of PostgreSQL. There is already work on Views (Slide 34) in ToroDB, which will enable users to query documents stored in ToroDB not only with the MongoDB query language but also with SQL, thus seamlessly integrating ToroDB document data with plain PostgreSQL relational data.

Then, there is no reason to use MongoDB at all, except maybe data ingestion speed. Data ingestion in ToroDB is way slower than with a 'real' MongoDB, but this is being worked on.

And from my experience in a current project, with a bit of anticipatory thinking, PostgreSQL data ingestion speed can at least challenge MongoDB, with security, integrity, transactions and all, on a server with a quarter of the CPU cores that the MongoDB server has.

So, the wolf and the lamb will feed together, and the lion will eat straw like the ox... - there are truly interesting times ahead. :-)


  1. PostgreSQL is not a distributed DB. This makes a lot of difference.

    1. But ToroDB, which supports the MongoDB replication protocol, is :)

    2. Ernst, you are completely right. There is a lot of news coming with the soon-to-be-released next version of ToroDB. A lot of work is being put into this version, including replication, views, configuration, performance improvements and other backends that are more specialized in analytics and DW.

      Thank you for your mentions!

  2. The question of why to use MongoDB over PostgreSQL is all about a single-node instance, of course. The main reason for NoSQL solutions is to have a scalable, active-active distributed system that gives up some consistency, which an RDBMS cannot do.

  3. They said this GitHub project is a proof: