Friday, December 11, 2015

The MongoDB BI analytics connector ... revisited

Whoa, comments! :-)

Instead of answering them individually, I'll try to write a follow-up.

In the meantime, I've read the MongoDB BI connector setup guide. First, it doesn't mention that Python is required, so the real thing could be written in C. It makes some sense to develop the concept with Python / Multicorn first and, if it works, move to C for performance. RDKit is developed that way.

Second, I had assumed that it would pull data from MongoDB into PostgreSQL for performance reasons. Apparently it doesn't. From the look of it, PostgreSQL is just the runtime for the FDW, which routes queries directly to MongoDB.
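To make the "routing queries directly to MongoDB" idea a bit more concrete: an FDW receives the restrictions from the SQL WHERE clause and can translate them into a MongoDB filter, so the filtering happens in MongoDB instead of PostgreSQL. What follows is a minimal sketch under assumptions, not the BI connector's actual code. The `Qual` dataclass is a hypothetical stand-in modelled on Multicorn's qual objects (field name, operator, value); `quals_to_mongo_filter` and the operator table are made up for illustration. The sketch only builds the filter document and leaves out actually sending it through a MongoDB driver.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


# Hypothetical stand-in for a Multicorn-style qual: one
# "column operator value" restriction from a SQL WHERE clause.
@dataclass
class Qual:
    field_name: str
    operator: Any
    value: Any


# SQL comparison operators that this sketch knows how to push down.
SQL_TO_MONGO = {
    "=": "$eq",
    "<>": "$ne",
    "<": "$lt",
    "<=": "$lte",
    ">": "$gt",
    ">=": "$gte",
}


def quals_to_mongo_filter(quals: List[Qual]) -> Dict[str, Any]:
    """Translate quals into a MongoDB filter document.

    Assumption: as in Multicorn, PostgreSQL re-applies the full WHERE
    clause to the rows the FDW returns. The filter therefore only has
    to match a superset of the wanted rows; unsupported quals can be
    skipped at the cost of performance, not correctness.
    """
    mongo_filter: Dict[str, Any] = {}
    for qual in quals:
        mongo_op = SQL_TO_MONGO.get(qual.operator)
        if mongo_op is None:
            # Not pushed down; PostgreSQL filters these rows itself.
            continue
        # Quals on the same field are merged, e.g. into a range. If the
        # same operator repeats, the later one wins, which still matches
        # a superset of the rows that the recheck then trims.
        mongo_filter.setdefault(qual.field_name, {})[mongo_op] = qual.value
    return mongo_filter


if __name__ == "__main__":
    quals = [
        Qual("price", ">=", 10),
        Qual("price", "<", 20),
        Qual("name", "~~", "foo%"),  # LIKE; not translated here
    ]
    print(quals_to_mongo_filter(quals))
```

For `price >= 10 AND price < 20 AND name LIKE 'foo%'`, the two price restrictions become `{'price': {'$gte': 10, '$lt': 20}}`, while the LIKE restriction (PostgreSQL's `~~` operator) is left for PostgreSQL to evaluate on the returned rows.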

This somewhat attenuates my initial concern: from a managerial point of view, a setup where you have to administer and develop against both MongoDB and PostgreSQL will soon raise the question of whether one database wouldn't do.

And with Tableau, which the installation document explicitly mentions, the winner wouldn't be MongoDB.

I've worked with Tableau, and it is totally geared towards the relational model. To make things worse for anything that isn't SQL, it relies on ODBC, which limits the queries and data types it understands even further.
If accessing PostgreSQL's advanced features from Tableau is difficult, accessing MongoDB directly would be even harder. So this happened instead...

Still, unless there is a business reason why it must be MongoDB and why you can't go with, for example, PostgreSQL jsonb, justifying staying with MongoDB once the users have got a taste of NewSQL (aaah, those buzzwords) could be difficult, especially with the BI folks, who just love having a rich analysis toolbox right in the database. So introducing its users to a viable competitor product is a slippery road for MongoDB as a company.

As for PostgreSQL not being a distributed database: it isn't out of the box, but it can be. Whether it will work for you, and which approach, depends as always on the use case:

PostgreSQL FDW

Pick your poison...

(PostgreSQL can even do map/reduce, if you want it to. Mind's the limit.)


  1. I think pgpool-2 is removing the distributed/parallel stuff:

    1. You're right. Parallel query execution is marked as deprecated and will be removed, but load balancing is not. Thank you for the info.