Wednesday, September 25, 2019

cloudfs_fdw

Since I needed a Foreign Data Wrapper for files stored on S3, and the ones I found did things like loading the whole file into memory before returning the first row, I wrote my own using Multicorn.

Along the way, I discovered libraries like smart-open and ijson, which make it possible to stream various file formats from various filesystems, and so the project escalated a bit into cloudfs_fdw.

It currently supports CSV and JSON files from S3, HTTP/HTTPS sources, and local or network filesystems. Since smart-open supports more than that (e.g. HDFS, SSH), it can be extended if needed.
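To illustrate the streaming idea (as opposed to loading the whole file first), here is a minimal sketch. It is not cloudfs_fdw's actual code. The helper name `iter_csv_rows` and its `opener` parameter are hypothetical. The sketch assumes that `smart_open.open`, whose signature mirrors the built-in `open()`, can be passed in to read `s3://` or `https://` URIs. Rows are produced by a generator, so the first row is available as soon as it has been parsed. As far as I know, a Multicorn wrapper's `execute()` method can return such a generator, which lets rows reach PostgreSQL one at a time.

```python
import csv
import io
from typing import IO, Callable, Dict, Iterator


def iter_csv_rows(source: str, opener: Callable[..., IO[str]] = open) -> Iterator[Dict[str, str]]:
    """Hypothetical helper: yield each CSV row as a dict as soon as it is read.

    `opener` defaults to the built-in open() for local paths. Passing
    smart_open.open instead (an assumption, not cloudfs_fdw's API) would let
    the same code read remote URIs such as s3://bucket/key.csv.
    """
    with opener(source, "r", newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            # Hand the row to the caller immediately instead of collecting a list.
            yield row


if __name__ == "__main__":
    # An in-memory "file" stands in for a remote object in this demo.
    fake = "id,name\n1,alpha\n2,beta\n"
    rows = iter_csv_rows("demo.csv", opener=lambda *args, **kwargs: io.StringIO(fake))
    # The first row is returned before the rest of the input is parsed.
    print(next(rows))
```

For JSON, ijson's `items()` function fills the same role: it yields the elements of a large document incrementally instead of parsing it all at once.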

For now, have fun.