Wednesday, February 17, 2016

The Devil is in the details: Returning bytea from PL/R

If a PL/R function returns an R object as bytea, the object is passed through R's serialization interface.

This is mentioned in the fine manual, but not very prominently.

I assume the rationale is that R objects returned from one PL/R function can then be used directly as input to another PL/R function.

But if you export an image or other binary data for use outside PL/R, you have to strip the serialization first.

And there is a utility function for this in PL/R already:

SELECT plr_get_raw(some_plr_function_returning_bytea());

Otherwise the bytes you get back are R's serialized representation of the object, not the raw binary data you expect.
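To make the extra bytes visible, here is a minimal Python sketch of roughly what stripping the serialization means for the simplest case. It assumes PL/R hands the object to R's serialize() in XDR format (R's default, whose output starts with the bytes X\n) and that the R function returned a plain raw vector without attributes, for example the result of readBin(..., what = "raw"). The function name strip_r_raw_serialization is hypothetical; inside the database, plr_get_raw() is the tool to use.

```python
import struct

RAWSXP = 24            # R's SEXPTYPE code for raw vectors
HAS_ATTR_BIT = 1 << 9  # flag bit set when the object carries attributes


def strip_r_raw_serialization(data: bytes) -> bytes:
    """Return the payload of an XDR-serialized R raw vector.

    Anything else (other formats, other types, attributes, truncated
    input) raises ValueError instead of being guessed at.
    """
    if not data.startswith(b"X\n"):  # XDR format marker written by serialize()
        raise ValueError("not an XDR-serialized R object")
    pos = 2

    def read_int() -> int:
        nonlocal pos
        if pos + 4 > len(data):
            raise ValueError("truncated data")
        (value,) = struct.unpack_from(">i", data, pos)  # XDR: big-endian int32
        pos += 4
        return value

    version = read_int()
    read_int()  # version of R that wrote the data
    read_int()  # minimum R version needed to read it
    if version == 3:
        encoding_length = read_int()
        pos += encoding_length  # skip the native encoding name, e.g. b"UTF-8"
    elif version != 2:
        raise ValueError(f"unsupported serialization version {version}")

    flags = read_int()
    if (flags & 0xFF) != RAWSXP or (flags & HAS_ATTR_BIT):
        raise ValueError("not a plain raw vector")
    length = read_int()
    if length < 0:  # -1 marks a long vector, far beyond bytea's size limit
        raise ValueError("long vectors are not supported")
    payload = data[pos:pos + length]
    if len(payload) != length:
        raise ValueError("truncated data")
    return payload
```

A client that receives bytea from a PL/R function called without plr_get_raw() could run the bytes through a check like this to see whether they still carry R's header.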

Friday, February 12, 2016

Fun with an integrarelational DBMS: SoilGrids

PostgreSQL can be extended in many well-defined ways: custom data types, custom functions, custom operators, even custom indexes.

And then there's the Foreign Data Wrapper, an API to pull (almost) any kind of data into PostgreSQL and treat it (almost) like a native table. There is already an impressive list of them, but sometimes you have to write your own. Like yesterday, when I wanted to integrate pH data from the SoilGrids1km project into an existing PostgreSQL/PostGIS system.

The data from SoilGrids is freely available, so I could simply have downloaded it and loaded it into PostGIS. But the data set is a) huge, b) constantly updated, and c) available through a REST API, so why not tap into it directly?

Maybe because the native language of the Foreign Data Wrapper API is C, which is not exactly suited to fast prototyping. :-) But then there is Multicorn, a bridge between the FDW API and Python, so I gave it a try...

After two hours, including the installation of Multicorn itself, I had this up and running:

CREATE SERVER soilgrids_srv
   FOREIGN DATA WRAPPER multicorn
  OPTIONS (wrapper 'soilgrids_fdw.SoilGridsForeignDataWrapper');

CREATE FOREIGN TABLE public.soilgrids1km
   (latitude real ,
    longitude real ,
    ph_l real ,
    ph_m real ,
    ph_u real ,
    depth real ,
    publication_date date )
   SERVER soilgrids_srv;

SELECT * FROM public.soilgrids1km WHERE latitude = '51.57' AND longitude = '5.39';

latitude longitude ph_l ph_m ph_u depth  publication_date
51.57    5.39      4.2  5.8  7.4  -0.025 2014-04-02 
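The Python class behind soilgrids_fdw.SoilGridsForeignDataWrapper is not shown here. Below is a minimal sketch of how such a Multicorn wrapper could be structured, using Multicorn's ForeignDataWrapper base class and its execute(quals, columns) method. The fetch_point method is a hypothetical placeholder for the actual REST call, because the SoilGrids endpoint and its response format are not given in this post; the fallback base class only exists so the sketch can be tried outside PostgreSQL.

```python
try:
    from multicorn import ForeignDataWrapper
except ImportError:
    # Outside PostgreSQL: a stand-in base class so the sketch runs on its own.
    class ForeignDataWrapper(object):
        def __init__(self, options, columns):
            pass


class SoilGridsForeignDataWrapper(ForeignDataWrapper):
    """Sketch of a Multicorn wrapper backed by a point-query REST API."""

    def __init__(self, options, columns):
        super(SoilGridsForeignDataWrapper, self).__init__(options, columns)
        self.options = options  # server/table OPTIONS from the DDL
        self.columns = columns  # column definitions of the foreign table

    def fetch_point(self, latitude, longitude):
        # Hypothetical hook: query the SoilGrids REST API here and return a
        # list of dicts keyed by column name (ph_l, ph_m, ph_u, depth, ...).
        raise NotImplementedError("REST call not part of this sketch")

    def execute(self, quals, columns):
        # Multicorn passes the WHERE clause as a list of Qual objects.
        # This sketch only uses equality on latitude and longitude.
        point = {}
        for qual in quals:
            if qual.field_name in ("latitude", "longitude") and qual.operator == "=":
                point[qual.field_name] = float(qual.value)
        if len(point) < 2:
            # The API answers point queries, so without both coordinates
            # this sketch returns no rows at all.
            return
        for row in self.fetch_point(point["latitude"], point["longitude"]):
            # Emit only the columns PostgreSQL asked for.
            yield {name: row.get(name) for name in columns}
```

Returning no rows for a query without both coordinates is just one possible design choice; raising an error would be another. PostgreSQL still applies the full WHERE clause to whatever the wrapper yields, so the rows should echo the requested latitude and longitude.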

I bet someone more Pythonic than me could write something like this in under an hour.

And that level of extensibility is one of the reasons why I think that PostgreSQL is one awesome DBMS. Integrarelational even...