I recently tried to teach PostgreSQL a new trick with the help of PL/R, and here are the results...
Let's start with a very simple example of supervised machine learning: a support vector machine (SVM) from the e1071 package, which I stole from here.
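A minimal sketch of what such an example can look like. The iris data set that ships with R and the variable names are assumptions for illustration, not necessarily what the linked example uses:

```r
# Assumption: the e1071 package is installed; iris ships with base R.
library(e1071)

data(iris)

# Train a classifier that predicts Species from the four numeric measurements.
model <- svm(Species ~ ., data = iris)

# Predict on the training data itself and tabulate predictions against truth.
predicted <- predict(model, iris[, -5])
confusion <- table(predicted, actual = iris$Species)
print(confusion)
```

The table counts, per actual class, how many rows the model assigned to each predicted class.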
When you run this, the output is something like this:
Which in this case is not surprising, because we validated the model against the very same data used for training. Normally you would split the data into a training set and a validation set instead, but for the sake of brevity this model will do.
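For completeness, a hedged sketch of the split that is skipped here. The iris data set, the 100/50 split and the seed are all arbitrary assumptions:

```r
# Assumption: e1071 is installed; iris, the split size and the seed are illustrative.
library(e1071)

data(iris)
set.seed(42)  # arbitrary seed, only to make the split reproducible

# Hold out 50 of the 150 rows for validation.
train_idx <- sample(nrow(iris), size = 100)
train <- iris[train_idx, ]
valid <- iris[-train_idx, ]

# Train on one part, measure accuracy on rows the model has never seen.
model <- svm(Species ~ ., data = train)
predicted <- predict(model, valid[, -5])
accuracy <- mean(predicted == valid$Species)
print(accuracy)
```

Accuracy measured this way is a fairer estimate than one computed on the training rows.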
The model is now ready to predict classes from numerical input, like so:
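A minimal sketch of such a prediction, again assuming a model trained on iris. The measurements below are hypothetical input values:

```r
# Assumption: e1071 is installed and the model is trained on iris as a stand-in.
library(e1071)

data(iris)
model <- svm(Species ~ ., data = iris)

# Hypothetical measurements; column names must match the training predictors.
new_flower <- data.frame(
  Sepal.Length = 5.1, Sepal.Width = 3.5,
  Petal.Length = 1.4, Petal.Width = 0.2
)

# predict() returns a factor; convert it to plain text for display.
prediction <- predict(model, new_flower)
print(as.character(prediction))
```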
And this is already all we need for a naive implementation of an SVM-based predictor function in PostgreSQL.
Whoa, it lives! :-) But what about performance? Let's run this statement three times:
select s.*, r_predict1(s.*) from generate_series(1,1000) s;
1.4 seconds for each run, so the average is 1.4 s, or about 1.4 ms per prediction.
That's not exactly stellar.
Can we do better?