I tried to teach PostgreSQL a new trick lately with the help of PL/R and here are the results...
Round 1:
Let's start with a very simple example of supervised machine learning, using a Support vector machine (SVM) from the e1071 package that I stole from here.
When you run this, the output is something like this
pred | classes | |
---|---|---|
a | b | |
a | 4 | 0 |
b | 0 | 6 |
Which in this case is not surprising, because we cross validated with the very same data used for training. Usually you don't do this but split the data into a training and a validation set, but for the sake of brevity this model will do.
The model is now ready to predict classes from numerical input, like so:
And this is already all we need for a naive implementation of a SVM based predictor function in PostgreSQL.
Let's try...
select r_predict1(7);
a
select r_predict1(2);
b
Whoa, it lives! :-) But what about performance? Let's run this statement three times:
select s.*, r_predict1(s.*) from generate_series(1,1000) s;
1.4 seconds for each run. Average: 1.4 s.
That's not exactly stellar.
Can we do better?
If declaring library, classes ... is expensive, I would suggest to pass all values at once calling and use a loop inside function.
ReplyDeleteThat's a winner. See my last post.
Delete