I tried to teach PostgreSQL a new trick lately with the help of PL/R and here are the results...
Round 1:
Let's start with a very simple example of supervised machine learning, using a Support vector machine (SVM) from the e1071 package that I stole from here.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
library(e1071) | |
data <- seq(1,10) | |
classes <- c('b','b','b','b','a','a','a','a','b','b') | |
mysvm = svm (data, classes, type='C', kernel='radial', gamma=0.1, cost=10) | |
pred = predict (mysvm, data) | |
print(table(pred, classes)) |
pred | classes | |
---|---|---|
a | b | |
a | 4 | 0 |
b | 0 | 6 |
Which in this case is not surprising, because we cross validated with the very same data used for training. Usually you don't do this but split the data into a training and a validation set, but for the sake of brevity this model will do.
The model is now ready to predict classes from numerical input, like so:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
x <- predict(mysvm,c(2)) | |
print(as.character(x[1:1])) | |
[1] "b" | |
x <- predict(mysvm,c(7)) | |
print(as.character(x[1:1])) | |
[1] "a" |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
CREATE OR REPLACE FUNCTION r_predict1(inp integer) | |
RETURNS text AS | |
$BODY$ | |
library(e1071) | |
data <- seq(1,10) | |
classes <- c('b','b','b','b','a','a','a','a','b','b') | |
mysvm = svm (data, classes, type='C', kernel='radial', gamma=0.1, cost=10) | |
result <- predict(mysvm, inp) | |
return(as.character(result[1:1])) | |
$BODY$ | |
LANGUAGE plr IMMUTABLE STRICT | |
COST 100; |
select r_predict1(7);
a
select r_predict1(2);
b
Whoa, it lives! :-) But what about performance? Let's run this statement three times:
select s.*, r_predict1(s.*) from generate_series(1,1000) s;
1.4 seconds for each run. Average: 1.4 s.
That's not exactly stellar.
Can we do better?
If declaring library, classes ... is expensive, I would suggest to pass all values at once calling and use a loop inside function.
ReplyDeleteThat's a winner. See my last post.
Delete