Thursday, January 28, 2016

ML based prediction with PostgreSQL & PL/R in four rounds - III

The predict2() function initializes the SVM only on the first call, which improves performance significantly. But it still has to build the model from scratch.

If the training takes a comparatively long time or the training data cannot be provided along with the code, this is a problem.

Round 3

R has the ability to serialize objects to disk and read them back with saveRDS() and readRDS().

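library(e1071)
# Toy training data: ten inputs with their class labels
data <- seq(1,10)
# factor() makes the class labels explicit for classification
classes <- factor(c('b','b','b','b','a','a','a','a','b','b'))
mysvm <- svm(data, classes, type='C', kernel='radial', gamma=0.1, cost=10)
# Serialize the trained model to disk
saveRDS(mysvm, "mysvm.rds")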
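
Before relying on the saved file, it may be worth checking that a restored model predicts exactly like the one in memory. The following is a minimal sketch of such a round-trip check, not part of the original gist. It assumes the e1071 package is installed, writes to a temporary file (the `path` variable is only for illustration) so it does not touch mysvm.rds, and spells out the type as 'C-classification'.

```r
library(e1071)

# Same toy data as in the post; labels as a factor for classification
data <- seq(1, 10)
classes <- factor(c('b','b','b','b','a','a','a','a','b','b'))
mysvm <- svm(data, classes, type = 'C-classification',
             kernel = 'radial', gamma = 0.1, cost = 10)

# Round trip through a temporary file (illustrative path, not mysvm.rds)
path <- tempfile(fileext = ".rds")
saveRDS(mysvm, path)
restored <- readRDS(path)

# Predictions from the original and the restored model should match
before <- as.character(predict(mysvm, data))
after  <- as.character(predict(restored, data))
stopifnot(identical(before, after))
```

If stopifnot() returns silently, the restored model behaves the same as the original on this data.
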
With the SVM object saved to disk, we can restore it instead of rebuilding it each time.

CREATE OR REPLACE FUNCTION r_predict3(inp integer)
RETURNS text AS
$BODY$
# Runs only while pg.state.firstpass is still TRUE
if (pg.state.firstpass)
{
  library(e1071)
  # Restore the trained model; <<- keeps it available to later calls
  mysvm <<- readRDS("mysvm.rds")
  assign("pg.state.firstpass", FALSE, envir=.GlobalEnv)
}
result <- predict(mysvm, inp)
return(as.character(result[1]))
$BODY$
LANGUAGE plr IMMUTABLE STRICT
COST 100;
Let's run this statement three times again:

select s.*, r_predict3(s.*) from generate_series(1,1000) s;

484 ms for the first run. 302 ms for each of the following two. Average: 363 ms.

That's a 75% improvement compared to the original code.

Still, the first call is more expensive than the subsequent ones.

Can we do better?
