Tuesday, December 29, 2015

More FORTRAN 90 like vector operations in PostgreSQL

It is also possible to do element wise operations of a vector and a scalar:

CREATE OR REPLACE FUNCTION public._vectormuls(
    v1 anyarray,
    v2 anyelement)
  RETURNS anyarray AS
$BODY$
DECLARE
BEGIN
if array_ndims(v1) != 1 Then
RAISE EXCEPTION 'Only vectors supported';
end if;

return array(select v.a * v.b from (select unnest(v1) as a, v2 as b) v);
END;
$BODY$
  LANGUAGE plpgsql IMMUTABLE STRICT;

CREATE OPERATOR public.*(
  PROCEDURE = _vectormuls,
  LEFTARG = anyarray,
  RIGHTARG = anyelement);

The runtime of the vector/scalar function is about 60% of the vector/vector variant, so the effort for unnest() clearly dominates the runtime of such functions.

And since the dot product of two vectors is the sum of the partial products, it is as simple as this:

CREATE OR REPLACE FUNCTION public._vectordotprod(
    v1 anyarray,
    v2 anyarray)
  RETURNS anyelement AS
$BODY$
DECLARE
BEGIN
if array_ndims(v1) != 1 OR array_ndims(v2) != 1 Then
RAISE EXCEPTION 'Only vectors supported';
end if;

if array_length(v1,1) != array_length(v2,1)  Then
RAISE EXCEPTION 'Vector size mismatch';
end if;

-- return SUM v.a + v.b from (select unnest(v1) as a, unnest(v2) as b) v; --9.4
-- return SUM(v.a * v.b) from (select unnest(v1) as a, unnest(v2) as b) v; --9.4
return SUM(a*b) from unnest(v1,v2) v (a,b); -- 9.5
END;
$BODY$
  LANGUAGE plpgsql IMMUTABLE STRICT;

CREATE OPERATOR public.+*(
  PROCEDURE = _vectordotprod,
  LEFTARG = anyarray,
  RIGHTARG = anyarray,
  COMMUTATOR = +*);

The operator name should be . or *+, but since PostgreSQL won't allow that, +* was the next best thing that worked...

BTW: I tried the same with sql instead of plpgsql as function language and the timings are the same. So the additional sanity checks and the language overhead effectively costs nothing compared to the unrolling of the vector(s).

4 comments:

  1. Did you by chance try that with plpythonu or plr? I kinda doubt they'd be faster, but maybe.

    If you care about the performance it wouldn't be very difficult to write a C function to do this. That would avoid a ton of marshaling between Datums and tuples, and for any pass by value types you could also palloc the entire result array in one shot. Wouldn't surprise me if those things were good for another 10-20%.

    ReplyDelete
  2. Oh, the other advantage to C would be it should make it easy to support >1 dimension.

    Also, make sure to take a look at http://www.postgresql.org/docs/9.5/static/intarray.html if you haven't. It would probably be pretty easy to add bigint/float/double support to that. Numeric would depend on how it was coded...

    ReplyDelete
  3. The version of the line for postgres 9.4 is wrong -- it should be:

    return SUM(v.a * v.b) from (select unnest(v1) as a, unnest(v2) as b) v; --9.4

    ReplyDelete
    Replies
    1. Thank you. Damn copy / paste. :-) Corrected.

      Delete