Q: So, what is TABLESAMPLE actually good for?
A: To speed things up that can be answered from a small sample instead of the whole population!
Like so:
CREATE TABLE public.ts_test
(
id integer NOT NULL DEFAULT nextval('ts_test_id_seq'::regclass),
val double precision,
CONSTRAINT ts_test_pkey PRIMARY KEY (id)
)
WITH (
OIDS=FALSE
);
Insert 100 million random values in the range 0.0 <= x < 1.0:
INSERT INTO public.ts_test (val)
SELECT
random()
FROM
generate_series(1, 100000000);
And now, some queries to get the standard deviation.
For the whole population:
SELECT round(stddev_pop(val)::numeric,5) FROM ts_test;
0.28869 in 15005 ms
Error: 0%
Speedup: 1x
For a sample of size 20000 (good for 1% error and 99% confidence):
SELECT round(stddev_samp(val)::numeric,5) FROM ts_test TABLESAMPLE BERNOULLI(0.02);
0.28867 in 2387 ms
Error: < 1%
Speedup: 6.3x
SELECT round(stddev_samp(val)::numeric,5) FROM ts_test TABLESAMPLE SYSTEM(0.02);
0.28791 in 25 ms
Error: < 1%
Speedup: 600x
With a table that large, both methods BERNOULLI and SYSTEM are accurate enough to give results well below the 1% error target.
(For potential problems with SYSTEM, see this post)
SYSTEM aces BERNOULLI with a speedup of 600x vs. 6.3x relative to a full population query.
And this is hard to beat...
Addendum: TABLESAMPLE only works on Tables and materialized Views!
No comments:
Post a Comment