Tuesday, February 3, 2015

count(*) is faster than count(1)

Today I've discovered incidentally, that count(*) is slightly faster than count(1) on PostgreSQL 9.3.5:

create table  hmr(id serial, value real);

insert into hmr (value) select random()*10000000 from generate_series(1,10000000);

select count(1) from hmr; --731 msec


select count(*) from hmr; --661 msec

Used repeatedly, e.g. in a function, this can really add up and make a difference.
In the spectral contrast angle CTE from the previous post, it reduces runtime by two seconds or 24%.

I would have expected the constant to be faster than actually reading the row...

10 comments:

  1. Have you tried executing the queries in the reverse order? I think this is related to the cache as data may be cached after first query. I run the queries on the reverse order on (9.4) and i get the count(1) faster (because is the second one).

    Have you tried in reverse order?

    ReplyDelete
    Replies
    1. Yes, reverse order, different connections and multiple times. The result stays the same with 9.3.5.

      Delete
  2. How about selecting count(id) instead? Normally that's supposed to be faster. Also, if there's an index on your search condition it's better to select the count of that column so it can use an index-only scan.

    ReplyDelete
    Replies
    1. According to https://wiki.postgresql.org/wiki/Slow_Counting and https://wiki.postgresql.org/wiki/Index-only_scans, index-only scans are restricted to certain conditions. It might happen - or not.

      Delete
  3. You see a diff 0.1 sec on 10M rows. In your example are not any IO operation, any syscall. So it is very artificial. On real world there are not significant difference between these two cases.

    ReplyDelete
    Replies
    1. This was no criticism, just an observation. :-) Sometimes small things add up to significant amounts if you have to do them repeatedly...

      Delete
  4. count(*) is really count(). The requirement to type the * is just a historical artifact.

    => \da+ count
    List of aggregate functions
    Schema | Name | Result data type | Argument data types | Description
    ------------+-------+------------------+---------------------+-----------------------------------------------------------------
    pg_catalog | count | bigint | "any" | number of input rows for which the input expression is not null
    pg_catalog | count | bigint | * | number of input rows
    (2 rows)

    So count(1) is explicitly passing the 1 in and checking it for being NULL before incrementing a counter while count(*) is simply incrementing the counter.

    ReplyDelete
    Replies
    1. You are correct, and technically the function strictness check checks if 1 is NULL, not the function itself.

      Delete
  5. I would imagine that the (*) case could be optimized since it doesn't require that the tuple data be read and it is a known special-case whereas count(1) is simply a convention.

    ReplyDelete
  6. Can you post the `explain analyze` output of both queries? I'm pretty sure this is normal "distribution" caused by other stuff going on in the system

    ReplyDelete