Matvey Arye 2de6b02c16 Add optimization to use HashAggregate more often
This optimization adds a HashAggregate plan to many GROUP BY queries.
In plain PostgreSQL, many time-series queries will not use a
HashAggregate because the planner overestimates the number of groups
the aggregate will produce and, to avoid running out of memory, picks
the less efficient GroupAggregate instead.

The planner overestimates because its grouping statistics assume that
the number of distinct items produced by a function is the same as the
number of distinct items going in. This is not true for functions like
time_bucket and date_trunc, which map many input values to the same
output value. This optimization fixes the statistics and adds the
HashAggregate plan if appropriate.

The statistics now rely on evaluating the spread of a variable and
dividing it by the interval in the time_bucket or date_trunc. This is
still an overestimate of the total number of groups but is better than
before. A further improvement would be to evaluate the quals (WHERE
clauses) on the query to derive a tighter spread on the variable. This
is left to a future optimization.
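
The estimate described above can be sketched roughly as follows. This
is an illustration only, not the code in this commit: the function
name, its parameters, and the clamping to [1, input_rows] are all
assumptions made for the example, and min, max and interval are
assumed to share one unit (e.g. seconds).

```c
#include <stdint.h>

/*
 * Hypothetical helper (not the actual implementation): estimate how many
 * groups a bucketing function such as time_bucket or date_trunc produces.
 *
 * min, max    - assumed lowest and highest value of the grouped variable
 * interval    - assumed bucket width, in the same unit as min and max
 * input_rows  - number of rows going into the aggregate
 */
static double
estimate_bucket_groups(int64_t min, int64_t max, int64_t interval,
					   double input_rows)
{
	double		spread;
	double		groups;

	/* Without a usable spread or interval, keep the pessimistic guess. */
	if (interval <= 0 || max < min)
		return input_rows;

	/* Spread of the variable divided by the bucket interval. */
	spread = (double) max - (double) min;
	groups = spread / (double) interval;

	/* Assumption: never report fewer than one group ... */
	if (groups < 1.0)
		groups = 1.0;
	/* ... or more groups than there are input rows. */
	if (groups > input_rows)
		groups = input_rows;

	return groups;
}
```

For one day of data (a spread of 86400 seconds) bucketed by hour (an
interval of 3600 seconds), this sketch yields 24 groups regardless of
how many rows go in, which is small enough for a HashAggregate to be
considered.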
2018-06-21 14:01:02 -04:00