Mirror of https://github.com/timescale/timescaledb.git (synced 2025-05-28 09:46:44 +08:00)
This optimization adds a HashAggregate plan to many GROUP BY queries. In plain PostgreSQL, many time-series queries do not use a HashAggregate because the planner greatly overestimates the number of groups (the output rows of the aggregate) and, to avoid running out of memory, falls back to the less efficient GroupAggregate.

The overestimate arises because the planner's statistics for grouping assume that a function produces as many distinct values as it receives. This is not true for functions like time_bucket and date_trunc, which map many input values to the same output value.

This optimization fixes the statistics and adds the HashAggregate plan where appropriate. The estimate is now derived by evaluating the spread of the variable and dividing it by the interval in the time_bucket or date_trunc call. This is still an overestimate of the total number of groups, but it is better than before.

A further improvement would be to evaluate the quals (WHERE clauses) of the query to derive a tighter spread on the variable. This is left to a future optimization.
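
The difference between the two estimates can be illustrated with a minimal Python sketch. It is not the TimescaleDB implementation (which lives in the C planner code). The function names are hypothetical, and "spread" is assumed here to mean the column's maximum minus its minimum.

```python
import math
from datetime import datetime, timedelta


def estimate_groups_naive(n_distinct_input: int) -> int:
    """Old estimate as described above: the function is assumed to
    preserve the number of distinct values of its input."""
    return n_distinct_input


def estimate_groups_by_spread(min_value: datetime,
                              max_value: datetime,
                              bucket: timedelta) -> int:
    """New estimate: spread of the column divided by the bucket interval.

    Assumption: spread = max_value - min_value, taken from column
    statistics. The result is rounded up and clamped to at least 1.
    """
    if bucket <= timedelta(0):
        raise ValueError("bucket interval must be positive")
    spread = max_value - min_value
    return max(1, math.ceil(spread / bucket))


# One day of per-second readings, grouped into one-hour buckets.
start = datetime(2017, 1, 1, 0, 0, 0)
end = datetime(2017, 1, 1, 23, 59, 59)

print(estimate_groups_naive(86400))                                # 86400
print(estimate_groups_by_spread(start, end, timedelta(hours=1)))   # 24
```

In this sketch the spread covers the whole column. A query whose WHERE clause selects only part of that range still gets the full-range estimate, which is one reason the result remains an overestimate.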