This commit introduces 4 compression algorithms
as well as 3 ADTs to support them. The compression
algorithms are time-series optimized. The following
algorithms are implemented:
- DeltaDelta compresses integer and timestamp values
- Gorilla compresses floats
- Dictionary compression handles any data type
and is optimized for low-cardinality datasets.
- Array stores any data type in an array-like
structure and does not actually compress it (though
TOAST-based compression can be applied on top).
These compression algorithms are are fully described in
tsl/src/compression/README.md.
The Abstract Data Types that are implemented are
- Vector - A dynamic vector that can store any type.
- BitArray - A dynamic vector to store bits.
- SimpleHash - A hash table implementation from PG12.
More information can be found in
src/adts/README.md
This PR adds support for finalizing aggregates with FINALFUNC_EXTRA. To
do this, we need to pass NULLS correspond to all of the aggregate
parameters to the ffunc as arguments following the partial state value.
These arguments need to have the correct concrete types.
For polymorphic aggregates, the types cannot be derived from the catalog
but need to be somehow conveyed to the finalize_agg. Two designs were
considered:
1) Encode the type names as part of the partial state (bytea)
2) Pass down the arguments as parameters to the finalize_agg
In the end (2) was picked for the simple reason that (1) would have
increased the size of each partial, sometimes considerably (esp. for small
partial values).
The types are passed down as names not OIDs because in the continuous
agg case using OIDs is not safe for backup/restore and in the clustering
case the datanodes may not have the same type OIDs either.
The ability to get aggregate partials instead of the final state
is important for both continuous aggregation and clustering.
This commit adds the ability to work with aggregate partials.
Namely a function called _timescaledb_internal.partialize_agg
can now wrap an aggregate and return the partial results as a bytea.
The _timescaledb_internal.finalize_agg aggregate allows you to combine
and finalize partials.
The partialize_agg function works as a marker in the planner to force
the planner to return partial result.
Unfortunately, we could not get the planner to modify the plan directly
to aggregate partials. Instead, the finalize_agg is a real aggregate
that performs aggregation on the partial state. Note that it is not
yet parallel.
Aggregate that use FINALFUNC_EXTRA are currently not supported.
Co-authored-by: gayyappan <gayathri@timescale.com>
Co-authored-by: David Kohn <david@timescale.com>
There is no CREATE OR REPLACE AGGREGATE which means that the only way to replace
an aggregate is to DROP then CREATE which is problematic as it will fail
if the previous version of the aggregate has dependencies.
This commit makes sure aggregates are not dropped and recreated every time.
NOTE that WHEN CREATING NEW FUNCTIONS in sql/aggregates.sql you should also make
sure they are created in an update script so that both new users and people
updating from a previous version get the new function.
sql/aggregates.sql is run only once when TimescaleDB is installed and is not
rerun during updates that is why everything created there should also be in an
update file.
Fixes#612