Add a function to decompress a compressed batch entirely in one go, and
use it in some query plans. Decompression produces ArrowArrays, which
serve as the basis for the subsequent vectorized computation of
aggregates.
As a side effect, some heavy queries to compressed hypertables speed up
by about 15%. Point queries with LIMIT 1 can regress by up to 1 ms. If
the absolute highest performance is desired for such queries, bulk
decompression can be disabled with a GUC.
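As a rough sketch of why this helps (DecompressedArrowArray and
vectorized_sum_int64 are names invented here for illustration, not the
actual code): once a batch is decompressed into a contiguous values
buffer plus a validity bitmap, an aggregate reduces to a tight,
branch-free loop the compiler can auto-vectorize, instead of a
per-row function call.

    #include <stddef.h>
    #include <stdint.h>

    typedef struct DecompressedArrowArray
    {
        size_t length;            /* number of decompressed rows in the batch */
        const int64_t *values;    /* contiguous values buffer */
        const uint64_t *validity; /* bitmap: bit i set => row i is not NULL */
    } DecompressedArrowArray;

    static int64_t
    vectorized_sum_int64(const DecompressedArrowArray *arr)
    {
        int64_t sum = 0;

        for (size_t i = 0; i < arr->length; i++)
        {
            /* Branch-free: NULL rows contribute 0 via the validity bit. */
            int64_t valid = (int64_t) ((arr->validity[i / 64] >> (i % 64)) & 1);

            sum += arr->values[i] * valid;
        }
        return sum;
    }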
Unless otherwise listed below, each TODO was converted to a comment or
moved to the issue tracker.
test/sql/
- triggers.sql: Made required change
tsl/test/
- CMakeLists.txt: TODO complete
- bgw_policy.sql: TODO complete
- continuous_aggs_materialize.sql: TODO complete
- compression.sql: TODO complete
- compression_algos.sql: TODO complete
tsl/src/
- compression/compression.c:
- row_compressor_decompress_row: Expected complete
- compression/dictionary.c: FIXME complete
- materialize.c: TODO complete
- reorder.c: TODO complete
- simple8b_rle.h:
- compressor_finish: Removed (obsolete)
src/
- extension.c: Removed due to age
- adts/simplehash.h: TODOs are from copied Postgres code
- adts/vec.h: TODO is non-significant
- planner.c: Removed
- process_utility.c
- process_altertable_end_subcmd: Removed (PG will handle case)
Previously, detoasting in the Array compressor was incorrect, so the
compressed table stored pointers into the toast table of the
uncompressed table. This commit fixes the bug and also adds logic to
the test to remove the uncompressed table, so that such a bug would
cause test failures in the future.
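As an illustration of the kind of fix involved (detoast_before_store is
a hypothetical helper, not the actual patch): a varlena datum must be
detoasted before the compressor stores it, so the compressed row holds
the full value rather than a TOAST pointer into the uncompressed
table's toast table.

    #include "postgres.h"
    #include "fmgr.h"

    static Datum
    detoast_before_store(Datum value, bool typbyval)
    {
        if (typbyval)
            return value;   /* pass-by-value types are never toasted */

        /*
         * Expands external/compressed TOAST data; returns the input
         * unchanged if it is already a plain in-memory value.
         */
        return PointerGetDatum(PG_DETOAST_DATUM(value));
    }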
This commit changes deltadelta compression to store the first element
in the simple8b array instead of out-of-line. Besides shrinking the
data in some cases, this also ensures that the simple8b array is never
empty, fixing the case where only a single element is stored.
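For illustration, a simplified encoder showing where the first element
now ends up (zig-zag encoding and the actual simple8b packing are
omitted; this is not the real compressor code):

    #include <stddef.h>
    #include <stdint.h>

    /* Writes n delta-of-delta values; out[0] is simply in[0]. */
    static size_t
    deltadelta_encode(const int64_t *in, size_t n, int64_t *out)
    {
        int64_t prev = 0;
        int64_t prev_delta = 0;

        for (size_t i = 0; i < n; i++)
        {
            int64_t delta = in[i] - prev;   /* for i == 0 this is just in[0] */

            out[i] = delta - prev_delta;    /* value that goes into simple8b */
            prev = in[i];
            prev_delta = delta;
        }
        return n;   /* the stream is non-empty whenever n > 0 */
    }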
This rebuilds indexes during compression and decompression. Previously,
indexes were not updated during these operations. We also fix
a small bug with orderby and segmentby handling of empty strings/
lists.
Finally, we add some more tests.
Dictionary compression can be a pessimization if there aren't many
repeated values. Since we want to have a single fallback compressor we
can recommend when none of the more specialized compressors is
appropriate, this commit adds a fallback: if it would be more
efficient to store the data as an array instead of
dictionary-compressed, the dictionary compressor will automatically
return the value as an array.
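A rough sketch of the size comparison behind the fallback
(should_fall_back_to_array and its parameters are invented for
illustration; the real compressor makes this decision internally):

    #include <stdbool.h>
    #include <stddef.h>

    /*
     * Dictionary: each distinct value stored once plus one index per row.
     * Array: every value stored, duplicates and all.
     */
    static bool
    should_fall_back_to_array(size_t n_rows, size_t n_distinct,
                              size_t avg_value_bytes, size_t index_bits)
    {
        size_t dict_bytes = n_distinct * avg_value_bytes
            + (n_rows * index_bits + 7) / 8;
        size_t array_bytes = n_rows * avg_value_bytes;

        return array_bytes <= dict_bytes;
    }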
We eventually want to be able to compress chunks in the background as
they become old enough. As an incremental step in this direction, this
commit adds the ability to compress any table, albeit with an
unintuitive and brittle interface. This will eventually be married to
our catalogs and background workers to provide a seamless experience.
This commit also fixes a bug in gorilla in which the compressor could
not handle the case where the leading/trailing zeroes were always 0.
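For context on the bug, a sketch of the Gorilla XOR step (illustrative
only; gorilla_xor_window is not the actual code, and
__builtin_clzll/__builtin_ctzll are assumed GCC/Clang builtins): each
double is XORed with its predecessor and only the non-zero window of
the result is stored, described by its leading- and trailing-zero
counts. Those counts can legitimately both be 0 when the XOR spans all
64 bits, which is the case that previously broke.

    #include <stdint.h>

    typedef struct XorWindow
    {
        uint8_t leading_zeros;  /* can legitimately be 0 */
        uint8_t trailing_zeros; /* can legitimately be 0 */
        uint64_t xor_bits;      /* 0 means the value repeats exactly */
    } XorWindow;

    static XorWindow
    gorilla_xor_window(uint64_t prev_bits, uint64_t cur_bits)
    {
        XorWindow w = { 0, 0, prev_bits ^ cur_bits };

        if (w.xor_bits != 0)
        {
            /* clzll/ctzll are undefined for 0, hence the guard above. */
            w.leading_zeros = (uint8_t) __builtin_clzll(w.xor_bits);
            w.trailing_zeros = (uint8_t) __builtin_ctzll(w.xor_bits);
        }
        return w;
    }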
This commit introduces 4 compression algorithms
as well as 3 ADTs to support them. The compression
algorithms are time-series optimized. The following
algorithms are implemented:
- DeltaDelta compresses integer and timestamp values
- Gorilla compresses floats
- Dictionary compression handles any data type
and is optimized for low-cardinality datasets.
- Array stores any data type in an array-like
structure and does not actually compress it (though
TOAST-based compression can be applied on top).
These compression algorithms are fully described in
tsl/src/compression/README.md.
The Abstract Data Types that are implemented are
- Vector - A dynamic vector that can store any type.
- BitArray - A dynamic vector to store bits.
- SimpleHash - A hash table implementation from PG12.
More information can be found in
src/adts/README.md
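As a minimal sketch of the BitArray idea (simplified to a fixed
capacity and invented names; the real ADT grows dynamically): bits are
packed into 64-bit words, and append writes the low nbits of a value
at the current bit position, spilling into the next word when needed.

    #include <stdint.h>

    #define BIT_ARRAY_WORDS 128     /* fixed capacity keeps the sketch short */

    typedef struct BitArray
    {
        uint64_t words[BIT_ARRAY_WORDS];
        uint32_t num_bits;          /* bits written so far */
    } BitArray;

    /* Appends the low nbits of value; assumes capacity is not exceeded. */
    static void
    bit_array_append(BitArray *ba, uint8_t nbits, uint64_t value)
    {
        uint32_t word = ba->num_bits / 64;
        uint32_t offset = ba->num_bits % 64;

        if (nbits < 64)
            value &= (UINT64_C(1) << nbits) - 1;

        ba->words[word] |= value << offset;

        /* Spill the high part into the next word if the value straddles words. */
        if (offset + nbits > 64)
            ba->words[word + 1] |= value >> (64 - offset);

        ba->num_bits += nbits;
    }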