mirror of https://github.com/timescale/timescaledb.git synced 2025-05-23 22:41:34 +08:00

gayyappan 05319cd424 Support analyze of internal compression table

This commit modifies analyze behavior as follows:
1. When an internal compression table is analyzed,
statistics from the compressed chunk (such as page
count and tuple count) are used to update the
statistics of the corresponding chunk parent, if
they are missing.

2. Analyze compressed chunk instead of raw chunks
When the command ANALYZE <hypertable> is executed,
a) analyze uncompressed chunks and b) skip the raw chunk,
but analyze the compressed chunk.

2020-11-11 15:05:14 -05:00

.clang-tidy

Improve linting support with clang-tidy

2020-05-29 14:04:25 +02:00

array.c

Remove support for PG9.6 and PG10

2020-06-02 23:48:35 +02:00

array.h

Remove unnecessary exports in tsl library

2020-08-17 18:58:18 +02:00

CMakeLists.txt

Add segment meta min/max

2019-10-29 19:02:58 -04:00

compress_utils.c

Add missing increment for PG11 decompression

2020-10-20 11:54:35 -07:00

compress_utils.h

Support compression on distributed hypertables

2020-05-27 17:31:09 +02:00

compression.c

Support analyze of internal compression table

2020-11-11 15:05:14 -05:00

compression.h

Support analyze of internal compression table

2020-11-11 15:05:14 -05:00

create.c

Make errors and messages conform to style guide

2020-10-20 16:49:32 +02:00

create.h

Use table_open/close and PG aggregated directive

2020-04-14 23:12:15 +02:00

datum_serialize.c

Initial support for PostgreSQL 12

2020-04-14 23:12:15 +02:00

datum_serialize.h

Use DatumSerialize for binary strings

2019-10-29 19:02:58 -04:00

deltadelta.c

Remove support for PG9.6 and PG10

2020-06-02 23:48:35 +02:00

deltadelta.h

Remove unnecessary exports in tsl library

2020-08-17 18:58:18 +02:00

dictionary_hash.h

Cleanup TODOs and FIXMEs

2020-05-18 20:16:03 -04:00

dictionary.c

Remove support for PG9.6 and PG10

2020-06-02 23:48:35 +02:00

dictionary.h

Remove unnecessary exports in tsl library

2020-08-17 18:58:18 +02:00

gorilla.c

Remove support for PG9.6 and PG10

2020-06-02 23:48:35 +02:00

gorilla.h

Remove unnecessary exports in tsl library

2020-08-17 18:58:18 +02:00

README.md

…

segment_meta.c

Split segment meta min_max into two columns

2019-10-29 19:02:58 -04:00

segment_meta.h

Split segment meta min_max into two columns

2019-10-29 19:02:58 -04:00

simple8b_rle.h

Remove unnecessary exports in tsl library

2020-08-17 18:58:18 +02:00

utils.h

…

README.md

Compression Algorithms

This is a collection of compression algorithms that are used to compress data of different types. The algorithms are optimized for time-series use-cases; many of them assume that adjacent rows will have "similar" values.

API

For each compression algorithm, the API is divided into two parts: a compressor and a decompression iterator. The compressor is used to compress new data.

<algorithm_name>_compressor_alloc - creates the compressor
<algorithm_name>_compressor_append_null - appends a null
<algorithm_name>_compressor_append_value - appends a non-null value
<algorithm_name>_compressor_finish - finalizes the compression and returns the compressed data
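
The four-function compressor shape listed above can be sketched with a toy algorithm. Everything here is hypothetical: the `toy_` prefix, the `ToyCompressor` and `ToyCompressed` types, and the use of `int64_t` in place of a PostgreSQL datum are assumptions made for illustration, and the "compression" simply records values verbatim. It is meant to show the alloc, append, finish lifecycle only, not how any real algorithm in this directory is implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical toy compressor: buffers values and null flags verbatim. */
typedef struct ToyCompressor
{
	int64_t *values;
	bool *nulls;
	size_t count;
	size_t capacity;
} ToyCompressor;

/* Hypothetical result type; a real algorithm would return compressed bytes. */
typedef struct ToyCompressed
{
	int64_t *values;
	bool *nulls;
	size_t count;
} ToyCompressed;

/* Creates an empty compressor. */
ToyCompressor *
toy_compressor_alloc(void)
{
	ToyCompressor *c = calloc(1, sizeof(*c));
	assert(c != NULL);
	return c;
}

/* Shared helper: grows the buffers if needed and records one row. */
static void
toy_append(ToyCompressor *c, int64_t value, bool is_null)
{
	if (c->count == c->capacity)
	{
		size_t new_capacity = c->capacity ? c->capacity * 2 : 8;
		c->values = realloc(c->values, new_capacity * sizeof(*c->values));
		c->nulls = realloc(c->nulls, new_capacity * sizeof(*c->nulls));
		assert(c->values != NULL && c->nulls != NULL);
		c->capacity = new_capacity;
	}
	c->values[c->count] = value;
	c->nulls[c->count] = is_null;
	c->count++;
}

void
toy_compressor_append_null(ToyCompressor *c)
{
	toy_append(c, 0, true);
}

void
toy_compressor_append_value(ToyCompressor *c, int64_t value)
{
	toy_append(c, value, false);
}

/* Finalizes: hands the buffers to the result and frees the compressor. */
ToyCompressed
toy_compressor_finish(ToyCompressor *c)
{
	ToyCompressed out = { c->values, c->nulls, c->count };
	free(c);
	return out;
}
```

A caller appends rows in order, one call per row, and calls the finish function exactly once; after that the compressor must not be used again.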

Data can be read back out using the decompression iterator. An iterator can operate backwards or forwards; there is no random access. The API is:

<algorithm_name>_decompression_iterator_from_datum_<forward|reverse> - creates a new DatumIterator in the forward or reverse direction.
A DatumIterator has a function pointer called try_next that returns the next DecompressResult.

A DecompressResult is either a decompressed value datum, a null, or a done marker indicating that the iterator is exhausted.
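
The iterator contract can be sketched as follows. The names `DatumIterator`, `DecompressResult`, and `try_next` come from the text above; the field names (`val`, `is_null`, `is_done`), the `ToyIterator` type, the `toy_iterator_*` constructors, and the use of `int64_t` in place of a PostgreSQL datum are assumptions for illustration. The sketch iterates over a plain array rather than compressed data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One step of iteration: a value, a null, or the done marker. */
typedef struct DecompressResult
{
	int64_t val; /* stands in for a Datum (assumption) */
	bool is_null;
	bool is_done;
} DecompressResult;

/* Callers only ever see this: a single function pointer. */
typedef struct DatumIterator
{
	DecompressResult (*try_next)(struct DatumIterator *iter);
} DatumIterator;

/* Hypothetical concrete iterator over an uncompressed array. */
typedef struct ToyIterator
{
	DatumIterator base; /* must be the first member so the cast below is valid */
	const int64_t *values;
	const bool *nulls;
	size_t remaining;
	ptrdiff_t pos;
	ptrdiff_t step; /* +1 for forward, -1 for reverse */
} ToyIterator;

static DecompressResult
toy_try_next(DatumIterator *iter)
{
	ToyIterator *it = (ToyIterator *) iter;
	DecompressResult result = { 0, false, false };

	if (it->remaining == 0)
	{
		result.is_done = true;
		return result;
	}
	result.is_null = it->nulls[it->pos];
	if (!result.is_null)
		result.val = it->values[it->pos];
	it->pos += it->step;
	it->remaining--;
	return result;
}

ToyIterator
toy_iterator_forward(const int64_t *values, const bool *nulls, size_t n)
{
	ToyIterator it = { { toy_try_next }, values, nulls, n, 0, 1 };
	return it;
}

ToyIterator
toy_iterator_reverse(const int64_t *values, const bool *nulls, size_t n)
{
	ToyIterator it = { { toy_try_next }, values, nulls, n, (ptrdiff_t) n - 1, -1 };
	return it;
}

/* Consumer that knows nothing about the concrete iterator type. */
int64_t
sum_non_null(DatumIterator *iter, size_t *null_count)
{
	int64_t sum = 0;

	assert(null_count != NULL);
	*null_count = 0;
	for (;;)
	{
		DecompressResult r = iter->try_next(iter);
		if (r.is_done)
			break;
		if (r.is_null)
			(*null_count)++;
		else
			sum += r.val;
	}
	return sum;
}
```

Because consumers go through `try_next` only, the same loop works for either direction and for any algorithm that supplies its own iterator.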

Each compression algorithm also provides send and recv functions to convert to and from the external binary representation.

CompressionAlgorithmDefinition is a structure that defines function pointers to get forward and reverse iterators as well as send and recv functions. The definitions array in compression.c contains a CompressionAlgorithmDefinition for each compression algorithm.

Base algorithms

The simple8b_rle algorithm is a building block for many of the compression algorithms. It compresses a series of uint64 values by packing them into the fewest bits necessary for their magnitude and by run-length encoding long runs of repeated values. A complete description is in the header file. Note that this is a header-only implementation: performance is paramount here because it is used as a primitive in all the other compression algorithms.
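
The two ideas named above, bit-packing by magnitude and run-length encoding, can be sketched separately. This is a deliberately simplified illustration and not the on-disk format described in the header file: the function names, the `Run` type, and the single-word packing layout are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of bits needed to represent v (at least 1, so 0 takes one bit). */
static unsigned
bits_needed(uint64_t v)
{
	unsigned bits = 1;
	while (v >>= 1)
		bits++;
	return bits;
}

/* Smallest bit width that fits every value in the block. */
unsigned
block_bit_width(const uint64_t *values, size_t n)
{
	unsigned width = 1;
	for (size_t i = 0; i < n; i++)
	{
		unsigned b = bits_needed(values[i]);
		if (b > width)
			width = b;
	}
	return width;
}

/* Packs n values of `width` bits each into one 64-bit word, lowest bits first. */
uint64_t
pack_block(const uint64_t *values, size_t n, unsigned width)
{
	uint64_t word = 0;

	assert(width >= 1 && width <= 64 && n * width <= 64);
	for (size_t i = 0; i < n; i++)
		word |= values[i] << (i * width);
	return word;
}

/* Extracts the i-th value from a word produced by pack_block. */
uint64_t
unpack_value(uint64_t word, size_t i, unsigned width)
{
	uint64_t mask = width == 64 ? UINT64_MAX : ((UINT64_C(1) << width) - 1);

	assert(width >= 1 && width <= 64 && i * width < 64);
	return (word >> (i * width)) & mask;
}

/* A run of `count` consecutive copies of `value`. */
typedef struct Run
{
	uint64_t value;
	uint64_t count;
} Run;

/* Run-length encodes values into runs (which must have room for n entries);
 * returns the number of runs written. */
size_t
rle_encode(const uint64_t *values, size_t n, Run *runs)
{
	size_t num_runs = 0;

	for (size_t i = 0; i < n; i++)
	{
		if (num_runs > 0 && runs[num_runs - 1].value == values[i])
			runs[num_runs - 1].count++;
		else
		{
			runs[num_runs].value = values[i];
			runs[num_runs].count = 1;
			num_runs++;
		}
	}
	return num_runs;
}
```

Small values benefit from packing because many of them fit in one word, while long runs of one value collapse into a single (value, count) pair.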

Compression Algorithms

DeltaDelta

For each integer, DeltaDelta takes the delta-of-deltas with the previous integer, zigzag encodes this deltadelta, and finally simple8b_rle encodes the zigzagged result. This algorithm performs very well when the magnitude of the delta between adjacent values tends not to vary much, and is optimal for a fixed rate of change, where the delta-of-deltas is zero after the first two values.
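
The first two stages (delta-of-deltas, then zigzag) can be sketched as below; the final simple8b_rle stage is omitted. The function names and the choice of 0 as the initial previous value and previous delta are assumptions for this sketch, not necessarily what deltadelta.c does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Zigzag maps signed to unsigned so small magnitudes stay small:
 * 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ... */
uint64_t
zigzag_encode(int64_t v)
{
	return ((uint64_t) v << 1) ^ (v < 0 ? UINT64_MAX : 0);
}

int64_t
zigzag_decode(uint64_t u)
{
	return (int64_t) ((u >> 1) ^ ((uint64_t) 0 - (u & 1)));
}

/* Writes the zigzagged delta-of-deltas of values[0..n) into out[0..n).
 * Unsigned arithmetic is used so that wraparound is well defined. */
void
deltadelta_encode(const int64_t *values, size_t n, uint64_t *out)
{
	uint64_t prev_val = 0;
	uint64_t prev_delta = 0;

	assert(n == 0 || (values != NULL && out != NULL));
	for (size_t i = 0; i < n; i++)
	{
		uint64_t delta = (uint64_t) values[i] - prev_val;
		uint64_t delta_delta = delta - prev_delta;

		out[i] = zigzag_encode((int64_t) delta_delta);
		prev_val = (uint64_t) values[i];
		prev_delta = delta;
	}
}

/* Inverse of deltadelta_encode. */
void
deltadelta_decode(const uint64_t *in, size_t n, int64_t *values)
{
	uint64_t prev_val = 0;
	uint64_t prev_delta = 0;

	assert(n == 0 || (in != NULL && values != NULL));
	for (size_t i = 0; i < n; i++)
	{
		prev_delta += (uint64_t) zigzag_decode(in[i]);
		prev_val += prev_delta;
		values[i] = (int64_t) prev_val;
	}
}
```

For the input 100, 110, 120, 130 the deltas are 100, 10, 10, 10 and the delta-of-deltas are 100, -90, 0, 0, so a steady rate of change turns into a run of zeros that the next stage can run-length encode.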

Gorilla

Gorilla encodes floats using the Facebook Gorilla algorithm. It stores the compressed XORs of adjacent values. It is one of the few simple algorithms that compresses floating-point numbers reasonably well.
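
The XOR step can be sketched as below. This shows only why adjacent similar floats produce compressible output (an XOR with many leading and trailing zero bits); the bit-level packing of those XORs is omitted. The function names and the choice of 0 as the initial previous bit pattern are assumptions for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reinterprets a double as its 64-bit pattern without aliasing issues. */
static uint64_t
double_bits(double d)
{
	uint64_t bits;

	assert(sizeof(double) == sizeof(uint64_t));
	memcpy(&bits, &d, sizeof(bits));
	return bits;
}

static double
bits_double(uint64_t bits)
{
	double d;

	memcpy(&d, &bits, sizeof(d));
	return d;
}

/* xors[i] is the bit pattern of values[i] XORed with that of values[i - 1]. */
void
gorilla_xor_encode(const double *values, size_t n, uint64_t *xors)
{
	uint64_t prev = 0;

	for (size_t i = 0; i < n; i++)
	{
		uint64_t bits = double_bits(values[i]);

		xors[i] = bits ^ prev;
		prev = bits;
	}
}

/* Inverse of gorilla_xor_encode. */
void
gorilla_xor_decode(const uint64_t *xors, size_t n, double *values)
{
	uint64_t prev = 0;

	for (size_t i = 0; i < n; i++)
	{
		prev ^= xors[i];
		values[i] = bits_double(prev);
	}
}

/* Count of zero bits above the highest set bit (64 for x == 0). */
unsigned
leading_zeros(uint64_t x)
{
	unsigned n = 0;

	if (x == 0)
		return 64;
	while ((x & (UINT64_C(1) << 63)) == 0)
	{
		x <<= 1;
		n++;
	}
	return n;
}

/* Count of zero bits below the lowest set bit (64 for x == 0). */
unsigned
trailing_zeros(uint64_t x)
{
	unsigned n = 0;

	if (x == 0)
		return 64;
	while ((x & 1) == 0)
	{
		x >>= 1;
		n++;
	}
	return n;
}
```

A repeated value XORs to zero, and a nearby value XORs to a short band of meaningful bits surrounded by zeros, which is what an encoder can then store compactly.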

Dictionary

The dictionary mechanism stores data in two parts: a "dictionary" storing each unique value in the dataset (stored as an array, see below) and a simple8b_rle-compressed list of indexes into the dictionary, ordered by row. This scheme can store any type of data, but will only be a space improvement if the data set is of relatively low cardinality.
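
The two-part layout can be sketched as below, using C strings as the stored type. The `Dictionary` type, the fixed `DICT_MAX` capacity, the function names, and the linear search are all assumptions chosen to keep the sketch short; the resulting index list is left uncompressed here rather than passed through simple8b_rle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DICT_MAX 16 /* hypothetical fixed capacity for this sketch */

/* Part one: each unique value, stored once in first-seen order. */
typedef struct Dictionary
{
	const char *entries[DICT_MAX];
	size_t num_entries;
} Dictionary;

/* Returns the index of value in the dictionary, adding it if absent.
 * Linear search keeps the sketch simple; it is not meant to be fast. */
static uint64_t
dictionary_index(Dictionary *dict, const char *value)
{
	for (size_t i = 0; i < dict->num_entries; i++)
	{
		if (strcmp(dict->entries[i], value) == 0)
			return i;
	}
	assert(dict->num_entries < DICT_MAX);
	dict->entries[dict->num_entries] = value;
	return dict->num_entries++;
}

/* Part two: one dictionary index per row, in row order. */
void
dictionary_encode(const char **rows, size_t n, Dictionary *dict, uint64_t *indexes)
{
	dict->num_entries = 0;
	for (size_t i = 0; i < n; i++)
		indexes[i] = dictionary_index(dict, rows[i]);
}

/* Decoding a row is a single array lookup. */
const char *
dictionary_lookup(const Dictionary *dict, uint64_t index)
{
	assert(index < dict->num_entries);
	return dict->entries[index];
}
```

With low-cardinality data the indexes are small integers with long repeated runs, which is the kind of input that bit-packing and run-length encoding handle well; with high-cardinality data the dictionary is nearly as large as the input and the index list is pure overhead.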

Array

The array "compression" method simply stores the data in an array-like structure and does not actually compress it (though TOAST-based compression can be applied on top). It is the compression mechanism used when no other compression mechanism works. It can store any type of data.