23 Commits

Author SHA1 Message Date
Alexander Kuzmenkov
eaa1206b7f Improvements for bulk decompression
* Restore default batch context size to fix a performance regression on
  sorted batch merge plans.
* Support reverse direction.
* Improve gorilla decompression by computing prefix sums of tag bitmaps
  during decompression.
2023-07-06 19:52:20 +02:00
Ante Kresic
fb0df1ae4e Insert into indexes during chunk compression
If there are any indexes on the compressed chunk, insert into them
while inserting the heap data rather than reindexing the relation at
the end. This reduces the amount of locking on the compressed chunk
indexes, which created issues when merging chunks, and should help
with future updates of compressed data.
2023-06-26 09:37:12 +02:00
Alexander Kuzmenkov
f26e656c0f Bulk decompression of compressed batches
Add a function to decompress a compressed batch entirely in one go, and
use it in some query plans. As a result of decompression, produce
ArrowArrays. They will be the base for the subsequent vectorized
computation of aggregates.

As a side effect, some heavy queries to compressed hypertables speed up
by about 15%. Point queries with LIMIT 1 can regress by up to 1 ms. If
the absolute highest performance is desired for such queries, bulk
decompression can be disabled by a GUC.
2023-06-07 16:21:50 +02:00
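
A minimal sketch of the trade-off mentioned in the commit above, assuming a hypothetical `metrics` hypertable; the GUC name `timescaledb.enable_bulk_decompression` is an assumption and may differ between versions:

```sql
-- Disable bulk decompression for a latency-critical point query
-- (GUC name is an assumption; "metrics" is a hypothetical hypertable).
SET timescaledb.enable_bulk_decompression = off;
SELECT * FROM metrics ORDER BY time DESC LIMIT 1;

-- Restore the default for analytical queries that scan many batches.
RESET timescaledb.enable_bulk_decompression;
```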
Alexander Kuzmenkov
030bfe867d Fix errors in decompression found by fuzzing
For deltadelta and gorilla codecs, add various length and consistency
checks that prevent segfaults on incorrect data.
2023-05-15 18:33:22 +02:00
Sven Klemm
9259311275 Fix JOIN handling in UPDATE/DELETE on compressed chunks
When JOINs were present during UPDATE/DELETE on compressed chunks,
the code would decompress other hypertables that were not the
target of the UPDATE/DELETE operation, and in the case of self-JOINs
it could decompress chunks that did not need to be decompressed.
2023-05-04 13:52:14 +02:00
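
A hedged sketch of the statement shape the fix above addresses: an UPDATE on a compressed hypertable joined against another table (table and column names are hypothetical):

```sql
-- Only chunks of the target hypertable "metrics" that can match should be
-- decompressed; the joined "devices" table must not trigger decompression.
UPDATE metrics m
SET value = 0
FROM devices d
WHERE m.device_id = d.device_id
  AND d.location = 'lab';
```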
Bharathy
769f9fe609 Fix segfault when deleting from compressed chunk
During UPDATE/DELETE on compressed hypertables, we iterate over the
plan tree to collect all scan nodes. Each scan node can have filter
conditions.

Prior to this patch, only the first filter condition was collected and
applied to the first chunk, which could be wrong. With this patch, as
soon as we encounter a target scan node, we immediately process the
corresponding chunks.

Fixes #5640
2023-05-03 23:19:26 +05:30
Ante Kresic
910663d0be Reduce decompression during UPDATE/DELETE
When updating or deleting tuples from a compressed chunk, we first
need to decompress the matching tuples and then proceed with the
operation. This optimization reduces the amount of data decompressed
by using compressed metadata to decompress only the affected segments.
2023-04-25 15:49:59 +02:00
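
A hedged example of the optimization above, assuming a hypothetical `metrics` hypertable compressed with `device_id` as the segmentby column and `time` as the orderby column:

```sql
-- Only compressed segments whose metadata (segmentby value and orderby
-- min/max ranges) can match these predicates need to be decompressed.
DELETE FROM metrics
WHERE device_id = 7
  AND time >= '2023-04-01' AND time < '2023-04-02';
```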
Ante Kresic
583c36e91e Refactor compression code to reduce duplication 2023-04-20 22:27:34 +02:00
Bharathy
1fb058b199 Support UPDATE/DELETE on compressed hypertables.
This patch does the following:

1. Executor changes to parse the qual ExprState to check if a SEGMENTBY
   column is specified in the WHERE clause.
2. Based on step 1, we build scan keys.
3. Executor changes to do a heap scan on the compressed chunk based on
   the scan keys and move only those rows which match the WHERE clause
   to the staging area, aka the uncompressed chunk.
4. Mark the affected chunk as partially compressed.
5. Perform regular UPDATE/DELETE operations on the staging area.
6. Since there is no Custom Scan (HypertableModify) node for
   UPDATE/DELETE operations on PG versions < 14, we don't support this
   feature on PG12 and PG13.
2023-04-05 17:19:45 +05:30
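
A minimal sketch of the feature described above, using a hypothetical `metrics` hypertable; the WHERE clause on the SEGMENTBY column is what lets the executor build scan keys and move only matching rows into the uncompressed staging chunk:

```sql
-- Hypothetical compression settings with device_id as the segmentby column.
ALTER TABLE metrics SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'device_id'
);

-- The segmentby predicate limits which compressed rows are moved to the
-- staging (uncompressed) chunk before the regular UPDATE runs.
UPDATE metrics SET value = value * 1.1 WHERE device_id = 3;
```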
Konstantina Skovola
72c0f5b25e Rewrite recompress_chunk in C for segmentwise processing
This patch introduces a C function to perform the recompression at
a finer granularity instead of decompressing and subsequently
compressing the entire chunk.

This improves performance for the following reasons:
- it needs to sort less data at a time and
- it avoids recreating the decompressed chunk and the heap
inserts associated with that by decompressing each segment
into a tuplesort instead.

If no segmentby is specified when enabling compression or if an
index does not exist on the compressed chunk then the operation is
performed as before, decompressing and subsequently
compressing the entire chunk.
2023-03-23 11:39:43 +02:00
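
A hedged usage example for the segmentwise recompression above; the chunk name is hypothetical, and `recompress_chunk` is the user-facing procedure the C implementation backs:

```sql
-- Recompress a chunk that has received new rows since it was compressed.
-- With a segmentby index on the compressed chunk, this proceeds segment by
-- segment instead of decompressing and recompressing the whole chunk.
CALL recompress_chunk('_timescaledb_internal._hyper_1_2_chunk');
```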
shhnwz
699fcf48aa Stats improvement for Uncompressed Chunks
During compression, autovacuum used to be disabled for the
uncompressed chunk and re-enabled after decompression. This led to a
postgres maintenance issue. Let's not disable autovacuum for the
uncompressed chunk anymore and let postgres take care of the stats in
its natural way.

Fixes #309
2023-03-22 23:51:13 +05:30
Sven Klemm
65562f02e8 Support unique constraints on compressed chunks
This patch allows unique constraints on compressed chunks. When
trying to INSERT into compressed chunks with unique constraints,
any potentially conflicting compressed batches will be decompressed
to let postgres do constraint checking on the INSERT.
With this patch, only INSERT ON CONFLICT DO NOTHING is supported.
For decompression, only segmentby information is considered to
determine conflicting batches. This will be enhanced in a follow-up
patch to also include orderby metadata so that fewer batches need to
be decompressed.
2023-03-13 12:04:38 +01:00
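
A hedged sketch of what this enables, assuming a hypothetical `metrics` hypertable with a unique constraint on `(time, device_id)` and compression enabled; a conflicting INSERT into a compressed chunk decompresses only the potentially conflicting batches so postgres can perform the constraint check:

```sql
-- Supported with this patch: an insert that skips rows conflicting with
-- data already stored in compressed batches.
INSERT INTO metrics (time, device_id, value)
VALUES ('2023-03-01 00:00:00+00', 1, 42.0)
ON CONFLICT DO NOTHING;
```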
Sven Klemm
dbe89644b5 Remove no longer used compression code
The recent refactoring of INSERT into compressed chunks made this
code obsolete, but it was not removed in that patch.
2023-01-16 14:18:56 +01:00
Ante Kresic
2475c1b92f Roll up uncompressed chunks into compressed ones
This change introduces a new option to the compression procedure which
decouples the uncompressed chunk interval from the compressed chunk
interval. It does this by allowing multiple uncompressed chunks to be
rolled up into one compressed chunk as part of the compression
procedure. The main use-case
is to allow much smaller uncompressed chunks than compressed ones. This
has several advantages:
- Reduce the size of btrees on uncompressed data (thus allowing faster
inserts because those indexes are memory-resident).
- Decrease disk-space usage for uncompressed data.
- Reduce number of chunks over historical data.

From a UX point of view, we simply add a compression WITH clause option
`compress_chunk_time_interval`. The user should set that according to
their needs for constraint exclusion over historical data. Ideally, it
should be a multiple of the uncompressed chunk interval and so we throw
a warning if it is not.
2022-11-02 15:14:18 +01:00
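
A minimal sketch of the WITH clause option described above, assuming a hypothetical `metrics` hypertable with a 1 hour chunk interval; the compressed chunk interval of 1 day is a multiple of the uncompressed interval, so no warning is raised:

```sql
-- Roll uncompressed 1-hour chunks up into 1-day compressed chunks.
ALTER TABLE metrics SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'device_id',
    timescaledb.compress_chunk_time_interval = '1 day'
);
```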
Alexander Kuzmenkov
f862212c8c Add clang-tidy warning readability-inconsistent-declaration-parameter-name
Mostly cosmetic stuff. Matched to definition automatically with
--fix-notes.
2022-10-20 19:42:11 +04:00
gayyappan
93be235d33 Support for inserts into compressed hypertables
Add CompressRowSingleState.
This has functions to compress a single row.
2021-05-24 18:03:47 -04:00
gayyappan
05319cd424 Support analyze of internal compression table
This commit modifies analyze behavior as follows:
1. When an internal compression table is analyzed,
statistics from the compressed chunk (such as page
count and tuple count) are used to update the
statistics of the corresponding chunk parent, if
they are missing.

2. Analyze compressed chunks instead of raw chunks.
When the command ANALYZE <hypertable> is executed,
a) uncompressed chunks are analyzed, and b) for compressed
chunks the raw chunk is skipped and the internal compressed
chunk is analyzed instead.
2020-11-11 15:05:14 -05:00
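
A short usage note: with the change above, a plain ANALYZE on the hypertable does the right thing per chunk (the `metrics` name is hypothetical):

```sql
-- Uncompressed chunks are analyzed directly; for compressed chunks the raw
-- chunk is skipped and the internal compressed table is analyzed instead.
ANALYZE metrics;
```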
gayyappan
b93b30b0c2 Add counts to compression statistics
Store information related to compressed and uncompressed row
counts after compressing a chunk. This is saved in the
compression_chunk_size table.
2020-06-19 15:58:04 -04:00
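
A hedged sketch of inspecting the recorded counts; the query reads the internal catalog table named in the commit, and the exact column names are an assumption that may differ between versions:

```sql
-- Row counts recorded at compression time (column names are assumptions).
SELECT chunk_id, numrows_pre_compression, numrows_post_compression
FROM _timescaledb_catalog.compression_chunk_size;
```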
gayyappan
6832ed2ca5 Modify storage type for toast columns
This PR modifies the TOAST storage type for compressed columns based
on the algorithm used for compression.
2019-10-29 19:02:58 -04:00
Sven Klemm
bdc599793c Add helper function to get decompression iterator init function 2019-10-29 19:02:58 -04:00
Joshua Lockerman
6d55f6f615 Add decompress_chunk
This function is the inverse of compress_chunk: it takes a table
containing compressed data, decompresses it, and writes it out to
another table.
2019-10-29 19:02:58 -04:00
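
A hedged usage sketch for the function above, using a hypothetical `metrics` hypertable; the `if_compressed` argument and the `show_chunks` helper reflect the current API and may not match the signature at the time of this commit:

```sql
-- Decompress all chunks older than a week (hypertable name is hypothetical).
SELECT decompress_chunk(c, if_compressed => true)
FROM show_chunks('metrics', older_than => INTERVAL '7 days') AS c;
```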
Joshua Lockerman
2f16d84c39 Add ability to compress tables
We eventually want to be able to compress chunks in the background as
they become old enough. As an incremental step in this direction, this
commit adds the ability to compress any table, albeit with an
unintuitive and brittle interface. This will eventually be married to
our catalogs and background workers to provide a seamless experience.

This commit also fixes a bug in gorilla in which the compressor could
not handle the case where the leading/trailing zeroes were always 0.
2019-10-29 19:02:58 -04:00
Joshua Lockerman
584f5d1061 Implement time-series compression algorithms
This commit introduces 4 compression algorithms
as well as 3 ADTs to support them. The compression
algorithms are time-series optimized. The following
algorithms are implemented:

- DeltaDelta compresses integer and timestamp values
- Gorilla compresses floats
- Dictionary compression handles any data type
  and is optimized for low-cardinality datasets.
- Array stores any data type in an array-like
  structure and does not actually compress it (though
  TOAST-based compression can be applied on top).

These compression algorithms are fully described in
tsl/src/compression/README.md.

The Abstract Data Types that are implemented are
- Vector - A dynamic vector that can store any type.
- BitArray - A dynamic vector to store bits.
- SimpleHash - A hash table implementation from PG12.

More information can be found in
src/adts/README.md
2019-10-29 19:02:58 -04:00
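
A conceptual illustration of the DeltaDelta idea using plain SQL window functions; the actual codec operates on binary representations, so this is only a sketch of the arithmetic on hypothetical samples:

```sql
-- For regularly spaced series, second-order deltas are mostly zero, which
-- is what makes delta-delta encoding compress well.
WITH samples(ts, val) AS (
    VALUES (1, 1000), (2, 1010), (3, 1020), (4, 1031)
),
deltas AS (
    SELECT ts, val, val - lag(val) OVER (ORDER BY ts) AS delta
    FROM samples
)
SELECT ts, val, delta,
       delta - lag(delta) OVER (ORDER BY ts) AS delta_of_delta
FROM deltas;
```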