timescaledb

mirror of https://github.com/timescale/timescaledb.git synced 2025-05-21 21:21:22 +08:00

Author	SHA1	Message	Date
Alexander Kuzmenkov	eaa1206b7f	Improvements for bulk decompression * Restore default batch context size to fix a performance regression on sorted batch merge plans. * Support reverse direction. * Improve gorilla decompression by computing prefix sums of tag bitmaps during decompression.	2023-07-06 19:52:20 +02:00
Ante Kresic	fb0df1ae4e	Insert into indexes during chunk compression If there are any indexes on the compressed chunk, insert into them while inserting the heap data rather than reindexing the relation at the end. This reduces the amount of locking on the compressed chunk indexes which created issues when merging chunks and should help with the future updates of compressed data.	2023-06-26 09:37:12 +02:00
Alexander Kuzmenkov	f26e656c0f	Bulk decompression of compressed batches Add a function to decompress a compressed batch entirely in one go, and use it in some query plans. As a result of decompression, produce ArrowArrays. They will be the base for the subsequent vectorized computation of aggregates. As a side effect, some heavy queries to compressed hypertables speed up by about 15%. Point queries with LIMIT 1 can regress by up to 1 ms. If the absolute highest performance is desired for such queries, bulk decompression can be disabled by a GUC.	2023-06-07 16:21:50 +02:00
Alexander Kuzmenkov	030bfe867d	Fix errors in decompression found by fuzzing For deltadelta and gorilla codecs, add various length and consistency checks that prevent segfaults on incorrect data.	2023-05-15 18:33:22 +02:00
Sven Klemm	9259311275	Fix JOIN handling in UPDATE/DELETE on compressed chunks When JOINs were present during UPDATE/DELETE on compressed chunks the code would decompress other hypertables that were not the target of the UPDATE/DELETE operations and in the case of self-JOINs potentially decompress chunks not required to be decompressed.	2023-05-04 13:52:14 +02:00
Bharathy	769f9fe609	Fix segfault when deleting from compressed chunk During UPDATE/DELETE on compressed hypertables, we iterate over the plan tree to collect all scan nodes. Each scan node can have filter conditions. Prior to this patch we collected only the first filter condition and used it for the first chunk, which may be wrong. With this patch, whenever we encounter a target scan node, we immediately process its chunks. Fixes #5640	2023-05-03 23:19:26 +05:30
Ante Kresic	910663d0be	Reduce decompression during UPDATE/DELETE When updating or deleting tuples from a compressed chunk, we first need to decompress the matching tuples then proceed with the operation. This optimization reduces the amount of data decompressed by using compressed metadata to decompress only the affected segments.	2023-04-25 15:49:59 +02:00
Ante Kresic	583c36e91e	Refactor compression code to reduce duplication	2023-04-20 22:27:34 +02:00
Bharathy	1fb058b199	Support UPDATE/DELETE on compressed hypertables. This patch does the following: 1. Executor changes to parse qual ExprState to check if a SEGMENTBY column is specified in the WHERE clause. 2. Based on step 1, we build scan keys. 3. Executor changes to do a heapscan on the compressed chunk based on the scan keys and move only those rows which match the WHERE clause to the staging area, aka the uncompressed chunk. 4. Mark the affected chunk as partially compressed. 5. Perform regular UPDATE/DELETE operations on the staging area. 6. Since there is no Custom Scan (HypertableModify) node for UPDATE/DELETE operations on PG versions < 14, we don't support this feature on PG12 and PG13.	2023-04-05 17:19:45 +05:30
Konstantina Skovola	72c0f5b25e	Rewrite recompress_chunk in C for segmentwise processing This patch introduces a C-function to perform the recompression at a finer granularity instead of decompressing and subsequently compressing the entire chunk. This improves performance for the following reasons: - it needs to sort less data at a time and - it avoids recreating the decompressed chunk and the heap inserts associated with that by decompressing each segment into a tuplesort instead. If no segmentby is specified when enabling compression or if an index does not exist on the compressed chunk then the operation is performed as before, decompressing and subsequently compressing the entire chunk.	2023-03-23 11:39:43 +02:00
shhnwz	699fcf48aa	Stats improvement for Uncompressed Chunks During compression, autovacuum used to be disabled for the uncompressed chunk and re-enabled after decompression. This leads to Postgres maintenance issues. Let's not disable autovacuum for the uncompressed chunk anymore. Let Postgres take care of the stats in its natural way. Fixes #309	2023-03-22 23:51:13 +05:30
Sven Klemm	65562f02e8	Support unique constraints on compressed chunks This patch allows unique constraints on compressed chunks. When trying to INSERT into compressed chunks with unique constraints any potentially conflicting compressed batches will be decompressed to let postgres do constraint checking on the INSERT. With this patch only INSERT ON CONFLICT DO NOTHING will be supported. For decompression only segment by information is considered to determine conflicting batches. This will be enhanced in a follow-up patch to also include orderby metadata to require decompressing fewer batches.	2023-03-13 12:04:38 +01:00
Sven Klemm	dbe89644b5	Remove no longer used compression code The recent refactoring of INSERT into compressed chunks made this code obsolete, but that patch did not remove it.	2023-01-16 14:18:56 +01:00
Ante Kresic	2475c1b92f	Roll up uncompressed chunks into compressed ones This change introduces a new option to the compression procedure which decouples the uncompressed chunk interval from the compressed chunk interval. It does this by allowing multiple uncompressed chunks to be rolled up into one compressed chunk as part of the compression procedure. The main use-case is to allow much smaller uncompressed chunks than compressed ones. This has several advantages: - Reduce the size of btrees on uncompressed data (thus allowing faster inserts because those indexes are memory-resident). - Decrease disk-space usage for uncompressed data. - Reduce number of chunks over historical data. From a UX point of view, we simply add a compression WITH clause option `compress_chunk_time_interval`. The user should set that according to their needs for constraint exclusion over historical data. Ideally, it should be a multiple of the uncompressed chunk interval and so we throw a warning if it is not.	2022-11-02 15:14:18 +01:00
Alexander Kuzmenkov	f862212c8c	Add clang-tidy warning readability-inconsistent-declaration-parameter-name Mostly cosmetic stuff. Matched to definition automatically with --fix-notes.	2022-10-20 19:42:11 +04:00
gayyappan	93be235d33	Support for inserts into compressed hypertables Add CompressRowSingleState. This has functions to compress a single row.	2021-05-24 18:03:47 -04:00
gayyappan	05319cd424	Support analyze of internal compression table This commit modifies analyze behavior as follows: 1. When an internal compression table is analyzed, statistics from the compressed chunk (such as page count and tuple count) are used to update the statistics of the corresponding chunk parent, if they are missing. 2. Analyze compressed chunk instead of raw chunks: when the command ANALYZE <hypertable> is executed, a) analyze uncompressed chunks and b) skip the raw chunk, but analyze the compressed chunk.	2020-11-11 15:05:14 -05:00
gayyappan	b93b30b0c2	Add counts to compression statistics Store information related to compressed and uncompressed row counts after compressing a chunk. This is saved in compression_chunk_size table.	2020-06-19 15:58:04 -04:00
gayyappan	6832ed2ca5	Modify storage type for toast columns This PR modifies the toast type for compressed columns based on the algorithm used for compression.	2019-10-29 19:02:58 -04:00
Sven Klemm	bdc599793c	Add helper function to get decompression iterator init function	2019-10-29 19:02:58 -04:00
Joshua Lockerman	6d55f6f615	Add decompress_chunk This function is the inverse of compress_chunk: it takes a table containing compressed data, decompresses it, and writes it out to another table.	2019-10-29 19:02:58 -04:00
Joshua Lockerman	2f16d84c39	Add ability to compress tables We eventually want to be able to compress chunks in the background as they become old enough. As an incremental step in this direction, this commit adds the ability to compress any table, albeit with an unintuitive and brittle interface. This will eventually be married to our catalogs and background workers to provide a seamless experience. This commit also fixes a bug in gorilla in which the compressor could not handle the case where the leading/trailing zeroes were always 0.	2019-10-29 19:02:58 -04:00
Joshua Lockerman	584f5d1061	Implement time-series compression algorithms This commit introduces 4 compression algorithms as well as 3 ADTs to support them. The compression algorithms are time-series optimized. The following algorithms are implemented: - DeltaDelta compresses integer and timestamp values - Gorilla compresses floats - Dictionary compression handles any data type and is optimized for low-cardinality datasets. - Array stores any data type in an array-like structure and does not actually compress it (though TOAST-based compression can be applied on top). These compression algorithms are fully described in tsl/src/compression/README.md. The Abstract Data Types that are implemented are - Vector - A dynamic vector that can store any type. - BitArray - A dynamic vector to store bits. - SimpleHash - A hash table implementation from PG12. More information can be found in src/adts/README.md	2019-10-29 19:02:58 -04:00

23 Commits