106 Commits

Author SHA1 Message Date
Alexander Kuzmenkov
c34fd0b06c Do not use partial indexes for compression
They refer only to a subset of the table.
2023-11-06 13:34:33 +01:00
Sven Klemm
0aefc072c0 Pass RowDecompressor by reference in build_scankeys
Change the code to pass RowDecompressor by reference instead of
value. Found by coverity.
2023-10-29 17:10:23 +01:00
Jan Nidzwetzki
8767de658b Reduce WAL activity by freezing tuples immediately
When we compress a chunk, we create a new compressed chunk for storing
the compressed data. So far, the tuples were just inserted into the
compressed chunk and frozen by a later vacuum run.

However, freezing the tuples later causes extra WAL activity, which can
be avoided because the compressed chunk is created in the same
transaction as the tuples.
This patch reduces the WAL activity by storing these tuples directly as
frozen and preventing a freeze operation in the future. This approach is
similar to PostgreSQL's COPY FREEZE.
2023-10-25 13:27:07 +02:00
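A hedged sketch of when this path is taken (the hypertable name is hypothetical; `compress_chunk` and `show_chunks` are standard TimescaleDB APIs):

```sql
-- Hypothetical hypertable "metrics": compressing its chunks now writes
-- the compressed tuples as already frozen, similar to COPY FREEZE,
-- so a later VACUUM run does not need to freeze them again.
SELECT compress_chunk(c, if_not_compressed => true)
FROM show_chunks('metrics', older_than => INTERVAL '7 days') c;
```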
Konstantina Skovola
7b7722e241 Fix index inserts during decompression
Since version 2.11.0, we would get a segmentation fault during
decompression when there was an expression or partial index on the
uncompressed chunk.
This patch fixes it by calling ExecInsertIndexTuples to insert into
indexes during chunk decompression, instead of CatalogIndexInsert.

In addition, when enabling compression on a hypertable, we check the
unique indexes defined on it to provide performance improvement hints
in case the unique index columns are not specified as compression
parameters.
However, this check threw an error when expression columns were present
in the index, preventing the user from enabling compression.
This patch fixes that by simply ignoring the expression columns in the
index, since we cannot currently segment by an expression.

Fixes #6205, #6186
2023-10-24 16:51:57 +03:00
Sven Klemm
8f3bb0ba70 Fix EXPLAIN for compressed DML
EXPLAIN ANALYZE for compressed DML would fail with a `bogus varno`
error because we modified the original expressions of the plan, which
were still referenced by other nodes, instead of adjusting copies and
using those copies in our internal scans.
2023-10-18 13:50:23 +02:00
Sven Klemm
332bbdb6e7 Show batches/tuples decompressed in EXPLAIN output
This patch adds tracking of the number of batches and tuples that had
to be decompressed as part of DML operations on compressed hypertables.
These will be visible in EXPLAIN ANALYZE output like so:

QUERY PLAN
 Custom Scan (HypertableModify) (actual rows=0 loops=1)
   Batches decompressed: 2
   Tuples decompressed: 25
   ->  Insert on decompress_tracking (actual rows=0 loops=1)
         ->  Custom Scan (ChunkDispatch) (actual rows=2 loops=1)
               ->  Values Scan on "*VALUES*" (actual rows=2 loops=1)
(6 rows)
2023-10-14 18:01:36 +02:00
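A hedged sketch of how to surface these counters (the table and values are hypothetical; the counters appear under the HypertableModify node, as in the plan above):

```sql
-- An INSERT that touches compressed data forces some batches to be
-- decompressed; the batch/tuple counts show up in EXPLAIN ANALYZE.
EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF)
INSERT INTO decompress_tracking VALUES ('2023-01-01', 1, 10.0);
```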
Ante Kresic
1932c02fc9 Avoid decompressing batches using an empty slot
When running a COPY command into a compressed hypertable, we
could end up using an empty slot for filtering compressed batches.
This happens when a previously created copy buffer for a chunk
does not contain any new tuples to insert. The fix is to
verify slots before attempting anything else.
2023-09-27 09:08:59 +02:00
Sven Klemm
a7e7e675a4 Improve compression datatype handling
Fall back to btree operator input type when it is binary compatible with
the column type and no operator for column type could be found. This
should improve performance when using column types like char or varchar
instead of text.
2023-09-18 12:18:21 +02:00
Bharathy
e66a40038e Fix server crash on UPDATE of compressed chunk
An UPDATE query with system attributes in the WHERE clause caused the
server to crash. This patch fixes the issue by checking for
system attributes and handling only segmentby attributes
in fill_predicate_context().

Fixes #6024
2023-09-04 09:23:26 +00:00
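A hedged sketch of the query shape that used to crash (the hypertable and column names are hypothetical):

```sql
-- A WHERE clause referencing a system column such as ctid, alongside
-- a segmentby column, previously crashed on compressed chunks.
UPDATE metrics SET value = 0
WHERE device_id = 1 AND ctid = '(0,1)'::tid;
```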
Lakshmi Narayanan Sreethar
3438636a05 PG16: Macro HeapKeyTest is now an inline function
postgres/postgres@4eb3b112
2023-08-16 18:25:54 +05:30
Lakshmi Narayanan Sreethar
4bd704f3fc Further code cleanup after PG12 removal
Removed PG12 specific code guarded by the PG13_LT and PG13_GE macros.
2023-08-11 00:31:48 +05:30
noctarius aka Christoph Engelbert
b5b46a3e58
Make logrepl markers for (partial) decompressions (#5805)
Added logical replication messages (PG14+) as markers for (partial)
decompression events (mutable compression), which makes it possible to
differentiate inserts happening as part of decompression from actual
inserts by the user, and to filter the former out of the event stream.
While some tools may be interested in all events, others only need to
sync the pure "state" (without internal behavior).

As of now this PR is missing tests. I wonder if anyone has a good idea
how to create an automatic test for it.
2023-08-09 13:28:54 +02:00
Lakshmi Narayanan Sreethar
3af0d282ea PG16: ExecInsertIndexTuples requires additional parameter
PG16 adds a new boolean parameter to the ExecInsertIndexTuples function
to denote whether the index is a BRIN index, which is then used to
determine if the index update can be skipped. The same PG16 change also
removes the INDEX_ATTR_BITMAP_ALL enum value.

Adapt to these changes by updating the compat function to accommodate
the new parameter added to the ExecInsertIndexTuples function and by
using an alternative for the removed INDEX_ATTR_BITMAP_ALL enum value.

postgres/postgres@19d8e23
2023-08-09 03:04:12 +05:30
Alexander Kuzmenkov
eaa1206b7f Improvements for bulk decompression
* Restore default batch context size to fix a performance regression on
  sorted batch merge plans.
* Support reverse direction.
* Improve gorilla decompression by computing prefix sums of tag bitmaps
  during decompression.
2023-07-06 19:52:20 +02:00
Ante Kresic
fb0df1ae4e Insert into indexes during chunk compression
If there are any indexes on the compressed chunk, insert into them
while inserting the heap data rather than reindexing the relation at
the end. This reduces locking on the compressed chunk indexes, which
caused issues when merging chunks, and should help with future updates
of compressed data.
2023-06-26 09:37:12 +02:00
Lakshmi Narayanan Sreethar
d96e72af60 PG16: Rename RelFileNode references to RelFileNumber or RelFileLocator
postgres/postgres@b0a55e4
2023-06-21 22:52:22 +05:30
Bharathy
c48f905f78 Index scan support for UPDATE/DELETE.
During UPDATE/DELETE on compressed hypertables, we do a sequential
scan, which can be improved by supporting index scans.

With this patch, for a given UPDATE/DELETE query, if any WHERE
conditions are specified on SEGMENT BY columns, we use an index
scan to fetch all matching rows. Fetched rows are decompressed
and moved to the uncompressed chunk, and a regular UPDATE/DELETE is
performed on the uncompressed chunk.
2023-06-15 19:59:04 +05:30
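A hedged example of a query that can take this index-scan path (the hypertable and segmentby column are hypothetical):

```sql
-- Assuming "device_id" was specified as timescaledb.compress_segmentby,
-- this WHERE condition lets an index scan fetch only the matching
-- compressed rows instead of sequentially scanning the whole chunk.
DELETE FROM metrics WHERE device_id = 42;
```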
Alexander Kuzmenkov
f26e656c0f Bulk decompression of compressed batches
Add a function to decompress a compressed batch entirely in one go, and
use it in some query plans. As a result of decompression, produce
ArrowArrays. They will be the base for the subsequent vectorized
computation of aggregates.

As a side effect, some heavy queries on compressed hypertables speed up
by about 15%. Point queries with LIMIT 1 can regress by up to 1 ms. If
the absolute highest performance is desired for such queries, bulk
2023-06-07 16:21:50 +02:00
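A hedged sketch of disabling the feature (the GUC name `timescaledb.enable_bulk_decompression` is an assumption based on this description):

```sql
-- Disable bulk decompression for latency-sensitive point queries,
-- e.g. SELECT ... ORDER BY time DESC LIMIT 1.
SET timescaledb.enable_bulk_decompression = 'off';
```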
Fabrízio de Royes Mello
4cef387f85 Replace heap_endscan to table_endscan 2023-06-02 11:15:11 -04:00
Alexander Kuzmenkov
6589f43160 Compression fuzzing in CI
This serves as a way to exercise the decompression fuzzing code, which
will be useful when we need to change the decompression functions. Also
this way we'll have a check in CI that uses libfuzzer, and it will be
easier to apply it to other areas of code in the future.
2023-05-25 11:35:07 +02:00
Alexander Kuzmenkov
8ff0648fd0 Fix ubsan failure in gorilla decompression
Also add more tests
2023-05-16 21:32:52 +02:00
Alexander Kuzmenkov
030bfe867d Fix errors in decompression found by fuzzing
For deltadelta and gorilla codecs, add various length and consistency
checks that prevent segfaults on incorrect data.
2023-05-15 18:33:22 +02:00
Bharathy
2d71a5bca9 Fix leak during concurrent UPDATE/DELETE
When updating and deleting the same tuple while both transactions are
running at the same time, we end up with a reference leak: one of the
queries in a transaction fails and we take the error path, but fail to
close the table.

This patch fixes the problem by closing the required tables.

Fixes #5674
2023-05-12 11:21:10 +05:30
Ante Kresic
ab22478992 Fix DML decompression issues with bitmap heap scan
Bitmap heap scans are specific in that they store scan state
during node initialization. This means they would not pick up on
any data that might have been decompressed during a DML command
from the compressed chunk. To avoid this, we update the snapshot
on the node scan state and issue a rescan to update the internal state.
2023-05-10 12:54:20 +02:00
Fabrízio de Royes Mello
3dc6824eb5 Add GUC to enable/disable DML decompression 2023-05-05 14:59:13 -03:00
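A hedged sketch of the toggle (the GUC name `timescaledb.enable_dml_decompression` is an assumption based on this description):

```sql
-- When disabled, DML statements that would need to decompress data
-- on compressed chunks are rejected instead of triggering
-- decompression.
SET timescaledb.enable_dml_decompression = 'off';
```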
Ante Kresic
6782beb150 Fix index scan handling in DML decompression
We need to use the correct qualifiers for index scans since the
generic scan qualifiers are not populated in this case.
2023-05-05 13:16:57 +02:00
Sven Klemm
9259311275 Fix JOIN handling in UPDATE/DELETE on compressed chunks
When JOINs were present during UPDATE/DELETE on compressed chunks
the code would decompress other hypertables that were not the
target of the UPDATE/DELETE operations and in the case of self-JOINs
potentially decompress chunks not required to be decompressed.
2023-05-04 13:52:14 +02:00
Bharathy
769f9fe609 Fix segfault when deleting from compressed chunk
During UPDATE/DELETE on compressed hypertables, we iterate over the
plan tree to collect all scan nodes. Each scan node can have filter
conditions.

Prior to this patch we collected only the first filter condition and
applied it to the first chunk, which could be wrong. With this patch,
whenever we encounter a target scan node, we immediately process its
chunks.

Fixes #5640
2023-05-03 23:19:26 +05:30
Zoltan Haindrich
1d092560f4 Fix on-insert decompression after schema changes
On compressed hypertables 3 schema levels are in use simultaneously
 * main - hypertable level
 * chunk - inheritance level
 * compressed chunk

All of them appear in the build_scankeys method: the slot's fields are
laid out as a row of the main hypertable.

Accessing the slot by the attribute numbers of the chunks may lead to
indexing mismatches if there are differences between the schemas.

Fixes: #5577
2023-04-27 16:33:36 +02:00
Ante Kresic
910663d0be Reduce decompression during UPDATE/DELETE
When updating or deleting tuples from a compressed chunk, we first
need to decompress the matching tuples then proceed with the operation.
This optimization reduces the amount of data decompressed by using
compressed metadata to decompress only the affected segments.
2023-04-25 15:49:59 +02:00
Ante Kresic
583c36e91e Refactor compression code to reduce duplication 2023-04-20 22:27:34 +02:00
Ante Kresic
a49fdbcffb Reduce decompression during constraint checking
When inserting into a compressed chunk with constraints present,
we need to decompress relevant tuples in order to do speculative
insertion. Previously we used segmentby column values to limit the
number of compressed segments to decompress. This change expands
on that by also using segment metadata to further filter the
compressed rows that need to be decompressed.
2023-04-20 12:17:12 +02:00
Ante Kresic
84b6783a19 Fix chunk status when inserting into chunks
While executing compression operations in parallel with inserts
into chunks (both operations can potentially change the chunk
status), we could get into situations where the chunk status
ended up inconsistent. This change re-reads the chunk status after
locking the chunk to make sure ON CONFLICT inserts can correctly
decompress data.
2023-04-12 10:50:44 +02:00
Bharathy
1fb058b199 Support UPDATE/DELETE on compressed hypertables.
This patch does the following:

1. Executor changes to parse qual ExprState to check if SEGMENTBY
   column is specified in WHERE clause.
2. Based on step 1, we build scan keys.
3. Executor changes to do heapscan on compressed chunk based on
   scan keys and move only those rows which match the WHERE clause
   to staging area aka uncompressed chunk.
4. Mark affected chunk as partially compressed.
5. Perform regular UPDATE/DELETE operations on staging area.
6. Since there is no Custom Scan (HypertableModify) node for
   UPDATE/DELETE operations on PG versions < 14, we don't support this
   feature on PG12 and PG13.
2023-04-05 17:19:45 +05:30
Konstantina Skovola
72c0f5b25e Rewrite recompress_chunk in C for segmentwise processing
This patch introduces a C-function to perform the recompression at
a finer granularity instead of decompressing and subsequently
compressing the entire chunk.

This improves performance for the following reasons:
- it needs to sort less data at a time and
- it avoids recreating the decompressed chunk and the heap
inserts associated with that by decompressing each segment
into a tuplesort instead.

If no segmentby is specified when enabling compression or if an
index does not exist on the compressed chunk then the operation is
performed as before, decompressing and subsequently
compressing the entire chunk.
2023-03-23 11:39:43 +02:00
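A hedged example of invoking the recompression (the chunk name is hypothetical; `recompress_chunk` is the user-facing procedure the C function backs):

```sql
-- Recompress a partially compressed chunk; with a segmentby set and an
-- index on the compressed chunk, only the affected segments are
-- decompressed into a tuplesort and recompressed, not the whole chunk.
CALL recompress_chunk('_timescaledb_internal._hyper_1_1_chunk');
```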
shhnwz
699fcf48aa Stats improvement for Uncompressed Chunks
During compression, autovacuum used to be disabled for the uncompressed
chunk and re-enabled after decompression. This led to Postgres
maintenance issues. We no longer disable autovacuum for the
uncompressed chunk; Postgres takes care of the stats in its natural way.

Fixes #309
2023-03-22 23:51:13 +05:30
Zoltan Haindrich
790b322b24 Fix DEFAULT value handling in decompress_chunk
The SQL function decompress_chunk did not fill in
default values during its operation.

Fixes #5412
2023-03-16 09:16:50 +01:00
Sven Klemm
65562f02e8 Support unique constraints on compressed chunks
This patch allows unique constraints on compressed chunks. When
trying to INSERT into compressed chunks with unique constraints,
any potentially conflicting compressed batches will be decompressed
to let Postgres do constraint checking on the INSERT.
With this patch, only INSERT ON CONFLICT DO NOTHING is supported.
For decompression, only segmentby information is considered when
determining conflicting batches. This will be enhanced in a follow-up
patch to also include orderby metadata, so that fewer batches need to
be decompressed.
2023-03-13 12:04:38 +01:00
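A hedged example of the supported statement shape (the table, columns, and values are hypothetical):

```sql
-- Only DO NOTHING is supported at this stage; conflicting compressed
-- batches are decompressed so Postgres can check the unique constraint.
INSERT INTO metrics (time, device_id, value)
VALUES ('2023-03-01 00:00:00+00', 1, 3.14)
ON CONFLICT DO NOTHING;
```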
Sven Klemm
c02cb76b38 Don't reindex relation during decompress_chunk
Reindexing a relation requires AccessExclusiveLock which prevents
queries on that chunk. This patch changes decompress_chunk to update
the index during decompression instead of reindexing. This patch
does not change the required locks as there are locking adjustments
needed in other places to make it safe to weaken that lock.
2023-03-13 10:58:26 +01:00
Sven Klemm
8132908c97 Refactor chunk decompression functions
Restructure the code inside decompress_chunk slightly to make the core
loop reusable by other functions.
2023-02-06 14:52:06 +01:00
Sven Klemm
b229b3aefd Small decompress_chunk refactor
Refactor the decompression code to move the decompressor
initialization into a separate function.
2023-01-30 16:47:16 +01:00
Sven Klemm
dbe89644b5 Remove no longer used compression code
The recent refactoring of INSERT into compressed chunks made this
code obsolete, but it was not removed in that patch.
2023-01-16 14:18:56 +01:00
shhnwz
601b37daa8 Index support for compress chunk
This allows overriding the tuplesort with an index scan
if the compression setting keys match the index keys.
The feature can also be toggled; to disable it
from the client, use the following command:
SET timescaledb.enable_compression_indexscan = 'OFF'
2022-12-15 20:26:00 +05:30
Ante Kresic
cbf51803dd Fix index att number calculation
The attribute offset was used by mistake where the attribute number
was needed, causing wrong values to be fetched when scanning the
compressed chunk index.
2022-12-15 11:23:10 +01:00
Matvey Arye
df16815009 Fix memory leak for compression with merge chunks
The RelationInitIndexAccessInfo call leaks cache memory and
seems to be unnecessary.
2022-12-13 08:22:49 +01:00
Alexander Kuzmenkov
1b65297ff7 Fix memory leak with INSERT into compressed hypertable
We used to allocate some temporary data in the ExecutorContext.
2022-11-16 13:58:52 +04:00
Fabrízio de Royes Mello
f1535660b0 Honor usage of OidIsValid() macro
The Postgres source code defines the macro `OidIsValid()` to check
whether an Oid is valid (i.e., not equal to `InvalidOid`). See
`src/include/c.h` in the Postgres source tree.

Changed all direct comparisons against `InvalidOid` to `OidIsValid`
calls and added a coccinelle check to make sure future changes use it
correctly.
2022-11-03 16:10:50 -03:00
Ante Kresic
2475c1b92f Roll up uncompressed chunks into compressed ones
This change introduces a new option to the compression procedure which
decouples the uncompressed chunk interval from the compressed chunk
interval. It does this by allowing multiple uncompressed chunks into one
compressed chunk as part of the compression procedure. The main use-case
is to allow much smaller uncompressed chunks than compressed ones. This
has several advantages:
- Reduce the size of btrees on uncompressed data (thus allowing faster
inserts because those indexes are memory-resident).
- Decrease disk-space usage for uncompressed data.
- Reduce number of chunks over historical data.

From a UX point of view, we simply add a compression WITH-clause option,
`compress_chunk_time_interval`. The user should set it according to
their needs for constraint exclusion over historical data. Ideally, it
should be a multiple of the uncompressed chunk interval, so we throw
a warning if it is not.
2022-11-02 15:14:18 +01:00
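A hedged sketch of the new option (the hypertable, segmentby column, and intervals are hypothetical):

```sql
-- Roll small uncompressed chunks up into one larger compressed chunk;
-- the interval should ideally be a multiple of the uncompressed chunk
-- interval, otherwise a warning is raised.
ALTER TABLE metrics SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'device_id',
    timescaledb.compress_chunk_time_interval = '24 hours'
);
```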
Alexander Kuzmenkov
313845a882 Enable -Wextra
Our code mostly has warnings about comparison with different
signedness.
2022-10-27 16:06:58 +04:00
Alexander Kuzmenkov
f862212c8c Add clang-tidy warning readability-inconsistent-declaration-parameter-name
Mostly cosmetic stuff. Matched to definition automatically with
--fix-notes.
2022-10-20 19:42:11 +04:00