When we compress a chunk, we create a new compressed chunk for storing
the compressed data. So far, the tuples were just inserted into the
compressed chunk and frozen by a later vacuum run.
However, freezing tuples causes WAL activity, which can be avoided here
because the compressed chunk is created in the same transaction as the
tuples.
This patch reduces the WAL activity by storing these tuples directly as
frozen and preventing a freeze operation in the future. This approach is
similar to PostgreSQL's COPY FREEZE.
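For comparison, a minimal sketch of PostgreSQL's COPY FREEZE, which
likewise writes tuples as frozen because the table is created in the
same transaction (table and file names are illustrative):

  BEGIN;
  CREATE TABLE sensor_staging (ts timestamptz, device_id int, value float8);
  -- FREEZE is only allowed because the table was created (or truncated)
  -- in the same transaction, so the rows can be written already frozen.
  COPY sensor_staging FROM '/tmp/sensor_data.csv' WITH (FORMAT csv, FREEZE);
  COMMIT;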
Since version 2.11.0, we would get a segmentation fault during
decompression when there was an expression or partial index on the
uncompressed chunk.
This patch fixes this by calling ExecInsertIndexTuples to insert into
indexes during chunk decompression, instead of CatalogIndexInsert.
In addition, when enabling compression on a hypertable, we check the
unique indexes defined on it to provide performance improvement hints
in case the unique index columns are not specified as compression
parameters.
However, this check threw an error when expression columns were present
in the index, preventing the user from enabling compression.
This patch fixes this by simply ignoring the expression columns in the
index, since we cannot currently segment by an expression.
Fixes #6205, #6186
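A sketch of the scenario, with illustrative table and column names:

  CREATE TABLE readings (
      ts timestamptz NOT NULL,
      device_id int,
      val float8,
      UNIQUE (device_id, ts)
  );
  SELECT create_hypertable('readings', 'ts');
  -- Enabling compression checks the unique index and may emit a
  -- performance hint if its columns are not covered by the compression
  -- settings; an expression column in the index no longer errors here.
  ALTER TABLE readings SET (timescaledb.compress,
                            timescaledb.compress_segmentby = 'device_id');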
EXPLAIN ANALYZE for compressed DML would error out with `bogus varno`
error because we would modify the original expressions of the plan
that were still referenced in nodes instead of adjusting copies and
using those copies in our internal scans.
This patch adds tracking of the number of batches and tuples that had
to be decompressed as part of DML operations on compressed hypertables.
These are visible in EXPLAIN ANALYZE output like so:
                             QUERY PLAN
 --------------------------------------------------------------------
 Custom Scan (HypertableModify) (actual rows=0 loops=1)
   Batches decompressed: 2
   Tuples decompressed: 25
   ->  Insert on decompress_tracking (actual rows=0 loops=1)
         ->  Custom Scan (ChunkDispatch) (actual rows=2 loops=1)
               ->  Values Scan on "*VALUES*" (actual rows=2 loops=1)
(6 rows)
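A hedged example of the kind of statement that produces such a plan
(the table name is taken from the sample output; the values are
illustrative):

  EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
  INSERT INTO decompress_tracking
  VALUES ('2023-01-01 00:00', 'd1', 1.0),
         ('2023-01-01 00:05', 'd2', 2.0);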
When running a COPY command into a compressed hypertable, we
could end up using an empty slot for filtering compressed batches.
This happens when a previously created copy buffer for a chunk
does not contain any new tuples for inserting. The fix is to
verify slots before attempting to do anything else.
Fall back to the btree operator input type when it is binary compatible
with the column type and no operator for the column type could be found.
This should improve performance when using column types like char or
varchar instead of text.
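An illustrative setup where this matters (names are assumptions):

  CREATE TABLE events (ts timestamptz NOT NULL, tag varchar(32), payload text);
  SELECT create_hypertable('events', 'ts');
  ALTER TABLE events SET (timescaledb.compress,
                          timescaledb.compress_segmentby = 'tag');
  -- Filtering on the varchar column can now fall back to the btree
  -- operators of the binary-compatible text type when building scan keys.
  SELECT * FROM events WHERE tag = 'sensor-1';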
An UPDATE query with system attributes in the WHERE clause caused the
server to crash. This patch fixes the issue by checking for system
attributes and handling only segmentby attributes in
fill_predicate_context().
Fixes #6024
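An illustrative query of the kind that used to crash (table and values
are assumptions):

  -- ctid is a system attribute; fill_predicate_context() now skips it
  -- and only builds scan keys for segmentby attributes like device_id.
  UPDATE metrics SET value = 0
  WHERE ctid = '(0,1)'::tid AND device_id = 5;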
Added logical replication messages (PG14+) as markers for (partial)
decompression events (mutable compression), which makes it possible to
differentiate inserts happening as part of decompression from actual
inserts by the user, and to filter the former out of the event stream.
While some tools may be interested in all events, syncing only the pure
"state" (without the internal behavior) is required for others.
As of now this PR is missing tests. I wonder if anyone has a good idea
how to create an automatic test for it.
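A hedged sketch of how such markers can be observed with the
test_decoding plugin (the slot name is illustrative; the exact marker
prefix emitted by the extension is not shown here):

  SELECT pg_create_logical_replication_slot('marker_demo', 'test_decoding');
  -- Decompression markers appear as logical replication messages in the
  -- change stream, so downstream tools can skip the enclosed inserts.
  SELECT data FROM pg_logical_slot_peek_changes('marker_demo', NULL, NULL);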
PG16 adds a new boolean parameter to the ExecInsertIndexTuples function
to denote if the index is a BRIN index, which is then used to determine
if the index update can be skipped. The same upstream commit also
removes the INDEX_ATTR_BITMAP_ALL enum value.
Adapt to these changes by updating the compat function to accommodate
the new parameter added to the ExecInsertIndexTuples function and by
using an alternative for the removed INDEX_ATTR_BITMAP_ALL enum value.
postgres/postgres@19d8e23
* Restore default batch context size to fix a performance regression on
sorted batch merge plans.
* Support reverse direction.
* Improve gorilla decompression by computing prefix sums of tag bitmaps
during decompression.
If there are any indexes on the compressed chunk, insert into them
while inserting the heap data rather than reindexing the relation at the
end. This reduces the amount of locking on the compressed chunk indexes,
which caused issues when merging chunks, and should help with future
updates of compressed data.
During UPDATE/DELETE on compressed hypertables, we used to do a
sequential scan, which can be improved by supporting index scans.
With this patch, for a given UPDATE/DELETE query, if there are any
WHERE conditions specified using SEGMENT BY columns, we use an index
scan to fetch all matching rows. The fetched rows are decompressed and
moved to the uncompressed chunk, and a regular UPDATE/DELETE is
performed on the uncompressed chunk.
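For example, with device_id as a segmentby column (names are
illustrative):

  -- An index scan on the compressed chunk fetches only the batches whose
  -- device_id matches; those batches are decompressed into the
  -- uncompressed chunk and the DELETE then runs there as usual.
  DELETE FROM metrics WHERE device_id = 42;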
Add a function to decompress a compressed batch entirely in one go, and
use it in some query plans. As a result of decompression, produce
ArrowArrays. They will be the base for the subsequent vectorized
computation of aggregates.
As a side effect, some heavy queries to compressed hypertables speed up
by about 15%. Point queries with LIMIT 1 can regress by up to 1 ms. If
the absolute highest performance is desired for such queries, bulk
decompression can be disabled with a GUC.
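A sketch of opting out for latency-sensitive point queries; the GUC name
below is an assumption, since the text above only says "a GUC":

  SET timescaledb.enable_bulk_decompression = off;  -- assumed GUC name
  SELECT ts, value FROM metrics ORDER BY ts DESC LIMIT 1;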
This serves as a way to exercise the decompression fuzzing code, which
will be useful when we need to change the decompression functions. Also
this way we'll have a check in CI that uses libfuzzer, and it will be
easier to apply it to other areas of code in the future.
When updating and deleting the same tuple while both transactions are
running at the same time, we end up with a reference leak. This is
because one of the queries in a transaction fails and we take the error
path, but fail to close the table.
This patch fixes the problem by closing the required tables.
Fixes #5674
Bitmap heap scans are special in that they store scan state
during node initialization. This means they would not pick up on
any data that might have been decompressed during a DML command
from the compressed chunk. To avoid this, we update the snapshot
on the node scan state and issue a rescan to update the internal state.
When JOINs were present during UPDATE/DELETE on compressed chunks,
the code would decompress other hypertables that were not the target
of the UPDATE/DELETE operations and, in the case of self-JOINs,
potentially decompress chunks that did not need to be decompressed.
During UPDATE/DELETE on compressed hypertables, we iterate over the
plan tree to collect all scan nodes. Each scan node can have filter
conditions.
Prior to this patch we collected only the first filter condition and
used it for the first chunk, which may be wrong. With this patch, as
soon as we encounter a target scan node, we immediately process its
chunks.
Fixes #5640
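An illustrative UPDATE with a JOIN (table names are assumptions):

  -- Only chunks of the target hypertable touched by its own scan nodes
  -- are decompressed; the joined relation stays compressed.
  UPDATE metrics m
  SET value = 0
  FROM devices d
  WHERE d.device_id = m.device_id AND d.retired;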
On compressed hypertables 3 schema levels are in use simultaneously:
* main - hypertable level
* chunk - inheritance level
* compressed chunk
In the build_scankeys method all of them appear, since the slot has its
fields laid out as a row of the main hypertable. Accessing the slot by
the attribute numbers of the chunks may lead to indexing mismatches if
there are differences between the schemas.
Fixes: #5577
When updating or deleting tuples from a compressed chunk, we first
need to decompress the matching tuples then proceed with the operation.
This optimization reduces the amount of data decompressed by using
compressed metadata to decompress only the affected segments.
When inserting into a compressed chunk with constraints present,
we need to decompress relevant tuples in order to do speculative
insertion. Previously we used segmentby column values to limit the
number of compressed segments to decompress. This change expands
on that by also using segment metadata to further filter the
compressed rows that need to be decompressed.
While executing compression operations in parallel with
inserting into chunks (both operations which can potentially
change the chunk status), we could get into situations where
the chunk status would end up inconsistent. This change re-reads
the chunk status after locking the chunk to make sure it can correctly
decompress data when handling ON CONFLICT inserts.
This patch does the following:
1. Executor changes to parse qual ExprState to check if SEGMENTBY
column is specified in WHERE clause.
2. Based on step 1, we build scan keys.
3. Executor changes to do heapscan on compressed chunk based on
scan keys and move only those rows which match the WHERE clause
to staging area aka uncompressed chunk.
4. Mark affected chunk as partially compressed.
5. Perform regular UPDATE/DELETE operations on staging area.
6. Since there is no Custom Scan (HypertableModify) node for
UPDATE/DELETE operations on PG versions < 14, we don't support this
feature on PG12 and PG13.
This patch introduces a C-function to perform the recompression at
a finer granularity instead of decompressing and subsequently
compressing the entire chunk.
This improves performance for the following reasons:
- it needs to sort less data at a time and
- it avoids recreating the decompressed chunk and the heap
inserts associated with that by decompressing each segment
into a tuplesort instead.
If no segmentby is specified when enabling compression or if an
index does not exist on the compressed chunk then the operation is
performed as before, decompressing and subsequently
compressing the entire chunk.
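A sketch of the path this optimizes, assuming the existing
recompress_chunk procedure is the user-facing entry point and using an
illustrative chunk name:

  ALTER TABLE metrics SET (timescaledb.compress,
                           timescaledb.compress_segmentby = 'device_id');
  SELECT compress_chunk(c) FROM show_chunks('metrics') c;
  -- ... new rows arrive for the already-compressed chunk ...
  -- With a segmentby and an index on the compressed chunk, only the
  -- affected segments are decompressed into a tuplesort and recompressed.
  CALL recompress_chunk('_timescaledb_internal._hyper_1_1_chunk');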
During compression, autovacuum used to be disabled for the uncompressed
chunk and re-enabled after decompression. This leads to postgres
maintenance issues. Let's not disable autovacuum for the uncompressed
chunk anymore and let postgres take care of the stats in its natural way.
Fixes #309
This patch allows unique constraints on compressed chunks. When
trying to INSERT into compressed chunks with unique constraints
any potentially conflicting compressed batches will be decompressed
to let postgres do constraint checking on the INSERT.
With this patch only INSERT ON CONFLICT DO NOTHING will be supported.
For decompression, only segmentby information is considered to
determine conflicting batches. This will be enhanced in a follow-up
patch to also include orderby metadata so that fewer batches need to
be decompressed.
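An illustrative insert into a compressed chunk with a unique constraint
(names are assumptions):

  -- Batches sharing the segmentby value of the incoming row are
  -- decompressed so postgres can perform the constraint check; only
  -- ON CONFLICT DO NOTHING is supported at this stage.
  INSERT INTO readings (ts, device_id, val)
  VALUES ('2023-06-01 12:00', 7, 1.5)
  ON CONFLICT DO NOTHING;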
Reindexing a relation requires AccessExclusiveLock which prevents
queries on that chunk. This patch changes decompress_chunk to update
the index during decompression instead of reindexing. This patch
does not change the required locks as there are locking adjustments
needed in other places to make it safe to weaken that lock.
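For example (chunk name is illustrative):

  -- Index tuples are now inserted while the heap tuples are written,
  -- instead of reindexing the chunk afterwards.
  SELECT decompress_chunk('_timescaledb_internal._hyper_1_1_chunk');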
This allows overriding the tuplesort with an index scan if the
compression setting keys match the index keys. The feature can be
toggled on and off. To disable it from the client, use the following
command:
SET timescaledb.enable_compression_indexscan = 'OFF'
The Postgres source code defines the macro `OidIsValid()` to check
whether an Oid is valid (i.e., different from `InvalidOid`). See
`src/include/c.h` in the Postgres source tree.
Changed all direct comparisons against `InvalidOid` to `OidIsValid()`
calls and added a coccinelle check to make sure future changes use it
correctly.
This change introduces a new option to the compression procedure which
decouples the uncompressed chunk interval from the compressed chunk
interval. It does this by allowing multiple uncompressed chunks into one
compressed chunk as part of the compression procedure. The main use-case
is to allow much smaller uncompressed chunks than compressed ones. This
has several advantages:
- Reduce the size of btrees on uncompressed data (thus allowing faster
inserts because those indexes are memory-resident).
- Decrease disk-space usage for uncompressed data.
- Reduce number of chunks over historical data.
From a UX point of view, we simply add a compression WITH-clause option
`compress_chunk_time_interval`. The user should set it according to
their needs for constraint exclusion over historical data. Ideally, it
should be a multiple of the uncompressed chunk interval, and we throw
a warning if it is not.
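For example, with hourly uncompressed chunks rolled up into daily
compressed chunks (table name and intervals are illustrative):

  SELECT create_hypertable('metrics', 'ts',
                           chunk_time_interval => INTERVAL '1 hour');
  -- Ideally a multiple of the chunk interval; otherwise a warning is raised.
  ALTER TABLE metrics SET (timescaledb.compress,
                           timescaledb.compress_chunk_time_interval = '24 hours');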