When inserting into a compressed chunk with constraints present,
we need to decompress the relevant tuples in order to do speculative
insertion. Previously we used segmentby column values to limit the
number of compressed segments to decompress. This change expands
on that by also using segment metadata to further filter the
compressed rows that need to be decompressed.
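As a hedged sketch of the setup this filtering relies on, assuming a
hypothetical hypertable metrics(time, device_id, value): the segmentby
values and the per-batch min/max metadata of the orderby column are
what allow compressed batches to be skipped.
  ALTER TABLE metrics SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'device_id',
    timescaledb.compress_orderby = 'time DESC'
  );
  -- Only batches whose device_id and time range could match the
  -- incoming row need to be decompressed for constraint checking.
  SELECT compress_chunk(c) FROM show_chunks('metrics') c;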
When compression operations run in parallel with inserts into
chunks (both operations can potentially change the chunk status),
we could end up with an inconsistent chunk status. This change
re-reads the chunk status after locking the chunk to make sure
data can be decompressed correctly when handling ON CONFLICT
inserts.
This patch does the following (a usage sketch follows the list):
1. Executor changes to parse the qual ExprState to check if a SEGMENTBY
column is specified in the WHERE clause.
2. Based on step 1, we build scan keys.
3. Executor changes to do a heap scan on the compressed chunk based on
the scan keys and move only those rows which match the WHERE clause
to the staging area, i.e. the uncompressed chunk.
4. Mark the affected chunk as partially compressed.
5. Perform regular UPDATE/DELETE operations on the staging area.
6. Since there is no Custom Scan (HypertableModify) node for
UPDATE/DELETE operations on PG versions < 14, we don't support this
feature on PG12 and PG13.
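A minimal usage sketch, assuming a hypothetical hypertable
metrics(time, device_id, value) compressed with
compress_segmentby = 'device_id':
  -- Only compressed batches with device_id = 7 are decompressed into
  -- the staging area before the modification runs on it.
  UPDATE metrics SET value = 0 WHERE device_id = 7;
  DELETE FROM metrics WHERE device_id = 7 AND time < '2020-01-01';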
This patch introduces a C-function to perform the recompression at
a finer granularity instead of decompressing and subsequently
compressing the entire chunk.
This improves performance for the following reasons:
- it needs to sort less data at a time and
- it avoids recreating the decompressed chunk and the heap
inserts associated with that by decompressing each segment
into a tuplesort instead.
If no segmentby is specified when enabling compression, or if no
index exists on the compressed chunk, the operation is performed as
before: decompressing and subsequently compressing the entire chunk.
During compression, autovacuum used to be disabled for the uncompressed
chunk and re-enabled after decompression. This leads to postgres
maintenance issues. Let's not disable autovacuum for the uncompressed
chunk anymore and let postgres take care of the stats in its natural way.
Fixes #309
This patch allows unique constraints on compressed chunks. When
trying to INSERT into compressed chunks with unique constraints,
any potentially conflicting compressed batches will be decompressed
to let postgres do constraint checking on the INSERT.
With this patch, only INSERT ON CONFLICT DO NOTHING is supported.
For decompression, only segmentby information is considered to
determine conflicting batches. This will be enhanced in a follow-up
patch to also include orderby metadata so that fewer batches need
to be decompressed.
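A minimal sketch, assuming a hypothetical hypertable
metrics(time, device_id, value) with a unique constraint on
(time, device_id) and compress_segmentby = 'device_id':
  -- Compressed batches for device_id = 3 are decompressed so postgres
  -- can run its usual constraint check; non-conflicting rows go in.
  INSERT INTO metrics VALUES ('2023-01-01 00:00:00+00', 3, 1.0)
  ON CONFLICT DO NOTHING;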
Reindexing a relation requires AccessExclusiveLock which prevents
queries on that chunk. This patch changes decompress_chunk to update
the index during decompression instead of reindexing. This patch
does not change the required locks as there are locking adjustments
needed in other places to make it safe to weaken that lock.
This change allows the tuplesort to be overridden with an index scan
if the compression setting keys match the index keys.
The feature has an enable/disable toggle.
To disable it from the client, use the following command:
SET timescaledb.enable_compression_indexscan = 'OFF'
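As a hedged illustration (table and index names are hypothetical): the
compression keys below line up with an index on (device_id, time DESC),
so compression may use an index scan over that index instead of a
tuplesort.
  CREATE INDEX metrics_device_time_idx ON metrics (device_id, time DESC);
  ALTER TABLE metrics SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'device_id',
    timescaledb.compress_orderby = 'time DESC'
  );
  SELECT compress_chunk(c) FROM show_chunks('metrics') c;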
The Postgres source code defines the macro `OidIsValid()` to check
whether an Oid is valid (comparing against `InvalidOid`). See
`src/include/c.h` in the Postgres source tree.
Changed all direct comparisons against `InvalidOid` to use the
`OidIsValid()` call, and added a coccinelle check to make sure future
changes use it as well.
This change introduces a new option to the compression procedure which
decouples the uncompressed chunk interval from the compressed chunk
interval. It does this by allowing multiple uncompressed chunks into one
compressed chunk as part of the compression procedure. The main use-case
is to allow much smaller uncompressed chunks than compressed ones. This
has several advantages:
- Reduce the size of btrees on uncompressed data (thus allowing faster
inserts because those indexes are memory-resident).
- Decrease disk-space usage for uncompressed data.
- Reduce number of chunks over historical data.
From a UX point of view, we simply add a compression WITH-clause option
`compress_chunk_time_interval`. Users should set it according to
their needs for constraint exclusion over historical data. Ideally, it
should be a multiple of the uncompressed chunk interval, and we throw
a warning if it is not.
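A minimal usage sketch, assuming a hypothetical hypertable metrics
with a 1 hour chunk interval:
  ALTER TABLE metrics SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'device_id',
    -- roll up to 24 one-hour uncompressed chunks per compressed chunk
    timescaledb.compress_chunk_time_interval = '24 hours'
  );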
INSERT into a compressed hypertable with a number of open chunks greater
than ts_guc_max_open_chunks_per_insert causes a segmentation fault.
A new row which needs to be inserted into a compressed chunk has to be
compressed. The memory required as part of compressing a row is allocated
from the RowCompressor::per_row_ctx memory context. Once the row is
compressed, ExecInsert() is called, where memory from the same context
is allocated and freed instead of using the "Executor State" context.
This causes memory corruption.
Fixes: #4778
Depending on the statistics target, running ANALYZE on a chunk before
compression can cause a lot of random IO operations for chunks that
are bigger than the number of pages ANALYZE needs to read. By moving
that operation to after the heap is loaded into memory for sorting,
we increase the chance of hitting the cache and reduce the disk
operations necessary to execute compression jobs.
When compressing larger chunks, the compression sort tends to spill to
temporary files since the memory limit (`work_mem`) is usually
too small to fit all the data in memory. Using `maintenance_work_mem`
instead makes more sense, since it is generally safer to use a larger
value there without impacting general resource usage.
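A hedged tuning sketch: since the compression sort now uses
maintenance_work_mem, raising it for the session can keep the sort in
memory; the table name and values are illustrative.
  SET maintenance_work_mem = '1GB';
  SELECT compress_chunk(c)
  FROM show_chunks('metrics', older_than => INTERVAL '7 days') c;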
This patch fixes a deadlock between chunk decompression and SELECT
queries executed in parallel. The change in
a608d7db614c930213dee8d6a5e9d26a0259da61 requests an AccessExclusiveLock
for the decompressed chunk instead of the compressed chunk, resulting in
deadlocks.
In addition, an isolation test has been added to check that SELECT
queries on a chunk that is currently being decompressed can be executed.
Fixes #4605
The sequence number of the compressed tuple is per segmentby grouping
and should be reset when the grouping changes to prevent overflows with
many segmentby columns.
Sanity-check the compression header for a valid algorithm
before using it as an index into an array. Previously
this would result in a segfault and could happen with
corrupted compressed data.
As part of inserting into a compressed table, the tuple is
materialized, which computes the data size for the tuple using
`heap_compute_data_size`. When computing the data size of the tuple,
columns that are null are not considered and are just ignored. Columns
that are dropped are, however, not explicitly checked; instead,
`heap_compute_data_size` relies on these columns being set to null.
When reading tuples from a compressed table for insert, the null vector
is cleared, meaning that it is non-null by default. Since columns that
are dropped are not explicitly processed, they are expected to have a
defined value, which they do not have, causing a crash when an attempt
is made to dereference them.
This commit fixes this by setting the null vector to all null; the code
after it then overwrites the columns with proper null bits, except for
the dropped columns, which remain null.
Fixes #4251
This patch adds missing heap_freetuple calls in 2 locations.
The missing call in compression.c was a leak making the allocation
live for much longer than needed. This was found by coccinelle.
The Windows compiler has problems with the macros in genbki.h,
complaining about redefinition of a variable with a different
storage class. Since those specific macros are processed by a
perl script and are not relevant for the build process, we turn
them into no-ops on Windows.
Remove TTSOpsVirtualP, TTSOpsHeapTupleP, TTSOpsMinimalTupleP and
TTSOpsBufferHeapTupleP macros since they were only needed on PG11
to allow us to define compatibility macros for TupleTableSlot
operations.
Add a test case for COPY on distributed hypertables with compressed
chunks. It verifies that recompress_chunk and the compression policy
work as expected.
Additional changes include:
- Clean up commented code
- Make use of BulkInsertState optional in the row compressor
- Add a test for insert into a compressed chunk by a role other than
the owner
After inserts go into a compressed chunk, the chunk is marked as
unordered. This PR adds a new function, recompress_chunk, that
compresses the data and sets the status back to compressed. Further
optimizations for this function are planned but are not part of this PR.
This function can be invoked by calling
SELECT recompress_chunk(<chunk_name>).
The recompress_chunk function is automatically invoked by the compression
policy job when it sees that a chunk is in the unordered state.
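A minimal end-to-end sketch, assuming a hypothetical hypertable metrics
whose chunks are already compressed; the chunk name is illustrative.
  -- An insert into a compressed chunk marks it as unordered.
  INSERT INTO metrics VALUES ('2021-06-01 00:00:00+00', 3, 2.5);
  -- Recompress it explicitly (the policy job would also pick it up).
  SELECT recompress_chunk('_timescaledb_internal._hyper_1_1_chunk');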
Compressed chunks with inserts after being compressed have batches
that are not ordered according to compress_orderby. For those
chunks we cannot set pathkeys on the DecompressChunk node, and we
need an extra sort step if we require ordered output from those
chunks.
Since Oid is an unsigned int, we have to use %u to print it; otherwise
oids >= 2^31 will not work correctly. This also switches the places
that print a type oid to use format helper functions to resolve the
oids.
ALTER TABLE <hypertable> RENAME <column_name> TO <new_column_name>
is now supported for hypertables that have compression enabled.
Note: Column renaming is not supported for distributed hypertables,
so this will not work on distributed hypertables that have
compression enabled.
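A short usage sketch against a hypothetical compressed hypertable
metrics; the column names are illustrative.
  ALTER TABLE metrics RENAME COLUMN value TO temperature;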
PG13 adds a destination-length fourth argument to the pg_b64_decode and
pg_b64_encode functions, so this patch adds a macro that translates
to the 3-argument or 4-argument call depending on the postgres version.
This patch also adds checking of the return values of those functions.
https://github.com/postgres/postgres/commit/cfc40d384a
This commit modifies ANALYZE behavior as follows:
1. When an internal compression table is analyzed, statistics from the
compressed chunk (such as page count and tuple count) are used to
update the statistics of the corresponding chunk parent if they are
missing.
2. Analyze compressed chunks instead of raw chunks: when the command
ANALYZE <hypertable> is executed, a) analyze uncompressed chunks and
b) skip the raw chunk but analyze the corresponding compressed chunk.
This patch fixes a segfault in decompress_chunk for chunks with dropped
columns. Since dropped columns don't exist in the compressed chunk,
the values for those columns were undefined in the decompressed tuple,
leading to a segfault when trying to build the heap tuple.
This change captures the reltuples and relpages (and relallvisible)
statistics from the pg_class table for chunks immediately before
truncating them during the compression code path. It then restores
the values after truncating, as there is no way to keep postgresql
from clearing these values during this operation. It also properly
uses these values during planning, working around some postgresql
code which substitutes arbitrary sizes for tables which don't seem
to hold data.
Fixes #2524
This patch removes code support for PG9.6 and PG10. In addition to
removing the PG96 and PG10 macros, the following changes are done:
- remove HAVE_INT64_TIMESTAMP since this is always true on PG10+
- remove PG_VERSION_SUPPORTS_MULTINODE
Unless otherwise listed, the TODO was converted to a comment or put
into an issue tracker.
test/sql/
- triggers.sql: Made required change
tsl/test/
- CMakeLists.txt: TODO complete
- bgw_policy.sql: TODO complete
- continuous_aggs_materialize.sql: TODO complete
- compression.sql: TODO complete
- compression_algos.sql: TODO complete
tsl/src/
- compression/compression.c:
- row_compressor_decompress_row: Expected complete
- compression/dictionary.c: FIXME complete
- materialize.c: TODO complete
- reorder.c: TODO complete
- simple8b_rle.h:
- compressor_finish: Removed (obsolete)
src/
- extension.c: Removed due to age
- adts/simplehash.h: TODOs are from copied Postgres code
- adts/vec.h: TODO is non-significant
- planner.c: Removed
- process_utility.c
- process_altertable_end_subcmd: Removed (PG will handle case)