This patch drops the catalog table _timescaledb_catalog.hypertable_compression
and stores those settings in _timescaledb_catalog.compression_settings instead.
The storage format changes: the new table has one entry per relation
instead of one entry per column and has no dependency on hypertables.
All other aspects of compression remain the same. This refactoring
prepares for per-chunk compression settings in a follow-up patch.
Refactor the compression code to only use the table scan API when
scanning relations instead of a mix of the table and heap scan
APIs. The table scan API is a higher-level API and is recommended since
it works for any type of relation and uses table slots directly, which
means that in some cases a full heap tuple need not be materialized.
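A minimal sketch of the pattern (not the actual TimescaleDB code),
counting the tuples of an already-opened relation via the table scan API:

    #include "postgres.h"
    #include "access/tableam.h"
    #include "executor/tuptable.h"
    #include "utils/snapmgr.h"

    static int64
    count_tuples(Relation rel)
    {
        TupleTableSlot *slot = table_slot_create(rel, NULL);
        TableScanDesc scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
        int64 ntuples = 0;

        while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
        {
            /* work happens directly on the slot; no heap tuple is formed */
            ntuples++;
        }

        table_endscan(scan);
        ExecDropSingleTupleTableSlot(slot);

        return ntuples;
    }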
If we reuse the compressor to recompress multiple sets
of tuples, internal state left over from the previous
run can contain invalid data. Resetting the compressor's
first-iteration field between runs fixes this.
This should improve the throughput somewhat.
This commit does several things:
* Simplify the loop condition for decompressing a compressed batch by using
the count metadata column.
* Split out a separate function that decompresses the entire compressed
batch and saves the decompressed tuple slots into RowDecompressor.
* Use the bulk table insert function for inserting the decompressed rows,
which reduces WAL activity. If there are indexes on the uncompressed chunk,
update them one index at a time for the entire batch to reduce the load on
the shared buffers cache; previously we updated all indexes for one row,
then for the next, and so on (see the sketch below).
* Add a test for memory leaks during (de)compression.
* Update the compression_update_delete test to use INFO messages + a
debug GUC instead of DEBUG messages, which are flaky.
This gives a 10%-30% speedup on tsbench for decompress_chunk and various
compressed DML queries. This is very far from the performance we had in
2.10, but still a nice improvement.
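A rough sketch of the bulk-insert idea, assuming the decompressed tuples
of one batch have already been collected into an array of slots (the
function and its arguments are illustrative, not the exact TimescaleDB
code):

    #include "postgres.h"
    #include "access/heapam.h"
    #include "access/tableam.h"
    #include "access/xact.h"

    static void
    insert_batch(Relation chunk_rel, TupleTableSlot **slots, int ntuples)
    {
        BulkInsertState bistate = GetBulkInsertState();

        /* one multi-insert for the whole batch instead of one insert per row */
        table_multi_insert(chunk_rel, slots, ntuples,
                           GetCurrentCommandId(true), 0, bistate);

        FreeBulkInsertState(bistate);

        /*
         * Index maintenance would follow here: for each index on the
         * uncompressed chunk, insert the entries for the whole batch before
         * moving on to the next index.
         */
    }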
When we compress a chunk, we create a new compressed chunk for storing
the compressed data. So far, the tuples were just inserted into the
compressed chunk and frozen by a later vacuum run.
However, freezing tuples causes WAL activity that can be avoided because
the compressed chunk is created in the same transaction as the tuples.
This patch reduces the WAL activity by writing these tuples directly as
frozen, so no later freeze operation is needed. This approach is
similar to PostgreSQL's COPY FREEZE.
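A minimal sketch of the idea (variable and function names are
illustrative, not the exact TimescaleDB code):

    #include "postgres.h"
    #include "access/tableam.h"
    #include "access/xact.h"

    static void
    insert_into_new_compressed_chunk(Relation compressed_chunk_rel,
                                     TupleTableSlot *slot)
    {
        /*
         * The compressed chunk was created in this transaction, so the tuple
         * can be written as already frozen (similar to COPY ... FREEZE) and
         * no later VACUUM has to freeze it.
         */
        int options = TABLE_INSERT_FROZEN | TABLE_INSERT_SKIP_FSM;

        table_tuple_insert(compressed_chunk_rel, slot,
                           GetCurrentCommandId(true), options, NULL);
    }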
Since version 2.11.0 we would get a segmentation fault during
decompression when there was an expression or partial index on the
uncompressed chunk.
This patch fixes this by calling ExecInsertIndexTuples to insert into
indexes during chunk decompression, instead of CatalogIndexInsert.
In addition, when enabling compression on a hypertable, we check the
unique indexes defined on it to provide performance improvement hints
in case the unique index columns are not specified as compression
parameters.
However, this check threw an error when expression columns were present
in the index, preventing the user from enabling compression.
This patch fixes this by simply ignoring the expression columns in the
index, since we cannot currently segment by an expression.
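A sketch of the skip, assuming `indexinfo` describes a unique index on
the hypertable (the surrounding hint logic is omitted and not the exact
TimescaleDB code):

    #include "postgres.h"
    #include "nodes/execnodes.h"

    static void
    check_unique_index_columns(IndexInfo *indexinfo)
    {
        for (int i = 0; i < indexinfo->ii_NumIndexKeyAttrs; i++)
        {
            AttrNumber attno = indexinfo->ii_IndexAttrNumbers[i];

            /*
             * Expression columns appear with attribute number 0 in the index
             * key; we cannot segment by an expression, so skip them instead
             * of raising an error.
             */
            if (attno == 0)
                continue;

            /* ... compare attno against the configured segmentby columns ... */
        }
    }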
Fixes #6205, #6186
`_timescaledb_internal.create_compressed_chunk` can be used to create a
compressed chunk with existing compressed data. It did not account for
the fact that the chunk can contain uncompressed data, in which case the
chunk status must be set to partial.
Fixes #5946
This PR adds several GUCs that allow enabling/disabling major
timescaledb features:
- enable_hypertable_create
- enable_hypertable_compression
- enable_cagg_create
- enable_policy_create
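For illustration, a boolean GUC like this is typically registered from
the extension's `_PG_init` via `DefineCustomBoolVariable`; the variable
name and defaults below are assumptions, not the exact TimescaleDB code:

    #include "postgres.h"
    #include "utils/guc.h"

    static bool enable_hypertable_create = true;

    static void
    register_feature_gucs(void)
    {
        DefineCustomBoolVariable("timescaledb.enable_hypertable_create",
                                 "Enable hypertable creation",
                                 NULL,
                                 &enable_hypertable_create,
                                 true,        /* enabled by default */
                                 PGC_USERSET,
                                 0,
                                 NULL, NULL, NULL);
    }

With a user-settable context as sketched here, a feature could then be
turned off for a session with a plain SET on the corresponding GUC.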
If there are any indexes on the compressed chunk, insert into them while
inserting the heap data rather than reindexing the relation at the
end. This reduces the amount of locking on the compressed chunk
indexes, which created issues when merging chunks, and should help
with future updates of compressed data.
During chunk creation, the chunk's dimensional CHECK constraints are
created via an "upcall" to PL/pgSQL code. However, creating
dimensional constraints in PL/pgSQL code sometimes fails, especially
during high-concurrency inserts, because PL/pgSQL code scans metadata
using a snapshot that might not see the same metadata as the C
code. As a result, chunk creation sometimes fails during constraint
creation.
To fix this issue, implement dimensional CHECK-constraint creation in
C code. Other constraints (FK, PK, etc.) are still created via an
upcall, but should probably also be rewritten in C. However, since
these constraints don't depend on recently updated metadata, this is
left to a future change.
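A rough sketch of what CHECK-constraint creation from C can look like;
the real code differs in detail, and the range expression is assumed to
be built elsewhere:

    #include "postgres.h"
    #include "catalog/heap.h"
    #include "nodes/makefuncs.h"
    #include "nodes/parsenodes.h"

    static void
    create_dimension_check_constraint(Relation chunk_rel, const char *conname,
                                      Node *range_expr)
    {
        Constraint *constr = makeNode(Constraint);

        constr->contype = CONSTR_CHECK;
        constr->conname = pstrdup(conname);
        constr->raw_expr = range_expr;   /* e.g. "time >= x AND time < y" */
        constr->initially_valid = true;

        /* add the CHECK constraint directly, without a PL/pgSQL upcall */
        AddRelationNewConstraints(chunk_rel, NIL, list_make1(constr),
                                  false,   /* allow_merge */
                                  true,    /* is_local */
                                  true,    /* is_internal */
                                  NULL);   /* queryString */
    }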
Fixes #5456
This patch introduces a C function that performs the recompression at
a finer granularity instead of decompressing and subsequently
compressing the entire chunk.
This improves performance for the following reasons:
- it needs to sort less data at a time, and
- it avoids recreating the decompressed chunk and the associated heap
inserts by decompressing each segment into a tuplesort instead (see
the sketch below).
If no segmentby is specified when enabling compression, or if an
index does not exist on the compressed chunk, the operation is
performed as before: decompressing and subsequently compressing the
entire chunk.
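A conceptual sketch of the segment-wise flow; the segment helpers are
hypothetical, and only the tuplesort calls are real PostgreSQL APIs:

    #include "postgres.h"
    #include "nodes/pg_list.h"
    #include "utils/tuplesort.h"

    /* hypothetical helpers standing in for the real (de)compression code */
    static void decompress_segment_into_tuplesort(void *segment,
                                                  Tuplesortstate *sort);
    static void compress_segment_from_tuplesort(void *segment,
                                                Tuplesortstate *sort);

    static void
    recompress_segments(List *segments, Tuplesortstate *sort)
    {
        ListCell *lc;

        foreach(lc, segments)
        {
            void *segment = lfirst(lc);

            /* sort only this segment's rows, not the whole chunk */
            decompress_segment_into_tuplesort(segment, sort);
            tuplesort_performsort(sort);

            /* write the sorted rows back in compressed form */
            compress_segment_from_tuplesort(segment, sort);

            /* reuse the sort state for the next segment */
            tuplesort_reset(sort);
        }
    }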
During compression, autovacuum used to be disabled for the uncompressed
chunk and re-enabled after decompression. This leads to PostgreSQL
maintenance issues. Do not disable autovacuum for the uncompressed chunk
anymore; let PostgreSQL take care of the statistics in its natural way.
Fixes #309
The Postgres source code defines the macro `OidIsValid()` to check whether
an Oid is valid (i.e., different from `InvalidOid`). See `src/include/c.h`
in the Postgres source tree.
Changed all direct comparisons against `InvalidOid` to use the
`OidIsValid()` macro and added a coccinelle check to make sure future
changes use it correctly.
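For example, with a hypothetical `relid`:

    /* before: direct comparison against InvalidOid */
    if (relid != InvalidOid)
        elog(DEBUG1, "relation %u exists", relid);

    /* after: use the macro from src/include/c.h */
    if (OidIsValid(relid))
        elog(DEBUG1, "relation %u exists", relid);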
This change introduces a new option to the compression procedure which
decouples the uncompressed chunk interval from the compressed chunk
interval. It does this by allowing multiple uncompressed chunks into one
compressed chunk as part of the compression procedure. The main use-case
is to allow much smaller uncompressed chunks than compressed ones. This
has several advantages:
- Reduce the size of btrees on uncompressed data (thus allowing faster
inserts because those indexes are memory-resident).
- Decrease disk-space usage for uncompressed data.
- Reduce number of chunks over historical data.
From a UX point of view, we simply add a compression WITH-clause option
`compress_chunk_time_interval`. The user should set it according to
their needs for constraint exclusion over historical data. Ideally, it
should be a multiple of the uncompressed chunk interval, and we throw
a warning if it is not.
Depending on the statistics target, running ANALYZE on a chunk before
compression can cause a lot of random IO operations for chunks that
have more pages than ANALYZE needs to read. By moving that operation
to after the heap is loaded into memory for sorting, we increase the
chance of hitting the cache and reduce the disk operations necessary
to execute compression jobs.
This patch fixes a deadlock between chunk decompression and SELECT
queries executed in parallel. The change in
a608d7db614c930213dee8d6a5e9d26a0259da61 requests an AccessExclusiveLock
for the decompressed chunk instead of the compressed chunk, resulting in
deadlocks.
In addition, an isolation test has been added to check that SELECT
queries on a chunk that is currently being decompressed can be executed.
Fixes #4605
This patch introduces a further check to compress_chunk_impl and
decompress_chunk_impl. After all locks are acquired, a check is made
to see if the chunk is still (un-)compressed. If the chunk was
(de-)compressed while waiting for the locks, the (de-)compression
operation is stopped.
In addition, the chunk locks in decompress_chunk_impl
are upgraded to AccessExclusiveLock to ensure the chunk is not deleted
while other transactions are using it.
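Conceptually, the added check looks like this (the status helper is
hypothetical, not the exact TimescaleDB code):

    #include "postgres.h"
    #include "storage/lmgr.h"

    /* hypothetical helper that re-reads the chunk's compression status */
    static bool chunk_is_already_decompressed(Oid chunk_relid);

    static void
    decompress_chunk_guarded(Oid chunk_relid)
    {
        /* take the stronger lock first ... */
        LockRelationOid(chunk_relid, AccessExclusiveLock);

        /*
         * ... then re-check the chunk status: another transaction may have
         * decompressed the chunk while we were waiting for the lock.
         */
        if (chunk_is_already_decompressed(chunk_relid))
        {
            ereport(NOTICE,
                    (errmsg("chunk is already decompressed, skipping")));
            return;
        }

        /* ... proceed with the actual decompression */
    }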
Fixes: #4480
A chunk in the frozen state cannot be dropped.
drop_chunks will skip over frozen chunks without erroring.
The internal API drop_chunk will error if you attempt to drop a
frozen chunk without unfreezing it first.
This PR also adds a new internal API to unfreeze a chunk.
The compression code had two files using the same header guard. This
patch renames the file with floating point helper functions to
float_utils.h and renames the other file to compression/api since
that more clearly reflects the purpose of the functions.