24 Commits

Author SHA1 Message Date
Sven Klemm
83cf5605a5 Remove multinode functionality from compression 2023-12-13 23:38:32 +01:00
Sven Klemm
36c36564a8 Refactor compression setting storage
This patch drops the catalog table _timescaledb_catalog.hypertable_compression
and stores those settings in _timescaledb_catalog.compression_settings instead.
The storage format is changed and the new table will have 1 entry per relation
instead of 1 entry per column and has no dependency on hypertables.
All other aspects of compression will remain the same. This refactoring is
to enable per-chunk compression settings in a follow-up patch.
2023-12-12 21:45:33 +01:00
Erik Nordström
e8b81c2ebe Use table scan API in compression code
Refactor the compression code to only use the table scan API when
scanning relations instead of a mix of the table and heap scan
APIs. The table scan API is a higher-level API and recommended as it
works for any type of relation and uses table slots directly, which
means that in some cases a full heap tuple need not be materialized.
2023-12-12 14:40:52 +01:00
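
A minimal sketch of the table scan API pattern described above (illustrative only, not TimescaleDB's actual code):

    #include "postgres.h"
    #include "access/tableam.h"
    #include "executor/tuptable.h"
    #include "utils/snapmgr.h"

    /* Scan any relation through the table AM layer; a full heap tuple is only
     * materialized if the slot's contents are actually requested as one. */
    static void
    scan_relation_with_tableam(Relation rel)
    {
        TupleTableSlot *slot = table_slot_create(rel, NULL);
        TableScanDesc scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);

        while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
        {
            /* process the tuple via the slot, e.g. slot_getallattrs(slot) */
        }

        table_endscan(scan);
        ExecDropSingleTupleTableSlot(slot);
    }
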
Fabrízio de Royes Mello
48c9edda9c Use NameStr() macro for NameData struct usage 2023-12-09 16:23:01 -03:00
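
A tiny illustration of the convention (not from the patch itself):

    #include "postgres.h"
    #include "access/tupdesc.h"

    /* Illustrative only: return a column's name using NameStr() rather than
     * reading the NameData struct's .data field directly. */
    static const char *
    column_name(TupleDesc tupdesc, int attno)
    {
        Form_pg_attribute attr = TupleDescAttr(tupdesc, attno);

        return NameStr(attr->attname);
    }
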
Ante Kresic
645727bfe1 Optimize segmentwise recompression
Instead of recompressing all the segments, try to find segments
which have uncompressed tuples and only recompress those segments.
2023-11-28 10:51:28 +01:00
Ante Kresic
5f421eca9e Reset compressor between recompression runs
If we reuse the compressor to recompress multiple sets
of tuples, internal state gets left behind from the previous
run which can contain invalid data. Resetting the compressor's
first-iteration field between runs fixes this.
2023-11-24 11:40:49 +01:00
Alexander Kuzmenkov
e5840e3d45 Use bulk heap insert API for decompression
This should improve the throughput somewhat.

This commit does several things:

* Simplify loop condition in decompressing the compressed batch by using
  the count metadata column.
* Split out a separate function that decompresses the entire compressed
  batch and saves the decompressed tuples slot into RowDecompressor.
* Use the bulk table insert function for inserting the decompressed rows;
  this reduces WAL activity. If there are indexes on the uncompressed chunk,
  update them one index at a time for the entire batch, to reduce load on
  the shared buffers cache. Previously we updated all indexes for one row,
  then for the next, and so on.
* Add a test for memory leaks during (de)compression.
* Update the compression_update_delete test to use INFO messages + a
  debug GUC instead of DEBUG messages which are flaky.

This gives 10%-30% speedup on tsbench for decompress_chunk and various
compressed DML queries. This is very far from the performance we had in
2.10, but still a nice improvement.
2023-11-16 14:57:06 +01:00
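
A hedged sketch of the bulk insert pattern; the PostgreSQL table AM calls are real, but the surrounding function is illustrative rather than RowDecompressor itself:

    #include "postgres.h"
    #include "access/heapam.h"      /* GetBulkInsertState, FreeBulkInsertState */
    #include "access/tableam.h"
    #include "access/xact.h"

    /* Insert a whole decompressed batch with one multi-insert call instead of
     * one table_tuple_insert() per row, which reduces WAL volume. */
    static void
    insert_decompressed_batch(Relation uncompressed_chunk,
                              TupleTableSlot **slots, int nslots)
    {
        BulkInsertState bistate = GetBulkInsertState();

        table_multi_insert(uncompressed_chunk, slots, nslots,
                           GetCurrentCommandId(true), 0 /* options */, bistate);

        FreeBulkInsertState(bistate);
    }
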
Jan Nidzwetzki
8767de658b Reduce WAL activity by freezing tuples immediately
When we compress a chunk, we create a new compressed chunk for storing
the compressed data. So far, the tuples were just inserted into the
compressed chunk and frozen by a later vacuum run.

However, freezing tuples causes WAL activity that can be avoided, because
the compressed chunk is created in the same transaction as the tuples.
This patch reduces the WAL activity by storing these tuples directly as
frozen, preventing a separate freeze operation later. This approach is
similar to PostgreSQL's COPY FREEZE.
2023-10-25 13:27:07 +02:00
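
A minimal sketch of that idea, assuming the target relation was created in the current transaction; TABLE_INSERT_FROZEN is the PostgreSQL table AM flag, while the wrapper function here is illustrative:

    #include "postgres.h"
    #include "access/tableam.h"
    #include "access/xact.h"

    /* Write a tuple into a relation created in this transaction as already
     * frozen, so no later VACUUM has to freeze it (same idea as COPY FREEZE). */
    static void
    insert_frozen(Relation compressed_chunk, TupleTableSlot *slot)
    {
        table_tuple_insert(compressed_chunk, slot,
                           GetCurrentCommandId(true),
                           TABLE_INSERT_FROZEN,
                           NULL /* no BulkInsertState */);
    }
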
Konstantina Skovola
7b7722e241 Fix index inserts during decompression
Since version 2.11.0 we would get a segmentation fault during
decompression when there was an expressional or partial index on the
uncompressed chunk.
This patch fixes this by calling ExecInsertIndexTuples to insert into
indexes during chunk decompression, instead of CatalogIndexInsert.

In addition, when enabling compression on a hypertable, we check the
unique indexes defined on it to provide performance improvement hints
in case the unique index columns are not specified as compression
parameters.
However this check threw an error when expression columns were present
in the index, preventing the user from enabling compression.
This patch fixes this by simply ignoring the expression columns in the
index, since we cannot currently segment by an expression.

Fixes #6205, #6186
2023-10-24 16:51:57 +03:00
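
A hedged sketch of inserting into indexes via the executor machinery; this uses the PostgreSQL 14/15 signature of ExecInsertIndexTuples (later versions add an extra parameter), and the surrounding setup is simplified:

    #include "postgres.h"
    #include "executor/executor.h"
    #include "nodes/pg_list.h"

    /* Insert the tuple in 'slot' into all indexes of the result relation.
     * Unlike CatalogIndexInsert(), this evaluates index expressions and
     * partial-index predicates, so expressional and partial indexes work.
     * Assumes slot->tts_tid already points at the inserted heap tuple. */
    static void
    insert_into_indexes(ResultRelInfo *rri, TupleTableSlot *slot, EState *estate)
    {
        List *recheck;

        ExecOpenIndices(rri, false);
        recheck = ExecInsertIndexTuples(rri, slot, estate,
                                        false,  /* update */
                                        false,  /* noDupErr */
                                        NULL,   /* specConflict */
                                        NIL);   /* arbiterIndexes */
        list_free(recheck);
        ExecCloseIndices(rri);
    }
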
James Guthrie
01e480d5d6 Account for uncompressed rows in 'create_compressed_chunk'
`_timescaledb_internal.create_compressed_chunk` can be used to create a
compressed chunk with existing compressed data. It did not account for
the fact that the chunk can contain uncompressed data, in which case the
chunk status must be set to partial.

Fixes #5946
2023-08-28 12:53:14 +02:00
Dmitry Simonenko
7aeed663b9 Feature flags for TimescaleDB features
This PR adds several GUCs which allow enabling/disabling major
timescaledb features:

- enable_hypertable_create
- enable_hypertable_compression
- enable_cagg_create
- enable_policy_create
2023-08-07 10:11:01 +03:00
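
A hedged sketch of how such a boolean feature flag can be registered as a GUC in an extension; the variable name, description, and GUC context below are assumptions, only the flag name follows the list above:

    #include "postgres.h"
    #include "utils/guc.h"

    static bool ts_enable_hypertable_create = true;    /* illustrative flag */

    /* Register one of the feature flags listed above as a boolean GUC. */
    static void
    register_feature_flag(void)
    {
        DefineCustomBoolVariable("timescaledb.enable_hypertable_create",
                                 "Enable hypertable creation",
                                 NULL,                          /* long description */
                                 &ts_enable_hypertable_create,
                                 true,                          /* default: enabled */
                                 PGC_USERSET,
                                 0,                             /* flags */
                                 NULL, NULL, NULL);             /* hooks */
    }
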
Ante Kresic
fb0df1ae4e Insert into indexes during chunk compression
If there are any indexes on the compressed chunk, insert into them while
inserting the heap data rather than reindexing the relation at the
end. This reduces the amount of locking on the compressed chunk
indexes which created issues when merging chunks and should help
with the future updates of compressed data.
2023-06-26 09:37:12 +02:00
Zoltan Haindrich
b2132f00a7 Make validate_chunk_status accept Chunk as argument
This makes the calls to this method more straightforward and could
help to do better checks inside the method.
2023-06-22 11:47:31 +02:00
Konstantina Skovola
df70f3e050 Remove unused variable in tsl_get_compressed_chunk_index_for_recompression
Commit 72c0f5b25e569015aacb98cc1be3169a1720116d introduced
an unused variable. This patch removes it.
2023-04-06 10:58:57 +03:00
Erik Nordström
a51d21efbe Fix issue creating dimensional constraints
During chunk creation, the chunk's dimensional CHECK constraints are
created via an "upcall" to PL/pgSQL code. However, creating
dimensional constraints in PL/pgSQL code sometimes fails, especially
during high-concurrency inserts, because PL/pgSQL code scans metadata
using a snapshot that might not see the same metadata as the C
code. As a result, chunk creation sometimes fails during constraint
creation.

To fix this issue, implement dimensional CHECK-constraint creation in
C code. Other constraints (FK, PK, etc.) are still created via an
upcall, but should probably also be rewritten in C. However, since
these constraints don't depend on recently updated metadata, this is
left to a future change.

Fixes #5456
2023-03-24 10:55:08 +01:00
Konstantina Skovola
72c0f5b25e Rewrite recompress_chunk in C for segmentwise processing
This patch introduces a C-function to perform the recompression at
a finer granularity instead of decompressing and subsequently
compressing the entire chunk.

This improves performance for the following reasons:
- it needs to sort less data at a time and
- it avoids recreating the decompressed chunk and the heap
inserts associated with that by decompressing each segment
into a tuplesort instead.

If no segmentby is specified when enabling compression or if an
index does not exist on the compressed chunk then the operation is
performed as before, decompressing and subsequently
compressing the entire chunk.
2023-03-23 11:39:43 +02:00
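
A hedged sketch of the tuplesort pattern; this uses the PostgreSQL 14 signature of tuplesort_begin_heap (later versions take a flags argument instead of the final boolean), and the sort keys and surrounding plumbing are illustrative:

    #include "postgres.h"
    #include "miscadmin.h"          /* work_mem */
    #include "utils/tuplesort.h"

    /* Feed one segment's decompressed rows into a tuplesort, sort them, and
     * hand the sorted slots back, instead of materializing a full
     * decompressed chunk and re-sorting all of it. */
    static void
    sort_one_segment(TupleDesc tupdesc, int nkeys, AttrNumber *sortkeys,
                     Oid *sortops, Oid *collations, bool *nulls_first,
                     TupleTableSlot **rows, int nrows, TupleTableSlot *out)
    {
        Tuplesortstate *sort;
        int i;

        sort = tuplesort_begin_heap(tupdesc, nkeys, sortkeys, sortops,
                                    collations, nulls_first,
                                    work_mem, NULL, false);

        for (i = 0; i < nrows; i++)
            tuplesort_puttupleslot(sort, rows[i]);

        tuplesort_performsort(sort);

        while (tuplesort_gettupleslot(sort, true, false, out, NULL))
        {
            /* pass 'out' to the compressor for this segment */
        }

        tuplesort_end(sort);
    }
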
shhnwz
699fcf48aa Stats improvement for Uncompressed Chunks
During compression, autovacuum used to be disabled for the uncompressed
chunk and re-enabled after decompression. This leads to Postgres
maintenance issues. Let's not disable autovacuum for the uncompressed
chunk anymore and let Postgres take care of the stats in its natural way.

Fixes #309
2023-03-22 23:51:13 +05:30
Fabrízio de Royes Mello
f1535660b0 Honor usage of OidIsValid() macro
The Postgres source code defines the macro `OidIsValid()` to check whether an
Oid is valid (i.e., not equal to `InvalidOid`). See
`src/include/c.h` in the Postgres source tree.

Changed all direct comparisons against `InvalidOid` to use the `OidIsValid()`
call and added a coccinelle check to make sure future changes will use
it correctly.
2022-11-03 16:10:50 -03:00
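
A tiny illustration of the convention; the schema lookup and name are illustrative:

    #include "postgres.h"
    #include "catalog/namespace.h"

    static void
    check_schema_exists(void)
    {
        Oid nspid = get_namespace_oid("myschema", true /* missing_ok */);

        /* Preferred: the OidIsValid() macro ... */
        if (!OidIsValid(nspid))
            elog(ERROR, "schema \"myschema\" does not exist");
        /* ... instead of comparing nspid == InvalidOid directly. */
    }
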
Ante Kresic
2475c1b92f Roll up uncompressed chunks into compressed ones
This change introduces a new option to the compression procedure which
decouples the uncompressed chunk interval from the compressed chunk
interval. It does this by allowing multiple uncompressed chunks into one
compressed chunk as part of the compression procedure. The main use-case
is to allow much smaller uncompressed chunks than compressed ones. This
has several advantages:
- Reduce the size of btrees on uncompressed data (thus allowing faster
inserts because those indexes are memory-resident).
- Decrease disk-space usage for uncompressed data.
- Reduce number of chunks over historical data.

From a UX point of view, we simply add a compression WITH clause option
`compress_chunk_time_interval`. The user should set that according to
their needs for constraint exclusion over historical data. Ideally, it
should be a multiple of the uncompressed chunk interval and so we throw
a warning if it is not.
2022-11-02 15:14:18 +01:00
Ante Kresic
cc110a33a2 Move ANALYZE after heap scan during compression
Depending on the statistics target, running ANALYZE on a chunk before
compression can cause a lot of random IO operations for chunks that
are bigger than the number of pages ANALYZE needs to read. By moving
that operation after the heap is loaded into memory for sorting,
we increase the chance of hitting cache and reducing disk operations
necessary to execute compression jobs.
2022-09-28 14:40:52 +02:00
Jan Nidzwetzki
de30d190e4 Fix a deadlock in chunk decompression and SELECTs
This patch fixes a deadlock between chunk decompression and SELECT
queries executed in parallel. The change in
a608d7db614c930213dee8d6a5e9d26a0259da61 requests an AccessExclusiveLock
for the decompressed chunk instead of the compressed chunk, resulting in
deadlocks.

In addition, an isolation test has been added to test that SELECT
queries on a chunk that is currently being decompressed can be executed.

Fixes #4605
2022-09-22 14:37:14 +02:00
Jan Nidzwetzki
a608d7db61 Fix race conditions during chunk (de)compression
This patch introduces a further check to compress_chunk_impl and
decompress_chunk_impl. After all locks are acquired, a check is made
to see if the chunk is still (un-)compressed. If the chunk was
(de-)compressed while waiting for the locks, the (de-)compression
operation is stopped.

In addition, the chunk locks in decompress_chunk_impl
are upgraded to AccessExclusiveLock to ensure the chunk is not deleted
while other transactions are using it.

Fixes: #4480
2022-07-05 15:13:10 +02:00
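
A hedged sketch of the check-after-locking pattern described above; chunk_is_compressed() is a hypothetical helper standing in for the actual catalog lookup:

    #include "postgres.h"
    #include "storage/lmgr.h"
    #include "storage/lockdefs.h"

    /* Returns false if another backend compressed the chunk while we were
     * waiting for the lock, in which case the caller should stop. */
    static bool
    lock_chunk_and_recheck(Oid chunk_relid)
    {
        /* Take the strongest lock first, so the chunk cannot be dropped or
         * modified underneath us ... */
        LockRelationOid(chunk_relid, AccessExclusiveLock);

        /* ... then re-read the chunk's state from the catalog; it may have
         * changed while we slept on the lock. */
        if (chunk_is_compressed(chunk_relid))   /* hypothetical helper */
            return false;

        return true;
    }
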
gayyappan
6c20e74674 Block drop chunk if chunk is in frozen state
A chunk in frozen state cannot be dropped.
drop_chunks will skip over frozen chunks without erroring.
The internal API drop_chunk will error if you attempt to drop
a chunk without unfreezing it first.

This PR also adds a new internal API to unfreeze a chunk.
2022-06-30 09:56:50 -04:00
Sven Klemm
048d86e7e7 Fix duplicate header guard
The compression code had 2 files using the same header guard. This
patch renames the file with floating point helper functions to
float_utils.h and renames the other file to compression/api since
that more clearly reflects the purpose of the functions.
2022-06-20 09:03:02 +02:00