76 Commits

Author SHA1 Message Date
Ante Kresic
583c36e91e Refactor compression code to reduce duplication 2023-04-20 22:27:34 +02:00
Ante Kresic
a49fdbcffb Reduce decompression during constraint checking
When inserting into a compressed chunk with constraints present,
we need to decompress relevant tuples in order to do speculative
insertion. Previously we used segmentby column values to limit the
number of compressed segments to decompress. This change expands
on that by also using segment metadata to further filter the
compressed rows that need to be decompressed.
2023-04-20 12:17:12 +02:00
Ante Kresic
84b6783a19 Fix chunk status when inserting into chunks
While executing compression operations in parallel with
inserting into chunks (both operations which can potentially
change the chunk status), we could get into situations where
the chunk status would end up inconsistent. This change re-reads
the chunk status after locking the chunk to make sure data is
decompressed correctly when handling ON CONFLICT inserts.
2023-04-12 10:50:44 +02:00
Bharathy
1fb058b199 Support UPDATE/DELETE on compressed hypertables.
This patch does the following:

1. Executor changes to parse the qual ExprState to check if a SEGMENTBY
   column is specified in the WHERE clause.
2. Based on step 1, we build scan keys.
3. Executor changes to do a heap scan on the compressed chunk based on
   the scan keys and move only those rows which match the WHERE clause
   to the staging area, aka the uncompressed chunk.
4. Mark the affected chunk as partially compressed.
5. Perform regular UPDATE/DELETE operations on the staging area.
6. Since there is no Custom Scan (HypertableModify) node for
   UPDATE/DELETE operations on PG versions < 14, we don't support this
   feature on PG12 and PG13.
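
For illustration, a hedged usage sketch of what this enables; the hypertable
and column names (metrics, device_id, reading) are hypothetical, not from
this patch:
    -- 'metrics' is assumed to be a hypertable compressed with
    -- timescaledb.compress_segmentby = 'device_id'
    -- the SEGMENTBY column in the WHERE clause lets the executor build scan
    -- keys, decompress only the matching rows into the staging area, and run
    -- the regular UPDATE/DELETE there
    UPDATE metrics SET reading = 0 WHERE device_id = 17;
    DELETE FROM metrics WHERE device_id = 42;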
2023-04-05 17:19:45 +05:30
Konstantina Skovola
72c0f5b25e Rewrite recompress_chunk in C for segmentwise processing
This patch introduces a C function to perform the recompression at
a finer granularity instead of decompressing and subsequently
compressing the entire chunk.

This improves performance for the following reasons:
- it needs to sort less data at a time and
- it avoids recreating the decompressed chunk and the heap
inserts associated with that by decompressing each segment
into a tuplesort instead.

If no segmentby is specified when enabling compression or if an
index does not exist on the compressed chunk then the operation is
performed as before, decompressing and subsequently
compressing the entire chunk.
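
A hedged sketch of when the segmentwise path is taken; the table, column,
and chunk names below are hypothetical, not from this patch:
    -- a segmentby column (and an index on the compressed chunk) is required
    -- for segmentwise recompression; otherwise the whole chunk is
    -- decompressed and recompressed as before
    ALTER TABLE metrics SET (timescaledb.compress,
                             timescaledb.compress_segmentby = 'device_id');
    SELECT compress_chunk(c) FROM show_chunks('metrics') c;
    -- ... later out-of-order inserts mark the chunk unordered ...
    SELECT recompress_chunk('_timescaledb_internal._hyper_1_1_chunk');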
2023-03-23 11:39:43 +02:00
shhnwz
699fcf48aa Stats improvement for Uncompressed Chunks
During compression, autovacuum used to be disabled for the uncompressed
chunk and re-enabled after decompression. This led to postgres
maintenance issues. Let's not disable autovacuum for the uncompressed
chunk anymore and let postgres take care of the stats in its natural way.

Fixes #309
2023-03-22 23:51:13 +05:30
Zoltan Haindrich
790b322b24 Fix DEFAULT value handling in decompress_chunk
The SQL function decompress_chunk did not fill in
default values during its operation.

Fixes #5412
2023-03-16 09:16:50 +01:00
Sven Klemm
65562f02e8 Support unique constraints on compressed chunks
This patch allows unique constraints on compressed chunks. When
trying to INSERT into compressed chunks with unique constraints,
any potentially conflicting compressed batches will be decompressed
to let postgres do constraint checking on the INSERT.
With this patch, only INSERT ON CONFLICT DO NOTHING is supported.
For decompression, only segmentby information is considered to
determine conflicting batches. This will be enhanced in a follow-up
patch to also include orderby metadata so that fewer batches need
to be decompressed.
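
A hedged sketch of the supported path; table and column names are
hypothetical, not from this patch:
    -- unique index on the hypertable; ts is the partitioning column
    CREATE UNIQUE INDEX ON metrics (ts, device_id);
    SELECT compress_chunk(c) FROM show_chunks('metrics') c;
    -- conflicting compressed batches (narrowed down by segmentby values) are
    -- decompressed so postgres can check the constraint; only DO NOTHING is
    -- supported by this patch
    INSERT INTO metrics (ts, device_id, reading)
    VALUES ('2023-03-01 00:00', 1, 0.5)
    ON CONFLICT DO NOTHING;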
2023-03-13 12:04:38 +01:00
Sven Klemm
c02cb76b38 Don't reindex relation during decompress_chunk
Reindexing a relation requires AccessExclusiveLock which prevents
queries on that chunk. This patch changes decompress_chunk to update
the index during decompression instead of reindexing. This patch
does not change the required locks as there are locking adjustments
needed in other places to make it safe to weaken that lock.
2023-03-13 10:58:26 +01:00
Sven Klemm
8132908c97 Refactor chunk decompression functions
Restructure the code inside decompress_chunk slightly to make the core
loop reusable by other functions.
2023-02-06 14:52:06 +01:00
Sven Klemm
b229b3aefd Small decompress_chunk refactor
Refactor the decompression code to move the decompressor
initialization into a separate function.
2023-01-30 16:47:16 +01:00
Sven Klemm
dbe89644b5 Remove no longer used compression code
The recent refactoring of INSERT into compressed chunks made this
code obsolete, but it was not removed in that patch.
2023-01-16 14:18:56 +01:00
shhnwz
601b37daa8 Index support for compress chunk
It allows tuplesort to be overridden with an index scan
if the compression setting keys match the index keys.
Moreover, this feature has an enable/disable toggle.
To disable it from the client, use the following command:
SET timescaledb.enable_compression_indexscan = 'OFF'
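
For context, a hedged sketch of the toggle in use; the hypertable name is
hypothetical:
    -- disable the index scan path so compression falls back to tuplesort
    SET timescaledb.enable_compression_indexscan = 'OFF';
    SELECT compress_chunk(c) FROM show_chunks('metrics') c;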
2022-12-15 20:26:00 +05:30
Ante Kresic
cbf51803dd Fix index att number calculation
The attribute offset was used by mistake where the attribute number was
needed, causing wrong values to be fetched when scanning the
compressed chunk index.
2022-12-15 11:23:10 +01:00
Matvey Arye
df16815009 Fix memory leak for compression with merge chunks
The RelationInitIndexAccessInfo call leaks cache memory and
seems to be unnecessary.
2022-12-13 08:22:49 +01:00
Alexander Kuzmenkov
1b65297ff7 Fix memory leak with INSERT into compressed hypertable
We used to allocate some temporary data in the ExecutorContext.
2022-11-16 13:58:52 +04:00
Fabrízio de Royes Mello
f1535660b0 Honor usage of OidIsValid() macro
Postgres source code defines the macro `OidIsValid()` to check whether an Oid
is valid or not (comparing against `InvalidOid`). See
`src/include/c.h` in the Postgres source tree.

Changed all direct comparisons against `InvalidOid` to the `OidIsValid()`
call and added a coccinelle check to make sure future changes will use
it correctly.
2022-11-03 16:10:50 -03:00
Ante Kresic
2475c1b92f Roll up uncompressed chunks into compressed ones
This change introduces a new option to the compression procedure which
decouples the uncompressed chunk interval from the compressed chunk
interval. It does this by allowing multiple uncompressed chunks into one
compressed chunk as part of the compression procedure. The main use-case
is to allow much smaller uncompressed chunks than compressed ones. This
has several advantages:
- Reduce the size of btrees on uncompressed data (thus allowing faster
inserts because those indexes are memory-resident).
- Decrease disk-space usage for uncompressed data.
- Reduce number of chunks over historical data.

From a UX point of view, we simply add a compression WITH clause option
`compress_chunk_time_interval`. The user should set that according to
their needs for constraint exclusion over historical data. Ideally, it
should be a multiple of the uncompressed chunk interval, and so we throw
a warning if it is not.
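
A hedged sketch of the option; the table name and intervals are illustrative
only:
    -- 1-day uncompressed chunks are rolled up into 7-day compressed chunks;
    -- the interval should ideally be a multiple of the chunk interval,
    -- otherwise a warning is raised
    SELECT create_hypertable('metrics', 'ts',
                             chunk_time_interval => INTERVAL '1 day');
    ALTER TABLE metrics SET (timescaledb.compress,
                             timescaledb.compress_chunk_time_interval = '7 days');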
2022-11-02 15:14:18 +01:00
Alexander Kuzmenkov
313845a882 Enable -Wextra
Our code mostly has warnings about comparisons between values of different
signedness.
2022-10-27 16:06:58 +04:00
Alexander Kuzmenkov
f862212c8c Add clang-tidy warning readability-inconsistent-declaration-parameter-name
Mostly cosmetic stuff. Matched to definition automatically with
--fix-notes.
2022-10-20 19:42:11 +04:00
Bharathy
38878bee16 Fix segmentation fault during INSERT into compressed hypertable.
INSERT into a compressed hypertable with the number of open chunks greater
than ts_guc_max_open_chunks_per_insert causes a segmentation fault.
A new row which needs to be inserted into a compressed chunk has to be
compressed. The memory required as part of compressing a row is allocated
from the RowCompressor::per_row_ctx memory context. Once the row is compressed,
ExecInsert() is called, where memory from the same context is used to
allocate and free it instead of using the "Executor State" context. This
causes memory corruption.

Fixes: #4778
2022-10-13 20:48:23 +05:30
Ante Kresic
cc110a33a2 Move ANALYZE after heap scan during compression
Depending on the statistics target, running ANALYZE on a chunk before
compression can cause a lot of random IO operations for chunks that
are bigger than the number of pages ANALYZE needs to read. By moving
that operation after the heap is loaded into memory for sorting,
we increase the chance of hitting the cache and reduce the disk operations
necessary to execute compression jobs.
2022-09-28 14:40:52 +02:00
Ante Kresic
9c819882f3 Increase memory usage for compression jobs
When compressing larger chunks, the compression sort tends to use
temporary files since memory limits (`work_mem`) are usually
too small to fit all the data into memory. On the other hand,
using `maintenance_work_mem` makes more sense since it's generally
safer to use a larger value without impacting general resource usage.
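
A hedged tuning sketch based on this change; the value and table name are
illustrative only:
    -- compression sorting now draws on maintenance_work_mem, so raising it
    -- for the session (or the job owner's role) reduces spills to temp files
    SET maintenance_work_mem = '1GB';
    SELECT compress_chunk(c) FROM show_chunks('metrics') c;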
2022-09-28 14:40:52 +02:00
Jan Nidzwetzki
de30d190e4 Fix a deadlock in chunk decompression and SELECTs
This patch fixes a deadlock between chunk decompression and SELECT
queries executed in parallel. The change in
a608d7db614c930213dee8d6a5e9d26a0259da61 requests an AccessExclusiveLock
for the decompressed chunk instead of the compressed chunk, resulting in
deadlocks.

In addition, an isolation test has been added to test that SELECT
queries on a chunk that is currently decompressed can be executed.

Fixes #4605
2022-09-22 14:37:14 +02:00
Sven Klemm
131773a902 Reset compression sequence when group resets
The sequence number of the compressed tuple is per segmentby grouping
and should be reset when the grouping changes to prevent overflows with
many segmentby columns.
2022-08-15 13:34:00 +02:00
Sven Klemm
a6107020e6 Fix segfaults in compression code with corrupt data
Sanity check the compression header for a valid algorithm
before using it as an index into an array. Previously
this would result in a segfault and could happen with
corrupted compressed data.
2022-08-08 22:04:30 +02:00
Alexander Kuzmenkov
a3ef038465 Fix clang-tidy warning bugprone-macro-parentheses 2022-05-26 13:51:36 +05:30
Mats Kindahl
aaffc1d5a6 Set null vector for insert into compressed table
As part of inserting into a compressed table, the tuple is
materialized, which computes the data size for the tuple using
`heap_compute_data_size`. When computing the data size of the tuple,
columns that are null are not considered and are just ignored. Columns
that are dropped are, however, not explicitly checked and instead the
`heap_compute_data_size` rely on these columns being set to null.

When reading tuples from a compressed table for insert, the null vector
is cleared, meaning that it by default is non-null. Since columns that
are dropped are not explicitly processed, they are expected to have a
defined value, which they do not have, causing a crash when an attempt
to dereference them are made.

This commit fixes this by setting the null vector to all null, and the
code after will overwrite the columns with proper null bits, except the
dropped columns that will be considered null.

Fixes #4251
2022-04-26 17:24:02 +02:00
gayyappan
9f64df8567 Add ts_catalog subdirectory
Move files that are related to timescaledb catalog
access to this subdirectory
2022-01-24 16:58:09 -05:00
Sven Klemm
b27c9cbd47 Add missing heap_freetuple calls
This patch adds missing heap_freetuple calls in 2 locations.
The missing call in compression.c was a leak making the allocation
live for much longer than needed. This was found by coccinelle.
2021-10-26 20:48:41 +02:00
Sven Klemm
f686b2af40 Fix various windows compilation problems with PG14
The Windows compiler has problems with the macros in genbki.h,
complaining about redefinition of a variable with a different
storage class. Since those specific macros are processed by a
perl script and are not relevant for the build process, we turn them
into no-ops for Windows.
2021-10-14 02:14:37 +02:00
Sven Klemm
265e18627b Adjust code to PG14 reindex_relation changes
PG14 changes the reindex_relation `params` argument from integer
to a struct.

https://github.com/postgres/postgres/commit/a3dc9260
2021-09-08 15:24:46 +02:00
Sven Klemm
d0426ff234 Move all compatibility related files into compat directory 2021-08-28 05:17:22 +02:00
Sven Klemm
5719c50e51 Remove TTSOps pointer macros
Remove TTSOpsVirtualP, TTSOpsHeapTupleP, TTSOpsMinimalTupleP and
TTSOpsBufferHeapTupleP macros since they were only needed on PG11
to allow us to define compatibility macros for TupleTableSlot
operations.
2021-06-03 14:34:31 +02:00
Sven Klemm
fb863f12c7 Remove support for PG11
Remove support for compiling against PostgreSQL 11. This patch also
removes PG11 specific compatibility macros.
2021-06-01 20:21:06 +02:00
gayyappan
ad25f787fc Test support for copy on distributed hypertables with compressed chunks
Add a test case for copy on distributed hypertables with compressed chunks.
It verifies that recompress_chunk and the compression policy work as expected.
Additional changes include:
- Clean up commented code
- Make use of BulkInsertState optional in the row compressor
- Add a test for insert into a compressed chunk by a role other
  than the owner
2021-05-24 18:03:47 -04:00
gayyappan
4f865f7870 Add recompress_chunk function
After inserts go into a compressed chunk, the chunk is marked as
unordered. This PR adds a new function recompress_chunk that
compresses the data and sets the status back to compressed. Further
optimizations for this function are planned but not part of this PR.

This function can be invoked by calling
SELECT recompress_chunk(<chunk_name>).

The recompress_chunk function is automatically invoked by the compression
policy job when it sees that a chunk is in the unordered state.
2021-05-24 18:03:47 -04:00
Sven Klemm
5f6e492474 Adjust pathkeys generation for unordered compressed chunks
Compressed chunks with inserts after being compressed have batches
that are not ordered according to compress_orderby. For those
chunks we cannot set pathkeys on the DecompressChunk node, and we
need an extra sort step if we require ordered output from those
chunks.
2021-05-24 18:03:47 -04:00
gayyappan
d9839b9b61 Support defaults, sequences, check constraints for compressed chunks
Support defaults, sequences and check constraints with inserts
into compressed chunks
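
A hedged sketch of what this enables; the table definition is hypothetical,
not from this patch:
    CREATE TABLE metrics (
        id      serial,
        ts      timestamptz NOT NULL,
        reading double precision DEFAULT 0 CHECK (reading >= 0)
    );
    SELECT create_hypertable('metrics', 'ts');
    -- after a chunk is compressed, an insert landing in it still fills the
    -- serial sequence, applies the DEFAULT, and enforces the CHECK constraint
    INSERT INTO metrics (ts) VALUES ('2021-01-01 00:00');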
2021-05-24 18:03:47 -04:00
gayyappan
93be235d33 Support for inserts into compressed hypertables
Add CompressRowSingleState, which has functions to compress a single row.
2021-05-24 18:03:47 -04:00
Sven Klemm
d26c744115 Use %u to format Oid instead of %d
Since Oid is an unsigned int, we have to use %u to print it; otherwise
oids >= 2^31 will not work correctly. This also switches the places
that print type oid to use format helper functions to resolve the
oids.
2021-04-14 21:11:20 +02:00
gayyappan
5be6a3e4e9 Support column rename for hypertables with compression enabled
ALTER TABLE <hypertable> RENAME <column_name> TO <new_column_name>
is now supported for hypertables that have compression enabled.

Note: Column renaming is not supported for distributed hypertables.
So this will not work on distributed hypertables that have
compression enabled.
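
A hedged example of the now-supported statement; the names are hypothetical:
    -- works for a non-distributed hypertable with compression enabled
    ALTER TABLE metrics RENAME COLUMN reading TO temperature;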
2021-02-19 10:21:50 -05:00
Sven Klemm
002510cb01 Add compatibility wrapper functions for base64 encoding/decoding
PG13 adds a destination-length 4th argument to the pg_b64_decode and
pg_b64_encode functions, so this patch adds a macro that translates
to the 3-argument or 4-argument call depending on the postgres version.
This patch also adds checking of the return values of those functions.

https://github.com/postgres/postgres/commit/cfc40d384a
2020-12-10 18:40:37 +01:00
gayyappan
05319cd424 Support analyze of internal compression table
This commit modifies analyze behavior as follows:
1. When an internal compression table is analyzed,
statistics from the compressed chunk (such as page
count and tuple count) are used to update the
statistics of the corresponding chunk parent, if
they are missing.

2. Analyze compressed chunks instead of raw chunks.
When the command ANALYZE <hypertable> is executed,
a) analyze uncompressed chunks and b) skip the raw chunk,
but analyze the compressed chunk.
2020-11-11 15:05:14 -05:00
Sven Klemm
97254783d4 Fix segfault in decompress_chunk for chunks with dropped columns
This patch fixes a segfault in decompress_chunk for chunks with dropped
columns. Since dropped columns don't exist in the compressed chunk,
the values for those columns were undefined in the decompressed tuple,
leading to a segfault when trying to build the heap tuple.
2020-11-10 10:13:45 +01:00
Brian Rowe
5acf3343b5 Ensure reltuples are preserved during compression
This change captures the reltuples and relpages (and relallvisible)
statistics from the pg_class table for chunks immediately before
truncating them during the compression code path.  It then restores
the values after truncating, as there is no way to keep postgresql
from clearing these values during this operation. It also
uses these values properly during planning, working around some
postgresql code which substitutes in arbitrary sizing for tables
which don't seem to hold data.

Fixes #2524
2020-10-19 07:21:38 -07:00
gayyappan
b93b30b0c2 Add counts to compression statistics
Store information related to compressed and uncompressed row
counts after compressing a chunk. This is saved in the
compression_chunk_size table.
2020-06-19 15:58:04 -04:00
Sven Klemm
c90397fd6a Remove support for PG9.6 and PG10
This patch removes code support for PG9.6 and PG10. In addition to
removing PG96 and PG10 macros the following changes are done:

remove HAVE_INT64_TIMESTAMP since this is always true on PG10+
remove PG_VERSION_SUPPORTS_MULTINODE
2020-06-02 23:48:35 +02:00
Stephen Polcyn
b57d2ac388 Cleanup TODOs and FIXMEs
Unless otherwise listed, the TODO was converted to a comment or put
into an issue tracker.

test/sql/
- triggers.sql: Made required change

tsl/test/
- CMakeLists.txt: TODO complete
- bgw_policy.sql: TODO complete
- continuous_aggs_materialize.sql: TODO complete
- compression.sql: TODO complete
- compression_algos.sql: TODO complete

tsl/src/
- compression/compression.c:
  - row_compressor_decompress_row: Expected complete
- compression/dictionary.c: FIXME complete
- materialize.c: TODO complete
- reorder.c: TODO complete
- simple8b_rle.h:
  - compressor_finish: Removed (obsolete)

src/
- extension.c: Removed due to age
- adts/simplehash.h: TODOs are from copied Postgres code
- adts/vec.h: TODO is non-significant
- planner.c: Removed
- process_utility.c
  - process_altertable_end_subcmd: Removed (PG will handle case)
2020-05-18 20:16:03 -04:00
Ruslan Fomkin
ed32d093dc Use table_open/close and PG aggregated directive
Fixing more places to use table_open and table_close, introduced in
PG12. Unifies PG version directives to use the aggregated macro.
2020-04-14 23:12:15 +02:00