106 Commits

Bharathy
38878bee16 Fix segmentation fault during INSERT into compressed hypertable.
An INSERT into a compressed hypertable with more open chunks than
ts_guc_max_open_chunks_per_insert causes a segmentation fault.
A new row that needs to be inserted into a compressed chunk must first be
compressed. The memory required to compress a row is allocated from the
RowCompressor::per_row_ctx memory context. Once the row is compressed,
ExecInsert() is called while that same context is still current, so memory
is allocated and freed there instead of in the executor state. This
corrupts memory.
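
A minimal sketch of the corrected allocation discipline, using the standard
PostgreSQL memory-context API; RowCompressor and per_row_ctx follow the commit
text, while the helper functions are hypothetical:

```c
#include "postgres.h"
#include "utils/memutils.h"

/* Compress the row inside the short-lived per-row context, but make sure the
 * executor's own context is current again before the insert happens, so the
 * executor never allocates from (or frees into) per_row_ctx. */
static void
compress_and_insert_row(RowCompressor *row_compressor, TupleTableSlot *slot,
                        EState *estate)
{
    MemoryContext old_ctx = MemoryContextSwitchTo(row_compressor->per_row_ctx);
    HeapTuple compressed = compress_single_row(row_compressor, slot); /* hypothetical */

    MemoryContextSwitchTo(old_ctx);
    insert_compressed_tuple(estate, compressed); /* hypothetical ExecInsert() wrapper */

    /* Only now is it safe to throw away everything compression allocated. */
    MemoryContextReset(row_compressor->per_row_ctx);
}
```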

Fixes: #4778
2022-10-13 20:48:23 +05:30
Ante Kresic
cc110a33a2 Move ANALYZE after heap scan during compression
Depending on the statistics target, running ANALYZE on a chunk before
compression can cause a lot of random IO operations for chunks that
have more pages than ANALYZE needs to read. By moving that operation
to after the heap is loaded into memory for sorting, we increase the
chance of cache hits and reduce the disk operations necessary to
execute compression jobs.
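
A rough sketch of the reordering; only the relative order of the ANALYZE step
and the heap scan matters, and all helper names here are hypothetical:

```c
/* Run ANALYZE only after the heap scan for the compression sort has pulled
 * the chunk's pages into the buffer cache. */
static void
compress_chunk_and_analyze(Oid chunk_relid)
{
    Tuplesortstate *sorted = scan_chunk_into_tuplesort(chunk_relid); /* warms cache */
    analyze_chunk(chunk_relid);       /* previously ran first: mostly random IO */
    write_compressed_batches(sorted);
    tuplesort_end(sorted);
}
```
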
2022-09-28 14:40:52 +02:00
Ante Kresic
9c819882f3 Increase memory usage for compression jobs
When compressing larger chunks, compression sort tends to use
temporary files since memory limits (`work_mem`) are usually
pretty small to fit all the data into memory. On the other hand,
using `maintenance_work_mem` makes more sense since it's generally
safer to use a larger value without impacting general resource usage.
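
A sketch of the change at the tuplesort setup; the surrounding variables are
illustrative, and the exact tuplesort_begin_heap() signature varies slightly
between PostgreSQL versions:

```c
/* Budget the compression sort with maintenance_work_mem instead of work_mem. */
Tuplesortstate *sort_state =
    tuplesort_begin_heap(tupdesc, n_sort_keys, sort_attnos, sort_operators,
                         sort_collations, nulls_first,
                         maintenance_work_mem, /* previously: work_mem */
                         NULL, false);
```
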
2022-09-28 14:40:52 +02:00
Jan Nidzwetzki
de30d190e4 Fix a deadlock in chunk decompression and SELECTs
This patch fixes a deadlock between chunk decompression and SELECT
queries executed in parallel. The change in
a608d7db614c930213dee8d6a5e9d26a0259da61 requests an AccessExclusiveLock
for the decompressed chunk instead of the compressed chunk, resulting in
deadlocks.

In addition, an isolation test has been added to verify that SELECT
queries on a chunk that is currently being decompressed can be executed.

Fixes #4605
2022-09-22 14:37:14 +02:00
Sven Klemm
131773a902 Reset compression sequence when group resets
The sequence number of the compressed tuple is per segmentby grouping
and should be reset when the grouping changes to prevent overflows with
many segmentby columns.
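
A sketch of the reset, with illustrative field and constant names (the gap
value follows the sequence-id scheme described further down in this history):

```c
/* Restart the sequence counter whenever the segmentby grouping changes,
 * rather than letting it grow across the whole chunk. */
if (segmentby_group_changed)
    row_compressor->sequence_num = SEQUENCE_NUM_GAP;
else
    row_compressor->sequence_num += SEQUENCE_NUM_GAP;
```
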
2022-08-15 13:34:00 +02:00
Sven Klemm
a6107020e6 Fix segfaults in compression code with corrupt data
Sanity-check the compression header for a valid algorithm before
using it as an index into an array. Previously this could result in
a segfault with corrupted compressed data.
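
A sketch of the added check; the header field and enum bound names are
illustrative:

```c
/* Validate the algorithm id from the compressed datum header before using
 * it as an index into the per-algorithm dispatch table. */
uint8 algo = header->compression_algorithm;

if (algo == 0 || algo >= _END_COMPRESSION_ALGORITHMS)
    elog(ERROR, "invalid compression algorithm %d", algo);

/* only after this check is definitions[algo] safe to dereference */
```
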
2022-08-08 22:04:30 +02:00
Alexander Kuzmenkov
a3ef038465 Fix clang-tidy warning bugprone-macro-parentheses 2022-05-26 13:51:36 +05:30
Mats Kindahl
aaffc1d5a6 Set null vector for insert into compressed table
As part of inserting into a compressed table, the tuple is
materialized, which computes the data size for the tuple using
`heap_compute_data_size`. When computing the data size of the tuple,
columns that are null are not considered and are just ignored. Columns
that are dropped are, however, not explicitly checked; instead,
`heap_compute_data_size` relies on these columns being set to null.

When reading tuples from a compressed table for insert, the null vector
is cleared, meaning that all columns are non-null by default. Since
columns that are dropped are not explicitly processed, they are expected
to have a defined value, which they do not have, causing a crash when an
attempt is made to dereference them.

This commit fixes this by setting the null vector to all null; the code
that follows then overwrites the null bits for the columns it populates,
while the dropped columns remain null.
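
A sketch of the fix, assuming typical names for the per-tuple value and null
arrays:

```c
/* Start from an all-null tuple: columns the decompression code fills in get
 * their null bit cleared, while dropped columns are simply left null, which
 * is exactly what heap_compute_data_size() expects for them. */
memset(nulls, true, sizeof(bool) * natts);

/* ... later, for each column that actually has compressed data: */
nulls[attoff] = false;
values[attoff] = decompressed_datum;
```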

Fixes #4251
2022-04-26 17:24:02 +02:00
gayyappan
9f64df8567 Add ts_catalog subdirectory
Move files that are related to timescaledb catalog
access to this subdirectory
2022-01-24 16:58:09 -05:00
Sven Klemm
b27c9cbd47 Add missing heap_freetuple calls
This patch adds missing heap_freetuple calls in 2 locations.
The missing call in compression.c leaked the tuple, keeping the
allocation alive for much longer than needed. This was found by coccinelle.
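
A sketch of the pattern the missing calls restore; heap_copytuple and
heap_freetuple are the standard PostgreSQL calls, the surrounding code is
illustrative:

```c
HeapTuple copy = heap_copytuple(tuple);

/* ... use the copy ... */

/* Free it as soon as it is no longer needed instead of letting it live
 * until the enclosing memory context is reset. */
heap_freetuple(copy);
```
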
2021-10-26 20:48:41 +02:00
Sven Klemm
f686b2af40 Fix various windows compilation problems with PG14
The Windows compiler has problems with the macros in genbki.h,
complaining about redefinition of a variable with a different
storage class. Since those specific macros are processed by a
Perl script and are not relevant for the build process, we turn
them into no-ops for Windows.
2021-10-14 02:14:37 +02:00
Sven Klemm
265e18627b Adjust code to PG14 reindex_relation changes
PG14 changes the reindex_relation `params` argument from an integer
to a struct.

https://github.com/postgres/postgres/commit/a3dc9260
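
A sketch of the kind of compatibility shim this requires; the macro name and
the PG14_GE version guard are illustrative:

```c
#if PG14_GE
#define reindex_relation_compat(relid, flags, opts)                        \
    do                                                                     \
    {                                                                      \
        ReindexParams params = { .options = (opts) };                      \
        reindex_relation((relid), (flags), &params);                       \
    } while (0)
#else
#define reindex_relation_compat(relid, flags, opts)                        \
    reindex_relation((relid), (flags), (opts))
#endif
```
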
2021-09-08 15:24:46 +02:00
Sven Klemm
d0426ff234 Move all compatibility related files into compat directory 2021-08-28 05:17:22 +02:00
Sven Klemm
5719c50e51 Remove TTSOps pointer macros
Remove TTSOpsVirtualP, TTSOpsHeapTupleP, TTSOpsMinimalTupleP and
TTSOpsBufferHeapTupleP macros since they were only needed on PG11
to allow us to define compatibility macros for TupleTableSlot
operations.
2021-06-03 14:34:31 +02:00
Sven Klemm
fb863f12c7 Remove support for PG11
Remove support for compiling against PostgreSQL 11. This patch also
removes PG11 specific compatibility macros.
2021-06-01 20:21:06 +02:00
gayyappan
ad25f787fc Test support for copy on distributed hypertables with compressed chunks
Add a test case for COPY on distributed hypertables with compressed chunks.
It verifies that recompress_chunk and the compression policy work as expected.
Additional changes include:
- Clean up commented code
- Make use of BulkInsertState optional in the row compressor
- Add a test for insert into a compressed chunk by a role other than
  the owner
2021-05-24 18:03:47 -04:00
gayyappan
4f865f7870 Add recompress_chunk function
After inserts go into a compressed chunk, the chunk is marked as
unordered. This PR adds a new function recompress_chunk that
compresses the data and sets the status back to compressed. Further
optimizations for this function are planned but not part of this PR.

This function can be invoked by calling
SELECT recompress_chunk(<chunk_name>).

The recompress_chunk function is automatically invoked by the compression
policy job when it sees that a chunk is in an unordered state.
2021-05-24 18:03:47 -04:00
Sven Klemm
5f6e492474 Adjust pathkeys generation for unordered compressed chunks
Compressed chunks with inserts after being compressed have batches
that are not ordered according to compress_orderby. For those chunks
we cannot set pathkeys on the DecompressChunk node, and we need an
extra sort step if we require ordered output from those chunks.
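
A sketch of the planner-side decision; the chunk-status check and pathkeys
variable are illustrative names:

```c
/* Only promise sorted output from DecompressChunk when no inserts have
 * marked the chunk unordered; otherwise leave pathkeys empty so the
 * planner adds an explicit Sort when ordered output is required. */
if (!chunk_is_unordered(chunk))     /* illustrative status check */
    path->pathkeys = compressed_pathkeys;
else
    path->pathkeys = NIL;
```
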
2021-05-24 18:03:47 -04:00
gayyappan
d9839b9b61 Support defaults, sequences, check constraints for compressed chunks
Support defaults, sequences and check constraints with inserts
into compressed chunks
2021-05-24 18:03:47 -04:00
gayyappan
93be235d33 Support for inserts into compressed hypertables
Add CompressRowSingleState. This has functions to compress a single row.
2021-05-24 18:03:47 -04:00
Sven Klemm
d26c744115 Use %u to format Oid instead of %d
Since Oid is an unsigned int, we have to use %u to print it; otherwise
oids >= 2^31 will not work correctly. This also switches the places
that print type oids to use format helper functions to resolve the
oids.
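
For illustration, with a hypothetical relid variable:

```c
Oid relid = 3000000000; /* an oid above 2^31 */

elog(DEBUG1, "relation oid is %u", relid); /* correct: prints 3000000000 */
/* elog(DEBUG1, "relation oid is %d", relid);   wrong: prints -1294967296 */
```
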
2021-04-14 21:11:20 +02:00
gayyappan
5be6a3e4e9 Support column rename for hypertables with compression enabled
ALTER TABLE <hypertable> RENAME <column_name> TO <new_column_name>
is now supported for hypertables that have compression enabled.

Note: Column renaming is not supported for distributed hypertables.
So this will not work on distributed hypertables that have
compression enabled.
2021-02-19 10:21:50 -05:00
Sven Klemm
002510cb01 Add compatibility wrapper functions for base64 encoding/decoding
PG13 adds a destination-length fourth argument to the pg_b64_decode and
pg_b64_encode functions, so this patch adds a macro that translates to
the 3-argument or 4-argument call depending on the PostgreSQL version.
This patch also adds checking of the return values for those functions.

https://github.com/postgres/postgres/commit/cfc40d384a
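
A sketch of such a wrapper for the encode side; the macro name and the PG13_GE
guard are illustrative, while the two pg_b64_encode signatures are the ones in
PostgreSQL before and after PG13:

```c
#if PG13_GE
#define pg_b64_encode_compat(src, srclen, dst, dstlen)                      \
    pg_b64_encode((src), (srclen), (dst), (dstlen))
#else
#define pg_b64_encode_compat(src, srclen, dst, dstlen)                      \
    pg_b64_encode((src), (srclen), (dst))
#endif

/* Callers should also check the result: the PG13 variant returns -1 when the
 * destination buffer is too small. */
```
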
2020-12-10 18:40:37 +01:00
gayyappan
05319cd424 Support analyze of internal compression table
This commit modifies analyze behavior as follows:
1. When an internal compression table is analyzed,
statistics from the compressed chunk (such as page
count and tuple count) are used to update the
statistics of the corresponding chunk parent, if
they are missing.

2. Analyze the compressed chunk instead of the raw chunk.
When the command ANALYZE <hypertable> is executed,
a) uncompressed chunks are analyzed, and b) for compressed
chunks, the raw chunk is skipped and the compressed chunk
is analyzed instead.
2020-11-11 15:05:14 -05:00
Sven Klemm
97254783d4 Fix segfault in decompress_chunk for chunks with dropped columns
This patch fixes a segfault in decompress_chunk for chunks with dropped
columns. Since dropped columns don't exist in the compressed chunk,
the values for those columns were undefined in the decompressed tuple,
leading to a segfault when trying to build the heap tuple.
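
A sketch of the fix, using the standard tuple-descriptor accessors; variable
names are illustrative:

```c
for (int i = 0; i < tupdesc->natts; i++)
{
    Form_pg_attribute attr = TupleDescAttr(tupdesc, i);

    if (attr->attisdropped)
    {
        /* No compressed data exists for dropped columns; make the value
         * well defined so heap_form_tuple() can skip it. */
        nulls[i] = true;
        values[i] = (Datum) 0;
        continue;
    }

    /* ... decompress the real column into values[i] / nulls[i] ... */
}
```
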
2020-11-10 10:13:45 +01:00
Brian Rowe
5acf3343b5 Ensure reltuples are preserved during compression
This change captures the reltuples and relpages (and relallvisible)
statistics from the pg_class table for chunks immediately before
truncating them during the compression code path. It then restores
the values after truncating, as there is no way to keep PostgreSQL
from clearing these values during this operation. It also uses these
values properly during planning, working around some PostgreSQL code
which substitutes arbitrary sizing for tables which don't seem to
hold data.
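
A sketch of the capture-and-restore, using the standard syscache accessors;
the truncate and restore helpers are hypothetical:

```c
HeapTuple ctup = SearchSysCache1(RELOID, ObjectIdGetDatum(chunk_relid));
Form_pg_class relform = (Form_pg_class) GETSTRUCT(ctup);
float4 saved_reltuples = relform->reltuples;
int32 saved_relpages = relform->relpages;
int32 saved_relallvisible = relform->relallvisible;
ReleaseSysCache(ctup);

truncate_chunk_relation(chunk_relid);   /* hypothetical; clears the stats */

restore_pg_class_stats(chunk_relid, saved_relpages, saved_reltuples,
                       saved_relallvisible); /* hypothetical pg_class update */
```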

Fixes #2524
2020-10-19 07:21:38 -07:00
gayyappan
b93b30b0c2 Add counts to compression statistics
Store information related to compressed and uncompressed row
counts after compressing a chunk. This is saved in the
compression_chunk_size table.
2020-06-19 15:58:04 -04:00
Sven Klemm
c90397fd6a Remove support for PG9.6 and PG10
This patch removes code support for PG9.6 and PG10. In addition to
removing PG96 and PG10 macros the following changes are done:

- remove HAVE_INT64_TIMESTAMP since this is always true on PG10+
- remove PG_VERSION_SUPPORTS_MULTINODE
2020-06-02 23:48:35 +02:00
Stephen Polcyn
b57d2ac388 Cleanup TODOs and FIXMEs
Unless otherwise listed, the TODO was converted to a comment or put
into an issue tracker.

test/sql/
- triggers.sql: Made required change

tsl/test/
- CMakeLists.txt: TODO complete
- bgw_policy.sql: TODO complete
- continuous_aggs_materialize.sql: TODO complete
- compression.sql: TODO complete
- compression_algos.sql: TODO complete

tsl/src/
- compression/compression.c:
  - row_compressor_decompress_row: Expected complete
- compression/dictionary.c: FIXME complete
- materialize.c: TODO complete
- reorder.c: TODO complete
- simple8b_rle.h:
  - compressor_finish: Removed (obsolete)

src/
- extension.c: Removed due to age
- adts/simplehash.h: TODOs are from copied Postgres code
- adts/vec.h: TODO is non-significant
- planner.c: Removed
- process_utility.c
  - process_altertable_end_subcmd: Removed (PG will handle case)
2020-05-18 20:16:03 -04:00
Ruslan Fomkin
ed32d093dc Use table_open/close and PG aggregated directive
Fixing more places to use table_open and table_close introduced in
PG12. Unifies PG version directives to use aggregated macro.
2020-04-14 23:12:15 +02:00
Ruslan Fomkin
1ddc62eb5f Refactor header inclusion
Correcting conditions in #ifdefs, adding missing includes, removing
and rearranging existing includes, replacing PG12 with PG12_GE for
forward compatibility. Fixed a number of places that were missed
earlier to use table_close instead of relation_close.
2020-04-14 23:12:15 +02:00
Joshua Lockerman
949b88ef2e Initial support for PostgreSQL 12
This change includes a major refactoring to support PostgreSQL
12. Note that many tests aren't passing at this point. Changes
include, but are not limited to:

- Handle changes related to table access methods
- New way to expand hypertables since expansion has changed in
  PostgreSQL 12 (more on this below).
- Handle changes related to table expansion for UPDATE/DELETE
- Fixes for various TimescaleDB optimizations that were affected by
  planner changes in PostgreSQL (gapfill, first/last, etc.)

Before PostgreSQL 12, planning was organized something like as
follows:

 1. construct and add `RelOptInfo` for base and appendrels
 2. add restrict info, joins, etc.
 3. perform the actual planning with `make_one_rel`

For our optimizations we would expand hypertables in the middle of
step 1; since nothing in the query planner before `make_one_rel` cared
about the inheritance children, we didn’t have to be too precise
about where we were doing it.

However, with PG12, and the optimizations around declarative
partitioning, PostgreSQL now does care about when the children are
expanded, since it wants as much information as possible to perform
partition-pruning. Now planning is organized like:

 1. construct and add RelOptInfo for base rels only
 2. add restrict info, joins, etc.
 3. expand appendrels, removing irrelevant declarative partitions
 4. perform the actual planning with make_one_rel

Step 3 always expands appendrels, so when we also expand them during
step 1, the hypertable gets expanded twice, and things in the planner
break.

The changes to support PostgreSQL 12 attempt to solve this problem by
keeping the hypertable root marked as a non-inheritance table until
`make_one_rel` is called, and only then revealing to PostgreSQL that
it does in fact have inheritance children. While this strategy entails
the least code change on our end, the fact that the first hook we can
use to re-enable inheritance is `set_rel_pathlist_hook` means it does
entail a number of annoyances:

 1. this hook is called after the sizes of tables are calculated, so we
    must recalculate the sizes of all hypertables, as they will not
    have taken the chunk sizes into account
 2. the table upon which the hook is called will have its paths planned
    under the assumption it has no inheritance children, so if it's a
    hypertable we have to replan its paths

Unfortunately, the functions for doing this are static, so we need to
copy them into our own codebase instead of just using PostgreSQL's.

In PostgreSQL 12, UPDATE/DELETE on inheritance relations have also
changed and are now planned in two stages:

- In stage 1, the statement is planned as if it was a `SELECT` and all
  leaf tables are discovered.
- In stage 2, the original query is planned against each leaf table,
  discovered in stage 1, directly, not part of an Append.

Unfortunately, this means we cannot look in the appendrelinfo during
UPDATE/DELETE planning, in particular to determine if a table is a
chunk, as the appendrelinfo is not yet initialized at the point we
wish to do so. This has consequences for how we identify operations on
chunks (sometimes for blocking and sometimes for enabling
functionality).
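
A very condensed sketch of the hook-based strategy described above; the hook
signature is the standard PostgreSQL one, while every helper name here is
illustrative, not the actual TimescaleDB function:

```c
static void
timescaledb_set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, Index rti,
                             RangeTblEntry *rte)
{
    if (is_hypertable_root(rte))              /* kept non-inheritance until now */
    {
        expand_hypertable_chunks(root, rel, rti); /* reveal inheritance children */
        recompute_rel_size(root, rel);    /* sizes were computed without chunks */
        replan_hypertable_paths(root, rel);  /* paths assumed no children */
    }
}

/* installed from _PG_init(): set_rel_pathlist_hook = timescaledb_set_rel_pathlist; */
```
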
2020-04-14 23:12:15 +02:00
Erik Nordström
a4fb0cec3f Cleanup compression-related errors
This change fixes a number of typos and issues with inconsistent
formatting for compression-related code. A couple of other fixes for
variable names, etc. have also been applied.
2020-03-11 13:27:16 +01:00
Joshua Lockerman
07841670a7 Fix issues discovered by coverity
This commit fixes issues reported by coverity. Of these, the only real
issue is an integer overflow in bitarray, which can never happen in its
current usages. This also adds a PG_USED_FOR_ASSERTS_ONLY for a
variable only used for Assert.
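
Usage of that marker, for illustration (the surrounding code is hypothetical):

```c
/* Suppresses "variable set but not used" warnings in builds without
 * assertions, where the Assert() below compiles away. */
int nvalid PG_USED_FOR_ASSERTS_ONLY = validate_batches(batches);

Assert(nvalid > 0);
```
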
2019-10-29 19:02:58 -04:00
Matvey Arye
0f3e74215a Split segment meta min_max into two columns
This simplifies the code and the access to the min/max
metadata. Before we used a custom type, but now the min/max
are just the same type as the underlying column and stored as two
columns.

This also removes the custom type that was used before.
2019-10-29 19:02:58 -04:00
Joshua Lockerman
6687189a6c Free memory earlier in decompress_chunk
This was supposed to be part of an earlier commit, but seems to have
been lost. This should reduce peak memory usage of that function.
2019-10-29 19:02:58 -04:00
Joshua Lockerman
fac8eca0b3 Free Memory Earlier in decompress_chunk
This commit alters decompress_chunk to free memory as soon as possible
instead of waiting until the function ends. This should decrease peak
memory usage from roughly the size of the dataset to roughly the size
of a single compressed row.
2019-10-29 19:02:58 -04:00
Joshua Lockerman
0606aeba9e Reduce Peak Memory Usage for compress_chunk
Before this PR some state (most notably deTOASTed values) would persist
across compressed rows during compress_chunk, despite the fact that
they were no longer needed. This increased peak memory usage of
compress_chunk. This commit adds a MemoryContext that is reset after
each compressed row is inserted, ensuring that state needed for only
one row does not hang around longer than needed.
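
A sketch of that per-row context, using the standard memory-context API; the
loop body and context name are illustrative:

```c
MemoryContext per_row_ctx =
    AllocSetContextCreate(CurrentMemoryContext, "compress chunk per-row",
                          ALLOCSET_DEFAULT_SIZES);

while (fetch_next_uncompressed_batch())   /* hypothetical iteration */
{
    MemoryContext old_ctx = MemoryContextSwitchTo(per_row_ctx);

    build_and_insert_compressed_row();    /* deTOASTed values allocated here */

    MemoryContextSwitchTo(old_ctx);
    MemoryContextReset(per_row_ctx);      /* drop per-row state right away */
}
```
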
2019-10-29 19:02:58 -04:00
Matvey Arye
6465a4e85a Switch to using get_attnum function
This is a fix for a rebase on master since `attno_find_by_attname`
was removed.
2019-10-29 19:02:58 -04:00
Matvey Arye
8250714a29 Add fixes for Windows
- Fix declaration of functions wrt TSDLLEXPORT consistency
- Empty structs need to be created with '{ 0 }' syntax.
- Alignment sentinels have to use uint64 instead of a struct
  with a 0-size member
- Add some more ORDER BY clauses in the tests to constrain
  the order of results
- Add ANALYZE after running compression in
  transparent-decompression test
2019-10-29 19:02:58 -04:00
Matvey Arye
5c891f732e Add sequence id metadata col to compressed table
Add a sequence id to the compressed table. This id increments
monotonically for each compressed row in a way that follows
the order by clause. We leave gaps to allow for the
possibility to fill in rows due to e.g. inserts down
the line.

The sequence id is global to the entire chunk and does not reset
for each segmentby-group change, since this has the potential
to allow some micro optimizations when ordering by segmentby
columns as well.

The sequence number is an INT32, which allows up to 200 billion
uncompressed rows per chunk to be supported (assuming 1000 rows
per compressed row and a gap of 10). Overflow is checked in the
code, which will error if this limit is breached.
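
A sketch of the assignment and overflow check, with illustrative names (the
gap of 10 is the one stated above):

```c
#define SEQUENCE_NUM_GAP 10

if (row_compressor->sequence_num > PG_INT32_MAX - SEQUENCE_NUM_GAP)
    ereport(ERROR,
            (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
             errmsg("sequence id overflow while compressing chunk")));

row_compressor->sequence_num += SEQUENCE_NUM_GAP;
```
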
2019-10-29 19:02:58 -04:00
Matvey Arye
b4a7108492 Integrate segment meta into compression
This commit integrates the SegmentMetaMinMax into the
compression logic. It adds metadata columns to the compressed table
and correctly sets them upon compression.

We also fix several errors with datum detoasting in SegmentMetaMinMax
2019-10-29 19:02:58 -04:00
Matvey Arye
be199bec70 Add type cache
Add a type cache to get the OID corresponding to a particular
defined SQL type.
2019-10-29 19:02:58 -04:00
Joshua Lockerman
8b273a5187 Fix flush when num-rows overflow
We should only free the segment-bys when we're changing groups, not when
we've got too many rows to compress; in that case we'll need them.
2019-10-29 19:02:58 -04:00
gayyappan
6832ed2ca5 Modify storage type for toast columns
This PR modifies the TOAST storage type for compressed columns based on
the algorithm used for compression.
2019-10-29 19:02:58 -04:00
Matvey Arye
0059360522 Fix indexes during compression and decompression
This rebuilds indexes during compression and decompression. Previously,
indexes were not updated during these operations. We also fix
a small bug with orderby and segmentby handling of empty strings/lists.

Finally, we add some more tests.
2019-10-29 19:02:58 -04:00
Matvey Arye
cdf6fcb69a Allow altering compression options
We now allow changing the compression options on a hypertable
as long as there are no existing compressed chunks.
2019-10-29 19:02:58 -04:00
Matvey Arye
f6573f9247 Add a metadata count column to compressed table
This is useful if some or all compressed columns are NULL.
The count reflects the number of uncompressed rows that are
in the compressed row. Stored as a 32-bit integer.
2019-10-29 19:02:58 -04:00
Matvey Arye
a078781c2e Add decompress_chunk function
This is the inverse of compress_chunk.
2019-10-29 19:02:58 -04:00
Sven Klemm
bdc599793c Add helper function to get decompression iterator init function 2019-10-29 19:02:58 -04:00