This change continues the refactor of transparent decompression in
order to make it more modular. A new structure called
DecompressContext is introduced that holds the state necessary to do
execution-time decompression. The context can be passed around to
related code without passing on the full DecompressChunkState
node. However, this is not yet fully realized and DecompressChunkState
is still passed around across many modules. This will be addressed in
follow-on changes.
Refactor DecompressChunk to make it more modular. This is the first
step of a bigger refactor that aims to create better separation of
concerns across the code that implements transparent decompression.
Currently, the code is only semi-modular, and all state is kept
directly in the DecompressChunk scan node. As a result, this object
has to be passed into all the related modules, which makes it harder
to call that code from other places without a reference to the "big"
scan node.
The goal of this change (and upcoming changes) is to make each module
hold its own state relevant to that module. DecompressChunk can then
be composed of these distinct modules instead of being a monolith.
This is a small refactoring to create a new old-format Continuous
Aggregate named `conditions_summary_weekly` instead of dropping and
recreating the same `conditions_summary_daily`.
This will be necessary for the upcoming patch that removes the old
format, where we'll need to manually create old-format Continuous
Aggregates by restoring them from an SQL script in order to keep these
regression tests working.
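As a rough sketch of what the old format looks like (the `conditions`
table and its columns are placeholders, not part of this change), an
old-format Continuous Aggregate is one created with
`timescaledb.finalized = false`:
-- placeholder table/columns; the old format is selected via finalized = false
CREATE MATERIALIZED VIEW conditions_summary_weekly
WITH (timescaledb.continuous, timescaledb.finalized = false) AS
SELECT time_bucket('1 week', time) AS bucket, avg(temperature)
FROM conditions
GROUP BY 1;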
Signed-off-by: Fabrízio de Royes Mello <fabriziomello@gmail.com>
This release contains performance improvements, an improved hypertable DDL API
and bug fixes since the 2.12.2 release. We recommend that you upgrade at the next
available opportunity.
In addition, it includes these noteworthy features:
* Full PostgreSQL 16 support for all existing features
* Vectorized aggregation execution for sum()
* Track chunk creation time used in retention/compression policies
**Deprecation notice: Multi-node support**
TimescaleDB 2.13 is the last version that will include multi-node support. Multi-node
support in 2.13 is available for PostgreSQL 13, 14 and 15. Learn more about it
[here](docs/MultiNodeDeprecation.md).
If you want to migrate from multi-node TimescaleDB to single-node TimescaleDB read the
[migration documentation](https://docs.timescale.com/migrate/latest/multi-node-to-timescale-service/).
**PostgreSQL 13 deprecation announcement**
We will continue supporting PostgreSQL 13 until April 2024. Closer to that time, we will
announce the specific version of TimescaleDB in which PostgreSQL 13 support will not be
included going forward.
**Starting from TimescaleDB 2.13.0**
* No Amazon Machine Images (AMI) are published. If you previously used AMI, please
use another [installation method](https://docs.timescale.com/self-hosted/latest/install/)
* Continuous Aggregates are materialized only (non-realtime) by default
**Features**
* #5575 Add chunk-wise sorted paths for compressed chunks
* #5761 Simplify hypertable DDL API
* #5890 Reduce WAL activity by freezing compressed tuples immediately
* #6050 Vectorized aggregation execution for sum()
* #6062 Add metadata for chunk creation time
* #6077 Make Continuous Aggregates materialized only (non-realtime) by default
* #6177 Change show_chunks/drop_chunks using chunk creation time
* #6178 Show batches/tuples decompressed during DML operations in EXPLAIN output
* #6185 Keep track of catalog version
* #6227 Use creation time in retention/compression policy
* #6307 Add SQL function cagg_validate_query
**Bugfixes**
* #6188 Add GUC for setting background worker log level
* #6222 Allow enabling compression on hypertable with unique expression index
* #6240 Check if worker registration succeeded
* #6254 Fix exception detail passing in compression_policy_execute
* #6264 Fix missing bms_del_member result assignment
* #6275 Fix negative bitmapset member not allowed in compression
* #6280 Fix potential data loss when compressing a table with a partial index that matches compression order
* #6289 Add support for startup chunk exclusion with aggs
* #6290 Repair relacl on upgrade
* #6297 Fix segfault when creating a cagg using a NULL width in time bucket function
* #6305 Make timescaledb_functions.makeaclitem strict
* #6332 Fix typmod and collation for segmentby columns
* #6339 Fix tablespace with constraints
* #6343 Enable segmentwise recompression in compression policy
**Thanks**
* @fetchezar for reporting an issue with compression policy error messages
* @jflambert for reporting the background worker log level issue
* @torazem for reporting an issue with compression and large oids
* @fetchezar for reporting an issue in the compression policy
* @lyp-bobi for reporting an issue with tablespace with constraints
* @pdipesh02 for contributing to the implementation of the metadata for chunk creation time,
the generalized hypertable API, and show_chunks/drop_chunks using chunk creation time
* @lkshminarayanan for all his work on PG16 support
This is a well-known flaky test where we need to create another database
using TEST_DBNAME as a template, but some background workers didn't have
enough time to properly shut down and disconnect from the database.
Fixed it by forcing other processes to terminate right after waiting for
background workers to disconnect from the database.
Closes #4766
Signed-off-by: Fabrízio de Royes Mello <fabriziomello@gmail.com>
This patch fixes ts_array_position when the entry was not found.
Previously it would return the position of the last element in
that case. This patch also adds a NULL check to ts_array_length
and an Assert for TEXT array to ts_array_is_member and
ts_array_position. Current code should not be affected by any
of these changes.
If we reuse the compressor to recompress multiple sets
of tuples, internal state left behind from the previous
run can contain invalid data. Resetting the compressor's
first-iteration field between runs fixes this.
If a hypertable uses a non-default tablespace for its primary or
unique constraints with additional DEFERRABLE or INITIALLY DEFERRED
characteristics, then any chunk creation will fail with a syntax error.
We now set the tablespace via a separate command for such constraints
on the chunks.
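As an illustrative sketch of the failing case (table, columns, and the
`tblspc` tablespace below are placeholders, not taken from the report):
-- unique constraint in a non-default tablespace with DEFERRABLE characteristics
CREATE TABLE conditions (
    time timestamptz NOT NULL,
    device int,
    UNIQUE (time, device) USING INDEX TABLESPACE tblspc
        DEFERRABLE INITIALLY DEFERRED
);
SELECT create_hypertable('conditions', 'time');
-- chunk creation triggered by the insert used to fail with a syntax error
INSERT INTO conditions VALUES (now(), 1);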
Fixes #6338
Change compression policy to use segmentwise
recompression when possible to increase performance.
Segmentwise recompression decompresses rows into memory,
thus reducing IO load when recompressing, making it
much faster for bigger chunks.
One test query of the parallel test did not contain an ORDER BY.
Therefore, the test result was not deterministic. This patch adds the
missing ORDER BY.
Remove checks that will always return true in the current implementation
to make refactoring easier. Aggref->args will always be a list of
TargetEntry, and the bulk_decompression check will always return true
as well.
Slight refactoring to decouple toast storage options from
Hypertable_compression and instead base it on the default
algorithm for the uncompressed datatype of a column. This
should result in the exact same storage option used as
before.
In e3437786ad we disabled the MN tests for PRs. This leads to the problem
that a PR can break MN tests and get merged. This PR enables the MN tests
for one PG version to make the PR author aware of any MN-related problems.
In e90280a we added support for ChunkAppend startup chunk exclusion with
a custom scan below a partial aggregation. This PR changes the logic and
adds support for more nodes below the partial aggregation (e.g.,
IndexScans).
We added a workaround for segmentby columns with incorrect typmod
and collation in c73c5a74b9 but did not adjust pre-existing relations.
This patch will fix any existing relations where the segmentby columns
of compressed chunks have incorrect typmod and collation and remove
the code workaround.
This reverts commit bebd1ab42940aae7ee4817621f1b498788704867. We have
discovered that the BGW slot is not freed in all cases. In this case, no
more new workers can be created. So, the patch is rolled back until the
bug has been corrected.
Change the code to have less direct references to FormData_hypertable_compression.
The patch also renames SegmentFilter to BatchFilter to make the purpose clearer.
Foreign tables add an extra "wholerow" ROWID_VAR to the HypertableModify
scan's targetlist. It causes adjust_appendrel_attrs() to assert when
the Var has been previously modified by ts_replace_rowid_vars(). This
patch keeps the original unaltered targetlist letting
adjust_appendrel_attrs() properly replace these ROWID_VARs for the
chunks.
Tests `util` and `repair` both used the same user name, so when
executing in the same parallel suite they could conflict.
Instead, use different role names for different tests.
The retention and compression policies can now use drop_created_before
and compress_created_before arguments respectively to specify chunk
selection using their creation times.
We don't support creation times for CAggs, yet.
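For example (the hypertable name and interval values below are
placeholders, not part of this change):
-- drop chunks created more than six months ago
SELECT add_retention_policy('conditions', drop_created_before => INTERVAL '6 months');
-- compress chunks created more than seven days ago
SELECT add_compression_policy('conditions', compress_created_before => INTERVAL '7 days');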
This should improve the throughput somewhat.
This commit does several things:
* Simplify loop condition in decompressing the compressed batch by using
the count metadata column.
* Split out a separate function that decompresses the entire compressed
batch and saves the decompressed tuples slot into RowDecompressor.
* Use the bulk table insert function for inserting the decompressed rows,
which reduces WAL activity. If we have indexes on the uncompressed chunk,
update one index at a time for the entire batch, to reduce load on the
shared buffers cache. Before, we used to update all indexes for one
row, then for another, and so on.
* Add a test for memory leaks during (de)compression.
* Update the compression_update_delete test to use INFO messages + a
debug GUC instead of DEBUG messages which are flaky.
This gives 10%-30% speedup on tsbench for decompress_chunk and various
compressed DML queries. This is very far from the performance we had in
2.10, but still a nice improvement.
In b7e04f17 we removed the useless table lock on hypertable size
functions. Without this explicit lock, the SELECT grant is not
necessary anymore, and granting it breaks some MN regression tests.
Also removed the tests for PG16 since MN is not supported.
In ae21ee96 we fixed a race condition when running a query to get the
hypertable sizes while one or more chunks were dropped in a concurrent
session, which led to an exception because the chunks no longer exist.
In fact, the table lock introduced there is useless, because we also
added proper joins with the Postgres catalog tables to ensure that the
relation exists in the database when calculating the sizes. Even worse,
with this table lock, dropping chunks now has to wait for the functions
that calculate the hypertable sizes.
Fixed it by removing the useless table lock and adding isolation
tests to make sure we won't end up with race conditions again.
In 068534e31730154b894dc8e4fb5315054e1ae51c we made the dist_util
regression test version-specific. However, the solo test declaration for
this test was not adjusted, which makes the test flaky. This PR fixes
the declaration.
In the sanitizer workflow we're trying to upload sanitizer output
logs from `${{ github.workspace }}/sanitizer`, but in ASAN_OPTIONS,
LSAN_OPTIONS and UBSAN_OPTIONS we were setting the output log path to
another location.
Example of a workflow run where files were not found in the provided path:
https://github.com/timescale/timescaledb/actions/runs/6830847083
The workflow actions do not move an issue to the Done column when there
is a comment and the issue is then closed. This commit deals with that by
handling the closed event and moving the issue to the Done column, as
well as removing labels that can interfere with processing.
With this function it is possible to run the Continuous Aggregate query
validation over an arbitrary query string, without the need to actually
create the Continuous Aggregate.
It can be used, for example, to take the most frequent queries (perhaps
using `pg_stat_statements`), validate them, and check whether any of
them can potentially be turned into a Continuous Aggregate.
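A minimal usage sketch (the query string and table are examples, and
the exact schema qualification of the function is not shown here):
-- validate an arbitrary query string without creating the cagg
SELECT * FROM cagg_validate_query($$
    SELECT time_bucket('1 hour', time) AS bucket, avg(temperature)
    FROM conditions
    GROUP BY 1
$$);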
We will decompress the compressed columns on demand, skipping them if
the vectorized quals don't pass for the entire batch. This allows us to
avoid reading some columns, saving on IO. The number of batches that are
entirely filtered out is reflected in EXPLAIN ANALYZE as 'Batches
Removed by Filters'.
When creating a Continuous Aggregate using a NULL `bucket_width` in the
`time_bucket` function it led to a segfault, for example:
CREATE MATERIALIZED VIEW cagg WITH (timescaledb.continuous) AS
SELECT time_bucket(NULL, time), count(*)
FROM metrics
GROUP BY 1;
Fixed it by raising an ERROR if a NULL `bucket_width` is used during the
Continuous Aggregate query validation.
Currently, MN is not supported on PG16. Therefore, the creation of
distributed restore points fails on PG16. This patch disables the CI
test for this PG version.
If users have accidentally been removed from `pg_authid` as a result of
bugs where dropping a user did not revoke privileges from all tables
where they had privileges, it will not be possible to create new chunks,
since these require the user to be found when copying the privileges
for the parent table (either a compressed hypertable or a normal
hypertable).
To fix the situation, we repair the `pg_class` table when updating the
extension by modifying the `relacl` for relations and removing any user
that does not have an entry in `pg_authid`.
A repair function `_timescaledb_functions.repair_relation_acls` is
added to perform the job. A `makeaclitem` from PG16 that accepts a
comma-separated list of privileges and is used as part of the repair is
also added as `_timescaledb_functions.makeaclitem`.
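For illustration (role names are placeholders; this assumes the same
signature as PG16's makeaclitem, i.e. grantee, grantor, a
comma-separated privilege list, and a grantable flag):
-- build an aclitem granting two privileges in one call
SELECT _timescaledb_functions.makeaclitem('alice'::regrole, 'bob'::regrole,
    'SELECT, UPDATE', false);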
The CI tests on Windows log the creation of a new WAL file in a
non-deterministic way. This message causes the regression tests to fail.
This PR removes these messages from the test output.
This patch adds the support for the dynamic detection of the data type
for a vectorized aggregate. In addition, it removes the hard-coded
integer data type and initializes the decompression_map properly. This
also fixes an invalid memory access.
In ba9b81854c8c94005793bccff29433f6086e5274 we added support for
chunk-wise aggregates. The pushdown of the aggregate breaks the startup
exclusion logic of the ChunkAppend node. This PR adds the support for
startup chunk exclusion with chunk-wise aggs.
Fixes: #6282