This release contains significant performance improvements when working with compressed data, extended join
support in continuous aggregates, and the ability to define foreign keys from regular tables to hypertables.
We recommend that you upgrade at the next available opportunity.
In TimescaleDB v2.16.0 we:
* Introduce multiple performance-focused optimizations for data manipulation operations (DML) over compressed chunks.
Upsert performance improves by more than 100x in some cases, and update/delete performance by more than 1000x in some scenarios.
* Add the ability to define chunk skipping indexes on non-partitioning columns of compressed hypertables.
TimescaleDB v2.16.0 extends chunk exclusion to use these skipping (sparse) indexes when queries filter on the relevant columns,
and prunes chunks that do not include any relevant data for calculating the query response.
* Offer new options for use cases that require foreign keys.
You can now add foreign keys from regular tables to hypertables. We have also removed
some really annoying locks in the reverse direction that blocked access to referenced tables
while compression was running.
* Extend Continuous Aggregates to support more types of analytical queries.
More join types are supported, join clauses can use additional equality operators, and
joins with multiple regular tables are now possible.
**Highlighted features in this release**
* Improved query performance through chunk exclusion on compressed hypertables.
You can now define chunk skipping indexes on compressed chunks for any column with one of the following
data types: `smallint`, `int`, `bigint`, `serial`, `bigserial`, `date`, `timestamp`, `timestamptz`.
After you call `enable_chunk_skipping` on a column, TimescaleDB tracks the min and max values for
that column. TimescaleDB uses that information to exclude chunks for queries that filter on that
column and would not find any data in those chunks (see the first sketch after this list).
* Improved upsert performance on compressed hypertables.
By using index scans to verify constraints during inserts on compressed chunks, TimescaleDB speeds
up some `ON CONFLICT` clauses by more than 100x.
* Improved performance of updates, deletes, and inserts on compressed hypertables.
By filtering data while accessing the compressed data and before decompressing, TimescaleDB has
improved performance for updates and deletes on all types of compressed chunks, as well as inserts
into compressed chunks with unique constraints.
By signaling constraint violations without decompressing, or decompressing only when matching
records are found in the case of updates, deletes, and upserts, TimescaleDB v2.16.0 speeds
up those operations by more than 1000x in some update/delete scenarios and by 10x for upserts.
* You can add foreign keys from regular tables to hypertables, with support for all types of cascading options.
This is useful for hypertables that partition using sequential IDs and need those IDs referenced from other tables (see the second sketch after this list).
* Lower locking requirements during compression for hypertables with foreign keys.
Advanced foreign key handling removes the need for locking referenced tables when new chunks are compressed.
DML is no longer blocked on referenced tables while compression runs on a hypertable.
* Improved support for queries on Continuous Aggregates.
`INNER/LEFT` and `LATERAL` joins are now supported. Plus, you can now join with multiple regular tables,
and you can have more than one equality operator on join clauses.
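For illustration, here is a minimal sketch of enabling chunk skipping; the table and column names are hypothetical and not part of this release note.

```sql
-- Hypothetical hypertable with a sequential id column.
CREATE TABLE events (
    id    bigint NOT NULL,
    time  timestamptz NOT NULL,
    value double precision
);
SELECT create_hypertable('events', 'time');
ALTER TABLE events SET (timescaledb.compress);

-- Track min/max ranges for the non-partitioning column "id"; queries that
-- filter on "id" can then skip compressed chunks that cannot contain matches.
SELECT enable_chunk_skipping('events', 'id');
```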
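And a sketch of the new foreign-key direction, again with hypothetical table names; note that the unique key on the hypertable has to include the partitioning column.

```sql
-- Hypothetical hypertable whose rows are referenced from a regular table.
CREATE TABLE metrics (
    metric_id bigserial,
    time      timestamptz NOT NULL,
    value     double precision,
    UNIQUE (metric_id, time)   -- referenced key; must include the partitioning column
);
SELECT create_hypertable('metrics', 'time');

-- New in this release: a regular table may reference the hypertable,
-- including cascading options.
CREATE TABLE metric_annotations (
    metric_id bigint,
    time      timestamptz,
    note      text,
    FOREIGN KEY (metric_id, time)
        REFERENCES metrics (metric_id, time) ON DELETE CASCADE
);
```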
**PostgreSQL 13 support removal announcement**
Following the deprecation announcement for PostgreSQL 13 in TimescaleDB v2.13,
PostgreSQL 13 is no longer supported in TimescaleDB v2.16.
The currently supported PostgreSQL major versions are 14, 15, and 16.
Skip the OSM chunk when doing hypertable expansion for FK lookup
queries. OSM chunks are considered archived data and we don't want
to incur the performance hit of querying OSM data on modifications
to the FK reference table.
We applied our sort transformation for interval calculation too
aggressively, even in situations where it is not safe to do so, leading
to potentially incorrectly sorted output or `mergejoin input data is out
of order` error messages.
Fixes #7097
Using `REASSIGN OWNED BY` for background jobs did not work because it
did not change the owner of the job. This commit fixes this by
capturing the utility command and making the necessary changes to the
`bgw_job` table.
It also factors out the background job DDL tests into a separate file.
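As a hedged illustration (the role names are hypothetical), the following now also transfers job ownership:

```sql
-- Besides regular objects, this now also updates the owner recorded for
-- background jobs in the bgw_job catalog table.
REASSIGN OWNED BY old_owner TO new_owner;
```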
Creating a CAgg using a column in the projection that is not part
of the `GROUP BY` clause but is functionally dependent on the primary
key of the referenced table was leading to a problem in dump/restore,
because the wrong dependencies were created, changing the order and way the dump
is generated.
Fixed it by copying the `Query` data structure of the `direct view` and
changing the necessary properties instead of creating it from scratch.
When creating index paths for compressed chunks, `root->eq_classes`
can contain a lot of entries from other relations, which slows down
planning. By filtering them to include only the ECs that belong to
that relation, we can improve plan times significantly when
lots of chunks are involved in the query.
Refactored the Hierarchical Continuous Aggregate regression tests,
including more columns in JOIN tests, and also added an `ORDER BY`
clause to the definition to avoid flaky tests when querying and showing
the result rows.
Creating a Continuous Aggregate with multiple joins, or changing one to
realtime, was leading to a segfault.
Fixed it by dealing properly with the `varno` when creating the `Quals`
for the union view in realtime mode.
Also got rid of some leftovers from when we relaxed the CAgg join
restrictions in #7111.
Don't copy foreign key constraints to the individual chunks; instead,
modify the lookup query to propagate to individual chunks,
mimicking how PostgreSQL does this for partitioned tables.
This patch also removes the requirement for foreign key columns
to be segmentby columns.
The last step of compressing a chunk is cleaning up the uncompressed chunk, and
currently this is done by a `TRUNCATE` that requires an
`AccessExclusiveLock`, preventing concurrent sessions from even `SELECT`ing data
from the hypertable.
With this PR it is possible to execute a `DELETE` instead of a
`TRUNCATE` on the uncompressed chunk, relaxing the lock to a
`RowExclusiveLock`. This new behavior is controlled by a new GUC,
`timescaledb.enable_delete_after_compression`, which is `false` by
default.
The side effect of enabling this behavior is more WAL generation,
because we delete each row from the uncompressed chunk, as well as bloat due
to the large number of dead tuples created.
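A minimal sketch of opting in; the GUC name is from this PR, while the hypertable name `metrics` and the session-level scope are illustrative assumptions:

```sql
-- Switch chunk cleanup from TRUNCATE to DELETE for this session.
SET timescaledb.enable_delete_after_compression = true;

-- Compression now takes a RowExclusiveLock on the uncompressed chunk,
-- so concurrent SELECTs on the hypertable are not blocked.
SELECT compress_chunk(c) FROM show_chunks('metrics') AS c;
```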
Having C function references in the versioned part of the SQL
scripts introduces linking requirements for the update script,
potentially preventing version updates. To prevent this we can
have a dummy function in latest-dev.sql, since it will get
overwritten as the final step of the extension update.
Some more complex queries using CAggs on CTEs were not properly applying
the `cagg_watermark` constify optimization because we restricted it to
simpler queries.
Simplified the code and now only restrict the optimization to `SELECT`
queries.
In #6767 we allowed users to track job execution history by turning on
the new GUC `timescaledb.enable_job_execution_logging`.
Unfortunately we defined this GUC with the `PGC_USERSET` context, but the right
context is `PGC_SIGHUP`, since the setting does not work at the session
and/or database level. It should be changed only via `ALTER SYSTEM SET`
or the configuration files.
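A sketch of the only supported way to change it after this fix (server-wide, followed by a configuration reload):

```sql
-- PGC_SIGHUP context: session- or database-level SET has no effect.
ALTER SYSTEM SET timescaledb.enable_job_execution_logging = 'on';
SELECT pg_reload_conf();
```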
In the current code, `skip_current_tuple` will never be a NULL pointer
in the `ON CONFLICT DO NOTHING` case, but add an additional check nonetheless
to make it safe against future refactoring.
Remove some Continuous Aggregates JOIN restrictions by allowing:
* INNER/LEFT joins;
* LATERAL joins;
* JOINs between 1 hypertable and N tables, foreign tables, views, or
materialized views;
* more than one equality operator on the JOIN clause (a sketch follows this list).
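A sketch of a Continuous Aggregate definition that uses a JOIN; all table and column names here are hypothetical:

```sql
-- Hypothetical schema: one hypertable joined with a regular lookup table.
CREATE TABLE devices (device_id int PRIMARY KEY, location text);
CREATE TABLE conditions (
    time        timestamptz NOT NULL,
    device_id   int,
    temperature double precision
);
SELECT create_hypertable('conditions', 'time');

CREATE MATERIALIZED VIEW conditions_daily
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 day', c.time) AS bucket,
       d.location,
       avg(c.temperature) AS avg_temp
FROM conditions c
JOIN devices d ON c.device_id = d.device_id
GROUP BY bucket, d.location
WITH NO DATA;
```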
On INSERT into compressed chunks with unique constraints we can
check for a conflict without decompressing when no `ON CONFLICT` clause
is present and there is only one unique constraint. With an `ON CONFLICT`
clause with `DO NOTHING`, we can just skip the INSERT if we detect a conflict
and return early. Only for `ON CONFLICT DO UPDATE`/UPSERT do we need
to decompress when there is a constraint conflict.
Doing the optimization in the presence of multiple constraints is
also possible, but is not part of this patch.
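A sketch of the conflict-handling behavior described above, with hypothetical names:

```sql
-- Hypothetical hypertable with a unique constraint, compressed up front.
CREATE TABLE readings (
    time      timestamptz NOT NULL,
    sensor_id int,
    value     double precision,
    UNIQUE (sensor_id, time)
);
SELECT create_hypertable('readings', 'time');
ALTER TABLE readings SET (timescaledb.compress, timescaledb.compress_segmentby = 'sensor_id');

INSERT INTO readings VALUES ('2024-01-01 00:00:00+00', 1, 41.5);
SELECT compress_chunk(c) FROM show_chunks('readings') AS c;

-- DO NOTHING: the conflict is detected from compressed metadata and the insert
-- is skipped without decompressing the batch. Only DO UPDATE still decompresses.
INSERT INTO readings VALUES ('2024-01-01 00:00:00+00', 1, 42.0)
ON CONFLICT (sensor_id, time) DO NOTHING;
```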
The `transparent_decompression` test is flaky because incremental sort
is chosen most of the time, but a normal sort is picked as well when under
load.
Make the test less flaky by turning off incremental sort. The test
exists to verify that DecompressChunk works as intended.
Allow users to specify that ranges (min/max values) be tracked
for a specific column using the `enable_column_stats()` API. We
will store such min/max ranges in a new timescaledb catalog table,
`_timescaledb_catalog.chunk_column_stats`. As of now we support tracking
min/max ranges for the smallint, int, bigint, serial, bigserial, date,
timestamp, and timestamptz data types. Support for other statistics, such as bloom
filters, will be added in the future.
We add an entry of the form (ht_id, invalid_chunk_id, col, -INF, +INF)
into this catalog to indicate that min/max values need to be calculated
for this column of a given hypertable's chunks. We also iterate
through existing chunks and add -INF, +INF entries for them in the
catalog. This allows these chunks to be selected by default, since no
min/max values have been calculated for them yet.
The actual min/max start/end range is calculated later. One of the
entry points, for now, is during compression. The range is stored in
start (inclusive) and end (exclusive) form. If DML happens on a
compressed chunk then, as part of marking it as partial, we also mark
the corresponding catalog entries as "invalid", so that partial chunks do
not get excluded anymore. When recompression happens we get the new
min/max ranges from the uncompressed portion and try to reconcile the
ranges in the catalog based on these new values. This is safe to do in
the case of INSERTs and UPDATEs. In the case of DELETEs, since we are deleting
rows, it's possible that the min/max ranges change, but as of now we
err on the side of caution and retain the earlier values, which can be
larger than the actual range.
We can thus store the min/max values for such columns in this catalog
table at the per-chunk level. Note that these min/max range values do
not participate in partitioning of the data. Such data ranges will be
used for chunk pruning if the WHERE clause of an SQL query specifies
ranges on such a column.
Note that the executor startup-time chunk exclusion logic is also able to
use this metadata effectively.
A "DROP COLUMN" on a column with a statistics tracking enabled on it
ends up removing all relevant entries from the catalog tables.
A "decompress_chunk" on a compressed chunk removes its entries from the
"chunk_column_stats" catalog table since now it's available for DML.
Also a new "disable_column_stats" API has been introduced to allow
removal of min/max entries from the catalog for a specific column.
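As a hedged sketch (this commit message refers to `enable_column_stats`/`disable_column_stats`; the release notes above expose the feature as `enable_chunk_skipping`), the tracked ranges can be inspected in the new catalog table:

```sql
-- Assuming a hypertable "metrics" with a bigint column "metric_id",
-- as in the earlier examples.
SELECT enable_chunk_skipping('metrics', 'metric_id');

-- Entries start out as (-INF, +INF) and get real min/max ranges once
-- the chunks are compressed.
SELECT * FROM _timescaledb_catalog.chunk_column_stats;
```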
This patch changes our build process to no longer link against
OpenSSL directly but instead rely on PostgreSQL linking it.
Linking to OpenSSL directly causes problems when the OpenSSL
version we link against does not match the version PostgreSQL links
against. While this is easy to prevent where we fully control the
build process, it repeatedly causes problems, e.g. in ABI tests.
This patch only changes the behaviour for non-Windows platforms, as
we are running into linker problems on Windows with this change.
Until we can find a workaround for those problems, Windows binaries
still link OpenSSL directly.
With these changes TimescaleDB can be built against PG17. Doing
so still requires `-DEXPERIMENTAL=ON`.
Co-authored-by: Aleksander Alekseev <aleksander@timescale.com>
PG17 changed `attstattarget` to be NULLABLE and changed the default
to NULL. This patch changes the `pg_attribute` handling to produce the same
result against PG17 and previous versions.
4f622503d6
Only decompress batches for compressed UPDATE/DELETE when the batch
actually has tuples that match the query constraints. This will
work even for columns we have no metadata on.
Move compression DML code into a separate file, move code dealing
with ScanKey into a separate file, and move compression algorithm code
into a dedicated subdirectory.
Previously, for INSERTs into compressed chunks with unique constraints,
we would decompress the batch that could contain a tuple matching
the constraints. This patch skips the decompression
if the batch does not contain an actual matching tuple. This patch
adds the optimization for INSERTs with unique constraints.
Similar optimizations for UPDATE and DELETE will be added in follow-up
patches.
In #6377 we fixed an `ORDER BY/GROUP BY expression not found in
targetlist` error by using `root->processed_groupClause` instead of
`parse->groupClause`, due to an optimization introduced in PG16 that
removes redundant grouping and distinct columns.
But it looks like we didn't change all the necessary places, especially our
HashAggregate optimization.
This release contains bug fixes since the 2.15.2 release.
Best practice is to upgrade at the next available opportunity.
**Migrating from self-hosted TimescaleDB v2.14.x and earlier**
After you run `ALTER EXTENSION`, you must run [this SQL
script](https://github.com/timescale/timescaledb-extras/blob/master/utils/2.15.X-fix_hypertable_foreign_keys.sql).
For more details, see the following pull request
[#6797](https://github.com/timescale/timescaledb/pull/6797).
If you are migrating from TimescaleDB v2.15.0, v2.15.1 or v2.15.2, no
changes are required.
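A minimal sketch of the upgrade sequence above in psql, assuming the linked script has been downloaded to the current directory:

```sql
-- Run in a fresh psql session, per database with the extension installed.
ALTER EXTENSION timescaledb UPDATE;
\i 2.15.X-fix_hypertable_foreign_keys.sql
```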
**Bugfixes**
* #7061: Fix the handling of multiple unique indexes in a compressed
INSERT.
* #7080: Fix the `corresponding equivalence member not found` error.
* #7088: Fix the leaks in the DML functions.
* #7035: Fix the error when acquiring a tuple lock on the OSM chunks on
the replica.
**Thanks**
* @Kazmirchuk for reporting the issue about leaks with the functions in
DML.
When PL/pgSQL functions were used in DML queries, we leaked 8KB
for every invocation of that function. This can quickly add up.
The issue was that the `CurTransactionContext` was not getting cleaned
up after every invocation. The reason was that we were inadvertently
allocating a temporary list in that context. Postgres then thought that
this `CurTransactionContext` needed to be reused further and kept it
around. We now use a proper memory context to avoid this.
Fixes #7053
When querying compressed data we determine whether the requested ORDER
can be applied to the underlying query on the compressed data itself.
This happens twice: the first time we decide whether we can push down
the sort, and then we do a recheck when we set up the sort metadata.
Unfortunately those two checks did not agree: the initial check concluded
it is possible but the recheck disagreed. This was due to a bug: when
checking the query properties we mixed up the attnos and used attnos
from the uncompressed chunk and the compressed chunk in the same bitmapset.
If a segmentby column with an equality constraint was present in the WHERE
clause and its attno was identical to a compressed attno of a different
column that was part of the ORDER BY, the recheck would fail.
This patch removes the recheck and relies on the initial assessment
when building sort metadata.
We added a few diagnostic log messages in the compression/decompression
code paths some time ago, and they have been useful in identifying
hotspots in the actual activities. Adding a few more for recompression
now. The `row_compressor_append_sorted_rows` function, which is also used
in recompression, is already logged, so we need just a few log messages
here.
We must never use index column names to try to match relation column
names between different relations as index column names are independent
of relation column names and can get out of sync due to column renames.
This is a small refactoring of how we get the time bucket function Oid from
a view definition. It will be necessary for following PRs that
completely remove the unnecessary catalog metadata table
`continuous_aggs_bucket_function`.
Also added a new SQL function, `cagg_get_bucket_function_info`, to return
all `time_bucket` information based on a user view definition.
* cagg_watermark_concurrent_update is very dependent on the chunk
numbers, and should be run first.
* telemetry_stats should do VACUUM and REINDEX before getting the
statistics, to avoid dependency on how the index was built.
* cagg_migrate_function is missing some `ORDER BY` clauses.