Change from using the `compress_using` parameter, which takes a table
access method name, to the boolean parameter
`hypercore_use_access_method`, so that no name has to be provided when
using the table access method for compression.
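A minimal usage sketch based on the parameter names above; the chunk
name is hypothetical, and the assumption is that `compress_chunk()`
accepts the new boolean in place of the old parameter:

    -- Opt in to the table access method without naming it explicitly.
    SELECT compress_chunk('_timescaledb_internal._hyper_1_1_chunk',
                          hypercore_use_access_method => true);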
The `compress_chunk()` function can now be used to create hyperstores
by passing the option `compress_using => 'hyperstore'` to the
function.
Using the `compress_chunk()` function is an alternative to using
`ALTER TABLE my_hyperstore SET ACCESS METHOD` that is compatible with
the existing way of compressing hypertable chunks. It will also make
it easier to support hyperstore compression via compression policies.
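A sketch of the two equivalent paths described above; the chunk name is
hypothetical:

    -- Via the table access method directly:
    ALTER TABLE _timescaledb_internal._hyper_1_1_chunk
        SET ACCESS METHOD hyperstore;

    -- Via the compression API, compatible with the existing way of
    -- compressing hypertable chunks:
    SELECT compress_chunk('_timescaledb_internal._hyper_1_1_chunk',
                          compress_using => 'hyperstore');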
Additionally, implement "fast migration" to hyperstore when a table is
already compressed. In that case, simply update the PG catalog to say
that the table is using hyperstore as TAM without rewriting the
table. This fast migration works with both `...SET ACCESS METHOD`
and `compress_chunk()`.
Implement vacuum by internally calling vacuum on both the compressed
and non-compressed relations.
Since hyperstore indexes are defined on the non-compressed relation,
vacuuming the compressed relation won't clean up compressed tuples
from those indexes. To handle this, a proxy index is defined on each
compressed relation in order to direct index vacuum calls to the
corresponding indexes on the hyperstore relation. The proxy index also
translates the encoded TIDs stored in the index to proper TIDs for the
compressed relation.
Add a check that SELECT FOR UPDATE does not crash, as well as an isolation
test to make sure that it locks rows properly.
Also add a debug function to check whether a TID refers to a compressed tuple.
Implement the table-access method API around compression in order to
have, among other things, seamless index support on compressed data.
The current functionality is rudimentary but common operations work,
including sequence scans.
This release adds support for PostgreSQL 17, significantly improves the
performance of continuous aggregate refreshes,
and contains performance improvements for analytical queries and delete
operations over compressed hypertables.
We recommend that you upgrade at the next available opportunity.
**Highlighted features in TimescaleDB v2.17.0**
* Full PostgreSQL 17 support for all existing features. TimescaleDB
v2.17 is available for PostgreSQL 14, 15, 16, and 17.
* Significant performance improvements for continuous aggregate
policies: continuous aggregate refresh now uses
`merge` instead of deleting old materialized data and re-inserting.
This update can dramatically decrease the amount of data that must be
written to the continuous aggregate when only a small number of changes
is present, reduce the `i/o` cost of refreshing a continuous aggregate,
and generate less Write-Ahead Log (`WAL`).
Overall, continuous aggregate policies will be more lightweight, use
fewer system resources, and complete faster.
* Increased performance for real-time analytical queries over compressed
hypertables:
we are excited to introduce additional Single Instruction, Multiple Data
(`SIMD`) vectorization optimization to our
engine by supporting vectorized execution for queries that group by
the `segment_by` column(s) and aggregate using the basic aggregate
functions (`sum`, `count`, `avg`, `min`, `max`); see the example
queries after this list.
Stay tuned for more to come in follow-up releases! Support for grouping
on additional columns, filtered aggregation,
vectorized expressions, and `time_bucket` is coming soon.
* Improved performance of deletes on compressed hypertables when a large
amount of data is affected.
This improvement speeds up operations that delete whole segments by
skipping the decompression step.
It is enabled for all deletes that filter by the `segment_by` column(s).
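For illustration, a hypothetical compressed hypertable `metrics`
segmented by a `device_id` column would benefit from both improvements
above:

    -- Vectorized aggregation: grouping on the segment_by column with
    -- basic aggregate functions.
    SELECT device_id, sum(value), min(value), max(value)
    FROM metrics
    GROUP BY device_id;

    -- Segment-wise delete: filtering only on the segment_by column lets
    -- whole compressed segments be deleted without decompression.
    DELETE FROM metrics WHERE device_id = 42;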
**PostgreSQL 14 deprecation announcement**
We will continue supporting PostgreSQL 14 until April 2025. Closer to
that time, we will announce the specific
version of TimescaleDB in which PostgreSQL 14 support will not be
included going forward.
**Features**
* #6882: Allow delete of full segments on compressed chunks without
decompression.
* #7033: Use the `merge` statement on continuous aggregate refresh.
* #7126: Add functions to show the compression information.
* #7147: Vectorize partial aggregation for `sum(int4)` with grouping on
`segment by` columns.
* #7204: Track additional extensions in telemetry.
* #7207: Refactor the `decompress_batches_scan` functions for easier
maintenance.
* #7209: Add a function to drop the `osm` chunk.
* #7275: Add support for the `returning` clause for `merge`.
* #7200: Vectorize common aggregate functions like `min`, `max`, `sum`,
`avg`, `stddev`, `variance` for compressed columns
of arithmetic types, when there is grouping on `segment by` columns or
no grouping.
**Bug fixes**
* #7187: Fix the string literal length for the `compressed_data_info`
function.
* #7191: Fix creating default indexes on chunks when migrating the data.
* #7195: Fix the `segment by` and `order by` checks when dropping a
column from a compressed hypertable.
* #7201: Use the generic extension description when building `apt` and
`rpm` loader packages.
* #7227: Add an index to the `compression_chunk_size` catalog table.
* #7229: Fix the foreign key constraints where the index and the
constraint column order are different.
* #7230: Do not propagate the foreign key constraints to the `osm`
chunk.
* #7234: Release the cache after accessing the cache entry.
* #7258: Force English in the `pg_config` command executed by `cmake` to
avoid unexpected build errors.
* #7270: Fix the memory leak in compressed DML batch filtering.
* #7286: Fix the index column check while searching for the index.
* #7290: Add check for null offset for continuous aggregates built on
top of continuous aggregates.
* #7301: Make foreign key behavior for hypertables consistent.
* #7318: Fix chunk skipping range filtering.
* #7320: Set the license specific extension comment in the install
script.
**Thanks**
* @MiguelTubio for reporting and fixing the Windows build error.
* @posuch for reporting the misleading extension description in the generic loader packages.
* @snyrkill for discovering and reporting the issue with continuous
aggregates built on top of continuous aggregates.
---------
Signed-off-by: Pallavi Sontakke <pallavi@timescale.com>
Signed-off-by: Yannis Roussos <iroussos@gmail.com>
Signed-off-by: Sven Klemm <31455525+svenklemm@users.noreply.github.com>
Co-authored-by: Yannis Roussos <iroussos@gmail.com>
Co-authored-by: atovpeko <114177030+atovpeko@users.noreply.github.com>
Co-authored-by: Sven Klemm <31455525+svenklemm@users.noreply.github.com>
Sequence numbers were an optimization for ordering batches based on the
orderby configuration setting. They were used for ordered append and for
avoiding sorting compressed data when it matched the query ordering.
However, now that changes to compressed data are enabled, the
bookkeeping of sequence numbers has become more of a hassle. Removing
them and using the metadata columns for ordering reduces that burden
while keeping all the existing optimizations that relied on the
sequence numbers in place.
During upgrade, the function `remove_dropped_chunk_metadata` is used to
update the metadata tables and remove data for chunks marked as
dropped. The function iterates over the chunks of the provided hypertable
and internally does a sequence scan of the `compression_chunk_size` table
to locate the `compressed_chunk_id`, resulting in quadratic execution
time. This is usually not noticeable for a small number of chunks, but
for a large number of chunks it becomes a problem.
This commit fixes that by adding an index to the `compression_chunk_size`
catalog table, turning the sequence scan into an index scan.
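A minimal sketch of the kind of index this adds; the actual index name
and definition used in the catalog may differ:

    CREATE INDEX compression_chunk_size_idx
        ON _timescaledb_catalog.compression_chunk_size (compressed_chunk_id);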
Add a function that can be used on a compressed data value to show
some metadata information, such as the compression algorithm used and
the presence of any null values.
Don't copy foreign key constraints to the individual chunks; instead,
modify the constraint lookup query to propagate to individual chunks,
mimicking how PostgreSQL does this for partitioned tables.
This patch also removes the requirement for foreign key columns
to be segmentby columns.
Allow users to specify that ranges (min/max values) be tracked
for a specific column using the enable_column_stats() API. We
will store such min/max ranges in a new timescaledb catalog table,
_timescaledb_catalog.chunk_column_stats. As of now we support tracking
min/max ranges for the smallint, int, bigint, serial, bigserial, date,
timestamp, and timestamptz data types. Support for other statistics,
such as bloom filters, will be added in the future.
We add an entry of the form (ht_id, invalid_chunk_id, col, -INF, +INF)
into this catalog to indicate that min/max values need to be calculated
for this column of the given hypertable's chunks. We also iterate
through existing chunks and add -INF, +INF entries for them in the
catalog. This allows these chunks to be selected by default since no
min/max values have been calculated for them.
The actual min/max start/end range is calculated later. One of the
entry points is during compression for now. The range is stored in
start (inclusive) and end (exclusive) form. If DML happens on a
compressed chunk, then as part of marking it as partial we also mark
the corresponding catalog entries as "invalid", so partial chunks do
not get excluded further. When recompression happens, we get the new
min/max ranges from the uncompressed portion and try to reconcile the
ranges in the catalog based on these new values. This is safe to do in
the case of INSERTs and UPDATEs. In the case of DELETEs, since we are
deleting rows, it's possible that the min/max ranges change, but as of
now we err on the side of caution and retain the earlier values, which
can be larger than the actual range.
We can thus store the min/max values for such columns in this catalog
table at the per-chunk level. Note that these min/max range values do
not participate in partitioning of the data. Such data ranges will be
used for chunk pruning if the WHERE clause of an SQL query specifies
ranges on such a column.
Note that executor startup-time chunk exclusion logic is also able to
use this metadata effectively.
A "DROP COLUMN" on a column with statistics tracking enabled on it
removes all relevant entries from the catalog tables.
A "decompress_chunk" on a compressed chunk removes its entries from the
"chunk_column_stats" catalog table since now it's available for DML.
Also a new "disable_column_stats" API has been introduced to allow
removal of min/max entries from the catalog for a specific column.
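A usage sketch based on the API names above; the exact signatures are an
assumption, and the table and column names are hypothetical:

    -- Start tracking min/max ranges for a column; ranges are calculated
    -- at compression time and stored in
    -- _timescaledb_catalog.chunk_column_stats.
    SELECT enable_column_stats('conditions', 'updated_at');

    -- Queries that constrain the tracked column can now prune chunks
    -- using the stored ranges.
    SELECT * FROM conditions
    WHERE updated_at BETWEEN '2024-01-01' AND '2024-01-31';

    -- Stop tracking and remove the column's entries from the catalog.
    SELECT disable_column_stats('conditions', 'updated_at');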
This is a small refactoring for getting the time bucket function Oid from
a view definition. It will be necessary for follow-up PRs that will
completely remove the unnecessary catalog metadata table
`continuous_aggs_bucket_function`.
Also added a new SQL function `cagg_get_bucket_function_info` to return
all `time_bucket` information based on a user view definition.
In #6624 we refactored the time bucket catalog table to make it more
generic and save information for all Continuous Aggregates. Previously
it stored only variable bucket size information.
The problem is we used the `regprocedure` type to store the OID of the
given time bucket function but unfortunately it is not supported by
`pg_upgrade`.
Fixed it by changing the column to TEXT and resolving to/from OID using
the builtin `regprocedurein` and `format_procedure_qualified` functions.
Fixes #6935
We shouldn't reuse job ids, so that it stays easy to recognize the
log entries for a job. We also need to keep the old job around
so as not to break loading dumps from older versions.
The function time_bucket_ng is deprecated. This PR adds a migration path
for existing CAggs. Since time_bucket and time_bucket_ng use different
origin values, a custom origin is set if needed to let time_bucket
create the same buckets that time_bucket_ng has created so far.
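A sketch of the underlying idea; the concrete origin chosen by the
migration depends on the CAgg's bucket definition, and the names and
values below are only illustrative:

    -- time_bucket with an explicit origin can reproduce the buckets that
    -- time_bucket_ng produced with its own default origin.
    SELECT time_bucket('1 month', ts, origin => timestamptz '2000-01-01')
    FROM my_hypertable;  -- my_hypertable and ts are hypothetical names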
In #6767 we introduced the ability to track job execution history,
including succeeded and failed jobs.
The new metadata table `_timescaledb_internal.bgw_job_stat_history` has
two JSONB columns: `config` (stores config information) and `error_data`
(stores the ErrorData information). The problem is that this approach is
not flexible enough for future changes to history recording, so this PR
refactors the current implementation to use only one JSONB column named
`data` that stores more job information in the following form:
{
  "job": {
    "owner": "fabrizio",
    "proc_name": "error",
    "scheduled": true,
    "max_retries": -1,
    "max_runtime": "00:00:00",
    "proc_schema": "public",
    "retry_period": "00:05:00",
    "initial_start": "00:05:00",
    "fixed_schedule": true,
    "schedule_interval": "00:00:30"
  },
  "config": {
    "bar": 1
  },
  "error_data": {
    "domain": "postgres-16",
    "lineno": 841,
    "context": "SQL statement \"SELECT 1/0\"\nPL/pgSQL function error(integer,jsonb) line 3 at PERFORM",
    "message": "division by zero",
    "filename": "int.c",
    "funcname": "int4div",
    "proc_name": "error",
    "sqlerrcode": "22012",
    "proc_schema": "public",
    "context_domain": "plpgsql-16"
  }
}
In #4678 we added an interface for troubleshooting job failures by
logging them in the metadata table `_timescaledb_internal.job_errors`.
With this PR we extended the existing interface to also store succeeded
executions. A new GUC named `timescaledb.enable_job_execution_logging`
was added to control this new behavior; its default value is `false`.
We renamed the metadata table to `_timescaledb_internal.bgw_job_stat_history`
and added a new view `timescaledb_information.job_history` so that users
with enough permissions can check the job execution history.
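A minimal usage sketch based on the GUC and view names above; how the
GUC is best set (per session, `ALTER SYSTEM`, or the configuration file)
depends on its context:

    ALTER SYSTEM SET timescaledb.enable_job_execution_logging = 'on';
    SELECT pg_reload_conf();

    -- Later, inspect both failed and successful runs (requires
    -- sufficient permissions):
    SELECT * FROM timescaledb_information.job_history;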
This changes the behavior of the CAgg catalog tables. From now on, all
CAggs that use a time_bucket function create an entry in the catalog
table continuous_aggs_bucket_function. In addition, the duplicate
bucket_width attribute is removed from the catalog table continuous_agg.
Historically, we have used an empty string for undefined values in the
catalog table continuous_aggs_bucket_function. Since #6624, the optional
arguments can be NULL. This patch cleans up the empty strings and
changes the logic to work with NULL values.
So far, bucket_origin was defined as a Timestamp but used as a
TimestampTz in many places. This commit changes this and unifies the
usage of the variable.
The catalog table continuous_aggs_bucket_function is currently only used
for variable bucket sizes. Information about the fixed-size buckets is
stored in the table continuous_agg only. This causes some problems
(e.g., we have redundant fields for the bucket_size, fixed-size buckets
with offsets are not supported, ...).
This commit is the first in a series of commits that refactor the catalog
for the CAgg time_bucket function. The goals are:
* Remove the CAgg redundant attributes in the catalog
* Create an entry in continuous_aggs_bucket_function for all CAggs
that use time_bucket
This first commit refactors the continuous_aggs_bucket_function table
and prepares it for more generic use. Not all attributes are used yet,
but this will change in follow-up PRs.
Historically we have preserved chunk metadata because the old format of
the Continuous Aggregate has the `chunk_id` column in the materialization
hypertable, so in order not to leave chunk ids behind there we just
mark chunks as dropped when dropping them.
In #4269 we introduced a new Continuous Aggregate format that doesn't
store the `chunk_id` in the materialization hypertable anymore, so it's
safe to also remove the metadata when dropping a chunk if all associated
Continuous Aggregates are in the new format.
Also added a post-update SQL script to clean up unnecessary dropped-chunk
metadata in our catalog.
Closes #6570
This patch deprecates the recompress_chunk procedure as all that
functionality is covered by compress_chunk now. This patch also adds a
new optional boolean argument to compress_chunk to force applying
changed compression settings to existing compressed chunks.
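A usage sketch; the name of the new boolean argument is an assumption
here, since the text above does not give it, and the chunk name is
hypothetical:

    -- Re-apply changed compression settings to an already compressed chunk.
    SELECT compress_chunk('_timescaledb_internal._hyper_1_1_chunk',
                          recompress => true);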
If a lot of chunks are involved, then the current pl/pgsql function
that computes the size of each chunk via a nested loop is pretty slow.
Additionally, the current functionality makes a system call to get the
file size on disk for each chunk every time this function is called.
That again slows things down. We now have an approximate function, which
is implemented in C to avoid the issues in the pl/pgsql function.
Additionally, this function also uses per-backend caching via the
smgr layer to compute the approximate size cheaply. PG cache
invalidation clears the cached size for a chunk when DML happens
on it, so the size cache is able to pick up the latest size within a
matter of minutes. Also, due to the backend caching, any long-running
session will only fetch the latest data for new or modified chunks and
can use the cached data (which is calculated afresh the first time
around) effectively for older chunks.
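A usage sketch; the function name below is an assumption, since the text
above does not name the new C function, and `conditions` is a
hypothetical hypertable:

    -- Approximate total hypertable size, served from the per-backend
    -- smgr-based cache where possible.
    SELECT hypertable_approximate_size('conditions');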
Logging and caching related tables from the timescaledb extension
should not be dumped using pg_dump. Our scripts specify a few such
unwanted tables. Apart from being unnecessary, the "job_errors" table had
some restricted permissions, causing additional problems in pg_dump.
We now don't include such tables for dumping.
Fixes #5449
This patch changes the dump configuration for
_timescaledb_catalog.metadata to include all entries. To allow loading
logical dumps with this configuration, an insert trigger is added that
turns uniqueness conflicts into updates so as not to block the restore.
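A minimal sketch of the conflict-to-update idea, assuming the metadata
table keeps a `key`/`value` pair per entry; the trigger and function
names are hypothetical, and the actual trigger shipped with the
extension may differ:

    CREATE FUNCTION metadata_insert_as_update() RETURNS trigger AS $$
    BEGIN
        -- If the key already exists, turn the insert into an update.
        UPDATE _timescaledb_catalog.metadata SET value = NEW.value
         WHERE key = NEW.key;
        IF FOUND THEN
            RETURN NULL;  -- swallow the conflicting insert
        END IF;
        RETURN NEW;       -- no conflict; proceed with the insert
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER metadata_insert_trigger
        BEFORE INSERT ON _timescaledb_catalog.metadata
        FOR EACH ROW EXECUTE FUNCTION metadata_insert_as_update();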
This patch implements changes to the compressed hypertable to allow per
chunk configuration. To enable this the compressed hypertable can no
longer be in an inheritance tree as the schema of the compressed chunk
is determined by the compression settings. While this patch implements
all the underlying infrastructure changes, the restrictions for changing
compression settings remain intact and will be lifted in a follow-up patch.
The extension state is not easily accessible in release builds, which
makes debugging issues with the loader very difficult. This commit
introduces a new schema `_timescaledb_debug` and makes the function
`ts_extension_get_state` available also in release builds as
`_timescaledb_debug.extension_state`.
See #1682
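A usage sketch based on the schema and function name above:

    -- Inspect the loader's view of the extension state in a release build.
    SELECT _timescaledb_debug.extension_state();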
Remove the code used by multinode to handle remote connections.
This patch completely removes tsl/src/remote and any remaining
distributed hypertable checks.