Use ObjectId datum functions in compression settings instead of int32
functions when storing `regclass` types.
Also fix a minor issue where an array for Datum information was using
the wrong size.
This release contains performance improvements and bug fixes since
the 2.14.2 release. We recommend that you upgrade at the next
available opportunity.
In addition, it includes these noteworthy features:
* Support `time_bucket` with `origin` and/or `offset` on Continuous Aggregates (see the sketch after this list)
* Compression improvements:
- Improve expression pushdown
- Add minmax sparse indexes when compressing columns with btree indexes
- Make compression use the defaults functions
- Vectorize filters in WHERE clause that contain text equality operators and LIKE expressions
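A minimal sketch of the new Continuous Aggregate capability, using a hypothetical `conditions` hypertable with a `ts` time column and a `temperature` column:
```
-- `origin` anchors the bucket boundaries; an `offset` argument is supported as well.
CREATE MATERIALIZED VIEW conditions_weekly
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 week', ts, origin => '2000-01-01'::timestamptz) AS bucket,
       avg(temperature) AS avg_temp
FROM conditions
GROUP BY 1
WITH NO DATA;
```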
**Deprecation warning**
* Starting with this release, it is no longer possible to create a Continuous Aggregate using `time_bucket_ng`, and the function will be completely removed in an upcoming release.
* We recommend that users [migrate their old Continuous Aggregate format to the new one](https://docs.timescale.com/use-timescale/latest/continuous-aggregates/migrate/), because support for the old format will be completely removed in a future release, which would prevent them from migrating.
* This is the last release supporting PostgreSQL 13.
**For on-premise users and this release only**, you will need to run [this SQL script](https://github.com/timescale/timescaledb-extras/blob/master/utils/2.15.X-fix_hypertable_foreign_keys.sql) after running `ALTER EXTENSION`. More details can be found in the pull request [#6797](https://github.com/timescale/timescaledb/pull/6797).
**Features**
* #6382 Support for time_bucket with origin and offset in CAggs
* #6696 Improve defaults for compression segment_by and order_by
* #6705 Add sparse minmax indexes for compressed columns that have uncompressed btree indexes
* #6754 Allow DROP CONSTRAINT on compressed hypertables
* #6767 Add metadata table `_timescaledb_internal.bgw_job_stat_history` for tracking job execution history
* #6798 Prevent usage of deprecated time_bucket_ng in CAgg definition
* #6810 Add telemetry for access methods
* #6811 Remove no longer relevant timescaledb.allow_install_without_preload GUC
* #6837 Add migration path for CAggs using time_bucket_ng
* #6865 Update the watermark when truncating a CAgg
**Bugfixes**
* #6617 Fix error in show_chunks
* #6621 Remove metadata when dropping chunks
* #6677 Fix snapshot usage in CAgg invalidation scanner
* #6698 Define meaning of 0 retries for jobs as no retries
* #6717 Fix handling of compressed tables with primary or unique index in COPY path
* #6726 Fix constify cagg_watermark using window function when querying a CAgg
* #6729 Fix NULL start value handling in CAgg refresh
* #6732 Fix CAgg migration with custom timezone / date format settings
* #6752 Remove custom autovacuum setting from compressed chunks
* #6770 Fix plantime chunk exclusion for OSM chunk
* #6789 Fix deletes with subqueries and compression
* #6796 Fix a crash involving a view on a hypertable
* #6797 Fix foreign key constraint handling on compressed hypertables
* #6816 Fix handling of chunks with no constraints
* #6820 Fix a crash when the ts_hypertable_insert_blocker was called directly
* #6849 Use non-orderby compressed metadata in compressed DML
* #6867 Clean up compression settings when deleting compressed cagg
* #6869 Fix compressed DML with constraints of form value OP column
* #6870 Fix bool expression pushdown for queries on compressed chunks
**Thanks**
* @brasic for reporting a crash when the ts_hypertable_insert_blocker was called directly
* @bvanelli for reporting an issue with the jobs retry count
* @djzurawsk for reporting an error when dropping chunks
* @Dzuzepppe for reporting an issue where DELETEs using a subquery on compressed chunks worked incorrectly
* @hongquan for reporting a 'timestamp out of range' error during CAgg migrations
* @kevcenteno for reporting an issue with the show_chunks API showing incorrect output when 'created_before/created_after' was used with time-partitioned columns
* @mahipv for starting work on the job history PR
* @rovo89 for reporting that constifying cagg_watermark did not work when querying a CAgg using a window function
Adding a CAgg refresh policy with `start_offset => '2 months'` and
`end_offset => '0 months'` failed because the refresh window was considered
too small: a policy must cover at least two buckets in the valid time
range of timestamptz.

The problem was that when calculating the bucket width for a variable
bucket size (such as months) we assumed 31 days per month, but when
converting the end offset to an integer using `interval_to_int64` we used
30 days per month. Fixed by aligning the variable bucket size calculation
to also use 30 days per month.
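For illustration, a hedged sketch of the kind of call that used to fail, assuming a CAgg bucketed by 1 month (the CAgg name and schedule interval are hypothetical):
```
-- Old behavior: offsets converted at 30 days/month give a 60-day refresh window,
-- which looked smaller than two 31-day buckets (62 days), so the policy was rejected.
SELECT add_continuous_aggregate_policy('metrics_by_month',
    start_offset      => INTERVAL '2 months',
    end_offset        => INTERVAL '0 months',
    schedule_interval => INTERVAL '12 hours');
```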
We shouldn't reuse job ids, so that it stays easy to recognize the log
entries belonging to a job. We also need to keep the old job around so
that loading dumps from older versions does not break.
This PR introduces the release notes header template using Jinja [1].
It also improves the changelog merge script to include the upcoming
.unreleased/RELEASE_NOTES_HEADER.md.j2, where we'll actually write the
release notes header for the next release.
`NOT column` or `column = false` produce different expression trees than
other boolean expressions, and those were not handled by the qual pushdown
code. This patch enables pushdown for these expressions and also enables
pushdown of OR expressions on compressed chunks.
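For example, with a hypothetical compressed hypertable `events` that has a boolean `is_valid` column and an integer `device_id` column, quals like these can now be pushed down to the compressed chunks:
```
SELECT count(*) FROM events WHERE NOT is_valid;
SELECT count(*) FROM events WHERE is_valid = false OR device_id = 7;
```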
UPDATE/DELETE operations with constraints where the column is on the
right side of the expression and the value on the left side, e.g.
`'a' > column`, were not handled correctly when operating on compressed chunks.
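A hedged sketch of the affected shape, using a hypothetical compressed hypertable `readings`:
```
-- Constant on the left, column on the right: this form was previously mishandled
-- on compressed chunks.
DELETE FROM readings WHERE 100 > battery_level;
UPDATE readings SET status = 'stale' WHERE now() - INTERVAL '7 days' > ts;
```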
When deleting a CAgg whose materialization hypertable had compression
enabled, the compression settings for that hypertable were not removed
along with the CAgg.
In #5261 we cached the Continuous Aggregate watermark value in a
metadata table to improve performance by avoiding computing the watermark
at planning time.

Manual DML operations on a CAgg are not recommended; users should use the
`refresh_continuous_aggregate` procedure instead. However, we do handle
`TRUNCATE` on CAggs by generating the necessary invalidation logs, so it
makes sense to also update the watermark.
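For example (the CAgg name is hypothetical), a truncate followed by a refresh now leaves the stored watermark consistent with the materialized data:
```
TRUNCATE conditions_summary_daily;
CALL refresh_continuous_aggregate('conditions_summary_daily', NULL, NULL);
```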
The function continuous_agg_migrate_to_time_bucket contains a variable
that is used only for asserts. This PR marks this variable as
PG_USED_FOR_ASSERTS_ONLY.
The function time_bucket_ng is deprecated. This PR adds a migration path
for existing CAggs. Since time_bucket and time_bucket_ng use different
origin values, a custom origin is set if needed so that time_bucket
creates the same buckets that time_bucket_ng has created so far.
This intentionally doesn't follow the Postgres guidelines for message and
detail, because by default users won't see the detail and we want to
present a clear message without requiring additional verbosity.
The CAgg error hint regarding adding indexes to non-finalized CAggs was
proposing to recreate the whole CAgg. However, pointing to the migration
function should be the preferred method. This PR changes the error
wording.
Currently, the additional metadata derived from index columns is only
used for qualifier pushdown when querying, but not for decompression
during compressed DML. This patch makes use of this metadata for
compressed DML as well.

This leads to a considerable speedup when deleting or updating compressed
chunks with filters on non-segmentby columns.
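A hedged sketch of a statement that benefits, assuming a hypothetical hypertable `metrics` compressed with `device_id` as segmentby and minmax metadata on `temperature`:
```
-- The per-batch min/max metadata for `temperature` lets the DML path skip compressed
-- batches that cannot contain matching rows instead of decompressing all of them.
DELETE FROM metrics WHERE temperature > 90.0;
```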
So far, we did not load the timezone information into the correct memory
context when fetching a job from the database. As a result, the value was
stored in scratch_mctx and freed too early. This PR moves the value into
the desired memory context.
When `timescaledb.enable_job_execution_logging` is OFF we should track
only errors, but we were not saving the related PID.
Fixed by using MyProcPid when inserting the new tuple into
`_timescaledb_internal.bgw_job_stat_history`.
In #6767 and #6831 we introduced the ability to track job execution
history, including succeeded and failed jobs.

We migrate records from the old `_timescaledb_internal.job_errors` table
to the new `_timescaledb_internal.bgw_job_stat_history` table, but we
missed copying the job information into the JSONB field where we store
detailed information about the job execution.
Currently we finish the execution of some process utility statements
without executing the other hooks in the chain. For that reason, neither
`ts_stat_statements` nor `pg_stat_statements` is able to track some
utility statements, for example COPY ... FROM.

To be able to track them in `ts_stat_statements`, we're introducing
callbacks that allow hooking `pgss_store` from TimescaleDB and storing
information about the execution of those statements.

This PR also adds a new GUC `enable_tss_callbacks=true` to enable or
disable the ability to hook `ts_stat_statements` from TimescaleDB.
In #6767 we introduced the ability to track job execution history,
including succeeded and failed jobs.

The new metadata table `_timescaledb_internal.bgw_job_stat_history` has
two JSONB columns, `config` (storing config information) and `error_data`
(storing the ErrorData information). The problem is that this approach is
not flexible enough for future changes to history recording, so this PR
refactors the current implementation to use a single JSONB column named
`data` that stores more job information in this form:
{
  "job": {
    "owner": "fabrizio",
    "proc_name": "error",
    "scheduled": true,
    "max_retries": -1,
    "max_runtime": "00:00:00",
    "proc_schema": "public",
    "retry_period": "00:05:00",
    "initial_start": "00:05:00",
    "fixed_schedule": true,
    "schedule_interval": "00:00:30"
  },
  "config": {
    "bar": 1
  },
  "error_data": {
    "domain": "postgres-16",
    "lineno": 841,
    "context": "SQL statement \"SELECT 1/0\"\nPL/pgSQL function error(integer,jsonb) line 3 at PERFORM",
    "message": "division by zero",
    "filename": "int.c",
    "funcname": "int4div",
    "proc_name": "error",
    "sqlerrcode": "22012",
    "proc_schema": "public",
    "context_domain": "plpgsql-16"
  }
}
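A hedged example of reading the new layout; only the `data` column is described above, so the query shape is an assumption:
```
SELECT data->'job'->>'proc_name'         AS proc_name,
       data->'error_data'->>'sqlerrcode' AS sqlerrcode,
       data->'error_data'->>'message'    AS message
FROM _timescaledb_internal.bgw_job_stat_history;
```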
Previously we only handled the case of an OSM chunk expanded as a child
of a hypertable, so a direct SELECT on the chunk segfaulted while trying
to access fdw_private, which is managed by OSM.
Since #6325 we constify the watermark value of a CAgg at planning time,
so the planner calls the watermark function only once. This PR removes
the old code that cached the watermark value to speed up repeated calls
of the watermark function.
Use the same logic as PR 6773 while updating hypertable catalog tuples.
PR 6773 addresses chunk catalog updates. We first lock the tuple and
then modify the values and update the locked tuple. Replace
ts_hypertable_update with field specific APIs and use
hypertable_update_catalog_tuple calls consistently.
When foreign key support for compressed chunks was added we moved
the FK constraint from the uncompressed chunk to the compressed chunk as
part of compress_chunk and moved it back as part of decompress_chunk.
With the addition of partially compressed chunks in 2.10.x this approach
was no longer sufficient and the FK constraint needs to be present on
both the uncompressed and the compressed chunk.
While this patch fixes chunks compressed from now on, a migration has to
be run after upgrading TimescaleDB to fix existing chunks affected by
this issue.
The following code will fix any affected hypertables:
```
CREATE OR REPLACE FUNCTION pg_temp.constraint_columns(regclass, int2[]) RETURNS text[] AS
$$
SELECT array_agg(attname) FROM unnest($2) un(attnum) LEFT JOIN pg_attribute att ON att.attrelid=$1 AND att.attnum = un.attnum;
$$ LANGUAGE SQL SET search_path TO pg_catalog, pg_temp;
DO $$
DECLARE
  ht_id int;
  ht regclass;
  chunk regclass;
  con_oid oid;
  con_frelid regclass;
  con_name text;
  con_columns text[];
  chunk_id int;
BEGIN
  -- iterate over all hypertables that have foreign key constraints
  FOR ht_id, ht in
    SELECT
      ht.id,
      format('%I.%I',ht.schema_name,ht.table_name)::regclass
    FROM _timescaledb_catalog.hypertable ht
    WHERE
      EXISTS (
        SELECT FROM pg_constraint con
        WHERE
          con.contype='f' AND
          con.conrelid=format('%I.%I',ht.schema_name,ht.table_name)::regclass
      )
  LOOP
    RAISE NOTICE 'Hypertable % has foreign key constraint', ht;
    -- iterate over all foreign key constraints on the hypertable
    -- and check that they are present on every chunk
    FOR con_oid, con_frelid, con_name, con_columns IN
      SELECT con.oid, con.confrelid, con.conname, pg_temp.constraint_columns(con.conrelid,con.conkey)
      FROM pg_constraint con
      WHERE
        con.contype='f' AND
        con.conrelid=ht
    LOOP
      RAISE NOTICE 'Checking constraint % %', con_name, con_columns;
      -- check that the foreign key constraint is present on the chunk
      FOR chunk_id, chunk IN
        SELECT
          ch.id,
          format('%I.%I',ch.schema_name,ch.table_name)::regclass
        FROM _timescaledb_catalog.chunk ch
        WHERE
          ch.hypertable_id=ht_id
      LOOP
        RAISE NOTICE 'Checking chunk %', chunk;
        IF NOT EXISTS (
          SELECT FROM pg_constraint con
          WHERE
            con.contype='f' AND
            con.conrelid=chunk AND
            con.confrelid=con_frelid AND
            pg_temp.constraint_columns(con.conrelid,con.conkey) = con_columns
        ) THEN
          RAISE WARNING 'Restoring constraint % on chunk %', con_name, chunk;
          PERFORM _timescaledb_functions.constraint_clone(con_oid, chunk);
          INSERT INTO _timescaledb_catalog.chunk_constraint(chunk_id, dimension_slice_id, constraint_name, hypertable_constraint_name) VALUES (chunk_id, NULL, con_name, con_name);
        END IF;
      END LOOP;
    END LOOP;
  END LOOP;
END
$$;
DROP FUNCTION pg_temp.constraint_columns(regclass, int2[]);
```
This PR is a little too big, but it proved difficult to split into parts
because they are all dependent.
* Move the vectorized aggregation into a separate plan node, which
simplifies working with targetlist in DecompressChunk node.
* Add a post-planning hook that replaces the normal partial aggregation
node with the vectorized aggregation node. The advantage of this
compared to planning on Path stage is that we know which columns support
bulk decompression and which filters are vectorized.
* Use the compressed batch API in vectorized aggregation. This
simplifies the code.
* Support vectorized aggregation after vectorized filters.
* Add a simple generic interface for vectorized aggregate functions. For
now the only function is still `sum(int4)`.
* Parallel plans are now used more often, maybe because the old code
didn't add costs for aggregation and just used the costs from
DecompressChunk, so the costs of parallel and non-parallel plans differed
less. The current code does cost-based planning for normal aggregates and
then, after planning, replaces them with vectorized ones, so now we
basically follow the plan choice that Postgres makes for the usual
aggregation.
There is overhead associated with copying the heap tuple and (un)pinning
the respective heap buffers, which becomes apparent in vectorized
aggregation.
Instead, it is enough to copy the by-reference segmentby values to the
per-batch context.
We also still have to copy in the rare case where the compressed data is
inlined into the compressed row rather than toasted.
It is not possible to automatically backport pull requests that make
modifications to workflow files, and since the codespell action has a
hard-coded list of ignored words in its options, any changes to the
ignored codespell words cannot be backported.
This pull request fixes this by using an ignore-words file instead,
which means that adding new words does not require changing a workflow
file, and hence the pull requests can be automatically backported.
The ts_hypertable_insert_blocker function was accessing data from the
trigger context before it was tested that a trigger context actually
exists. This led to a crash when the function was called directly.
Fixes: #6819
When a catalog corruption occurs, and a chunk does not contain any
dimension slices, we crash in ts_dimension_slice_cmp(). This patch adds
a proper check and errors out before the code path is called.
Add telemetry for tracking access methods used, number of pages for
each access method, and number of instances using each access method.
Also introduces a type-based function `ts_jsonb_set_value_by_type` that
can generate correct JSONB based on the PostgreSQL type. It will
generate "bare" values for numerics, and strings for anything else
using the output function for the type.
To test this for string values, we update `ts_jsonb_add_interval` to
use this new function, which is calling the output function for the
type, just like `ts_jsonb_set_value_by_type`.
With a recent change, we updated the lock on decompress_chunk
to take an AccessExclusiveLock on the uncompressed chunk at
the start of this potentially long-running operation. Reducing
this lock to ExclusiveLock enables reads to execute while
we are decompressing the chunk. An AccessExclusiveLock is still
taken on the compressed chunk at the end of the operation,
during its removal.
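A sketch of the operation in question, using a hypothetical hypertable `metrics`; with the reduced lock, concurrent SELECTs on the uncompressed chunk are no longer blocked while it runs:
```
SELECT decompress_chunk(c, if_compressed => true) FROM show_chunks('metrics') c;
```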
The function timescaledb_experimental.time_bucket_ng() has been
deprecated for two years. This PR removes it from the list of bucketing
functions supported in a CAgg. Existing CAggs using this function will
still be supported; however, no new CAggs using this function can be
created.
We don't really need it if we systematically use restrict on the
read/write objects.
This is a minor refactoring to avoid confusion, shouldn't actually
change any behavior or code generation.