Allow users to specify that ranges (min/max values) be tracked
for a specific column using the enable_column_stats() API. We
will store such min/max ranges in a new timescaledb catalog table
_timescaledb_catalog.chunk_column_stats. As of now we support tracking
min/max ranges for smallint, int, bigint, serial, bigserial, date,
timestamp and timestamptz data types. Support for other statistics, such
as bloom filters, will be added in the future.
We add an entry of the form (ht_id, invalid_chunk_id, col, -INF, +INF)
into this catalog to indicate that min/max values need to be calculated
for this column for the chunks of a given hypertable. We also iterate
through existing chunks and add -INF, +INF entries for them in the
catalog. This allows for selection of these chunks by default since no
min/max values have been calculated for them.
The actual min/max start/end range is calculated later; for now, one of
the entry points is compression. The range is stored in
start (inclusive) and end (exclusive) form. If DML happens into a
compressed chunk, then as part of marking it as partial we also mark
the corresponding catalog entries as "invalid", so such partial chunks
are no longer excluded. When recompression happens we get the new
min/max ranges from the uncompressed portion and try to reconcile the
ranges in the catalog based on these new values. This is safe to do in
the case of INSERTs and UPDATEs. In the case of DELETEs, since we are
deleting rows, it's possible that the min/max ranges shrink, but as of
now we err on the side of caution and retain the earlier values, which
can be wider than the actual range.
We can thus store the min/max values for such columns in this catalog
table at the per-chunk level. Note that these min/max range values do
not participate in partitioning of the data. Such data ranges will be
used for chunk pruning if the WHERE clause of an SQL query specifies
ranges on such a column.
Note that the executor's startup-time chunk exclusion logic is also able
to use this metadata effectively.
A "DROP COLUMN" on a column with a statistics tracking enabled on it
ends up removing all relevant entries from the catalog tables.
A "decompress_chunk" on a compressed chunk removes its entries from the
"chunk_column_stats" catalog table since now it's available for DML.
Also a new "disable_column_stats" API has been introduced to allow
removal of min/max entries from the catalog for a specific column.
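A minimal usage sketch of the enable/disable APIs and the kind of query
that benefits from the tracked ranges (table and column names are
illustrative, and the exact function signatures may differ):

-- Track min/max ranges for the "device_id" column of a hypertable
SELECT enable_column_stats('conditions', 'device_id');

-- Range predicates on that column can now prune chunks whose stored
-- min/max range does not overlap the requested interval
SELECT * FROM conditions WHERE device_id BETWEEN 100 AND 200;

-- Stop tracking and remove the min/max entries from the catalog
SELECT disable_column_stats('conditions', 'device_id');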
This is a small refactoring for getting the time bucket function Oid from
a view definition. It will be necessary for following PRs to
completely remove the unnecessary catalog metadata table
`continuous_aggs_bucket_function`.
Also added a new SQL function `cagg_get_bucket_function_info` to return
all `time_bucket` information based on a user view definition.
In #6624 we refactored the time bucket catalog table to make it more
generic and save information for all Continuous Aggregates. Previously
it stored only variable bucket size information.
The problem is we used the `regprocedure` type to store the OID of the
given time bucket function but unfortunately it is not supported by
`pg_upgrade`.
Fixed it by changing the column to TEXT and resolving to/from the OID
using the built-in `regprocedurein` and `format_procedure_qualified`
functions.
Fixes #6935
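As an illustration of the text-based storage described above, a
procedure signature round-trips between text and OID (a minimal sketch,
assuming TimescaleDB's time_bucket(interval, timestamptz) is installed
in the public schema):

-- Text signature -> OID (via regprocedurein) -> back to qualified text
SELECT 'public.time_bucket(interval,timestamptz)'::regprocedure::text;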
We shouldn't reuse job ids, so that it stays easy to recognize the
log entries for a given job. We also need to keep the old job around
so as not to break loading dumps from older versions.
The function time_bucket_ng is deprecated. This PR adds a migration path
for existing CAggs. Since time_bucket and time_bucket_ng use different
origin values, a custom origin is set where needed so that time_bucket
creates the same buckets that time_bucket_ng has created so far.
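As a hedged illustration (the necessary origin depends on the bucket
width; table and column names are illustrative): weekly time_bucket_ng
buckets are aligned to 2000-01-01, whereas plain time_bucket aligns
7-day buckets to 2000-01-03, so the migration can pin the old alignment
with an explicit origin:

-- Reproduce time_bucket_ng('7 days', ts) buckets with time_bucket
SELECT time_bucket(INTERVAL '7 days', ts, origin => '2000-01-01'::timestamptz)
FROM conditions;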
In #6767 we introduced the ability to track job execution history
including succeeded and failed jobs.
The new metadata table `_timescaledb_internal.bgw_job_stat_history` has
two JSONB columns, `config` (storing config information) and `error_data`
(storing the ErrorData information). The problem is that this approach is
not flexible for future changes to history recording, so this PR refactors
the current implementation to use only one JSONB column named `data`
that will store more job information in the following form:
{
  "job": {
    "owner": "fabrizio",
    "proc_name": "error",
    "scheduled": true,
    "max_retries": -1,
    "max_runtime": "00:00:00",
    "proc_schema": "public",
    "retry_period": "00:05:00",
    "initial_start": "00:05:00",
    "fixed_schedule": true,
    "schedule_interval": "00:00:30"
  },
  "config": {
    "bar": 1
  },
  "error_data": {
    "domain": "postgres-16",
    "lineno": 841,
    "context": "SQL statement \"SELECT 1/0\"\nPL/pgSQL function error(integer,jsonb) line 3 at PERFORM",
    "message": "division by zero",
    "filename": "int.c",
    "funcname": "int4div",
    "proc_name": "error",
    "sqlerrcode": "22012",
    "proc_schema": "public",
    "context_domain": "plpgsql-16"
  }
}
In #4678 we added an interface for troubleshooting job failures by
logging them in the metadata table `_timescaledb_internal.job_errors`.
With this PR we extended the existing interface to also store succeeded
executions. A new GUC named `timescaledb.enable_job_execution_logging`
was added to control this new behavior; its default value is `false`.
We renamed the metadata table to `_timescaledb_internal.bgw_job_stat_history`
and added a new view `timescaledb_information.job_history` so that users
with enough permissions can check the job execution history.
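A minimal usage sketch (assuming sufficient privileges; depending on the
GUC's context it may instead need to be set via ALTER SYSTEM or
postgresql.conf plus a configuration reload):

-- Opt in to recording successful executions as well as failures
SET timescaledb.enable_job_execution_logging = true;

-- Inspect the recorded executions
SELECT * FROM timescaledb_information.job_history;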
This changes the behavior of the CAgg catalog tables. From now on, all
CAggs that use a time_bucket function create an entry in the catalog
table continuous_aggs_bucket_function. In addition, the duplicate
bucket_width attribute is removed from the catalog table continuous_agg.
Historically, we have used an empty string for undefined values in the
catalog table continuous_aggs_bucket_function. Since #6624, the optional
arguments can be NULL. This patch cleans up the empty strings and
changes the logic to work with NULL values.
So far, bucket_origin was defined as a Timestamp but used as a
TimestampTz in many places. This commit changes this and unifies the
usage of the variable.
The catalog table continuous_aggs_bucket_function is currently only used
for variable bucket sizes. Information about the fixed-size buckets is
stored in the table continuous_agg only. This causes some problems
(e.g., we have redundant fields for the bucket_size, fixed-size buckets
with offsets are not supported, ...).
This commit is the first in a series of commits that refactor the catalog
for the CAgg time_bucket function. The goals are:
* Remove the CAgg redundant attributes in the catalog
* Create an entry in continuous_aggs_bucket_function for all CAggs
that use time_bucket
This first commit refactors the continuous_aggs_bucket_function table
and prepares it for more generic use. Not all attributes are used yet,
but this will change in follow-up PRs.
Historically we have preserved chunk metadata because the old format of
the Continuous Aggregate has the `chunk_id` column in the materialization
hypertable, so in order not to leave dangling chunk ids there we just
mark the chunks as dropped when dropping them.
In #4269 we introduced a new Continuous Aggregate format that doesn't
store the `chunk_id` in the materialization hypertable anymore, so it's
safe to also remove the metadata when dropping a chunk if all associated
Continuous Aggregates are in the new format.
Also added a post-update SQL script to cleanup unnecessary dropped chunk
metadata in our catalog.
Closes #6570
This patch deprecates the recompress_chunk procedure as all that
functionality is covered by compress_chunk now. This patch also adds a
new optional boolean argument to compress_chunk to force applying
changed compression settings to existing compressed chunks.
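A hedged sketch of the consolidated call (assuming the new boolean
argument is named `recompress`; the actual parameter name may differ):

-- Instead of the deprecated recompress_chunk(...), force the changed
-- compression settings to be applied to an already compressed chunk
SELECT compress_chunk('_timescaledb_internal._hyper_1_1_chunk', recompress => true);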
If a lot of chunks are involved then the current pl/pgsql function
to compute the size of each chunk via a nested loop is pretty slow.
Additionally, the current functionality makes a system call to get the
file size on disk for each chunk every time this function is called.
That again slows things down. We now have an approximate function which
is implemented in C to avoid the issues in the pl/pgsql function.
Additionally, this function also uses per backend caching using the
smgr layer to compute the approximate size cheaply. The PG cache
invalidation clears the cached size for a chunk when DML happens
into it. That size cache is thus able to get the latest size in a
matter of minutes. Also, due to the backend caching, any long running
session will only fetch latest data for new or modified chunks and can
use the cached data (which is calculated afresh the first time around)
effectively for older chunks.
Logging and caching related tables from the timescaledb extension
should not be dumped using pg_dump. Our scripts specify a few such
unwanted tables. Apart from being unnecessary, the "job_errors" table had
some restricted permissions, causing additional problems in pg_dump.
We no longer include such tables in dumps.
Fixes #5449
This patch changes the dump configuration for
_timescaledb_catalog.metadata to include all entries. To allow loading
logical dumps with this configuration, an insert trigger is added that
turns uniqueness conflicts into updates so the restore is not blocked.
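A hedged sketch of the general technique (the function and trigger names
here are illustrative, not the actual objects shipped by the extension):

-- Turn a duplicate-key INSERT into an UPDATE of the existing row
CREATE FUNCTION metadata_insert_as_update() RETURNS trigger AS
$$
BEGIN
    IF EXISTS (SELECT 1 FROM _timescaledb_catalog.metadata WHERE key = NEW.key) THEN
        UPDATE _timescaledb_catalog.metadata SET value = NEW.value WHERE key = NEW.key;
        RETURN NULL;  -- swallow the INSERT so no uniqueness conflict is raised
    END IF;
    RETURN NEW;
END
$$ LANGUAGE plpgsql;

CREATE TRIGGER metadata_restore_trigger
    BEFORE INSERT ON _timescaledb_catalog.metadata
    FOR EACH ROW EXECUTE FUNCTION metadata_insert_as_update();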
This patch implements changes to the compressed hypertable to allow per
chunk configuration. To enable this the compressed hypertable can no
longer be in an inheritance tree as the schema of the compressed chunk
is determined by the compression settings. While this patch implements
all the underlying infrastructure changes, the restrictions for changing
compression settings remain intact and will be lifted in a follow-up patch.
The extension state is not easily accessible in release builds, which
makes debugging issues with the loader very difficult. This commit
introduces a new schema `_timescaledb_debug` and makes the function
`ts_extension_get_state` available also in release builds as
`_timescaledb_debug.extension_state`.
See #1682
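For example, the state can now be inspected directly in a release build:

SELECT _timescaledb_debug.extension_state();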
Remove the code used by multinode to handle remote connections.
This patch completely removes tsl/src/remote and any remaining
distributed hypertable checks.
This patch drops the catalog table _timescaledb_catalog.hypertable_compression
and stores those settings in _timescaledb_catalog.compression_settings instead.
The storage format is changed: the new table has 1 entry per relation
instead of 1 entry per column and has no dependency on hypertables.
All other aspects of compression remain the same. This refactoring is
to enable per-chunk compression settings in a follow-up patch.
If a hypertable uses a non-default tablespace for its primary or
unique constraints with additional DEFERRABLE or INITIALLY DEFERRED
characteristics, then any chunk creation will fail with a syntax error.
We now set the tablespace via a separate command for such constraints
on the chunks.
Fixes #6338
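A hedged reproduction of the affected schema (tablespace and table names
are illustrative):

CREATE TABLE conditions (
    time timestamptz NOT NULL,
    device int,
    PRIMARY KEY (time, device) USING INDEX TABLESPACE history_space
        DEFERRABLE INITIALLY DEFERRED
);
SELECT create_hypertable('conditions', 'time');
-- Chunk creation now emits the tablespace as a separate command instead
-- of inlining it into the constraint definition.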
The retention and compression policies can now use drop_created_before
and compress_created_before arguments respectively to specify chunk
selection using their creation times.
We don't support creation times for CAggs, yet.
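For example (a sketch, assuming the standard policy functions):

-- Drop chunks that were created more than six months ago
SELECT add_retention_policy('conditions', drop_created_before => INTERVAL '6 months');

-- Compress chunks that were created more than seven days ago
SELECT add_compression_policy('conditions', compress_created_before => INTERVAL '7 days');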
With this function it is possible to execute the Continuous Aggregate
query validation over an arbitrary query string, without the need to
actually create the Continuous Aggregate.
It can be used, for example, to check the most frequent queries, perhaps
using `pg_stat_statements`, validate them, and check whether there are
queries that could potentially be turned into a Continuous Aggregate.
If users have accidentally been removed from `pg_authid` as a result of
bugs where dropping a user did not revoke privileges from all tables
where they had privileges, it will not be possible to create new chunks
since these require the user to be found when copying the privileges
for the parent table (either compressed hypertable or normal
hypertable).
To fix the situation, we repair the `pg_class` table when updating the
extension by modifying the `relacl` for relations and removing any user
that does not have an entry in `pg_authid`.
A repair function `_timescaledb_functions.repair_relation_acls` is
added that will perform the job. A version of `makeaclitem` from PG16
that accepts a comma-separated list of privileges and is used as part of
the repair is also added as `_timescaledb_functions.makeaclitem`.
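The repair can also be invoked by hand (a sketch; it is normally run as
part of the extension update):

SELECT _timescaledb_functions.repair_relation_acls();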
- Updated the show_chunks and drop_chunks APIs to get the affected
chunks using chunk creation time metadata, based on a
"date/time/interval"-like boundary specified for INTEGER
columns (see the sketch after this list).
- We honor the "integer_now" function if it's specified, so as to keep
backwards compatibility with the existing behavior.
Co-authored-by: Dipesh Pandit <dipesh@timescale.com>
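A hedged sketch (assuming the creation-time boundary is passed via
`created_before`-style arguments; the exact argument names may differ):

-- Hypertable partitioned on an INTEGER column, with chunks selected by
-- their creation time rather than by the partitioning column
SELECT show_chunks('sensor_data', created_before => now() - INTERVAL '7 days');
SELECT drop_chunks('sensor_data', created_before => now() - INTERVAL '30 days');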
When we compress a chunk, we create a new compressed chunk for storing
the compressed data. So far, the tuples were just inserted into the
compressed chunk and frozen by a later vacuum run.
However, freezing tuples causes WAL activity, which can be avoided
because the compressed chunk is created in the same transaction as the tuples.
This patch reduces the WAL activity by storing these tuples directly as
frozen and preventing a freeze operation in the future. This approach is
similar to PostgreSQL's COPY FREEZE.
- Added a creation_time attribute to the timescaledb catalog table "chunk".
Also updated the corresponding view timescaledb_information.chunks to
include a chunk_creation_time attribute (see the example after this list).
- A newly created chunk is assigned the creation time during chunk
creation to handle a new partition range for a given dimension (Time/
SERIAL/BIGSERIAL/INT/...).
- In the case of an already existing chunk, the creation time is updated
as part of running the upgrade script. The current timestamp (now()) at
the time of the upgrade is assigned as the chunk creation time.
- Similarly, the downgrade script is updated to drop the attribute
creation_time from the catalog table "chunk".
- All relevant queries/views/test output have been updated accordingly.
Co-authored-by: Nikhil Sontakke <nikhil@timescale.com>
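For example:

SELECT chunk_name, chunk_creation_time
FROM timescaledb_information.chunks
WHERE hypertable_name = 'conditions';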
The current hypertable creation interface is heavily focused on a time
column, but since hypertable partitioning is not limited to time
columns, we introduce a more generic API that supports different
types of keys for partitioning.
The new interface introduces new versions of create_hypertable,
add_dimension, and a replacement function `set_partitioning_interval`
that replaces `set_chunk_time_interval`. The new functions accept an
instance of dimension_info that can be constructed using constructor
functions `by_range` and `by_hash`, allowing a more versatile and
future-proof API.
For example:
SELECT create_hypertable('conditions', by_range('time'));
SELECT add_dimension('conditions', by_hash('device'));
The old API remains, but will eventually be deprecated.
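The replacement for `set_chunk_time_interval` follows the same pattern
(a sketch; the exact signature may differ):

SELECT set_partitioning_interval('conditions', INTERVAL '1 day');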
This commit introduces a function `hypertable_osm_range_update`
in the _timescaledb_functions schema. This function is meant to serve
as an API call for the OSM extension to update the time range
of a hypertable's OSM chunk with the min and max values present
in the contiguous time range its tiered chunks span.
If the range is not contiguous, then it must be set to the invalid
range an OSM chunk is assigned upon creation.
A new status field is also introduced in the hypertable catalog
table to keep track of whether the ranges covered by tiered and
non-tiered chunks overlap.
When no overlap is detected, it is possible to apply the
Ordered Append optimization in the presence of OSM chunks.
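A hedged sketch of the intended call from OSM (the exact signature is an
assumption here; it may differ):

-- Assumed shape: (hypertable, new range start, new range end)
SELECT _timescaledb_functions.hypertable_osm_range_update(
    'metrics', '2020-01-01'::timestamptz, '2021-01-01'::timestamptz);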
With timescaledb 2.12 all the functions present in _timescaledb_internal
were moved into the _timescaledb_functions schema to improve schema
security. This patch adds a compatibility layer so that external callers
of these internal functions will not break, allowing for more
flexibility when migrating.