Since the job error log can contain information from many different
sources and also from many different jobs it is important to ensure
that visibility of the job error log entries is restricted to job
owners.
This commit extends the view `timescaledb_information.job_errors` with
role-based checks so that a user can only see entries for jobs that she
has permission to view, and restricts the permissions on
`_timescaledb_internal.job_errors` so that users can only view the job
error log through the view. A special case is added so that the
superuser and the database owner can see all log entries, even if there
is no associated job id with the log entry.
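As a sketch of the intended usage, a job owner can inspect the failures
of her own jobs through the view, while direct access to the underlying
table is restricted (column names are illustrative of the log contents):
  SELECT job_id, sqlerrcode, err_message
  FROM timescaledb_information.job_errors;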
Closes #5217
Enable users to create Hierarchical Continuous Aggregates (aka Continuous
Aggregates on top of other Continuous Aggregates).
With this PR users can create multiple levels of aggregation granularity
in Continuous Aggregates, making the refresh process even faster.
A problem with this feature is that at upper levels we can end up with
the "average of averages". To get the "real average" we can instead rely
on the "stats_agg" TimescaleDB Toolkit function, which calculates and
stores the partials that can be finalized with other Toolkit functions
like "average" and "sum".
Closes #1400
Add a new function, `alter_data_node()`, which can be used to change
the data node's configuration originally set up via `add_data_node()`
on the access node.
The new function introduces a new option "available" that allows
configuring the availability of the data node. Setting
`available=>false` means that the node should no longer be used for
reads and writes. Only read "failover" is implemented as part of this
change, however.
To fail over reads, the alter data node function finds all the chunks
for which the unavailable data node is the "primary" query target and
"fails over" to a chunk replica on another data node instead. If some
chunks do not have a replica to fail over to, a warning will be
raised.
When a data node is available again, the function can be used to
switch back to using the data node for queries.
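For illustration, assuming a data node named `data_node_1` that was
previously set up with `add_data_node()`:
  -- Mark the node as unavailable; reads fail over to chunk replicas
  SELECT * FROM alter_data_node('data_node_1', available => false);
  -- Once the node is reachable again, use it for queries again
  SELECT * FROM alter_data_node('data_node_1', available => true);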
Closes #2104
This patch adds two new fields to the telemetry report,
`stats_by_job_type` and `errors_by_sqlerrcode`. Both report results
grouped by job type (different types of policies or
user-defined actions).
The patch also adds a new field to the `bgw_job_stats` table,
`total_duration_errors` to separate the duration of the failed runs
from the duration of successful ones.
This change introduces a new option to the compression procedure which
decouples the uncompressed chunk interval from the compressed chunk
interval. It does this by allowing multiple uncompressed chunks into one
compressed chunk as part of the compression procedure. The main use-case
is to allow much smaller uncompressed chunks than compressed ones. This
has several advantages:
- Reduce the size of btrees on uncompressed data (thus allowing faster
inserts because those indexes are memory-resident).
- Decrease disk-space usage for uncompressed data.
- Reduce number of chunks over historical data.
From a UX point of view, we simply add a compression WITH clause option
`compress_chunk_time_interval`. The user should set that according to
their needs for constraint exclusion over historical data. Ideally, it
should be a multiple of the uncompressed chunk interval and so we throw
a warning if it is not.
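A minimal sketch, assuming a hypertable `metrics` with a 1 hour chunk
interval (the table name is illustrative):
  ALTER TABLE metrics SET (
    timescaledb.compress,
    timescaledb.compress_chunk_time_interval = '24 hours'
  );
Here 24 hours is a multiple of the 1 hour uncompressed chunk interval,
so no warning is raised.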
Currently, the next start of a scheduled background job is
calculated by adding the `schedule_interval` to its finish
time. This does not allow scheduling jobs to execute at fixed
times, as the next execution is "shifted" by the job duration.
This commit introduces the option to execute a job on a fixed
schedule instead. Users are expected to provide an initial_start
parameter on which subsequent job executions are aligned. The next
start is calculated by computing the next time_bucket of the finish
time with initial_start origin.
An `initial_start` parameter is added to the compression, retention,
reorder and continuous aggregate `add_policy` signatures. By passing
that upon policy creation, users indicate that the policy will execute
on a fixed schedule, or on a drifting schedule if `initial_start` is not
provided.
To allow users to pick a drifting schedule when registering a UDA,
an additional parameter `fixed_schedule` is added to `add_job`;
setting it to false specifies the old behavior.
Additionally, an optional TEXT parameter, `timezone`, is added to both
add_job and add_policy signatures, to address the 1-hour shift in
execution time caused by DST switches. As internally the next start of
a fixed schedule job is calculated using time_bucket, the timezone
parameter allows using timezone-aware buckets to calculate
the next start.
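As a sketch of the resulting API (the procedure name is hypothetical):
  -- Fixed schedule aligned to initial_start, using timezone-aware buckets
  SELECT add_job('my_maintenance_proc', '1 day',
                 initial_start => '2022-01-01 01:00:00 UTC',
                 timezone => 'Europe/Berlin');
  -- Old, drifting behavior
  SELECT add_job('my_maintenance_proc', '1 day', fixed_schedule => false);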
This commit gives more visibility into job failures by making the
information regarding a job runtime error available in an extension
table (`job_errors`) that users can directly query.
This commit also adds an informational view on top of the table for
convenience.
To prevent the `job_errors` table from growing too large,
a retention job is also set up with a default retention interval
of 1 month. The retention job is registered with a custom check
function that requires that a valid "drop_after" interval be provided
in the config field of the job.
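For instance, the retention interval of the error-log cleanup job can
be tuned through its config (the job id below is hypothetical):
  -- Keep job error log entries for two weeks instead of the default month
  SELECT alter_job(2, config => '{"drop_after": "2 weeks"}');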
The primary key for compression_chunk_size was defined as (chunk_id,
compressed_chunk_id), but other places assumed chunk_id is actually
unique and would error when it was not. Since it makes no sense to
have multiple entries per chunk, as any extra reference would point to
a no longer existing chunk, this patch changes the primary key to
chunk_id only.
Timescale 2.7 released a new version of Continuous Aggregate (#4269)
that stores the final aggregation state instead of the byte array of
the partial aggregate state, offering multiple opportunities for
optimization as well as a more compact form.
When upgrading to Timescale 2.7, newly created Continuous Aggregates
use the new format, but existing Continuous Aggregates keep using the
format they were defined with.
This change adds a procedure to upgrade existing Continuous Aggregates
from the old format to the new one with a simple call:
test=# CALL cagg_migrate('conditions_summary_daily');
Closes #4424
The old patch was using old validation functions, but there are already
validation functions that both read and validate the policy, so use
those instead. Also remove the old `job_config_check` function, since it
is no longer used, and instead add a `job_config_check` that calls the
checking function with the configuration.
OSM chunks manage their own ranges, and the TimescaleDB catalog has
dummy ranges for these dimensions, so the chunk exclusion logic cannot
rely on the TimescaleDB catalog metadata to exclude an OSM chunk.
Add a new metadata table `dimension_partition` which explicitly and
statefully details how a space dimension is split into partitions, and
(in the case of multi-node) which data nodes are responsible for
storing chunks in each partition. Previously, partitions and data nodes
were assigned dynamically based on the current state when creating a
chunk.
This is the first in a series of changes that will add more advanced
functionality over time. For now, the metadata table simply writes out
what was previously computed dynamically in code. Future code changes
will alter the behavior to do smarter updates to the partitions when,
e.g., adding and removing data nodes.
The idea of the `dimension_partition` table is to minimize changes in
the partition to data node mappings across various events, such as
changes in the number of data nodes, number of partitions, or the
replication factor, which affect the mappings. For example, increasing
the number of partitions from 3 to 4 currently leads to redefining all
partition ranges and data node mappings to account for the new
partition. Complete repartitioning can be disruptive to multi-node
deployments. With stateful mappings, it is possible to split an
existing partition without affecting the other partitions (similar to
partitioning using consistent hashing).
Note that the dimension partition table expresses the current state of
space partitions; i.e., the space-dimension constraints and data nodes
to be assigned to new chunks. Existing chunks are not affected by
changes in the dimension partition table, although an external job
could rewrite, move, or copy chunks as desired to comply with the
current dimension partition state. As such, the dimension partition
table represents the "desired" space partitioning state.
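For illustration, the current mappings can be inspected directly in the
metadata table (a sketch; assuming the table lives in the
`_timescaledb_catalog` schema):
  SELECT * FROM _timescaledb_catalog.dimension_partition;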
Part of #4125
In `src/ts_catalog/catalog.c` we explicitly define some constraint and
index names in the `catalog_table_index_definitions` array, but in our
pre-install SQL script for the schema definition we don't, so let's be
more explicit here and prevent future surprises.
The first step to remove re-aggregation for Continuous Aggregates
is to remove the `chunk_id` column from the materialization hypertable.
Also add a new metadata column named `finalized` to the `continuous_agg`
catalog table in order to store information about the new finalized
version of Continuous Aggregates that will not need the partials
anymore. This flag is important to maintain backward compatibility with
the previous Continuous Aggregate implementation, which requires the
`chunk_id` to refresh data properly.
Postgres will prepend pg_temp to the effective search_path if it
is not explicitly present in the search_path. While pg_temp will never
be used to look up functions or operators unless explicitly requested,
it will be used to look up relations. Putting pg_temp explicitly at the
end of the search_path makes sure objects in pg_temp are considered
last and pg_temp cannot be used to mask existing objects.
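A minimal sketch of the resulting pattern (the exact statements used in
the scripts may differ):
  -- pg_temp listed explicitly and last, so it cannot mask other objects
  SET search_path TO pg_catalog, pg_temp;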
Improve the performance of metadata scanning during hypertable
expansion.
When a hypertable is expanded to include all children chunks, only the
chunks that match the query restrictions are included. To find the
matching chunks, the planner first scans for all matching dimension
slices. The chunks that reference those slices are the chunks to
include in the expansion.
This change optimizes the scanning for slices by avoiding repeated
open/close of the dimension slice metadata table and index.
At the same time, related dimension slice scanning functions have been
refactored along the same lines.
An index on the chunk constraint metadata table is also changed to
allow scanning on dimension_slice_id. Previously, dimension_slice_id
was the second key in the index, which made scans on this key less
efficient.
This patch locks down search_path in extension install and update
scripts to only contain pg_catalog; this requires that any reference
in those scripts be fully qualified. Additionally, we add explicit
create commands to all update scripts for objects added to the
public schema. This change will make update scripts fail if a
function with an identical signature already exists when installing
or upgrading, instead of reusing the existing object.
TimescaleDB was vulnerable to a privilege escalation attack in
the extension installation script. An attacker could precreate
objects normally owned by the extension and get those objects
used in the installation script since the script would only try
to create them if they did not already exist. Thanks to Pedro
Gallegos for reporting the problem.
This patch changes the schema, table and function creation to fail
and abort the installation when the object already exists instead
of using the existing object.
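A sketch of the pattern change (the schema name is used for
illustration only):
  -- Before: silently reuses a pre-created, possibly attacker-owned object
  CREATE SCHEMA IF NOT EXISTS _timescaledb_internal;
  -- After: the installation aborts if the object already exists
  CREATE SCHEMA _timescaledb_internal;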
Security: CVE-2022-24128
This patch allows using time_bucket_ng("N month", ...) in CAGGs. Users can also
specify years, or months AND years. CAGGs on top of distributed hypertables
are supported as well.
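A minimal sketch, assuming a `conditions` hypertable with `time` and
`temperature` columns and that time_bucket_ng is referenced via the
timescaledb_experimental schema:
  CREATE MATERIALIZED VIEW conditions_monthly
  WITH (timescaledb.continuous) AS
  SELECT timescaledb_experimental.time_bucket_ng('1 month', time) AS bucket,
         avg(temperature) AS avg_temp
  FROM conditions
  GROUP BY timescaledb_experimental.time_bucket_ng('1 month', time);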
A chunk copy/move operation is carried out in stages and it can
fail in any of them. We track the last completed stage in the
"chunk_copy_operation" catalog table. In case of failure, a
"chunk_copy_cleanup" function can be invoked to bring the chunk back
to its original state on the source data node, and all transient
objects like replication slots, publications, subscriptions, empty
chunks, metadata updates, etc. are cleaned up.
Includes test case changes for failures induced at each and every stage.
To avoid confusion between chunk copy activity and chunk copy operation,
this patch also consistently uses "operation" everywhere now instead of
"activity".
Remove copy_chunk_data() function and code needed to support it,
such as the 'transactional' argument.
Rework copy chunk logic using separate stages.
Introduce the copy_chunk() API function as an internal wrapper for
move_chunk().
The building blocks required for implementing end-to-end copy/move
chunk functionality have now been wrapped in a procedure.
A procedure is required because multiple transactions are needed to
carry out the activity across the access node and the involved two data
nodes.
The following steps are encapsulated in this procedure:
1) Create an empty chunk table on the destination data node
2) Copy the data from the src data node chunk to this newly created
destination node chunk. This is done via inbuilt PostgreSQL logical
replication functionality
3) Attach this chunk to the hypertable on the dst data node
4) Remove this chunk from the src data node to complete the move if
requested
A new catalog table "chunk_copy_activity" has been added to track
the progress of the above stages. A unique id gets assigned to each
activity and it is updated with the completed stages as things
progress.
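As a sketch of how the procedure is intended to be invoked (the schema,
procedure name, and chunk name here are assumptions and may differ):
  CALL timescaledb_experimental.move_chunk(
    chunk => '_timescaledb_internal._dist_hyper_1_1_chunk',
    source_node => 'data_node_1',
    destination_node => 'data_node_2');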
The `replication_factor` is set to `-1` on hypertables that are
created on data nodes as part of a larger distributed
hypertable. However, the check constraint on the hypertable metadata
table doesn't allow such values, causing update scripts to fail when
this check constraint is recreated as part of updating to version
`2.0.0-rc4`.
The reason it is possible to insert violating rows is that check
constraints aren't validated when inserting data using PostgreSQL's
internal catalog functions (in C). Therefore, the violating row can
exist until one tries to update a data node to `2.0.0-rc4`, at which
point the update script tries to recreate the `hypertable` metadata
table due to other changes that were made to the table.
This change fixes the check constraint to account for `-1` as a valid
value, and also changes the update scripts to account for the new
check constraint so that updates to the latest version will no longer
fail.
1. Add a compression_state column to the hypertable catalog table
by renaming its compressed column. compression_state is a tri-state
column: it indicates whether the hypertable has compression enabled
(value = 1) or whether it is an internal compression table (value = 2).
2. Save compression settings on the access node when compression
is turned on for a distributed hypertable.
For a distributed hypertable that has compression enabled,
compression_state is set. We don't create any internal tables
on the access node.
Fixes #2660
The `modification_time` column is hard to maintain with any level of
consistency over merges and splits of invalidation ranges so this
commit removes it from the invalidation log entries for both
hypertables and continuous aggregates. If the modification time is
needed in the future, we need to re-introduce it in a manner that can
maintain it over both merges and splits.
The function `ts_get_now_internal` is also removed since it is not used
anymore.
Part of #2521
This patch splits the timescaledb_fdw sql file into two parts to
separate the idempotent parts from the non-idempotent ones so
the function definitions can be included in the regular update
script.
This change removes the catalog options `refresh_lag`,
`max_interval_per_job` and `ignore_invalidation_older_than`, which are
no longer used.
Closes #2396
The parameter `cascade_to_materialization` is removed from
`drop_chunks` and `add_drop_chunks_policy` as well as associated tables
and test functions.
Fixes #2137
This patch adds a proc_name, proc_schema, hypertable_id index to
bgw_job. Three functions using the new index are added as well:
ts_bgw_job_find_by_proc
ts_bgw_job_find_by_hypertable_id
ts_bgw_job_find_by_proc_and_hypertable_id
These functions are required for migrating the existing policies
to store their configuration in bgw_job directly.
This commit removes the `cascade` option from the function
`drop_chunks` and `add_drop_chunk_policy`, which will now never cascade
drops to dependent objects. The tests are fixed accordingly and
verbosity turned up to ensure that the dependent objects are printed in
the error details.
The timescale clustering code so far has been written referring to the
remote databases as 'servers'. This terminology is a bit overloaded,
and in particular we don't enforce any network topology limitations
that the term 'server' would suggest. In light of this we've decided
to change to use the term 'node' when referring to the different
databases in a distributed database. Specifically we refer to the
frontend as an 'access node' and to the backends as 'data nodes',
though we may omit the access or data qualifier where it's unambiguous.
As the vast bulk of the code so far has been written for the case where
there was a single access node, almost all instances of 'server' were
references to data nodes. This change has updated the code to rename
those instances.
This functionality enables users to block or allow creation of new
chunks on a data node for one or more hypertables. Use cases for this
include the ability to block new chunks when a data node is running
low on disk space or to affect chunk distribution across data nodes.
Sometimes blocking data nodes for new chunks can make a hypertable
under-replicated. For that case an additional argument `force => true`
can be supplied to force blocking new chunks.
Here are some examples.
Block for a specific hypertable:
`SELECT * FROM block_new_chunks_on_server('server_1', 'disttable');`
Block for all hypertables on the server:
`SELECT * FROM block_new_chunks_on_server('server_1', force => true);`
Unblock:
`SELECT * FROM allow_new_chunks_on_server('server_1', true);`
This change adds the `force` argument to `detach_server` as well. If
detaching or blocking new chunks will make a hypertable
under-replicated, then `force => true` needs to be used.
This commit adds the ability to resolve whether or not 2PC
transactions have been committed or aborted and also adds a heal
function to resolve transactions that have been prepared but not
committed or rolled back.
This commit also removes the server id of the primary key on the
remote_txn table and adds another index. This was done because the
`remote_txn_persistent_record_exists` should not rely on the server
being contacted but should rather just check for the existence of the
id. This makes the resolution safe in setups where two frontend server
definitions point to the same database. While this may not be a
properly configured setup, it's better if the resolution process is
robust to this case.
The remote_txn table records commit decisions for 2pc transactions.
A successful 2pc transaction will have one row per remote connection
recorded in this table. In effect it is a mapping between the
distributed transaction and an identifier for each remote connection.
The records are needed to protect against crashes after a
frontend sends a `COMMIT TRANSACTION` to one node
but not all nodes involved in the transaction. Towards this end,
the commit of remote_txn rows represents a crash-safe, irrevocable
promise that all participating datanodes will eventually get a `COMMIT
TRANSACTION`, and it occurs before any datanode gets a `COMMIT TRANSACTION`.
The irrevocable nature of the commit of these records means that this
can only happen after the system is sure all participating transactions
will succeed. Thus it can only happen after all datanodes have succeeded
on a `PREPARE TRANSACTION` and will happen as part of the frontend's
transaction commit.
The remote transaction ID is used in two-phase commit. It is the
identifier sent to the datanodes in PREPARE TRANSACTION and related
PostgreSQL commands.
This is the first in a series of commits for adding two-phase
commit support to our distributed txn infrastructure.