The materialized hypertable resides in the _timescaledb_internal schema,
which resulted in a permission error when a non-superuser tried to create
an index manually. To solve this, we now switch to the timescaledb user
before creating indexes on the CAgg.
Fixes#4735
This commit gives more visibility into job failures by making the
information regarding a job runtime error available in an extension
table (`job_errors`) that users can directly query.
This commit also adds an informational view on top of the table for
convenience.
To prevent the `job_errors` table from growing too large,
a retention job is also set up with a default retention interval
of 1 month. The retention job is registered with a custom check
function that requires that a valid "drop_after" interval be provided
in the config field of the job.
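For example, assuming the informational view is exposed as
timescaledb_information.job_errors, recent runtime errors for a job could be
inspected with a query like the following (column names may differ):
-- Inspect recent runtime errors for job 1000 (job id hypothetical).
SELECT job_id, err_message
FROM timescaledb_information.job_errors
WHERE job_id = 1000
ORDER BY finish_time DESC;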
After migrating a Continuous Aggregate from the old format to the new one
using the `cagg_migrate` procedure, we end up with the following problems:
* Refresh policy is not copied from the OLD to the NEW cagg;
* Compression setting is not copied from the OLD to the NEW cagg.
Fixed it by properly copying the refresh policy and setting the
`timescaledb.compress=true` flag on the new CAgg.
Fix#4710
Consider a compressed hypertable with many columns (say more than 600 columns).
In the call to compress_chunk(), the compressed tuple size exceeds 8K, which
causes an error such as "row is too big: size 10856, maximum size 8160."
This patch estimates the tuple size of the compressed hypertable and reports a
warning when compression is enabled on the hypertable. Thus the user becomes
aware of the problem before calling compress_chunk().
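For illustration, the warning would now surface at the point compression is
enabled (table and segmentby setting below are hypothetical):
-- The new size estimate runs here and warns if the compressed
-- tuple may exceed the 8K page limit.
ALTER TABLE wide_metrics SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'device_id'
);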
Fixes#4398
Don't include the database ports used in the test output, as this
will lead to failing tests when running against a local instance
or against a preconfigured cloud instance.
Using multiple different configurations in a single target will
not work when running against a local instance or when running
against a preconfigured cloud instance. With recent adjustments
to the test cleanup this should not be needed anymore, and if we
really need different configurations we should make them a separate
target to keep them compatible with instances configured outside
of pg_regress.
Patch #4425 introduced regression test failures, namely, a crash
in function `ts_bgw_job_update_by_id`. The failures are due to the
COMMIT statement in the custom check procedure. This patch removes
that particular test case from bgw_custom.
The out-of-background-workers test in bgw_db_scheduler is flaky and
fails very often, especially in the 32-bit environment and on Windows.
This patch removes that specific test from bgw_db_scheduler. If we
want to test this specific part of the scheduler, it would be better
rewritten as an isolation test.
Removed the underscore prefix '_' from the parameter names of
the procedure `cagg_migrate`. The new signature is:
cagg_migrate(
    IN cagg regclass,
    IN override boolean DEFAULT false,
    IN drop_old boolean DEFAULT false
)
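For example, using the new parameter names (CAgg name borrowed from the
migration example further down):
CALL cagg_migrate('conditions_summary_daily', override => true, drop_old => true);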
Timescale 2.8 released a migration path from the old format of
Continuous Aggregate to the new format (#4552).
Unfortunately it lacked proper tests for the case where a non-superuser
executes the migration. For a non-superuser to execute the migration properly,
SELECT/INSERT/UPDATE permissions are required on some catalog objects:
* _timescaledb_catalog.continuous_agg_migrate_plan
* _timescaledb_catalog.continuous_agg_migrate_plan_step
* _timescaledb_catalog.continuous_agg_migrate_plan_step_step_id_seq
Improved the regression tests to cover the lack of permissions on the
catalog objects for non-superusers.
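As a sketch (role name hypothetical; the sequence needs USAGE rather than
row-level privileges), the grants a non-superuser would need look like:
GRANT SELECT, INSERT, UPDATE ON _timescaledb_catalog.continuous_agg_migrate_plan TO migration_user;
GRANT SELECT, INSERT, UPDATE ON _timescaledb_catalog.continuous_agg_migrate_plan_step TO migration_user;
GRANT USAGE ON SEQUENCE _timescaledb_catalog.continuous_agg_migrate_plan_step_step_id_seq TO migration_user;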
The scheduler detects the following three types of job failures:
1. Jobs that fail to launch (due to a shortage of background workers)
2. Jobs that throw a runtime error
3. Jobs that crash due to a process crashing
In cases 2 and 3, additive backoff is applied when calculating the next
start time of a failed job.
In case 1 we previously retried launching all jobs that failed to launch
at the same time.
This commit introduces exponential backoff for case 1,
randomly selecting a wait time in [2, 2 + 2^f] seconds at microsecond
granularity, where f is the number of consecutive failed launch attempts.
The aim is to reduce the collision probability for jobs that compete
for a background worker. The maximum backoff value is 1 minute.
It does not change the behavior for cases 2 and 3.
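As an illustration only (not the scheduler's actual C code), the launch-retry
wait can be expressed as the following SQL expression, assuming f counts
consecutive failed launch attempts:
-- wait ~ uniform in [2, 2 + 2^f] seconds, capped at 1 minute
SELECT least(60.0, 2 + random() * power(2, f)) AS wait_seconds
FROM (VALUES (3)) AS t(f);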
Fixes#4562
Using a custom ENUM data type in the GROUP BY clause on a compressed
hypertable raises an error.
Fixed it by checking, while generating scan paths for the query, whether the
SEGMENT BY column is a custom ENUM type and, if so, reporting a valid error
message.
Fixes#3481
This PR introduces a new `distributed` argument to the
create_hypertable() function as well as two new GUCs to
control its default behaviour: timescaledb.hypertable_distributed_default
and timescaledb.hypertable_replication_factor_default.
The main idea of this change is to allow distributed hypertables to be
created automatically by default.
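A usage sketch (the GUC value 'distributed' is an assumption; check the GUC
documentation for the accepted settings):
-- Make new hypertables distributed by default.
SET timescaledb.hypertable_distributed_default = 'distributed';
SET timescaledb.hypertable_replication_factor_default = 1;
-- Or request it explicitly via the new argument.
SELECT create_hypertable('conditions', 'time', distributed => true);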
Timescale 2.7 released a new version of Continuous Aggregate (#4269)
that stores the final aggregation state instead of the byte array of
the partial aggregate state, offering multiple opportunities for
optimization as well as a more compact form.
When upgrading to Timescale 2.7, newly created Continuous Aggregates
use the new format, but existing Continuous Aggregates keep
using the format they were defined with.
Created a procedure to upgrade existing Continuous Aggregates from
the old format to the new format, by calling a simple procedure:
test=# CALL cagg_migrate('conditions_summary_daily');
Closes#4424
Previously users had no way to update the check function
registered with add_job. This commit adds a check_config parameter
to alter_job to allow updating the check function field.
Also, previously the signature expected from a check function was of
the form (job_id, config) and there was no validation
that the given check function had the correct signature.
This commit removes the job_id as it is not required, and
also checks that the check function has the correct signature
when it is registered with add_job, preventing an error from being
thrown at job runtime.
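A sketch of the new flow (job id and procedure name are hypothetical); note
that the check function now only receives the job's config:
-- Check function with the expected single-argument signature.
CREATE OR REPLACE PROCEDURE custom_check(config jsonb)
LANGUAGE plpgsql AS
$$
BEGIN
    IF config IS NULL OR NOT config ? 'drop_after' THEN
        RAISE EXCEPTION 'config must contain drop_after';
    END IF;
END
$$;
-- Point an existing job at the new check function.
SELECT alter_job(1000, check_config => 'custom_check'::regproc);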
The old patch was using old validation functions, but there are already
validation functions that both read and validate the policy, so use
those instead. Also remove the old `job_config_check` function since it is
no longer used, and instead add a `job_config_check` that calls the
check function with the configuration.
The OSM chunk registers a dummy primary dimension range
in the TimescaleDB catalog. Use the max interval of
the dimension instead of the min interval, i.e.
use a range like [Dec 31 294246 PST, infinity).
Otherwise, policies can end up being applied to an OSM
chunk.
Also add a test with policies for OSM chunks.
Fix ALTER TABLE RENAME TO command execution on a distributed
hypertable: make sure the data node list is set and the command is
executed on the data nodes.
Fix#4491
OSM chunks manage their own ranges and the TimescaleDB
catalog has dummy ranges for these dimensions.
So the chunk exclusion logic cannot rely on the
TimescaleDB catalog metadata to exclude an OSM chunk.
Not cleaning up created databases will prevent multiple
regresschecklocal runs against the same instance, because it will
block recreating the test users while they are still referenced in
those databases.
Make truncating an uncompressed chunk also drop its data for the case where
it resides in a corresponding compressed chunk.
Generate invalidations for Continuous Aggregates after TRUNCATE, so
as to have consistent refresh operations on the materialization
hypertable.
Fixes#4362
Change the chunk_utils_internal test to not use the oid but instead use the
role name. Using the oid can lead to failing tests when oid assignment
differs, especially when run with regresschecklocal-t.
The sequence number of the compressed tuple is per segmentby grouping
and should be reset when the grouping changes, to prevent overflows with
many segmentby columns.
If a default privilege is configured and applied to a given Continuous
Aggregate during its creation, only the user view has the ACL properly
configured but not the underlying materialization hypertable, leading to
permission errors.
Fixed it by copying the privileges from the user view to the
materialization hypertable during Continuous Aggregate creation.
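A reproduction sketch (role, schema, and CAgg definition are hypothetical);
with the fix, both the user view and its materialization hypertable end up
with the same ACL:
ALTER DEFAULT PRIVILEGES IN SCHEMA public
    GRANT SELECT ON TABLES TO reporting_role;
CREATE MATERIALIZED VIEW conditions_summary_daily
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 day', time) AS bucket, device, avg(temperature) AS avg_temp
FROM conditions
GROUP BY bucket, device;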
Fixes#4555
At the time of adding or updating policies, it is
checked whether the policies are compatible with each
other and with those already on the CAgg.
These checks are:
- refresh and compression policies should not overlap
- refresh and retention policies should not overlap
- compression and retention policies should not overlap
Co-authored-by: Markos Fountoulakis <markos@timescale.com>
- Add infinity for the refresh window range
Now, to create an open-ended refresh policy,
use +/- infinity for end_offset and start_offset,
respectively, for the refresh policy.
- Add remove_all_policies function
This will remove all the policies on a given
CAgg (see the sketch after this list).
- Remove parameter refresh_schedule_interval
- Fix downgrade scripts
- Fix IF EXISTS case
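A sketch of the call; the function name comes from this note, while the
schema and exact signature are assumed:
SELECT timescaledb_experimental.remove_all_policies('conditions_summary_daily');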
Co-authored-by: Markos Fountoulakis <markos@timescale.com>
This simplifies the process of adding policies
for CAggs. Now, with one single SQL statement,
all the policies can be added for a given CAgg.
Similarly, all the policies can be removed or modified
via a single SQL statement.
This also adds a new function as well as a view to show all
the policies on a continuous aggregate.
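A sketch of what the single-statement form could look like, assuming an
experimental add_policies/show_policies interface (function names, schema,
and parameter names are assumptions, not taken from this note):
SELECT timescaledb_experimental.add_policies(
    'conditions_summary_daily',
    refresh_start_offset => '30 days'::interval,
    refresh_end_offset => '1 day'::interval,
    compress_after => '45 days'::interval,
    drop_after => '1 year'::interval
);
SELECT timescaledb_experimental.show_policies('conditions_summary_daily');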
When a table is added to an inheritance hierarchy, PG checks
whether all check constraints are present on this table. When an OSM chunk
is added as a child of a hypertable with constraints,
make sure that all check constraints are replicated on the child OSM
chunk as well.
Add a new metadata table `dimension_partition` which explicitly and
statefully details how a space dimension is split into partitions, and
(in the case of multi-node) which data nodes are responsible for
storing chunks in each partition. Previously, partitions and data nodes
were assigned dynamically based on the current state when creating a
chunk.
This is the first in a series of changes that will add more advanced
functionality over time. For now, the metadata table simply writes out
what was previously computed dynamically in code. Future code changes
will alter the behavior to do smarter updates to the partitions when,
e.g., adding and removing data nodes.
The idea of the `dimension_partition` table is to minimize changes in
the partition to data node mappings across various events, such as
changes in the number of data nodes, number of partitions, or the
replication factor, which affect the mappings. For example, increasing
the number of partitions from 3 to 4 currently leads to redefining all
partition ranges and data node mappings to account for the new
partition. Complete repartitioning can be disruptive to multi-node
deployments. With stateful mappings, it is possible to split an
existing partition without affecting the other partitions (similar to
partitioning using consistent hashing).
Note that the dimension partition table expresses the current state of
space partitions; i.e., the space-dimension constraints and data nodes
to be assigned to new chunks. Existing chunks are not affected by
changes in the dimension partition table, although an external job
could rewrite, move, or copy chunks as desired to comply with the
current dimension partition state. As such, the dimension partition
table represents the "desired" space partitioning state.
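The current mapping can be inspected directly from the catalog, for example
(column list assumed):
-- One row per space partition, with the data nodes assigned to it.
SELECT dimension_id, range_start, data_nodes
FROM _timescaledb_catalog.dimension_partition
ORDER BY dimension_id, range_start;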
Part of #4125
This change allows creating new dimensions even when there are
existing chunks.
It does not modify any existing data or do a migration;
instead it creates a full-range (-inf/inf) dimension slice for
existing chunks in order to be compatible with the newly created
dimension.
All new chunks created after this will follow the logic of the new
dimension and its partitioning.
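For example, adding a space dimension to a hypertable that already has
chunks now works (table and column are hypothetical); existing chunks get
the full-range slice, new chunks follow the new partitioning:
SELECT add_dimension('conditions', 'device_id', number_partitions => 4);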
Fix: #2818
Users often execute TopN-like queries over Continuous Aggregates, and
now with release 2.7 such queries are even faster because we
removed the re-aggregation and don't store partials anymore.
Also, the previous PR #4430 gave us the ability to create indexes
directly on the aggregated columns, leading to performance improvements.
But there is a noticeable performance difference between
`Materialized-Only` and `Real-Time` Continuous Aggregates for TopN
queries.
Enabling the ORDER BY clause in the Continuous Aggregate definition
(see the sketch after this list) results in:
1) improvements to the user experience, who can now use this very common
clause in SELECT queries
2) performance improvements, because we give the planner a chance to
use the MergeAppend node by producing ordered datasets.
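A sketch of a CAgg definition using the now-allowed ORDER BY (schema is
hypothetical):
CREATE MATERIALIZED VIEW conditions_summary_daily
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 day', time) AS bucket,
       device,
       max(temperature) AS max_temp
FROM conditions
GROUP BY bucket, device
ORDER BY max_temp DESC;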
Closes#4456
When a query has multiple distributed hypertables the row-by-row
fetcher cannot be used. This patch changes the fetcher selection
logic to throw a better error message in those situations.
Previously the following error would be produced in those situations:
unexpected PQresult status 7 when starting COPY mode
Enables adding a boolean column with a default value to a compressed table.
This limitation occurred due to the internal representation of default
boolean values like 'True' or 'False'; hence more checks are added for this.
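For example, the following now works on a hypertable with compressed chunks
(table and column are hypothetical):
ALTER TABLE conditions ADD COLUMN is_active boolean DEFAULT false;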
Fixes#4486
On macOS, zcat expects the file to end in .Z, appending that extension
when the supplied filename does not have it. This leads to the following
error for the dist_copy_long test:
zcat: can't stat: data/prices-10k-random-1.tsv.gz
(data/prices-10k-random-1.tsv.gz.Z): No such file or directory
This patch changes the dist_copy_long test to read the file via the
shell using input redirection, so zcat never sees the
filename.
The current check where we deem a DN incompatible if it's on a newer
version is exactly the opposite of what we want it to be. Fix that and
also add relevant test cases.
A chunk in the frozen state cannot be dropped.
drop_chunks will skip over frozen chunks without erroring.
The internal API drop_chunk will error if you attempt to drop
a chunk without unfreezing it first.
This PR also adds a new internal API to unfreeze a chunk.
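A sketch of the intended flow, assuming the internal functions are named
_timescaledb_internal.unfreeze_chunk and _timescaledb_internal.drop_chunk
(chunk name hypothetical):
-- Unfreeze first, otherwise the internal drop_chunk call errors out.
SELECT _timescaledb_internal.unfreeze_chunk('_timescaledb_internal._hyper_1_1_chunk');
SELECT _timescaledb_internal.drop_chunk('_timescaledb_internal._hyper_1_1_chunk');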