When a compression_policy is executed by a background worker, the policy
should continue to execute even if compressing or decompressing one of
the chunks fails.
Fixes: #4610
Calling `ts_dist_cmd_invoke_on_data_nodes_using_search_path()` function
without an active transaction allows connection invalidation event
happen between applying `search_path` and the actual command
execution, which leads to an error.
This change introduces a way to ignore connection cache invalidations
using `remote_connection_cache_invalidation_ignore()` function.
This work is based on @nikkhils original fix and the problem research.
Fix#4022
This commit gives more visibility into job failures by making the
information regarding a job runtime error available in an extension
table (`job_errors`) that users can directly query.
This commit also adds an infromational view on top of the table for
convenience.
To prevent the `job_errors` table from growing too large,
a retention job is also set up with a default retention interval
of 1 month. The retention job is registered with a custom check
function that requires that a valid "drop_after" interval be provided
in the config field of the job.
After migrate a Continuous Aggregate from the old format to the new
using `cagg_migrate` procedure we end up with the following problems:
* Refresh policy is not copied from the OLD to the NEW cagg;
* Compression setting is not copied from the OLD to the NEW cagg.
Fixed it by properly copying the refresh policy and setting the
`timescaledb.compress=true` flag to the new CAGG.
Fix#4710
Instances upgraded to 2.8.0 will end up with a wrong check constraint
in catalog table `continuous_aggregate_migrate_plan_step`.
Fixed it by removing and adding the constraint with the correct checks.
Fix#4727
Removed the underline character prefix '_' from the parameter names of
the procedure `cagg_migrate`. The new signature is:
cagg_migrate(
IN cagg regclass,
IN override boolean DEFAULT false,
IN drop_old boolean DEFAULT false
)
The primary key for compression_chunk_size was defined as chunk_id,
compressed_chunk_id but other places assumed chunk_id is actually
unique and would error when it was not. Since it makes no sense
to have multiple entries per chunk since that reference would be
to a no longer existing chunk the primary key is changed to chunk_id
only with this patch.
This patch adds a new time_bucket_gapfill function that
allows bucketing in a specific timezone.
You can gapfill with explicit timezone like so:
`SELECT time_bucket_gapfill('1 day', time, 'Europe/Berlin') ...`
Unfortunately this introduces an ambiguity with some previous
call variations when an untyped start/finish argument was passed
to the function. Some queries might need to be adjusted and either
explicitly name the positional argument or resolve the type ambiguity
by casting to the intended type.
This release adds major new features since the 2.7.2 release.
We deem it moderate priority for upgrading.
This release includes these noteworthy features:
* time_bucket now supports bucketing by month, year and timezone
* Improve performance of bulk SELECT and COPY for distributed hypertables
* 1 step CAgg policy management
* Migrate Continuous Aggregates to the new format
**Features**
* #4188 Use COPY protocol in row-by-row fetcher
* #4307 Mark partialize_agg as parallel safe
* #4380 Enable chunk exclusion for space dimensions in UPDATE/DELETE
* #4384 Add schedule_interval to policies
* #4390 Faster lookup of chunks by point
* #4393 Support intervals with day component when constifying now()
* #4397 Support intervals with month component when constifying now()
* #4405 Support ON CONFLICT ON CONSTRAINT for hypertables
* #4412 Add telemetry about replication
* #4415 Drop remote data when detaching data node
* #4416 Handle TRUNCATE TABLE on chunks
* #4425 Add parameter check_config to alter_job
* #4430 Create index on Continuous Aggregates
* #4439 Allow ORDER BY on continuous aggregates
* #4443 Add stateful partition mappings
* #4484 Use non-blocking data node connections for COPY
* #4495 Support add_dimension() with existing data
* #4502 Add chunks to baserel cache on chunk exclusion
* #4545 Add hypertable distributed argument and defaults
* #4552 Migrate Continuous Aggregates to the new format
* #4556 Add runtime exclusion for hypertables
* #4561 Change get_git_commit to return full commit hash
* #4563 1 step CAgg policy management
* #4641 Allow bucketing by month, year, century in time_bucket and time_bucket_gapfill
* #4642 Add timezone support to time_bucket
**Bugfixes**
* #4359 Create composite index on segmentby columns
* #4374 Remove constified now() constraints from plan
* #4416 Handle TRUNCATE TABLE on chunks
* #4478 Synchronize chunk cache sizes
* #4486 Adding boolean column with default value doesn't work on compressed table
* #4512 Fix unaligned pointer access
* #4519 Throw better error message on incompatible row fetcher settings
* #4549 Fix dump_meta_data for windows
* #4553 Fix timescaledb_post_restore GUC handling
* #4573 Load TSL library on compressed_data_out call
* #4575 Fix use of `get_partition_hash` and `get_partition_for_key` inside an IMMUTABLE function
* #4577 Fix segfaults in compression code with corrupt data
* #4580 Handle default privileges on CAggs properly
* #4582 Fix assertion in GRANT .. ON ALL TABLES IN SCHEMA
* #4583 Fix partitioning functions
* #4589 Fix rename for distributed hypertable
* #4601 Reset compression sequence when group resets
* #4611 Fix a potential OOM when loading large data sets into a hypertable
* #4624 Fix heap buffer overflow
* #4627 Fix telemetry initialization
* #4631 Ensure TSL library is loaded on database upgrades
* #4646 Fix time_bucket_ng origin handling
* #4647 Fix the error "SubPlan found with no parent plan" that occurred if using joins in RETURNING clause.
**Thanks**
* @AlmiS for reporting error on `get_partition_hash` executed inside an IMMUTABLE function
* @Creatation for reporting an issue with renaming hypertables
* @janko for reporting an issue when adding bool column with default value to compressed hypertable
* @jayadevanm for reporting error of TRUNCATE TABLE on compressed chunk
* @michaelkitson for reporting permission errors using default privileges on Continuous Aggregates
* @mwahlhuetter for reporting error in joins in RETURNING clause
* @ninjaltd and @mrksngl for reporting a potential OOM when loading large data sets into a hypertable
* @PBudmark for reporting an issue with dump_meta_data.sql on Windows
* @ssmoss for reporting an issue with time_bucket_ng origin handling
This PR introduces a new `distributed` argument to the
create_hypertable() function as well as two new GUC's to
control its default behaviour: timescaledb.hypertable_distributed_default
and timescaledb.hypertable_replication_factor_default.
The main idea of this change is to allow automatic creation
of the distributed hypertables by default.
Timescale 2.7 released a new version of Continuous Aggregate (#4269)
that store the final aggregation state instead of the byte array of
the partial aggregate state, offering multiple opportunities of
optimizations as well a more compact form.
When upgrading to Timescale 2.7, new created Continuous Aggregates
are using the new format, but existing Continuous Aggregates keep
using the format they were defined with.
Created a procedure to upgrade existing Continuous Aggregates from
the old format to the new format, by calling a simple procedure:
test=# CALL cagg_migrate('conditions_summary_daily');
Closes#4424
Previously users had no way to update the check function
registered with add_job. This commit adds a parameter check_config
to alter_job to allow updating the check function field.
Also, previously the signature expected from a check was of
the form (job_id, config) and there was no validation
that the check function given had the correct signature.
This commit removes the job_id as it is not required and
also checks that the check function has the correct signature
when it is registered with add_job, preventing an error being
thrown at job runtime.
Old patch was using old validation functions, but there are already
validation functions that both read and validate the policy, so using
those. Also removing the old `job_config_check` function since that is
no longer use and instead adding a `job_config_check` that calls the
checking function with the configuration.
OSM chunks manage their ranges and the timescale
catalog has dummy ranges for these dimensions.
So the chunk exclusion logic cannot rely on the
timescaledb catalog metadata to exclude an OSM chunk.
At the time of adding or updating policies, it is
checked if the policies are compatible with each
other and to those already on the CAgg.
These checks are:
- refresh and compression policies should not overlap
- refresh and retention policies should not overlap
- compression and retention policies should not overlap
Co-authored-by: Markos Fountoulakis <markos@timescale.com>
-Add infinity for refresh window range
Now to create open ended refresh policy
use +/- infinity for end_offset and star_offset
respectivly for the refresh policy.
-Add remove_all_policies function
This will remove all the policies on a given
CAgg.
-Remove parameter refresh_schedule_interval
-Fix downgrade scripts
-Fix IF EXISTS case
Co-authored-by: Markos Fountoulakis <markos@timescale.com>
This simplifies the process of adding the policies
for the CAggs. Now, with one single sql statements
all the policies can be added for a given CAgg.
Similarly, all the policies can be removed or modified
via single sql statement only.
This also adds a new function as well as a view to show all
the policies on a continuous aggregate.
Add a new metadata table `dimension_partition` which explicitly and
statefully details how a space dimension is split into partitions, and
(in the case of multi-node) which data nodes are responsible for
storing chunks in each partition. Previously, partition and data nodes
were assigned dynamically based on the current state when creating a
chunk.
This is the first in a series of changes that will add more advanced
functionality over time. For now, the metadata table simply writes out
what was previously computed dynamically in code. Future code changes
will alter the behavior to do smarter updates to the partitions when,
e.g., adding and removing data nodes.
The idea of the `dimension_partition` table is to minimize changes in
the partition to data node mappings across various events, such as
changes in the number of data nodes, number of partitions, or the
replication factor, which affect the mappings. For example, increasing
the number of partitions from 3 to 4 currently leads to redefining all
partition ranges and data node mappings to account for the new
partition. Complete repartitioning can be disruptive to multi-node
deployments. With stateful mappings, it is possible to split an
existing partition without affecting the other partitions (similar to
partitioning using consistent hashing).
Note that the dimension partition table expresses the current state of
space partitions; i.e., the space-dimension constraints and data nodes
to be assigned to new chunks. Existing chunks are not affected by
changes in the dimension partition table, although an external job
could rewrite, move, or copy chunks as desired to comply with the
current dimension partition state. As such, the dimension partition
table represents the "desired" space partitioning state.
Part of #4125
In the session timescaledb_post_restore() was called the value for
timescaledb.restoring might not be changed because the reset_val
for the GUC was still on. We have to use explicit SET in this
session to adjust the GUC.
This release is a patch release. We recommend that you upgrade at the
next available opportunity.
Among other things this release fixes several memory leaks, handling
of TOASTed values in GapFill and parameter handling in prepared statements.
**Bugfixes**
* #4517 Fix prepared statement param handling in ChunkAppend
* #4522 Fix ANALYZE on dist hypertable for a set of nodes
* #4526 Fix gapfill group comparison for TOASTed values
* #4527 Handle stats properly for range types
* #4532 Fix memory leak in function telemetry
* #4534 Use explicit memory context with hash_create
* #4538 Fix chunk creation on hypertables with non-default statistics
**Thanks**
* @3a6u9ka, @bgemmill, @hongquan, @stl-leonid-kalmaev and @victor-sudakov for reporting a memory leak
* @hleung2021 and @laocaixw for reporting an issue with parameter handling in prepared statements
A chunk in frozen state cannot be dropped.
drop_chunks will skip over frozen chunks without erroring.
Internal api , drop_chunk will error if you attempt to drop
a chunk without unfreezing it.
This PR also adds a new internal API to unfreeze a chunk.
This PR introduces a new SQL function to associate a
hypertable or continuous agg with a custom job. If
this dependency is setup, the job is automatically
deleted when the hypertable/cagg is dropped.
Add _timescaledb_internal.attach_osm_table_chunk.
This treats a pre-existing foreign table as a
hypertable chunk by adding dummy metadata to the
catalog tables.
In `src/ts_catalog/catalog.c` we explicit define some constraints and
indexes names into `catalog_table_index_definitions` array, but in our
pre-install SQL script for schema definition we don't, so let's be more
explicit here and prevent future surprises.
Add a parameter `drop_remote_data` to `detach_data_node()` which
allows dropping the hypertable on the data node when detaching
it. This is useful when detaching a data node and then immediately
attaching it again. If the data remains on the data node, the
re-attach will fail with an error complaining that the hypertable
already exists.
The new parameter is analogous to the `drop_database` parameter of
`delete_data_node`. The new parameter is `false` by default for
compatibility and ensures that a data node can be detached without
requiring communicating with the data node (e.g., if the data node is
not responding due to a failure).
Closes#4414
Add a parameter `schedule_interval` to retention and
compression policies to allow users to define the schedule
interval. Fall back to previous default if no value is
specified.
Fixes#3806
Postgres knows whether a given aggregate is parallel-safe, and creates
parallel aggregation plans based on that. The `partialize_agg` is a
wrapper we use to perform partial aggregation on data nodes. It is a
pure function that produces serialized aggregation state as a result.
Being pure, it doesn't influence parallel safety. This means we don't
need to mark it parallel-unsafe to artificially disable the parallel
plans for partial aggregation. They will be chosen as usual based on
the parallel-safety of the underlying aggregate function.
This release adds major new features since the 2.6.1 release.
We deem it moderate priority for upgrading.
This release includes these noteworthy features:
* Optimize continuous aggregate query performance and storage
* The following query clauses and functions can now be used in a continuous
aggregate: FILTER, DISTINCT, ORDER BY as well as [Ordered-Set Aggregate](https://www.postgresql.org/docs/current/functions-aggregate.html#FUNCTIONS-ORDEREDSET-TABLE)
and [Hypothetical-Set Aggregate](https://www.postgresql.org/docs/current/functions-aggregate.html#FUNCTIONS-HYPOTHETICAL-TABLE)
* Optimize now() query planning time
* Improve COPY insert performance
* Improve performance of UPDATE/DELETE on PG14 by excluding chunks
This release also includes several bug fixes.
If you are upgrading from a previous version and were using compression
with a non-default collation on a segmentby-column you should recompress
those hypertables.
**Features**
* #4045 Custom origin's support in CAGGs
* #4120 Add logging for retention policy
* #4158 Allow ANALYZE command on a data node directly
* #4169 Add support for chunk exclusion on DELETE to PG14
* #4209 Add support for chunk exclusion on UPDATE to PG14
* #4269 Continuous Aggregates finals form
* #4301 Add support for bulk inserts in COPY operator
* #4311 Support non-superuser move chunk operations
* #4330 Add GUC "bgw_launcher_poll_time"
* #4340 Enable now() usage in plan-time chunk exclusion
**Bugfixes**
* #3899 Fix segfault in Continuous Aggregates
* #4225 Fix TRUNCATE error as non-owner on hypertable
* #4236 Fix potential wrong order of results for compressed hypertable with a non-default collation
* #4249 Fix option "timescaledb.create_group_indexes"
* #4251 Fix INSERT into compressed chunks with dropped columns
* #4255 Fix option "timescaledb.create_group_indexes"
* #4259 Fix logic bug in extension update script
* #4269 Fix bad Continuous Aggregate view definition reported in #4233
* #4289 Support moving compressed chunks between data nodes
* #4300 Fix refresh window cap for cagg refresh policy
* #4315 Fix memory leak in scheduler
* #4323 Remove printouts from signal handlers
* #4342 Fix move chunk cleanup logic
* #4349 Fix crashes in functions using AlterTableInternal
* #4358 Fix crash and other issues in telemetry reporter
**Thanks**
* @abrownsword for reporting a bug in the telemetry reporter and testing the fix
* @jsoref for fixing various misspellings in code, comments and documentation
* @yalon for reporting an error with ALTER TABLE RENAME on distributed hypertables
* @zhuizhuhaomeng for reporting and fixing a memory leak in our scheduler
Postgres changes the internal state format for numeric aggregates
which we materialize in caggs in PG14. This will invalidate the
affected columns when upgrading from PG13 to PG14. This patch
adds a warning to the update script when we encounter this
configuration.
The non-superuser needs to have REPLICATION privileges atleast. A
new function "subscription_cmd" has been added to allow running
subscription related commands on datanodes. This function implicitly
upgrades to the bootstrapped superuser and then performs subscription
creation/alteration/deletion commands. It only accepts subscriptions
related commands and errors out otherwise.
Allow users to specify an explicit "operation_id" while carrying out
a copy/move operation. If it's specified then that is used as the
identifier for the copy/move operation. Otherwise, am implicit id as
before gets created and used.
Add an internal api to drop a single chunk.
This function drops the storage and metadata
associated with the chunk.
Note that chunk dependencies are not affected.
e.g. Continuous aggs are not updated when this chunk
is dropped.
This is an internal function to freeze a chunk
for PG14 and later.
This function sets a chunk status to frozen.
Operations that modify the chunk data
(like insert, update, delete) are not
supported. Frozen chunks can be dropped.
Additionally, chunk status is cached as part of
classify_relation.
First step to remove the re-aggregation for Continuous Aggregates
is to remove the `chunk_id` from the materialization hypertable.
Also added new metadata column named `finalized` to `continuous_cagg`
catalog table in order to store information about the new following
finalized version of Continuous Aggregates that will not need the
partials anymore. This flag is important to maintain backward
compatibility with previous Continuous Aggregate implementation that
requires the `chunk_id` to refresh data properly.
Postgres will prepend pg_temp to the effective search_path if it
is not present in the search_path. While pg_temp will never be
used to look up functions or operators unless explicitly requested
pg_temp will be used to look up relations. Putting pg_temp in
search_path makes sure objects in pg_temp will be considered last
and pg_temp cannot be used to mask existing objects.
This was not caught earlier and is not currently caught by CI
because the check for unqualified casts is currently only in main
branch of pgspot and not yet in the tagged version we use as
part of PR checks.
post_update_cagg_try_repair was created with `CREATE OR REPLACE`
instead of CREATE. Additionally the procedure was created in the
public schema. This patch adjusts the procedure to be created
with `CREATE` and in a timescaledb internal schema.
Found by pgspot.
The query to get the list of saved privileges during extension
upgrade had a bug and only applying the classoid restriction
for a subset of the entries when it should have been applied to
all returned rows leading to a failure during extension update
when init privileges for other classoids existed on any of the
relevant objects.
Add the missing variables to the finalization view of Continuous
Aggregates and the corresponding columns to the materialization table.
Cover the case of targets that contain Aggref nodes and Var nodes
that are outside of the Aggref nodes at the same time.
Stop rebuilding the Continuous Aggregate view with ALTER MATERIALIZED
VIEW. Attempt to repair the view at post-update time instead, and fail
gracefully if it is not possible to do so without raw hypertable schema
or data modifications.
Stop rebuilding the Continuous Aggregate view when switching realtime
aggregation on and off. Instead, manipulate the User View by either:
1. removing the UNION ALL right-hand side and the WHERE clause when
disabling realtime aggregation
2. adding the Direct View to the right of a UNION ALL operator and
defining WHERE clauses with the relevant watermark checks when
enabling realtime aggregation
Fixes#3898
This release is patch release. We recommend that you upgrade at the next available opportunity.
**Bugfixes**
* #3974 Fix remote EXPLAIN with parameterized queries
* #4122 Fix segfault on INSERT into distributed hypertable
* #4142 Ignore invalid relid when deleting hypertable
* #4159 Fix ADD COLUMN IF NOT EXISTS error on compressed hypertable
* #4161 Fix memory handling during scans
* #4186 Fix owner change for distributed hypertable
* #4192 Abort sessions after extension reload
* #4193 Fix relcache callback handling causing crashes
**Thanks**
* @abrownsword for reporting a crash in the telemetry reporter
* @daydayup863 for reporting issue with remote explain