Invalidate the catalog snapshot in the scanner to ensure that any
lookups into `pg_catalog` use a snapshot that is consistent with the
snapshot used to scan TimescaleDB metadata.
This fixes an issue where a chunk could be looked up without having a
proper relid filled in, causing an assertion failure
(`ASSERT_IS_VALID_CHUNK`). When a chunk is scanned and found (in
`chunk_tuple_found()`), the Oid of the chunk table is filled in using
`get_relname_relid()`, which could return InvalidOid due to use of a
different snapshot when scanning `pg_class`. Calling
`InvalidateCatalogSnapshot()` before starting the metadata scan in
`Scanner` ensures the pg_catalog snapshot used is refreshed.
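A minimal sketch of the idea (the `Scanner` internals are simplified;
`scanner_start_scan` and the `ScannerCtx` usage are illustrative):

```c
#include "postgres.h"
#include "utils/snapmgr.h"

static void
scanner_start_scan(ScannerCtx *ctx)
{
	/*
	 * Force the next pg_catalog access to take a fresh snapshot so that,
	 * e.g., get_relname_relid() in chunk_tuple_found() sees the same
	 * chunk tables as the metadata scan below.
	 */
	InvalidateCatalogSnapshot();

	/* ... open the metadata relation and begin scanning ... */
}
```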
Due to the difficulty of reproducing this MVCC issue, no regression or
isolation test is provided, but it is easy to hit this bug when doing
highly concurrent `COPY` operations into a distributed hypertable.
The functions `PQconndefaults` and `PQmakeEmptyPGresult` call
`malloc` and can return NULL if they fail to allocate memory for the
defaults and the empty result. This was checked with an `Assert`, but
asserts are removed in production builds.
Replace the `Assert` with checks that generate an error in production
builds rather than dereferencing the pointer and causing a crash.
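A sketch of the pattern for one of the calls (error wording is
illustrative):

```c
PQconninfoOption *options = PQconndefaults();

/*
 * Report the allocation failure in all build types instead of relying
 * on an Assert that is compiled out in production builds.
 */
if (options == NULL)
	ereport(ERROR,
			(errcode(ERRCODE_OUT_OF_MEMORY),
			 errmsg("out of memory")));
```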
This patch allows unique constraints on compressed chunks. When
trying to INSERT into compressed chunks with unique constraints
any potentially conflicting compressed batches will be decompressed
to let postgres do constraint checking on the INSERT.
With this patch, only `INSERT ... ON CONFLICT DO NOTHING` is supported.
For decompression, only segmentby information is considered when
determining conflicting batches. This will be enhanced in a follow-up
patch to also include orderby metadata so that fewer batches need to
be decompressed.
This patch changes the way user-defined FDW options (e.g., startup
costs, per-tuple costs) are handled. Previously, these values were
retrieved in apply_fdw_and_server_options() but reset to default
values afterward.
Problem:
When the GUC timescaledb.license = 'timescale' is set in the conf file
and a SIGHUP is sent to the postgres process, a reload of the tsl
module is triggered.
This reload happens in two phases: 1. tsl_module_load is called, which
loads the module only if it is not already loaded, and
2. ts_module_init is called for every ts_license_guc_assign_hook
invocation, regardless of whether this is a new load. The
ts_module_init initialization function also registers an on_proc_exit
function to be called on exit.
The list of on_proc_exit callbacks is maintained in a fixed array
on_proc_exit_list of size MAX_ON_EXITS (20), which fills up on
repeated SIGHUPs and eventually causes an error.
Fix:
Make ts_module_init() register the on_proc_exit callback only when the
module is reloaded, not on every init call.
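A sketch of the fix, assuming a `new_load` flag and an illustrative
cleanup callback name:

```c
void
ts_module_init(bool new_load)
{
	/* ... per-init setup ... */

	/*
	 * Register the exit callback only when the module was actually
	 * loaded. Registering on every assign-hook invocation would
	 * eventually overflow the fixed-size on_proc_exit_list
	 * (MAX_ON_EXITS entries).
	 */
	if (new_load)
		on_proc_exit(ts_module_cleanup, 0);
}
```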
Closes #5233
The copy fetcher fetches tuples in batches. When the last element in
the batch was the file trailer, it was not handled correctly: the
existing logic did not perform a PQgetCopyData in that case, so the
state of the fetcher was not set to EOF and the copy operation was not
finished correctly at that point.
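A simplified sketch of the corrected loop (the fetcher state handling
is illustrative, not the actual fetcher API):

```c
char   *data;
int		len;

for (;;)
{
	len = PQgetCopyData(conn, &data, /* async = */ 0);

	if (len == -1)
	{
		/*
		 * End of COPY: this branch must be reached even when the file
		 * trailer was the last element of a batch, so the fetcher can
		 * move to EOF and consume the final command result.
		 */
		fetcher->eof = true;		/* illustrative field */
		PQclear(PQgetResult(conn));
		break;
	}

	if (len == -2)
		ereport(ERROR, (errmsg("could not read COPY data: %s",
							   PQerrorMessage(conn))));

	/* ... store the row; the buffer must later be PQfreemem()'d ... */
}
```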
Fixes: #5323
Non-equality operators in a WHERE clause on a SEGMENTBY column of type
text/bytea were not pushed down to the Seq Scan node of the compressed
chunk. This patch fixes that issue.
Fixes #5286
Make `partialize_agg()` support parallel query execution. To make this
work, the finalize node needs to combine the individual partials from
each parallel worker, but the final step that turns the resulting
partial into the finished aggregate should not happen. Thus, in the
case of distributed hypertables, each data node can run a parallel
query to compute a partial, and the access node can later combine and
finalize these partials into the final aggregate. Essentially, there
will be one combine step (minus final) on each data node, and then
another one plus final on the access node.
To implement this, the finalize aggregate plan is simply modified to
elide the final step, and to reserialize the partial. It is only
possible to do this at the plan stage; if done at the path stage, the
PostgreSQL planner will hit assertions that assume that the node has
certain values (e.g., it doesn't expect combine Paths to skip the
final step).
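Conceptually, the plan mutation amounts to flipping the `AggSplit`
flags on the finalize `Agg` node. A hedged sketch using PostgreSQL's
bitmask (the surrounding TimescaleDB plan-walking code is omitted):

```c
#include "nodes/plannodes.h"

static void
make_combine_without_final(Agg *agg)
{
	/*
	 * Combine partials from the parallel workers, but skip the final
	 * function and (re)serialize the partial state so the access node
	 * can combine it again and finalize.
	 */
	agg->aggsplit = (AggSplit) (AGGSPLITOP_COMBINE | AGGSPLITOP_SKIPFINAL |
								AGGSPLITOP_SERIALIZE | AGGSPLITOP_DESERIALIZE);

	/*
	 * The Aggref expressions in the target list need matching aggsplit
	 * values as well (not shown).
	 */
}
```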
Previously we used `date_part('epoch', interval)` and integer division
internally to determine whether the top cagg's interval is a
multiple of its parent's.
This led to precision loss and wrong results
in the case of intervals with sub-second components.
Fixed by using the `ts_interval_value_to_internal` function to convert
intervals to an appropriate integer representation for division.
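A sketch of the check, assuming `ts_interval_value_to_internal` takes
an interval Datum plus its type OID and returns microseconds:

```c
int64	parent_width = ts_interval_value_to_internal(parent_interval,
													 INTERVALOID);
int64	child_width = ts_interval_value_to_internal(child_interval,
													INTERVALOID);

/*
 * Exact int64 arithmetic: no epoch-based floating-point conversion,
 * so sub-second interval components no longer lose precision.
 */
bool	is_multiple = parent_width != 0 &&
	(child_width % parent_width) == 0;
```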
Fixes #5277
This release contains bug fixes since the 2.10.0 release.
We recommend that you upgrade at the next available opportunity.
**Bugfixes**
* #5159 Support Continuous Aggregates names in hypertable_(detailed_)size
* #5226 Fix concurrent locking with chunk_data_node table
* #5317 Fix some incorrect memory handling
* #5336 Use NameData and namestrcpy for names
* #5343 Set PortalContext when starting job
* #5360 Fix uninitialized bucket_info variable
* #5362 Make copy fetcher more async
* #5364 Fix num_chunks inconsistency in hypertables view
* #5367 Fix column name handling in old-style continuous aggregates
* #5378 Fix multinode DML HA performance regression
* #5384 Fix Hierarchical Continuous Aggregates chunk_interval_size
**Thanks**
* @justinozavala for reporting an issue with PL/Python procedures in the background worker
* @Medvecrab for discovering an issue with copying NameData when forming heap tuples.
* @pushpeepkmonroe for discovering an issue in upgrading old-style
continuous aggregates with renamed columns
Concurrent inserts into a distributed hypertable after a data node was
marked as unavailable would produce a `tuple concurrently deleted`
error.
The problem occurred because of missing tuple-level locking during the
scan followed by a concurrent delete from the chunk_data_node table;
the scan should be treated as a `SELECT … FOR UPDATE` case instead.
Based on the fix by @erimatnor.
Fixes #5153
When a Continuous Aggregate is created, the `chunk_interval_size` is
defined by the `chunk_interval_size` of the original hypertable
multiplied by a fixed factor of 10.
The problem is that when we create a Hierarchical Continuous
Aggregate, the same factor is applied again, leading to an
exponentially growing `chunk_interval_size`.
Fixed it by just copying the `chunk_interval_size` from the base
Continuous Aggregate for a Hierarchical Continuous Aggregate.
Fixes #5382
For continuous aggregates with the old-style partial aggregates
renaming columns that are not in the group-by clause will generate an
error when upgrading to a later version. The reason is that it is
implicitly assumed that the name of the column is the same as for the
direct view. This holds true for new-style continuous aggregates, but is
not always true for old-style continuous aggregates. In particular,
columns that are not part of the `GROUP BY` clause can have an
internally generated name.
This commit fixes that by extracting the name of the column from the
partial view and using it when renaming the partial view column and
the materialized table column.
Make the copy fetcher more asynchronous by separating the sending of
the request for data from the receiving of the response. By doing
that, the async append node can send the request to each data node
before it starts reading the first response. This can massively
improve the performance because the response isn't returned until the
remote node has finished executing the query and is ready to return
the first tuple.
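The gist of the change, illustrated with plain libpq calls (the actual
code goes through the connection/fetcher abstractions):

```c
/* Phase 1: send the request to every data node without waiting. */
for (int i = 0; i < num_data_nodes; i++)
	if (PQsendQuery(conns[i], query) == 0)
		ereport(ERROR, (errmsg("%s", PQerrorMessage(conns[i]))));

/*
 * Phase 2: only now block on the first response. The other data nodes
 * are already executing their queries in parallel.
 */
for (int i = 0; i < num_data_nodes; i++)
{
	PGresult *res = PQgetResult(conns[i]);

	/* ... process the result, PQclear(res), drain remaining results ... */
}
```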
Different num_chunks values were reported by
timescaledb_information.hypertables and
timescaledb_information.chunks:
the view definition of hypertables was
not filtering out dropped and OSM chunks.
Fixes #5338
This patch adds the functionality that is needed to perform distributed,
parallel joins on reference tables on access nodes. This code allows the
pushdown of a join if:
* (1) The setting "ts_guc_enable_per_data_node_queries" is enabled
* (2) The outer relation is a distributed hypertable
* (3) The inner relation is marked as a reference table
* (4) The join is a left join or an inner join
This PR introduces a timeout argument and new logic to the
timescale_internal.ping_data_node() function, which makes it possible
to handle I/O timeouts for unresponsive nodes.
Fixes #5312
While running TimescaleDB under valgrind, I found
two cases of incorrect memory handling.
Case 1: When creating the timescaledb extension, during the insertion
of metadata, some memory was written without being zeroed first.
Changes in metadata.c fix this.
Case 2: When executing `GRANT ... ON ALL TABLES IN SCHEMA some_schema`
and deconstructing this statement into grants on individual tables,
the names of those tables were copied incorrectly.
Instead of copying the data itself, the code copied an address to data
on a page in some buffer. When the page in this
buffer changes, the copied address leads to wrong data.
Changes in process_utility.c fix this by allocating memory and then
copying the needed relname there.
Fixes #5311
When executing functions, SPI assumes that `TopTransactionContext` is
used for atomic execution contexts and `PortalContext` is used for
non-atomic contexts. Since jobs need to be able to commit and start
transactions, they execute in a non-atomic context, hence
`PortalContext` is used, but `PortalContext` was not set when
starting the job. This is not a problem for the PL/pgSQL executor, but
it is for other executors such as PL/Python.
This commit fixes the issue by setting the `PortalContext` variable to
the portal context created for the portal and restoring it (to NULL)
after execution.
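A simplified sketch of the fix:

```c
MemoryContext saved_portal_context = PortalContext;

/* Make the portal's context available to SPI for non-atomic execution. */
PortalContext = portal->portalContext;

/* ... run the job procedure, which may COMMIT and start transactions ... */

/* Restore the previous value (NULL when not inside a portal). */
PortalContext = saved_portal_context;
```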
Fixes #5326
This release contains new features and bug fixes since the 2.9.3
release.
This release is high priority for upgrade. We strongly recommend that
you upgrade as soon as possible.
**Features**
* #5241 Allow RETURNING clause when inserting into compressed chunks
* #5245 Manage life-cycle of connections via memory contexts
* #5246 Make connection establishment interruptible
* #5253 Make data node command execution interruptible
* #5243 Enable real-time aggregation for continuous aggregates with
joins
* #5262 Extend enabling compression on a continuous aggregate with
'compress_segmentby' and 'compress_orderby' parameters
**Bugfixes**
* #4926 Fix corruption when inserting into compressed chunks
* #5218 Add role-level security to job error log
* #5214 Fix use of prepared statement in async module
* #5290 Compression can't be enabled on continuous aggregates when
segmentby/orderby columns need quotation
* #5239 Fix next_start calculation for fixed schedules
Using `strlcpy` to copy variables holding PostgreSQL names can cause
issues since names are fixed-size types of length 64. This means that
any data following the initial null-terminated string also becomes
part of the data.
Instead of using `const char*` for PostgreSQL names, use `NameData`
type for PostgreSQL names and use `namestrcpy` to copy them rather
than `strlcpy`.
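The pattern in short (`source_name` stands in for the string being
copied):

```c
NameData	relname;

/*
 * NameData is the fixed-size (NAMEDATALEN, 64 bytes) type PostgreSQL
 * expects wherever a name is stored, so the full buffer belongs to the
 * variable instead of whatever happens to follow a char pointer's
 * null terminator.
 */
namestrcpy(&relname, source_name);
```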
This patch fixes several issues with next_start calculation.
- Previously, the offset was added twice in some cases.
This is fixed by this patch.
- Additionally, schedule intervals with month components
were not handled correctly.
Internally, time_bucket with origin is used to calculate
the next start. However, in the case of month intervals, the
timestamp calculated for a bucket is always aligned on the first
day of the month, regardless of origin.
Therefore, previously the result was aligned with origin by adding
the difference between origin and its respective time bucket.
This difference was computed as a fixed-length interval in terms
of days and time. That computation occasionally produced an incorrect
next start, for example when a job should be executed on
the last day of a month.
That is fixed by adding an appropriate interval of months to
initial_start and letting Postgres handle this computation properly.
Fixes #5216
This release contains bug fixes since the 2.9.2 release.
This release is high priority for upgrade. We strongly recommend that you
upgrade as soon as possible.
**Bugfixes**
* #4804 Skip bucketing when start or end of refresh job is null
* #5108 Fix column ordering in compressed table index not following the order of a multi-column segment by definition
* #5187 Don't enable clang-tidy by default
* #5255 Fix year not being considered as a multiple of day/month in hierarchical continuous aggregates
* #5259 Lock down search_path in SPI calls
The function to execute remote commands on data nodes used a blocking
libpq API that doesn't integrate with PostgreSQL interrupt handling,
making it impossible for a user or statement timeout to cancel a
remote command.
Refactor the remote command execution function to use a non-blocking
API and integrate with PostgreSQL signal handling via WaitEventSets.
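A condensed sketch of the non-blocking pattern (simplified from the
actual implementation):

```c
if (PQsendQuery(conn, cmd) == 0)
	ereport(ERROR, (errmsg("%s", PQerrorMessage(conn))));

while (PQisBusy(conn))
{
	int rc = WaitLatchOrSocket(MyLatch,
							   WL_LATCH_SET | WL_SOCKET_READABLE |
							   WL_EXIT_ON_PM_DEATH,
							   PQsocket(conn),
							   -1L,
							   PG_WAIT_EXTENSION);

	if (rc & WL_LATCH_SET)
	{
		ResetLatch(MyLatch);
		/* Reacts to statement_timeout and query cancellation. */
		CHECK_FOR_INTERRUPTS();
	}

	if ((rc & WL_SOCKET_READABLE) && PQconsumeInput(conn) == 0)
		ereport(ERROR, (errmsg("%s", PQerrorMessage(conn))));
}

PGresult *res = PQgetResult(conn);
```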
Partial fix for #4958.
Previously all intervals were converted to seconds using "epoch"
with date_part. However, this treats a year as 365.25 days to
account for leap years, leading to the unexpected situation that
a year is not a multiple of a day or a month.
Fixed by treating month-only intervals as multiples of 30 days.
Fixes #5231
Broken code caused the async connection module to never send queries
using prepared statements. Instead, queries were sent as
parameterized query statements.
Fix this so that prepared statements are used once created.
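Illustrated with plain libpq (statement name and query text are
placeholders):

```c
/* Create the named prepared statement once... */
if (PQsendPrepare(conn, "ts_prep_1", "SELECT * FROM t WHERE id = $1",
				  1, NULL) == 0)
	ereport(ERROR, (errmsg("%s", PQerrorMessage(conn))));
/* ... drain the PGresult of the PREPARE ... */

/*
 * ...and afterwards execute it by name instead of re-sending the
 * parameterized query text.
 */
if (PQsendQueryPrepared(conn, "ts_prep_1", 1, param_values,
						NULL, NULL, 0) == 0)
	ereport(ERROR, (errmsg("%s", PQerrorMessage(conn))));
```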
Refactor the data node connection establishment so that it is
interruptible, e.g., by ctrl-c or `statement_timeout`.
Previously, the connection establishment used blocking libpq calls. By
instead using asynchronous connection APIs and integrating with
PostgreSQL interrupt handling, the connection establishment can be
canceled by an interrupt caused by a statement timeout or a user.
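The shape of the interruptible variant, using libpq's asynchronous
connection API (simplified):

```c
PGconn	   *conn = PQconnectStart(conninfo);
PostgresPollingStatusType status = PGRES_POLLING_WRITING;

while (status != PGRES_POLLING_OK && status != PGRES_POLLING_FAILED)
{
	int events = (status == PGRES_POLLING_READING) ? WL_SOCKET_READABLE
												   : WL_SOCKET_WRITEABLE;
	int rc = WaitLatchOrSocket(MyLatch,
							   WL_LATCH_SET | events | WL_EXIT_ON_PM_DEATH,
							   PQsocket(conn),
							   -1L,
							   PG_WAIT_EXTENSION);

	if (rc & WL_LATCH_SET)
	{
		ResetLatch(MyLatch);
		/* A statement timeout or Ctrl-C aborts the attempt here. */
		CHECK_FOR_INTERRUPTS();
	}

	status = PQconnectPoll(conn);
}
```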
Fixes #2757
Tie the life cycle of a data node connection to the memory context it
is created on. Previously, a data node connection was automatically
closed at the end of a transaction, although often a connection needs
to live beyond a single transaction. For example, a connection cache
is maintained for data node connections, and, for such cases, a flag
was set on a connection to avoid closing it automatically.
Instead of tying connections to transactions, they are now life-cycle
managed via memory contexts. This simplifies the handling of
connections and avoids having to create exceptions to closing
connections at transaction end.
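A sketch of the mechanism with an illustrative wrapper struct:

```c
typedef struct TSConnectionWrapper
{
	PGconn			   *pgconn;
	MemoryContextCallback cb;
} TSConnectionWrapper;

static void
connection_cleanup(void *arg)
{
	TSConnectionWrapper *wrapper = arg;

	/* Invoked when the owning memory context is reset or deleted. */
	PQfinish(wrapper->pgconn);
}

static TSConnectionWrapper *
create_connection(MemoryContext mcxt, const char *conninfo)
{
	TSConnectionWrapper *wrapper =
		MemoryContextAllocZero(mcxt, sizeof(*wrapper));

	wrapper->pgconn = PQconnectdb(conninfo);
	wrapper->cb.func = connection_cleanup;
	wrapper->cb.arg = wrapper;
	MemoryContextRegisterResetCallback(mcxt, &wrapper->cb);

	return wrapper;
}
```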
Since the job error log can contain information from many different
sources and also from many different jobs, it is important to ensure
that visibility of the job error log entries is restricted to job
owners.
This commit extends the view `timescaledb_information.job_errors` with
role-based checks so that a user can only see entries for jobs that she
has permission to view, and restricts the permissions on
`_timescaledb_internal.job_errors` so that users can only view the job
error log through the view. A special case is added so that the
superuser and the database owner can see all log entries, even if
there is no job id associated with the log entry.
Closes #5217
Recent refactorings in the INSERT into compressed chunk code
path allowed us to support this feature but the check to
prevent users from using this feature was not removed as part
of that patch. This patch removes the blocker code and adds a
minimal test case.
When the start or end for a refresh job is null, bucketing gives an
error because start and end are already the min and
max timestamp values before bucketing. Hence, skip bucketing in
this case.
Partly fixes #4117
Enable support for joins in the query used for creating
continuous aggregates. The following restrictions apply:
1. The join can involve only one hypertable and one normal table
2. The join must be an inner join
3. The join condition can only be equality
This release contains bug fixes since the 2.9.1 release.
We recommend that you upgrade at the next available opportunity.
**Bugfixes**
* #5114 Fix issue with deleting data node and dropping the database on multi-node
* #5133 Fix creating a CAgg on a CAgg where the time column is in a different order of the original hypertable
* #5152 Fix adding column with NULL constraint to compressed hypertable
* #5170 Fix CAgg on CAgg variable bucket size validation
* #5180 Fix default data node availability status on multi-node
* #5181 Fix ChunkAppend and ConstraintAwareAppend with TidRangeScan child subplan
* #5193 Fix repartition behavior when attaching data node on multi-node
The previous attempt to fix it (PR #5130) was not entirely correct
because the bucket width calculation for interval widths was wrong.
Fixed it by properly calculating the bucket width for intervals using
the Postgres internal function `interval_part` to extract the epoch of
the interval and execute the validations. For integer widths, use the
already calculated bucket width.
Fixes #5158, #5168
When deleting a data node with the option `drop_database=>true`, the
database is deleted even if the command fails.
Fix this behavior by dropping the remote database at the end of the
delete data node operation so that other checks fail first.
During the creation of a CAgg on top of another CAgg, we check whether
the bucket width is a multiple of the parent's, and for this
arithmetic we assumed that picking just the `month` part of the
`interval` for variable bucket sizes was enough.
This assumption was wrong for bucket sizes that don't have a month
part (i.e., when using timezones), leading to a division-by-zero error
because in that case the `month` part of the `interval` is 0 (zero).
Fixed it by properly calculating the bucket width for variable-sized
buckets.
Fixes #5126
Creating a CAgg on a CAgg where the time column is in a different
order than in the original hypertable raised the following exception:
`ERROR: time bucket function must reference a hypertable dimension
column`
The problem was that during validation we initialized an internal data
structure with the wrong hypertable metadata.
Fixes #5131
Adding a new column with a NULL constraint to a compressed hypertable
raised an error, but this makes no sense because NULL constraints in
Postgres do nothing; they are useless and exist just for
compatibility with other database systems:
https://www.postgresql.org/docs/current/ddl-constraints.html#id-1.5.4.6.6
Fixed it by ignoring NULL constraints when we check `ALTER TABLE
.. ADD COLUMN ..` on a compressed hypertable.
Fixes #5151
This release contains bug fixes since the 2.9.0 release.
This release is high priority for upgrade. We strongly recommend that you
upgrade as soon as possible.
**Bugfixes**
* #5072 Fix CAgg on CAgg bucket size validation
* #5101 Fix enabling compression on caggs with renamed columns
* #5106 Fix building against PG15 on Windows
* #5117 Fix postgres server restart on background worker exit
* #5121 Fix privileges for job_errors in update script
This patch changes INSERTs into compressed chunks to no longer
be immediately compressed but stored in the uncompressed chunk
instead and later merged with the compressed chunk by a separate
job.
This greatly simplifies the INSERT code path, as we no longer have
to rewrite the target of INSERTs and compress on the fly, leading
to a roughly 2x improvement in INSERT rate into compressed chunks.
Additionally, this improves TRIGGER support for INSERTs into
compressed chunks.
This is a necessary refactoring to allow UPSERT/UPDATE/DELETE on
compressed chunks in follow-up patches.
Add 2.9.0 to update test scripts and adjust downgrade scripts for
2.9.0. Additionally adjust CHANGELOG to sync with the actual release
CHANGELOG and add PG15 to CI.
The issue occurs in extended query protocol mode only, where every
query goes through PREPARE and EXECUTE phases. The first time ANALYZE
is executed, a list of relations to be vacuumed is extracted and
saved in a list. This list is referenced from the parse tree node.
Once execution of ANALYZE is complete, this list is cleaned up, but
the reference to it is not cleaned up in the parse tree node. When
ANALYZE is executed a second time, a segfault happens as we access an
invalid memory location.
Fixed the issue by restoring the actual value in the parse tree node
once ANALYZE completes its execution.
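A hedged sketch of the save/restore (the function and the
`run_analyze` helper are illustrative):

```c
static void
process_analyze(VacuumStmt *stmt)
{
	/*
	 * Execution replaces stmt->rels with an expanded relation list that
	 * is freed when ANALYZE completes; remember the parse-tree value
	 * first.
	 */
	List   *saved_rels = stmt->rels;

	run_analyze(stmt);			/* hypothetical helper running ANALYZE */

	/*
	 * Restore so a second EXECUTE of the cached statement does not
	 * follow a dangling pointer.
	 */
	stmt->rels = saved_rels;
}
```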
Fixes #4857
This release adds major new features since the 2.8.1 release.
We deem it moderate priority for upgrading.
This release includes these noteworthy features:
* Hierarchical Continuous Aggregates (aka Continuous Aggregate on top of another Continuous Aggregate)
* Improve the `time_bucket_gapfill` function by allowing the timezone to bucket in to be specified
* Use `alter_data_node()` to change the data node configuration. This function introduces the option to configure the availability of the data node.
This release also includes several bug fixes.
**Features**
* #4476 Batch rows on access node for distributed COPY
* #4567 Exponentially backoff when out of background workers
* #4650 Show warnings when not following best practices
* #4664 Introduce fixed schedules for background jobs
* #4668 Hierarchical Continuous Aggregates
* #4670 Add timezone support to time_bucket_gapfill
* #4678 Add interface for troubleshooting job failures
* #4718 Add ability to merge chunks while compressing
* #4786 Extend the now() optimization to also apply to CURRENT_TIMESTAMP
* #4820 Support parameterized data node scans in joins
* #4830 Add function to change configuration of a data node
* #4966 Handle DML activity when datanode is not available
* #4971 Add function to drop stale chunks on a datanode
**Bugfixes**
* #4663 Don't error when compression metadata is missing
* #4673 Fix now() constification for VIEWs
* #4681 Fix compression_chunk_size primary key
* #4696 Report warning when enabling compression on hypertable
* #4745 Fix FK constraint violation error while insert into hypertable which references partitioned table
* #4756 Improve compression job IO performance
* #4770 Continue compressing other chunks after an error
* #4794 Fix degraded performance seen on timescaledb_internal.hypertable_local_size() function
* #4807 Fix segmentation fault during INSERT into compressed hypertable
* #4822 Fix missing segmentby compression option in CAGGs
* #4823 Fix a crash that could occur when using nested user-defined functions with hypertables
* #4840 Fix performance regressions in the copy code
* #4860 Block multi-statement DDL command in one query
* #4898 Fix cagg migration failure when trying to resume
* #4904 Remove BitmapScan support in DecompressChunk
* #4906 Fix a performance regression in the query planner by speeding up frozen chunk state checks
* #4910 Fix a typo in process_compressed_data_out
* #4918 Cagg migration orphans cagg policy
* #4941 Restrict usage of the old format (pre 2.7) of continuous aggregates in PostgreSQL 15.
* #4955 Fix cagg migration for hypertables using timestamp without timezone
* #4968 Check for interrupts in gapfill main loop
* #4988 Fix cagg migration crash when refreshing the newly created cagg
**Thanks**
* @jflambert for reporting a crash with nested user-defined functions.
* @jvanns for reporting that a hypertable FK reference to a vanilla PostgreSQL partitioned table doesn't seem to work
* @kou for fixing a typo in process_compressed_data_out
* @xvaara for helping reproduce a bug with bitmap scans in transparent decompression
* @byazici for reporting a problem with segmentby on compressed caggs
* @tobiasdirksen for requesting Continuous aggregate on top of another continuous aggregate
* @xima for reporting a bug in Cagg migration