Allow calls to time_bucket_gapfill to be executed on the data nodes
for improved query performance. With this change, time_bucket_gapfill
is pushed down to the data nodes under the following conditions:
1. when only one data node has all the chunks
2. when space dimension does not overlap across data nodes
3. when group-by matches space dimension
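For illustration, assuming a distributed hypertable `conditions` with time
column `time` and space dimension `device` (the schema is hypothetical), a
query of the following shape can now be computed entirely on the data nodes
because the group-by matches the space dimension:
SELECT device,
       time_bucket_gapfill('1 hour', time) AS bucket,
       avg(temperature) AS avg_temp
FROM conditions
WHERE time >= '2022-01-01' AND time < '2022-01-02'
GROUP BY device, bucket
ORDER BY device, bucket;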
The function `tsl_finalize_agg_ffunc` modified the aggregation state by
setting `trans_value` to the final result when computing the final
value. Since the state can be re-used several times, there could be
several calls to the finalization function, and the finalization
function would be confused when passed a final value instead of an
aggregation state transition value.
This commit fixes this by not modifying `trans_value` when
computing the final value and instead just returning it (or the original
`trans_value` if there is no finalization function).
Fixes #3248
Currently, only IMMUTABLE constraints will exclude chunks from an UPDATE plan;
with this patch, STABLE expressions will be used to exclude chunks as well.
This is a big performance improvement as chunks not matching partitioning
column constraints don't have to be scanned for UPDATEs.
Since the codepath for UPDATE is different for PG < 14, this patch only adds
the optimization for PG14.
With this patch the plan for UPDATE on hypertables looks like this:
Custom Scan (HypertableModify) (actual rows=0 loops=1)
-> Update on public.metrics_int2 (actual rows=0 loops=1)
Update on public.metrics_int2 metrics_int2_1
Update on _timescaledb_internal._hyper_1_1_chunk metrics_int2
Update on _timescaledb_internal._hyper_1_2_chunk metrics_int2
Update on _timescaledb_internal._hyper_1_3_chunk metrics_int2
-> Custom Scan (ChunkAppend) on public.metrics_int2 (actual rows=0 loops=1)
Output: '123'::text, metrics_int2.tableoid, metrics_int2.ctid
Startup Exclusion: true
Runtime Exclusion: false
Chunks excluded during startup: 3
-> Seq Scan on public.metrics_int2 metrics_int2_1 (actual rows=0 loops=1)
Output: metrics_int2_1.tableoid, metrics_int2_1.ctid
Filter: (metrics_int2_1."time" = length(version()))
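For illustration (hypothetical table and column names), an UPDATE whose
WHERE clause uses a STABLE expression such as now() can now exclude
non-matching chunks at executor startup instead of scanning them:
UPDATE metrics SET value = 0 WHERE "time" > now() - interval '1 day';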
Before, we would complain that we don't support fetching the system
columns with per-data node queries enabled, but still execute the code
that fetches them. Don't do this, and complain earlier instead.
Also remove unused code from compression_api. The function
policy_compression_get_verbose_log was unused; it has been moved to
policy_utils and renamed to policy_get_verbose_log so that it can
be used by all policies.
Currently, we needlessly form/deform the heap tuples. Sometimes we do
need this when we have row marks and need a ctid (UPDATE RETURNING),
but not in this case. The implementation has three parts:
1. Change the data fetcher interface to store a tuple into a given slot
instead of returning a heap tuple.
2. Expose the creation of a virtual tuple in the tuple factory.
3. Use these facilities in the row-by-row fetcher.
This gives some small speedup. It will become more important in the
future, as other parts of the row-by-row fetcher are optimized.
This patch adds a .git-blame-ignore-revs file that contains a list of
commits with bulk formatting changes to be ignored by git blame.
This file will be used by GitHub, but to use it locally you need
to tell git about it, e.g. with the following command:
`git config blame.ignoreRevsFile .git-blame-ignore-revs`
Stop throwing an exception with the message "column of relation already exists"
when running the command ALTER TABLE ... ADD COLUMN IF NOT EXISTS ...
on compressed hypertables.
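For example, on a compressed hypertable (the table and column names here
are illustrative), the following now succeeds instead of raising an error
when the column already exists:
ALTER TABLE conditions ADD COLUMN IF NOT EXISTS humidity DOUBLE PRECISION;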
Fix #4087
Fix a crash that could corrupt indexes when running `VACUUM FULL pg_class`.
The crash happens when caches are queried/updated within a cache
invalidation function, which can lead to corruption and recursive
cache invalidations.
To solve the issue, make sure the relcache invalidation callback is
simple and never invokes the relcache or syscache directly or
indirectly.
Some background: The extension is preloaded and thus has planner
hooks installed irrespective of whether the extension is actually
installed in the current database. However, the hooks need to
be disabled as long as the extension is not installed. To avoid always
having to dynamically check for the presence of the extension, the
state is cached in the session.
However, the cached state needs to be updated if the extension changes
(altered/dropped/created). Therefore, the relcache invalidation
callback mechanism is (ab)used in TimescaleDB to signal updates to the
extension state across all active backends.
The signaling is implemented by installing a dummy table as part of
the extension and any invalidation on the relid for that table signals
a change in the extension state. However, as of this change, the
actual state is no longer determined in the callback itself, since it
requires use of the relcache and causes the bad behavior. Therefore,
the only thing that remains in the callback after this change is to
reset the extension state.
The actual state is instead resolved on-demand, but can still be
cached when the extension is in the installed state and the dummy
table is present with a known relid. However, if the extension is not
installed, the extension state can no longer be cached as there is no
way to signal other backends that the state should be reset when they
don't know the dummy table's relid, and cannot resolve it from within
the callback itself.
Fixes #3924
If a session is started and loads (and caches, by OID) functions in the
extension to use them in, for example, a `SELECT` query on a continuous
aggregate, the extension will be marked as loaded internally.
If an `ALTER EXTENSION` is then executed in a separate session, it will
update `pg_extension` to hold the new version, and any other sessions
will see this as the new version, including the session that already
loaded the previous version of the shared library.
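For instance, the second session may simply run:
ALTER EXTENSION timescaledb UPDATE;
while the first session keeps using the shared library of the old version.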
Since the pre-update session has already loaded some functions from the
old version, running the same queries with the same function names
will trigger loading the new version of the shared library to get
the new functions (same name, but different OID), but since the library has
already been loaded in a different version, this will trigger an error
that GUC variables are re-defined.
Further queries after that will then corrupt the database causing a
crash.
This commit fixes this by recording the loaded version rather than just
whether it has been loaded, and checking that the version did not change
after a query has been analyzed (in the `post_analyze_hook`). If the
version changed, a fatal error is generated to force an abort of the
session.
Fixes #4191
Functions `elog` and `ereport` are unsafe to use in signal handlers
since they call `malloc`. This commit removes them from signal
handlers.
Fixes #4200
Currently, only IMMUTABLE constraints will exclude chunks from a DELETE plan;
with this patch, STABLE expressions will be used to exclude chunks as well.
This is a big performance improvement as chunks not matching partitioning
column constraints don't have to be scanned for DELETEs.
Additionally, this improves the usability of DELETE on hypertables with some
chunks compressed. Previously, a DELETE with non-IMMUTABLE constraints was
not possible on such hypertables. Since the codepath for DELETE is
different for PG < 14, this patch only adds the optimization for PG14.
With this patch the plan for DELETE on hypertables looks like this:
Custom Scan (HypertableModify) (actual rows=0 loops=1)
-> Delete on metrics (actual rows=0 loops=1)
Delete on metrics metrics_1
Delete on _hyper_5_8_chunk metrics
Delete on _hyper_5_11_chunk metrics
Delete on _hyper_5_12_chunk metrics
Delete on _hyper_5_13_chunk metrics
Delete on _hyper_5_14_chunk metrics_2
-> Custom Scan (ChunkAppend) on metrics (actual rows=1 loops=1)
Chunks excluded during startup: 4
-> Seq Scan on metrics metrics_1 (actual rows=0 loops=1)
Filter: ("time" > (now() - '3 years'::interval))
-> Bitmap Heap Scan on _hyper_5_14_chunk metrics_2 (actual rows=1 loops=1)
Recheck Cond: ("time" > (now() - '3 years'::interval))
Heap Blocks: exact=1
-> Bitmap Index Scan on _hyper_5_14_chunk_metrics_time_idx (actual rows=1 loops=1)
Index Cond: ("time" > (now() - '3 years'::interval))
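The plan above corresponds roughly to a statement of the following shape
(the `metrics` hypertable is only illustrative), where now() is a STABLE
expression and chunk exclusion happens at executor startup:
DELETE FROM metrics WHERE "time" > now() - interval '3 years';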
Improve the performance of metadata scanning during hypertable
expansion.
When a hypertable is expanded to include all child chunks, only the
chunks that match the query restrictions are included. To find the
matching chunks, the planner first scans for all matching dimension
slices. The chunks that reference those slices are the chunks to
include in the expansion.
This change optimizes the scanning for slices by avoiding repeated
open/close of the dimension slice metadata table and index.
At the same time, related dimension slice scanning functions have been
refactored along the same lines.
An index on the chunk constraint metadata table is also changed to
allow scanning on dimension_slice_id. Previously, dimension_slice_id
was the second key in the index, which made scans on this key less
efficient.
In certain multi-node queries, we end up using a parameterized query
on the data nodes. If "timescaledb.enable_remote_explain" is enabled, we
run an EXPLAIN on the data node with the remote query. EXPLAIN doesn't
work with parameterized queries, so we check for that case and avoid
invoking a remote EXPLAIN.
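The remote plans are only requested when that GUC is enabled, e.g.:
SET timescaledb.enable_remote_explain = true;
With this change, a query that ends up parameterized on the data nodes
simply skips the remote EXPLAIN instead of attempting one.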
Fixes #3974
Reported and test case provided by @daydayup863
cmake > 3.10 is not packaged for some of the platforms we build
packages for, e.g. old Ubuntu and Debian versions. Currently we modify
the CMakeLists.txt in those build environments and set the
minimum version to 3.10 already, which proves that timescaledb
builds fine with cmake 3.10.
Route UPDATE on hypertables through our custom HypertableModify
node. This patch by itself does not make any other changes to
UPDATE but is the foundation for other features regarding UPDATE
on hypertables.
PostgreSQL scan functions might allocate memory that needs to live for
the duration of the scan. This applies also to functions that are
called during the scan, such as getting the next tuple. To avoid
situations where such functions are accidentally called on, e.g., a
short-lived per-tuple context, add an explicit scan memory context to
the Scanner interface that wraps the PostgreSQL scan API.
Scan functions cannot be called on a per-tuple memory context as they
might allocate data that needs to live until the end of the scan. Fix
this in a couple of places to ensure correct memory handling.
Fixes #4148, #4145
This patch changes the ConstraintAwareAppend EXPLAIN output to show
the number of chunks excluded instead of the number of chunks left.
The number of chunks left can be seen from other EXPLAIN output
while the actual number of exclusions that happened cannot. This
also makes the output consistent with the output of ChunkAppend.
This patch changes the organization of the ChunkAppend code. It
removes all header files except chunk_append/chunk_append.h.
It also merges exec.c and explain.c to remove unnecessary function
exports, since the code from explain.c was only used by exec.c.
This patch exports the contain_param function in planner.c and
changes ChunkAppend to use that version instead of having two
implementations of that function.
Add workflow events to allow running the Sqlsmith tests manually or when
pushing to the 'sqlsmith' branch. This is useful when submitting PRs
on which one wants to run extra checks, including Sqlsmith.
When running `performDeletion` it is necessary to have a valid relation
id, but when doing a lookup using `ts_hypertable_get_by_id` this might
actually return a hypertable entry pointing to a table that does not
exist because it has been deleted previously. In this case, only the
catalog entry should be removed, but it is not necessary to delete the
actual table.
This scenario can occur if both the hypertable and a compressed table
are deleted as part of running a `sql_drop` event, for example, if a
compressed hypertable is defined inside an extension. In this case, the
compressed hypertable (indeed all tables) will be deleted first, and
the lookup of the compressed hypertable will find it in the metadata
but a lookup of the actual table will fail since the table does not
exist.
Fixes #4140
sqlsmith is a random SQL query generator and is very useful for finding
bugs in our implementation, as it tests complex queries and thereby
hits codepaths and interactions between different features not tested
in our normal regression checks.
This patch changes the extension function list to include the
signature as well, since functions with different signatures are
separate objects in PostgreSQL. This also changes the list to include
all functions. Even though functions in internal schemas are not
considered public API, they still need to be treated the same as functions
in other schemas with regards to extension upgrade/downgrade.
This patch also moves the test to regresscheck-shared since we do
not need a dedicated database to run these tests.
Reorganize the code and fix a minor bug where the size of the FSM, VM
and INIT forks of the parent hypertable was not computed.
The bug is fixed by exposing the `ts_relation_size` function to the SQL
level to encapsulate the logic to compute the `heap`, `indexes` and `toast`
sizes.
The current implementation iterates over the fork types to calculate the
size of each one by calling the `pg_relation_size` PostgreSQL function,
plus further calls to calculate the index and table sizes (six function
calls in total). This is improved by halving the number of PostgreSQL
function calls needed to calculate the size of the relations (now three
function calls).
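As a rough sketch of the old per-relation cost (using standard PostgreSQL
size functions on a hypothetical relation; the exact breakdown is
illustrative), six separate calls were needed where three now suffice:
-- Before: one call per fork, plus separate calls for the index and
-- table sizes (six function calls per relation).
SELECT pg_relation_size('conditions', 'main') AS main_size,
       pg_relation_size('conditions', 'fsm')  AS fsm_size,
       pg_relation_size('conditions', 'vm')   AS vm_size,
       pg_relation_size('conditions', 'init') AS init_size,
       pg_indexes_size('conditions')          AS index_size,
       pg_table_size('conditions')            AS table_size;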
Add option `USE_TELEMETRY` that can be used to exclude telemetry from
the compile.
Telemetry-specific SQL is moved so that it is only included when the
extension is compiled with telemetry, and the notice is changed so that
the message about telemetry is not printed when telemetry is not compiled
in.
The following code is not compiled in when telemetry is not used:
- Cross-module functions for telemetry.
- Checks for telemetry job in job execution.
- GUC variables `telemetry_level` and `telemetry_cloud`.
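When the extension is built with telemetry, these GUCs remain available as
before, e.g.:
SHOW timescaledb.telemetry_level;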
The telemetry subsystem is not included when compiling without telemetry,
which requires some functions to be moved out of the telemetry
subsystem:
- Metadata handling is moved out of the telemetry module since it is
used not only with telemetry.
- UUID functions are moved into a separate module instead of being
part of the telemetry subsystem.
- Telemetry functions are either added or removed when updating from a
previous version.
Tests are updated to:
- Not use telemetry functions to get the UUID or metadata and instead use
the moved UUID and metadata functions.
- Not include telemetry information in tests that do not require it.
- Not set telemetry variables in configuration files when telemetry is
not compiled in.
- Replace usage of telemetry functions in non-telemetry tests with
other sources of the same information.
Fixes #3931
This patch changes the workflow to run apt-get update before
installing any packages in case the local package database is
outdated and references packages no longer available.
Smoke tests were missing critical files, and some tests had changed
since the last run and did not handle update smoke tests, so fix all
necessary issues.
Chunk scan performance during querying is improved by avoiding
repeated open and close of relations and indexes when joining chunk
information from different metadata tables.
When executing a query on a hypertable, it is expanded to include all
its child chunks. However, during the expansion, the chunks that
don't match the query constraints should also be excluded. The
following changes are made to make the scanning and exclusion more
efficient:
* Ensure metadata relations and indexes are only opened once even
though metadata for multiple chunks is scanned. This avoids doing
repeated open and close of tables and indexes for each chunk
scanned.
* Avoid interleaving scans of different relations, ensuring better
data locality, and having, e.g., indexes warm in cache.
* Avoid unnecessary scans that repeat work already done.
* Ensure chunks are locked in a consistent order (based on Oid).
To enable the above changes, some refactoring was necessary. The chunk
scans that happen during constraint exclusion are moved into a separate
source file (`chunk_scan.c`) for better structure and readability.
Some test outputs are affected due to the new ordering of chunks in
append relations.
Make the Scanner module more flexible by allowing optional control
over when the scanned relation is opened and closed. Relations can
then remain open over multiple scans, which can improve performance
and efficiency.
Closes #2173
As part of adding a scan iterator interface on top of the Scanner
module (commit 8baaa98), the internal scanner state, which was
previously private, was made public. Now that it is public, it makes
more sense to make it part of the standard user-facing `ScannerCtx`
struct, which also simplifies the code elsewhere.
This patch splits the node name logic from the child path logic
to allow getting a string representation for any postgres node.
This adds a new function:
const char * ts_get_node_name(Path *path)
This patch doesn't add any new callers of the function, but it will
be used in subsequent patches to produce more user-friendly error
messages when unexpected node types are encountered during planning.
When inserting into a distributed hypertable with a query on a
distributed hypertable, a segfault would occur when all the chunks
in the query got pruned.
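An illustrative shape of the failing statement (the distributed hypertable
and the predicate are hypothetical); the crash happened when the WHERE
clause pruned every chunk of the queried hypertable:
INSERT INTO metrics_dist
SELECT * FROM metrics_dist WHERE "time" > '2100-01-01';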