The copy test is flaky because some test data is generated
dynamically based on the current date. This patch changes the data
generation to use a time series with fixed dates.
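A minimal sketch of the fixed-date approach (table name and interval
are illustrative, not the actual test data):
INSERT INTO copy_test(time, value)
SELECT t, 1.0
FROM generate_series('2022-01-01'::timestamptz,
                     '2022-01-02'::timestamptz,
                     '1 minute'::interval) AS t;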
This commit extends our telemetry system with function call
telemetry. It gathers function call counts from all queries and sends
back counts for those functions that are built in or come from our
related extensions.
A number of TimescaleDB functions internally call `AlterTableInternal`
to modify tables or indexes. For instance, `compress_chunk` and
`attach_tablespace` act as DDL commands to modify
hypertables. However, crashes occur when these functions are called
via `SELECT * INTO <table> FROM <function_name>()` or the equivalent
`CREATE TABLE AS` statement. The crashes happen because these
statements are considered process utility commands and therefore set
up an event trigger context for collecting commands. However, the
event trigger context is not properly set up to record alter table
statements in this code path, which causes the crashes.
To prevent crashes, wrap `AlterTableInternal` with the event trigger
functions to properly initialize the event trigger context.
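For illustration, a statement of this shape previously crashed
(hypertable name assumed):
-- CREATE TABLE AS runs through the process utility path, whose event
-- trigger context was not prepared for the internal ALTER TABLE
-- issued by compress_chunk.
CREATE TABLE compressed AS
SELECT compress_chunk(c) FROM show_chunks('metrics') c;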
This patch fixes a crash in the multi-buffer copy optimization,
which was introduced in commit
bbb2f414d2090efd2d8533b464584157860ce49a.
This patch properly handles closed chunks (e.g., caused by
`timescaledb.max_open_chunks_per_insert`). The problem is addressed
by:
1) Re-reading the ChunkInsertState before the data is stored, which
ensures that the underlying table is open.
2) Deleting a TSCopyMultiInsertBuffer after its data is flushed, so
that operations like table_finish_bulk_insert are executed and the
associated chunk can be properly closed.
This patch adds a test for attnum consistency to our update scripts.
When the attnums of a fresh install and an updated install differ,
the updated installation will not be able to correctly process the
affected catalog tables.
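A sketch of the idea behind the check: compare the attnums of a
catalog table between a fresh install and an updated install (catalog
table chosen for illustration):
SELECT attrelid::regclass, attname, attnum
FROM pg_attribute
WHERE attrelid = '_timescaledb_catalog.hypertable'::regclass
  AND attnum > 0
ORDER BY attnum;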
Following work started by #4294 to improve performance of Continuous
Aggregates by removing the re-aggregation in the user view.
This PR gets rid of the `partialize_agg` and `finalize_agg` aggregate
functions and stores the finalized aggregated (plain) data in the
materialization hypertable.
Because we're not storing partials anymore and have removed the
re-aggregation, it is now possible to create indexes on aggregated
columns in the materialization hypertable in order to improve the
performance even more.
Also removed restrictions on types of aggregates users can perform
with Continuous Aggregates:
* aggregates with DISTINCT
* aggregates with FILTER
* aggregates with FILTER in HAVING clause
* aggregates without combine function
* ordered-set aggregates
* hypothetical-set aggregates
By default new Continuous Aggregates will be created using this new
format, but the previous version (with partials) will still be
supported. Users can create the previous style by setting the storage
parameter `timescaledb.finalized` to `false` during the creation of
the Continuous Aggregate.
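For example, a sketch of creating a previous-style (partials-based)
Continuous Aggregate (source hypertable and columns assumed):
CREATE MATERIALIZED VIEW conditions_hourly
WITH (timescaledb.continuous, timescaledb.finalized = false) AS
SELECT time_bucket('1 hour', time) AS bucket, avg(temperature)
FROM conditions
GROUP BY bucket;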
Fixes #4233
This implements an optimization to allow the now() expression to be
used during plan-time chunk exclusion. Since now() is stable, it
would not normally be considered for plan-time chunk exclusion.
To enable this behaviour we convert `column > now()` expressions
into `column > const AND column > now()`. Assuming that time
always moves forward this is safe even for prepared statements.
This optimization works for SELECT, UPDATE and DELETE.
On hypertables with many chunks this can lead to a considerable
speedup for certain queries.
The following expressions are supported:
- column > now()
- column >= now()
- column > now() - Interval
- column > now() + Interval
- column >= now() - Interval
- column >= now() + Interval
Interval must not have a day or month component as those depend
on timezone settings.
Some microbenchmarks to show the improvements; times are the best of
five runs for each query.
-- hypertable with 1k chunks
-- with optimization
select * from metrics1k where time > now() - '5m'::interval;
Time: 3.090 ms
-- without optimization
select * from metrics1k where time > now() - '5m'::interval;
Time: 145.640 ms
-- hypertable with 5k chunks
-- with optimization
select * from metrics5k where time > now() - '5m'::interval;
Time: 4.317 ms
-- without optimization
select * from metrics5k where time > now() - '5m'::interval;
Time: 775.259 ms
-- hypertable with 10k chunks
-- with optimization
select * from metrics10k where time > now() - '5m'::interval;
Time: 4.853 ms
-- without optimization
select * from metrics10k where time > now() - '5m'::interval;
Time: 1766.319 ms (00:01.766)
-- hypertable with 20k chunks
-- with optimization
select * from metrics20k where time > now() - '5m'::interval;
Time: 6.141 ms
-- without optimization
select * from metrics20k where time > now() - '5m'::interval;
Time: 3321.968 ms (00:03.322)
Speedup with 1k chunks: 47x
Speedup with 5k chunks: 179x
Speedup with 10k chunks: 363x
Speedup with 20k chunks: 540x
The setup scripts for the upgrade/downgrade tests of Continuous
Aggregates have a lot of duplicated code for pre-2.0 tests. Refactor
them a bit, removing the duplicated code by using `\if \else \endif`
psql meta-commands.
Also added a proper `round` call to all functions that return
`float8` in the SQL scripts because in rare cases the unrounded
values led to flaky tests.
This is part of #4269.
This commit backports the Postgres multi-buffer / bulk insert
optimization into the timescale copy operator. If the target chunk
allows it (e.g., if no triggers are defined on the hypertable or the
chunk is not compressed), the data is stored in in-memory buffers
first and then flushed to the chunks in bulk operations.
Implements: #4080
Add an internal API to drop a single chunk.
This function drops the storage and metadata
associated with the chunk.
Note that chunk dependencies are not affected;
e.g., continuous aggs are not updated when this chunk
is dropped.
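Hypothetical usage sketch (the function name below is an assumption
for illustration; this message does not name the API):
SELECT _timescaledb_internal.drop_chunk(
  '_timescaledb_internal._hyper_1_1_chunk');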
First step to remove the re-aggregation for Continuous Aggregates
is to remove the `chunk_id` from the materialization hypertable.
Also added a new metadata column named `finalized` to the
`continuous_cagg` catalog table in order to store information about
the new finalized version of Continuous Aggregates that will not
need the partials anymore. This flag is important to maintain
backward compatibility with the previous Continuous Aggregate
implementation, which requires the `chunk_id` to refresh data
properly.
When postgres expands an inheritance tree it also adds the
parent hypertable as a child relation. Since for a hypertable
the parent will never have any data, we can mark this
relation as a dummy relation so it gets ignored in later
steps. This is only relevant for code paths that use the
postgres inheritance code, as we don't include the hypertable
as a child when expanding the hypertable ourselves.
This is similar to 3c40f924 which did the same adjustment for DELETE.
This patch also moves the marking into get_relation_info_hook so
it happens a bit earlier and avoids some additional cycles.
Commit 3c40f924 accidentally broke DELETE statement triggers on PG14 that
were only defined on the hypertable itself. This patch fixes the issue
and also makes the trigger test no longer pg version specific.
Change truncate test to ignore warnings about potentially orphaned
files when dropping the test database. This seems to happen quite
frequently on AppVeyor, causing the test to be flaky.
When postgres expands an inheritance tree it also adds the
parent hypertable as a child relation. Since for a hypertable the
parent will never have any data, we can mark this relation as a
dummy relation so it gets ignored in later steps. This is only
relevant for code paths that use the postgres inheritance code,
as we don't include the hypertable as a child when expanding the
hypertable ourselves.
Add the missing variables to the finalization view of Continuous
Aggregates and the corresponding columns to the materialization table.
Cover the case of targets that contain both Aggref nodes and Var
nodes outside of the Aggref nodes.
Stop rebuilding the Continuous Aggregate view with ALTER MATERIALIZED
VIEW. Attempt to repair the view at post-update time instead, and fail
gracefully if it is not possible to do so without raw hypertable schema
or data modifications.
Stop rebuilding the Continuous Aggregate view when switching realtime
aggregation on and off. Instead, manipulate the User View by either:
1. removing the UNION ALL right-hand side and the WHERE clause when
disabling realtime aggregation
2. adding the Direct View to the right of a UNION ALL operator and
defining WHERE clauses with the relevant watermark checks when
enabling realtime aggregation
Fixes #3898
Stop throwing the error "must be owner of hypertable" when a user with
TRUNCATE privilege on the hypertable attempts to TRUNCATE.
Previously we had a check that required TRUNCATE to only be
performed by the table owner, not taking into account the user's
TRUNCATE privilege, which is sufficient to allow this operation.
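For illustration (role and table names assumed):
GRANT TRUNCATE ON metrics TO app_user;
SET ROLE app_user;
-- Previously failed with "must be owner of hypertable"; now succeeds.
TRUNCATE metrics;
RESET ROLE;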
Fixes #4183
This patch changes the extension function list to include the
signature as well, since functions with different signatures are
separate objects in postgres. This also changes the list to include
all functions. Even though functions in internal schemas are not
considered public API, they still need to be treated the same as
functions in other schemas with regards to extension
upgrade/downgrade.
This patch also moves the test to regresscheck-shared since we do
not have a dedicated database to run these tests in.
Reorganize the code and fix a minor bug where the size of the FSM,
VM and INIT forks of the parent hypertable was not computed.
Fixed the bug by exposing the `ts_relation_size` function to the SQL
level to encapsulate the logic to compute `heap`, `indexes` and `toast`
sizes.
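For context, a related user-facing helper that reports these sizes
(hypertable name assumed):
SELECT * FROM hypertable_detailed_size('metrics');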
Add an option `USE_TELEMETRY` that can be used to exclude telemetry
from the build.
Telemetry-specific SQL is moved so that it is only included when the
extension is compiled with telemetry, and the notice is changed so
that the message about telemetry is not printed when telemetry is
not compiled in.
The following code is not compiled in when telemetry is not used:
- Cross-module functions for telemetry.
- Checks for telemetry job in job execution.
- GUC variables `telemetry_level` and `telemetry_cloud`.
Telemetry subsystem is not included when compiling without telemetry,
which requires some functions to be moved out of the telemetry
subsystem:
- Metadata handling is moved out of the telemetry module since it is
used not only with telemetry.
- UUID functions are moved into a separate module instead of being
part of the telemetry subsystem.
- Telemetry functions are either added or removed when updating from a
previous version.
Tests are updated to:
- Not use telemetry functions to get the UUID or metadata and
  instead use the moved UUID and metadata functions.
- Not include telemetry information in tests that do not require it.
- Not set telemetry variables in configuration files when telemetry
  is not compiled in.
- Replace usage of telemetry functions in non-telemetry tests with
  other sources of the same information.
Fixes #3931
Smoke tests were missing critical files, and some tests had changed
since the last run and did not handle update smoke tests, so this
fixes all the necessary issues.
Chunk scan performance during querying is improved by avoiding
repeated open and close of relations and indexes when joining chunk
information from different metadata tables.
When executing a query on a hypertable, it is expanded to include all
its children chunks. However, during the expansion, the chunks that
don't match the query constraints should also be excluded. The
following changes are made to make the scanning and exclusion more
efficient:
* Ensure metadata relations and indexes are only opened once even
  though metadata for multiple chunks is scanned. This avoids doing
  repeated open and close of tables and indexes for each chunk
  scanned.
* Avoid interleaving scans of different relations, ensuring better
  data locality and keeping, e.g., indexes warm in the cache.
* Avoid unnecessary scans that repeat work already done.
* Ensure chunks are locked in a consistent order (based on Oid).
To enable the above changes, some refactoring was necessary. The
chunk scans that happen during constraint exclusion are moved into a
separate source file (`chunk_scan.c`) for better structure and
readability.
Some test outputs are affected due to the new ordering of chunks in
append relations.
Make the Scanner module more flexible by allowing optional control
over when the scanned relation is opened and closed. Relations can
then remain open over multiple scans, which can improve performance
and efficiency.
Closes #2173
We cache the Chunk structs in RelOptInfo private data. They are later
used to estimate the chunk sizes, check which data nodes they belong
to, et cetera. Looking up the chunks is expensive, so this change
speeds up the planning.
This patch locks down search_path in extension install and update
scripts to only contain pg_catalog, this requires that any reference
in those scripts is fully qualified. Additionally we add explicit
create commands to all update scripts for objects added to the
public schema. This change will make update scripts fail if a
function with an identical signature already exists when installing
or upgrading, instead of reusing the existing object.
TimescaleDB was vulnerable to a privilege escalation attack in
the extension installation script. An attacker could pre-create
objects normally owned by the extension and get those objects
used in the installation script since the script would only try
to create them if they did not already exist. Thanks to Pedro
Gallegos for reporting the problem.
This patch changes the schema, table and function creation to fail
and abort the installation when the object already exists instead
of using the existing object.
Security: CVE-2022-24128
If the ANY construct contains a singleton NULL then the logic in
`dimension_values_create_from_array` barfs, causing a crash. Fix it
appropriately in the caller, the `hypertable_restrict_info_add_expr`
function.
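A sketch of the crashing shape, an ANY construct whose array is a
singleton NULL (hypertable and column names assumed):
SELECT * FROM metrics WHERE device_id = ANY (ARRAY[NULL]::int[]);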
Refactor the telemetry function and format to include stats broken
down on common relation types. The types include:
- Tables
- Partitioned tables
- Hypertables
- Distributed hypertables
- Continuous aggregates
- Materialized views
- Views
and for each of these types report (when applicable):
- Total number of relations
- Total number of children/chunks
- Total data volume (broken into heap, toast, and indexes).
- Compression stats
- PG stats, like reltuples
The telemetry function has also been refactored to return `jsonb`
instead of `text`. This makes it easier to query and manipulate the
resulting JSON format, and also gives cleaner output.
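A sketch of inspecting the new jsonb report (the top-level key is an
assumption for illustration):
SELECT jsonb_pretty(get_telemetry_report() -> 'relations');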
Closes #3932
To support tests with different configuration options, we split the
tests into *test configurations*. Each test configuration NAME will have
- A configuration template file `NAME.conf.in` that is used to run the
suite of tests.
- A variable `TEST_FILES_<NAME>` listing the test files available for
that test suite.
- A variable `SOLO_TESTS_<NAME>` that lists the tests that need to be
run as solo tests.
The code to generate test schedules is then factored out into a
separate file and used for each configuration.
This patch changes DELETE handling for hypertables to have the
postgres ModifyTable node be wrapped in a custom HypertableModify
node. By itself this does not change DELETE handling for hypertables
but instead enables subsequent patches to implement e.g. chunk
exclusion for DELETE or DELETE on compressed chunks.
Since the PG14 code path for INSERT is different from previous
versions, this PR will only change the plan for PG14+. DELETE
handling for distributed hypertables is not changed as part of this
patch.
Starting with PG15, default permissions on the public schema are
restricted for any non-superuser non-owner. This causes test failures
since tables can no longer be created without explicitly adding
permissions, so we remove the grant when bootstrapping the data nodes
and instead grant permissions to the users in the regression tests.
This keeps the default permissions on data nodes, but allows
regression tests to run.
Fixes #3957
Reference: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=b073c3cc
Commit 97c2578ffa6b08f733a75381defefc176c91826b overcomplicated the
`invalidate_add_entry` API by adding parameters related to the remote
function call for multi-node on materialization hypertables.
Refactored it, simplifying the function interface and adding a new
function to deal with materialization hypertables in a multi-node
environment.
Fixes #3833
This patch optimizes how first()/last() initialize the compare
function. Previously the compare function would be looked up for
every transition function call, but since polymorphic types are
resolved at parse time for a specific aggregate instance, the compare
function will not change during its lifetime. Additionally, this
patch also fixes a memory leak when using first()/last() on
pointer types.
These changes lead to a roughly 2x speed improvement for first()/
last() and make the memory usage near-constant.
Fixes #3935
Enable ALTER MATERIALIZED VIEW (timescaledb.compress). This enables
compression on the underlying materialized hypertable. The segmentby
and orderby columns for compression are based on the GROUP BY clause
and the time_bucket clause used while setting up the continuous
aggregate.
- Change the timescaledb_information.continuous_aggregate view
  definition.
- Add support for compression policies on continuous aggregates.
- Move code from job.c to policy_utils.c.
- Add support functions to check compression policy validity for
  continuous aggregates.
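For illustration (view name assumed):
ALTER MATERIALIZED VIEW conditions_hourly
  SET (timescaledb.compress = true);
SELECT add_compression_policy('conditions_hourly',
  compress_after => '30 days'::interval);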
The time_bucket comparison transformation code assumed the value and
the width of the time_bucket comparison expression were both Const,
but this was only asserted, not validated. This can lead to wrong
query results. Found by sqlsmith.
When calling `CREATE DATABASE` we cannot have any backends connected
to the source database, except the current one. Since background
workers connect to any database that has the extension installed,
they can linger and cause a flaky error.
The test was changed to just check that the extension can be
implicitly installed, but this requires waiting for backends to be
unregistered from `procArray` in `procarray.c`, which the function
`pg_terminate_backend` does not handle properly and hence results in
a flaky test. This happens because `pg_terminate_backend` only
signals the process to terminate, but does not wait for it to
actually terminate.
Instead, we add an explicit function that checks the `procArray` and
waits for it to not have any backends for a database, and use that
to wait for termination.
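For illustration, the old, flaky pattern (database name assumed);
the signaled backends may still be registered in procArray right
after this statement returns:
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE datname = 'source_db' AND pid <> pg_backend_pid();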
Since the number of buckets of the histogram function is an argument
to the function call, it is possible to initialize the histogram
state with a lower number than is actually needed in further calls,
leading to a segfault. This patch changes the memory access to use
the number the state was initialized with instead of the number
passed to the call. It also changes the function to error out when
the passed number differs from the one in the initialized state.
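A sketch of the previously crashing pattern, where the bucket-count
argument varies within one aggregation (table and column names
assumed); this now raises an error instead of reading out of bounds:
SELECT histogram(value, 0.0, 100.0,
                 CASE WHEN device_id = 1 THEN 5 ELSE 10 END)
FROM metrics;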
This patch fixes a typo in the error message of CREATE INDEX when
using timescaledb.transaction_per_chunk.
Co-authored-by: Sven Klemm <sven@timescale.com>
Fix the "GRANT/REVOKE ALL IN SCHEMA" handling uniformly across
single-node and multi-node.
Even though this is a SCHEMA-specific activity, we decided to
include the chunks even if they are part of another SCHEMA. So
they will also end up getting/resetting the same privileges.
Includes test case changes for both single-node and multi-node use
cases.
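For illustration (role name assumed); this now also applies to the
hypertable's chunks in _timescaledb_internal:
GRANT SELECT ON ALL TABLES IN SCHEMA public TO readonly_user;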
PG14 introduced new `ALTER TABLE` sub-commands:
* `.. ALTER COLUMN .. SET COMPRESSION`: handled properly in the
`process_utility` hook code and added related regression tests
* `.. DETACH PARTITION .. {CONCURRENTLY | FINALIZE}`: handled
properly in the `process_utility` hook code, but there's no need to
add regression tests because we don't rely on native partitioning
for hypertables.
Closes #3643
Since custom types are hashable in PG14, the partition test will be
different on PG14. Since the only difference was testing whether
creating a hypertable with a custom-type partitioning column throws
an error without a partitioning function, that specific test got
moved to the ddl tests, which are already pg version specific.
VACUUM VERBOSE is a source of flaky tests, and we don't gain much
by including the verbose output in the test. Additionally, removing
the verbose option prevents us from having to make the vacuum tests
pg-version specific as PG14 slightly changes the formatting of the
VACUUM VERBOSE output.
With memoize enabled, PG14 append tests produce a very different
plan compared to previous PG versions. To make comparing plans
between PG versions easier we disable memoize for PG14.
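Presumably this uses the standard PG14 GUC (an assumption here):
SET enable_memoize TO 'off';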
PG14 also modified how EXTRACT is shown in EXPLAIN output
so any query using EXTRACT will have different EXPLAIN output
between PG14 and previous versions.
Inside the `process_truncate` function, a new relation list is
created with the distributed hypertables removed, and this new list
is assigned to the current statement's relation list. This new list
is allocated in the `PortalContext`, which is destroyed at the end
of the statement execution.
The problem arises on a subsequent `TRUNCATE` call because the
compiled plpgsql code is cached in another memory context, and the
relation elements inside this cache point to an invalid memory area
because the `PortalContext` has been destroyed.
Fixed it by allocating the new relation list in the same memory
context as the original list.
Fixes #3580, fixes #3622, fixes #3182
Using now() in regression tests will result in flaky tests, as this
can result in creating a different number of chunks depending on the
alignment of now() relative to chunk boundaries.
When a new chunk is created, the ACL is copied from the hypertable, but
the shared dependencies are not set at all. Since there is no shared
dependency, a `DROP OWNED BY` will not find the chunk and revoke the
privileges for the user from the chunk. When the user is later dropped,
the ACL for the chunk will contain a non-existent user.
This commit fixes the issue by adding the shared dependencies of the
hypertable to the chunk when the chunk is created.
Fixes #3614