Enable ALTER MATERIALIZED VIEW (timescaledb.compress)
This enables compression on the underlying materialized
hypertable. The segmentby and orderby columns for
compression are based on the GROUP BY clause and time_bucket
clause used while setting up the continuous aggregate.
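For example, compression can be enabled on a continuous aggregate like this (the view name here is illustrative):

  ALTER MATERIALIZED VIEW conditions_summary_daily
    SET (timescaledb.compress = true);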
timescaledb_information.continuous_aggregate view definition change
Add support for compression policy on continuous
aggregates
Move code from job.c to policy_utils.c
Add support functions to check compression
policy validity for continuous aggregates.
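A compression policy can then be added on the continuous aggregate itself, e.g. (view name and interval are illustrative):

  SELECT add_compression_policy('conditions_summary_daily',
         compress_after => INTERVAL '30 days');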
Commit fffd6c2350f5b3237486f3d49d7167105e72a55b fixed a problem related
to the PortalContext by using a PL/pgSQL procedure to execute the policy.
Unfortunately, this new implementation introduced a problem when the time
dimension uses INTEGER rather than BIGINT.
Fixed it by dealing correctly with the integer types: SMALLINT, INTEGER
and BIGINT.
Also refactored the policy compression procedure, replacing the two
procedures `policy_compression_{interval|integer}` with a single
`policy_compression_execute` that casts the dimension type dynamically.
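As an illustration of the integer-dimension case handled here (table, column, and function names below are hypothetical):

  CREATE TABLE events(time INTEGER NOT NULL, value DOUBLE PRECISION);
  SELECT create_hypertable('events', 'time', chunk_time_interval => 1000);

  -- an integer-based hypertable needs an integer_now function for policies
  CREATE OR REPLACE FUNCTION events_now() RETURNS INTEGER
  LANGUAGE SQL STABLE AS $$ SELECT coalesce(max(time), 0) FROM events $$;
  SELECT set_integer_now_func('events', 'events_now');

  -- compress_after is given in the dimension's own (INTEGER) units
  ALTER TABLE events SET (timescaledb.compress);
  SELECT add_compression_policy('events', compress_after => 500);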
Fixes #3773
Instead of picking 1 chunk for processing, we find the
list of chunks that have to be compressed by
the compression job, and proceed to process each one in its
own transaction. Without this, we could end up in a situation
where the first chunk is continually picked for recompression
(due to active inserts into the chunk) and we don't make any
progress.
We can limit the number of chunks processed by a single run
of the job by setting the new config parameter
maxchunks_to_compress for the compression job.
Valid values are > 0. The job processes at most
maxchunks_to_compress chunks and defers any
remaining chunks to the next scheduled run of the job.
The default is to process all pending chunks.
We have an additional job config parameter: verbose_log.
This enables additional logging of the chunks that
are processed by the job.
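A hedged sketch of tuning these settings through the job's config with alter_job (the job id, values, and exact config keys are assumptions based on the names above):

  SELECT alter_job(1002,  -- id of an existing compression policy job (illustrative)
         config => '{"hypertable_id": 1,
                     "compress_after": "7 days",
                     "maxchunks_to_compress": 4,
                     "verbose_log": true}'::jsonb);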
After inserts go into a compressed chunk, the chunk is marked as
unordered. This PR adds a new function recompress_chunk that
compresses the data and sets the status back to compressed. Further
optimizations for this function are planned but not part of this PR.
This function can be invoked by calling
SELECT recompress_chunk(<chunk_name>).
The recompress_chunk function is automatically invoked by the compression
policy job when it sees that a chunk is in the unordered state.
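For instance (hypertable and chunk names below are illustrative):

  -- list the chunks of a hypertable, then recompress one of them
  SELECT show_chunks('conditions');
  SELECT recompress_chunk('_timescaledb_internal._hyper_1_10_chunk');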
Tests are updated to no longer use continuous aggregate options that
will be removed, such as `refresh_lag`, `max_interval_per_job` and
`ignore_invalidation_older_than`. `REFRESH MATERIALIZED VIEW` has also
been replaced with `CALL refresh_continuous_aggregate()` using ranges
that try to replicate the previous refresh behavior.
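For reference, such a manual refresh over an explicit window looks like this (view name and window are illustrative):

  CALL refresh_continuous_aggregate('conditions_summary_daily',
       '2020-01-01'::timestamptz, '2020-02-01'::timestamptz);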
The materializer test (`continuous_aggregate_materialize`) has been
removed, since this tested the "old" materializer code, which is no
longer used without `REFRESH MATERIALIZED VIEW`. The new API using
`refresh_continuous_aggregate` already allows manual materialization
and there are two previously added tests (`continuous_aggs_refresh`
and `continuous_aggs_invalidate`) that cover the new refresh path in
similar ways.
When updated to use the new refresh API, some of the concurrency
tests, like `continuous_aggs_insert` and `continuous_aggs_multi`, have
slightly different concurrency behavior. This is explained by
different and sometimes more conservative locking. For instance, the
first transaction of a refresh serializes around an exclusive lock on
the invalidation threshold table, even if no new threshold is
written. The previous code took the heavier lock only once, and only if
a new threshold was written. This new, stricter locking means that
insert processes that read the invalidation threshold will block for a
short time when there are concurrent refreshes. However, since this
blocking only occurs during the first transaction of the refresh
(which is quite short), it probably doesn't matter too much in
practice. The relaxing of locks to improve concurrency and performance
can be implemented in the future.
This commit adds support for `WITH NO DATA` when creating a
continuous aggregate; the continuous aggregate is refreshed as part of
creation unless `WITH NO DATA` is provided.
All test cases are also updated to use `WITH NO DATA`, and an additional
test case is added to verify that both `WITH DATA` and `WITH NO DATA`
work as expected.
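A hedged sketch of the `WITH NO DATA` variant (hypertable, view, and column names are illustrative):

  -- created empty; refresh later, e.g. with refresh_continuous_aggregate
  CREATE MATERIALIZED VIEW conditions_summary_daily
  WITH (timescaledb.continuous) AS
  SELECT device, time_bucket('1 day', time) AS bucket, avg(temperature)
  FROM conditions
  GROUP BY device, time_bucket('1 day', time)
  WITH NO DATA;

Omitting the clause (or writing `WITH DATA` explicitly) refreshes the continuous aggregate as part of creation.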
Closes #2341
This change makes the behavior of dropping chunks on a hypertable that
has associated continuous aggregates consistent with other
mutations. In other words, any way of deleting data, irrespective of
whether this is done through a `DELETE`, `DROP TABLE <chunk>` or
`drop_chunks` command, will invalidate the region of deleted data so
that a subsequent refresh of a continuous aggregate will know that the
region is out-of-date and needs to be materialized.
Previously, only a `DELETE` would invalidate continuous aggregates,
while `DROP TABLE <chunk>` and `drop_chunks` did not. In fact, each
way to delete data had different behavior:
1. A `DELETE` would generate invalidations and the materializer would
update any aggregates to reflect the changes.
2. A `DROP TABLE <chunk>` would not generate invalidations and the
changes would therefore not be reflected in aggregates.
3. A `drop_chunks` command would not work unless
`ignore_invalidation_older_than` was set. When enabled, the
`drop_chunks` would first materialize the data to be dropped and
then never materialize that region again, unless
`ignore_invalidation_older_than` was reset. But then the continuous
aggregates would be in an undefined state since invalidations had
been ignored.
Due to the different behavior of these mutations, a continuous
aggregate could get "out-of-sync" with the underlying hypertable. This
has now been fixed.
For the time being, the previous behavior of "refresh-on-drop" (i.e.,
materializing the data on continuous aggregates before dropping it) is
retained for `drop_chunks`. However, such "refresh-on-drop" behavior
should probably be revisited in the future since it happens silently
by default without an opt-out. There are situations when such silent
refreshing might be undesirable; for instance, let's say the dropped
data had seen erroneous backfill that a user wants to ignore. Another
issue with "refresh-on-drop" is that it only happens for `drop_chunks`
and not other ways of deleting data.
Fixes #2242
We change the syntax for defining continuous aggregates to use `CREATE
MATERIALIZED VIEW` rather than `CREATE VIEW`. The command still creates
a view under the hood, whereas a regular `CREATE MATERIALIZED VIEW`
creates a table. An error is raised if `CREATE VIEW` is used to create a
continuous aggregate, redirecting the user to `CREATE MATERIALIZED VIEW`.
In a similar vein, `DROP MATERIALIZED VIEW` is used for continuous
aggregates and continuous aggregates cannot be dropped with `DROP
VIEW`.
Continuous aggregates are altered using `ALTER MATERIALIZED VIEW`
rather than `ALTER VIEW`, so we ensure that it works for `ALTER
MATERIALIZED VIEW` and gives an error if you try to use `ALTER VIEW` to
change a continuous aggregate.
Note that we allow `ALTER VIEW ... SET SCHEMA` to be used with the
partial view as well as with the direct view, so this is handled as a
special case.
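For illustration, assuming `conditions_summary_daily` is a continuous aggregate:

  DROP VIEW conditions_summary_daily;               -- error: not allowed for continuous aggregates
  DROP MATERIALIZED VIEW conditions_summary_daily;  -- allowed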
Fixes #2233
Co-authored-by: Erik Nordström <erik@timescale.com>
Co-authored-by: Mats Kindahl <mats@timescale.com>
This patch adds functionality to schedule arbitrary functions
or procedures as background jobs.
New functions:
add_job(
    proc REGPROC,
    schedule_interval INTERVAL,
    config JSONB DEFAULT NULL,
    initial_start TIMESTAMPTZ DEFAULT NULL,
    scheduled BOOL DEFAULT true
)
Add a job that runs proc every schedule_interval. Proc can
be either a function or a procedure implemented in any language.
delete_job(job_id INTEGER)
Deletes the job.
run_job(job_id INTEGER)
Execute a job in the current session.
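A minimal sketch of using these functions (the procedure body, config, and job id are illustrative, and it is assumed the job procedure takes the job id and its config as arguments):

  CREATE OR REPLACE PROCEDURE user_defined_action(job_id INT, config JSONB)
  LANGUAGE PLPGSQL AS $$
  BEGIN
    RAISE NOTICE 'Executing job % with config %', job_id, config;
  END
  $$;

  -- run every hour; add_job returns the id of the new job
  SELECT add_job('user_defined_action', INTERVAL '1 hour',
         config => '{"param": 1}');

  -- execute immediately in the current session (1000 stands in for the returned id)
  CALL run_job(1000);

  -- remove the job
  SELECT delete_job(1000);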
The parameter `cascade_to_materialization` is removed from
`drop_chunks` and `add_drop_chunks_policy` as well as associated tables
and test functions.
Fixes #2137
The `drop_chunks` function is refactored to make table name mandatory
for the function. As a result, the function was also refactored to
accept the `regclass` type instead of table name plus schema name and
the parameters were reordered to match the order for `show_chunks`.
The commit also refactors the code to pass the hypertable structure
between internal functions rather than the hypertable relid, and moves
error checks to the PostgreSQL function. This allows the internal
functions to avoid some lookups and use the information in the
structure directly, and also gives errors earlier instead of first
dropping chunks and then erroring out and rolling back the transaction.
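With the refactored signature, a call now looks like this (hypertable name and interval are illustrative):

  SELECT drop_chunks('conditions', older_than => INTERVAL '3 months');
  -- parameter order now mirrors show_chunks:
  SELECT show_chunks('conditions', older_than => INTERVAL '3 months');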
The function `get_chunks_to_compress` returns chunks that are not
compressed but are dropped, meaning a lookup using
`ts_chunk_get_by_id` will fail to find the corresponding `table_id`,
which later leads to a null pointer when looking for the chunk. This
leads to a segmentation fault.
This commit fixes this by ignoring chunks that are marked as
dropped in the chunk table when scanning for chunks to compress.
This change will check SQL commands that start a background worker
on a hypertable to verify that the table owner has permission to
log into the database. This is necessary, as background workers for
these commands will run with the permissions of the table owner, and
thus immediately fail if unable to log in.
To make tests more stable and to remove some repeated code in the
tests, this PR changes the test runner to stop background workers.
Individual tests that need background workers can still start them.
This PR only stops background workers for the initial database used by
the test; the behaviour for additional databases created during the
tests will not change.
Previously we were creating multiple rows using generate_series
and now(). Depending on the time of day the test was run, this
could create one or two chunks, causing flakiness.
We changed the test to only create one row and thus one chunk.