Users often execute TopN like queries over Continuous Aggregates and
now with the release 2.7 such queries are even faster because we
remove the re-aggregation and don't store partials anymore.
Also the previous PR #4430 gave us the ability to create indexes
direct on the aggregated columns leading to performance improvements.
But there are a noticable performance difference between
`Materialized-Only` and `Real-Time` Continuous Aggregates for TopN
queries.
Enabling the ORDER BY clause in the Continuous Aggregates definition
result in:
1) improvements of the User Experience that can use this so commom
clause in SELECT queries
2) performance improvements because we give the planner a chance to
use the MergeAppend node by producing ordered datasets.
Closes#4456
Following work started by #4294 to improve performance of Continuous
Aggregates by removing the re-aggregation in the user view.
This PR get rid of `partialize_agg` and `finalize_agg` aggregate
functions and store the finalized aggregated (plain) data in the
materialization hypertable.
Because we're not storing partials anymore and removed the
re-aggregation, now is be possible to create indexes on aggregated
columns in the materialization hypertable in order to improve the
performance even more.
Also removed restrictions on types of aggregates users can perform
with Continuous Aggregates:
* aggregates with DISTINCT
* aggregates with FILTER
* aggregates with FILTER in HAVING clause
* aggregates without combine function
* ordered-set aggregates
* hypothetical-set aggregates
By default new Continuous Aggregates will be created using this new
format, but the previous version (with partials) will be supported.
Users can create the previous style by setting to `false` the storage
paramater named `timescaledb.finalized` during the creation of the
Continuous Aggregate.
Fixes#4233
First step to remove the re-aggregation for Continuous Aggregates
is to remove the `chunk_id` from the materialization hypertable.
Also added new metadata column named `finalized` to `continuous_cagg`
catalog table in order to store information about the new following
finalized version of Continuous Aggregates that will not need the
partials anymore. This flag is important to maintain backward
compatibility with previous Continuous Aggregate implementation that
requires the `chunk_id` to refresh data properly.
The function `tsl_finalize_agg_ffunc` modified the aggregation state by
setting `trans_value` to the final result when computing the final
value. Since the state can be re-used several times, there could be
several calls to the finalization function, and the finalization
function would be confused when passed a final value instead of a
aggregation state transition value.
This commit fixes this by not modifying the `trans_value` when
computing the final value and instead just returns it (or the original
`trans_value` if there is no finalization function).
Fixes#3248
Enable ALTER MATERIALIZED VIEW (timescaledb.compress)
This enables compression on the underlying materialized
hypertable. The segmentby and orderby columns for
compression are based on the GROUP BY clause and time_bucket
clause used while setting up the continuous aggregate.
timescaledb_information.continuous_aggregate view defn
change
Add support for compression policy on continuous
aggregates
Move code from job.c to policy_utils.c
Add support functions to check compression
policy validity for continuous aggregates.
If the targetlist for the cagg query has both subexprs and exprs
from the having clause, the havingqual for the partial view
is generated incorrectly. Fix this issue by checking havingqual
against all the entries in the targetlist instead of first match.
Fixes#2655
partialize_agg is an internal function, which serializes partial
aggregate results. It is used to prepare partials for materialization
in continuous aggregates and partial results on data nodes in
distributed query execution. paritalize_agg doesn't expect push down of
aggregates, which happens when partitionwise aggregate is enabled, and
produces a query plan, which either crashes on assert during execution
or produces incorrect result.
This fix avoids adding partition info if the function is present in the
query. This can be seen as a work around and it is good to fix planning
of partialize_agg in the case of pushed down aggregates.
This commit also contains few minor fixes of readability of comments
and code around the changes.
Fixes#2849 and fixes#2858
If a bad value is given to `alter_job` or `add_job` for a configuration
parameter, no error will be given but the job will fail to execute.
This commit adds checks of the configuration parameters to the
functions so that an error is given immediately when calling it. The
commit factors out the extraction of parameters from the configuration
from the execution functions into a separate functions and calls them
from `alter_job` and `add_job` as well as when executing the job. Only
non-custom job checks are done.
The commit also moves a few functions that were only used in TSL code
from the `src/` directory to the `tsl/src/` directory and also removes
a redundant permission check and does a minor refactoring of the
`job_execute` function so that an active snapshot is always created
regardless of whether a transaction is open or not. The corresponding
code in the individual policy functions are removed since they are not
needed.
Closes#2607
Fixes support for continuous aggregates when the view query contains
an expression with several aggregates, e.g., `max(val) - min(val)`.
Usage of continuous aggregates with such expression was producing
errors if the aggregate expression was not the last in the SELECT
clause or not all GROUP BY expressions were present in the SELECT
clause.
An expression with several aggregates is materialized with partials
per aggregate. For example, `max(val) - min(val)` will be materialized
in two partial entry columns: one for `max` and one for `min`. Thus
all columns in the materialized hypertable should account for the
number of partials and cannot just use the position in the original
query. This fix makes sure to account for such case.
Fixes#2616
This change cleans up and removes duplicate code for internal lookups
of continuous aggregates. A number of related error messages have also
been cleaned up and made conformant with the error style guide.
This change filters materialized hypertables from the hypertables
view, similar to how internal compression hypertables are
filtered.
Materialized hypertables are internal objects created as a side effect
of creating a continuous aggregate, and these internal hypertables are
still listed in the continuous_aggregates view.
Fixes#2383
Tests are updated to no longer use continuous aggregate options that
will be removed, such as `refresh_lag`, `max_interval_per_job` and
`ignore_invalidation_older_than`. `REFRESH MATERIALIZED VIEW` has also
been replaced with `CALL refresh_continuous_aggregate()` using ranges
that try to replicate the previous refresh behavior.
The materializer test (`continuous_aggregate_materialize`) has been
removed, since this tested the "old" materializer code, which is no
longer used without `REFRESH MATERIALIZED VIEW`. The new API using
`refresh_continuous_aggregate` already allows manual materialization
and there are two previously added tests (`continuous_aggs_refresh`
and `continuous_aggs_invalidate`) that cover the new refresh path in
similar ways.
When updated to use the new refresh API, some of the concurrency
tests, like `continuous_aggs_insert` and `continuous_aggs_multi`, have
slightly different concurrency behavior. This is explained by
different and sometimes more conservative locking. For instance, the
first transaction of a refresh serializes around an exclusive lock on
the invalidation threshold table, even if no new threshold is
written. The previous code, only took the heavier lock once, and if, a
new threshold was written. This new, and stricter locking, means that
insert processes that read the invalidation threshold will block for a
short time when there are concurrent refreshes. However, since this
blocking only occurs during the first transaction of the refresh
(which is quite short), it probably doesn't matter too much in
practice. The relaxing of locks to improve concurrency and performance
can be implemented in the future.
This change simplifies the name of the functions for adding and
removing a continuous aggregate policy. The functions are renamed
from:
- `add_refresh_continuous_aggregate_policy`
- `remove_refresh_continuous_aggregate_policy`
to
- `add_continuous_aggregate_policy`
- `remove_continuous_aggregate_policy`
Fixes#2320
This commit will add support for `WITH NO DATA` when creating a
continuous aggregate and will refresh the continuous aggregate when
creating it unless `WITH NO DATA` is provided.
All test cases are also updated to use `WITH NO DATA` and an additional
test case for verifying that both `WITH DATA` and `WITH NO DATA` works
as expected.
Closes#2341
Support add and remove continuous agg policy functions
Integrate policy execution with refresh api for continuous
aggregates
The old api for continuous aggregates adds a job automatically
for a continuous aggregate. This is an explicit step with the
new API. So remove this functionality.
Refactor some of the utility functions so that the code can be shared
by multiple policies.
We change the syntax for defining continuous aggregates to use `CREATE
MATERIALIZED VIEW` rather than `CREATE VIEW`. The command still creates
a view, while `CREATE MATERIALIZED VIEW` creates a table. Raise an
error if `CREATE VIEW` is used to create a continuous aggregate and
redirect to `CREATE MATERIALIZED VIEW`.
In a similar vein, `DROP MATERIALIZED VIEW` is used for continuous
aggregates and continuous aggregates cannot be dropped with `DROP
VIEW`.
Continuous aggregates are altered using `ALTER MATERIALIZED VIEW`
rather than `ALTER VIEW`, so we ensure that it works for `ALTER
MATERIALIZED VIEW` and gives an error if you try to use `ALTER VIEW` to
change a continuous aggregate.
Note that we allow `ALTER VIEW ... SET SCHEMA` to be used with the
partial view as well as with the direct view, so this is handled as a
special case.
Fixes#2233
Co-authored-by: =?UTF-8?q?Erik=20Nordstr=C3=B6m?= <erik@timescale.com>
Co-authored-by: Mats Kindahl <mats@timescale.com>
This patch adds functionality to schedule arbitrary functions
or procedures as background jobs.
New functions:
add_job(
proc REGPROC,
schedule_interval INTERVAL,
config JSONB DEFAULT NULL,
initial_start TIMESTAMPTZ DEFAULT NULL,
scheduled BOOL DEFAULT true
)
Add a job that runs proc every schedule_interval. Proc can
be either a function or a procedure implemented in any language.
delete_job(job_id INTEGER)
Deletes the job.
run_job(job_id INTEGER)
Execute a job in the current session.
To drop a continuous aggregate it was necessary to use the `CASCADE`
keyword, which would then cascade to the materialized hypertable. Since
this can cascade the drop to other objects that are dependent on the
continuous aggregate, this could accidentally drop more objects than
intended.
This commit fixes this by removing the check for `CASCADE` and adding
the materialized hypertable to the list of objects to drop.
Fixestimescale/timescaledb-private#659
This PR adds a new mode for continuous aggregates that we name
real-time aggregates. Unlike the original this new mode will
combine materialized data with new data received after the last
refresh has happened. This new mode will be the default behaviour
for newly created continuous aggregates.
To upgrade existing continuous aggregates to the new behaviour
the following command needs to be run for all continuous aggregates
ALTER VIEW continuous_view_name SET (timescaledb.materialized_only=false);
To disable this behaviour for newly created continuous aggregates
and get the old behaviour the following command can be run
ALTER VIEW continuous_view_name SET (timescaledb.materialized_only=true);
Set the threshold for continuous aggregates as the
max value in the raw hypertable when the max value
is lesser than the computed now time. This helps avoid
unnecessary materialization checks for data ranges
that do not exist. As a result, we also prevent
unnecessary writes to the thresholds and invalidation
log tables.
To make tests more stable and to remove some repeated code in the
tests this PR changes the test runner to stop background workers.
Individual tests that need background workers can still start them
and this PR will only stop background workers for the initial database
for the test, behaviour for additional databases created during the
tests will not change.
The log level used for continuous aggregate materialization messages
was INFO which is for requested information. Since there is no way to
control the behaviour externally INFO is a suboptimal choice because
INFO messages cannot be easily suppressed leading to irreproducable
test output. Even though time can be mocked to make output consistent
this is only available in debug builds.
This patch changes the log level of those messages to LOG, so
clients can easily control the ouput by setting client_min_messages.
We added a timescaledb.ignore_invalidation_older_than parameter for
continuous aggregatess. This parameter accept a time-interval (e.g. 1
month). if set, it limits the amount of time for which to process
invalidation. Thus, if
timescaledb.ignore_invalidation_older_than = '1 month'
then any modifications for data older than 1 month from the current
timestamp at insert time will not cause updates to the continuous
aggregate. This limits the amount of work that a backfill can trigger.
This parameter must be >= 0. A value of 0 means that invalidations are
never processed.
When recording invalidations for the hypertable at insert time, we use
the maximum ignore_invalidation_older_than of any continuous agg attached
to the hypertable as a cutoff for whether to record the invalidation
at all. When materializing a particular continuous agg, we use that
aggs ignore_invalidation_older_than cutoff. However we have to apply
that cutoff relative to the insert time not the materialization
time to make it easier for users to reason about. Therefore,
we record the insert time as part of the invalidation entry.
continuous aggregate views like
select time_bucket(), sum(col)
from ...
group by time_bucket(), grpcol;
when grpcol is missing from the select targetlist, the
partialize query's select targetlist is incorrect and the view
cannot be materialized. This PR fixes this issue.
Previously, refresh_lag in continuous aggs was calculated
relative to the maximum timestamp in the table. Change the
semantics so that it is relative to now(). This is more
intuitive.
Requires an integer_now function applied to hypertables
with integer-based time dimensions.
To create a continuous agg you now only need SELECT and
TRIGGER permission on the raw table. To continue refreshing
the continuous agg the owner of the continuous agg needs
only SELECT permission.
This commit adds tests to make sure that removing the
SELECT permission removes ability to refresh using
both REFRESH MATERIALIZED VIEW and also through a background
worker.
This work also uncovered divergence in permission logic for
creating triggers by a CREATE TRIGGER on chunks and when new
chunks are created. This has now been unified: there is a check
to make sure you can create the trigger on the main table and
then there is a check that the owner of the main table can create
triggers on chunks.
Alter view for continuous aggregates is allowed for the owner of the
view.
This commit fixes and tests permissions in the following
API calls:
- reorder_chunk (test only)
- alter_job_schedule
- add_drop_chunks_policy
- remove_drop_chunks_policy
- add_reorder_policy
- remove_reorder_policy
- drop_chunks
The partial view should always project the time_bucket expression related
column as this is a special column for the materialization table. The partial
view failed to project it when the user query's SELECT targetlist did not
contain the time_bucket expression. The materialization fails in this
scenario.
This commit change the name given to group columns in the materialized
tables to make them more intuitive for the user. The goal was to make
the column names the same as the column names in the view. The main
change was to change time_partitioning_col to be the same as the
view. "time_partition_col" is only used as the default when there is
no alias.
This commit also changes the assignment of the view aliases to the
target entries to occur much earlier in the create process.
Add indexes for materialization table created by continuous aggregates.
This behavior can be turned on/off by using timescaledb.create_group_indexes parameter
of the WITH clause when the continuous agg is created.