40 Commits

Author SHA1 Message Date
Alexander Kuzmenkov
706a3c0e50 Enable statement logging in the tests
Remove 'client_min_messages = LOG' where not needed, and add the 'LOG:
statement' output otherwise.
2022-08-25 15:29:28 +03:00
Fabrízio de Royes Mello
28440b7900 Enable ORDER BY on Continuous Aggregates
Users often execute TopN like queries over Continuous Aggregates and
now with the release 2.7 such queries are even faster because we
remove the re-aggregation and don't store partials anymore.

Also the previous PR #4430 gave us the ability to create indexes
directly on the aggregated columns, leading to performance improvements.

But there is a noticeable performance difference between
`Materialized-Only` and `Real-Time` Continuous Aggregates for TopN
queries.

Enabling the ORDER BY clause in the Continuous Aggregates definition
results in:

1) an improved user experience, since users can now use this common
   clause in SELECT queries

2) performance improvements because we give the planner a chance to
   use the MergeAppend node by producing ordered datasets.
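
A minimal sketch of what this enables, assuming a hypothetical hypertable
`conditions(time, device_id, temperature)`; the view and column names are
illustrative only and not taken from this commit:

```sql
-- Continuous aggregate whose definition carries an ORDER BY clause.
CREATE MATERIALIZED VIEW conditions_hourly
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour', time) AS bucket,
       device_id,
       max(temperature) AS max_temp
FROM conditions
GROUP BY bucket, device_id
ORDER BY bucket DESC
WITH NO DATA;

-- A TopN-style query that can benefit from ordered input.
SELECT * FROM conditions_hourly ORDER BY bucket DESC LIMIT 10;
```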

Closes #4456
2022-07-31 15:52:55 -03:00
Fabrízio de Royes Mello
f266f5cf56 Continuous Aggregates finals form
Following work started by #4294 to improve performance of Continuous
Aggregates by removing the re-aggregation in the user view.

This PR gets rid of the `partialize_agg` and `finalize_agg` aggregate
functions and stores the finalized aggregated (plain) data in the
materialization hypertable.

Because we're not storing partials anymore and removed the
re-aggregation, it is now possible to create indexes on aggregated
columns in the materialization hypertable in order to improve the
performance even more.

Also removed restrictions on types of aggregates users can perform
with Continuous Aggregates:
* aggregates with DISTINCT
* aggregates with FILTER
* aggregates with FILTER in HAVING clause
* aggregates without combine function
* ordered-set aggregates
* hypothetical-set aggregates

By default new Continuous Aggregates will be created using this new
format, but the previous version (with partials) will be supported.

Users can create the previous style by setting the storage
parameter `timescaledb.finalized` to `false` when creating the
Continuous Aggregate.
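
As a sketch, opting into the previous (partials) format could look like the
following. The hypertable `conditions` and the view name are hypothetical;
only the `timescaledb.finalized` option comes from the text above:

```sql
-- Create a continuous aggregate in the old format that stores partials.
CREATE MATERIALIZED VIEW conditions_daily_partials
WITH (timescaledb.continuous, timescaledb.finalized = false) AS
SELECT time_bucket('1 day', time) AS bucket,
       avg(temperature) AS avg_temp
FROM conditions
GROUP BY bucket
WITH NO DATA;
```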

Fixes #4233
2022-05-18 11:38:58 -03:00
Fabrízio de Royes Mello
1e8d37b54e Remove chunk_id from materialization hypertable
First step to remove the re-aggregation for Continuous Aggregates
is to remove the `chunk_id` from the materialization hypertable.

Also added a new metadata column named `finalized` to the `continuous_cagg`
catalog table in order to store information about the upcoming
finalized version of Continuous Aggregates that will not need the
partials anymore. This flag is important to maintain backward
compatibility with previous Continuous Aggregate implementation that
requires the `chunk_id` to refresh data properly.
2022-05-06 14:30:00 -03:00
Konstantina Skovola
687e7c7233 Fix option "timescaledb.create_group_indexes"
Previously this option was ignored when creating a
continuous aggregate, even when explicitly set to true.
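
For illustration, the option is passed in the WITH clause when the continuous
aggregate is created. The table `conditions` and the view name below are
assumptions made for this sketch:

```sql
-- Skip creation of the default group-by indexes on the materialization table.
CREATE MATERIALIZED VIEW conditions_by_device
WITH (timescaledb.continuous, timescaledb.create_group_indexes = false) AS
SELECT time_bucket('1 hour', time) AS bucket,
       device_id,
       count(*) AS samples
FROM conditions
GROUP BY bucket, device_id
WITH NO DATA;
```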

Fixes #4249
2022-04-26 20:51:11 +03:00
Mats Kindahl
1b2926c076 Do not modify aggregation state in finalize
The function `tsl_finalize_agg_ffunc` modified the aggregation state by
setting `trans_value` to the final result when computing the final
value. Since the state can be re-used several times, there could be
several calls to the finalization function, and the finalization
function would be confused when passed a final value instead of an
aggregation state transition value.

This commit fixes this by not modifying the `trans_value` when
computing the final value and instead just returns it (or the original
`trans_value` if there is no finalization function).

Fixes #3248
2022-04-06 20:50:47 +02:00
gayyappan
d8d392914a Support for compression on continuous aggregates
Enable ALTER MATERIALIZED VIEW (timescaledb.compress)
This enables compression on the underlying materialized
hypertable. The segmentby and orderby columns for
compression are based on the GROUP BY clause and time_bucket
clause used while setting up the continuous aggregate.

timescaledb_information.continuous_aggregate view defn
change

Add support for compression policy on continuous
aggregates

Move code from job.c to policy_utils.c
Add support functions to check compression
policy validity for continuous aggregates.
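
A hedged usage sketch, assuming an existing continuous aggregate named
`conditions_hourly` (a hypothetical name) and an arbitrary 45-day threshold:

```sql
-- Enable compression on the continuous aggregate's materialized hypertable.
ALTER MATERIALIZED VIEW conditions_hourly SET (timescaledb.compress = true);

-- Compress materialized chunks once they are older than the given interval.
SELECT add_compression_policy('conditions_hourly',
                              compress_after => INTERVAL '45 days');
```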
2021-12-17 10:51:33 -05:00
gayyappan
217ba461ac Fix havingqual processing for caggs
If the targetlist for the cagg query has both subexprs and exprs
from the having clause, the havingqual for the partial view
is generated incorrectly. Fix this issue by checking havingqual
against all the entries in the targetlist instead of first match.

Fixes #2655
2021-08-17 11:12:28 -04:00
Ruslan Fomkin
f98337cd3c Avoid partitionwise planning of partialize_agg
partialize_agg is an internal function, which serializes partial
aggregate results. It is used to prepare partials for materialization
in continuous aggregates and partial results on data nodes in
distributed query execution. partialize_agg doesn't expect push down of
aggregates, which happens when partitionwise aggregate is enabled, and
produces a query plan, which either crashes on assert during execution
or produces incorrect result.

This fix avoids adding partition info if the function is present in the
query. This can be seen as a workaround; it would be good to fix planning
of partialize_agg in the case of pushed down aggregates.

This commit also contains a few minor readability fixes to comments
and code around the changes.

Fixes #2849 and fixes #2858
2021-01-28 09:00:08 +01:00
Mats Kindahl
d043ff1e04 Check configuration in alter_job and add_job
If a bad value is given to `alter_job` or `add_job` for a configuration
parameter, no error will be given but the job will fail to execute.

This commit adds checks of the configuration parameters to the
functions so that an error is given immediately when calling them. The
commit factors the extraction of parameters from the configuration
out of the execution functions into separate functions and calls them
from `alter_job` and `add_job` as well as when executing the job. Only
non-custom jobs are checked.

The commit also moves a few functions that were only used in TSL code
from the `src/` directory to the `tsl/src/` directory and also removes
a redundant permission check and does a minor refactoring of the
`job_execute` function so that an active snapshot is always created
regardless of whether a transaction is open or not. The corresponding
code in the individual policy functions is removed since it is not
needed.

Closes #2607
2020-12-02 11:04:02 +01:00
Ruslan Fomkin
6a9a965409 Fix support for complex aggregate expression
Fixes support for continuous aggregates when the view query contains
an expression with several aggregates, e.g., `max(val) - min(val)`.
Usage of continuous aggregates with such an expression produced
errors if the aggregate expression was not the last in the SELECT
clause or not all GROUP BY expressions were present in the SELECT
clause.

An expression with several aggregates is materialized with partials
per aggregate. For example, `max(val) - min(val)` will be materialized
in two partial entry columns: one for `max` and one for `min`. Thus
all columns in the materialized hypertable should account for the
number of partials and cannot just use the position in the original
query. This fix makes sure to account for such cases.
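
A sketch of the kind of definition described above, using the syntax of later
releases and a hypothetical hypertable `conditions(time, device_id, val)`.
The multi-aggregate expression is not last in the SELECT list, and
`device_id` is grouped but not selected:

```sql
CREATE MATERIALIZED VIEW conditions_range
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 day', time) AS bucket,
       max(val) - min(val) AS val_range,  -- two aggregates in one expression
       count(*) AS samples                -- another aggregate after it
FROM conditions
GROUP BY bucket, device_id                -- device_id is not in the SELECT list
WITH NO DATA;
```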

Fixes #2616
2020-11-20 17:39:46 +01:00
Sven Klemm
295817f18e Improve cagg datatype handling
This patch improves datatype handling when the aggregate function
argument type is a pseudotype.
2020-10-19 12:01:43 +02:00
Erik Nordström
4623db14ad Use consistent column names in views
Make all views that reference hypertables use `hypertable_schema` and
`hypertable_name`.
2020-10-05 15:18:47 +02:00
Sven Klemm
dbb9988eee Fix result ordering in tests
This patch fixes the result sorting in tests that had no ORDER BY
clause or where ORDER BY clause did not result in fixed ordering.
2020-09-28 12:15:42 +02:00
Erik Nordström
27e44f20ac Cleanup functions to find continuous aggregates
This change cleans up and removes duplicate code for internal lookups
of continuous aggregates. A number of related error messages have also
been cleaned up and made conformant with the error style guide.
2020-09-15 17:18:59 +02:00
Erik Nordström
4f74262991 Filter materialized hypertables in view
This change filters materialized hypertables from the hypertables
view, similar to how internal compression hypertables are
filtered.

Materialized hypertables are internal objects created as a side effect
of creating a continuous aggregate, and these internal hypertables are
still listed in the continuous_aggregates view.

Fixes #2383
2020-09-14 13:04:59 +02:00
Erik Nordström
202692f1ef Make tests use the new continuous aggregate API
Tests are updated to no longer use continuous aggregate options that
will be removed, such as `refresh_lag`, `max_interval_per_job` and
`ignore_invalidation_older_than`. `REFRESH MATERIALIZED VIEW` has also
been replaced with `CALL refresh_continuous_aggregate()` using ranges
that try to replicate the previous refresh behavior.
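
As an illustration of the replacement call, assuming a continuous aggregate
named `conditions_hourly` (hypothetical) and an arbitrary one-week window:

```sql
-- Manually refresh the continuous aggregate over an explicit time range.
CALL refresh_continuous_aggregate('conditions_hourly',
                                  '2020-09-01', '2020-09-08');
```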

The materializer test (`continuous_aggregate_materialize`) has been
removed, since this tested the "old" materializer code, which is no
longer used without `REFRESH MATERIALIZED VIEW`. The new API using
`refresh_continuous_aggregate` already allows manual materialization
and there are two previously added tests (`continuous_aggs_refresh`
and `continuous_aggs_invalidate`) that cover the new refresh path in
similar ways.

When updated to use the new refresh API, some of the concurrency
tests, like `continuous_aggs_insert` and `continuous_aggs_multi`, have
slightly different concurrency behavior. This is explained by
different and sometimes more conservative locking. For instance, the
first transaction of a refresh serializes around an exclusive lock on
the invalidation threshold table, even if no new threshold is
written. The previous code only took the heavier lock once, and only if a
new threshold was written. This new, stricter locking means that
insert processes that read the invalidation threshold will block for a
short time when there are concurrent refreshes. However, since this
blocking only occurs during the first transaction of the refresh
(which is quite short), it probably doesn't matter too much in
practice. The relaxing of locks to improve concurrency and performance
can be implemented in the future.
2020-09-11 16:07:21 +02:00
Erik Nordström
07ebd5c9b2 Rename continuous aggregate policy API
This change simplifies the name of the functions for adding and
removing a continuous aggregate policy. The functions are renamed
from:

- `add_refresh_continuous_aggregate_policy`
- `remove_refresh_continuous_aggregate_policy`

to

- `add_continuous_aggregate_policy`
- `remove_continuous_aggregate_policy`
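
A usage sketch with the new names. The view name `conditions_hourly` and the
offsets are assumptions, and the named parameters follow the API as later
released, which may differ from the signature at the time of this commit:

```sql
-- Schedule automatic refreshes of the continuous aggregate.
SELECT add_continuous_aggregate_policy('conditions_hourly',
    start_offset      => INTERVAL '1 month',
    end_offset        => INTERVAL '1 hour',
    schedule_interval => INTERVAL '1 hour');

-- Remove the policy again.
SELECT remove_continuous_aggregate_policy('conditions_hourly');
```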

Fixes #2320
2020-09-11 15:22:54 +02:00
Mats Kindahl
9565cbd0f7 Continuous aggregates support WITH NO DATA
This commit will add support for `WITH NO DATA` when creating a
continuous aggregate and will refresh the continuous aggregate when
creating it unless `WITH NO DATA` is provided.

All test cases are also updated to use `WITH NO DATA`, and an additional
test case is added to verify that both `WITH DATA` and `WITH NO DATA` work
as expected.
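
The two variants can be sketched as follows, assuming a hypothetical
hypertable `conditions(time, temperature)`; the view names are illustrative:

```sql
-- Materialized at creation time (the default when WITH NO DATA is omitted).
CREATE MATERIALIZED VIEW conditions_daily
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 day', time) AS bucket, avg(temperature) AS avg_temp
FROM conditions
GROUP BY bucket;

-- Created empty; data is materialized later by a refresh.
CREATE MATERIALIZED VIEW conditions_daily_empty
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 day', time) AS bucket, avg(temperature) AS avg_temp
FROM conditions
GROUP BY bucket
WITH NO DATA;
```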

Closes #2341
2020-09-11 14:02:41 +02:00
gayyappan
97b4d1cae2 Support refresh continuous aggregate policy
Support add and remove continuous agg policy functions
Integrate policy execution with refresh api for continuous
aggregates
The old api for continuous aggregates adds a job automatically
for a continuous aggregate. This is an explicit step with the
new API. So remove this functionality.
Refactor some of the utility functions so that the code can be shared
by multiple policies.
2020-09-01 21:41:00 -04:00
Mats Kindahl
c054b381c6 Change syntax for continuous aggregates
We change the syntax for defining continuous aggregates to use `CREATE
MATERIALIZED VIEW` rather than `CREATE VIEW`. The command still creates
a view, whereas a plain `CREATE MATERIALIZED VIEW` creates a table.  Raise an
error if `CREATE VIEW` is used to create a continuous aggregate and
redirect to `CREATE MATERIALIZED VIEW`.

In a similar vein, `DROP MATERIALIZED VIEW` is used for continuous
aggregates and continuous aggregates cannot be dropped with `DROP
VIEW`.

Continuous aggregates are altered using `ALTER MATERIALIZED VIEW`
rather than `ALTER VIEW`, so we ensure that it works for `ALTER
MATERIALIZED VIEW` and gives an error if you try to use `ALTER VIEW` to
change a continuous aggregate.

Note that we allow `ALTER VIEW ... SET SCHEMA` to be used with the
partial view as well as with the direct view, so this is handled as a
special case.

Fixes #2233

Co-authored-by: Erik Nordström <erik@timescale.com>
Co-authored-by: Mats Kindahl <mats@timescale.com>
2020-08-27 17:16:10 +02:00
Sven Klemm
a9c087eb1e Allow scheduling custom functions as bgw jobs
This patch adds functionality to schedule arbitrary functions
or procedures as background jobs.

New functions:

add_job(
  proc REGPROC,
  schedule_interval INTERVAL,
  config JSONB DEFAULT NULL,
  initial_start TIMESTAMPTZ DEFAULT NULL,
  scheduled BOOL DEFAULT true
)

Add a job that runs proc every schedule_interval. Proc can
be either a function or a procedure implemented in any language.

delete_job(job_id INTEGER)

Deletes the job.

run_job(job_id INTEGER)

Execute a job in the current session.
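
A minimal end-to-end sketch. The procedure `log_heartbeat` and its config are
hypothetical, and the job id `1000` is a placeholder for whatever id
`add_job` returns. The `(job_id, config)` procedure signature and the `CALL`
form of `run_job` follow released versions and are assumptions here:

```sql
-- A custom procedure that receives the job id and its JSONB config.
CREATE OR REPLACE PROCEDURE log_heartbeat(job_id INT, config JSONB)
LANGUAGE plpgsql AS $$
BEGIN
  RAISE NOTICE 'job % ran with config %', job_id, config;
END
$$;

-- Schedule it to run every hour; the call returns the new job id.
SELECT add_job('log_heartbeat', INTERVAL '1 hour',
               config => '{"label": "demo"}');

-- Run the job once in the current session, then delete it.
CALL run_job(1000);
SELECT delete_job(1000);
```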
2020-08-20 11:23:49 +02:00
Sven Klemm
d547d61516 Refactor continuous aggregate policy
This patch modifies the continuous aggregate policy to store its
configuration in the jobs table.
2020-08-11 22:57:02 +02:00
Mats Kindahl
9049a5d3cb Remove requirement of CASCADE from DROP VIEW
To drop a continuous aggregate it was necessary to use the `CASCADE`
keyword, which would then cascade to the materialized hypertable. Since
this can cascade the drop to other objects that are dependent on the
continuous aggregate, this could accidentally drop more objects than
intended.

This commit fixes this by removing the check for `CASCADE` and adding
the materialized hypertable to the list of objects to drop.

Fixes timescale/timescaledb-private#659
2020-08-03 22:01:21 +02:00
Sven Klemm
2ae4592930 Add real-time support to continuous aggregates
This PR adds a new mode for continuous aggregates that we name
real-time aggregates. Unlike the original this new mode will
combine materialized data with new data received after the last
refresh has happened. This new mode will be the default behaviour
for newly created continuous aggregates.

To upgrade existing continuous aggregates to the new behaviour
the following command needs to be run for all continuous aggregates

ALTER VIEW continuous_view_name SET (timescaledb.materialized_only=false);

To disable this behaviour for newly created continuous aggregates
and get the old behaviour the following command can be run

ALTER VIEW continuous_view_name SET (timescaledb.materialized_only=true);
2020-03-31 22:09:42 +02:00
gayyappan
ce624d61d3 Restrict watermark to max for continuous aggregates
Set the threshold for continuous aggregates as the
max value in the raw hypertable when the max value
is less than the computed now time. This helps avoid
unnecessary materialization checks for data ranges
that do not exist. As a result, we also prevent
unnecessary writes to the thresholds and invalidation
log tables.
2020-03-25 12:20:11 -04:00
Sven Klemm
0cc22ad278 Stop background worker in tests
To make tests more stable and to remove some repeated code in the
tests this PR changes the test runner to stop background workers.
Individual tests that need background workers can still start them
and this PR will only stop background workers for the initial database
for the test, behaviour for additional databases created during the
tests will not change.
2020-03-06 15:27:53 +01:00
Sven Klemm
08c3d9015f Change log level for cagg materialization messages
The log level used for continuous aggregate materialization messages
was INFO, which is for requested information. Since there is no way to
control the behaviour externally, INFO is a suboptimal choice because
INFO messages cannot be easily suppressed, leading to irreproducible
test output. Even though time can be mocked to make output consistent
this is only available in debug builds.

This patch changes the log level of those messages to LOG, so
clients can easily control the output by setting client_min_messages.
2020-03-06 01:09:08 +01:00
Matvey Arye
08ad7b6612 Add ignore_invalidation_older_than to continuous aggs
We added a timescaledb.ignore_invalidation_older_than parameter for
continuous aggregates. This parameter accepts a time interval (e.g. 1
month). If set, it limits the amount of time for which to process
invalidations. Thus, if
	timescaledb.ignore_invalidation_older_than = '1 month'
then any modifications for data older than 1 month from the current
timestamp at insert time will not cause updates to the continuous
aggregate. This limits the amount of work that a backfill can trigger.
This parameter must be >= 0. A value of 0 means that invalidations are
never processed.

When recording invalidations for the hypertable at insert time, we use
the maximum ignore_invalidation_older_than of any continuous agg attached
to the hypertable as a cutoff for whether to record the invalidation
at all. When materializing a particular continuous agg, we use that
agg's ignore_invalidation_older_than cutoff. However, we have to apply
that cutoff relative to the insert time not the materialization
time to make it easier for users to reason about. Therefore,
we record the insert time as part of the invalidation entry.
2019-12-04 15:47:03 -05:00
gayyappan
4ecc96509d Fix partial select query for continuous aggregate
continuous aggregate views like
select time_bucket(), sum(col)
from ...
group by time_bucket(), grpcol;

when grpcol is missing from the select targetlist, the
partialize query's select targetlist is incorrect and the view
cannot be materialized. This PR fixes this issue.
2019-12-03 13:20:38 -05:00
Matvey Arye
2f7d69f93b Make continuous agg relative to now()
Previously, refresh_lag in continuous aggs was calculated
relative to the maximum timestamp in the table. Change the
semantics so that it is relative to now(). This is more
intuitive.

Requires an integer_now function applied to hypertables
with integer-based time dimensions.
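
One way to provide such a function is sketched below. The hypertable `events`
(with a BIGINT time column holding seconds since the epoch) and the function
name `unix_now` are assumptions. `set_integer_now_func` is the registration
call in released versions and is not named in this commit message:

```sql
-- Returns the current time in the same unit as the integer time column.
CREATE OR REPLACE FUNCTION unix_now() RETURNS BIGINT
LANGUAGE SQL STABLE AS $$
  SELECT extract(epoch FROM now())::BIGINT
$$;

-- Register it as the "now" function for the integer-time hypertable.
SELECT set_integer_now_func('events', 'unix_now');
```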
2019-11-21 14:17:37 -05:00
gayyappan
60cfe6cc90 Support for multiple continuous aggregates
Allow multiple continuous aggregates to be defined on a hypertable.
2019-06-24 17:05:49 -04:00
Matvey Arye
d580abf04f Change how permissions work with continuous aggs
To create a continuous agg you now only need SELECT and
TRIGGER permission on the raw table. To continue refreshing
the continuous agg the owner of the continuous agg needs
only SELECT permission.

This commit adds tests to make sure that removing the
SELECT permission removes ability to refresh using
both REFRESH MATERIALIZED VIEW and also through a background
worker.

This work also uncovered divergence in permission logic for
creating triggers by a CREATE TRIGGER on chunks and when new
chunks are created. This has now been unified: there is a check
to make sure you can create the trigger on the main table and
then there is a check that the owner of the main table can create
triggers on chunks.

Alter view for continuous aggregates is allowed for the owner of the
view.
2019-06-24 10:57:38 -04:00
Matvey Arye
77abec0d38 Improve permission checking for continuous aggs
Checks:
- Create View
- Drop View
- Alter View
- Refresh Materialized View
2019-06-24 10:57:38 -04:00
Matvey Arye
e834c2aba8 Better permission checks in API calls
This commit fixes and tests permissions in the following
API calls:
- reorder_chunk (test only)
- alter_job_schedule
- add_drop_chunks_policy
- remove_drop_chunks_policy
- add_reorder_policy
- remove_reorder_policy
- drop_chunks
2019-06-24 10:57:38 -04:00
gayyappan
0e842e2d90 Fix partial view targetlist for continuous aggregates
The partial view should always project the time_bucket expression related
column as this is a special column for the materialization table. The partial
view failed to project it when the user query's SELECT targetlist did not
contain the time_bucket expression. The materialization fails in this
scenario.
2019-05-02 14:36:33 -04:00
Matvey Arye
2a76041dae Make cont aggs group column names more intuitive
This commit changes the name given to group columns in the materialized
tables to make them more intuitive for the user. The goal was to make
the column names the same as the column names in the view. The main
change was to change time_partitioning_col to be the same as the
view. "time_partition_col" is only used as the default when there is
no alias.

This commit also changes the assignment of the view aliases to the
target entries to occur much earlier in the create process.
2019-05-01 14:47:53 -04:00
Joshua Lockerman
b41591bcdb Test continuous aggregates with space partitions
Just a sanity check to make sure they work correctly.
2019-05-01 11:09:43 -04:00
gayyappan
297b9ed66a Add default index for continuous aggregates
Add indexes for materialization table created by continuous aggregates.
This behavior can be turned on/off by using timescaledb.create_group_indexes parameter
of the WITH clause when the continuous agg is created.
2019-04-30 14:31:03 -04:00
Matvey Arye
eec90593fe Rename continuous aggs files for consistency
Rename continuous aggs files to be more consistent and follow our
conventions.
2019-04-26 13:08:00 -04:00