7 Commits

Author SHA1 Message Date
Erik Nordström
4538fc6c40 Optimize continuous aggregate refresh
This change ensures a refresh of a continuous aggregate only
re-materializes the part of the aggregate that has been
invalidated. This makes refreshing much more efficient, and sometimes
eliminates the need to materialize data entirely (i.e., in case there
are no invalidations in the refresh window).

The ranges to refresh are the remainders of invalidations after they
are cut by the refresh window (i.e., all invalidations, or parts of
invalidations, that fall within the refresh window). The invalidations
used for a refresh are collected in a tuplestore (which spills to
disk) so as not to allocate too much memory in case of many
invalidations. Invalidations are, however, merged and deduplicated
before being added to the tuplestore, similar to how invalidations are
processed in the invalidation logs.
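
As an illustration only, the "remainders after cutting by the refresh
window" can be sketched as a range intersection. This is not the
actual C implementation; the function name and the use of inclusive
`(start, end)` tuples are assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical representation: an inclusive (start, end) range.
Range = Tuple[float, float]


def ranges_to_refresh(invalidations: List[Range], window: Range) -> List[Range]:
    """Return the parts of each invalidation that fall inside the window."""
    win_start, win_end = window
    result = []
    for start, end in invalidations:
        # Clamp the invalidation to the refresh window.
        lo = max(start, win_start)
        hi = min(end, win_end)
        if lo <= hi:
            # Some part of the invalidation lies within the window.
            result.append((lo, hi))
    return result


if __name__ == "__main__":
    print(ranges_to_refresh([(0, 10), (20, 30), (40, 50)], (5, 25)))
```

With no invalidations inside the window the result is empty, which
corresponds to the case where nothing needs to be materialized.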

Currently, the refresh simply materializes all invalidated ranges in
the order they appear in the tuplestore. The ordering does not matter
since all invalidated regions are refreshed in the same transaction.
2020-08-31 10:22:32 +02:00
Erik Nordström
5b8ff384dd Add infinite invalidations to cagg log
In its initial state, a continuous aggregate should be completely
invalidated. Therefore, this change adds an infinite invalidation
`[-Infinity, +Infinity]` when a continuous aggregate is created.
2020-08-31 10:22:32 +02:00
Mats Kindahl
c054b381c6 Change syntax for continuous aggregates
We change the syntax for defining continuous aggregates to use `CREATE
MATERIALIZED VIEW` rather than `CREATE VIEW`. For a continuous
aggregate the command still creates a view, whereas a regular `CREATE
MATERIALIZED VIEW` creates a table. Raise an error if `CREATE VIEW` is
used to create a continuous aggregate and redirect the user to `CREATE
MATERIALIZED VIEW`.

In a similar vein, `DROP MATERIALIZED VIEW` is used for continuous
aggregates and continuous aggregates cannot be dropped with `DROP
VIEW`.

Continuous aggregates are altered using `ALTER MATERIALIZED VIEW`
rather than `ALTER VIEW`, so we ensure that it works for `ALTER
MATERIALIZED VIEW` and gives an error if you try to use `ALTER VIEW` to
change a continuous aggregate.

Note that we allow `ALTER VIEW ... SET SCHEMA` to be used with the
partial view as well as with the direct view, so this is handled as a
special case.

Fixes #2233

Co-authored-by: Erik Nordström <erik@timescale.com>
Co-authored-by: Mats Kindahl <mats@timescale.com>
2020-08-27 17:16:10 +02:00
Erik Nordström
418f283443 Merge continuous aggregate invalidations
This change implements deduplication and merging of invalidation
entries for continuous aggregates in order to reduce the number of
redundant entries in the continuous aggregate invalidation
log. Merging is done both when copying over entries from the
hypertable to the continuous aggregate invalidation log and when
cutting already existing invalidations in the latter log. Doing this
merging in both steps helps reduce the number of invalidations also
for the continuous aggregates that don't get refreshed by the active
refresh command.

Merging works by scanning invalidations in order of the lowest
modified value, and given this ordering it is possible to merge the
current and next entry into one large entry if they are
overlapping. This can continue until the current and next invalidation
are disjoint or there are no more invalidations to process.
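
A minimal sketch of this merge step, assuming invalidations are
inclusive `(lowest_modified, greatest_modified)` tuples held in memory.
The function name is hypothetical and the real code scans log tables
rather than a Python list.

```python
from typing import List, Tuple

# Hypothetical representation: (lowest_modified, greatest_modified).
Range = Tuple[int, int]


def merge_invalidations(invalidations: List[Range]) -> List[Range]:
    """Merge overlapping invalidations into larger entries."""
    merged: List[Range] = []
    # Scan in order of the lowest modified value.
    for start, end in sorted(invalidations):
        if merged and start <= merged[-1][1]:
            # Current and next entry overlap: extend the current entry.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            # Disjoint from the current entry: start a new one.
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    print(merge_invalidations([(10, 20), (1, 5), (4, 8), (15, 30)]))
```

Because the input is ordered by the lowest modified value, each entry
only needs to be compared against the most recently merged one.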

Note, however, that only the continuous aggregate that gets refreshed
will be fully deduplicated. Some redundant entries might exist for
other aggregates since their entries in the continuous aggregate log
aren't cut against the refresh window.

Full deduplication for the refreshed continuous aggregate is only
possible if the continuous aggregate invalidation log is processed
last, since that also includes "old" entries. Therefore, this change
also changes the ordering of how the logs are processed. This also
makes it possible to process the hypertable invalidation log in the
first transaction of the refresh.
2020-08-13 12:35:23 +02:00
Erik Nordström
c01faa72f0 Set invalidation threshold during refresh
The invalidation threshold governs the window of data from the head of
a hypertable that shouldn't be subject to invalidations in order to
reduce write amplification during inserts on the hypertable.

When a continuous aggregate is refreshed, the invalidation threshold
must be moved forward (or initialized if it doesn't previously exist)
whenever the refresh window stretches beyond the current threshold.
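
The rule can be sketched roughly as follows. The function name, the
use of `None` for a missing threshold, and the choice to initialize
the threshold to the end of the refresh window are assumptions made
for illustration, not the actual implementation.

```python
from typing import Optional


def next_invalidation_threshold(current: Optional[int], window_end: int) -> int:
    """Return the threshold after a refresh whose window ends at window_end."""
    if current is None:
        # No threshold exists yet: initialize it (assumed: to the window end).
        return window_end
    # The threshold only moves forward, never backward.
    return max(current, window_end)


if __name__ == "__main__":
    print(next_invalidation_threshold(100, 150))
```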

Tests for setting the invalidation threshold are also added, including
new isolation tests for concurrency.
2020-08-12 11:16:23 +02:00
Erik Nordström
80720206df Make refresh_continuous_aggregate a procedure
When a continuous aggregate is refreshed, it also needs to move the
invalidation threshold in case the refresh window stretches beyond the
current threshold. The new invalidation threshold must be set in its
own transaction during the refresh, which can only be done if the
refresh command is a procedure.
2020-08-12 11:16:23 +02:00
Erik Nordström
9a7b4aa003 Process invalidations when refreshing continuous aggregate
This change adds initial support for invalidation processing when
refreshing a continuous aggregate. Note that, currently, invalidations
are only cleared during a refresh, but not yet used to optimize
refreshes. There are two steps to this processing:

1. Invalidations are moved from the hypertable invalidation log to the
   cagg invalidation log
2. The cagg invalidation entries are then processed for the continuous
   aggregate that gets refreshed.

The second step involves finding all invalidations that overlap with
the given refresh window and then either deleting them or cutting
them, depending on how they overlap.
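
To illustrate the delete-or-cut decision, here is a small sketch that
returns what remains of one invalidation in the log after a refresh.
It assumes inclusive integer `(start, end)` ranges; the function name
is hypothetical and the code does not mirror the actual C
implementation.

```python
from typing import List, Tuple

# Hypothetical representation: an inclusive integer (start, end) range.
Range = Tuple[int, int]


def cut_invalidation(invalidation: Range, window: Range) -> List[Range]:
    """Return the parts of an invalidation left after refreshing the window."""
    start, end = invalidation
    win_start, win_end = window
    if end < win_start or start > win_end:
        # No overlap with the refresh window: the entry is kept as-is.
        return [invalidation]
    remainders = []
    if start < win_start:
        # Part of the invalidation lies before the window: keep it.
        remainders.append((start, win_start - 1))
    if end > win_end:
        # Part of the invalidation lies after the window: keep it.
        remainders.append((win_end + 1, end))
    # An empty list means the window covers the entry, so it is deleted.
    return remainders


if __name__ == "__main__":
    print(cut_invalidation((0, 30), (5, 25)))
```

An invalidation that spans the whole window is cut into two
remainders, one on each side of the window.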

Currently, the "invalidation threshold" is not moved up during a
refresh. This would only be required if the refresh window crosses
that threshold and will be addressed in a future change.
2020-08-04 14:22:04 +02:00