timescaledb

mirror of https://github.com/timescale/timescaledb.git synced 2025-05-28 01:30:29 +08:00

Author	SHA1	Message	Date
Erik Nordström	4538fc6c40	Optimize continuous aggregate refresh This change ensures a refresh of a continuous aggregate only re-materializes the part of the aggregate that has been invalidated. This makes refreshing much more efficient, and sometimes eliminates the need to materialize data entirely (i.e., in case there are no invalidations in the refresh window). The ranges to refresh are the remainders of invalidations after they are cut by the refresh window (i.e., all invalidations, or parts of invalidations, that fall within the refresh window). The invalidations used for a refresh are collected in a tuple store (which spills to disk) so as not to allocate too much memory when there are many invalidations. Invalidations are, however, merged and deduplicated before being added to the tuple store, similar to how invalidations are processed in the invalidation logs. Currently, the refresh simply materializes all invalidated ranges in the order they appear in the tuple store, and the ordering does not matter since all invalidated regions are refreshed in the same transaction.	2020-08-31 10:22:32 +02:00
Erik Nordström	5b8ff384dd	Add infinite invalidations to cagg log In its initial state, a continuous aggregate should be completely invalidated. Therefore, this change adds an infinite invalidation `[-Infinity, +Infinity]` when a continuous aggregate is created.	2020-08-31 10:22:32 +02:00
Mats Kindahl	c054b381c6	Change syntax for continuous aggregates We change the syntax for defining continuous aggregates to use `CREATE MATERIALIZED VIEW` rather than `CREATE VIEW`. For a continuous aggregate the command still creates a view, whereas a regular `CREATE MATERIALIZED VIEW` creates a table. Raise an error if `CREATE VIEW` is used to create a continuous aggregate and redirect to `CREATE MATERIALIZED VIEW`. In a similar vein, `DROP MATERIALIZED VIEW` is used for continuous aggregates and continuous aggregates cannot be dropped with `DROP VIEW`. Continuous aggregates are altered using `ALTER MATERIALIZED VIEW` rather than `ALTER VIEW`, so we ensure that it works for `ALTER MATERIALIZED VIEW` and gives an error if you try to use `ALTER VIEW` to change a continuous aggregate. Note that we allow `ALTER VIEW ... SET SCHEMA` to be used with the partial view as well as with the direct view, so this is handled as a special case. Fixes #2233 Co-authored-by: Erik Nordström <erik@timescale.com> Co-authored-by: Mats Kindahl <mats@timescale.com>	2020-08-27 17:16:10 +02:00
Erik Nordström	418f283443	Merge continuous aggregate invalidations This change implements deduplication and merging of invalidation entries for continuous aggregates in order to reduce the number of redundant entries in the continuous aggregate invalidation log. Merging is done both when copying over entries from the hypertable to the continuous aggregate invalidation log and when cutting already existing invalidations in the latter log. Doing this merging in both steps helps reduce the number of invalidations also for the continuous aggregates that don't get refreshed by the active refresh command. Merging works by scanning invalidations in order of the lowest modified value, and given this ordering it is possible to merge the current and next entry into one large entry if they are overlapping. This can continue until the current and next invalidation are disjoint or there are no more invalidations to process. Note, however, that only the continuous aggregate that gets refreshed will be fully deduplicated. Some redundant entries might exist for other aggregates since their entries in the continuous aggregate log aren't cut against the refresh window. Full deduplication for the refreshed continuous aggregate is only possible if the continuous aggregate invalidation log is processed last, since that also includes "old" entries. Therefore, this change also changes the ordering of how the logs are processed. This also makes it possible to process the hypertable invalidation log in the first transaction of the refresh.	2020-08-13 12:35:23 +02:00
Erik Nordström	c01faa72f0	Set invalidation threshold during refresh The invalidation threshold governs the window of data from the head of a hypertable that shouldn't be subject to invalidations in order to reduce write amplification during inserts on the hypertable. When a continuous aggregate is refreshed, the invalidation threshold must be moved forward (or initialized if it doesn't previously exist) whenever the refresh window stretches beyond the current threshold. Tests for setting the invalidation threshold are also added, including new isolation tests for concurrency.	2020-08-12 11:16:23 +02:00
Erik Nordström	80720206df	Make refresh_continuous_aggregate a procedure When a continuous aggregate is refreshed, it also needs to move the invalidation threshold in case the refresh window stretches beyond the current threshold. The new invalidation threshold must be set in its own transaction during the refresh, which can only be done if the refresh command is a procedure.	2020-08-12 11:16:23 +02:00
Erik Nordström	9a7b4aa003	Process invalidations when refreshing continuous aggregate This change adds initial support for invalidation processing when refreshing a continuous aggregate. Note that, currently, invalidations are only cleared during a refresh, but not yet used to optimize refreshes. There are two steps to this processing: 1. Invalidations are moved from the hypertable invalidation log to the cagg invalidation log. 2. The cagg invalidation entries are then processed for the continuous aggregate that gets refreshed. The second step involves finding all invalidations that overlap with the given refresh window and then either deleting them or cutting them, depending on how they overlap. Currently, the "invalidation threshold" is not moved up during a refresh. This would only be required if the refresh window crosses that threshold and will be addressed in a future change.	2020-08-04 14:22:04 +02:00
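
The merging and cutting of invalidations described in the commit messages above can be illustrated with a minimal Python sketch. This is not the TimescaleDB implementation (which is written in C and works on catalog tables and a tuple store). The function names `merge_invalidations` and `cut_by_window` are hypothetical, and the sketch assumes half-open `(start, end)` ranges held in plain lists, with touching ranges merged as well as overlapping ones.

```python
import math


def merge_invalidations(invalidations):
    """Merge overlapping (start, end) ranges, scanning in order of start.

    Assumption: ranges are half-open, and ranges that merely touch
    are merged too. Both choices are illustrative only.
    """
    merged = []
    for start, end in sorted(invalidations):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the current entry: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            # Disjoint from the current entry: start a new one.
            merged.append((start, end))
    return merged


def cut_by_window(invalidations, window):
    """Cut merged invalidations against a refresh window.

    Returns (to_refresh, remaining): the parts that fall inside the
    window, and the parts that stay in the log afterwards.
    """
    w_start, w_end = window
    to_refresh, remaining = [], []
    for start, end in merge_invalidations(invalidations):
        lo, hi = max(start, w_start), min(end, w_end)
        if lo >= hi:
            # No overlap with the window: keep the entry untouched.
            remaining.append((start, end))
            continue
        to_refresh.append((lo, hi))
        if start < w_start:
            remaining.append((start, w_start))  # part before the window
        if end > w_end:
            remaining.append((w_end, end))  # part after the window
    return to_refresh, remaining


# A newly created aggregate starts out fully invalidated.
initial_log = [(-math.inf, math.inf)]
to_refresh, remaining = cut_by_window(initial_log, (10, 20))
```

In this sketch, an invalidation that lies entirely inside the window leaves nothing behind (it is effectively deleted), while one that straddles a window edge is cut and its outside part is kept. An empty `to_refresh` list corresponds to the case where nothing needs to be materialized.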

7 Commits