This change renames the _timescaledb_catalog.telemetry_metadata table to
_timescaledb_catalog.metadata. It also adds a new boolean column to this
table which is used to flag data that should be included in telemetry.
It also renames the src/telemetry/metadata.{h,c} files to
src/telemetry/telemetry_metadata.{h,c} and updates the API to reflect
this. Finally, it includes the logic to use the new boolean column
when populating the telemetry parse state.
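For illustration, the renamed catalog might look roughly like the sketch
below; the boolean column name (include_in_telemetry) and the exact DDL
are assumptions, not the literal catalog definition:

    -- Sketch only; column name include_in_telemetry is an assumption.
    CREATE TABLE _timescaledb_catalog.metadata (
        key   NAME NOT NULL PRIMARY KEY,
        value TEXT NOT NULL,
        -- true if this row may be reported in telemetry
        include_in_telemetry BOOLEAN NOT NULL
    );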
This commit adds a cascade_to_materializations flag to the scheduled
version of drop_chunks that behaves much like the one for manual
drop_chunks: if the policy runs on a hypertable that has a continuous
aggregate and this flag is not set, the chunks will not be dropped.
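Hypothetical usage, assuming the policy is created through
add_drop_chunks_policy (the hypertable name is illustrative):

    -- Without cascade_to_materializations => true, the scheduled job
    -- refuses to drop chunks that feed a continuous aggregate.
    SELECT add_drop_chunks_policy('conditions', INTERVAL '2 months',
                                  cascade_to_materializations => true);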
This commit switches the remaining JOIN in the continuous_aggs_stats
view to LEFT JOIN. This way we'll still see info from the other columns
even when the background worker has not run yet.
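A minimal sketch of the change (the join shown is illustrative, not the
actual view definition):

    -- With a plain JOIN, a job with no stats row yet would disappear
    -- from the view; with LEFT JOIN its stat columns are simply NULL.
    SELECT j.id, j.schedule_interval, s.last_run_success
    FROM _timescaledb_config.bgw_job j
    LEFT JOIN _timescaledb_internal.bgw_job_stat s ON s.job_id = j.id;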
This commit also switches the time fields to output text in the correct
format for the underlying time type.
Add the query definition to
timescaledb_information.continuous_aggregates.
The user query (specified in the CREATE VIEW stmt of a continuous
aggregate) is transformed in the process of creating a continuous
aggregate and this modified query is saved in the pg_rewrite catalog
tables. In order to display the original query, we create an internal
view which is a replica of the user query. This is used to display the
definition in timescaledb_information.continuous_aggregates.
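For example, assuming a view_definition column (the exact column names
may differ):

    -- Shows the original query as specified in CREATE VIEW.
    SELECT view_name, view_definition
    FROM timescaledb_information.continuous_aggregates;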
As an alternative, we could save the original user query in our internal
catalogs, but this approach involves replicating a lot of postgres code
and causes portability problems.
The data in caggs needs to survive dump/restore. This
test makes sure that caggs that are materialized both
before and after restore are correct.
Two code changes were necessary to make this work (sketched below):
1) the valid_job_type constraint on bgw_job needed to be altered to add
'continuous_aggregate' as a valid job type
2) the user_view_query field needed to be changed to text because
dump/restore does not support pg_node_tree.
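A sketch of what change (1) amounts to; the pre-existing set of job
types shown here is an assumption for illustration:

    -- Widen the CHECK constraint to accept the new job type.
    ALTER TABLE _timescaledb_config.bgw_job
        DROP CONSTRAINT valid_job_type;
    ALTER TABLE _timescaledb_config.bgw_job
        ADD CONSTRAINT valid_job_type CHECK (job_type IN
            ('telemetry_and_version_check_if_enabled', 'reorder',
             'drop_chunks', 'continuous_aggregate'));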
For hypertables that have continuous aggregates, calling drop_chunks now
drops all of the rows in the materialization table that were based on
the dropped chunks. Since we don't know what the correct default
behavior for drop_chunks is, we've added a new argument,
cascade_to_materializations, which must be set to true in order to call
drop_chunks on a hypertable which has a continuous aggregate.
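Usage might look like this (hypertable name illustrative):

    -- Errors out, since 'conditions' feeds a continuous aggregate:
    SELECT drop_chunks(interval '3 weeks', 'conditions');
    -- Drops the chunks plus the corresponding materialized rows:
    SELECT drop_chunks(interval '3 weeks', 'conditions',
                       cascade_to_materializations => true);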
drop_chunks is blocked on the materialization tables of continuous
aggregates
This commit adds the actual background worker job that runs the continuous
aggregate automatically. This job gets created when the continuous aggregate is
created and is deleted when the aggregate is dropped. By default this job will
attempt to run every two bucket widths, and it attempts to materialize up to
two bucket widths behind the end of the table.
This commit adds initial support for the continuous aggregate materialization
and INSERT invalidations.
INSERT path:
On INSERT, DELETE and UPDATE we log the [min, max] time range that may be
invalidated (that is, newly inserted, updated, or deleted) to
_timescaledb_catalog.continuous_aggs_hypertable_invalidation_log. This log
will be used to re-materialize these ranges, to ensure that the aggregate
is up-to-date. Currently these invalidations are recorded by a trigger,
_timescaledb_internal.continuous_agg_invalidation_trigger, which should be
added to the hypertable when the continuous aggregate is created. This trigger
stores a cache of min/max values per-hypertable, and on transaction commit
writes them to the log, if needed. At the moment, we consider them to always
be needed, unless we're in ReadCommitted mode or weaker, and the min
invalidated value is greater than the hypertable's invalidation threshold
(found in _timescaledb_catalog.continuous_aggs_invalidation_threshold)
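A sketch of how the trigger ends up attached to the raw hypertable; the
trigger name and argument are illustrative, and in practice this happens
automatically when the continuous aggregate is created:

    CREATE TRIGGER ts_cagg_invalidation_trigger
        AFTER INSERT OR UPDATE OR DELETE ON conditions
        FOR EACH ROW EXECUTE PROCEDURE
        _timescaledb_internal.continuous_agg_invalidation_trigger('1');
        -- '1' stands in for the raw hypertable's id.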
Materialization path:
Materialization currently happens in multiple phases: in phase 1 we determine
the timestamp at which we will end the new set of materializations, then we
update the hypertable's invalidation threshold to that point, and finally we
read the current invalidations and materialize any invalidated rows along with
the new range between the continuous aggregate's completed threshold (found in
_timescaledb_catalog.continuous_aggs_completed_threshold) and the hypertable's
invalidation threshold. After all of this is done we update the completed
threshold to the invalidation threshold. The portion of this protocol from
after the invalidations are read until the completed threshold is written
(that is, actually materializing and writing the completion threshold) is
included in this commit, with the remainder to follow in subsequent ones.
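Illustrative pseudocode for the ordering; the watermark/id values and
column names are assumptions, not the actual implementation:

    -- Phase 1: publish the new end point as the invalidation threshold.
    UPDATE _timescaledb_catalog.continuous_aggs_invalidation_threshold
       SET watermark = 2000 WHERE hypertable_id = 1;
    -- Phase 2 (this commit): read logged invalidations, materialize the
    -- invalidated ranges plus [completed threshold, invalidation
    -- threshold), then advance the completed threshold:
    UPDATE _timescaledb_catalog.continuous_aggs_completed_threshold
       SET watermark = 2000 WHERE materialization_id = 2;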
One important caveat: since the thresholds are exclusive, we invalidate
all values _less_ than the invalidation threshold, and since we store time
values as an int64 internally, we can never determine whether the row at
PG_INT64_MAX is invalidated. To avoid this problem, we never materialize the
time bucket containing PG_INT64_MAX.
This PR adds a catalog table for storing metadata about
continuous aggregates. It also adds code for creating the
materialization hypertable and two views that are used by the
continuous aggregate system (sketched below):
1) The user view - this is the actual view queried by the end user.
It is a query on top of the materialized hypertable and is
responsible for finalizing and combining partials in a manner
that returns to the user the data as defined by the original
user-defined view.
2) The partial view - this queries the raw table and returns
columns as defined in the materialized table. It will be used
by the materializer to calculate the data that will be inserted
into the materialization table. Note the data here is the partial
state of any aggregates.
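Schematically, for a user-defined view like
SELECT time_bucket('1 hour', time) AS bucket, avg(temp) FROM conditions
GROUP BY 1, the pair of views might look as follows; all object names
here are illustrative:

    -- Partial view: computes partial aggregate state from the raw
    -- table; the materializer inserts its output into the
    -- materialization table.
    CREATE VIEW _timescaledb_internal.cagg_partial_view AS
    SELECT time_bucket('1 hour', time) AS bucket,
           _timescaledb_internal.partialize_agg(avg(temp)) AS avg_partial
    FROM conditions
    GROUP BY 1;

    -- User view: combines and finalizes the stored partials.
    CREATE VIEW conditions_hourly AS
    SELECT bucket,
           _timescaledb_internal.finalize_agg(
               'pg_catalog.avg(double precision)',
               NULL, NULL, NULL, avg_partial,
               NULL::double precision) AS avg
    FROM _timescaledb_internal.cagg_materialized
    GROUP BY bucket;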
This PR adds support for finalizing aggregates with FINALFUNC_EXTRA. To
do this, we need to pass NULLs corresponding to all of the aggregate
parameters to the ffunc as arguments following the partial state value.
These arguments need to have the correct concrete types.
For polymorphic aggregates, the types cannot be derived from the catalog
but need to be somehow conveyed to the finalize_agg. Two designs were
considered:
1) Encode the type names as part of the partial state (bytea)
2) Pass down the arguments as parameters to the finalize_agg
In the end (2) was picked for the simple reason that (1) would have
increased the size of each partial, sometimes considerably (esp. for small
partial values).
The types are passed down as names, not OIDs, because in the continuous
agg case using OIDs is not safe for backup/restore, and in the clustering
case the data nodes may not have the same type OIDs either.
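For illustration, finalizing a polymorphic aggregate could then look
like this sketch, assuming finalize_agg's argument list is (aggregate
name, collation schema, collation name, input type names, partial
state, output type dummy); the saved_partials table and partial column
are hypothetical:

    -- Input types travel as names ('pg_catalog', 'int4'), not OIDs,
    -- so the call survives dump/restore and differing type OIDs.
    SELECT _timescaledb_internal.finalize_agg(
               'pg_catalog.array_agg(int4)',
               NULL, NULL,
               ARRAY[ARRAY['pg_catalog','int4']]::name[],
               partial, NULL::int4[])
    FROM saved_partials;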
The ability to get aggregate partials instead of the final state
is important for both continuous aggregation and clustering.
This commit adds the ability to work with aggregate partials.
Namely, a function called _timescaledb_internal.partialize_agg
can now wrap an aggregate and return the partial results as a bytea.
The _timescaledb_internal.finalize_agg aggregate allows you to combine
and finalize partials.
The partialize_agg function works as a marker in the planner to force
the planner to return partial results.
Unfortunately, we could not get the planner to modify the plan directly
to aggregate partials. Instead, the finalize_agg is a real aggregate
that performs aggregation on the partial state. Note that it is not
yet parallel.
Aggregates that use FINALFUNC_EXTRA are currently not supported.
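Example usage (table and column names are illustrative):

    -- Capture the partial state of an aggregate as a bytea...
    SELECT _timescaledb_internal.partialize_agg(sum(temp)) AS p
    FROM conditions;
    -- ...then combine and finalize stored partials later:
    SELECT _timescaledb_internal.finalize_agg(
               'pg_catalog.sum(double precision)',
               NULL, NULL, NULL, p, NULL::double precision)
    FROM saved_partials;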
Co-authored-by: gayyappan <gayathri@timescale.com>
Co-authored-by: David Kohn <david@timescale.com>
When doing a gapfill query with multiple columns that may contain
NULLs, it is not trivial to remove NULL values from individual columns
with a WHERE clause. This new locf option allows those NULL values
to be ignored in gapfill queries with locf.
We drop the old locf function because we don't want two locf functions.
Unfortunately this means any views using locf have to be dropped.
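Hypothetical usage, assuming the new option is a named boolean
parameter on locf (here called treat_null_as_missing); the table and
columns are illustrative:

    SELECT time_bucket_gapfill('5 minutes', time) AS bucket,
           -- NULL readings are treated as missing and the last
           -- non-NULL value is carried forward instead
           locf(avg(temp), treat_null_as_missing => true)
    FROM conditions
    WHERE time > now() - interval '1 hour'
    GROUP BY bucket;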
When developing a feature across releases, timescaledb updates can get
stuck in the wrong update script, breaking the update process. To avoid
this, we introduce a new file "latest-dev.sql" in which all new updates
should go. During a release, this file gets renamed to
"<previous version>--<current version>.sql" ensuring that all new
updates are released and all updates in other branches will
automatically get redirected to the next update script.