mirror of https://github.com/timescale/timescaledb.git synced 2025-05-26 08:41:09 +08:00

268 Commits

Author SHA1 Message Date
Brian Rowe
aeac52aef6 Rename telemetry_metadata table to just metadata
This change renames _timescaledb_catalog.telemetry_metadata to
_timescaledb_catalog.metadata.  It also adds a new boolean column to this
table which is used to flag data which should be included in telemetry.

It also renames the src/telemetry/metadata.{h,c} files to
src/telemetry/telemetry_metadata.{h,c} and updates the API to reflect
this.  Finally, it includes the logic to use the new boolean column
when populating the telemetry parse state.
2019-05-17 17:04:42 -07:00
Sven Klemm
bfabb30be0 Release 1.3.0 2019-05-07 02:47:13 +02:00
Joshua Lockerman
899cd0538d Allow scheduled drop_chunks to cascade to aggs
This commit adds a cascade_to_materializations flag to the scheduled
version of drop_chunks that behaves much like the one from manual
drop_chunks: if a hypertable that has a continuous aggregate tries to
drop chunks, and this flag is not set, the chunks will not be dropped.
2019-04-30 15:46:49 -04:00
Joshua Lockerman
ae3480c2cb Fix continuous_aggs info
This commit switches the remaining JOIN in the continuous_aggs_stats
view to LEFT JOIN. This way we'll still see info from the other columns
even when the background worker has not run yet.
This commit also switches the time fields to output text in the correct
format for the underlying time type.
2019-04-26 13:08:00 -04:00
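
The effect of the JOIN to LEFT JOIN switch in ae3480c2cb can be illustrated with a small sketch. The tables `aggs` and `job_stats` below are hypothetical stand-ins, not the real catalog tables or the continuous_aggs_stats view definition, and SQLite is used only so the example runs from Python's standard library.

```python
import sqlite3


def fetch_stats(join_kind):
    """Return (view_name, last_run) rows using the given join kind."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE aggs (id INTEGER PRIMARY KEY, view_name TEXT);
        CREATE TABLE job_stats (agg_id INTEGER, last_run TEXT);
        INSERT INTO aggs VALUES (1, 'hourly'), (2, 'daily');
        -- Only the 'hourly' aggregate has been run by a background worker.
        INSERT INTO job_stats VALUES (1, '2019-04-26 13:08:00');
        """
    )
    rows = conn.execute(
        "SELECT a.view_name, s.last_run FROM aggs a "
        f"{join_kind} job_stats s ON s.agg_id = a.id ORDER BY a.id"
    ).fetchall()
    conn.close()
    return rows


inner_rows = fetch_stats("JOIN")       # 'daily' is missing: it has no stats row
left_rows = fetch_stats("LEFT JOIN")   # 'daily' is present with last_run = None
```

With the inner join, an aggregate that has never run disappears from the output; with the left join it is still listed, with NULL in the statistics columns.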
gayyappan
b8f9b91e60 Add user view query definition for cont aggs
Add the query definition to
timescaledb_information.continuous_aggregates.

The user query (specified in the CREATE VIEW stmt of a continuous
aggregate) is transformed in the process of creating a continuous
aggregate and this modified query is saved in the pg_rewrite catalog
tables. In order to display the original query, we create an internal
view which is a replica of the user query. This is used to display the
definition in timescaledb_information.continuous_aggregates.

As an alternative we could save the original user query in our internal
catalogs.  But this approach involves replicating a lot of postgres code
and causes portability problems.
2019-04-26 13:08:00 -04:00
Matvey Arye
dc0e250428 Add pg_dump/restore tests for continuous aggs
The data in caggs needs to survive dump/restore. This
test makes sure that caggs that are materialized both
before and after restore are correct.

Two code changes were necessary to make this work:
1) the valid_job_type constraint on bgw_job needed to be altered to add
'continuous_aggregate' as a valid job type

2) The user_view_query field needed to be changed to text because
dump/restore does not support pg_node_tree.
2019-04-26 13:08:00 -04:00
Joshua Lockerman
45fb1fc2c8 Handle drop_chunks on tables that have cont aggs
For hypertables that have continuous aggregates, calling drop_chunks now
drops all of the rows in the materialization table that were based on
the dropped chunks. Since we don't know what the correct default
behavior for drop_chunks is, we've added a new argument,
cascade_to_materializations, which must be set to true in order to call
drop_chunks on a hypertable which has a continuous aggregate.
drop_chunks is blocked on the materialization tables of continuous
aggregates.
2019-04-26 13:08:00 -04:00
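
The guard described in 45fb1fc2c8 can be modelled roughly as follows. This is a toy in-memory sketch, not the extension's C implementation: the dictionary layout, the `is_materialization_table` key, and the error messages are assumptions made for illustration. Only the `cascade_to_materializations` argument name comes from the commit message.

```python
def drop_chunks(table, older_than, cascade_to_materializations=False):
    """Drop chunks ending at or before `older_than` from a toy hypertable.

    `table` is a dict with "chunks" (list of [start, end) tuples) and,
    if it has a continuous aggregate, "materialized" (bucket -> value).
    """
    if table.get("is_materialization_table"):
        raise ValueError("drop_chunks is blocked on materialization tables")

    has_cagg = "materialized" in table
    if has_cagg and not cascade_to_materializations:
        raise ValueError(
            "hypertable has a continuous aggregate; "
            "set cascade_to_materializations to drop chunks"
        )

    dropped = [c for c in table["chunks"] if c[1] <= older_than]
    table["chunks"] = [c for c in table["chunks"] if c[1] > older_than]

    if has_cagg:
        # Remove materialized rows whose bucket fell inside a dropped chunk.
        table["materialized"] = {
            bucket: value
            for bucket, value in table["materialized"].items()
            if not any(start <= bucket < end for start, end in dropped)
        }
    return dropped


table = {"chunks": [(0, 10), (10, 20)], "materialized": {0: 1.0, 5: 2.0, 10: 3.0}}
dropped = drop_chunks(table, older_than=10, cascade_to_materializations=True)
```

Requiring the flag explicitly, rather than picking a default, forces the caller to acknowledge that materialized data will be removed along with the raw chunks.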
Joshua Lockerman
0737b370a3 Add the actual bgw job for continuous aggregates
This commit adds the actual background worker job that runs the continuous
aggregate automatically. This job is created when the continuous aggregate is
created and is deleted when the aggregate is dropped. By default this job
attempts to run every two bucket widths and to materialize up to two bucket
widths behind the end of the table.
2019-04-26 13:08:00 -04:00
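
The defaults described in 0737b370a3 amount to simple arithmetic on the bucket width. The sketch below is illustrative only: the `CaggJobDefaults` type and its field names are hypothetical, times are plain integers, and any alignment to bucket boundaries that the real job may perform is not modelled.

```python
from dataclasses import dataclass


@dataclass
class CaggJobDefaults:
    schedule_interval: int   # how often the job attempts to run
    materialize_until: int   # latest point the job tries to materialize up to


def default_job_settings(bucket_width, table_end):
    """Derive the default schedule and lag from the bucket width."""
    return CaggJobDefaults(
        schedule_interval=2 * bucket_width,
        materialize_until=table_end - 2 * bucket_width,
    )


# One-hour buckets (in seconds) on a table whose newest row is at t = 100000.
settings = default_job_settings(bucket_width=3600, table_end=100_000)
```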
David Kohn
f17aeea374 Initial cont agg INSERT/materialization support
This commit adds initial support for the continuous aggregate materialization
and INSERT invalidations.

INSERT path:
  On INSERT, DELETE and UPDATE we log the [min, max] time range that may be
  invalidated (that is, newly inserted, updated, or deleted) to
  _timescaledb_catalog.continuous_aggs_hypertable_invalidation_log. This log
  will be used to re-materialize these ranges, to ensure that the aggregate
  is up-to-date. Currently these invalidations are recorded by the trigger
  _timescaledb_internal.continuous_agg_invalidation_trigger, which should be
  added to the hypertable when the continuous aggregate is created. This trigger
  stores a cache of min/max values per-hypertable, and on transaction commit
  writes them to the log, if needed. At the moment, we consider them to always
  be needed, unless we're in ReadCommitted mode or weaker, and the min
  invalidated value is greater than the hypertable's invalidation threshold
  (found in _timescaledb_catalog.continuous_aggs_invalidation_threshold).
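
  The INSERT path above can be sketched as a per-transaction cache that is flushed at commit. This is a simplified model, not the trigger's actual code: the class, the method names, and the tuple layout of the log are assumptions, and the isolation levels are reduced to an ordered enum.

```python
from enum import IntEnum


class Isolation(IntEnum):
    READ_UNCOMMITTED = 0
    READ_COMMITTED = 1
    REPEATABLE_READ = 2
    SERIALIZABLE = 3


class InvalidationCache:
    """Per-transaction cache of the min/max modified time per hypertable."""

    def __init__(self):
        self.ranges = {}  # hypertable_id -> (min_time, max_time)

    def record(self, hypertable_id, time_value):
        """Called for every inserted, updated, or deleted row."""
        lo, hi = self.ranges.get(hypertable_id, (time_value, time_value))
        self.ranges[hypertable_id] = (min(lo, time_value), max(hi, time_value))

    def flush_on_commit(self, isolation, thresholds, log):
        """Append cached ranges to `log`, skipping those that are not needed."""
        for hypertable_id, (lo, hi) in self.ranges.items():
            # Skip only if we are in ReadCommitted or weaker AND the whole
            # range lies above the hypertable's invalidation threshold.
            not_needed = (
                isolation <= Isolation.READ_COMMITTED
                and lo > thresholds[hypertable_id]
            )
            if not not_needed:
                log.append((hypertable_id, lo, hi))
        self.ranges.clear()


cache = InvalidationCache()
for t in (50, 30, 70):
    cache.record(1, t)      # touches time 30, below the threshold of 40
cache.record(2, 100)        # entirely above the threshold of 40
log = []
cache.flush_on_commit(Isolation.READ_COMMITTED, {1: 40, 2: 40}, log)
```

  Here only hypertable 1 is logged, as `(1, 30, 70)`. Hypertable 2's range lies entirely above its threshold, so nothing in it has been materialized yet and there is nothing to invalidate.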

Materialization path:
  Materialization currently happens in multiple phases: in phase 1 we determine
  the timestamp at which we will end the new set of materializations, then we
  update the hypertable's invalidation threshold to that point, and finally we
  read the current invalidations, then materialize any invalidated rows and the
  new range between the continuous aggregate's completed threshold (found in
  _timescaledb_catalog.continuous_aggs_completed_threshold) and the hypertable's
  invalidation threshold. After all of this is done we update the completed
  threshold to the invalidation threshold. The portion of this protocol from
  after the invalidations are read, until the completed threshold is written
  (that is, actually materializing, and writing the completion threshold) is
  included with this commit, with the remainder to follow in subsequent ones.
  One important caveat: because the thresholds are exclusive (we invalidate
  all values _less_ than the invalidation threshold) and we store time values
  as an int64 internally, we can never determine whether the row at
  PG_INT64_MAX is invalidated. To avoid this problem, we never materialize the
  time bucket containing PG_INT64_MAX.
2019-04-26 13:08:00 -04:00
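
The materialization protocol described in f17aeea374 can be sketched as follows. This is a toy model under several assumptions: the `state` dictionary keys are hypothetical names for the thresholds and log mentioned in the message, times are integers, thresholds are bucket-aligned, and a plain `sum` stands in for whatever aggregate the continuous aggregate computes.

```python
PG_INT64_MAX = 2**63 - 1


def bucket_start(t, width):
    """Start of the time bucket of the given width that contains t."""
    return (t // width) * width


def cap_materialization_end(proposed_end, width):
    """Bucket-aligned exclusive end that never covers the bucket holding PG_INT64_MAX."""
    return min(bucket_start(proposed_end, width), bucket_start(PG_INT64_MAX, width))


def materialize(state, raw_rows, proposed_end, width):
    # Decide where this run ends and move the invalidation threshold there.
    end = cap_materialization_end(proposed_end, width)
    state["invalidation_threshold"] = max(state["invalidation_threshold"], end)
    threshold = state["invalidation_threshold"]

    # Read (and consume) the current invalidations.
    invalidations, state["invalidation_log"] = state["invalidation_log"], []

    # Re-materialize the invalidated buckets plus the new range between the
    # completed threshold and the invalidation threshold.
    ranges = [
        (bucket_start(lo, width), bucket_start(hi, width) + width)
        for lo, hi in invalidations
    ]
    ranges.append((state["completed_threshold"], threshold))
    for lo, hi in ranges:
        for b in range(lo, min(hi, threshold), width):
            values = [v for t, v in raw_rows if b <= t < b + width]
            if values:
                state["materialized"][b] = sum(values)
            else:
                state["materialized"].pop(b, None)

    # Finally, advance the completed threshold.
    state["completed_threshold"] = threshold


state = {
    "completed_threshold": 0,
    "invalidation_threshold": 0,
    "invalidation_log": [],
    "materialized": {},
}
rows = [(1, 1.0), (12, 2.0), (25, 4.0)]
materialize(state, rows, proposed_end=20, width=10)
```

After this first run only the buckets starting at 0 and 10 are materialized. A later row at time 3 would be logged as an invalidation, and the next run would recompute bucket 0 along with the new range.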
gayyappan
2dbc28df82 Create base infrastructure for continuous aggs
This PR adds a catalog table for storing metadata about
continuous aggregates. It also adds code for creating the
materialization hypertable and 2 views that are used by the
continuous aggregate system:

1) The user view - This is the actual view queried by the end user.
   It is a query on top of the materialized hypertable and is
   responsible for finalizing and combining partials in a manner
   that returns to the user the data as defined by the original
   user-defined view.
2) The partial view - which queries the raw table and returns
   columns as defined in the materialized table. This will be used
   by the materializer to calculate the data that will be inserted
   into the materialization table. Note the data here is the partial
   state of any aggregates.
2019-04-26 13:08:00 -04:00
Sven Klemm
7961fc77e9 Rename installation_metadata to telemetry_metadata 2019-04-15 21:44:10 +02:00
Matvey Arye
d7b6ad239b Add support for FINALFUNC_EXTRA
This PR adds support for finalizing aggregates with FINALFUNC_EXTRA. To
do this, we need to pass NULLs corresponding to all of the aggregate
parameters to the ffunc as arguments following the partial state value.
These arguments need to have the correct concrete types.

For polymorphic aggregates, the types cannot be derived from the catalog
but need to be somehow conveyed to the finalize_agg. Two designs were
considered:

1) Encode the type names as part of the partial state (bytea)
2) Pass down the arguments as parameters to the finalize_agg

In the end (2) was picked for the simple reason that (1) would have
increased the size of each partial, sometimes considerably (esp. for small
partial values).

The types are passed down as names not OIDs because in the continuous
agg case using OIDs is not safe for backup/restore and in the clustering
case the datanodes may not have the same type OIDs either.
2019-04-12 12:12:17 -04:00
gayyappan
b45343b3cc Add ability to work with aggregate partials
The ability to get aggregate partials instead of the final state
is important for both continuous aggregation and clustering.

This commit adds the ability to work with aggregate partials.
Namely a function called _timescaledb_internal.partialize_agg
can now wrap an aggregate and return the partial results as a bytea.

The _timescaledb_internal.finalize_agg aggregate allows you to combine
and finalize partials.

The partialize_agg function works as a marker in the planner to force
the planner to return partial results.

Unfortunately, we could not get the planner to modify the plan directly
to aggregate partials. Instead, the finalize_agg is a real aggregate
that performs aggregation on the partial state. Note that it is not
yet parallel.

Aggregates that use FINALFUNC_EXTRA are currently not supported.

Co-authored-by: gayyappan <gayathri@timescale.com>
Co-authored-by: David Kohn <david@timescale.com>
2019-04-12 12:12:17 -04:00
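
The idea behind aggregate partials in b45343b3cc can be shown with a toy average. The Python functions below only borrow the names partialize and finalize for illustration. They are not the `_timescaledb_internal` functions, and the byte layout of the partial state is an assumption of this sketch.

```python
import struct

# Partial state of an average: running sum (float64) and row count (int64).
_STATE = struct.Struct("<dq")


def partialize_avg(values):
    """Return the serialized transition state instead of the final average."""
    values = list(values)
    return _STATE.pack(float(sum(values)), len(values))


def finalize_avg(partials):
    """Combine any number of serialized partial states and finalize them."""
    total, count = 0.0, 0
    for blob in partials:
        partial_sum, partial_count = _STATE.unpack(blob)
        total += partial_sum
        count += partial_count
    return total / count if count else None


# Partials computed independently (e.g. per bucket or per node) can be merged.
result = finalize_avg([partialize_avg([1, 2, 3]), partialize_avg([10])])
```

Averaging the two final averages (2.0 and 10.0) would give 6.0, which is wrong. Combining the partial states gives the correct 4.0, which is why the partial state rather than the final value has to be stored.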
Sven Klemm
33ef1de542 Add treat_null_as_missing option to locf
When doing a gapfill query with multiple columns that may contain
NULLs, it is not trivial to remove NULL values from individual columns
with a WHERE clause. This new locf option allows those NULL values
to be ignored in gapfill queries with locf.

We drop the old locf function because we don't want two locf functions.
Unfortunately, this means any views using locf have to be dropped.
2019-02-16 00:09:38 +01:00
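
The behaviour of the option added in 33ef1de542 can be sketched on a plain list. In this model `MISSING` marks a gap-filled bucket with no row and `None` stands for a SQL NULL. Both conventions, and the assumption that a NULL is carried forward like any other value when the option is off, belong to this sketch rather than to the extension.

```python
MISSING = object()  # a bucket produced by gapfill that has no input row


def locf(series, treat_null_as_missing=False):
    """Last observation carried forward over a list of bucket values."""
    out = []
    last = None
    for value in series:
        is_gap = value is MISSING or (treat_null_as_missing and value is None)
        if not is_gap:
            last = value
        out.append(last)
    return out


series = [1, MISSING, None, MISSING, 4]
default_fill = locf(series)
ignore_nulls = locf(series, treat_null_as_missing=True)
```

With the option off, the NULL at position 2 counts as an observation and is itself carried into the following gap. With the option on, it is skipped and the earlier value 1 keeps being carried forward.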
Joshua Lockerman
4295c04caf Release 1.2.0 2019-01-28 20:47:04 -05:00
Joshua Lockerman
fdaa7173fb Update telemetry with prettier os info
The info returned by uname is difficult to work with, so read the os
name from /etc/os-release if it's available.
2019-01-18 10:23:01 -05:00
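
A minimal sketch of the approach in fdaa7173fb, written in Python rather than the extension's C. Which os-release field the telemetry code actually reads is not stated in the commit message, so preferring `PRETTY_NAME`, then `NAME`, is an assumption here, as are the function names.

```python
import platform
import shlex


def parse_os_release(text):
    """Parse KEY=value lines in os-release format into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        parts = shlex.split(raw)  # strips shell-style quoting
        info[key] = parts[0] if parts else ""
    return info


def os_name(path="/etc/os-release"):
    """Prefer the name from os-release; fall back to uname if unavailable."""
    try:
        with open(path, encoding="utf-8") as f:
            info = parse_os_release(f.read())
    except OSError:
        info = {}
    return info.get("PRETTY_NAME") or info.get("NAME") or platform.uname().system


sample = 'NAME="Ubuntu"\nPRETTY_NAME="Ubuntu 18.04.2 LTS"\nID=ubuntu\n'
parsed = parse_os_release(sample)
```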
Joshua Lockerman
47b5b7d553 Log which chunks are dropped by background workers
We don't want to do this silently, so that users are
able to debug where their chunks went.
2019-01-10 13:53:38 -05:00
Joshua Lockerman
28265dcc1f Use a fixed file for the latest dev version
When developing a feature across releases, timescaledb updates can get
stuck in the wrong update script, breaking the update process. To avoid
this, we introduce a new file "latest-dev.sql" in which all new updates
should go. During a release, this file gets renamed to
"<previous version>--<current version>.sql" ensuring that all new
updates are released and all updates in other branches will
automatically get redirected to the next update script.
2019-01-07 14:03:05 -05:00
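
The release step described in 28265dcc1f can be sketched as a small helper. The function name and the directory layout are hypothetical, and recreating an empty latest-dev.sql after the rename is an assumption of this sketch. The commit message only states that the file is renamed.

```python
import tempfile
from pathlib import Path


def cut_release(update_dir, previous_version, current_version):
    """Rename latest-dev.sql to '<previous version>--<current version>.sql'."""
    update_dir = Path(update_dir)
    dev_file = update_dir / "latest-dev.sql"
    released = update_dir / f"{previous_version}--{current_version}.sql"
    if released.exists():
        raise FileExistsError(released)
    dev_file.rename(released)
    # Assumption: start the next development cycle with an empty file.
    dev_file.touch()
    return released


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "latest-dev.sql").write_text("ALTER TABLE t ADD COLUMN c int;\n")
    released = cut_release(tmp, "1.1.1", "1.2.0")
    released_name = released.name
    released_sql = released.read_text()
    next_dev_sql = (Path(tmp) / "latest-dev.sql").read_text()
```

Because feature branches only ever append to latest-dev.sql, a branch that is merged after the release lands in the next update script without manual edits.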