339 Commits

Author SHA1 Message Date
Brian Rowe
aeac52aef6 Rename telemetry_metadata table to just metadata
This change renames the _timescaledb_catalog.telemetry_metadata table to
_timescaledb_catalog.metadata. It also adds a new boolean column to this
table which is used to flag data that should be included in telemetry.

It also renames the src/telemetry/metadata.{h,c} files to
src/telemetry/telemetry_metadata.{h,c} and updates the API to reflect
this. Finally, it includes the logic to use the new boolean column
when populating the telemetry parse state.
2019-05-17 17:04:42 -07:00
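
A minimal sketch of reading the renamed table; the flag column name (include_in_telemetry) fits the layout this commit describes but is otherwise an assumption:

```sql
-- List metadata rows flagged for inclusion in telemetry.
-- The column name include_in_telemetry is assumed for illustration.
SELECT key, value
FROM _timescaledb_catalog.metadata
WHERE include_in_telemetry;
```
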
Sven Klemm
bfabb30be0 Release 1.3.0 2019-05-07 02:47:13 +02:00
Joshua Lockerman
899cd0538d Allow scheduled drop_chunks to cascade to aggs
This commit adds a cascade_to_materializations flag to the scheduled
version of drop_chunks that behaves much like the one from manual
drop_chunks: if drop_chunks runs on a hypertable that has a continuous
aggregate and this flag is not set, the chunks will not be dropped.
2019-04-30 15:46:49 -04:00
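
A hedged sketch of scheduling with the new flag, assuming it is exposed as a named argument of add_drop_chunks_policy (hypertable name illustrative):

```sql
-- Without cascade_to_materializations => true, the scheduled job will not
-- drop chunks that back a continuous aggregate.
SELECT add_drop_chunks_policy('conditions', INTERVAL '2 months',
                              cascade_to_materializations => true);
```
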
Matvey Arye
74f8d204a5 Optimize getting the chunk_id in continuous aggs
We replace chunk_for_tuple with chunk_id_from_relid for getting
chunk id fields when materializing continuous aggs. The old
function required passing in the entire row. This was very slow
because a lot of data was passed around at execution time.

The new function just uses the internal `tableoid` attribute to
convert the table relid to a chunk_id. This is much more efficient.
We also add memoization to the new function because it is most often
called consecutively for the same chunk.
2019-04-29 15:45:23 -04:00
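
For illustration, a query of the shape the materializer benefits from; the per-row `tableoid` system column identifies the chunk relation, so only a relid is passed (table name illustrative):

```sql
-- tableoid is PostgreSQL's per-row relation OID; converting it to a chunk id
-- avoids passing the whole row around at execution time.
SELECT _timescaledb_internal.chunk_id_from_relid(tableoid) AS chunk_id,
       count(*)
FROM conditions
GROUP BY 1;
```
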
Joshua Lockerman
ae3480c2cb Fix continuous_aggs info
This commit switches the remaining JOIN in the continuous_aggs_stats
view to LEFT JOIN. This way we'll still see info from the other columns
even when the background worker has not run yet.
This commit also switches the time fields to output text in the correct
format for the underlying time type.
2019-04-26 13:08:00 -04:00
Joshua Lockerman
3895e5ce0e Add a setting for max an agg materializes per run
Add a setting, max_materialized_per_run, which can be set to prevent a
continuous aggregate from materializing too much of the table in a
single run. This prevents a single run from locking the hypertable
for too long when running on a large data set.
2019-04-26 13:08:00 -04:00
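
The commit message does not say how the setting is surfaced; if it is a per-aggregate storage option, usage might look like this (hypothetical syntax and option name):

```sql
-- Hypothetical: cap how much history a single materialization run covers.
ALTER VIEW conditions_hourly
    SET (timescaledb.max_materialized_per_run = '12 hours');
```
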
gayyappan
b8f9b91e60 Add user view query definition for cont aggs
Add the query definition to
timescaledb_information.continuous_aggregates.

The user query (specified in the CREATE VIEW statement of a continuous
aggregate) is transformed in the process of creating the continuous
aggregate, and this modified query is saved in the pg_rewrite catalog.
In order to display the original query, we create an internal view
which is a replica of the user query. This is used to display the
definition in timescaledb_information.continuous_aggregates.

As an alternative we could save the original user query in our internal
catalogs, but this approach involves replicating a lot of postgres code
and causes portability problems.
2019-04-26 13:08:00 -04:00
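
The saved definition can then be read back from the information view; the column names below are a reasonable guess from the view's purpose rather than confirmed by the message:

```sql
-- Show each continuous aggregate together with its original CREATE VIEW query.
SELECT view_name, view_definition
FROM timescaledb_information.continuous_aggregates;
```
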
Matvey Arye
dc0e250428 Add pg_dump/restore tests for continuous aggs
The data in caggs needs to survive dump/restore. This
test makes sure that caggs that are materialized both
before and after restore are correct.

Two code changes were necessary to make this work:
1) The valid_job_type constraint on bgw_job needed to be altered to add
'continuous_aggregate' as a valid job type.

2) The user_view_query field needed to be changed to text because
dump/restore does not support pg_node_tree.
2019-04-26 13:08:00 -04:00
Joshua Lockerman
45fb1fc2c8 Handle drop_chunks on tables that have cont aggs
For hypertables that have continuous aggregates, calling drop_chunks now
drops all of the rows in the materialization table that were based on
the dropped chunks. Since we don't know what the correct default
behavior for drop_chunks is, we've added a new argument,
cascade_to_materializations, which must be set to true in order to call
drop_chunks on a hypertable which has a continuous aggregate.
drop_chunks is also blocked on the materialization tables of continuous
aggregates.
2019-04-26 13:08:00 -04:00
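
A sketch of the resulting call, assuming named-argument notation against the drop_chunks signature of this release (table name illustrative):

```sql
-- Errors out without the flag, since 'conditions' has a continuous aggregate;
-- with it, the matching rows in the materialization table are dropped too.
SELECT drop_chunks(INTERVAL '2 months', 'conditions',
                   cascade_to_materializations => true);
```
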
gayyappan
18d1607909 Add timescaledb_information views for continuous aggregates
Add timescaledb_information.continuous_aggregate_settings and timescaledb_information.continuous_aggregate_job_stats views
2019-04-26 13:08:00 -04:00
Matvey Arye
19d47daf23 Delete related catalog rows when continuous aggs are dropped
This PR deletes related rows from the following tables:
* completed_threshold
* invalidation_threshold
* hypertable_invalidation_log

The latter two tables are only affected if no other continuous aggs
exist on the raw hypertable.

This commit also adds locks to prevent concurrent raw table inserts
and any access to the materialization table when dropping caggs. It
also moves all locks to the beginning of the function so that the lock
order is easier to track and reason about.

Also added a few formatting fixes.
2019-04-26 13:08:00 -04:00
gayyappan
1cbd8c74f7 Add invalidation trigger for continuous aggs
Add invalidation trigger for DML changes to the hypertable used in
the continuous aggregate query.

Also add user_view_query definition in continuous_agg catalog table.
2019-04-26 13:08:00 -04:00
Joshua Lockerman
0737b370a3 Add the actual bgw job for continuous aggregates
This commit adds the actual background worker job that runs the continuous
aggregate automatically. This job gets created when the continuous aggregate is
created and is deleted when the aggregate is dropped. By default this job will
attempt to run every two bucket widths, and attempts to materialize up to two
bucket widths behind the end of the table.
2019-04-26 13:08:00 -04:00
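
For context, a minimal continuous aggregate of this era, after which the job exists with the defaults described above (table and column names illustrative):

```sql
-- Creating the view registers the background job; dropping the view removes it.
-- With a 1 hour bucket, the job runs roughly every 2 hours and materializes
-- up to 2 hours behind the end of the table.
CREATE VIEW conditions_hourly
    WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour', time) AS bucket,
       avg(temperature) AS avg_temp
FROM conditions
GROUP BY bucket;
```
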
David Kohn
f17aeea374 Initial cont agg INSERT/materialization support
This commit adds initial support for the continuous aggregate materialization
and INSERT invalidations.

INSERT path:
  On INSERT, DELETE and UPDATE we log the [min, max] time range that may be
  invalidated (that is, newly inserted, updated, or deleted) to
  _timescaledb_catalog.continuous_aggs_hypertable_invalidation_log. This log
  will be used to re-materialize these ranges, to ensure that the aggregate
  is up-to-date. Currently these invalidations are recorded by a trigger,
  _timescaledb_internal.continuous_agg_invalidation_trigger, which should be
  added to the hypertable when the continuous aggregate is created. This trigger
  stores a cache of min/max values per-hypertable, and on transaction commit
  writes them to the log, if needed. At the moment, we consider them to always
  be needed, unless we're in ReadCommitted mode or weaker and the min
  invalidated value is greater than the hypertable's invalidation threshold
  (found in _timescaledb_catalog.continuous_aggs_invalidation_threshold).

Materialization path:
  Materialization currently happens in multiple phases: in phase 1 we determine
  the timestamp at which we will end the new set of materializations, then we
  update the hypertable's invalidation threshold to that point, and finally we
  read the current invalidations and materialize any invalidated rows plus the
  new range between the continuous aggregate's completed threshold (found in
  _timescaledb_catalog.continuous_aggs_completed_threshold) and the hypertable's
  invalidation threshold. After all of this is done we update the completed
  threshold to the invalidation threshold. The portion of this protocol from
  after the invalidations are read until the completed threshold is written
  (that is, actually materializing, and writing the completion threshold) is
  included with this commit, with the remainder to follow in subsequent ones.
  One important caveat: since the thresholds are exclusive, we invalidate
  all values _less_ than the invalidation threshold, and since we store time
  values as int64 internally, we can never determine whether the row at
  PG_INT64_MAX is invalidated. To avoid this problem, we never materialize the
  time bucket containing PG_INT64_MAX.
2019-04-26 13:08:00 -04:00
gayyappan
2dbc28df82 Create base infrastructure for continuous aggs
This PR adds a catalog table for storing metadata about
continuous aggregates. It also adds code for creating the
materialization hypertable and 2 views that are used by the
continuous aggregate system:

1) The user view - This is the actual view queried by the end user.
   It is a query on top of the materialized hypertable and is
   responsible for finalizing and combining partials in a manner
   that returns to the user the data as defined by the original
   user-defined view.
2) The partial view - This queries the raw table and returns
   columns as defined in the materialized table. It will be used
   by the materializer to calculate the data that will be inserted
   into the materialization table. Note that the data here is the
   partial state of any aggregates.
2019-04-26 13:08:00 -04:00
Joshua Lockerman
1e486ef2a4 Fix ts_chunk_for_tuple performance
ts_chunk_for_tuple should use the chunk cache.
ts_chunk_for_tuple should be marked stable.
These fixes markedly improve performance.
2019-04-19 12:46:36 -04:00
David Kohn
35a1e357d8 Add functions for turning restoring on/off and setting license key
These functions improve usability and take all the proper steps to
set restoring on/off (and stop/start background workers in the process)
and to set the license key via a function rather than a GUC modification.
2019-04-18 11:59:31 -04:00
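
Assuming these are the timescaledb_pre_restore/timescaledb_post_restore pair and a set_license_key function (names inferred from the message, not stated in it), a restore would be wrapped like this:

```sql
SELECT timescaledb_pre_restore();   -- turn restoring on, stop background workers
-- run pg_restore here
SELECT timescaledb_post_restore();  -- turn restoring off, restart workers

-- Set the license key via a function instead of editing the GUC directly.
SELECT set_license_key('Community');
```
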
Sven Klemm
7961fc77e9 Rename installation_metadata to telemetry_metadata 2019-04-15 21:44:10 +02:00
Matvey Arye
672c41755f Rename files to partialize_finalize
It's better to have more concrete names than just util_aggfns.

Also add TSDLLEXPORT where appropriate for Windows.
2019-04-12 12:12:17 -04:00
Matvey Arye
d7b6ad239b Add support for FINALFUNC_EXTRA
This PR adds support for finalizing aggregates with FINALFUNC_EXTRA. To
do this, we need to pass NULLs corresponding to all of the aggregate
parameters to the ffunc as arguments following the partial state value.
These arguments need to have the correct concrete types.

For polymorphic aggregates, the types cannot be derived from the catalog
but need to be somehow conveyed to the finalize_agg. Two designs were
considered:

1) Encode the type names as part of the partial state (bytea)
2) Pass down the arguments as parameters to the finalize_agg

In the end (2) was picked for the simple reason that (1) would have
increased the size of each partial, sometimes considerably (esp. for small
partial values).

The types are passed down as names not OIDs because in the continuous
agg case using OIDs is not safe for backup/restore and in the clustering
case the datanodes may not have the same type OIDs either.
2019-04-12 12:12:17 -04:00
gayyappan
b45343b3cc Add ability to work with aggregate partials
The ability to get aggregate partials instead of the final state
is important for both continuous aggregation and clustering.

This commit adds the ability to work with aggregate partials.
Namely, a function called _timescaledb_internal.partialize_agg
can now wrap an aggregate and return the partial results as a bytea.

The _timescaledb_internal.finalize_agg aggregate allows you to combine
and finalize partials.

The partialize_agg function works as a marker in the planner to force
the planner to return partial results.

Unfortunately, we could not get the planner to modify the plan directly
to aggregate partials. Instead, finalize_agg is a real aggregate
that performs aggregation on the partial state. Note that it is not
yet parallel.

Aggregates that use FINALFUNC_EXTRA are currently not supported.

Co-authored-by: gayyappan <gayathri@timescale.com>
Co-authored-by: David Kohn <david@timescale.com>
2019-04-12 12:12:17 -04:00
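
A sketch of the capture side under an assumed metrics table; partialize_agg wraps an aggregate call and yields its partial state as a bytea, which finalize_agg can later combine and finalize (finalize_agg's full argument list also carries the aggregate's name and input-type information, so it is not guessed at here):

```sql
-- Partial aggregation state per bucket, as a bytea, instead of the final value.
SELECT time_bucket('1 hour', time) AS bucket,
       _timescaledb_internal.partialize_agg(avg(temperature)) AS avg_partial
FROM conditions
GROUP BY bucket;
```
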
Dmitry Simonenko
4daeb06eee Track hypertables used during process utility hook execution
This patch does the refactoring necessary to support execution of DDL
commands on remote servers.

Basically it extends the cross-module API with ddl_command_start,
ddl_command_end and sql_drop functions.

A hypertables_list variable was added to ProcessUtilityArg. It is used
to keep a list of the hypertables found during Utility/DDL statement
parsing. This information, together with information gathered from other
hook functions, will be used to distinguish distributed hypertables
and forward DDL commands to any remote servers associated with
them.
2019-03-29 13:04:18 +03:00
Sven Klemm
89cb73318d Add support for window functions to gapfill
This patch adds full support for window functions to gapfill queries.
The targetlist for the gapfill node is built from the final targetlist
and pushed down to the aggregation node. locf and interpolate function
calls will be toplevel function calls in the targetlist of the gapfill node.
This patch changes the gapfill code to no longer remove the marker
function calls from the plans, to allow PostgreSQL to properly identify
subexpressions in the targetlist.
2019-03-26 05:14:16 +01:00
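
A sketch of the kind of query this enables, with an illustrative metrics table; the window aggregate runs over the gapfilled series while locf stays a top-level call:

```sql
SELECT time_bucket_gapfill('5 minutes', time) AS bucket,
       locf(avg(value)) AS value,
       -- window function over the gapfilled, aggregated series
       sum(avg(value)) OVER (ORDER BY time_bucket_gapfill('5 minutes', time))
         AS running_sum
FROM metrics
WHERE time >= '2019-03-01' AND time < '2019-03-02'
GROUP BY bucket;
```
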
Sven Klemm
38483358d0 Release 1.2.2 2019-03-14 14:32:27 +01:00
Joshua Lockerman
905cd4becc Add function to determine the chunk for a given row 2019-03-11 16:29:50 -04:00
Sven Klemm
33ef1de542 Add treat_null_as_missing option to locf
When doing a gapfill query with multiple columns that may contain
NULLs, it is not trivial to remove NULL values from individual columns
with a WHERE clause. This new locf option allows those NULL values
to be ignored in gapfill queries with locf.

We drop the old locf function because we don't want two locf functions.
Unfortunately this means any views using locf have to be dropped.
2019-02-16 00:09:38 +01:00
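
A sketch with an illustrative metrics table whose value column contains NULLs:

```sql
-- Without treat_null_as_missing a stored NULL is carried forward as NULL;
-- with it, the last non-NULL value is carried forward instead.
SELECT time_bucket_gapfill('1 hour', time) AS bucket,
       locf(avg(value), treat_null_as_missing => true) AS value
FROM metrics
WHERE time >= '2019-02-01' AND time < '2019-02-02'
GROUP BY bucket;
```
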
Joshua Lockerman
6d9ffe5c7d Release 1.2.1 2019-02-08 19:11:09 -05:00
Joshua Lockerman
4295c04caf Release 1.2.0 2019-01-28 20:47:04 -05:00
Sven Klemm
fd8a5197c8 Make time_bucket_gapfill start and finish optional
Make time_bucket_gapfill start and finish optional; this is in
preparation for deducing them from the WHERE clause. We make this
optional now so as not to introduce a breaking change later. This also
only allows simple expressions for bucket_width, start and
finish because only those can be evaluated safely in gapfill_begin.
2019-01-28 19:07:34 +01:00
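
Both call forms below are now accepted (table illustrative); until deduction from the WHERE clause lands, callers should still expect to supply start and finish:

```sql
-- Full form: bucket_width, time, start, finish.
SELECT time_bucket_gapfill('1 day', time, '2019-01-01', '2019-02-01') AS day,
       avg(value)
FROM metrics
WHERE time >= '2019-01-01' AND time < '2019-02-01'
GROUP BY day;

-- Start and finish may now be omitted syntactically, in preparation for
-- deriving them from the WHERE clause.
SELECT time_bucket_gapfill('1 day', time) AS day,
       avg(value)
FROM metrics
WHERE time >= '2019-01-01' AND time < '2019-02-01'
GROUP BY day;
```
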
Joshua Lockerman
88c7149c2c Fix issues with non-dev versions when generating the update scripts
Without this, we generate multiple rules for the latest script
2019-01-28 10:04:59 -05:00
David Kohn
cf67ddd9b0 Add informational views for policies
Add views so that users can see the parameters for policies they have created,
and a separate view so that they can see the policies that have been created and scheduled on hypertables.
2019-01-25 13:51:52 -05:00
David Kohn
73d3a14665 Rename alter_policy_schedule & main_table for better UI
Rename alter_policy_schedule to alter_job_schedule for consistency with the job_id argument passed in.
Also rename main_table to hypertable in all of the policy-related functions, as they must deal with
hypertables that have already been created.
2019-01-25 13:51:52 -05:00
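
Assuming the renamed signature, rescheduling a policy job might look like this (job id illustrative):

```sql
-- Parameter name assumed from the rename described above.
SELECT alter_job_schedule(1001, schedule_interval => INTERVAL '1 day');
```
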
Sven Klemm
fa61613440 Change time_bucket_gapfill argument names
time_bucket_gapfill used end as an argument name, which is a SQL keyword
and has to be quoted when used. This changes the argument names from
start/end to start/finish.
2019-01-25 18:38:55 +01:00
niksa
319b79c8ec Making chunks_in function internal
This function needs chunk IDs as input. Since chunk IDs are
TimescaleDB-internal metadata, it feels more natural to make this function internal.
2019-01-23 10:04:06 +01:00
niksa
c77f4ab1b3 Explicit chunk exclusion
In some cases the user might already know which chunks need to be scanned to answer
a particular query. Using the `chunks_in` function we can skip calculating the chunks
involved in a particular query, which should result in better performance as well.
A simple example:

`SELECT * FROM hypertable WHERE chunks_in(hypertable, ARRAY[1,2])`
2019-01-19 00:02:01 +01:00
Joshua Lockerman
fdaa7173fb Update telemetry with prettier os info
The info obtained from uname is difficult to work with, so read the OS
name from /etc/os-release if it's available.
2019-01-18 10:23:01 -05:00
Sven Klemm
f89fd07c5b Remove year from SQL file license text
This changes the license text for SQL files to be identical
with the license text for C files.
2019-01-13 23:30:22 +01:00
Joshua Lockerman
65894f08cf Add view displaying info about the current license
Currently the view displays the current edition, expiry date, and
whether the license is expired. We're not displaying the license key
itself in the view as it can get rather long; it can be read via SHOW.
We also do not display the license's ID since that is for internal use.
2019-01-10 17:29:59 -05:00
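
Reading it back, hedged on the view name and columns:

```sql
-- View and column names assumed from the description above.
SELECT edition, expired, expiration_time
FROM timescaledb_information.license;

-- The key itself stays readable via:
SHOW timescaledb.license_key;
```
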
Joshua Lockerman
47b5b7d553 Log which chunks are dropped by background workers
We don't want to do this silently, so that users are
able to debug where their chunks went.
2019-01-10 13:53:38 -05:00
Joshua Lockerman
27cd0fa27d Fix speeling 2019-01-09 17:16:17 -05:00
Joshua Lockerman
fafc98d343 Fix warnings for TSL licenses
So as to reduce the amount of logspam users receive, restrict printing license info
to the following:

  1. On CREATE EXTENSION
       a. in the notice, print the license expiration time, if any
       b. if the license is expired, additionally print that
       c. else if the license will expire within a week, print an additional warning
  2. On the first usage of a TSL function, print if the license is expired or will
     expire within a week
2019-01-08 19:35:50 -05:00
Joshua Lockerman
28265dcc1f Use a fixed file for the latest dev version
When developing a feature across releases, timescaledb updates can get
stuck in the wrong update script, breaking the update process. To avoid
this, we introduce a new file "latest-dev.sql" in which all new updates
should go. During a release, this file gets renamed to
"<previous version>--<current version>.sql" ensuring that all new
updates are released and all updates in other branches will
automatically get redirected to the next update script.
2019-01-07 14:03:05 -05:00
Joshua Lockerman
2a284fc84e Move 1.2.0 updates to the correct file 2019-01-02 15:43:48 -05:00
Sven Klemm
6125111dfa Mark gapfill functions parallel safe
Gapfill functions need to be marked parallel safe to not prevent
parallelism. The gapfill node itself is still parallel restricted,
but child nodes can be parallel.
2019-01-02 15:43:48 -05:00
Joshua Lockerman
4e1e15f079 Add reorder command
New cluster-like command which writes to a new index then swaps,
much like is done for the data table, and only acquires
exclusive locks for said swap. This trades off disk usage for
lower contention: we hold locks for a much shorter period of time,
allowing reads to work concurrently, but we have both the old
and new versions of the table existing at once, approximately
doubling storage usage while reorder is running.

Currently only works on chunks.
2019-01-02 15:43:48 -05:00
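
A sketch of manual use, assuming a reorder_chunk entry point with illustrative chunk and index names:

```sql
-- Rewrites the chunk in index order; exclusive locks are held only for the
-- final swap, at the cost of roughly double the disk usage while running.
SELECT reorder_chunk('_timescaledb_internal._hyper_1_2_chunk',
                     'conditions_time_idx');
```
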
Amy Tai
9ad73249e1 Move enterprise updates to newest update file 2019-01-02 15:43:48 -05:00
Amy Tai
ef43e52107 Add alter_policy_schedule API function 2019-01-02 15:43:48 -05:00
Sven Klemm
5ba740ed98 Add gapfill query support
This patch adds first-level support for gapfill queries, including
support for LOCF (last observation carried forward) and interpolation, without
requiring a join against `generate_series`. This makes it easier to join
timeseries with different or irregular sampling intervals.
2019-01-02 15:43:48 -05:00
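
A minimal gapfill query of this vintage (table illustrative; at this point the range arguments were still required and named start/end):

```sql
-- locf carries the last observation forward; interpolate draws a line
-- between neighboring points. No join against generate_series is needed.
SELECT time_bucket_gapfill('1 hour', time,
                           '2019-01-01 00:00', '2019-01-02 00:00') AS bucket,
       locf(avg(temperature)) AS temp_locf,
       interpolate(avg(temperature)) AS temp_interp
FROM conditions
WHERE time >= '2019-01-01' AND time < '2019-01-02'
GROUP BY bucket;
```
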
Amy Tai
be7c74cdf3 Add logic for automatic DB maintenance functions
This commit adds logic for manipulating the internal metadata tables used to enable users to schedule automatic drop_chunks and recluster policies. This commit includes:

- SQL for creating the policy tables and chunk stats table
- Catalog code and C code for accessing these three tables programmatically
- Implementation of new user API functions: add_*_policy and remove_*_policy
- Stub scheduler logic for running the policies
2019-01-02 15:43:48 -05:00
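
The user-facing calls, sketched against an illustrative hypertable; the argument lists are assumptions based on the add_*/remove_* naming:

```sql
-- Schedule automatic chunk dropping, then remove the policy again.
SELECT add_drop_chunks_policy('conditions', INTERVAL '3 months');
SELECT remove_drop_chunks_policy('conditions');
```
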
Joshua Lockerman
4ff6ac7b91 Initial Timescale-Licensed-Module and License-Key Implementation
This commit adds support for dynamically loaded submodules to timescaledb,
as well as an initial license-key implementation in the tsl subdirectory.
Dynamically loaded modules allow our users to determine which licenses they
wish to use for their version of timescaledb; if they wish to only use
Apache-licensed code, they do not load the Timescale-licensed submodule. Calls
from the Apache-licensed code into the Timescale-licensed submodule are
handled via dynamically set function pointers; see tsl/src/Readme.module.md for
more details.

This commit also adds code for license keys for the ApacheOnly, Community, and
Enterprise editions. The license key determines which features are enabled,
and controls loading the submodule: when a license key that requires the
submodule is installed, the module is automatically loaded.
Currently the ApacheOnly and Community license keys are hardcoded to be
"ApacheOnly" and "Community" respectively. The first version of the enterprise
license key is described in tsl/src/Readme.module.md.
2019-01-02 15:43:48 -05:00