29 Commits

Author SHA1 Message Date
Sven Klemm
2d7eb18f24 Drop unused SQL functions
This patch drops the following internal SQL functions which were
unused:
  _timescaledb_internal.is_main_table(regclass);
  _timescaledb_internal.is_main_table(text, text);
  _timescaledb_internal.hypertable_from_main_table(regclass);
  _timescaledb_internal.main_table_from_hypertable(integer);
  _timescaledb_internal.time_literal_sql(bigint, regtype);
2023-04-12 11:00:18 +02:00
Fabrízio de Royes Mello
38fcd1b76b Improve Realtime Continuous Aggregate performance
When calling the `cagg_watermark` function to get the watermark of a
Continuous Aggregate, we execute a `SELECT MAX(time_dimension)` query
against the underlying materialization hypertable.

The problem is that a `SELECT MAX(time_dimension)` query can be
expensive because it scans all hypertable chunks, increasing the
planning time for Realtime Continuous Aggregates.

Improve this by creating a new catalog table to serve as a cache for
the current Continuous Aggregate watermark (see the sketch after this
list), updated in the following situations:
- Create CAgg: store the minimum value of hypertable time dimension
  data type;
- Refresh CAgg: store the last value of the time dimension materialized
  in the underlying materialization hypertable (or the minimum value of
  materialization hypertable time dimension data type if there's no
  data materialized);
- Drop CAgg Chunks: the same as Refresh CAgg.
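
As a rough illustration, the cache could have a shape like the following
(the DDL here is a sketch, not the actual catalog definition):

  -- one cached watermark per materialization hypertable
  CREATE TABLE _timescaledb_catalog.continuous_aggs_watermark (
      mat_hypertable_id integer PRIMARY KEY,
      watermark bigint NOT NULL  -- watermark as internal int64 time
  );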

Closes #4699, #5307
2023-03-22 16:35:23 -03:00
Fabrízio de Royes Mello
c0f2ed1809 Mark cagg_watermark parallel safe
The `cagg_watermark` function performs only read-only operations, so it
is safe to mark it PARALLEL SAFE and take advantage of Postgres
parallel query.

Since 2.7, when we introduced the new Continuous Aggregate format, we
no longer use partials, so the aggregate functions `partialize_agg`
and `finalize_agg` (which are not parallel safe) are no longer
involved; there is no reason not to take advantage of Postgres
parallel query for realtime Continuous Aggregates.
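
A sketch of the kind of change involved (illustrative only; the actual
update script may differ):

  ALTER FUNCTION _timescaledb_internal.cagg_watermark(integer) PARALLEL SAFE;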
2023-01-31 13:07:19 -03:00
Sven Klemm
a4081516ca Append pg_temp to search_path
Postgres will prepend pg_temp to the effective search_path if it
is not present in the search_path. While pg_temp will never be
used to look up functions or operators unless explicitly requested,
it will be used to look up relations. Appending pg_temp to the
search_path ensures that objects in pg_temp are considered last
and that pg_temp cannot be used to mask existing objects.
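
Illustrative example of the resulting setting:

  SET search_path TO pg_catalog, pg_temp;

With pg_temp listed last, relations in pg_temp are considered only after
pg_catalog, so they cannot shadow existing objects.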
2022-05-03 07:55:43 +02:00
Sven Klemm
6dddfaa54e Lock down search_path in install scripts
This patch locks down search_path in extension install and update
scripts to contain only pg_catalog; this requires that every reference
in those scripts is fully qualified. Additionally, we add explicit
CREATE commands to all update scripts for objects added to the
public schema. This change makes update scripts fail if a
function with an identical signature already exists when installing
or upgrading, instead of reusing the existing object.
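
A minimal sketch of the pattern (the function here is made up; the real
scripts create the actual extension objects):

  SET LOCAL search_path TO pg_catalog;
  -- plain CREATE (not CREATE OR REPLACE): fail instead of silently
  -- reusing a pre-existing object with the same signature
  CREATE FUNCTION public.example_fn(val pg_catalog.int8)
  RETURNS pg_catalog.int8 LANGUAGE sql IMMUTABLE
  AS $$ SELECT val $$;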
2022-02-09 17:53:20 +01:00
gayyappan
c55cbb9350 Expose subtract_integer_from_now as SQL function
Move subtract_integer_from_now to src directory
and create a SQL function for it.
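
Hypothetical usage, assuming a (regclass, bigint) -> bigint signature:

  SELECT _timescaledb_internal.subtract_integer_from_now('conditions', 3600);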
2021-10-13 09:11:59 -04:00
Erik Nordström
bc9726607e Use end of last bucket as cagg watermark
The function `cagg_watermark` returns the time threshold at which
materialized data ends and raw query data begins in a real-time
aggregation query (union view).

The watermark is simply the completed threshold of the continuous
aggregate materializer. However, since the completed threshold will no
longer exist with the new continuous aggregates, the watermark
function has been changed to return the end of the last bucket in the
materialized hypertable.

In most cases, the completed threshold is the same as the end of the
last materialized bucket. However, there are situations when it is
not; for example, when there is a filter in the view query some
buckets might not be materialized because no data matched the
filter. The completed threshold would move ahead regardless. For
instance, if there is only data from "device_2" in the raw hypertable
and the aggregate has a filter `device=1`, there will be no buckets
materialized although the completed threshold moves forward. Therefore
the new watermark function might sometimes return a lower watermark
than the old function. A similar situation explains the different
output in one of the union view tests.
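
Conceptually (table and column names are illustrative, and an hourly
bucket width is assumed), the new watermark amounts to:

  -- end of the last materialized bucket
  SELECT max(bucket_start) + INTERVAL '1 hour' AS watermark
  FROM materialization_hypertable;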
2020-09-05 00:38:36 +02:00
Sven Klemm
6aea391477 Fix signature of cagg_watermark
This patch changes the signature from cagg_watermark(oid) to
cagg_watermark(int). Since this is an API-breaking change, it
couldn't be done in an earlier release.
2020-08-17 18:19:12 +02:00
gayyappan
ed64af76a5 Fix real time aggregate support for multiple aggregates
We should compute the watermark using the materialization
hypertable id, not the raw hypertable id.
New test cases were added to continuous_aggs_multi.sql. The existing
test cases in continuous_aggs_multi.sql had not been correctly updated
for this feature.

Fixes #1865
2020-05-15 10:15:53 -04:00
Sven Klemm
2ae4592930 Add real-time support to continuous aggregates
This PR adds a new mode for continuous aggregates that we name
real-time aggregates. Unlike the original mode, this new mode
combines materialized data with new data received after the last
refresh has happened. This new mode is the default behaviour
for newly created continuous aggregates.

To upgrade existing continuous aggregates to the new behaviour,
run the following command for all continuous aggregates:

ALTER VIEW continuous_view_name SET (timescaledb.materialized_only=false);

To disable this behaviour for newly created continuous aggregates
and get the old behaviour, run the following command:

ALTER VIEW continuous_view_name SET (timescaledb.materialized_only=true);
2020-03-31 22:09:42 +02:00
Joshua Lockerman
ae3480c2cb Fix continuous_aggs info
This commit switches the remaining JOIN in the continuous_aggs_stats
view to LEFT JOIN. This way we'll still see info from the other columns
even when the background worker has not run yet.
This commit also switches the time fields to output text in the correct
format for the underlying time type.
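
Illustrative shape of the JOIN change (table and column names are made up):

  -- LEFT JOIN keeps the aggregate's row even before the bgw has run
  SELECT ca.view_name, stats.last_run_started_at
  FROM continuous_agg ca
  LEFT JOIN bgw_job_stats stats ON stats.job_id = ca.job_id;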
2019-04-26 13:08:00 -04:00
Joshua Lockerman
0737b370a3 Add the actual bgw job for continuous aggregates
This commit adds the actual background worker job that runs the continuous
aggregate automatically. This job gets created when the continuous aggregate is
created and is deleted when the aggregate is dropped. By default this job will
attempt to run every two bucket widths, and attempts to materialize up to two
bucket widths behind the end of the table.
2019-04-26 13:08:00 -04:00
David Kohn
f17aeea374 Initial cont agg INSERT/materialization support
This commit adds initial support for the continuous aggregate materialization
and INSERT invalidations.

INSERT path:
  On INSERT, DELETE and UPDATE we log the [max, min] time range that may be
  invalidated (that is, newly inserted, updated, or deleted) to
  _timescaledb_catalog.continuous_aggs_hypertable_invalidation_log. This log
  will be used to re-materialize these ranges, to ensure that the aggregate
  is up-to-date. Currently these invalidations are recorded by the trigger
  _timescaledb_internal.continuous_agg_invalidation_trigger, which should be
  added to the hypertable when the continuous aggregate is created. This trigger
  stores a cache of min/max values per-hypertable, and on transaction commit
  writes them to the log, if needed. At the moment, we consider them to always
  be needed, unless we're in ReadCommitted mode or weaker, and the min
  invalidated value is greater than the hypertable's invalidation threshold
  (found in _timescaledb_catalog.continuous_aggs_invalidation_threshold)
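
  An invalidation-log entry has roughly this shape (illustrative values and
  column names, not the actual catalog DDL):

    -- one [min, max] modified time range per hypertable, flushed at commit
    INSERT INTO _timescaledb_catalog.continuous_aggs_hypertable_invalidation_log
        (hypertable_id, lowest_modified_value, greatest_modified_value)
    VALUES (1, 1556236800000000, 1556323200000000);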

Materialization path:
  Materialization currently happens in multiple phases: in phase 1 we determine
  the timestamp at which we will end the new set of materializations, then we
  update the hypertable's invalidation threshold to that point, and finally we
  read the current invalidations and materialize any invalidated rows plus the
  new range between the continuous aggregate's completed threshold (found in
  _timescaledb_catalog.continuous_aggs_completed_threshold) and the hypertable's
  invalidation threshold. After all of this is done, we update the completed
  threshold to the invalidation threshold. The portion of this protocol from
  after the invalidations are read until the completed threshold is written
  (that is, actually materializing and writing the completion threshold) is
  included with this commit, with the remainder to follow in subsequent ones.
  One important caveat: since the thresholds are exclusive (we invalidate
  all values _less_ than the invalidation threshold) and we store time values
  as an int64 internally, we can never determine whether the row at
  PG_INT64_MAX is invalidated. To avoid this problem, we never materialize
  the time bucket containing PG_INT64_MAX.
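
  Sketched in order, the protocol looks like this (illustration only):

    -- 1. choose the end timestamp for this materialization pass
    -- 2. advance continuous_aggs_invalidation_threshold to that point
    -- 3. read (and clear) the logged invalidations
    -- 4. materialize invalidated ranges plus the range
    --    [completed_threshold, invalidation_threshold)
    -- 5. advance continuous_aggs_completed_threshold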
2019-04-26 13:08:00 -04:00
Sven Klemm
f89fd07c5b Remove year from SQL file license text
This changes the license text for SQL files to be identical
with the license text for C files.
2019-01-13 23:30:22 +01:00
Sven Klemm
c59a30feed Remove unused functions from utils.c
Remove the following unused functions:
_timescaledb_internal.to_microseconds(TIMESTAMPTZ)
_timescaledb_internal.to_timestamp_pg(BIGINT)
_timescaledb_internal.time_to_internal(anyelement)
2018-12-12 20:54:20 +01:00
Joshua Lockerman
e06733acf0 Fix casing in SQL license header to be consistent with elsewhere 2018-11-15 15:18:58 -05:00
Joshua Lockerman
20ec6914c0 Add license headers to SQL files and test code 2018-10-29 13:28:19 -04:00
Joshua Lockerman
974788516a Prefix public C functions with ts_
We've decided to adopt the ts_ prefix on all exported C functions in
order to avoid having symbol conflicts with future postgres functions.
We've already started using this prefix on new functions, and this commit
adds the prefix to the old functions.
2018-09-27 11:45:04 -04:00
Floris van Nee
1d9ade7145 add support for other types as timescale column 2018-08-08 11:45:23 -04:00
Erik Nordström
9c9cdca6d3 Add support for adaptive chunk sizing
Users can now (optionally) set a target chunk size and TimescaleDB
will try to adapt the interval length of the first open ("time")
dimension in order to reach that target chunk size. If a hypertable
has more than one open dimension, only the first one will have a
dynamically adapting interval.

Users can optionally specify their own function that calculates the
new dimension interval. They can also set a target size of 0 to have
TimescaleDB estimate a suitable target chunk size based on available
memory.
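
Hypothetical usage (the function and parameter names here are assumptions):

  SELECT set_adaptive_chunking('conditions', chunk_target_size => '100MB');
  -- a target size of 0 asks for an estimate based on available memory
  SELECT set_adaptive_chunking('conditions', chunk_target_size => '0');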
2018-08-08 17:01:31 +02:00
Erik Nordström
71962b86ec Refactor dimension-related API functions
The functions for adding and updating dimensions have been refactored
in C to:

- improve usage of proper error codes
- make messages better conform to the PostgreSQL standard
- improve security by avoiding running large amounts of code under
  SECURITY DEFINER

A new if_not_exists option has also been added to add_dimension(), and
the number of partitions can now be set using the new
set_number_partitions() function.
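
For example (parameter names assumed for illustration):

  SELECT add_dimension('conditions', 'device_id',
                       number_partitions => 4, if_not_exists => true);
  SELECT set_number_partitions('conditions', 8);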

A bug in the validation of smallint time intervals has been fixed. The
previous code didn't check for intervals > 0 and smallint intervals
accepted values up to UINT16_MAX instead of INT16_MAX.
2018-01-25 19:02:34 +01:00
Matvey Arye
da8cc797a4 Add support for multiple extension version in one pg instance
This PR adds the ability to have multiple different versions of the timescaledb
extension be used by different databases in the same PostgreSQL
instance (server).

This is accomplished by splitting this extension into two .so files.
1) timescaledb.so -- stuff under loader/. Really not a lot of code.
    This code MUST be backwards compatible in the future.
2) timescaledb-version.so (most of our code). Need
   not be backwards compatible.

Timescaledb.so becomes a small stub which is preloaded and whose main
reason for existing is to dynamically load the right
timescaledb-version.so when the time comes.

This change allows either of the above .so to be loaded in
shared_preload_libraries. But timescaledb.so allows for multiple
versions used on different databases in the same instance along
with smoother upgrades. Using timescaledb-version.so allows for
finer-grained control and lock-in and is appropriate in only a few
production environments.
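
For example, preloading the stub in postgresql.conf:

  shared_preload_libraries = 'timescaledb'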

This PR also adds version checking so that a clear failure message
will be displayed if the .so version does not match the SQL extension
version.

To support multi-version functionality we changed the way SQL update
scripts are generated. Previously, the system used a bunch of
intermediate upgrade scripts.  So with 3 versions, you would have an
update script of 1--2, 2--3.  But, this PR changes things so that we
produce direct "shortcut" update files: 1--3, 2--3.
This is done for 2 reasons:
 1) Each of the update files should point to
    $libdir/timescaledb-current_version, since you cannot guarantee
    that the .so for each intermediate version has been installed.
 2) You don't want intermediate version updates installed without the
    .so. For example, if you have versions 1, 2, and 3
    and you are installing version 3, you want the upgrade files 1--3
    and 2--3 but not 1--2, because with 1--2 present a user could run
    ALTER EXTENSION timescaledb UPDATE TO 2 even though the .so for
    version 2 may not be installed.

In order to test this functionality, we add a mock extension version .so
that we can test extension loading inside the regression framework.
2018-01-05 12:15:54 -05:00
Rob Kiefer
e44e47ed88 Update add_dimension to take INTERVAL times
The user should be able to add time dimensions using INTERVAL when
the column type is TIMESTAMP/TIMESTAMPTZ/DATE, so this change adds
that support.

It also adds tests and checks for
add_dimension, e.g., a nice error when the table is not a
hypertable.
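
Hypothetical usage of the INTERVAL support (the parameter name is an
assumption):

  SELECT add_dimension('conditions', 'occurred_at',
                       interval_length => INTERVAL '1 week');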
2017-12-07 12:09:35 -05:00
Matvey Arye
13e1cb5343 Add reindex function
reindex allows you to reindex the indexes of only certain chunks,
filtering by time. This is a common use case because a user may
want to reindex chunks once they are no longer getting new data.

reindex also has a recreate option which will not use REINDEX
but will instead CREATE a new index, then
DROP the old index and RENAME the new index to the old name. This
approach has advantages in terms of blocking reads for a much shorter
period of time. However, it does more work and will use more disk
space during the operation.
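
Hypothetical usage (parameter names are assumptions):

  -- rebuild chunk indexes using the CREATE/DROP/RENAME strategy
  SELECT reindex('conditions', recreate => true);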
2017-11-21 14:08:57 -05:00
Matvey Arye
9c7191e898 Change TIMESTAMP partitioning to be completely tz-independent
Previously, for timestamps without time zone, range_end and range_start
were defined as UTC, but the constraints on the table were written as
the local time at the time of chunk creation. This does not work well
if timezones change over the life of the hypertable.

This change removes the dependency on local time for all timestamp
partitioning. Namely, the range_start and range_end remain as UTC
but the constraints are now always written in UTC too. Since old
constraints correctly describe the data currently in the chunks, the
update script to handle this change changes range_start and range_end
instead of the constraints.

Fixes #300.
2017-11-20 09:27:30 -05:00
Erik Nordström
741b25662e Mark IMMUTABLE functions as PARALLEL SAFE
Functions marked IMMUTABLE should also be parallel safe, but
aren't by default. This change marks all immutable functions
as parallel safe and removes the IMMUTABLE definitions on
some functions that have been wrongly labeled as IMMUTABLE.
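
A minimal sketch of the labeling (hypothetical function):

  -- an IMMUTABLE function should normally also be PARALLEL SAFE
  CREATE FUNCTION add_one(val integer) RETURNS integer
  LANGUAGE sql IMMUTABLE PARALLEL SAFE
  AS $$ SELECT val + 1 $$;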

If functions that are IMMUTABLE do not have the PARALLEL SAFE
label, some standard PostgreSQL regression tests will fail
(this is true for PostgreSQL >= 10).
2017-11-17 20:24:30 +01:00
Matvey Arye
5cee104d57 Allow chunk_time_interval to be specified as an INTERVAL type 2017-09-15 12:48:14 -04:00
Matvey Arye
d2561cc4fd Add ability to partition by a date type 2017-09-07 12:22:03 -04:00
Matvey Arye
bfe58b61f7 Refactor towards supporting version upgrades
Clean up the table schema to get rid of legacy tables and functionality
that makes it more difficult to provide an upgrade path.

Notable changes:
* Get rid of legacy tables and code
* Simplify directory structure for SQL code
* Simplify table hierarchy: remove root table and make chunk tables
  inherit directly from main table
* Change chunk table suffix from _data to _chunk
* Simplify schema usage: _timescaledb_internal for internal functions,
  _timescaledb_catalog for metadata tables
* Remove postgres_fdw dependency
* Improve code comments in sql code
2017-06-08 13:55:05 -04:00