This patch drops the following internal SQL functions which were
unused:
_timescaledb_internal.is_main_table(regclass);
_timescaledb_internal.is_main_table(text, text);
_timescaledb_internal.hypertable_from_main_table(regclass);
_timescaledb_internal.main_table_from_hypertable(integer);
_timescaledb_internal.time_literal_sql(bigint, regtype);
When calling the `cagg_watermark` function to get the watermark of a
Continuous Aggregate we execute a `SELECT MAX(time_dimension)` query
against the underlying materialization hypertable.
The problem is that a `SELECT MAX(time_dimension)` query can be
expensive because it scans all hypertable chunks, increasing the
planning time for Realtime Continuous Aggregates.
Improve this by creating a new catalog table to serve as a cache
for the current Continuous Aggregate watermark, updated in the
following situations:
- Create CAgg: store the minimum value of the hypertable time
dimension data type;
- Refresh CAgg: store the last value of the time dimension materialized
in the underlying materialization hypertable (or the minimum value of
the materialization hypertable time dimension data type if there's no
data materialized);
- Drop CAgg chunks: the same as refreshing the CAgg.
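Roughly, the watermark lookup becomes a single-row read from the new
catalog table instead of a MAX() scan. A sketch, treating the
materialization table name, id value, and cache table columns as
assumptions:
-- before: scans every chunk of the materialization hypertable
SELECT max(bucket) FROM _timescaledb_internal._materialized_hypertable_2;
-- after: single-row lookup in the watermark cache table
SELECT watermark FROM _timescaledb_catalog.continuous_aggs_watermark
  WHERE mat_hypertable_id = 2;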
Closes #4699, #5307
The `cagg_watermark` function performs only read-only operations, so
it is safe to mark it parallel safe and take advantage of Postgres
parallel query.
Since 2.7, when we introduced the new Continuous Aggregate format, we
no longer use partials, and the aggregate functions `partialize_agg`
and `finalize_agg`, which are not parallel safe, are no longer
involved, so it makes no sense not to take advantage of Postgres
parallel query for realtime Continuous Aggregates.
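A minimal sketch of the relabeling in SQL, assuming the integer
signature of cagg_watermark described elsewhere in these notes:
ALTER FUNCTION _timescaledb_internal.cagg_watermark(integer) PARALLEL SAFE;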
Postgres will prepend pg_temp to the effective search_path if it
is not present in the search_path. While pg_temp will never be
used to look up functions or operators unless explicitly requested,
it will be used to look up relations. Putting pg_temp in the
search_path makes sure objects in pg_temp will be considered last,
so pg_temp cannot be used to mask existing objects.
This patch locks down search_path in extension install and update
scripts to only contain pg_catalog, this requires that any reference
in those scripts is fully qualified. Additionally we add explicit
create commands to all update scripts for objects added to the
public schema. This change will make update scripts fail if a
function with an identical signature already exists when installing
or upgrading, instead of reusing the existing object.
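A minimal sketch of the pattern these scripts now follow (the helper
function name is hypothetical):
SET LOCAL search_path TO pg_catalog, pg_temp;
CREATE FUNCTION public.my_helper() RETURNS integer
  LANGUAGE sql AS $$ SELECT 1 $$;
Using plain CREATE FUNCTION rather than CREATE OR REPLACE is what
makes the script fail when an object with an identical signature
already exists.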
The function `cagg_watermark` returns the time threshold at which
materialized data ends and raw query data begins in a real-time
aggregation query (union view).
The watermark is simply the completed threshold of the continuous
aggregate materializer. However, since the completed threshold will no
longer exist with the new continuous aggregates, the watermark
function has been changed to return the end of the last bucket in the
materialized hypertable.
In most cases, the completed threshold is the same as the end of the
last materialized bucket. However, there are situations when it is
not; for example, when there is a filter in the view query some
buckets might not be materialized because no data matched the
filter. The completed threshold would move ahead regardless. For
instance, if there is only data from "device_2" in the raw hypertable
and the aggregate has a filter `device=1`, there will be no buckets
materialized although the completed threshold moves forward. Therefore
the new watermark function might sometimes return a lower watermark
than the old function. A similar situation explains the different
output in one of the union view tests.
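For illustration, a continuous aggregate of roughly this shape (using
the view-based syntax of this era; schema and names hypothetical)
leaves buckets unmaterialized whenever no rows match its filter:
CREATE VIEW device_summary WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour', time) AS bucket, count(*)
FROM conditions
WHERE device = 1
GROUP BY 1;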
This patch changes the signature from cagg_watermark(oid) to
cagg_watermark(int). Since this is an API breaking change it couldn't
be done in an earlier release.
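A sketch of the new call, passing the materialization hypertable id
(the id value here is hypothetical):
SELECT _timescaledb_internal.cagg_watermark(2);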
We should compute the watermark using the materialization
hypertable id and not by using the raw hypertable id.
New test cases added to continuous_aggs_multi.sql. Existing test
cases in continuous_aggs_multi.sql were not correctly updated
for this feature.
Fixes #1865
This PR adds a new mode for continuous aggregates that we name
real-time aggregates. Unlike the original mode, this new mode
combines materialized data with new data received after the last
refresh has happened. This new mode will be the default behaviour
for newly created continuous aggregates.
To upgrade existing continuous aggregates to the new behaviour,
the following command needs to be run for all continuous aggregates:
ALTER VIEW continuous_view_name SET (timescaledb.materialized_only=false);
To disable this behaviour for newly created continuous aggregates
and get the old behaviour, the following command can be run:
ALTER VIEW continuous_view_name SET (timescaledb.materialized_only=true);
This commit switches the remaining JOIN in the continuous_aggs_stats
view to LEFT JOIN. This way we'll still see info from the other columns
even when the background worker has not run yet.
This commit also switches the time fields to output text in the correct
format for the underlying time type.
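A minimal sketch of the difference, with hypothetical table names:
SELECT j.id, s.last_run_success
FROM jobs j
LEFT JOIN job_stats s ON s.job_id = j.id;
-- LEFT JOIN keeps the job row (with NULL stat columns) even before
-- the background worker has written any stats.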
This commit adds the actual background worker job that runs the continuous
aggregate automatically. This job gets created when the continuous aggregate is
created and is deleted when the aggregate is DROPed. By default this job will
attempt to run every two bucket widths, and attempts to materialize up to two
bucket widths behind the end of the table.
This commit adds initial support for the continuous aggregate materialization
and INSERT invalidations.
INSERT path:
On INSERT, DELETE and UPDATE we log the [min, max] time range that may be
invalidated (that is, newly inserted, updated, or deleted) to
_timescaledb_catalog.continuous_aggs_hypertable_invalidation_log. This log
will be used to re-materialize these ranges, to ensure that the aggregate
is up-to-date. Currently these invalidations are recorded by a trigger,
_timescaledb_internal.continuous_agg_invalidation_trigger, which should be
added to the hypertable when the continuous aggregate is created. This trigger
stores a cache of min/max values per hypertable and, on transaction commit,
writes them to the log, if needed. At the moment, we consider them to always
be needed, unless we're in ReadCommitted mode or weaker and the min
invalidated value is greater than the hypertable's invalidation threshold
(found in _timescaledb_catalog.continuous_aggs_invalidation_threshold).
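For illustration, the trigger is attached roughly like this (trigger
and hypertable names are hypothetical; the argument is the raw
hypertable's id):
CREATE TRIGGER cagg_invalidation_trigger
AFTER INSERT OR UPDATE OR DELETE ON conditions
FOR EACH ROW EXECUTE PROCEDURE
  _timescaledb_internal.continuous_agg_invalidation_trigger('1');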
Materialization path:
Materialization currently happens in multiple phases: in phase 1 we determine
the timestamp at which we will end the new set of materializations, then we
update the hypertable's invalidation threshold to that point, and finally we
read the current invalidations and materialize any invalidated rows plus the
new range between the continuous aggregate's completed threshold (found in
_timescaledb_catalog.continuous_aggs_completed_threshold) and the hypertable's
invalidation threshold. After all of this is done we update the completed
threshold to the invalidation threshold. The portion of this protocol from
after the invalidations are read until the completed threshold is written
(that is, actually materializing, and writing the completion threshold) is
included with this commit, with the remainder to follow in subsequent ones.
One important caveat: since the thresholds are exclusive, we invalidate
all values _less_ than the invalidation threshold, and since we store time
values as int64 internally, we can never determine whether the row at
PG_INT64_MAX is invalidated. To avoid this problem, we never materialize
the time bucket containing PG_INT64_MAX.
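In outline, the threshold bookkeeping looks roughly like this, with
hypothetical ids, time values, and column names (the actual
materialization work happens in C between the two updates):
-- phase 1: advance the invalidation threshold to the chosen end point
UPDATE _timescaledb_catalog.continuous_aggs_invalidation_threshold
  SET watermark = 1000000 WHERE hypertable_id = 1;
-- phase 2: after reading invalidations and materializing, record
-- the completion point
UPDATE _timescaledb_catalog.continuous_aggs_completed_threshold
  SET watermark = 1000000 WHERE materialization_id = 2;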
Remove the following unused functions:
_timescaledb_internal.to_microseconds(TIMESTAMPTZ)
_timescaledb_internal.to_timestamp_pg(BIGINT)
_timescaledb_internal.time_to_internal(anyelement)
We've decided to adopt the ts_ prefix on all exported C functions in
order to avoid having symbol conflicts with future postgres functions.
We've already started using this prefix on new functions, and this commit
adds the prefix to the old functions.
Users can now (optionally) set a target chunk size and TimescaleDB
will try to adapt the interval length of the first open ("time")
dimension in order to reach that target chunk size. If a hypertable
has more than one open dimension, only the first one will have a
dynamically adapting interval.
Users can optionally specify their own function that calculates the
new dimension interval. They can also set a target size of 0 in order
to estimate a suitable target size for a chunk based on available
memory.
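For example, assuming the user-facing call takes a hypertable and a
target size (treat the exact function name and parameters as a
sketch):
SELECT set_adaptive_chunking('conditions', chunk_target_size => '1GB');
-- a target size of 0 asks TimescaleDB to estimate a suitable size
-- from available memory
SELECT set_adaptive_chunking('conditions', chunk_target_size => '0');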
The functions for adding and updating dimensions have been refactored
in C to:
- improve usage of proper error codes
- make messages better conform to the PostgreSQL standard
- improve security by avoiding running large amounts of code under
SECURITY DEFINER
A new if_not_exists option has also been added to add_dimension(), and
the number of partitions can now be set using the new
set_number_partitions() function.
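For example (hypertable and column names hypothetical):
SELECT add_dimension('conditions', 'device', number_partitions => 4,
                     if_not_exists => true);
SELECT set_number_partitions('conditions', 8);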
A bug in the validation of smallint time intervals has been fixed. The
previous code didn't check for intervals > 0 and smallint intervals
accepted values up to UINT16_MAX instead of INT16_MAX.
This PR adds the ability to have multiple different versions of the timescaledb
extension be used by different databases in the same PostgreSQL
instance (server).
This is accomplished by splitting this extension into two .so files.
1) timescaledb.so -- stuff under loader/. Really not a lot of code.
This code MUST be backwards compatible in the future.
2) timescaledb-version.so (most of our code). Need
not be backwards compatible.
Timescaledb.so becomes a small stub which is preloaded and whose main
reason for existing is to dynamically load the right
timescaledb-version.so when the time comes.
This change allows either of the above .so to be loaded in
shared_preload_libraries. But timescaledb.so allows for multiple
versions used on different databases in the same instance along
with smoother upgrades. Using timescaledb-version.so allows for
finer-grained control and lock-in and is appropriate in only a few
production environments.
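For example, most deployments preload only the stub in
postgresql.conf:
shared_preload_libraries = 'timescaledb'
and the stub then dynamically loads the matching
timescaledb-<version>.so for each database's installed extension
version.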
This PR also adds version checking so that a clear failure message
will be displayed if the .so version does not match the SQL extension
version.
To support multi-version functionality we changed the way SQL update
scripts are generated. Previously, the system used a bunch of
intermediate upgrade scripts. So with 3 versions, you would have the
update scripts 1--2 and 2--3. But this PR changes things so that we
produce direct "shortcut" update files: 1--3 and 2--3.
This is done for 2 reasons:
1) Each of the update files should point to
$libdir/timescaledb-current_version, since you cannot guarantee that
the previous .so for each intermediate version has been installed.
2) You don't want intermediate version updates installed without the
.so. For example, if you have versions 1, 2, and 3 and you are
installing version 3, you want the upgrade files 1--3 and 2--3 but
not 1--2, because with 1--2 present a user could run ALTER EXTENSION
timescaledb UPDATE TO '2' even though the .so for version 2 may not
be installed.
In order to test this functionality, we add a mock extension version
.so so that we can test extension loading inside the regression
framework.
The user should be able to add time dimensions using INTERVAL when
the column type is TIMESTAMP/TIMESTAMPTZ/DATE, so this change adds
that support.
Additionally, it adds tests and checks for add_dimension, e.g., a
nice error when the table is not a hypertable.
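For example (table and column names hypothetical; the interval
parameter's name has varied across versions):
SELECT add_dimension('conditions', 'reported_at',
                     chunk_time_interval => INTERVAL '1 day');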
reindex allows you to reindex the indexes of only certain chunks,
filtering by time. This is a common use case because a user may
want to reindex chunks once they are no longer getting new data.
reindex also has a recreate option which will not use REINDEX
but will instead CREATE INDEX a new index, then
DROP INDEX the old one and RENAME the new index to the old name. This
approach has the advantage of blocking reads for a much shorter period
of time. However, it does more work and will use more disk space during
the operation.
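The recreate path is roughly equivalent to this sequence (index and
table names hypothetical):
CREATE INDEX conditions_time_idx_new ON conditions_chunk_1 ("time" DESC);
DROP INDEX conditions_time_idx;
ALTER INDEX conditions_time_idx_new RENAME TO conditions_time_idx;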
Previously, for timestamps without time zone, the range_end and
range_start were defined as UTC, but the constraints on the table
were written as the local time at the time of chunk creation. This
does not work well if timezones change over the life of the
hypertable.
This change removes the dependency on local time for all timestamp
partitioning. Namely, the range_start and range_end remain as UTC
but the constraints are now always written in UTC too. Since old
constraints correctly describe the data currently in the chunks, the
update script to handle this change changes range_start and range_end
instead of the constraints.
Fixes #300.
Functions marked IMMUTABLE should also be parallel safe, but
aren't by default. This change marks all immutable functions
as parallel safe and removes the IMMUTABLE definitions on
some functions that have been wrongly labeled as IMMUTABLE.
If functions that are IMMUTABLE do not have the PARALLEL SAFE
label, then some standard PostgreSQL regression tests will fail
(this is true for PostgreSQL >= 10).
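A sketch of the relabeling (function names hypothetical):
ALTER FUNCTION my_immutable_fn(timestamptz) PARALLEL SAFE;
-- and for a function wrongly labeled IMMUTABLE:
ALTER FUNCTION my_volatile_fn(timestamptz) STABLE;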
Clean up the table schema to get rid of legacy tables and functionality
that makes it more difficult to provide an upgrade path.
Notable changes:
* Get rid of legacy tables and code
* Simplify directory structure for SQL code
* Simplify table hierarchy: remove root table and make chunk tables
inherit directly from main table
* Change chunk table suffix from _data to _chunk
* Simplify schema usage: _timescaledb_internal for internal functions,
_timescaledb_catalog for metadata tables
* Remove postgres_fdw dependency
* Improve code comments in SQL code