When creating a compressed hypertable, primary key and unique
constraints are limited to segment_by and order_by columns, and foreign
key constraints are limited to segment_by columns. There are no
restrictions on check constraints.
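As a hedged illustration (table, column, and option names are hypothetical, and the compression option syntax is assumed to match the compress_segmentby/compress_orderby parameters described further down in these notes), a unique constraint is accepted when its columns are covered by the segment_by and order_by columns:

```sql
-- Hypothetical sketch: the UNIQUE constraint on (device_id, time) is allowed
-- because device_id is a segment_by column and time is an order_by column.
CREATE TABLE metrics (
    time      timestamptz      NOT NULL,
    device_id integer          NOT NULL,
    value     double precision,
    UNIQUE (device_id, time)
);
SELECT create_hypertable('metrics', 'time');

-- Option names are assumptions based on the compress_chunks commit below.
ALTER TABLE metrics SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'device_id',
    timescaledb.compress_orderby   = 'time'
);
```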
This simplifies the code and the access to the min/max
metadata. The min/max values now have the same type as the underlying
column and are stored as two columns, which also removes the custom
type that was used before.
This commit adds handling for dropping chunks and hypertables
in the presence of associated compressed objects. If the uncompressed
chunk/hypertable is dropped, then the associated compressed object is
dropped using DROP_RESTRICT unless cascading is explicitly enabled.
Also add a compressed_chunk_id index on compressed tables for
figuring out whether a chunk is compressed or not.
Change a bunch of APIs to use DropBehavior instead of a cascade bool
to be more explicit.
Also test the drop chunks policy.
This commit pushes down quals on order_by columns to make
use of the SegmentMetaMinMax objects. Namely, =, <, <=, >, and >= quals
can now be pushed down.
We also remove filters from the decompress node for quals that
have been pushed down and don't need a recheck.
This commit also changes tests to add more segment_by and
order_by columns.
Finally, we rename the segment meta accessor functions to give them
shorter names.
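For example, a hedged sketch (hypothetical table and column names, with time as a compress_orderby column) of a query that benefits from the pushdown:

```sql
-- The >= qual on the order_by column "time" is pushed down to the segment
-- min/max metadata: compressed segments whose stored max(time) is below the
-- constant are skipped, and no recheck filter is needed on the decompress node.
SELECT *
  FROM metrics
 WHERE time >= '2019-08-01';
```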
Add the type for min/max segment meta object. Segment metadata
objects keep metadata about data in segments (compressed rows).
The min/max variant keeps the min and max values inside the compressed
object. It will be used on compression order_by columns to allow
queries with quals on those columns to exclude entire segments
when no uncompressed row in the segment can match the qual.
We also add generalized infrastructure for datum serialization
/ deserialization for arbitrary types to and from memory as well
as binary strings.
Add support for the compress_chunks function.
This also adds support for the compress_orderby and compress_segmentby
parameters in ALTER TABLE. These parameters are used by the
compress_chunks function.
The parsing code will most likely be changed to use the PG raw_parser
function.
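A minimal usage sketch, assuming a hypertable configured with the compression options sketched earlier in these notes (the exact compress_chunks signature and the use of show_chunks here are assumptions):

```sql
-- Compress every chunk of the (hypothetical) metrics hypertable that is
-- older than one week; the signature of compress_chunks is an assumption.
SELECT compress_chunks(chunk)
  FROM show_chunks('metrics', older_than => interval '1 week') AS chunk;
```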
This commit introduces 4 compression algorithms
as well as 3 ADTs to support them. The compression
algorithms are time-series optimized. The following
algorithms are implemented:
- DeltaDelta compresses integer and timestamp values
- Gorilla compresses floats
- Dictionary compression handles any data type
and is optimized for low-cardinality datasets.
- Array stores any data type in an array-like
structure and does not actually compress it (though
TOAST-based compression can be applied on top).
These compression algorithms are fully described in
tsl/src/compression/README.md.
The Abstract Data Types that are implemented are
- Vector - A dynamic vector that can store any type.
- BitArray - A dynamic vector to store bits.
- SimpleHash - A hash table implementation from PG12.
More information can be found in
src/adts/README.md
Add the option to set the next start time on a job in the
alter job schedule function. This also adds the ability
to pause jobs by setting next_start to 'infinity'.
Also fix the enterprise license check so that it only activates for
enterprise jobs.
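A hedged sketch of pausing a job, assuming the function is exposed as alter_job_schedule with a named next_start parameter (the job id is hypothetical):

```sql
-- Setting next_start to 'infinity' pauses job 1000; the exact function
-- signature is an assumption.
SELECT alter_job_schedule(1000, next_start => 'infinity');
```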
This maintenance release contains bugfixes since the 1.4.1 release.
We deem it medium priority for upgrading.
In particular the fixes contained in this maintenance release address
2 potential segfaults and no other security vulnerabilities.
The bugfixes are related to background workers, OUTER JOINs, ordered
append on space partitioned hypertables and expression indexes.
Adds a move_chunk function that moves a chunk to a different
tablespace. This is implemented as an extension to the reorder command.
Given that the heap, toast tables, and indexes are being rewritten
during the reorder operation, adding the ability to modify the tablespace
is relatively simple and mostly requires adding parameters to the relevant
functions for the destination tablespace (and index tablespace). The tests
do not focus on further exercising the reorder infrastructure, but instead
ensure that tablespace movement and permissions checks properly occur.
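A hedged usage sketch (chunk, index, and tablespace names are hypothetical, and the parameter names are assumptions):

```sql
-- Rewrite one chunk, ordered by the given index, into another tablespace,
-- placing its indexes in a (possibly different) index tablespace.
SELECT move_chunk(
    chunk                        => '_timescaledb_internal._hyper_1_4_chunk',
    destination_tablespace       => 'history_space',
    index_destination_tablespace => 'history_space',
    reorder_index                => '_timescaledb_internal._hyper_1_4_chunk_metrics_time_idx'
);
```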
This commit implements functionality for users to give a custom
definition of now() for integer open dimension typed hypertables.
Such a now() function enables us to talk about intervals in the context
of hypertables with integer time columns. To simplify future code,
this commit defines a custom ts_interval type that unites the
usual postgres intervals and integer time dimension intervals under a
single composite type.
The commit also enables adding drop chunks policy on hypertables with
integer time dimensions if a custom now() function has been set.
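A hedged sketch, assuming the registration function is named set_integer_now_func (table, column, and function names are hypothetical):

```sql
-- Hypertable with an integer (bigint) time dimension.
CREATE TABLE events (
    time  bigint NOT NULL,
    value double precision
);
SELECT create_hypertable('events', 'time', chunk_time_interval => 86400000);

-- A custom now() for the integer dimension: milliseconds since the epoch.
CREATE FUNCTION current_epoch_ms() RETURNS bigint
    LANGUAGE sql STABLE
    AS $$ SELECT (extract(epoch FROM now()) * 1000)::bigint $$;

-- Registering it (function name is an assumption) makes interval-based
-- policies such as a drop chunks policy meaningful on this hypertable.
SELECT set_integer_now_func('events', 'current_epoch_ms');
```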
This maintenance release contains bugfixes since the 1.4.0 release. We deem it medium
priority for upgrading.
In particular the fixes contained in this maintenance release address 2 potential
segfaults and no other security vulnerabilities. The bugfixes are related to queries
with prepared statements, PL/pgSQL functions and interoperability with other extensions.
More details below.
**Bugfixes**
* #1362 Fix ConstraintAwareAppend subquery exclusion
* #1363 Mark drop_chunks as VOLATILE and not PARALLEL SAFE
* #1369 Fix ChunkAppend with prepared statements
* #1373 Only allow PARAM_EXTERN as time_bucket_gapfill arguments
* #1380 Handle Result nodes gracefully in ChunkAppend
**Thanks**
* @overhacked for reporting an issue with drop_chunks and parallel queries
* @fvannee for reporting an issue with ConstraintAwareAppend and subqueries
* @rrb3942 for reporting a segfault with ChunkAppend and prepared statements
* @mchesser for reporting a segfault with time_bucket_gapfill and subqueries
* @lolizeppelin for reporting and helping debug an issue with ChunkAppend and Result nodes
Previously, drop_chunks returned an empty table, giving the user
no indication of what (if anything) had happened.
Now, drop_chunks returns a list of chunk identifiers, in the same
style as show_chunks, with the chunk's schema and table name.
Notably, when show_chunks is called directly before drop_chunks, the
output should be the same.
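For example (hypothetical hypertable and chunk names; the output shape follows the description above):

```sql
-- Returns one schema-qualified name per dropped chunk, matching show_chunks:
SELECT drop_chunks(interval '4 weeks', 'conditions');
--               drop_chunks
-- ----------------------------------------
--  _timescaledb_internal._hyper_1_1_chunk
--  _timescaledb_internal._hyper_1_2_chunk
```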
This release contains major new functionality for continuous aggregates
and adds performance improvements for analytical queries.
In version 1.3.0 we added support for continuous aggregates which
was initially limited to one continuous aggregate per hypertable.
With this release, we remove this restriction and allow multiple
continuous aggregates per hypertable.
This release adds a new custom node ChunkAppend that can perform
execution time constraint exclusion and is also used for ordered
append. Ordered append no longer requires a LIMIT clause and now
supports space partitioning and ordering by time_bucket.
The primary key on continuous_aggs_materialization_invalidation_log
prevented storing multiple records with the same materialization id,
even though multiple such records are needed. Remove the primary key
to fix this problem.
Previously, the full telemetry report was returned even if telemetry
was disabled. Now, the user is reassured that telemetry is disabled and
given the option to view the report locally.
The following functions have had permission checks
added or adjusted:
ts_chunk_index_clone
ts_chunk_index_replace
ts_hypertable_insert_blocker_trigger_add
ts_current_license_key
ts_calculate_chunk_interval
ts_chunk_adaptive_set
The following functions have been removed from the regular SQL install.
They are only installed and used in tests:
dimension_calculate_default_range_open
dimension_calculate_default_range_closed
This change renames _timescaledb_catalog.telemetry_metadata to
_timescaledb_catalog.metadata. It also adds a new boolean column to this
table which is used to flag data that should be included in telemetry.
It also renames the src/telemetry/metadata.{h,c} files to
src/telemetry/telemetry_metadata.{h,c} and updates the API to reflect
this. Finally, it includes the logic to use the new boolean column
when populating the telemetry parse state.
This commit adds a cascade_to_materializations flag to the scheduled
version of drop_chunks that behaves much like the one from manual
drop_chunks: if a hypertable that has a continuous aggregate tries to
drop chunks, and this flag is not set, the chunks will not be dropped.
We replace chunk_for_tuple with chunk_id_from_relid for getting
chunk id fields when materializing continuous aggs. The old
function required passing in the entire row. This was very slow
because a lot of data was passed around at execution time.
The new function just uses the internal `tableoid` attribute to
convert the table relid to a chunk_id. This is much more efficient.
We also add memoization to the new function because it is most often
called consecutively for the same chunk.
This commit switches the remaining JOIN in the continuous_aggs_stats
view to LEFT JOIN. This way we'll still see info from the other columns
even when the background worker has not run yet.
This commit also switches the time fields to output text in the correct
format for the underlying time type.
Add a setting max_materialized_per_run which can be set to prevent a
continuous aggregate from materializing too much of the table in a
single run. This will prevent a single run from locking the hypertable
for too long, when running on a large data set.
Add the query definition to
timescaledb_information.continuous_aggregates.
The user query (specified in the CREATE VIEW stmt of a continuous
aggregate) is transformed in the process of creating a continuous
aggregate and this modified query is saved in the pg_rewrite catalog
tables. In order to display the original query, we create an internal
view which is a replica of the user query. This is used to display the
definition in timescaledb_information.continuous_aggregates.
As an alternative we could save the original user query in our internal
catalogs. But this approach involves replicating a lot of postgres code
and causes portability problems.
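A hedged sketch of inspecting the stored definition (the information view's column names are assumptions):

```sql
-- List each continuous aggregate together with its original query definition.
SELECT view_name, view_definition
  FROM timescaledb_information.continuous_aggregates;
```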
The data in caggs needs to survive dump/restore. This
test makes sure that caggs that are materialized both
before and after restore are correct.
Two code changes were necessary to make this work:
1) the valid_job_type constraint on bgw_job needed to be altered to add
'continuous_aggregate' as a valid job type
2) The user_view_query field needed to be changed to text because
dump/restore does not support pg_node_tree.
For hypertables that have continuous aggregates, calling drop_chunks now
drops all of the rows in the materialization table that were based on
the dropped chunks. Since we don't know what the correct default
behavior for drop_chunks is, we've added a new argument,
cascade_to_materializations, which must be set to true in order to call
drop_chunks on a hypertable which has a continuous aggregate.
drop_chunks is blocked on the materialization tables of continuous
aggregates
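A hedged sketch of the required call (hypertable name and time argument are hypothetical):

```sql
-- Without cascade_to_materializations => true this call errors out, because
-- the (hypothetical) conditions hypertable has a continuous aggregate.
SELECT drop_chunks(interval '4 weeks', 'conditions',
                   cascade_to_materializations => true);
```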
This PR deletes related rows from the following tables
* completed_threshold
* invalidation threshold
* hypertable invalidation log
The latter two tables are only affected if no other continuous aggs
exist on the raw hypertable.
This commit also adds locks to prevent concurrent raw table inserts
and any access to the materialization table when dropping caggs. It
also moves all locks to the beginning of the function so that the lock
order is easier to track and reason about.
Also added a few formatting fixes.
Add invalidation trigger for DML changes to the hypertable used in
the continuous aggregate query.
Also add user_view_query definition in continuous_agg catalog table.
This commit adds the actual background worker job that runs the continuous
aggregate automatically. This job gets created when the continuous aggregate is
created and is deleted when the aggregate is dropped. By default this job will
attempt to run every two bucket widths, and attempts to materialize up to two
bucket widths behind the end of the table.
This commit adds initial support for the continuous aggregate materialization
and INSERT invalidations.
INSERT path:
On INSERT, DELETE and UPDATE we log the [max, min] time range that may be
invalidated (that is, newly inserted, updated, or deleted) to
_timescaledb_catalog.continuous_aggs_hypertable_invalidation_log. This log
will be used to re-materialize these ranges, to ensure that the aggregate
is up-to-date. Currently these invalidations are recorded by a trigger,
_timescaledb_internal.continuous_agg_invalidation_trigger, which should be
added to the hypertable when the continuous aggregate is created. This trigger
stores a cache of min/max values per-hypertable, and on transaction commit
writes them to the log, if needed. At the moment, we consider them to always
be needed, unless we're in ReadCommitted mode or weaker, and the min
invalidated value is greater than the hypertable's invalidation threshold
(found in _timescaledb_catalog.continuous_aggs_invalidation_threshold)
Materialization path:
Materialization currently happens in multiple phases: in phase 1 we determine
the timestamp at which the new set of materializations will end, then we
update the hypertable's invalidation threshold to that point, and finally we
read the current invalidations and materialize any invalidated rows along with
the new range between the continuous aggregate's completed threshold (found in
_timescaledb_catalog.continuous_aggs_completed_threshold) and the hypertable's
invalidation threshold. After all of this is done we update the completed
threshold to the invalidation threshold. The portion of this protocol from
after the invalidations are read, until the completed threshold is written
(that is, actually materializing, and writing the completion threshold) is
included with this commit, with the remainder to follow in subsequent ones.
One important caveat: since the thresholds are exclusive, we only invalidate
values _less_ than the invalidation threshold, and since time values are
stored as an int64 internally, we can never determine whether the row at
PG_INT64_MAX is invalidated. To avoid this problem, we never materialize the
time bucket containing PG_INT64_MAX.
This PR adds a catalog table for storing metadata about
continuous aggregates. It also adds code for creating the
materialization hypertable and 2 views that are used by the
continuous aggregate system:
1) The user view - This is the actual view queried by the end user.
It is a query on top of the materialized hypertable and is
responsible for finalizing and combining partials in a manner
that returns to the user the data as defined by the original
user-defined view.
2) The partial view - which queries the raw table and returns
columns as defined in the materialized table. This will be used
by the materializer to calculate the data that will be inserted
into the materialization table. Note the data here is the partial
state of any aggregates.
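For context, a hedged sketch of the kind of user-facing definition these objects back (the WITH-option syntax, names, and columns are assumptions):

```sql
-- conditions_hourly is the user view the end user queries; behind it, a
-- materialization hypertable stores partial aggregate state, and a partial
-- view computes those partials from the raw conditions hypertable.
CREATE VIEW conditions_hourly
    WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour', time) AS bucket,
       device_id,
       avg(temperature) AS avg_temp
  FROM conditions
 GROUP BY time_bucket('1 hour', time), device_id;
```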