373 Commits

Author SHA1 Message Date
gayyappan
72588a2382 Restrict constraints on compressed hypertables.
Primary and unique constraints are limited to segment_by and order_by
columns, and foreign key constraints are limited to segment_by columns
when creating a compressed hypertable. There are no restrictions on
check constraints.
2019-10-29 19:02:58 -04:00
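
To make the constraint rules above concrete, here is a hedged SQL sketch; the table, column, and constraint names are hypothetical, and the ALTER TABLE option syntax is assumed to match the compress_segmentby/compress_orderby parameters introduced elsewhere in this series.

```sql
-- Hypothetical hypertable; option syntax assumed from this patch series.
CREATE TABLE metrics (device_id int NOT NULL, ts timestamptz NOT NULL, val float8);
SELECT create_hypertable('metrics', 'ts');

-- A unique constraint covering only segment_by/order_by columns is allowed:
ALTER TABLE metrics ADD CONSTRAINT metrics_device_ts_uniq UNIQUE (device_id, ts);
ALTER TABLE metrics SET (timescaledb.compress,
                         timescaledb.compress_segmentby = 'device_id',
                         timescaledb.compress_orderby   = 'ts');

-- A unique or foreign key constraint on val (neither segment_by nor order_by)
-- would be rejected once compression is enabled; check constraints are unrestricted.
```
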
Matvey Arye
0f3e74215a Split segment meta min_max into two columns
This simplifies the code and the access to the min/max
metadata. Before, we used a custom type, but now the min/max
values are just the same type as the underlying column and stored as two
columns.

This also removes the custom type that was used before.
2019-10-29 19:02:58 -04:00
gayyappan
43aa49ddc0 Add more information in compression views
Rename compression views to compressed_hypertable_stats and
compressed_chunk_stats and summarize information about compression
status for chunks.
2019-10-29 19:02:58 -04:00
gayyappan
edd3999553 Add trigger to block INSERT on compressed chunk
Prevent inserts on compressed chunks by adding a trigger that blocks them.
Inserts are enabled again if the chunk gets decompressed.
2019-10-29 19:02:58 -04:00
Matvey Arye
0db50e7ffc Handle drops of compressed chunks/hypertables
This commit adds handling for dropping chunks and hypertables
in the presence of associated compressed objects. If the uncompressed
chunk/hypertable is dropped, the associated compressed object is dropped
using DROP_RESTRICT unless cascading is explicitly enabled.

Also add a compressed_chunk_id index on compressed tables for
figuring out whether a chunk is compressed or not.

Change a bunch of APIs to use DropBehavior instead of a cascade bool
to be more explicit.

Also test the drop chunks policy.
2019-10-29 19:02:58 -04:00
Matvey Arye
2bf97e452d Push down quals to segment meta columns
This commit pushes down quals on order_by columns to make
use of the SegmentMetaMinMax objects. Namely, =, <, <=, >, >= quals
can now be pushed down.

We also remove filters from decompress node for quals that
have been pushed down and don't need a recheck.

This commit also changes tests to add more segment-by and
order-by columns.

Finally, we rename the segment meta accessor functions to have shorter names.
2019-10-29 19:02:58 -04:00
gayyappan
6e60d2614c Add compress chunks policy support
Add and drop compress chunks policy using bgw
infrastructure.
2019-10-29 19:02:58 -04:00
Matvey Arye
b9674600ae Add segment meta min/max
Add the type for min/max segment meta object. Segment metadata
objects keep metadata about data in segments (compressed rows).
The min/max variant keeps the min and max values inside the compressed
object. It will be used on compression order by columns to allow
queries that have quals on those columns to be able to exclude entire
segments if no uncompressed rows in the segment may match the qual.

We also add generalized infrastructure for datum serialization
/ deserialization for arbitrary types to and from memory as well
as binary strings.
2019-10-29 19:02:58 -04:00
Matvey Arye
a078781c2e Add decompress_chunk function
This is the inverse of compress_chunk.
2019-10-29 19:02:58 -04:00
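
A hedged sketch of the round trip; the chunk name is hypothetical.

```sql
-- Compress a chunk, then restore its uncompressed form.
SELECT compress_chunk('_timescaledb_internal._hyper_1_1_chunk');
SELECT decompress_chunk('_timescaledb_internal._hyper_1_1_chunk');
```
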
gayyappan
7a728dc15f Add view for compression size
Add views for compressed_chunk_size and compressed_hypertable_size.
2019-10-29 19:02:58 -04:00
gayyappan
1f4689eca9 Record chunk sizes after compression
Compute chunk size before/after compressing a chunk and record in
catalog table.
2019-10-29 19:02:58 -04:00
gayyappan
44941f7bd2 Add UI for compress_chunks functionality
Add support for compress_chunks function.

This also adds support for compress_orderby and compress_segmentby
parameters in ALTER TABLE. These parameters are used by the
compress_chunks function.

The parsing code will most likely be changed to use the PG raw_parser
function.
2019-10-29 19:02:58 -04:00
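
A hedged sketch of the user-facing flow described above, on a hypothetical 'metrics' hypertable; the function is spelled compress_chunk here, and the exact ALTER TABLE option syntax is assumed.

```sql
-- Enable compression with segment-by/order-by settings, then compress old chunks.
ALTER TABLE metrics SET (timescaledb.compress,
                         timescaledb.compress_segmentby = 'device_id',
                         timescaledb.compress_orderby   = 'ts DESC');

SELECT compress_chunk(chunk)
FROM show_chunks('metrics', older_than => INTERVAL '7 days') AS chunk;
```
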
gayyappan
1c6aacc374 Add ability to create the compressed hypertable
This happens when compression is turned on for regular hypertables.
2019-10-29 19:02:58 -04:00
Joshua Lockerman
584f5d1061 Implement time-series compression algorithms
This commit introduces 4 compression algorithms
as well as 3 ADTs to support them. The compression
algorithms are time-series optimized. The following
algorithms are implemented:

- DeltaDelta compresses integer and timestamp values
- Gorilla compresses floats
- Dictionary compression handles any data type
  and is optimized for low-cardinality datasets.
- Array stores any data type in an array-like
  structure and does not actually compress it (though
  TOAST-based compression can be applied on top).

These compression algorithms are fully described in
tsl/src/compression/README.md.

The Abstract Data Types that are implemented are
- Vector - A dynamic vector that can store any type.
- BitArray - A dynamic vector to store bits.
- SimpleHash - A hash table implementation from PG12.

More information can be found in
src/adts/README.md
2019-10-29 19:02:58 -04:00
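
As a rough illustration of the delta-of-delta idea behind DeltaDelta (not the C implementation in tsl/src/compression), plain SQL window functions over a hypothetical readings(ts) table show how regularly sampled timestamps collapse to mostly-zero second differences, which then pack into very few bits.

```sql
-- Illustration only: first and second differences of a timestamp column.
SELECT ts,
       ts - lag(ts) OVER w                          AS delta,
       (ts - lag(ts) OVER w)
         - (lag(ts) OVER w - lag(ts, 2) OVER w)     AS delta_of_delta
FROM readings
WINDOW w AS (ORDER BY ts);
```
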
gayyappan
3edc016dfc Add catalog tables to support compression
This commit adds catalog tables that will be used by the
compression infrastructure.
2019-10-29 19:02:58 -04:00
Matvey Arye
7ea492f29e Add last_successful_finish to bgw_job_stats
This allows people to better monitor bgw job health. It
indicates the last time the job made progress.
2019-10-15 19:14:14 -04:00
Sven Klemm
d82ad2c8f6 Add ts_ prefix to all exported functions
This patch adds the `ts_` prefix to exported functions that didn't
have it and removes exports that are not needed.
2019-10-15 14:42:02 +02:00
Matvey Arye
2209133781 Add next_start option to alter_job_schedule
Add the option to set the next start time on a job in the
alter job schedule function. This also adds the ability
to pause jobs by setting next_start to 'infinity'.

Also fix the enterprise license check to only activate for
enterprise jobs.
2019-10-11 15:44:19 -04:00
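
A hedged example of pausing and resuming a job with the new option; the job id is hypothetical and the alter_job_schedule signature is assumed from this era.

```sql
-- Pause job 1000 by pushing its next start to infinity, then resume it.
SELECT alter_job_schedule(1000, next_start => 'infinity');
SELECT alter_job_schedule(1000, next_start => now());
```
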
Matvey Arye
d2f68cbd64 Move the set_integer_now func into Apache2
We decided this should be an OSS capability.
2019-10-11 13:00:55 -04:00
Sven Klemm
33a75ab09d Release 1.4.2
This maintenance release contains bugfixes since the 1.4.1 release.
We deem it medium priority for upgrading.

In particular the fixes contained in this maintenance release address
2 potential segfaults and no other security vulnerabilities.
The bugfixes are related to background workers, OUTER JOINs, ordered
append on space partitioned hypertables and expression indexes.
2019-09-10 10:23:15 +02:00
David Kohn
897fef42b6 Add support for moving chunks to different tablespaces
Adds a move_chunk function that moves a chunk to a different tablespace.
This is implemented as an extension to the reorder command.
Given that the heap, toast tables, and indexes are being rewritten
during the reorder operation, adding the ability to modify the tablespace
is relatively simple and mostly requires adding parameters to the relevant
functions for the destination tablespace (and index tablespace). The tests
do not focus on further exercising the reorder infrastructure, but instead
ensure that tablespace movement and permissions checks properly occur.
2019-08-21 12:07:28 -04:00
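
A hedged sketch of the new call; the chunk, index, and tablespace names are hypothetical and the parameter names are assumed.

```sql
SELECT move_chunk(
    chunk                        => '_timescaledb_internal._hyper_1_4_chunk',
    destination_tablespace       => 'history_space',
    index_destination_tablespace => 'history_index_space',
    reorder_index                => '_timescaledb_internal._hyper_1_4_chunk_metrics_ts_idx'
);
```
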
Narek Galstyan
62de29987b Add a notion of now for integer time columns
This commit implements functionality for users to give a custom
definition of now() for integer open dimension typed hypertables.
Such a now() function enables us to talk about intervals in the context
of hypertables with integer time columns. In order to simplify future
code, this commit defines a custom ts_interval type that unites the
usual postgres intervals and integer time dimension intervals under a
single composite type.

The commit also enables adding drop chunks policy on hypertables with
integer time dimensions if a custom now() function has been set.
2019-08-19 23:23:28 +04:00
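
A hedged sketch on a hypothetical integer-time hypertable: the custom now() function must return the current time in the same units as the time column, and the policy call at the end assumes an integer older_than is accepted once such a function is set.

```sql
-- Hypothetical bigint-time hypertable storing milliseconds since the Unix epoch.
CREATE TABLE events (time bigint NOT NULL, value double precision);
SELECT create_hypertable('events', 'time', chunk_time_interval => 86400000);

CREATE OR REPLACE FUNCTION unix_now_ms() RETURNS bigint
    LANGUAGE SQL STABLE AS $$ SELECT (extract(epoch FROM now()) * 1000)::bigint $$;
SELECT set_integer_now_func('events', 'unix_now_ms');

-- Assumed: with a custom now() set, an integer older_than (7 days in ms) is accepted.
SELECT add_drop_chunks_policy('events', 604800000::bigint);
```
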
Sven Klemm
03d4ae03d6 Release 1.4.1
This maintenance release contains bugfixes since the 1.4.0 release. We deem it medium
priority for upgrading.

In particular the fixes contained in this maintenance release address 2 potential
segfaults and no other security vulnerabilities. The bugfixes are related to queries
with prepared statements, PL/pgSQL functions and interoperability with other extensions.
More details below.

**Bugfixes**
* #1362 Fix ConstraintAwareAppend subquery exclusion
* #1363 Mark drop_chunks as VOLATILE and not PARALLEL SAFE
* #1369 Fix ChunkAppend with prepared statements
* #1373 Only allow PARAM_EXTERN as time_bucket_gapfill arguments
* #1380 Handle Result nodes gracefully in ChunkAppend

**Thanks**
* @overhacked for reporting an issue with drop_chunks and parallel queries
* @fvannee for reporting an issue with ConstraintAwareAppend and subqueries
* @rrb3942 for reporting a segfault with ChunkAppend and prepared statements
* @mchesser for reporting a segfault with time_bucket_gapfill and subqueries
* @lolizeppelin for reporting and helping debug an issue with ChunkAppend and Result nodes
2019-07-31 21:00:44 +02:00
Sven Klemm
a11910b5d5 Mark drop_chunks as VOLATILE and PARALLEL UNSAFE
The drop_chunks function was incorrectly marked as stable and
parallel safe; this patch fixes those attributes.
2019-07-22 07:29:33 +02:00
Stephen Polcyn
1dc1850793 Drop_chunks returns list of dropped chunks
Previously, drop_chunks returned an empty table, giving the user
no indication of what (if anything) had happened.
Now, drop_chunks returns a list of chunk identifiers in the
same style as show_chunks, with each chunk's schema and table name.

Notably, when show_chunks is called directly before drop_chunks, the
output should be the same.
2019-07-19 12:13:24 -04:00
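
A hedged pairing of the two calls using the 1.x-era argument style (older_than first for drop_chunks); 'metrics' is hypothetical. Run back to back, both should list the same schema-qualified chunk names.

```sql
SELECT show_chunks('metrics', older_than => INTERVAL '30 days');
SELECT drop_chunks(INTERVAL '30 days', 'metrics');
```
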
Sven Klemm
a2f2db9cab Release 1.4.0
This release contains major new functionality for continuous aggregates
and adds performance improvements for analytical queries.

In version 1.3.0 we added support for continuous aggregates which
was initially limited to one continuous aggregate per hypertable.
With this release, we remove this restriction and allow multiple
continuous aggregates per hypertable.

This release adds a new custom node ChunkAppend that can perform
execution time constraint exclusion and is also used for ordered
append. Ordered append no longer requires a LIMIT clause and now
supports space partitioning and ordering by time_bucket.
2019-07-17 18:23:13 +02:00
gayyappan
e9df3bc1b6 Fix continuous agg catalog table insert failure
The primary key on continuous_aggs_materialization_invalidation_log
prevents multiple records with the same materialization id. Remove
the primary key to fix this problem.
2019-07-08 14:53:36 -04:00
gayyappan
5a0a73eabd Add columns to continuous_aggregate_stats view
Add more information about job history for continuous aggregate
background worker jobs.
2019-07-08 12:54:10 -04:00
Stephen Polcyn
ff44b33327 Update get_telemetry_report to expected behavior
Previously, the function returned the full report even if telemetry was
disabled. Now it reassures the user that telemetry is disabled and
provides the option to view the report locally.
2019-06-25 19:35:17 +02:00
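
A hedged sketch; the always_display_report argument is assumed to be the "view the report locally" option mentioned above.

```sql
SELECT get_telemetry_report();                               -- indicates telemetry is disabled
SELECT get_telemetry_report(always_display_report => true);  -- shows the report locally anyway
```
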
Sven Klemm
743a22f1fa Add 1.3.2 to update test scripts 2019-06-25 05:11:32 +02:00
gayyappan
60cfe6cc90 Support for multiple continuous aggregates
Allow multiple continuous aggregates to be defined on a hypertable.
2019-06-24 17:05:49 -04:00
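
A hedged sketch of two continuous aggregates over one hypothetical hypertable, using the 1.x-era CREATE VIEW ... WITH (timescaledb.continuous) syntax.

```sql
CREATE VIEW metrics_hourly WITH (timescaledb.continuous) AS
SELECT device_id,
       time_bucket(INTERVAL '1 hour', ts) AS bucket,
       avg(val) AS avg_val
FROM metrics
GROUP BY device_id, time_bucket(INTERVAL '1 hour', ts);

CREATE VIEW metrics_daily WITH (timescaledb.continuous) AS
SELECT device_id,
       time_bucket(INTERVAL '1 day', ts) AS bucket,
       max(val) AS max_val
FROM metrics
GROUP BY device_id, time_bucket(INTERVAL '1 day', ts);
```
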
Matvey Arye
e049238a07 Adjust permissions on internal functions
The following functions have had permission checks
added or adjusted:
ts_chunk_index_clone
ts_chunk_index_replace
ts_hypertable_insert_blocker_trigger_add
ts_current_license_key
ts_calculate_chunk_interval
ts_chunk_adaptive_set

The following functions have been removed from the regular SQL install.
They are only installed and used in tests:

dimension_calculate_default_range_open
dimension_calculate_default_range_closed
2019-06-24 10:57:38 -04:00
Sven Klemm
70a02b5410 Add 1.3.1 to update test scripts 2019-06-10 21:15:56 +02:00
Matvey Arye
48d9e2ce25 Add CMAKE option for default telemetry setting
Adds a CMAKE option to turn telemetry off by default by specifying
-DSEND_TELEMETRY_DEFAULT=NO. The default is YES (on).
2019-06-10 08:51:57 -04:00
Brian Rowe
aeac52aef6 Rename telemetry_metadata table to just metadata
This change renames _timescaledb_catalog.telemetry_metadata to
_timescaledb_catalog.metadata. It also adds a new boolean column to this
table, which is used to flag data that should be included in telemetry.

It also renames the src/telemetry/metadata.{h,c} files to
src/telemetry/telemetry_metadata.{h,c} and updates the API to reflect
this. Finally, it includes the logic to use the new boolean column
when populating the telemetry parse state.
2019-05-17 17:04:42 -07:00
Sven Klemm
bfabb30be0 Release 1.3.0 2019-05-07 02:47:13 +02:00
Joshua Lockerman
899cd0538d Allow scheduled drop_chunks to cascade to aggs
This commit adds a cascade_to_materializations flag to the scheduled
version of drop_chunks that behaves much like the one from manual
drop_chunks: if a hypertable that has a continuous aggregate tries to
drop chunks, and this flag is not set, the chunks will not be dropped.
2019-04-30 15:46:49 -04:00
Matvey Arye
74f8d204a5 Optimize getting the chunk_id in continuous aggs
We replace chunk_for_tuple with chunk_id_from_relid for getting
chunk id fields when materializing continuous aggs. The old
function required passing in the entire row. This was very slow
because a lot of data was passed around at execution time.

The new function just uses the internal `tableoid` attribute to
convert the table relid to a chunk_id. This is much more efficient.
We also add memoization to the new function because it is most often
called consecutively for the same chunk.
2019-04-29 15:45:23 -04:00
Joshua Lockerman
ae3480c2cb Fix continuous_aggs info
This commit switches the remaining JOIN in the continuous_aggs_stats
view to LEFT JOIN. This way we'll still see info from the other columns
even when the background worker has not run yet.
This commit also switches the time fields to output text in the correct
format for the underlying time type.
2019-04-26 13:08:00 -04:00
Joshua Lockerman
3895e5ce0e Add a setting for max an agg materializes per run
Add a setting max_materialized_per_run which can be set to prevent a
continuous aggregate from materializing too much of the table in a
single run. This will prevent a single run from locking the hypertable
for too long, when running on a large data set.
2019-04-26 13:08:00 -04:00
gayyappan
b8f9b91e60 Add user view query definition for cont aggs
Add the query definition to
timescaledb_information.continuous_aggregates.

The user query (specified in the CREATE VIEW stmt of a continuous
aggregate) is transformed in the process of creating a continuous
aggregate and this modified query is saved in the pg_rewrite catalog
tables. In order to display the original query, we create an internal
view which is a replica of the user query. This is used to display the
definition in timescaledb_information.continuous_aggregates.

As an alternative we could save the original user query in our internal
catalogs.  But this approach involves replicating a lot of postgres code
and causes portability problems.
2019-04-26 13:08:00 -04:00
Matvey Arye
dc0e250428 Add pg_dump/restore tests for continuous aggs
The data in caggs needs to survive dump/restore. This
test makes sure that caggs that are materialized both
before and after restore are correct.

Two code changes were necessary to make this work:
1) the valid_job_type constraint on bgw_job needed to be altered to add
'continuous_aggregate' as a valid job type

2) The user_view_query field needed to be changed to text because
dump/restore does not support pg_node_tree.
2019-04-26 13:08:00 -04:00
Joshua Lockerman
45fb1fc2c8 Handle drop_chunks on tables that have cont aggs
For hypertables that have continuous aggregates, calling drop_chunks now
drops all of the rows in the materialization table that were based on
the dropped chunks. Since we don't know what the correct default
behavior for drop_chunks is, we've added a new argument,
cascade_to_materializations, which must be set to true in order to call
drop_chunks on a hypertable which has a continuous aggregate.
drop_chunks is blocked on the materialization tables of continuous
aggregates.
2019-04-26 13:08:00 -04:00
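
A hedged example of the new flag using the 1.x-era drop_chunks signature (older_than, table_name); without it, the call below errors on a hypertable that has a continuous aggregate.

```sql
SELECT drop_chunks(INTERVAL '90 days', 'metrics',
                   cascade_to_materializations => true);
```
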
gayyappan
18d1607909 Add timescaledb_information views for continuous aggregates
Add timescaledb_information.continuous_aggregate_settings and timescaledb_information.continuous_aggregate_job_stats views
2019-04-26 13:08:00 -04:00
Matvey Arye
19d47daf23 Delete related catalog rows when continuous aggs are dropped
This PR deletes related rows from the following tables
* completed_threshold
* invalidation threshold
* hypertable invalidation log

The latter two tables are only affected if no other continuous aggs
exist on the raw hypertable.

This commit also adds locks to prevent concurrent raw table inserts
and any access to the materialization table when dropping caggs. It
also moves all locks to the beginning of the function so that the lock
order is easier to track and reason about.

Also added a few formatting fixes.
2019-04-26 13:08:00 -04:00
gayyappan
1cbd8c74f7 Add invalidation trigger for continuous aggs
Add invalidation trigger for DML changes to the hypertable used in
the continuous aggregate query.

Also add user_view_query definition in continuous_agg catalog table.
2019-04-26 13:08:00 -04:00
Joshua Lockerman
0737b370a3 Add the actual bgw job for continuous aggregates
This commit adds the actual background worker job that runs the continuous
aggregate automatically. This job gets created when the continuous aggregate is
created and is deleted when the aggregate is dropped. By default this job will
attempt to run every two bucket widths, and attempts to materialize up to two
bucket widths behind the end of the table.
2019-04-26 13:08:00 -04:00
David Kohn
f17aeea374 Initial cont agg INSERT/materialization support
This commit adds initial support for the continuous aggregate materialization
and INSERT invalidations.

INSERT path:
  On INSERT, DELETE and UPDATE we log the [max, min] time range that may be
  invalidated (that is, newly inserted, updated, or deleted) to
  _timescaledb_catalog.continuous_aggs_hypertable_invalidation_log. This log
  will be used to re-materialize these ranges, to ensure that the aggregate
  is up-to-date. Currently these invalidations are recorded by a trigger
  _timescaledb_internal.continuous_agg_invalidation_trigger, which should be
  added to the hypertable when the continuous aggregate is created. This trigger
  stores a cache of min/max values per-hypertable, and on transaction commit
  writes them to the log, if needed. At the moment, we consider them to always
  be needed, unless we're in ReadCommitted mode or weaker, and the min
  invalidated value is greater than the hypertable's invalidation threshold
  (found in _timescaledb_catalog.continuous_aggs_invalidation_threshold)

Materialization path:
  Materialization currently happens in multiple phases: in phase 1 we determine
  the timestamp at which we will end the new set of materializations, then we
  update the hypertable's invalidation threshold to that point, and finally we
  read the current invalidations, then materialize any invalidated rows and the new
  range between the continuous aggregate's completed threshold (found in
  _timescaledb_catalog.continuous_aggs_completed_threshold) and the hypertable's
  invalidation threshold. After all of this is done we update the completed
  threshold to the invalidation threshold. The portion of this protocol from
  after the invalidations are read, until the completed threshold is written
  (that is, actually materializing, and writing the completion threshold) is
  included with this commit, with the remainder to follow in subsequent ones.
  One important caveat is that since the thresholds are exclusive, we invalidate
  all values _less_ than the invalidation threshold, and since we store time values
  as an int64 internally, we cannot ever determine whether the row at PG_INT64_MAX is
  invalidated. To avoid this problem, we never materialize the time bucket
  containing PG_INT64_MAX.
2019-04-26 13:08:00 -04:00
gayyappan
2dbc28df82 Create base infrastructure for continuous aggs
This PR adds a catalog table for storing metadata about
continuous aggregates. It also adds code for creating the
materialization hypertable and 2 views that are used by the
continuous aggregate system:

1) The user view - This is the actual view queried by the enduser.
   It is a query on top of the materialized hypertable and is
   responsible for finalizing and combining partials in a manner
   that returns to the user the data as defined by the original
   user-defined view.
2) The partial view - which queries the raw table and returns
   columns as defined in the materialized table. This will be used
   by the materializer to calculate the data that will be inserted
   into the materialization table. Note the data here is the partial
   state of any aggregates.
2019-04-26 13:08:00 -04:00
Joshua Lockerman
1e486ef2a4 Fix ts_chunk_for_tuple performance
ts_chunk_for_tuple should use the chunk cache.
ts_chunk_for_tuple should be marked stable.
These fixes markedly improve performance.
2019-04-19 12:46:36 -04:00