69 Commits

Author SHA1 Message Date
Erik Nordström
40367d2dbf Fix check constraint on hypertable metadata table
The `replication_factor` is set to `-1` on hypertables that are
created on data nodes as part of a larger distributed
hypertable. However, the check constraint on the hypertable metadata
table doesn't allow such values, causing update scripts to fail when
this check constraint is recreated as part of updating to version
`2.0.0-rc4`.

It is possible to insert violating rows because check constraints
aren't validated when data is inserted using PostgreSQL's internal
catalog functions (in C). Therefore, the violating row can exist until
one tries to update a data node to `2.0.0-rc4`, at which point the
update script tries to recreate the `hypertable` metadata table due to
other changes that were made to the table.

This change fixes the check constraint to account for `-1` as a valid
value, and also changes the update scripts to account for the new
check constraint so that updates to the latest version will no longer
fail.
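A minimal Python sketch of the constraint change, for illustration only. It assumes, without confirmation from this message, that the old check accepted only NULL or positive values; `old_check` and `new_check` are hypothetical names, and the real constraint is SQL in the catalog definition.

```python
from typing import List, Optional


def old_check(replication_factor: Optional[int]) -> bool:
    # Assumed pre-fix rule: NULL (not distributed) or a positive factor.
    return replication_factor is None or replication_factor > 0


def new_check(replication_factor: Optional[int]) -> bool:
    # Post-fix rule: additionally accept -1, which marks a hypertable
    # created on a data node as part of a distributed hypertable.
    return old_check(replication_factor) or replication_factor == -1


# Rows written through the C catalog API bypass the check, so a -1 row can
# exist until an update script recreates the table with the constraint.
rows: List[Optional[int]] = [None, 1, 3, -1]
print([old_check(r) for r in rows])  # [True, True, True, False]
print([new_check(r) for r in rows])  # [True, True, True, True]
```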
2020-12-21 12:31:43 +01:00
gayyappan
7c76fd4d09 Save compression settings on access node for distributed hypertables
1. Add a compression_state column to the hypertable catalog
table by renaming its compressed column. compression_state is
a tri-state column that indicates whether the hypertable has
compression enabled (value = 1) or is an internal
compression table (value = 2).

2. Save compression settings on the access node when compression
is turned on for a distributed hypertable.
For a distributed hypertable that has compression enabled,
compression_state is set, but no internal tables are created
on the access node.

Fixes #2660
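A hypothetical Python model of the tri-state column. Values 1 and 2 come from this message; treating 0 as "compression not enabled" is an assumption, as is the helper `creates_internal_tables`, which only restates the rule that the access node stores the settings but no internal tables.

```python
from enum import IntEnum


class CompressionState(IntEnum):
    """Hypothetical model of the hypertable catalog's compression_state."""

    DISABLED = 0  # assumption: the message documents only 1 and 2
    ENABLED = 1  # compression enabled on this hypertable
    INTERNAL_COMPRESSION_TABLE = 2  # this hypertable stores compressed chunks


def creates_internal_tables(state: CompressionState, is_access_node: bool) -> bool:
    """Whether enabling compression leads to an internal compressed hypertable.

    On the access node of a distributed hypertable only the settings and
    compression_state are saved; no internal tables are created there.
    """
    return state == CompressionState.ENABLED and not is_access_node
```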
2020-12-02 10:42:57 -05:00
Mats Kindahl
0e507affc1 Remove modification time from invalidation log
The `modification_time` column is hard to maintain with any level of
consistency over merges and splits of invalidation ranges so this
commit removes it from the invalidation log entries for both
hypertables and continuous aggregates. If the modification time is
needed in the future, we need to re-introduce it in a manner that can
maintain it over both merges and splits.

The function `ts_get_now_internal` is also removed since it is no
longer used.

Part of #2521
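To illustrate why a per-entry modification time is hard to keep consistent, here is a minimal Python model of invalidation ranges as inclusive integer pairs. Both the merge rule (overlapping or adjacent ranges combine) and the cut rule are assumptions for illustration, not the actual invalidation-log code.

```python
from typing import List, Tuple

# Inclusive [lowest, greatest] range of invalidated integer time values.
Range = Tuple[int, int]


def merge_ranges(ranges: List[Range]) -> List[Range]:
    """Merge overlapping or adjacent ranges into a minimal sorted list."""
    merged: List[Range] = []
    for lo, hi in sorted(ranges):
        if merged and lo <= merged[-1][1] + 1:
            # Overlaps or touches the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def cut_range(r: Range, lo: int, hi: int) -> List[Range]:
    """Remove the processed window [lo, hi] from r, which may split r in two."""
    pieces: List[Range] = []
    if r[0] < lo:
        pieces.append((r[0], min(r[1], lo - 1)))
    if r[1] > hi:
        pieces.append((max(r[0], hi + 1), r[1]))
    return pieces
```

Neither operation has an obvious place for a modification time: a merged range covers entries written at different times, and each piece of a split inherits a time that may only describe the part that was removed.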
2020-10-14 17:36:51 +02:00
Sven Klemm
ccfca446f2 Fix timescaledb_fdw function handling in update script
This patch splits the timescaledb_fdw sql file into two parts to
separate the idempotent parts from the non-idempotent ones so
the function definitions can be included in the regular update
script.
2020-10-13 14:59:27 +02:00
Sven Klemm
3f5872ec61 Run pg_format on SQL files 2020-10-05 21:33:42 +02:00
Sven Klemm
a1cf324063 Fix timescaledb_fdw sql script
Since CREATE FOREIGN DATA WRAPPER is not idempotent, it must not be
grouped with the normal SQL scripts but has to be in the pre_install
group.
2020-10-05 18:42:32 +02:00
Erik Nordström
519863f460 Remove catalog options for continuous aggregates
This change removes the catalog options `refresh_lag`,
`max_interval_per_job` and `ignore_invalidation_older_than`, which are
no longer used.

Closes #2396
2020-09-22 14:39:01 +02:00
Erik Nordström
5179447613 Remove completed threshold
The completed threshold in the TimescaleDB catalog is no longer used
by the refactored continuous aggregates, so it is removed.

Fixes #2178
2020-09-15 17:18:59 +02:00
Sven Klemm
4397e57497 Remove job_type from bgw_job table
Due to recent refactoring all policies now use the columns added
with the generic job support so the job_type column is no longer
needed.
2020-09-01 14:49:30 +02:00
Sven Klemm
d547d61516 Refactor continuous aggregate policy
This patch modifies the continuous aggregate policy to store its
configuration in the jobs table.
2020-08-11 22:57:02 +02:00
Sven Klemm
bb891cf4d2 Refactor retention policy
This patch changes the retention policy to store its configuration
in the bgw_job table and removes the bgw_policy_drop_chunks table.
2020-08-03 22:33:54 +02:00
Mats Kindahl
590446c6a7 Remove cascade_to_materialization parameter
The parameter `cascade_to_materialization` is removed from
`drop_chunks` and `add_drop_chunks_policy` as well as associated tables
and test functions.

Fixes #2137
2020-07-31 11:21:36 +02:00
Sven Klemm
0d5f1ffc83 Refactor compress chunk policy
This patch changes the compression policy to store its configuration
in the bgw_job table and removes the bgw_policy_compress_chunks table.
2020-07-30 19:58:37 +02:00
Sven Klemm
3e83577916 Refactor reorder policy
This patch changes the reorder policy to store its configuration
in the bgw_job table and removes the bgw_policy_reorder table.
2020-07-29 12:07:13 +02:00
Sven Klemm
43f2c31b3e Add proc, hypertable index to bgw_job
This patch adds an index on proc_name, proc_schema, and hypertable_id
to bgw_job. Three functions that use the new index are added as well:
- ts_bgw_job_find_by_proc
- ts_bgw_job_find_by_hypertable_id
- ts_bgw_job_find_by_proc_and_hypertable_id

These functions are required for migrating the existing policies
to store their configuration in bgw_job directly.
2020-07-27 20:17:56 +02:00
gayyappan
88f693887a Cleanup index on hypertable catalog table
Reorder schema_name + table_name index. Remove
unnecessary constraint.
2020-07-23 11:08:11 -04:00
Sven Klemm
2f2e5ae68b Change bgw_job catalog table to enable custom jobs
This patch adds the columns required for custom jobs to the bgw_job
catalog table.
2020-07-22 18:24:02 +02:00
gayyappan
b93b30b0c2 Add counts to compression statistics
Store information related to compressed and uncompressed row
counts after compressing a chunk. This is saved in the
compression_chunk_size table.
2020-06-19 15:58:04 -04:00
Mats Kindahl
92b6c03e43 Remove cascade option from drop_chunks
This commit removes the `cascade` option from the functions
`drop_chunks` and `add_drop_chunks_policy`, which will now never cascade
drops to dependent objects. The tests are fixed accordingly, and
verbosity is turned up to ensure that the dependent objects are printed
in the error details.
2020-06-02 16:08:51 +02:00
Brian Rowe
79fb46456f Rename server to data node
The timescale clustering code so far has been written referring to the
remote databases as 'servers'.  This terminology is a bit overloaded,
and in particular we don't enforce any network topology limitations
that the term 'server' would suggest. In light of this, we've decided
to use the term 'node' when referring to the different
databases in a distributed database. Specifically, we refer to the
frontend as an 'access node' and to the backends as 'data nodes',
though we may omit the access or data qualifier where it's unambiguous.

As the vast bulk of the code so far has been written for the case where
there was a single access node, almost all instances of 'server' were
references to data nodes.  This change has updated the code to rename
those instances.
2020-05-27 17:31:09 +02:00
niksa
2fd99c6f4b Block new chunks on data nodes
This functionality enables users to block or allow creation of new
chunks on a data node for one or more hypertables. Use cases for this
include the ability to block new chunks when a data node is running
low on disk space or to affect chunk distribution across data nodes.

Sometimes blocking data nodes for new chunks can make a hypertable
under-replicated. For that case an additional argument `force => true`
can be supplied to force blocking new chunks.

Here are some examples.

Block for a specific hypertable:
`SELECT * FROM block_new_chunks_on_server('server_1', 'disttable');`

Block for all hypertables on the server:
`SELECT * FROM block_new_chunks_on_server('server_1', force => true);`

Unblock:
`SELECT * FROM allow_new_chunks_on_server('server_1', true);`

This change adds the `force` argument to `detach_server` as well. If
detaching or blocking new chunks would make a hypertable
under-replicated, then `force => true` needs to be used.
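The under-replication guard can be sketched in Python as follows. It is a hypothetical model: `block_new_chunks` is not the SQL function above, and treating "under-replicated" as "fewer accepting nodes than the replication factor" is an assumption.

```python
from typing import Set


class UnderReplicatedError(Exception):
    """Raised when blocking a node would leave too few nodes for new chunks."""


def block_new_chunks(available: Set[str], node: str, replication_factor: int,
                     force: bool = False) -> Set[str]:
    """Return the data nodes still accepting new chunks after blocking `node`.

    `available` is the set of data nodes currently accepting new chunks for
    one hypertable. Blocking is refused if fewer nodes than the replication
    factor would remain, unless force=True.
    """
    remaining = available - {node}
    if len(remaining) < replication_factor and not force:
        raise UnderReplicatedError(
            f"blocking {node} leaves {len(remaining)} node(s) "
            f"for replication factor {replication_factor}"
        )
    return remaining
```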
2020-05-27 17:31:09 +02:00
Matvey Arye
e7ba327f4c Add resolve and heal infrastructure for 2PC
This commit adds the ability to resolve whether or not 2PC
transactions have been committed or aborted and also adds a heal
function to resolve transactions that have been prepared but not
committed or rolled back.

This commit also removes the server id from the primary key of the
remote_txn table and adds another index. This was done because
`remote_txn_persistent_record_exists` should not rely on the server
being contacted but should rather just check for the existence of the
id. This makes the resolution safe for setups where two frontend server
definitions point to the same database. While this may not be a
properly configured setup, it's better if the resolution process is
robust to this case.
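The resolution rule can be sketched in Python. The function `heal` and the string ids are hypothetical; the sketch assumes the coordinating transactions are no longer in progress, a case a real implementation would have to rule out before rolling anything back. `COMMIT PREPARED` and `ROLLBACK PREPARED` are the PostgreSQL commands that finish a prepared transaction.

```python
from typing import Dict, Iterable, Set


def heal(prepared_ids: Iterable[str], committed_ids: Set[str]) -> Dict[str, str]:
    """Choose an outcome for each transaction left prepared on a data node.

    Assumes every coordinating access-node transaction has already finished.
    A persistent record for the id means the distributed transaction
    committed; no record means it never reached its commit point. Only the
    id is checked, not which server definition the node was reached through.
    """
    return {
        txn_id: "COMMIT PREPARED" if txn_id in committed_ids else "ROLLBACK PREPARED"
        for txn_id in prepared_ids
    }
```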
2020-05-27 17:31:09 +02:00
Matvey Arye
0e109d209d Add tables for saving 2pc persistent records
The remote_txn table records commit decisions for 2pc transactions.
A successful 2pc transaction will have one row per remote connection
recorded in this table. In effect it is a mapping between the
distributed transaction and an identifier for each remote connection.

The records are needed to protect against crashes after a
frontend sends a `COMMIT TRANSACTION` to one node
but not to all nodes involved in the transaction. Towards this end,
the commit of remote_txn rows represents a crash-safe, irrevocable
promise that all participating datanodes will eventually get a `COMMIT
TRANSACTION`, and it occurs before any datanode gets a `COMMIT TRANSACTION`.

The irrevocable nature of the commit of these records means that this
can only happen after the system is sure all participating transactions
will succeed. Thus it can only happen after all datanodes have succeeded
on a `PREPARE TRANSACTION` and will happen as part of the frontend's
transaction commit.
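A hedged Python sketch of the ordering described above. `DataNode`, `two_phase_commit`, and the in-memory `remote_txn` list are hypothetical stand-ins; for a prepared transaction PostgreSQL's second-phase command is `COMMIT PREPARED`, which the sketch uses in place of the generic `COMMIT TRANSACTION`.

```python
from typing import List, Tuple


class DataNode:
    """Hypothetical stand-in for a data node connection that logs commands."""

    def __init__(self, name: str, prepare_ok: bool = True) -> None:
        self.name = name
        self.prepare_ok = prepare_ok
        self.log: List[str] = []

    def send(self, command: str) -> None:
        self.log.append(command)


def two_phase_commit(gid: str, nodes: List[DataNode],
                     remote_txn: List[Tuple[str, str]]) -> bool:
    prepared: List[DataNode] = []
    # Phase 1: every participant must succeed on PREPARE TRANSACTION.
    for node in nodes:
        node.send(f"PREPARE TRANSACTION '{gid}'")
        if not node.prepare_ok:
            # Nothing was recorded yet, so prepared participants can roll back.
            for p in prepared:
                p.send(f"ROLLBACK PREPARED '{gid}'")
            return False
        prepared.append(node)
    # Irrevocable point: one row per connection, committed with the access
    # node's own transaction before any node is told to commit.
    remote_txn.extend((gid, node.name) for node in nodes)
    # Phase 2: the rows promise that every node will eventually commit.
    for node in nodes:
        node.send(f"COMMIT PREPARED '{gid}'")
    return True
```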
2020-05-27 17:31:09 +02:00
Matvey Arye
d2b4b6e22e Add remote transaction ID module
The remote transaction ID is used in two-phase commit. It is the
identifier sent to the datanodes in PREPARE TRANSACTION and related
PostgreSQL commands.

This is the first in a series of commits for adding two phase
commit support to our distributed txn infrastructure.
2020-05-27 17:31:09 +02:00
Erik Nordström
596be8cda1 Add mappings table for remote chunks
A frontend node will now maintain mappings from a local chunk to the
corresponding remote chunks in a `chunk_server` table.

The frontend creates local chunks as foreign tables and adds entries
to `chunk_server` for each chunk it creates on a remote data node.

Currently, the creation of remote chunks is not implemented, so a
dummy chunk_id for the remote chunk will be added instead for testing
purposes.
2020-05-27 17:31:09 +02:00
Erik Nordström
ece582d458 Add mappings table for remote hypertables
In a multi-node (clustering) setup, TimescaleDB needs to track which
remote servers have data for a particular distributed hypertable. It
also needs to know which servers to place new chunks on and to use in
queries against a distributed hypertable.

A new metadata table, `hypertable_server`, is added to map a local
hypertable ID to a hypertable ID on a remote server. We require that
the remote hypertable has the same schema and name as the local
hypertable.

When a local server is removed (using `DROP SERVER` or our
`delete_server()`), all remote hypertable mappings for that server
should also be removed.
2020-05-27 17:31:09 +02:00
Sven Klemm
cbda1acd4f Record cagg view state in catalog
Record materialized_only state of continuous aggregate view in
catalog and show state in timescaledb_information.continuous_aggregates.
2020-04-14 06:57:33 +02:00
Matvey Arye
2c594ec6f9 Keep catalog rows for some dropped chunks
If a chunk is dropped but it has a continuous aggregate that is
not dropped we want to preserve the chunk catalog row instead of
deleting the row. This is to prevent dangling identifiers in the
materialization hypertable. It also preserves the dimension slice
and chunk constraints rows for the chunk since those will be necessary
when enabling this with multinode and is necessary to recreate the
chunk too. The postgres objects associated with the chunk are all
dropped (table, constraints, indexes).

If data is ever reinserted to the same data region, the chunk is
recreated with the same dimension definitions as before. The postgres
objects are simply recreated.
2019-12-30 09:10:44 -05:00
Matvey Arye
5eb047413b Allow drop_chunks while keeping continuous aggs
Allow dropping raw chunks on the raw hypertable while keeping
the continuous aggregate. This allows for downsampling data
and allows users to save on TCO. We only allow dropping
such data when the dropped data is older than the
`ignore_invalidation_older_than` parameter on all the associated
continuous aggs. This ensures that any modifications to the
region of data which was dropped should never be reflected
in the continuous agg and thus avoids semantic ambiguity
if chunks are dropped but then again recreated due to an
insert.

Before we drop a chunk, we need to make sure to process any
continuous aggregate invalidations that were registered on
data inside the chunk. Thus we add options to materialization
to perform materialization transactionally, to only process
invalidations, and to process invalidations only before a timestamp.

We fix drop_chunks and the policy to properly process
`cascade_to_materialization` as a tri-state variable (unknown,
true, false). Existing policy rows should change false to NULL
(unknown), and true stays true since it was explicitly set.
Remove the form data for bgw_policy_drop_chunk because there
is no good way to represent the tri-state variable in the
form data.

When dropping chunks with cascade_to_materialization = false, all
invalidations on the chunks are processed before dropping the chunk.
If we are so far behind that even the completion threshold is inside
the chunks being dropped, we error. There are 2 reasons that we error:
1) We can't safely process new ranges transactionally without taking
   heavyweight locks and potentially locking the entire system.
2) If a completion threshold is that far behind, the system probably
   has some serious issues anyway.
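The two safety checks can be sketched in Python with integer time values. The function and parameter names are hypothetical; chunk data is taken to lie below an exclusive `chunk_end`, and a completion threshold is taken to mean that everything below it has been materialized. Both interpretations are assumptions.

```python
from typing import List


class DropChunksError(Exception):
    """Raised when dropping the chunk would be unsafe for a continuous agg."""


def check_drop(chunk_end: int, now: int,
               ignore_older_than: List[int],
               completion_thresholds: List[int]) -> None:
    """Raise if data below chunk_end may not be dropped yet."""
    for limit in ignore_older_than:
        # Only data older than now - limit is guaranteed never to be
        # reflected in this aggregate again.
        if chunk_end > now - limit:
            raise DropChunksError("chunk is newer than the invalidation cutoff")
    for threshold in completion_thresholds:
        # Materialization has not reached the end of the dropped range.
        if threshold < chunk_end:
            raise DropChunksError("completion threshold is inside the dropped chunks")
```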
2019-12-30 09:10:44 -05:00
Matvey Arye
08ad7b6612 Add ignore_invalidation_older_than to continuous aggs
We added a timescaledb.ignore_invalidation_older_than parameter for
continuous aggregates. This parameter accepts a time interval (e.g. 1
month). If set, it limits the amount of time for which to process
invalidations. Thus, if
	timescaledb.ignore_invalidation_older_than = '1 month'
then any modifications for data older than 1 month from the current
timestamp at insert time will not cause updates to the continuous
aggregate. This limits the amount of work that a backfill can trigger.
This parameter must be >= 0. A value of 0 means that invalidations are
never processed.

When recording invalidations for the hypertable at insert time, we use
the maximum ignore_invalidation_older_than of any continuous agg attached
to the hypertable as a cutoff for whether to record the invalidation
at all. When materializing a particular continuous agg, we use that
agg's ignore_invalidation_older_than cutoff. However, we have to apply
that cutoff relative to the insert time, not the materialization
time, to make it easier for users to reason about. Therefore,
we record the insert time as part of the invalidation entry.
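The two cutoffs can be sketched in Python, with integers standing in for timestamps and intervals. `Invalidation`, `record_invalidation`, and `applies_to` are hypothetical names; the sketch encodes only the rules stated above: the widest limit filters at insert time, and each aggregate applies its own limit relative to the recorded insert time.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Invalidation:
    value: int        # time value of the modified row
    insert_time: int  # current timestamp when the modification happened


def record_invalidation(value: int, now: int,
                        limits: List[int]) -> Optional[Invalidation]:
    """Record an invalidation unless every attached aggregate would ignore it."""
    widest = max(limits, default=0)
    # A limit of 0 means invalidations are never processed.
    if widest == 0 or value < now - widest:
        return None
    return Invalidation(value, now)


def applies_to(inv: Invalidation, limit: int) -> bool:
    """Apply one aggregate's own limit relative to the recorded insert time."""
    return limit > 0 and inv.value >= inv.insert_time - limit
```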
2019-12-04 15:47:03 -05:00
Matvey Arye
122856c1bd Fix update scripts for type functions
Type functions have to be recreated with CREATE OR REPLACE on every
update since they need to point to the correct .so. Thus,
split the type definitions into a pre, functions,
and post part, and rerun the functions part both in
pre_install and on every update.
2019-11-11 17:10:13 -05:00
Matvey Arye
0f3e74215a Split segment meta min_max into two columns
This simplifies the code and the access to the min/max
metadata. Before we used a custom type, but now the min/max
are just the same type as the underlying column and stored as two
columns.

This also removes the custom type that was used before.
2019-10-29 19:02:58 -04:00
Matvey Arye
0db50e7ffc Handle drops of compressed chunks/hypertables
This commit adds handling for dropping chunks and hypertables
in the presence of associated compressed objects. If the uncompressed
chunk/hypertable is dropped, then the associated compressed object is
dropped using DROP_RESTRICT unless cascading is explicitly enabled.

Also add a compressed_chunk_id index on compressed tables for
figuring out whether a chunk is compressed or not.

Change a bunch of APIs to use DropBehavior instead of a cascade bool
to be more explicit.

Also test the drop chunks policy.
2019-10-29 19:02:58 -04:00
gayyappan
6e60d2614c Add compress chunks policy support
Add and drop compress chunks policy using bgw
infrastructure.
2019-10-29 19:02:58 -04:00
Matvey Arye
b9674600ae Add segment meta min/max
Add the type for min/max segment meta object. Segment metadata
objects keep metadata about data in segments (compressed rows).
The min/max variant keeps the min and max values inside the compressed
object. It will be used on compression order by columns to allow
queries that have quals on those columns to be able to exclude entire
segments if no uncompressed rows in the segment may match the qual.

We also add generalized infrastructure for datum serialization
/ deserialization for arbitrary types to and from memory as well
as binary strings.
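A minimal Python sketch of segment exclusion with min/max metadata, assuming an integer order-by column and a range qual `lower <= v <= upper`. `Segment` and `segments_to_scan` are hypothetical names; a segment is skipped only when its [min, max] range cannot intersect the qual.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Segment:
    min_value: int  # minimum of the order-by column in this compressed row
    max_value: int  # maximum of that column
    rows: List[int] = field(default_factory=list)


def segments_to_scan(segments: List[Segment], lower: int, upper: int) -> List[Segment]:
    """Keep only segments that may hold a value v with lower <= v <= upper."""
    return [s for s in segments if s.max_value >= lower and s.min_value <= upper]
```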
2019-10-29 19:02:58 -04:00
Matvey Arye
a078781c2e Add decompress_chunk function
This is the inverse of compress_chunk.
2019-10-29 19:02:58 -04:00
gayyappan
1f4689eca9 Record chunk sizes after compression
Compute chunk size before/after compressing a chunk and record in
catalog table.
2019-10-29 19:02:58 -04:00
gayyappan
44941f7bd2 Add UI for compress_chunks functionality
Add support for compress_chunks function.

This also adds support for compress_orderby and compress_segmentby
parameters in ALTER TABLE. These parameters are used by the
compress_chunks function.

The parsing code will most likely be changed to use PG raw_parser
function.
2019-10-29 19:02:58 -04:00
gayyappan
1c6aacc374 Add ability to create the compressed hypertable
This happens when compression is turned on for regular hypertables.
2019-10-29 19:02:58 -04:00
Joshua Lockerman
584f5d1061 Implement time-series compression algorithms
This commit introduces 4 compression algorithms
as well as 3 ADTs to support them. The compression
algorithms are time-series optimized. The following
algorithms are implemented:

- DeltaDelta compresses integer and timestamp values
- Gorilla compresses floats
- Dictionary compression handles any data type
  and is optimized for low-cardinality datasets.
- Array stores any data type in an array-like
  structure and does not actually compress it (though
  TOAST-based compression can be applied on top).

These compression algorithms are fully described in
tsl/src/compression/README.md.

The Abstract Data Types that are implemented are
- Vector - A dynamic vector that can store any type.
- BitArray - A dynamic vector to store bits.
- SimpleHash - A hash table implementation from PG12.

More information can be found in
src/adts/README.md
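The core idea of DeltaDelta can be shown in a few lines of Python: regularly spaced integers or timestamps turn into a run of small numbers, mostly zeros, that a later packing step can store compactly. This sketch covers only the delta-of-delta transform; the bit packing that would follow is omitted, and the function names are illustrative rather than taken from the implementation.

```python
from typing import List


def delta_delta_encode(values: List[int]) -> List[int]:
    """Store each value as the change in its delta (running state starts at 0)."""
    out: List[int] = []
    prev_value, prev_delta = 0, 0
    for v in values:
        delta = v - prev_value
        out.append(delta - prev_delta)
        prev_value, prev_delta = v, delta
    return out


def delta_delta_decode(encoded: List[int]) -> List[int]:
    """Invert delta_delta_encode by re-accumulating deltas and values."""
    values: List[int] = []
    prev_value, prev_delta = 0, 0
    for dd in encoded:
        delta = prev_delta + dd
        value = prev_value + delta
        values.append(value)
        prev_value, prev_delta = value, delta
    return values


ts = [100, 110, 120, 130, 141]
print(delta_delta_encode(ts))  # [100, -90, 0, 0, 1]
```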
2019-10-29 19:02:58 -04:00
gayyappan
3edc016dfc Add catalog tables to support compression
This commit adds catalog tables that will be used by the
compression infrastructure.
2019-10-29 19:02:58 -04:00
Matvey Arye
7ea492f29e Add last_successful_finish to bgw_job_stats
This allows people to better monitor the bgw job health. It
indicates when the last time the job made progress was.
2019-10-15 19:14:14 -04:00
Narek Galstyan
62de29987b Add a notion of now for integer time columns
This commit implements functionality for users to give a custom
definition of now() for integer open dimension typed hypertables.
Such a now() function enables us to talk about intervals in the context
of hypertables with integer time columns. To simplify future code,
this commit defines a custom ts_interval type that unites the
usual postgres intervals and integer time dimension intervals under a
single composite type.

The commit also enables adding drop chunks policy on hypertables with
integer time dimensions if a custom now() function has been set.
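A rough Python sketch of a drop-chunks cutoff on an integer time dimension. `integer_now` stands in for the user-supplied now() function, chunks are (start, end) pairs with an exclusive end, and `older_than` is in the same integer units as the time column; all of these names and conventions are assumptions.

```python
from typing import Callable, List, Tuple

Chunk = Tuple[int, int]  # [start, end) on the integer time dimension


def chunks_to_drop(chunks: List[Chunk], integer_now: Callable[[], int],
                   older_than: int) -> List[Chunk]:
    """Return chunks lying entirely before integer_now() - older_than."""
    cutoff = integer_now() - older_than
    return [c for c in chunks if c[1] <= cutoff]
```

Without a custom now(), there is no meaningful "current time" for an integer column, which is why the policy requires one to be set.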
2019-08-19 23:23:28 +04:00
gayyappan
e9df3bc1b6 Fix continuous agg catalog table insert failure
The primary key on continuous_aggs_materialization_invalidation_log
prevents multiple records with the same materialization id. Remove
the primary key to fix this problem.
2019-07-08 14:53:36 -04:00
gayyappan
60cfe6cc90 Support for multiple continuous aggregates
Allow multiple continuous aggregates to be defined on a hypertable.
2019-06-24 17:05:49 -04:00
Brian Rowe
aeac52aef6 Rename telemetry_metadata table to just metadata
This change renames the _timescale_catalog.telemetry_metadata to
_timescale_catalog.metadata.  It also adds a new boolean column to this
table which is used to flag data which should be included in telemetry.

It also renames the src/telemetry/metadata.{h,c} files to
src/telemetry/telemetry_metadata.{h,c} and updates the API to reflect
this. Finally, it also includes the logic to use the new boolean column
when populating the telemetry parse state.
2019-05-17 17:04:42 -07:00
Joshua Lockerman
899cd0538d Allow scheduled drop_chunks to cascade to aggs
This commit adds a cascade_to_materializations flag to the scheduled
version of drop_chunks that behaves much like the one from manual
drop_chunks: if a hypertable that has a continuous aggregate tries to
drop chunks, and this flag is not set, the chunks will not be dropped.
2019-04-30 15:46:49 -04:00
Joshua Lockerman
3895e5ce0e Add a setting for max an agg materializes per run
Add a setting max_materialized_per_run which can be set to prevent a
continuous aggregate from materializing too much of the table in a
single run. This will prevent a single run from locking the hypertable
for too long, when running on a large data set.
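One way to picture the setting: the range a continuous aggregate still has to materialize is split into windows no wider than the limit, one per run. This Python sketch uses hypothetical names and integer bounds and is not taken from the implementation.

```python
from typing import List, Tuple


def plan_runs(completed: int, target: int, max_per_run: int) -> List[Tuple[int, int]]:
    """Split [completed, target) into windows of at most max_per_run."""
    if max_per_run <= 0:
        raise ValueError("max_per_run must be positive")
    runs: List[Tuple[int, int]] = []
    start = completed
    while start < target:
        end = min(start + max_per_run, target)
        runs.append((start, end))
        start = end
    return runs
```

Each window corresponds to a separate run, so locks on the hypertable are held only for one bounded window at a time.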
2019-04-26 13:08:00 -04:00
gayyappan
b8f9b91e60 Add user view query definition for cont aggs
Add the query definition to
timescaledb_information.continuous_aggregates.

The user query (specified in the CREATE VIEW stmt of a continuous
aggregate) is transformed in the process of creating a continuous
aggregate and this modified query is saved in the pg_rewrite catalog
tables. In order to display the original query, we create an internal
view which is a replica of the user query. This is used to display the
definition in timescaledb_information.continuous_aggregates.

As an alternative we could save the original user query in our internal
catalogs.  But this approach involves replicating a lot of postgres code
and causes portability problems.
2019-04-26 13:08:00 -04:00
Matvey Arye
dc0e250428 Add pg_dump/restore tests for continuous aggs
The data in caggs needs to survive dump/restore. This
test makes sure that caggs that are materialized both
before and after restore are correct.

Two code changes were necessary to make this work:
1) the valid_job_type constraint on bgw_job needed to be altered to add
'continuous_aggregate' as a valid job type

2) The user_view_query field needed to be changed to text because
dump/restore does not support pg_node_tree.
2019-04-26 13:08:00 -04:00