The `replication_factor` is set to `-1` on hypertables that are
created on data nodes as part of a larger distributed
hypertable. However, the check constraint on the hypertable metadata
table doesn't allow such values, causing update scripts to fail when
this check constraint is recreated as part of updating to version
`2.0.0-rc4`.
The reason it is possible to insert violating rows is that check
constraints aren't validated when inserting data using PostgreSQL's
internal catalog functions (in C). Therefore, the violating row can
exist until one tries to update a data node to `2.0.0-rc4`, at which
point the update script tries to recreate the `hypertable` metadata
table due to other changes that were made to the table.
This change fixes the check constraint to account for `-1` as a valid
value, and also changes the update scripts to account for the new
check constraint so that updates to the latest version will no longer
fail.
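For illustration, the relaxed constraint has roughly this shape (a sketch
only; it assumes the catalog table is `_timescaledb_catalog.hypertable`, and
the actual DDL and constraint name may differ):

```sql
-- Sketch: -1 (the data node-local marker) is now accepted in addition to
-- positive replication factors.
ALTER TABLE _timescaledb_catalog.hypertable
    ADD CONSTRAINT hypertable_replication_factor_check
    CHECK (replication_factor > 0 OR replication_factor = -1);
```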
1. Add a compression_state column to the hypertable catalog table
by renaming the existing compressed column. compression_state is
a tri-state column: it indicates whether the hypertable has
compression enabled (value = 1) or is an internal compression
table (value = 2), as sketched below.
2. Save compression settings on the access node when compression
is turned on for a distributed hypertable. For a distributed
hypertable that has compression enabled, compression_state is set,
but no internal tables are created on the access node.
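For illustration, the tri-state column can be read with a query like the
following (assuming the catalog table is `_timescaledb_catalog.hypertable`;
the third state, compression not enabled, is presumably 0):

```sql
-- Sketch: 1 = compression enabled, 2 = internal compression table.
SELECT id, schema_name, table_name, compression_state
FROM _timescaledb_catalog.hypertable
WHERE compression_state = 1;
```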
Fixes #2660
The `modification_time` column is hard to maintain with any level of
consistency over merges and splits of invalidation ranges so this
commit removes it from the invalidation log entries for both
hypertables and continuous aggregates. If the modification time is
needed in the future, we need to re-introduce it in a manner that can
maintain it over both merges and splits.
The function `ts_get_now_internal` is also removed since it is no
longer used.
Part of #2521
This patch splits the timescaledb_fdw sql file into two parts to
separate the idempotent parts from the non-idempotent ones so
the function definitions can be included in the regular update
script.
This change removes the catalog options `refresh_lag`,
`max_interval_per_job` and `ignore_invalidation_older_than`, which are
no longer used.
Closes #2396
The parameter `cascade_to_materialization` is removed from
`drop_chunks` and `add_drop_chunks_policy` as well as associated tables
and test functions.
Fixes #2137
This patch adds a proc_name, proc_schema, hypertable_id index to
bgw_job. Three functions using the new index are added as well:
ts_bgw_job_find_by_proc
ts_bgw_job_find_by_hypertable_id
ts_bgw_job_find_by_proc_and_hypertable_id
These functions are required for migrating the existing policies
to store their configuration in bgw_job directly.
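For reference, the index shape is roughly the following (illustrative name
and column order; the actual catalog definition may differ):

```sql
CREATE INDEX bgw_job_proc_hypertable_id_idx
    ON _timescaledb_catalog.bgw_job (proc_name, proc_schema, hypertable_id);
```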
This commit removes the `cascade` option from the functions
`drop_chunks` and `add_drop_chunks_policy`, which will now never cascade
drops to dependent objects. The tests are fixed accordingly and
verbosity turned up to ensure that the dependent objects are printed in
the error details.
The timescale clustering code so far has been written referring to the
remote databases as 'servers'. This terminology is a bit overloaded,
and in particular we don't enforce any network topology limitations
that the term 'server' would suggest. In light of this we've decided
to change to use the term 'node' when referring to the different
databases in a distributed database. Specifically we refer to the
frontend as an 'access node' and to the backends as 'data nodes',
though we may omit the access or data qualifier where it's unambiguous.
As the vast bulk of the code so far has been written for the case where
there was a single access node, almost all instances of 'server' were
references to data nodes. This change has updated the code to rename
those instances.
This functionality enables users to block or allow creation of new
chunks on a data node for one or more hypertables. Use cases for this
include the ability to block new chunks when a data node is running
low on disk space or to affect chunk distribution across data nodes.
Sometimes blocking data nodes for new chunks can make a hypertable
under-replicated. For that case an additional argument `force => true`
can be supplied to force blocking new chunks.
Here are some examples.
Block for a specific hypertable:
`SELECT * FROM block_new_chunks_on_server('server_1', 'disttable');`
Block for all hypertables on the server:
`SELECT * FROM block_new_chunks_on_server('server_1', force => true);`
Unblock:
`SELECT * FROM allow_new_chunks_on_server('server_1', true);`
This change adds the `force` argument to `detach_server` as well. If
detaching or blocking new chunks will make a hypertable
under-replicated then `force => true` needs to be used.
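For example, a forced detach might look like this (illustrative call only;
the full parameter list of `detach_server` is not shown here and may differ):

```sql
-- Force the detach even if it leaves 'disttable' under-replicated.
SELECT * FROM detach_server('server_1', 'disttable', force => true);
```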
This commit adds the ability to resolve whether or not 2PC
transactions have been committed or aborted and also adds a heal
function to resolve transactions that have been prepared but not
committed or rolled back.
This commit also removes the server id from the primary key on the
remote_txn table and adds another index. This was done because
`remote_txn_persistent_record_exists` should not rely on the server
being contacted but should rather just check for the existence of the
id. This makes the resolution safe in setups where two frontend server
definitions point to the same database. While this may not be a
properly configured setup, it's better if the resolution process is
robust to this case.
The remote_txn table records commit decisions for 2pc transactions.
A successful 2pc transaction will have one row per remote connection
recorded in this table. In effect it is a mapping between the
distributed transaction and an identifier for each remote connection.
The records are needed to protect against crashes after a
frontend sends a `COMMIT TRANSACTION` to one node
but not all nodes involved in the transaction. Towards this end,
the commit of remote_txn rows represents a crash-safe, irrevocable
promise that all participating datanodes will eventually get a `COMMIT
TRANSACTION`, and it occurs before any datanode gets a `COMMIT TRANSACTION`.
The irrevocable nature of the commit of these records means that this
can only happen after the system is sure all participating transactions
will succeed. Thus it can only happen after all datanodes have succeeded
on a `PREPARE TRANSACTION`, and it will happen as part of the frontend's
transaction commit.
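Sketched with standard PostgreSQL two-phase commands, the resulting sequence
looks like this (the transaction identifier format is illustrative):

```sql
-- Sent by the frontend to every participating datanode:
PREPARE TRANSACTION 'ts-dist-txn-42';

-- On the frontend, as part of its own transaction commit, the remote_txn
-- rows recording the commit decision become durable, before any datanode
-- is told to commit.

-- Then sent to each datanode:
COMMIT PREPARED 'ts-dist-txn-42';
```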
The remote transaction ID is used in two phase commit. It is the
identifier sent to the datanodes in PREPARE TRANSACTION and related
PostgreSQL commands.
This is the first in a series of commits for adding two phase
commit support to our distributed txn infrastructure.
A frontend node will now maintain mappings from a local chunk to the
corresponding remote chunks in a `chunk_server` table.
The frontend creates local chunks as foreign tables and adds entries
to `chunk_server` for each chunk it creates on a remote data node.
Currently, the creation of remote chunks is not implemented, so a
dummy chunk_id for the remote chunk will be added instead for testing
purposes.
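Conceptually, the new mapping looks roughly like this (column names are
illustrative, not the actual catalog definition):

```sql
-- Sketch: one row per (local chunk, data node) pair.
CREATE TABLE chunk_server (
    chunk_id        integer NOT NULL,  -- local chunk (a foreign table)
    server_id       integer NOT NULL,  -- remote data node holding the chunk
    server_chunk_id integer NOT NULL   -- id of the chunk on that data node
);
```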
In a multi-node (clustering) setup, TimescaleDB needs to track which
remote servers have data for a particular distributed hypertable. It
also needs to know which servers to place new chunks on and to use in
queries against a distributed hypertable.
A new metadata table, `hypertable_server` is added to map a local
hypertable ID to a hypertable ID on a remote server. We require that
the remote hypertable has the same schema and name as the local
hypertable.
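A conceptual sketch of the mapping (illustrative column names only, not the
actual catalog definition):

```sql
-- Sketch: maps a local hypertable to its counterpart on each remote server.
CREATE TABLE hypertable_server (
    hypertable_id        integer NOT NULL,  -- local hypertable id
    server_hypertable_id integer NOT NULL,  -- hypertable id on the remote server
    server_name          name    NOT NULL   -- foreign server holding the data
);
```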
When a local server is removed (using `DROP SERVER` or our
`delete_server()`), all remote hypertable mappings for that server
should also be removed.
If a chunk is dropped but it has a continuous aggregate that is
not dropped, we want to preserve the chunk catalog row instead of
deleting it. This prevents dangling identifiers in the
materialization hypertable. It also preserves the dimension slice
and chunk constraint rows for the chunk since those will be necessary
when enabling this with multinode and are needed to recreate the
chunk. The postgres objects associated with the chunk are all
dropped (table, constraints, indexes).
If data is ever reinserted into the same data region, the chunk is
recreated with the same dimension definitions as before. The postgres
objects are simply recreated.
Allow dropping raw chunks on the raw hypertable while keeping
the continuous aggregate. This allows for downsampling data
and allows users to save on TCO. We only allow dropping
such data when the dropped data is older than the
`ignore_invalidation_older_than` parameter on all the associated
continuous aggs. This ensures that any modifications to the
region of data that was dropped will never be reflected
in the continuous agg, and thus avoids semantic ambiguity
if chunks are dropped but then recreated again due to an
insert.
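For example, assuming every continuous aggregate on a hypothetical
`conditions` hypertable has `ignore_invalidation_older_than` set to three
months or less, a 1.x-style call like the following drops only raw chunks
older than that cutoff while keeping the aggregates:

```sql
-- Illustrative call shape; argument names/order may differ across versions.
SELECT drop_chunks(interval '3 months', 'conditions');
```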
Before we drop a chunk, we need to make sure to process any
continuous aggregate invalidations that were registered on
data inside the chunk. Thus we add an option to materialization
to perform materialization transactionally, to only process
invalidations, and to process invalidations only before a timestamp.
We fix drop_chunks and the policy to properly process
`cascade_to_materialization` as a tri-state variable (unknown,
true, false). Existing policy rows change false to NULL
(unknown), while true stays true since it was explicitly set.
Remove the form data for bgw_policy_drop_chunk because there
is no good way to represent the tri-state variable in the
form data.
When dropping chunks with cascade_to_materialization = false, all
invalidations on the chunks are processed before dropping the chunk.
If we are so far behind that even the completion threshold is inside
the chunks being dropped, we error. There are two reasons that we error:
1) We can't safely process new ranges transactionally without taking
heavy weight locks and potentially locking the entire system.
2) If a completion threshold is that far behind, the system probably has
some serious issues anyway.
We added a timescaledb.ignore_invalidation_older_than parameter for
continuous aggregates. This parameter accepts a time interval (e.g. 1
month). If set, it limits the amount of time for which to process
invalidations. Thus, if
timescaledb.ignore_invalidation_older_than = '1 month'
then any modifications for data older than 1 month from the current
timestamp at insert time will not cause updates to the continuous
aggregate. This limits the amount of work that a backfill can trigger.
This parameter must be >= 0. A value of 0 means that invalidations are
never processed.
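An illustrative 1.x-style continuous aggregate using the parameter (the
`conditions` table and its columns are made up):

```sql
CREATE VIEW conditions_hourly
    WITH (timescaledb.continuous,
          timescaledb.ignore_invalidation_older_than = '1 month')
AS
SELECT time_bucket('1 hour', time) AS bucket,
       device_id,
       avg(temperature) AS avg_temp
FROM conditions
GROUP BY bucket, device_id;
```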
When recording invalidations for the hypertable at insert time, we use
the maximum ignore_invalidation_older_than of any continuous agg attached
to the hypertable as a cutoff for whether to record the invalidation
at all. When materializing a particular continuous agg, we use that
agg's ignore_invalidation_older_than cutoff. However, we have to apply
that cutoff relative to the insert time, not the materialization
time, to make it easier for users to reason about. Therefore,
we record the insert time as part of the invalidation entry.
Type functions have to be CREATE OR REPLACED on every update
since they need to point to the correct .so. Thus,
we split the type definitions into pre, functions, and post parts
and rerun the functions part both on pre_install and on every
update.
This simplifies the code and the access to the min/max
metadata. Before we used a custom type, but now the min/max
are just the same type as the underlying column and stored as two
columns.
This also removes the custom type that was used before.
This commit adds handling for dropping of chunks and hypertables
in the presence of associated compressed objects. If the uncompressed
chunk/hypertable is dropped, then the associated compressed object is
dropped using DROP_RESTRICT unless cascading is explicitly enabled.
Also add a compressed_chunk_id index on compressed tables for
figuring out whether a chunk is compressed or not.
Change a bunch of APIs to use DropBehavior instead of a cascade bool
to be more explicit.
Also test the drop chunks policy.
Add the type for min/max segment meta object. Segment metadata
objects keep metadata about data in segments (compressed rows).
The min/max variant keeps the min and max values inside the compressed
object. It will be used on compression order by columns to allow
queries that have quals on those columns to be able to exclude entire
segments if no uncompressed rows in the segment may match the qual.
We also add generalized infrastructure for datum serialization
/ deserialization for arbitrary types to and from memory as well
as binary strings.
Add support for compress_chunks function.
This also adds support for compress_orderby and compress_segmentby
parameters in ALTER TABLE. These parameters are used by the
compress_chunks function.
The parsing code will most likely be changed to use PG raw_parser
function.
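Illustrative usage following this description (table, column, and chunk
names are made up; the `timescaledb.` option prefix and the exact call shape
of `compress_chunks` are assumptions):

```sql
ALTER TABLE conditions SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'device_id',
    timescaledb.compress_orderby   = 'time DESC'
);

-- Compress a single chunk (chunk name is illustrative).
SELECT compress_chunks('_timescaledb_internal._hyper_1_1_chunk');
```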
This commit introduces 4 compression algorithms
as well as 3 ADTs to support them. The compression
algorithms are time-series optimized. The following
algorithms are implemented:
- DeltaDelta compresses integer and timestamp values
- Gorilla compresses floats
- Dictionary compression handles any data type
and is optimized for low-cardinality datasets.
- Array stores any data type in an array-like
structure and does not actually compress it (though
TOAST-based compression can be applied on top).
These compression algorithms are fully described in
tsl/src/compression/README.md.
The Abstract Data Types that are implemented are
- Vector - A dynamic vector that can store any type.
- BitArray - A dynamic vector to store bits.
- SimpleHash - A hash table implementation from PG12.
More information can be found in
src/adts/README.md
This commit implements functionality for users to give a custom
definition of now() for integer open dimension typed hypertables.
Such a now() function enables us to talk about intervals in the context
of hypertables with integer time columns. In order to simplify future
code, this commit defines a custom ts_interval type that unites the
usual postgres intervals and integer time dimension intervals under a
single composite type.
The commit also enables adding drop chunks policy on hypertables with
integer time dimensions if a custom now() function has been set.
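A sketch of the intended usage, assuming the registration function is named
`set_integer_now_func` (as in released TimescaleDB) and using a made-up
`events` hypertable with a bigint time column; call shapes are illustrative:

```sql
-- Custom "now" for an integer time dimension: current Unix epoch in seconds.
CREATE FUNCTION unix_now() RETURNS bigint
    LANGUAGE SQL STABLE AS $$ SELECT extract(epoch FROM now())::bigint $$;

SELECT set_integer_now_func('events', 'unix_now');

-- With a custom now() in place, a drop-chunks policy can use an integer
-- "older than" threshold (here roughly 30 days worth of seconds).
SELECT add_drop_chunks_policy('events', 2592000);
```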
The primary key on continuous_aggs_materialization_invalidation_log
prevents multiple records with the same materialization id. Remove
the primary key to fix this problem.
This change renames _timescaledb_catalog.telemetry_metadata to
_timescaledb_catalog.metadata. It also adds a new boolean column to this
table which is used to flag data that should be included in telemetry.
It also renames the src/telemetry/metadata.{h,c} files to
src/telemetry/telemetry_metadata.{h,c} and updates the API to reflect
this. Finally, it includes the logic to use the new boolean column
when populating the telemetry parse state.
This commit adds a cascade_to_materializations flag to the scheduled
version of drop_chunks that behaves much like the one from manual
drop_chunks: if a hypertable that has a continuous aggregate tries to
drop chunks, and this flag is not set, the chunks will not be dropped.
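For example (illustrative call; the hypertable name is made up and the
parameter spelling follows this commit):

```sql
-- Let the policy drop chunks of 'conditions' even though it has a
-- continuous aggregate, cascading the drop to the materialization.
SELECT add_drop_chunks_policy('conditions', interval '6 months',
                              cascade_to_materializations => true);
```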
Add a setting max_materialized_per_run which can be set to prevent a
continuous aggregate from materializing too much of the table in a
single run. This will prevent a single run from locking the hypertable
for too long, when running on a large data set.
Add the query definition to
timescaledb_information.continuous_aggregates.
The user query (specified in the CREATE VIEW stmt of a continuous
aggregate) is transformed in the process of creating a continuous
aggregate and this modified query is saved in the pg_rewrite catalog
tables. In order to display the original query, we create an internal
view which is a replica of the user query. This is used to display the
definition in timescaledb_information.continuous_aggregates.
As an alternative we could save the original user query in our internal
catalogs. But this approach involves replicating a lot of postgres code
and causes portability problems.
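For example, the stored definition can be inspected with a query like the
following (the `view_definition` column name is an assumption):

```sql
SELECT view_name, view_definition
FROM timescaledb_information.continuous_aggregates;
```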
The data in caggs needs to survive dump/restore. This
test makes sure that caggs that are materialized both
before and after restore are correct.
Two code changes were necessary to make this work:
1) the valid_job_type constraint on bgw_job needed to be altered to add
'continuous_aggregate' as a valid job type
2) The user_view_query field needed to be changed to text because
dump/restore does not support pg_node_tree.