Since the job error log can contain information from many different
sources and also from many different jobs it is important to ensure
that visibility of the job error log entries is restricted to job
owners.
This commit extends the view `timescaledb_information.job_errors` with
role-based checks so that a user can only see entries for jobs that she
has permission to view, and restricts the permissions on
`_timescaledb_internal.job_errors` so that users can only view the job
error log through the view. A special case is added so that the
superuser and the database owner can see all log entries, even if there
is no associated job id with the log entry.
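As a sketch of the intended usage, a job owner can inspect the failures
of her own jobs through the view, while direct access to the underlying
table is restricted (column names are illustrative of the log contents):
  SELECT job_id, sqlerrcode, err_message
  FROM timescaledb_information.job_errors;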
Closes #5217
Enable users to create Hierarchical Continuous Aggregates (aka Continuous
Aggregates on top of other Continuous Aggregates).
With this PR users can create multiple levels of aggregation granularity
in Continuous Aggregates, making the refresh process even faster.
A problem with this feature is that at upper levels we can end up with
the "average of averages". To get the "real average" we can instead rely
on the "stats_agg" TimescaleDB Toolkit function, which calculates and
stores the partials that can be finalized with other Toolkit functions
like "average" and "sum".
Closes #1400
Add a new function, `alter_data_node()`, which can be used to change
the data node's configuration originally set up via `add_data_node()`
on the access node.
The new function introduces a new option "available" that allows
configuring the availability of the data node. Setting
`available=>false` means that the node should no longer be used for
reads and writes. Only read "failover" is implemented as part of this
change, however.
To fail over reads, the alter data node function finds all the chunks
for which the unavailable data node is the "primary" query target and
"fails over" to a chunk replica on another data node instead. If some
chunks do not have a replica to fail over to, a warning will be
raised.
When a data node is available again, the function can be used to
switch back to using the data node for queries.
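For illustration, assuming a data node named `data_node_1` that was
previously set up with `add_data_node()`:
  -- Mark the node as unavailable; reads fail over to chunk replicas
  SELECT * FROM alter_data_node('data_node_1', available => false);
  -- Once the node is reachable again, use it for queries again
  SELECT * FROM alter_data_node('data_node_1', available => true);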
Closes #2104
This patch adds two new fields to the telemetry report,
`stats_by_job_type` and `errors_by_sqlerrcode`. Both report results
grouped by job type (different types of policies or
user-defined actions).
The patch also adds a new field to the `bgw_job_stats` table,
`total_duration_errors` to separate the duration of the failed runs
from the duration of successful ones.
This change introduces a new option to the compression procedure which
decouples the uncompressed chunk interval from the compressed chunk
interval. It does this by allowing multiple uncompressed chunks into one
compressed chunk as part of the compression procedure. The main use-case
is to allow much smaller uncompressed chunks than compressed ones. This
has several advantages:
- Reduce the size of btrees on uncompressed data (thus allowing faster
inserts because those indexes are memory-resident).
- Decrease disk-space usage for uncompressed data.
- Reduce number of chunks over historical data.
From a UX point of view, we simply add a compression WITH clause option
`compress_chunk_time_interval`. The user should set that according to
their needs for constraint exclusion over historical data. Ideally, it
should be a multiple of the uncompressed chunk interval and so we throw
a warning if it is not.
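A minimal sketch, assuming a hypertable `metrics` with a 1 hour chunk
interval (the table name is illustrative):
  ALTER TABLE metrics SET (
    timescaledb.compress,
    timescaledb.compress_chunk_time_interval = '24 hours'
  );
Here 24 hours is a multiple of the 1 hour uncompressed chunk interval,
so no warning is raised.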
Currently, the next start of a scheduled background job is
calculated by adding the `schedule_interval` to its finish
time. This does not allow scheduling jobs to execute at fixed
times, as the next execution is "shifted" by the job duration.
This commit introduces the option to execute a job on a fixed
schedule instead. Users are expected to provide an initial_start
parameter on which subsequent job executions are aligned. The next
start is calculated by computing the next time_bucket of the finish
time with initial_start origin.
An `initial_start` parameter is added to the compression, retention,
reorder and continuous aggregate `add_policy` signatures. By passing
that upon policy creation, users indicate that the policy will execute
on a fixed schedule, or on a drifting schedule if `initial_start` is not
provided.
To allow users to pick a drifting schedule when registering a UDA,
an additional parameter `fixed_schedule` is added to `add_job`;
setting it to false specifies the old behavior.
Additionally, an optional TEXT parameter, `timezone`, is added to both
add_job and add_policy signatures, to address the 1-hour shift in
execution time caused by DST switches. As internally the next start of
a fixed schedule job is calculated using time_bucket, the timezone
parameter allows using timezone-aware buckets to calculate
the next start.
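As a sketch of the resulting API (the procedure name is hypothetical):
  -- Fixed schedule aligned to initial_start, using timezone-aware buckets
  SELECT add_job('my_maintenance_proc', '1 day',
                 initial_start => '2022-01-01 01:00:00 UTC',
                 timezone => 'Europe/Berlin');
  -- Old, drifting behavior
  SELECT add_job('my_maintenance_proc', '1 day', fixed_schedule => false);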
This commit gives more visibility into job failures by making the
information regarding a job runtime error available in an extension
table (`job_errors`) that users can directly query.
This commit also adds an informational view on top of the table for
convenience.
To prevent the `job_errors` table from growing too large,
a retention job is also set up with a default retention interval
of 1 month. The retention job is registered with a custom check
function that requires that a valid "drop_after" interval be provided
in the config field of the job.
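For instance, the retention interval of the error-log cleanup job can
be tuned through its config (the job id below is hypothetical):
  -- Keep job error log entries for two weeks instead of the default month
  SELECT alter_job(2, config => '{"drop_after": "2 weeks"}');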
The primary key for compression_chunk_size was defined as (chunk_id,
compressed_chunk_id), but other places assumed chunk_id is actually
unique and would error when it was not. Since it makes no sense to
have multiple entries per chunk, as any extra reference would point to
a no longer existing chunk, this patch changes the primary key to
chunk_id only.
Timescale 2.7 released a new version of Continuous Aggregate (#4269)
that stores the final aggregation state instead of the byte array of
the partial aggregate state, offering multiple opportunities for
optimization as well as a more compact form.
When upgrading to Timescale 2.7, newly created Continuous Aggregates
use the new format, but existing Continuous Aggregates keep using the
format they were defined with.
This change adds a procedure to upgrade existing Continuous Aggregates
from the old format to the new one with a simple call:
test=# CALL cagg_migrate('conditions_summary_daily');
Closes #4424
The old patch was using old validation functions, but there are already
validation functions that both read and validate the policy, so use
those instead. Also remove the old `job_config_check` function, since it
is no longer used, and instead add a `job_config_check` that calls the
checking function with the configuration.
OSM chunks manage their own ranges, and the TimescaleDB catalog has
dummy ranges for these dimensions, so the chunk exclusion logic cannot
rely on the TimescaleDB catalog metadata to exclude an OSM chunk.
Add a new metadata table `dimension_partition` which explicitly and
statefully details how a space dimension is split into partitions, and
(in the case of multi-node) which data nodes are responsible for
storing chunks in each partition. Previously, partitions and data nodes
were assigned dynamically based on the current state when creating a
chunk.
This is the first in a series of changes that will add more advanced
functionality over time. For now, the metadata table simply writes out
what was previously computed dynamically in code. Future code changes
will alter the behavior to do smarter updates to the partitions when,
e.g., adding and removing data nodes.
The idea of the `dimension_partition` table is to minimize changes in
the partition to data node mappings across various events, such as
changes in the number of data nodes, number of partitions, or the
replication factor, which affect the mappings. For example, increasing
the number of partitions from 3 to 4 currently leads to redefining all
partition ranges and data node mappings to account for the new
partition. Complete repartitioning can be disruptive to multi-node
deployments. With stateful mappings, it is possible to split an
existing partition without affecting the other partitions (similar to
partitioning using consistent hashing).
Note that the dimension partition table expresses the current state of
space partitions; i.e., the space-dimension constraints and data nodes
to be assigned to new chunks. Existing chunks are not affected by
changes in the dimension partition table, although an external job
could rewrite, move, or copy chunks as desired to comply with the
current dimension partition state. As such, the dimension partition
table represents the "desired" space partitioning state.
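For illustration, the current mappings can be inspected directly in the
metadata table (a sketch; assuming the table lives in the
`_timescaledb_catalog` schema):
  SELECT * FROM _timescaledb_catalog.dimension_partition;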
Part of #4125
In `src/ts_catalog/catalog.c` we explicitly define some constraint and
index names in the `catalog_table_index_definitions` array, but in our
pre-install SQL script for the schema definition we don't, so let's be
more explicit here and prevent future surprises.
The first step to remove re-aggregation for Continuous Aggregates
is to remove the `chunk_id` column from the materialization hypertable.
Also add a new metadata column named `finalized` to the `continuous_agg`
catalog table in order to store information about the new finalized
version of Continuous Aggregates that will not need the partials
anymore. This flag is important to maintain backward compatibility with
the previous Continuous Aggregate implementation, which requires the
`chunk_id` to refresh data properly.
Postgres will prepend pg_temp to the effective search_path if it
is not explicitly present in the search_path. While pg_temp will never
be used to look up functions or operators unless explicitly requested,
it will be used to look up relations. Putting pg_temp explicitly at the
end of the search_path makes sure objects in pg_temp are considered
last and pg_temp cannot be used to mask existing objects.
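A minimal sketch of the resulting pattern (the exact statements used in
the scripts may differ):
  -- pg_temp listed explicitly and last, so it cannot mask other objects
  SET search_path TO pg_catalog, pg_temp;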
Improve the performance of metadata scanning during hypertable
expansion.
When a hypertable is expanded to include all children chunks, only the
chunks that match the query restrictions are included. To find the
matching chunks, the planner first scans for all matching dimension
slices. The chunks that reference those slices are the chunks to
include in the expansion.
This change optimizes the scanning for slices by avoiding repeated
open/close of the dimension slice metadata table and index.
At the same time, related dimension slice scanning functions have been
refactored along the same lines.
An index on the chunk constraint metadata table is also changed to
allow scanning on dimension_slice_id. Previously, dimension_slice_id
was the second key in the index, which made scans on this key less
efficient.
This patch locks down search_path in extension install and update
scripts to only contain pg_catalog; this requires that any reference
in those scripts be fully qualified. Additionally, we add explicit
create commands to all update scripts for objects added to the
public schema. This change will make update scripts fail if a
function with an identical signature already exists when installing
or upgrading, instead of reusing the existing object.
TimescaleDB was vulnerable to a privilege escalation attack in
the extension installation script. An attacker could precreate
objects normally owned by the extension and get those objects
used in the installation script since the script would only try
to create them if they did not already exist. Thanks to Pedro
Gallegos for reporting the problem.
This patch changes the schema, table and function creation to fail
and abort the installation when the object already exists instead
of using the existing object.
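A sketch of the pattern change (the schema name is used for
illustration only):
  -- Before: silently reuses a pre-created, possibly attacker-owned object
  CREATE SCHEMA IF NOT EXISTS _timescaledb_internal;
  -- After: the installation aborts if the object already exists
  CREATE SCHEMA _timescaledb_internal;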
Security: CVE-2022-24128
This patch allows using time_bucket_ng("N month", ...) in CAGGs. Users can also
specify years, or months AND years. CAGGs on top of distributed hypertables
are supported as well.
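A minimal sketch, assuming a `conditions` hypertable with `time` and
`temperature` columns and that time_bucket_ng is referenced via the
timescaledb_experimental schema:
  CREATE MATERIALIZED VIEW conditions_monthly
  WITH (timescaledb.continuous) AS
  SELECT timescaledb_experimental.time_bucket_ng('1 month', time) AS bucket,
         avg(temperature) AS avg_temp
  FROM conditions
  GROUP BY timescaledb_experimental.time_bucket_ng('1 month', time);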
A chunk copy/move operation is carried out in stages and it can
fail in any of them. We track the last completed stage in the
"chunk_copy_operation" catalog table. In case of failure, a
"chunk_copy_cleanup" function can be invoked to bring the chunk back
to its original state on the source data node, and all transient
objects like replication slots, publications, subscriptions, empty
chunks, metadata updates, etc. are cleaned up.
Includes test case changes for failures induced at each and every stage.
To avoid confusion between chunk copy activity and chunk copy operation,
this patch also consistently uses "operation" everywhere now instead of
"activity".
Remove copy_chunk_data() function and code needed to support it,
such as the 'transactional' argument.
Rework copy chunk logic using separate stages.
Introduce the copy_chunk() API function as an internal wrapper for
move_chunk().
The building blocks required for implementing end-to-end copy/move
chunk functionality have now been wrapped in a procedure.
A procedure is required because multiple transactions are needed to
carry out the activity across the access node and the involved two data
nodes.
The following steps are encapsulated in this procedure:
1) Create an empty chunk table on the destination data node
2) Copy the data from the src data node chunk to this newly created
destination node chunk. This is done via inbuilt PostgreSQL logical
replication functionality
3) Attach this chunk to the hypertable on the dst data node
4) Remove this chunk from the src data node to complete the move if
requested
A new catalog table "chunk_copy_activity" has been added to track
the progress of the above stages. A unique id gets assigned to each
activity and it is updated with the completed stages as things
progress.
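As a sketch of how the procedure is intended to be invoked (the schema,
procedure name, and chunk name here are assumptions and may differ):
  CALL timescaledb_experimental.move_chunk(
    chunk => '_timescaledb_internal._dist_hyper_1_1_chunk',
    source_node => 'data_node_1',
    destination_node => 'data_node_2');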
The `replication_factor` is set to `-1` on hypertables that are
created on data nodes as part of a larger distributed
hypertable. However, the check constraint on the hypertable metadata
table doesn't allow such values, causing update scripts to fail when
this check constraint is recreated as part of updating to version
`2.0.0-rc4`.
The reason it is possible to insert violating rows is that check
constraints aren't validated when inserting data using PostgreSQL's
internal catalog functions (in C). Therefore, the violating row can
exist until one tries to update a data node to `2.0.0-rc4`, at which
point the update script tries to recreate the `hypertable` metadata
table due to other changes that were made to the table.
This change fixes the check constraint to account for `-1` as a valid
value, and also changes the update scripts to account for the new
check constraint so that updates to the latest version will no longer
fail.
1. Add a compression_state column to the hypertable catalog table
by renaming its compressed column. compression_state is a tri-state
column: it indicates whether the hypertable has compression enabled
(value = 1) or whether it is an internal compression table (value = 2).
2. Save compression settings on the access node when compression
is turned on for a distributed hypertable.
For a distributed hypertable that has compression enabled,
compression_state is set. We don't create any internal tables
on the access node.
Fixes #2660
The `modification_time` column is hard to maintain with any level of
consistency over merges and splits of invalidation ranges so this
commit removes it from the invalidation log entries for both
hypertables and continuous aggregates. If the modification time is
needed in the future, we need to re-introduce it in a manner that can
maintain it over both merges and splits.
The function `ts_get_now_internal` is also removed since it is not used
anymore.
Part of #2521
This patch splits the timescaledb_fdw sql file into two parts to
separate the idempotent parts from the non-idempotent ones so
the function definitions can be included in the regular update
script.
This change removes the catalog options `refresh_lag`,
`max_interval_per_job` and `ignore_invalidation_older_than`, which are
no longer used.
Closes #2396
The parameter `cascade_to_materialization` is removed from
`drop_chunks` and `add_drop_chunks_policy` as well as associated tables
and test functions.
Fixes #2137
This patch adds a proc_name, proc_schema, hypertable_id index to
bgw_job. Three functions using the new index are added as well:
ts_bgw_job_find_by_proc
ts_bgw_job_find_by_hypertable_id
ts_bgw_job_find_by_proc_and_hypertable_id
These functions are required for migrating the existing policies
to store their configuration in bgw_job directly.
This commit removes the `cascade` option from the function
`drop_chunks` and `add_drop_chunk_policy`, which will now never cascade
drops to dependent objects. The tests are fixed accordingly and
verbosity turned up to ensure that the dependent objects are printed in
the error details.
The timescale clustering code so far has been written referring to the
remote databases as 'servers'. This terminology is a bit overloaded,
and in particular we don't enforce any network topology limitations
that the term 'server' would suggest. In light of this we've decided
to change to use the term 'node' when referring to the different
databases in a distributed database. Specifically we refer to the
frontend as an 'access node' and to the backends as 'data nodes',
though we may omit the access or data qualifier where it's unambiguous.
As the vast bulk of the code so far has been written for the case where
there was a single access node, almost all instances of 'server' were
references to data nodes. This change has updated the code to rename
those instances.
This functionality enables users to block or allow creation of new
chunks on a data node for one or more hypertables. Use cases for this
include the ability to block new chunks when a data node is running
low on disk space or to affect chunk distribution across data nodes.
Sometimes blocking data nodes for new chunks can make a hypertable
under-replicated. For that case an additional argument `force => true`
can be supplied to force blocking new chunks.
Here are some examples.
Block for a specific hypertable:
`SELECT * FROM block_new_chunks_on_server('server_1', 'disttable');`
Block for all hypertables on the server:
`SELECT * FROM block_new_chunks_on_server('server_1', force => true);`
Unblock:
`SELECT * FROM allow_new_chunks_on_server('server_1', true);`
This change adds the `force` argument to `detach_server` as well. If
detaching or blocking new chunks will make a hypertable
under-replicated, then `force => true` needs to be used.
This commit adds the ability to resolve whether or not 2PC
transactions have been committed or aborted and also adds a heal
function to resolve transactions that have been prepared but not
committed or rolled back.
This commit also removes the server id of the primary key on the
remote_txn table and adds another index. This was done because the
`remote_txn_persistent_record_exists` should not rely on the server
being contacted but should rather just check for the existence of the
id. This makes the resolution safe in setups where two frontend server
definitions point to the same database. While this may not be a
properly configured setup, it's better if the resolution process is
robust to this case.
The remote_txn table records commit decisions for 2pc transactions.
A successful 2pc transaction will have one row per remote connection
recorded in this table. In effect it is a mapping between the
distributed transaction and an identifier for each remote connection.
The records are needed to protect against crashes after a
frontend sends a `COMMIT TRANSACTION` to one node
but not all nodes involved in the transaction. Towards this end,
the commit of remote_txn rows represents a crash-safe, irrevocable
promise that all participating datanodes will eventually get a `COMMIT
TRANSACTION`, and it occurs before any datanode gets a `COMMIT TRANSACTION`.
The irrevocable nature of the commit of these records means that this
can only happen after the system is sure all participating transactions
will succeed. Thus it can only happen after all datanodes have succeeded
on a `PREPARE TRANSACTION` and will happen as part of the frontend's
transaction commit.
The remote transaction ID is used in two-phase commit. It is the
identifier sent to the datanodes in PREPARE TRANSACTION and related
PostgreSQL commands.
This is the first in a series of commits for adding two-phase
commit support to our distributed txn infrastructure.