Allow users to specify that ranges (min/max values) be tracked
for a specific column using the enable_column_stats() API. We
will store such min/max ranges in a new timescaledb catalog table
_timescaledb_catalog.chunk_column_stats. As of now we support tracking
min/max ranges for smallint, int, bigint, serial, bigserial, date,
timestamp and timestamptz data types. Support for other statistics, such
as bloom filters, will be added in the future.
We add an entry of the form (ht_id, invalid_chunk_id, col, -INF, +INF)
into this catalog to indicate that min/max values need to be calculated
for this column for the chunks of a given hypertable. We also iterate
through existing chunks and add -INF, +INF entries for them in the
catalog. This allows for selection of these chunks by default since no
min/max values have been calculated for them.
The actual min/max start/end range is calculated later; for now, one of
the entry points is compression. The range is stored in
start (inclusive) and end (exclusive) form. If DML happens into a
compressed chunk, then as part of marking it as partial we also mark
the corresponding catalog entries as "invalid", so such partial chunks
are no longer excluded. When recompression happens we get the new
min/max ranges from the uncompressed portion and try to reconcile the
ranges in the catalog based on these new values. This is safe to do in
the case of INSERTs and UPDATEs. In the case of DELETEs, since we are
deleting rows, it's possible that the min/max ranges shrink, but as of
now we err on the side of caution and retain the earlier values, which
can be wider than the actual range.
We can thus store the min/max values for such columns in this catalog
table at the per-chunk level. Note that these min/max range values do
not participate in partitioning of the data. Such data ranges will be
used for chunk pruning if the WHERE clause of an SQL query specifies
ranges on such a column.
Note that the executor's startup-time chunk exclusion logic is also able
to use this metadata effectively.
A "DROP COLUMN" on a column with a statistics tracking enabled on it
ends up removing all relevant entries from the catalog tables.
A "decompress_chunk" on a compressed chunk removes its entries from the
"chunk_column_stats" catalog table since now it's available for DML.
Also a new "disable_column_stats" API has been introduced to allow
removal of min/max entries from the catalog for a specific column.
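A minimal usage sketch of the enable/disable APIs and the kind of query
that benefits from the tracked ranges (table and column names are
illustrative, and the exact function signatures may differ):

-- Track min/max ranges for the "device_id" column of a hypertable
SELECT enable_column_stats('conditions', 'device_id');

-- Range predicates on that column can now prune chunks whose stored
-- min/max range does not overlap the requested interval
SELECT * FROM conditions WHERE device_id BETWEEN 100 AND 200;

-- Stop tracking and remove the min/max entries from the catalog
SELECT disable_column_stats('conditions', 'device_id');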
This is a small refactoring for getting the time bucket function Oid from
a view definition. It will be necessary for following PRs to
completely remove the unnecessary catalog metadata table
`continuous_aggs_bucket_function`.
Also added a new SQL function `cagg_get_bucket_function_info` to return
all `time_bucket` information based on a user view definition.
In #6624 we refactored the time bucket catalog table to make it more
generic and save information for all Continuous Aggregates. Previously
it stored only variable bucket size information.
The problem is we used the `regprocedure` type to store the OID of the
given time bucket function but unfortunately it is not supported by
`pg_upgrade`.
Fixed it by changing the column to TEXT and resolving to/from the OID
using the built-in `regprocedurein` and `format_procedure_qualified`
functions.
Fixes #6935
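As an illustration of the text-based storage described above, a
procedure signature round-trips between text and OID (a minimal sketch,
assuming TimescaleDB's time_bucket(interval, timestamptz) is installed
in the public schema):

-- Text signature -> OID (via regprocedurein) -> back to qualified text
SELECT 'public.time_bucket(interval,timestamptz)'::regprocedure::text;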
We shouldn't reuse job ids, so that it stays easy to recognize the
log entries for a given job. We also need to keep the old job around
so as not to break loading dumps from older versions.
The function time_bucket_ng is deprecated. This PR adds a migration path
for existing CAggs. Since time_bucket and time_bucket_ng use different
origin values, a custom origin is set where needed so that time_bucket
creates the same buckets that time_bucket_ng has created so far.
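As a hedged illustration (the necessary origin depends on the bucket
width; table and column names are illustrative): weekly time_bucket_ng
buckets are aligned to 2000-01-01, whereas plain time_bucket aligns
7-day buckets to 2000-01-03, so the migration can pin the old alignment
with an explicit origin:

-- Reproduce time_bucket_ng('7 days', ts) buckets with time_bucket
SELECT time_bucket(INTERVAL '7 days', ts, origin => '2000-01-01'::timestamptz)
FROM conditions;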
In #6767 we introduced the ability to track job execution history
including succeeded and failed jobs.
The new metadata table `_timescaledb_internal.bgw_job_stat_history` has
two JSONB columns, `config` (storing config information) and `error_data`
(storing the ErrorData information). The problem is that this approach is
not flexible for future changes to history recording, so this PR refactors
the current implementation to use only one JSONB column named `data`
that will store more job information in the following form:
{
  "job": {
    "owner": "fabrizio",
    "proc_name": "error",
    "scheduled": true,
    "max_retries": -1,
    "max_runtime": "00:00:00",
    "proc_schema": "public",
    "retry_period": "00:05:00",
    "initial_start": "00:05:00",
    "fixed_schedule": true,
    "schedule_interval": "00:00:30"
  },
  "config": {
    "bar": 1
  },
  "error_data": {
    "domain": "postgres-16",
    "lineno": 841,
    "context": "SQL statement \"SELECT 1/0\"\nPL/pgSQL function error(integer,jsonb) line 3 at PERFORM",
    "message": "division by zero",
    "filename": "int.c",
    "funcname": "int4div",
    "proc_name": "error",
    "sqlerrcode": "22012",
    "proc_schema": "public",
    "context_domain": "plpgsql-16"
  }
}
In #4678 we added an interface for troubleshooting job failures by
logging them in the metadata table `_timescaledb_internal.job_errors`.
With this PR we extended the existing interface to also store succeeded
executions. A new GUC named `timescaledb.enable_job_execution_logging`
was added to control this new behavior; its default value is `false`.
We renamed the metadata table to `_timescaledb_internal.bgw_job_stat_history`
and added a new view `timescaledb_information.job_history` so that users
with enough permissions can check the job execution history.
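A minimal usage sketch (assuming sufficient privileges; depending on the
GUC's context it may instead need to be set via ALTER SYSTEM or
postgresql.conf plus a configuration reload):

-- Opt in to recording successful executions as well as failures
SET timescaledb.enable_job_execution_logging = true;

-- Inspect the recorded executions
SELECT * FROM timescaledb_information.job_history;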
This changes the behavior of the CAgg catalog tables. From now on, all
CAggs that use a time_bucket function create an entry in the catalog
table continuous_aggs_bucket_function. In addition, the duplicate
bucket_width attribute is removed from the catalog table continuous_agg.
Historically, we have used an empty string for undefined values in the
catalog table continuous_aggs_bucket_function. Since #6624, the optional
arguments can be NULL. This patch cleans up the empty strings and
changes the logic to work with NULL values.
So far, bucket_origin was defined as a Timestamp but used as a
TimestampTz in many places. This commit changes this and unifies the
usage of the variable.
The catalog table continuous_aggs_bucket_function is currently only used
for variable bucket sizes. Information about the fixed-size buckets is
stored in the table continuous_agg only. This causes some problems
(e.g., we have redundant fields for the bucket_size, fixed-size buckets
with offsets are not supported, ...).
This commit is the first in a series of commits that refactor the catalog
for the CAgg time_bucket function. The goals are:
* Remove the CAgg redundant attributes in the catalog
* Create an entry in continuous_aggs_bucket_function for all CAggs
that use time_bucket
This first commit refactors the continuous_aggs_bucket_function table
and prepares it for more generic use. Not all attributes are used yet,
but this will change in follow-up PRs.
Historically we have preserved chunk metadata because the old format of
the Continuous Aggregate has the `chunk_id` column in the materialization
hypertable, so in order not to leave dangling chunk ids there we just
mark the chunks as dropped when dropping them.
In #4269 we introduced a new Continuous Aggregate format that doesn't
store the `chunk_id` in the materialization hypertable anymore, so it's
safe to also remove the metadata when dropping a chunk if all associated
Continuous Aggregates are in the new format.
Also added a post-update SQL script to cleanup unnecessary dropped chunk
metadata in our catalog.
Closes #6570
This patch deprecates the recompress_chunk procedure as all that
functionality is covered by compress_chunk now. This patch also adds a
new optional boolean argument to compress_chunk to force applying
changed compression settings to existing compressed chunks.
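A hedged sketch of the consolidated call (assuming the new boolean
argument is named `recompress`; the actual parameter name may differ):

-- Instead of the deprecated recompress_chunk(...), force the changed
-- compression settings to be applied to an already compressed chunk
SELECT compress_chunk('_timescaledb_internal._hyper_1_1_chunk', recompress => true);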
If a lot of chunks are involved then the current pl/pgsql function
to compute the size of each chunk via a nested loop is pretty slow.
Additionally, the current functionality makes a system call to get the
file size on disk for each chunk every time this function is called.
That again slows things down. We now have an approximate function which
is implemented in C to avoid the issues in the pl/pgsql function.
Additionally, this function also uses per backend caching using the
smgr layer to compute the approximate size cheaply. The PG cache
invalidation clears the cached size for a chunk when DML happens
into it. That size cache is thus able to get the latest size in a
matter of minutes. Also, due to the backend caching, any long running
session will only fetch latest data for new or modified chunks and can
use the cached data (which is calculated afresh the first time around)
effectively for older chunks.
Logging and caching related tables from the timescaledb extension
should not be dumped using pg_dump. Our scripts specify a few such
unwanted tables. Apart from being unnecessary, the "job_errors" table had
some restricted permissions, causing additional problems in pg_dump.
We no longer include such tables in dumps.
Fixes #5449
This patch changes the dump configuration for
_timescaledb_catalog.metadata to include all entries. To allow loading
logical dumps with this configuration, an insert trigger is added that
turns uniqueness conflicts into updates so the restore is not blocked.
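A hedged sketch of the general technique (the function and trigger names
here are illustrative, not the actual objects shipped by the extension):

-- Turn a duplicate-key INSERT into an UPDATE of the existing row
CREATE FUNCTION metadata_insert_as_update() RETURNS trigger AS
$$
BEGIN
    IF EXISTS (SELECT 1 FROM _timescaledb_catalog.metadata WHERE key = NEW.key) THEN
        UPDATE _timescaledb_catalog.metadata SET value = NEW.value WHERE key = NEW.key;
        RETURN NULL;  -- swallow the INSERT so no uniqueness conflict is raised
    END IF;
    RETURN NEW;
END
$$ LANGUAGE plpgsql;

CREATE TRIGGER metadata_restore_trigger
    BEFORE INSERT ON _timescaledb_catalog.metadata
    FOR EACH ROW EXECUTE FUNCTION metadata_insert_as_update();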
This patch implements changes to the compressed hypertable to allow per
chunk configuration. To enable this the compressed hypertable can no
longer be in an inheritance tree as the schema of the compressed chunk
is determined by the compression settings. While this patch implements
all the underlying infrastructure changes, the restrictions for changing
compression settings remain intact and will be lifted in a follow-up patch.
The extension state is not easily accessible in release builds, which
makes debugging issues with the loader very difficult. This commit
introduces a new schema `_timescaledb_debug` and makes the function
`ts_extension_get_state` available also in release builds as
`_timescaledb_debug.extension_state`.
See #1682
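For example, the state can now be inspected directly in a release build:

SELECT _timescaledb_debug.extension_state();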
Remove the code used by multinode to handle remote connections.
This patch completely removes tsl/src/remote and any remaining
distributed hypertable checks.
This patch drops the catalog table _timescaledb_catalog.hypertable_compression
and stores those settings in _timescaledb_catalog.compression_settings instead.
The storage format is changed: the new table has 1 entry per relation
instead of 1 entry per column and has no dependency on hypertables.
All other aspects of compression remain the same. This refactoring is
to enable per-chunk compression settings in a follow-up patch.
If a hypertable uses a non-default tablespace for its primary or
unique constraints with additional DEFERRABLE or INITIALLY DEFERRED
characteristics, then any chunk creation will fail with a syntax error.
We now set the tablespace via a separate command for such constraints
on the chunks.
Fixes #6338
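A hedged reproduction of the affected schema (tablespace and table names
are illustrative):

CREATE TABLE conditions (
    time timestamptz NOT NULL,
    device int,
    PRIMARY KEY (time, device) USING INDEX TABLESPACE history_space
        DEFERRABLE INITIALLY DEFERRED
);
SELECT create_hypertable('conditions', 'time');
-- Chunk creation now emits the tablespace as a separate command instead
-- of inlining it into the constraint definition.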
The retention and compression policies can now use drop_created_before
and compress_created_before arguments respectively to specify chunk
selection using their creation times.
We don't support creation times for CAggs, yet.
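For example (a sketch, assuming the standard policy functions):

-- Drop chunks that were created more than six months ago
SELECT add_retention_policy('conditions', drop_created_before => INTERVAL '6 months');

-- Compress chunks that were created more than seven days ago
SELECT add_compression_policy('conditions', compress_created_before => INTERVAL '7 days');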
With this function it is possible to execute the Continuous Aggregate
query validation over an arbitrary query string, without the need to
actually create the Continuous Aggregate.
It can be used, for example, to check the most frequent queries, perhaps
using `pg_stat_statements`, validate them, and check whether there are
queries that could potentially be turned into a Continuous Aggregate.
If users have accidentally been removed from `pg_authid` as a result of
bugs where dropping a user did not revoke privileges from all tables
where they had privileges, it will not be possible to create new chunks
since these require the user to be found when copying the privileges
for the parent table (either compressed hypertable or normal
hypertable).
To fix the situation, we repair the `pg_class` table when updating the
extension by modifying the `relacl` for relations and removing any user
that does not have an entry in `pg_authid`.
A repair function `_timescaledb_functions.repair_relation_acls` is
added that will perform the job. A version of `makeaclitem` from PG16
that accepts a comma-separated list of privileges and is used as part of
the repair is also added as `_timescaledb_functions.makeaclitem`.
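The repair can also be invoked by hand (a sketch; it is normally run as
part of the extension update):

SELECT _timescaledb_functions.repair_relation_acls();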
- Updated the show_chunks and drop_chunks APIs to get the affected
chunks using chunk creation time metadata, based on a
"date/time/interval"-like boundary specified for INTEGER
columns (see the sketch after this list).
- We honor the "integer_now" function if it's specified, so as to keep
backwards compatibility with the existing behavior.
Co-authored-by: Dipesh Pandit <dipesh@timescale.com>
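A hedged sketch (assuming the creation-time boundary is passed via
`created_before`-style arguments; the exact argument names may differ):

-- Hypertable partitioned on an INTEGER column, with chunks selected by
-- their creation time rather than by the partitioning column
SELECT show_chunks('sensor_data', created_before => now() - INTERVAL '7 days');
SELECT drop_chunks('sensor_data', created_before => now() - INTERVAL '30 days');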
When we compress a chunk, we create a new compressed chunk for storing
the compressed data. So far, the tuples were just inserted into the
compressed chunk and frozen by a later vacuum run.
However, freezing tuples causes WAL activity, which can be avoided
because the compressed chunk is created in the same transaction as the tuples.
This patch reduces the WAL activity by storing these tuples directly as
frozen and preventing a freeze operation in the future. This approach is
similar to PostgreSQL's COPY FREEZE.
- Added a creation_time attribute to the timescaledb catalog table "chunk".
Also updated the corresponding view timescaledb_information.chunks to
include a chunk_creation_time attribute (see the example after this list).
- A newly created chunk is assigned the creation time during chunk
creation to handle a new partition range for a given dimension (Time/
SERIAL/BIGSERIAL/INT/...).
- In the case of an already existing chunk, the creation time is updated
as part of running the upgrade script. The current timestamp (now()) at
the time of the upgrade is assigned as the chunk creation time.
- Similarly, the downgrade script is updated to drop the attribute
creation_time from the catalog table "chunk".
- All relevant queries/views/test output have been updated accordingly.
Co-authored-by: Nikhil Sontakke <nikhil@timescale.com>
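For example:

SELECT chunk_name, chunk_creation_time
FROM timescaledb_information.chunks
WHERE hypertable_name = 'conditions';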
The current hypertable creation interface is heavily focused on a time
column, but since hypertable partitioning is not limited to time
columns, we introduce a more generic API that supports different
types of keys for partitioning.
The new interface introduces new versions of create_hypertable,
add_dimension, and a replacement function `set_partitioning_interval`
that replaces `set_chunk_time_interval`. The new functions accept an
instance of dimension_info that can be constructed using constructor
functions `by_range` and `by_hash`, allowing a more versatile and
future-proof API.
For example:
SELECT create_hypertable('conditions', by_range('time'));
SELECT add_dimension('conditions', by_hash('device'));
The old API remains, but will eventually be deprecated.
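The replacement for `set_chunk_time_interval` follows the same pattern
(a sketch; the exact signature may differ):

SELECT set_partitioning_interval('conditions', INTERVAL '1 day');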
This commit introduces a function `hypertable_osm_range_update`
in the _timescaledb_functions schema. This function is meant to serve
as an API call for the OSM extension to update the time range
of a hypertable's OSM chunk with the min and max values present
in the contiguous time range its tiered chunks span.
If the range is not contiguous, then it must be set to the invalid
range an OSM chunk is assigned upon creation.
A new status field is also introduced in the hypertable catalog
table to keep track of whether the ranges covered by tiered and
non-tiered chunks overlap.
When no overlap is detected, it is possible to apply the
Ordered Append optimization in the presence of OSM chunks.
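A hedged sketch of the intended call from OSM (the exact signature is an
assumption here; it may differ):

-- Assumed shape: (hypertable, new range start, new range end)
SELECT _timescaledb_functions.hypertable_osm_range_update(
    'metrics', '2020-01-01'::timestamptz, '2021-01-01'::timestamptz);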
With timescaledb 2.12 all the functions present in _timescaledb_internal
were moved into the _timescaledb_functions schema to improve schema
security. This patch adds a compatibility layer so that external callers
of these internal functions will not break, allowing for more
flexibility when migrating.