During chunk creation, the chunk's dimensional CHECK constraints are
created via an "upcall" to PL/pgSQL code. However, creating
dimensional constraints in PL/pgSQL code sometimes fails, especially
during high-concurrency inserts, because PL/pgSQL code scans metadata
using a snapshot that might not see the same metadata as the C
code. As a result, chunk creation sometimes fails during constraint
creation.
To fix this issue, implement dimensional CHECK-constraint creation in
C code. Other constraints (FK, PK, etc.) are still created via an
upcall, but should probably also be rewritten in C. However, since
these constraints don't depend on recently updated metadata, this is
left to a future change.
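For illustration, a dimensional CHECK constraint on a chunk typically looks
like the following sketch (chunk name, constraint name, and time range are
hypothetical):

    -- Hypothetical dimensional CHECK constraint, now created directly in C
    -- during chunk creation instead of via the PL/pgSQL upcall.
    ALTER TABLE _timescaledb_internal._hyper_1_1_chunk
      ADD CONSTRAINT constraint_1 CHECK (
        "time" >= '2023-01-01 00:00:00+00' AND
        "time" <  '2023-01-08 00:00:00+00'
      );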
Fixes #5456
This patch introduces a C-function to perform the recompression at
a finer granularity instead of decompressing and subsequently
compressing the entire chunk.
This improves performance for the following reasons:
- it needs to sort less data at a time, and
- it avoids recreating the decompressed chunk and the associated
heap inserts by decompressing each segment into a tuplesort instead.
If no segmentby is specified when enabling compression or if an
index does not exist on the compressed chunk then the operation is
performed as before, decompressing and subsequently
compressing the entire chunk.
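As a sketch of the precondition, the finer-grained path can be used when a
segmentby column is configured, for example (table and column names are
hypothetical):

    -- Hypothetical setup: with a segmentby column configured (and the
    -- corresponding index on the compressed chunk), recompression can work
    -- segment by segment instead of on the whole chunk.
    ALTER TABLE metrics SET (
      timescaledb.compress,
      timescaledb.compress_segmentby = 'device_id',
      timescaledb.compress_orderby   = 'time DESC'
    );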
When calling the `cagg_watermark` function to get the watermark of a
Continuous Aggregate we execute a `SELECT MAX(time_dimension)` query
in the underlying materialization hypertable.
The problem is that a `SELECT MAX(time_dimension)` query can be
expensive because it scans all hypertable chunks, increasing the
planning time for Realtime Continuous Aggregates.
Fix this by creating a new catalog table that serves as a cache for
the current Continuous Aggregate watermark, updated in the following
situations:
- Create CAgg: store the minimum value of hypertable time dimension
data type;
- Refresh CAgg: store the last value of the time dimension materialized
in the underlying materialization hypertable (or the minimum value of
materialization hypertable time dimension data type if there's no
data materialized);
- Drop CAgg chunks: the same as Refresh CAgg.
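For reference, the watermark is read through the internal function roughly as
below; the materialization hypertable id is hypothetical and the function's
schema may differ between versions:

    -- The realtime CAgg view combines materialized data below the watermark
    -- with raw data at or above it; the watermark itself now comes from the
    -- cache table instead of a MAX() scan.
    SELECT _timescaledb_internal.cagg_watermark(2);  -- 2 = mat. hypertable id (hypothetical)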
Closes #4699, #5307
When adding new status values, we must make sure to add special
handling for these values to the downgrade script, since previous
versions will not know how to deal with them.
This patch allows unique constraints on compressed chunks. When
trying to INSERT into compressed chunks with unique constraints
any potentially conflicting compressed batches will be decompressed
to let postgres do constraint checking on the INSERT.
With this patch only INSERT ON CONFLICT DO NOTHING will be supported.
For decompression, only segmentby information is considered when
determining conflicting batches. This will be enhanced in a follow-up
patch to also include orderby metadata so that fewer batches need to
be decompressed.
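A minimal sketch of the supported pattern (table, columns, and constraint are
hypothetical):

    -- Hypothetical hypertable with a unique constraint and compression enabled.
    CREATE TABLE readings (
      time      timestamptz NOT NULL,
      device_id int         NOT NULL,
      value     float,
      UNIQUE (time, device_id)
    );
    SELECT create_hypertable('readings', 'time');
    ALTER TABLE readings SET (
      timescaledb.compress,
      timescaledb.compress_segmentby = 'device_id'
    );
    -- Potentially conflicting compressed batches are decompressed so that
    -- PostgreSQL can perform the constraint check; only DO NOTHING is
    -- supported by this patch.
    INSERT INTO readings VALUES ('2023-01-01', 1, 42.0)
      ON CONFLICT DO NOTHING;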
This patch changes INSERTs into compressed chunks to no longer
be immediately compressed but stored in the uncompressed chunk
instead and later merged with the compressed chunk by a separate
job.
This greatly simplifies the INSERT code path, as we no longer have
to rewrite the target of INSERTs and compress on the fly, leading
to a roughly 2x improvement in INSERT rate into compressed chunks.
Additionally this improves TRIGGER-support for INSERTs into
compressed chunks.
This is a necessary refactoring to allow UPSERT/UPDATE/DELETE on
compressed chunks in follow-up patches.
This patch changes the code that blocks frozen chunk
modifications to no longer use triggers but a custom node
instead. Frozen chunks are a TimescaleDB-internal concept and
should therefore not be protected by triggers, which are
external and create several hazards. First, triggers created
to protect internal state contend with user-created triggers.
Second, the trigger created to protect frozen chunks does not
work well with our restoring GUC, which we use when restoring
logical dumps. Third, triggers do not fire for internal
operations; they only work in code paths that explicitly added
trigger support.
If a datanode goes down for whatever reason then DML activity to
chunks residing on (or targeted to) that DN will start erroring out.
We now handle this by marking the target chunk as "stale" for this
DN by changing the metadata on the access node. This allows us to
continue to do DML to replicas of the same chunk data on other DNs
in the setup. This obviously will only work for chunks which have
"replication_factor" > 1. Note that chunks which do not undergo any
change will continue to carry the appropriate DN-related metadata on
the AN.
This means that such "stale" chunks will become under-replicated and
will need to be re-balanced using the copy_chunk functionality, for
example by a microservice or similar process.
Fixes #4846
This function drops chunks on a specified data node if those chunks are
not known by the access node.
Call drop_stale_chunks() automatically when a data node becomes
available again.
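A hedged sketch of a manual invocation, assuming the function is exposed as
`_timescaledb_internal.drop_stale_chunks(node_name)` (the actual schema and
signature may differ):

    -- Hypothetical: drop chunks on data node 'dn1' that the access node
    -- no longer knows about.
    SELECT _timescaledb_internal.drop_stale_chunks('dn1');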
Fix #4848
The commit 9f4dcea30135d1e36d1c452d631fc8b8743b3995 introduces frozen
chunks. Checking whether a chunk is frozen or not has been done so far
in the query planner. If it is not possible to determine which chunks
are affected by a query in the planner (e.g., due to a cast in the WHERE
condition), all chunks are checked. This leads (1) to an increased
planning time and (2) to the situation that a single frozen chunk could
reject queries, even if the frozen chunk is not addressed by the query.
A chunk is in this state when it is compressed but also has
uncompressed data in the uncompressed chunk. Individual tuples
can only ever exist in either area. This is a preparation patch
to add support for an uncompressed staging area for DML operations.
This change introduces a new option to the compression procedure which
decouples the uncompressed chunk interval from the compressed chunk
interval. It does this by allowing multiple uncompressed chunks into one
compressed chunk as part of the compression procedure. The main use-case
is to allow much smaller uncompressed chunks than compressed ones. This
has several advantages:
- Reduce the size of btrees on uncompressed data (thus allowing faster
inserts because those indexes are memory-resident).
- Decrease disk-space usage for uncompressed data.
- Reduce number of chunks over historical data.
From a UX point of view, we simply add a compression WITH clause option
`compress_chunk_time_interval`. The user should set it according to
their needs for constraint exclusion over historical data. Ideally, it
should be a multiple of the uncompressed chunk interval, and we throw
a warning if it is not.
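A sketch of the new option (table name and intervals are hypothetical):

    -- Hypothetical: 1-hour uncompressed chunks are rolled up into 24-hour
    -- compressed chunks when compression runs.
    SELECT create_hypertable('metrics', 'time',
                             chunk_time_interval => INTERVAL '1 hour');
    ALTER TABLE metrics SET (
      timescaledb.compress,
      timescaledb.compress_chunk_time_interval = '24 hours'
    );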
After data is tiered using OSM, we cannot insert data into the same
range, so we need a callback that can be invoked by TimescaleDB to
check for range overlaps before creating a new chunk.
We don't have to look up the dimension slices for dimensions for which
we don't have restrictions.
Also sort chunks by ids before looking up the metadata, because this
gives more favorable table access patterns (closer to sequential).
This fixes a planning time regression introduced in 2.7.
The OSM chunk registers a dummy primary dimension range
in the TimescaleDB catalog. Use the max interval of
the dimension instead of the min interval, i.e.,
use a range like [Dec 31 294246 PST, infinity).
Otherwise, policies can end up being applied to an OSM
chunk.
Add tests with policies for OSM chunks.
Do not allocate various temporary data in PortalContext, such as the
hyperspace point corresponding to the row, or the intermediate data
required for chunk lookup.
OSM chunks manage their own ranges, and the TimescaleDB
catalog has dummy ranges for these dimensions.
So the chunk exclusion logic cannot rely on the
TimescaleDB catalog metadata to exclude an OSM chunk.
If a default privilege is configured and applied to a given Continuous
Aggregate during its creation, only the user view has the ACL properly
configured, but the underlying materialization hypertable does not,
leading to permission errors.
Fix this by copying the privileges from the user view to the
materialization hypertable during Continuous Aggregate creation.
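A minimal sketch of the scenario (role, table, and view names are
hypothetical):

    -- Hypothetical: a default privilege that should end up on both the user
    -- view and the underlying materialization hypertable.
    ALTER DEFAULT PRIVILEGES GRANT SELECT ON TABLES TO reader;
    CREATE MATERIALIZED VIEW metrics_hourly
      WITH (timescaledb.continuous) AS
      SELECT time_bucket('1 hour', time) AS bucket, avg(value)
      FROM metrics
      GROUP BY bucket;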
Fixes #4555
When a table is added to an inheritance hierarchy, PG checks
if all check constraints are present on this table. When an OSM chunk
is added as a child of a hypertable with constraints,
make sure that all check constraints are replicated on the child OSM
chunk as well.
Add a new metadata table `dimension_partition` which explicitly and
statefully details how a space dimension is split into partitions, and
(in the case of multi-node) which data nodes are responsible for
storing chunks in each partition. Previously, partition and data nodes
were assigned dynamically based on the current state when creating a
chunk.
This is the first in a series of changes that will add more advanced
functionality over time. For now, the metadata table simply writes out
what was previously computed dynamically in code. Future code changes
will alter the behavior to do smarter updates to the partitions when,
e.g., adding and removing data nodes.
The idea of the `dimension_partition` table is to minimize changes in
the partition to data node mappings across various events, such as
changes in the number of data nodes, number of partitions, or the
replication factor, which affect the mappings. For example, increasing
the number of partitions from 3 to 4 currently leads to redefining all
partition ranges and data node mappings to account for the new
partition. Complete repartitioning can be disruptive to multi-node
deployments. With stateful mappings, it is possible to split an
existing partition without affecting the other partitions (similar to
partitioning using consistent hashing).
Note that the dimension partition table expresses the current state of
space partitions; i.e., the space-dimension constraints and data nodes
to be assigned to new chunks. Existing chunks are not affected by
changes in the dimension partition table, although an external job
could rewrite, move, or copy chunks as desired to comply with the
current dimension partition state. As such, the dimension partition
table represents the "desired" space partitioning state.
Part of #4125
This change allows creating new dimensions even when
chunks exist.
It does not modify any existing data or perform a migration;
instead, it creates a full-range (-inf/inf) dimension slice for
existing chunks in order to be compatible with the newly created
dimension.
All chunks created after this will follow the logic of the new
dimension and its partitioning.
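A sketch of the now-permitted operation (names are hypothetical):

    -- Hypothetical: add a space dimension to a hypertable that already has
    -- chunks; existing chunks get a full-range slice for the new dimension.
    SELECT add_dimension('metrics', 'device_id', number_partitions => 4);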
Fix: #2818
When chunk creation is triggered on a hypertable with non-default
statistics targets by a user different from the hypertable owner,
the chunk creation will fail with a permission error. This patch
changes the chunk table creation to run the attribute modification
as the table owner.
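A sketch of the failing scenario this fixes (table, column, and role names
are hypothetical):

    -- Hypothetical: the owner sets a non-default statistics target, then a
    -- non-owner INSERT triggers chunk creation, which previously failed with
    -- a permission error.
    ALTER TABLE metrics ALTER COLUMN value SET STATISTICS 1000;  -- as owner
    SET ROLE other_user;
    INSERT INTO metrics VALUES (now(), 1, 1.0);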
Fixes #4474
This patch introduces a further check to compress_chunk_impl and
decompress_chunk_impl. After all locks are acquired, a check is made
to see if the chunk is still (un-)compressed. If the chunk was
(de-)compressed while waiting for the locks, the (de-)compression
operation is stopped.
In addition, the chunk locks in decompress_chunk_impl
are upgraded to AccessExclusiveLock to ensure the chunk is not deleted
while other transactions are using it.
Fixes: #4480
A chunk in frozen state cannot be dropped.
drop_chunks will skip over frozen chunks without erroring.
The internal API drop_chunk will error if you attempt to drop
a chunk without unfreezing it first.
This PR also adds a new internal API to unfreeze a chunk.
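A hedged sketch of the interaction, assuming the internal entry points
`_timescaledb_internal.freeze_chunk`, `unfreeze_chunk`, and `drop_chunk`
(chunk and hypertable names are hypothetical):

    SELECT _timescaledb_internal.freeze_chunk('_timescaledb_internal._hyper_1_1_chunk');
    -- drop_chunks skips the frozen chunk without erroring.
    SELECT drop_chunks('metrics', older_than => INTERVAL '30 days');
    -- drop_chunk errors on a frozen chunk, so unfreeze it first.
    SELECT _timescaledb_internal.unfreeze_chunk('_timescaledb_internal._hyper_1_1_chunk');
    SELECT _timescaledb_internal.drop_chunk('_timescaledb_internal._hyper_1_1_chunk');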
Add _timescaledb_internal.attach_osm_table_chunk.
This treats a pre-existing foreign table as a
hypertable chunk by adding dummy metadata to the
catalog tables.
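A hedged sketch, assuming the function takes the hypertable and the foreign
table as arguments (the exact signature may differ):

    -- Hypothetical: register a pre-existing foreign table as an OSM chunk.
    SELECT _timescaledb_internal.attach_osm_table_chunk('metrics', 'osm_tiered_data');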
Don't keep the chunk constraints while searching. The number of
candidate chunks can be very large, so keeping these constraints is a
lot of work and uses a lot of memory. For finding the matching chunk,
it is enough to track the number of dimensions that matched a given
chunk id. After finding the chunk id, we can look up only the matching
chunk data with the usual function.
This saves some work when doing INSERTs.
A number of TimescaleDB functions internally call `AlterTableInternal`
to modify tables or indexes. For instance, `compress_chunk` and
`attach_tablespace` act as DDL commands to modify
hypertables. However, crashes occur when these functions are called
via `SELECT * INTO FROM <function_name>` or the equivalent `CREATE
TABLE AS` statement. The crashes happen because these statements are
considered process utility commands and therefore set up an event
trigger context for collecting commands. However, the event trigger
context is not properly set up to record ALTER TABLE statements in
this code path, thus causing the crashes.
To prevent crashes, wrap `AlterTableInternal` with the event trigger
functions to properly initialize the event trigger context.
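For illustration, a statement of the kind that previously crashed (hypertable
name is hypothetical):

    -- Previously crashed: a utility statement (CREATE TABLE AS) wrapping a
    -- function that internally calls AlterTableInternal.
    CREATE TABLE compressed AS
      SELECT compress_chunk(c) FROM show_chunks('metrics') AS c;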
Add an internal API to drop a single chunk.
This function drops the storage and metadata
associated with the chunk.
Note that chunk dependencies are not affected.
e.g. Continuous aggs are not updated when this chunk
is dropped.
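A sketch of the call (chunk name is hypothetical):

    -- Drops the storage and metadata for a single chunk; dependent objects
    -- such as continuous aggregates are not updated.
    SELECT _timescaledb_internal.drop_chunk('_timescaledb_internal._hyper_1_1_chunk');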
This is an internal function to freeze a chunk
for PG14 and later.
It sets the chunk status to frozen.
Operations that modify the chunk data
(like insert, update, delete) are not
supported. Frozen chunks can be dropped.
Additionally, chunk status is cached as part of
classify_relation.
Improve the performance of metadata scanning during hypertable
expansion.
When a hypertable is expanded to include all children chunks, only the
chunks that match the query restrictions are included. To find the
matching chunks, the planner first scans for all matching dimension
slices. The chunks that reference those slices are the chunks to
include in the expansion.
This change optimizes the scanning for slices by avoiding repeated
open/close of the dimension slice metadata table and index.
At the same time, related dimension slice scanning functions have been
refactored along the same line.
An index on the chunk constraint metadata table is also changed to
allow scanning on dimension_slice_id. Previously, dimension_slice_id
was the second key in the index, which made scans on this key less
efficient.
Chunk scan performance during querying is improved by avoiding
repeated open and close of relations and indexes when joining chunk
information from different metadata tables.
When executing a query on a hypertable, it is expanded to include all
its children chunks. However, during the expansion, the chunks that
don't match the query constraints should also be excluded. The
following changes are made to make the scanning and exclusion more
efficient:
* Ensure metadata relations and indexes are only opened once even
though metadata for multiple chunks are scanned. This avoids doing
repeated open and close of tables and indexes for each chunk
scanned.
* Avoid interleaving scans of different relations, ensuring better
data locality, and having, e.g., indexes warm in cache.
* Avoid unnecessary scans that repeat work already done.
* Ensure chunks are locked in a consistent order (based on Oid).
To enable the above changes, some refactoring was necessary. The chunk
scans that happen during constraint exclusion are moved into separate
source files (`chunk_scan.c`) for better structure and readability.
Some test outputs are affected due to the new ordering of chunks in
append relations.
When a hypertable uses a non-default tablespace, based
on attach_tablespace settings, the compressed chunk's
index is still created in the default tablespace.
This PR fixes this behavior and creates the compressed
chunk and its indexes in the same tablespace.
When move_chunk is executed on a compressed chunk, the
indexes are moved to the specified destination tablespace.
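A sketch using the tablespace APIs (tablespace, hypertable, and chunk names
are hypothetical):

    -- Compressed chunks and their indexes now follow the attached tablespace.
    SELECT attach_tablespace('history_space', 'metrics');
    -- Moving a compressed chunk also moves its indexes to the destination.
    SELECT move_chunk(
      chunk                        => '_timescaledb_internal._hyper_1_4_chunk',
      destination_tablespace       => 'history_space',
      index_destination_tablespace => 'history_space'
    );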
Fixes #4000
Chunks that are dropped but whose catalog entries are preserved
have an incorrect status when they are marked as dropped.
This happens if the chunk was previously compressed and then
gets dropped - the status in the catalog tuple reflects the
compression status. This should be reset since the data is now
dropped.
This function is called often, at least four times per chunk, so the
lookups add up. Freeing these chunks allows us to save memory. Ideally,
we should fix the function not to look up the chunk anew each time.
Also make a couple of other tweaks that reduce memory usage for planning.
Commit 97c2578ffa6b08f733a75381defefc176c91826b overcomplicated the
`invalidate_add_entry` API by adding parameters related to the remote
function call for multi-node on materialization hypertables.
Refactor it by simplifying the function interface and adding a new
function to deal with materialization hypertables in a multi-node
environment.
Fixes #3833
This patch fixes a segfault when calling show_chunks on an internal
compressed hypertable, and a cache lookup failure when calling
drop_chunks on an internal compressed hypertable.
Enable ALTER MATERIALIZED VIEW (timescaledb.compress).
This enables compression on the underlying materialization
hypertable. The segmentby and orderby columns for
compression are based on the GROUP BY clause and the time_bucket
clause used while setting up the continuous aggregate.
Additional changes:
- Change the timescaledb_information.continuous_aggregate view
definition.
- Add support for compression policies on continuous aggregates.
- Move code from job.c to policy_utils.c.
- Add support functions to check compression policy validity for
continuous aggregates.
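A sketch of the user-facing commands (view name and interval are
hypothetical):

    -- Enable compression on the CAgg's materialization hypertable.
    ALTER MATERIALIZED VIEW metrics_hourly SET (timescaledb.compress);
    -- Add a compression policy for the continuous aggregate.
    SELECT add_compression_policy('metrics_hourly',
                                  compress_after => INTERVAL '30 days');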
When executing `recompress_chunk` and a query at the same time, a
deadlock can be generated because the chunk relation, the chunk
index, and the compressed and uncompressed chunks are locked in
different orders. In particular, when `recompress_chunk` is executing,
it will first decompress the chunk and, as part of that, lock the
uncompressed chunk index in AccessExclusive mode; when trying to
compress the chunk again, it will try to lock the uncompressed chunk
in AccessExclusive mode as part of truncating it.
Note that `decompress_chunk` and `compress_chunk` lock the relations in
the same order, and the issue arises because the procedures are combined
into a single transaction.
To avoid the deadlock, this commit rewrites `recompress_chunk` as
a procedure and adds a commit between the decompression and
compression. Committing the transaction after the decompress will allow
reads and inserts to proceed by working on the uncompressed chunk, and
the compression part of the procedure will take the necessary locks in
strict order, thereby avoiding a deadlock.
In addition, the isolation test is rewritten so that instead of adding
a waitpoint in the PL/pgSQL function, we implement the isolation test
by taking a lock on the compressed table after the decompression.
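A sketch of the invocation after the change (chunk name is hypothetical):

    -- CALL is required since recompress_chunk is now a procedure that commits
    -- between the decompression and compression steps.
    CALL recompress_chunk('_timescaledb_internal._hyper_1_1_chunk');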
Fixes #3846