The approximate_row_count function used the reltuples value from
compressed chunks and multiplied it by 1000, the default batch size.
This led to a huge skew between the actual row count and the
approximate one. We now use the numrows_pre_compression value from the
timescaledb catalog, which accurately represents the number of rows
before compression.
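For illustration (the hypertable name is hypothetical), the estimate
should now be close to the exact count even when all chunks are
compressed:

  SELECT count(*) FROM conditions;              -- exact, but scans the data
  SELECT approximate_row_count('conditions');   -- fast estimate from catalog metadata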
This patch adds per-chunk sorted paths for compressed chunks. This has
two advantages:
(1) Make ChunkAppend possible. If at least one chunk of a hypertable
is uncompressed, PostgreSQL will generate a MergeAppend path in
generate_orderedappend_paths() / create_merge_append_path() due to the
existing pathkeys of the index on the uncompressed chunk. If all chunks
are compressed, no pathkeys are present and no MergeAppend path is
generated by PostgreSQL. In that case, the ChunkAppend optimization
cannot be used, because only MergeAppend paths can be promoted in
ts_chunk_append_path_create(). Adding a sorted path with pathkeys
makes ChunkAppend possible for these queries.
(2) Sorting on a per-chunk basis and merging / appending these
results can be faster than sorting the whole input. In particular,
LIMIT queries that use an ORDER BY compatible with the partitioning of
the hypertable could otherwise be executed inefficiently. For example,
an expensive query plan with a sort node on top of the append node
could be chosen. Because of the sort node high up in the query plan and
the missing ChunkAppend node (see (1)), all chunks are decompressed
instead of only the ones actually needed.
This patch adds sorted versions of the DecompressChunkPaths. This
ensures that pathkeys are present and PostgreSQL can generate a
MergeAppend path that could be promoted into a ChunkAppend path.
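As an illustration (table and column names are hypothetical), a query
of the following form can now be planned with ChunkAppend, so only the
chunks needed to satisfy the LIMIT have to be decompressed:

  SELECT * FROM metrics
  ORDER BY time DESC
  LIMIT 10;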
Fixes: #4223
This patch adds support for partial aggregations at the chunk level.
The aggregation is replanned in the create_upper_paths_hook of
PostgreSQL. The AggPath is split up into multiple
AGGSPLIT_INITIAL_SERIAL operations (one on top of each chunk), which
create partials, and one AGGSPLIT_FINAL_DESERIAL operation, which
finalizes the aggregation.
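For example, a plan for a simple aggregate over a hypertable (names
are hypothetical) now roughly has a partial aggregate per chunk and a
single finalize step on top:

  EXPLAIN (COSTS OFF)
  SELECT device_id, avg(temperature)
  FROM metrics
  GROUP BY device_id;
  -- Finalize GroupAggregate
  --   ->  Append
  --         ->  Partial GroupAggregate (chunk 1)
  --         ->  Partial GroupAggregate (chunk 2)
  --         ... (one partial aggregate per chunk)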
To increase schema security we do not want to mix our own internal
objects with user objects. Since chunks are created in the
_timescaledb_internal schema, our internal functions should live in
a different, dedicated schema. This patch makes the necessary
adjustments for the following functions:
- calculate_chunk_interval(int, bigint, bigint)
- chunk_status(regclass)
- chunks_in(record, integer[])
- chunk_id_from_relid(oid)
- show_chunk(regclass)
- create_chunk(regclass, jsonb, name, name, regclass)
- set_chunk_default_data_node(regclass, name)
- get_chunk_relstats(regclass)
- get_chunk_colstats(regclass)
- create_chunk_table(regclass, jsonb, name, name)
- freeze_chunk(regclass)
- unfreeze_chunk(regclass)
- drop_chunk(regclass)
- attach_osm_table_chunk(regclass, regclass)
This PR improves the way the number of parallel workers for the
DecompressChunk node is calculated. Since
1a93c2d482b50a43c105427ad99e6ecb58fcac7f, no partial paths are
generated for small relations, which could cause a fallback to a
sequential plan and a performance regression. This patch ensures that
a partial path is created again for all relations.
When the uncompressed part of a partially compressed chunk is read by a
non-partial path and the compressed part by a partial path, the append
node on top could process the uncompressed part multiple times because
the path was declared as a partial path and the append node assumed it
could be executed in all workers in parallel without producing
duplicates.
This PR fixes the declaration of the path.
If there are any indexes on the compressed chunk, insert into them
while inserting the heap data rather than reindexing the relation at
the end. This reduces the amount of locking on the compressed chunk
indexes, which created issues when merging chunks, and should help
with future updates of compressed data.
We have changed the compression test by disabling parallel append in
some test cases because the regression test was failing only on
PG14.0 but not on PG14.8 or any other PostgreSQL version.
So far, we have set the number of desired workers for decompression to
1. If a query touches only one chunk, we end up with one worker in a
parallel plan. Only if the query touches multiple chunks does
PostgreSQL spin up multiple workers. These workers could then be used
to process the data of one chunk.
This patch removes our custom worker calculation and relies on the
PostgreSQL logic to calculate the desired degree of parallelism.
Co-authored-by: Jan Kristof Nidzwetzki <jan@timescale.com>
During UPDATE/DELETE on compressed hypertables, we iterate over the
plan tree to collect all scan nodes. Each scan node can have filter
conditions.
Prior to this patch, we collected only the first filter condition and
used it for the first chunk, which may be wrong. With this patch, as
soon as we encounter a target scan node, we immediately process the
corresponding chunks.
Fixes #5640
With recent changes, we enabled analyze on the uncompressed chunk
tables of compressed chunks. This change includes analyzing the
compressed chunk table when analyzing the hypertable and its chunks,
enabling us to remove the generation of stats when compressing chunks.
This patch does the following (see the example after the list):
1. Executor changes to parse the qual ExprState to check if a
SEGMENTBY column is specified in the WHERE clause.
2. Based on step 1, we build scan keys.
3. Executor changes to perform a heap scan on the compressed chunk
based on the scan keys and move only those rows which match the WHERE
clause to the staging area, aka the uncompressed chunk.
4. Mark the affected chunk as partially compressed.
5. Perform regular UPDATE/DELETE operations on the staging area.
6. Since there is no Custom Scan (HypertableModify) node for
UPDATE/DELETE operations on PG versions < 14, we don't support this
feature on PG12 and PG13.
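For illustration (table and column names are hypothetical), with
device_id as a SEGMENTBY column, a statement like the following only
moves the batches whose segment matches the WHERE clause into the
staging area:

  DELETE FROM metrics
  WHERE device_id = 42
    AND time < '2023-01-01';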
During compression, autovacuum used to be disabled for the
uncompressed chunk and re-enabled after decompression. This leads to
PostgreSQL maintenance issues. Let's not disable autovacuum for the
uncompressed chunk anymore and let PostgreSQL take care of the stats
in its natural way.
Fixes #309
This patch allows unique constraints on compressed chunks. When
trying to INSERT into compressed chunks with unique constraints,
any potentially conflicting compressed batches will be decompressed
to let PostgreSQL do constraint checking on the INSERT.
With this patch only INSERT ON CONFLICT DO NOTHING is supported.
For decompression, only segmentby information is considered to
determine conflicting batches. This will be enhanced in a follow-up
patch to also include orderby metadata so that fewer batches need
to be decompressed.
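A minimal sketch of the supported usage, with hypothetical names and a
unique constraint on (time, device_id):

  INSERT INTO metrics (time, device_id, value)
  VALUES ('2023-01-01 00:00:00', 1, 0.5)
  ON CONFLICT DO NOTHING;
  -- compressed batches with a matching segmentby value are
  -- decompressed first so PostgreSQL can check the constraint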
When a TidRangeScan is a child of a ChunkAppend or
ConstraintAwareAppend node, an error is reported as "invalid child of
chunk append: Node (26)". This patch fixes the issue by recognising
TidRangeScan as a valid child.
Fixes: #4872
The sequence number of the compressed tuple is per segmentby grouping
and should be reset when the grouping changes to prevent overflows with
many segmentby columns.
Chunks that are dropped but whose catalog entries are preserved
have an incorrect status when they are marked as dropped.
This happens if the chunk was previously compressed and then
gets dropped: the status in the catalog tuple still reflects the
compression status. It should be reset since the data is now
dropped.
1. Add a compression_state column to the hypertable catalog
table by renaming its compressed column.
compression_state is a tri-state column.
This column indicates whether the hypertable has
compression enabled (value = 1) or whether it is an internal
compression table (value = 2).
2. Save compression settings on the access node when compression
is turned on for a distributed hypertable.
For a distributed hypertable that has compression enabled,
compression_state is set. We don't create any internal tables
on the access node.
Fixes #2660
This commit modifies analyze behavior as follows:
1. When an internal compression table is analyzed,
statistics from the compressed chunk (such as page
count and tuple count) are used to update the
statistics of the corresponding chunk parent, if
they are missing.
2. Analyze compressed chunks instead of raw chunks.
When the command ANALYZE <hypertable> is executed,
we a) analyze uncompressed chunks and b) skip the raw chunk of a
compressed chunk, but analyze its compressed chunk instead.
This change makes sure that ANALYZE and VACUUM commands run without
specifying any relations will not clear the stats on compressed chunks
that were saved at compression time. It also skips any distributed
tables.
Fixes #2576
This change captures the reltuples and relpages (and relallvisible)
statistics from the pg_class table for chunks immediately before
truncating them during the compression code path. It then restores
the values after truncating, as there is no way to keep PostgreSQL
from clearing these values during this operation. It also properly
uses these values during planning, working around some PostgreSQL
code which substitutes arbitrary sizes for tables that don't seem to
hold data.
Fixes #2524
Tests are updated to no longer use continuous aggregate options that
will be removed, such as `refresh_lag`, `max_interval_per_job` and
`ignore_invalidation_older_than`. `REFRESH MATERIALIZED VIEW` has also
been replaced with `CALL refresh_continuous_aggregate()` using ranges
that try to replicate the previous refresh behavior.
The materializer test (`continuous_aggregate_materialize`) has been
removed, since this tested the "old" materializer code, which is no
longer used without `REFRESH MATERIALIZED VIEW`. The new API using
`refresh_continuous_aggregate` already allows manual materialization
and there are two previously added tests (`continuous_aggs_refresh`
and `continuous_aggs_invalidate`) that cover the new refresh path in
similar ways.
When updated to use the new refresh API, some of the concurrency
tests, like `continuous_aggs_insert` and `continuous_aggs_multi`, have
slightly different concurrency behavior. This is explained by
different and sometimes more conservative locking. For instance, the
first transaction of a refresh serializes around an exclusive lock on
the invalidation threshold table, even if no new threshold is
written. The previous code only took the heavier lock once, and only
if a new threshold was written. This new and stricter locking means that
insert processes that read the invalidation threshold will block for a
short time when there are concurrent refreshes. However, since this
blocking only occurs during the first transaction of the refresh
(which is quite short), it probably doesn't matter too much in
practice. The relaxing of locks to improve concurrency and performance
can be implemented in the future.
This commit will add support for `WITH NO DATA` when creating a
continuous aggregate and will refresh the continuous aggregate when
creating it unless `WITH NO DATA` is provided.
All test cases are also updated to use `WITH NO DATA`, and an
additional test case is added to verify that both `WITH DATA` and
`WITH NO DATA` work as expected.
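A sketch of the two variants, with hypothetical names:

  CREATE MATERIALIZED VIEW metrics_daily
  WITH (timescaledb.continuous) AS
  SELECT time_bucket('1 day', time) AS day, avg(value)
  FROM metrics
  GROUP BY 1
  WITH NO DATA;  -- created empty; WITH DATA (the default) refreshes on creation

  CALL refresh_continuous_aggregate('metrics_daily', NULL, NULL);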
Closes #2341
This change renames the function to approximate_row_count() and adds
support for regular tables. It returns a row count estimate for a
table instead of a table list.
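For example (table names are hypothetical), the renamed function can
now be used on both hypertables and plain PostgreSQL tables:

  SELECT approximate_row_count('conditions');   -- hypertable
  SELECT approximate_row_count('plain_table');  -- regular table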
We change the syntax for defining continuous aggregates to use `CREATE
MATERIALIZED VIEW` rather than `CREATE VIEW`. The command still creates
a view, even though `CREATE MATERIALIZED VIEW` normally creates a
table. Raise an error if `CREATE VIEW` is used to create a continuous
aggregate and redirect to `CREATE MATERIALIZED VIEW`.
In a similar vein, `DROP MATERIALIZED VIEW` is used for continuous
aggregates and continuous aggregates cannot be dropped with `DROP
VIEW`.
Continuous aggregates are altered using `ALTER MATERIALIZED VIEW`
rather than `ALTER VIEW`, so we ensure that it works for `ALTER
MATERIALIZED VIEW` and gives an error if you try to use `ALTER VIEW` to
change a continuous aggregate.
Note that we allow `ALTER VIEW ... SET SCHEMA` to be used with the
partial view as well as with the direct view, so this is handled as a
special case.
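Sketch of the affected commands, with a hypothetical continuous
aggregate name (the ALTER option shown is only illustrative):

  ALTER MATERIALIZED VIEW conditions_summary
    SET (timescaledb.materialized_only = true);
  DROP MATERIALIZED VIEW conditions_summary;
  -- ALTER VIEW / DROP VIEW on a continuous aggregate raises an error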
Fixes #2233
Co-authored-by: Erik Nordström <erik@timescale.com>
Co-authored-by: Mats Kindahl <mats@timescale.com>
This change will ensure that the pg_statistics on a chunk are
updated immediately prior to compression. It also ensures that
these stats are not overwritten as part of a global or
hypertable-targeted ANALYZE.
This addresses the issue that a chunk will no longer generate valid
statistics during an ANALYZE once the data has been moved to the
compressed table. Unfortunately, any compressed rows will not be
captured in the parent hypertable's pg_statistics, as there is no way
to change how PostgreSQL samples child tables in PG11.
This approach assumes that the compressed table remains static, which
is mostly correct in the current implementation (though it is
possible to remove compressed segments). Once we start allowing more
operations on compressed chunks this solution will need to be
revisited. Note that in PG12 an approach leveraging table access
methods will not have a problem analyzing compressed tables.
The DML blocker that blocks UPDATEs and DELETEs on compressed
hypertables would trigger if the UPDATE or DELETE referenced any
hypertable with compressed chunks. This patch changes the logic to only
block if the target of the UPDATE or DELETE is a compressed chunk.
When enabling compression on a hypertable, the existing
constraints are cloned to the new compressed hypertable.
During validation of the existing constraints, a loop
through the conkey array is performed, and the constraint name
is erroneously added to the list multiple times. This fix
moves the addition to the list outside the conkey loop.
Fixes #2000
The `drop_chunks` function is refactored to make the table name
mandatory for the function. As a result, the function was also
refactored to accept the `regclass` type instead of table name plus
schema name, and the parameters were reordered to match the order for
`show_chunks`.
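For example (names and the interval are hypothetical), the new calling
convention mirrors `show_chunks`:

  SELECT show_chunks('conditions', older_than => INTERVAL '3 months');
  SELECT drop_chunks('conditions', older_than => INTERVAL '3 months');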
The commit also refactors the code to pass the hypertable structure
between internal functions rather than the hypertable relid, and moves
error checks to the PostgreSQL function. This allows the internal
functions to avoid some lookups and use the information in the
structure directly, and also gives errors earlier instead of first
dropping chunks and then erroring out and rolling back the transaction.
Unless otherwise listed, the TODO was converted to a comment or put
into an issue tracker.
test/sql/
- triggers.sql: Made required change
tsl/test/
- CMakeLists.txt: TODO complete
- bgw_policy.sql: TODO complete
- continuous_aggs_materialize.sql: TODO complete
- compression.sql: TODO complete
- compression_algos.sql: TODO complete
tsl/src/
- compression/compression.c:
- row_compressor_decompress_row: Expected complete
- compression/dictionary.c: FIXME complete
- materialize.c: TODO complete
- reorder.c: TODO complete
- simple8b_rle.h:
- compressor_finish: Removed (obsolete)
src/
- extension.c: Removed due to age
- adts/simplehash.h: TODOs are from copied Postgres code
- adts/vec.h: TODO is non-significant
- planner.c: Removed
- process_utility.c
- process_altertable_end_subcmd: Removed (PG will handle case)
Drop foreign key constraints from uncompressed chunks during
compression. This allows data deletion in FK-referenced
tables to cascade to compressed chunks. The foreign key constraints
are restored during decompression.
This change refactors our main planner hooks in `planner.c` with the
intention of providing a consistent way to classify planned relations
across hooks. In our hooks, we'd like to know whether a planned
relation (`RelOptInfo`) is one of the following:
* Hypertable
* Hypertable child (a hypertable can appear as a child of itself)
* Chunk as a child of hypertable (from expansion)
* Chunk as standalone (operation directly on chunk)
* Any other relation
Previously, there was no way to consistently know which of these we
were dealing with. Instead, a mix of various functions was used without
"remembering" the classification for reuse in later sections of the
code.
When classifying relations according to the above categories, the only
source of truth about a relation is our catalog metadata. In case of
hypertables, this is cached in the hypertable cache. However, this
cache is read-through, so, in case of a cache miss, the metadata will
always be scanned to resolve a new entry. To avoid unnecessary
metadata scans, this change introduces a way to do cache-only
queries. This requires maintaining a single warmed cache throughout
planning and is enabled by using a planner-global cache object. The
pre-planning query processing warms the cache by populating it with
all hypertables in the to-be-planned query.
This change includes a major refactoring to support PostgreSQL
12. Note that many tests aren't passing at this point. Changes
include, but are not limited to:
- Handle changes related to table access methods
- New way to expand hypertables since expansion has changed in
PostgreSQL 12 (more on this below).
- Handle changes related to table expansion for UPDATE/DELETE
- Fixes for various TimescaleDB optimizations that were affected by
planner changes in PostgreSQL (gapfill, first/last, etc.)
Before PostgreSQL 12, planning was organized roughly as follows:
1. construct and add `RelOptInfo`s for base rels and appendrels
2. add restrict info, joins, etc.
3. perform the actual planning with `make_one_rel`
For our optimizations we would expand hypertables in the middle of
step 1; since nothing in the query planner before `make_one_rel` cared
about the inheritance children, we didn't have to be too precise
about where we were doing it.
However, with PG12, and the optimizations around declarative
partitioning, PostgreSQL now does care about when the children are
expanded, since it wants as much information as possible to perform
partition-pruning. Now planning is organized like:
1. construct and add RelOptInfo for base rels only
2. add restrict info, joins, etc.
3. expand appendrels, removing irrelevant declarative partitions
4. perform the actual planning with make_one_rel
Step 3 always expands appendrels, so when we also expand them during
step 1, the hypertable gets expanded twice, and things in the planner
break.
The changes to support PostgreSQL 12 attempt to solve this problem by
keeping the hypertable root marked as a non-inheritance table until
`make_one_rel` is called, and only then revealing to PostgreSQL that
it does in fact have inheritance children. While this strategy entails
the least code change on our end, the fact that the first hook we can
use to re-enable inheritance is `set_rel_pathlist_hook` entails
a number of annoyances:
1. this hook is called after the sizes of tables are calculated, so we
must recalculate the sizes of all hypertables, as they will not
have taken the chunk sizes into account
2. the table upon which the hook is called will have its paths planned
   under the assumption it has no inheritance children, so if it's a
   hypertable we have to replan its paths
Unfortunately, the code for doing this is static, so we need to copy
it into our own codebase instead of just using PostgreSQL's.
In PostgreSQL 12, UPDATE/DELETE on inheritance relations have also
changed and are now planned in two stages:
- In stage 1, the statement is planned as if it was a `SELECT` and all
leaf tables are discovered.
- In stage 2, the original query is planned against each leaf table,
discovered in stage 1, directly, not part of an Append.
Unfortunately, this means we cannot look in the appendrelinfo during
UPDATE/DELETE planning, in particular to determine if a table is a
chunk, as the appendrelinfo is not initialized at the point where we
wish to do so. This has consequences for how we identify operations on
chunks (sometimes for blocking and sometimes for enabling
functionality).
The test `continuous_aggs_ddl` failed on PostgreSQL 9.6 because it had
a line that tested compression on a hypertable even though this feature
is not supported on 9.6. This prevented a large portion of the test
from running on 9.6.
This change moves the testing of compression on a continuous aggregate
to the `compression` test instead, which only runs on supported
PostgreSQL versions. A permission check on a view is also removed,
since similar tests are already in the `continuous_aggs_permissions`
tests.
The permission check was the only thing that caused different output
across PostgreSQL versions, so therefore the test no longer requires
version-specific output files and has been simplified to use the same
output file irrespective of PostgreSQL version.
When trying to compress a chunk that had a column of datatype
interval, delta-delta compression would be selected for the column,
but our delta-delta compression does not support interval and
would throw an error when trying to compress the chunk.
This PR changes the compression selected for interval columns to
dictionary compression.