Make `partialize_agg()` support parallel query execution. To make this
work, the finalize node needs to combine the individual partials from each
parallel worker, but the final step that turns the resulting partial
into the finished aggregate should not happen. Thus, in the case of
distributed hypertables, each data node can run a parallel query to
compute a partial, and the access node can later combine and finalize
these partials into the final aggregate. Essentially, there will be
one combine step (minus final) on each data node, and then another one
plus final on the access node.
To implement this, the finalize aggregate plan is simply modified to
elide the final step, and to reserialize the partial. It is only
possible to do this at the plan stage; if done at the path stage, the
PostgreSQL planner will hit assertions that assume that the node has
certain values (e.g., it doesn't expect combine Paths to skip the
final step).
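As a rough sketch of the flow this enables (the `conditions` table and
`temperature` column are illustrative; `partialize_agg` lives in the
`_timescaledb_internal` schema):

    -- On each data node: compute a partial with a parallel plan; the final
    -- step is elided so the serialized partial can be shipped back.
    SELECT _timescaledb_internal.partialize_agg(avg(temperature))
    FROM conditions;
    -- On the access node: the shipped partials are combined and the final
    -- step is applied to produce the finished aggregate.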
Previously we used date_part('epoch', interval) and integer division
internally to determine whether the top cagg's interval is a
multiple of its parent's. This led to precision loss and wrong results
in the case of intervals with sub-second components.
Fixed by using the `ts_interval_value_to_internal` function to convert
intervals to an appropriate integer representation for division.
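A minimal illustration of the old precision loss (not the internal code
path itself):

    -- Epoch extraction keeps fractional seconds, but integer division
    -- truncates them, so sub-second intervals were rounded away.
    SELECT date_part('epoch', INTERVAL '500 milliseconds');         -- 0.5
    SELECT trunc(date_part('epoch', INTERVAL '500 milliseconds'));  -- 0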
Fixes #5277
Concurrent inserts into a distributed hypertable after a data node was
marked as unavailable would produce a `tuple concurrently deleted` error.
The problem occurred because the scan of the chunk_data_node table took
no tuple-level locks, so a concurrent delete could remove a tuple during
the scan; the scan should instead be treated as a `SELECT … FOR UPDATE`.
Based on the fix by @erimatnor.
Fix #5153
When a Continuous Aggregate is created the `chunk_interval_size` is
defined by the `chunk_interval_size` of the original hypertable
multiplied by a fixed factor of 10.
The problem is that when we create a Hierarchical Continuous
Aggregate the same factor is applied again, leading to an exponentially
growing `chunk_interval_size`: with a 1 day chunk interval on the
hypertable, the first-level aggregate gets 10 days and a second-level
aggregate would get 100 days.
Fixed it by just copying the `chunk_interval_size` from the base
Continuous Aggregate for a Hierarchical Continuous Aggregate.
Fixes #5382
We added checks via #4846 to handle DML HA when the replication factor is
greater than 1 and a data node is down. Since each insert can go to a
different chunk with a different set of data nodes, we checked on every
insert whether any data nodes were unavailable. This increased CPU
consumption on the access node, leading to a performance regression for
RF > 1 code paths.
This patch fixes this regression. We now track if any DN is marked as
unavailable at the start of the transaction and use that information to
reduce unnecessary checks for each inserted row.
Make the copy fetcher more asynchronous by separating the sending of
the request for data from the receiving of the response. By doing
that, the async append node can send the request to each data node
before it starts reading the first response. This can massively
improve the performance because the response isn't returned until the
remote node has finished executing the query and is ready to return
the first tuple.
Renamed:
tsl/test/sql/size_utils.sql
tsl/test/expected/size_utils.out
To:
tsl/test/sql/size_utils_tsl.sql
tsl/test/expected/size_utils_tsl.out
because they conflicted with test/sql/size_utils.sql.
At the moment, the MERGE command is not supported on distributed
hypertables. This patch ensures that the join pushdown code ignores the
invocation by the MERGE command.
Different num_chunks values were reported by
timescaledb_information.hypertables and
timescaledb_information.chunks.
The view definition of timescaledb_information.hypertables was
not filtering out dropped and OSM chunks.
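After this fix the two views agree; for example (hypertable name is
illustrative):

    SELECT num_chunks
    FROM timescaledb_information.hypertables
    WHERE hypertable_name = 'conditions';

    SELECT count(*)
    FROM timescaledb_information.chunks
    WHERE hypertable_name = 'conditions';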
Fixes #5338
The `bucket_info` variable is initialized by `caggtimebucketinfo_init`
function called inside the following branch:
`if (rte->relkind == RELKIND_RELATION || rte->relkind == RELKIND_VIEW)`
If for some reason we don't enter this branch, then `bucket_info`
will not be initialized leading to an uninitialized variable when
returning `bucket_info` at the end of the `cagg_validate_query`
function.
Fixed it by zero-initializing the `bucket_info` variable when
declaring it.
Found by coverity scan.
This small patch adds support for continuous aggregates to the
`hypertable_detailed_size` (and with that `hypertable_size`).
It adds an additional check to see if a continuous aggregate exists
if a hypertable with the given regclass name isn't found.
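For example, both of these calls now work when given a continuous
aggregate (the aggregate name is illustrative):

    SELECT * FROM hypertable_detailed_size('conditions_summary_daily');
    SELECT hypertable_size('conditions_summary_daily');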
This patch adds the functionality that is needed to perform distributed,
parallel joins on reference tables on access nodes. This code allows the
pushdown of a join if:
* (1) The setting "ts_guc_enable_per_data_node_queries" is enabled
* (2) The outer relation is a distributed hypertable
* (3) The inner relation is marked as a reference table
* (4) The join is a left join or an inner join
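A sketch of a query shape that can now be pushed down in full, assuming
`conditions` is a distributed hypertable and `devices` is a reference
table (both names are illustrative):

    SELECT time_bucket(INTERVAL '1 hour', c.time) AS bucket,
           d.name,
           avg(c.temperature) AS avg_temp
    FROM conditions c
    JOIN devices d ON c.device_id = d.device_id  -- inner join on the reference table
    GROUP BY bucket, d.name;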
This PR introduces a timeout argument and new logic to the
_timescaledb_internal.ping_data_node() function, which makes it
possible to handle I/O timeouts for unresponsive nodes.
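For example (the `timeout` parameter name follows the description above;
treat the exact signature as illustrative):

    SELECT _timescaledb_internal.ping_data_node('dn1', timeout => INTERVAL '5 seconds');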
Fix #5312
When executing functions, SPI assumes that `TopTransactionContext` is
used for atomic execution contexts and `PortalContext` is used for
non-atomic contexts. Since jobs need to be able to commit and start
transactions, they are executing in a non-atomic context hence
`PortalContext` will be used, but `PortalContext` is not set when
starting the job. This is not a problem for the PL/pgSQL executor, but
for other executors (such as PL/Python) it would be.
This commit fixes the issue by setting the `PortalContext` variable to
the portal context created for the portal and restores it (to NULL)
after execution.
Fixes #5326
To avoid confusing developers, the correct name for Continuous
Aggregates on top of other Continuous Aggregates is `Hierarchical
Continuous Aggregates`, so the usage of the term `nested` was changed
to `hierarchical`.
TextDatumGetCString() was made typesafe in upstream HEAD (16devel), so
now the compiler catches this. As Tom puts it in ac50f84866:
"TextDatumGetCString(PG_GETARG_TEXT_P(x))" is formally wrong: a text*
is not a Datum. Although this coding will accidentally fail to fail on
all known platforms, it risks leaking memory if a detoast step is needed,
unlike "TextDatumGetCString(PG_GETARG_DATUM(x))" which is what's used
elsewhere.
Just noticed abysmal INSERT performance when experimenting with one of
our customers' data sets, and it turns out my cache sizes were
misconfigured, leading to constant hypertable chunk cache thrashing.
Show a warning to detect this misconfiguration. Also use more generous
defaults; we're not supposed to run on a microwave (unlike Postgres).
This patch fixes several issues with next_start calculation.
- Previously, the offset was added twice in some cases.
This is fixed by this patch.
- Additionally, schedule intervals with month components
were not handled correctly.
Internally, time_bucket with origin is used to calculate
the next start. However, in the case of month intervals, the
timestamp calculated for a bucket is always aligned on the first
day of the month, regardless of origin.
Therefore, previously the result was aligned with origin by adding
the difference between origin and its respective time bucket.
This difference was computed as a fixed-length interval in terms
of days and time. That occasionally led to an incorrect next start,
for example when a job should be executed on the last day of a month.
That is fixed by adding an appropriate interval of months to
initial_start and letting Postgres handle this computation properly.
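Letting Postgres do the month arithmetic handles end-of-month clamping
that a fixed day/time offset cannot, e.g.:

    SELECT TIMESTAMPTZ '2023-01-31 00:00+00' + INTERVAL '1 month';   -- 2023-02-28
    SELECT TIMESTAMPTZ '2023-01-31 00:00+00' + INTERVAL '2 months';  -- 2023-03-31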
Fixes #5216
The hypertable_compression test had an implicit reliance on the
ordering of tuples when querying the materialized results.
This patch makes the ordering explicit in this test.
1) Simplify the path generation for the parameterized data node scans.
2) Adjust the data node scan cost if it's an index scan, instead of always
treating it as a sequential scan.
3) Hard-code the grouping estimation for distributed hypertables, instead
of using the totally bogus per-column ndistinct value.
4) Add a GUC to disable parameterized data node scan.
5) Add more tests.
A previous change accidentally broke the dist_hypertable test so that
it prematurely exited. This change restores the test so that it
executes properly.
The function to execute remote commands on data nodes used a blocking
libpq API that doesn't integrate with PostgreSQL interrupt handling,
making it impossible for a user or statement timeout to cancel a
remote command.
Refactor the remote command execution function to use a non-blocking
API and integrate with PostgreSQL signal handling via WaitEventSets.
Partial fix for #4958.
Previously all intervals were converted to seconds using "epoch"
with date_part. However, this treats a year as 365.25 days to
account for leap years, leading to the unexpected situation that
a year is not a multiple of a day or a month.
Fixed by treating month-only intervals as multiples of 30 days.
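For illustration, under epoch-based conversion:

    SELECT date_part('epoch', INTERVAL '1 year') / date_part('epoch', INTERVAL '1 day');    -- 365.25
    SELECT date_part('epoch', INTERVAL '1 year') / date_part('epoch', INTERVAL '1 month');  -- 12.175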
Fixes #5231
The `cagg_watermark` function performs only read-only operations, so it
is safe to mark it parallel safe and take advantage of PostgreSQL
parallel query.
Since 2.7, when we introduced the new Continuous Aggregate format, we
don't use partials anymore. The aggregate functions `partialize_agg`
and `finalize_agg` are not parallel safe, but since they are no longer
used there is no reason not to take advantage of PostgreSQL parallel
query for realtime Continuous Aggregates.
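The effect is essentially a changed function property, roughly
equivalent to the following (assuming the single-integer-argument
signature of the internal function):

    ALTER FUNCTION _timescaledb_internal.cagg_watermark(integer) PARALLEL SAFE;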
Broken code caused the async connection module to never send queries
using prepared statements. Instead, queries were always sent as
parameterized query statements.
Fix this so that prepared statements are used when created.
When run in a parallel group, the dist_move_chunk test can get into a
deadlock with another test running a 'DROP DATABASE' command. So, mark
it as a solo test to disallow it from running in a parallel group.
Closes #4972
Refactor the data node connection establishment so that it is
interruptible, e.g., by ctrl-c or `statement_timeout`.
Previously, the connection establishment used blocking libpq calls. By
instead using asynchronous connection APIs and integrating with
PostgreSQL interrupt handling, the connection establishment can be
canceled by an interrupt caused by a statement timeout or a user.
Fixes #2757
Tie the life cycle of a data node connection to the memory context it
is created on. Previously, a data node connection was automatically
closed at the end of a transaction, although often a connection needs
to live beyond a single transaction. For example, a connection cache
is maintained for data node connections, and, for such cases, a flag
was set on a connection to avoid closing it automatically.
Instead of tying connections to transactions, they are now life-cycle
managed via memory contexts. This simplifies the handling of
connections and avoids having to create exceptions to closing
connections at transaction end.
Since the job error log can contain information from many different
sources and also from many different jobs it is important to ensure
that visibility of the job error log entries is restricted to job
owners.
This commit extends the view `timescaledb_information.job_errors` with
role-based checks so that a user can only see entries for jobs that she
has permission to view, and restricts the permissions on
`_timescaledb_internal.job_errors` so that users can only view the job
error log through the view. A special case is added so that the
superuser and the database owner can see all log entries, even if there
is no associated job id with the log entry.
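With this change, a regular user querying the view sees only entries
for jobs they have permission to view:

    SELECT * FROM timescaledb_information.job_errors;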
Closes #5217
Recent refactorings in the INSERT into compressed chunk code
path allowed us to support this feature but the check to
prevent users from using this feature was not removed as part
of that patch. This patch removes the blocker code and adds a
minimal test case.
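A minimal example of what is now allowed (table and column names are
illustrative):

    -- Inserting into a chunk that has already been compressed no longer errors out.
    INSERT INTO conditions (time, device_id, temperature)
    VALUES ('2023-01-01 00:00+00', 42, 21.5);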
When the start or end for a refresh job is NULL, bucketing raised an
error because start and end are already the minimum and maximum
timestamp values before bucketing. Hence, skip bucketing in this
case.
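This corresponds to policies created with an open-ended refresh window,
e.g. (the aggregate name is illustrative):

    SELECT add_continuous_aggregate_policy('conditions_summary_daily',
           start_offset      => NULL,               -- open at the start
           end_offset        => INTERVAL '1 hour',
           schedule_interval => INTERVAL '1 hour');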
Partly fixes #4117
Enable support for having a join in the query used for creating
continuous aggregates, with the following restrictions:
1. The join can involve only one hypertable and one normal table
2. The join must be an inner join
3. The join condition can only be an equality condition
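A sketch of a definition that is now valid under these restrictions
(table and column names are illustrative):

    CREATE MATERIALIZED VIEW conditions_by_device
    WITH (timescaledb.continuous) AS
    SELECT time_bucket(INTERVAL '1 day', c.time) AS bucket,
           d.name,
           avg(c.temperature) AS avg_temp
    FROM conditions c
    JOIN devices d ON c.device_id = d.device_id  -- inner join, equality condition
    GROUP BY bucket, d.name;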
When attaching a data node and specifying `repartition=>false`, the
current number of partitions should remain instead of recalculating
the partitioning based on the number of data nodes.
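For example (node and hypertable names are illustrative):

    -- Keep the current number of partitions rather than repartitioning
    -- for the new total number of data nodes.
    SELECT attach_data_node('dn3', hypertable => 'conditions', repartition => false);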
Fixes #5157
With this patch, the ability to mark reference tables (tables that exist
on all data nodes of a multi-node installation) via an FDW option has
been added.
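A sketch of how tables can be marked, assuming the option is set on the
timescaledb_fdw foreign data wrapper as described for this feature (the
option name and table names should be treated as illustrative):

    ALTER FOREIGN DATA WRAPPER timescaledb_fdw
      OPTIONS (ADD reference_tables 'devices, locations');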