timescaledb

mirror of https://github.com/timescale/timescaledb.git synced 2025-06-03 03:43:06 +08:00

Author	SHA1	Message	Date
Zoltan Haindrich	975e9ca166	Fix segfault after column drop on compressed table. Decompression produces records that have all the decompressed data set, but they also retain the fields used internally during decompression. These fields caused no problems unless an operation was performed on the whole row, in which case any of them that had ended up non-null was a potential segfault source. Fixes #5458, #5411.	2023-04-06 08:49:54 +02:00
Bharathy	1fb058b199	Support UPDATE/DELETE on compressed hypertables. This patch does the following: 1. Changes the executor to parse the qual ExprState to check whether a SEGMENTBY column is specified in the WHERE clause. 2. Builds scan keys based on step 1. 3. Changes the executor to do a heap scan on the compressed chunk based on the scan keys and move only the rows that match the WHERE clause to the staging area, i.e., the uncompressed chunk. 4. Marks the affected chunk as partially compressed. 5. Performs regular UPDATE/DELETE operations on the staging area. 6. Since there is no Custom Scan (HypertableModify) node for UPDATE/DELETE operations on PG versions < 14, this feature is not supported on PG12 and PG13.	2023-04-05 17:19:45 +05:30
Erik Nordström	2e6c6b5c58	Refactor and optimize distributed COPY. Refactor the code path that handles remote distributed COPY. The main changes include: * Use a hash table to look up data node connections instead of a list. * Refactor the per-data-node buffer code that accumulates rows into bigger CopyData messages. * Reduce the default number of rows in a CopyData message to 100. This seems to improve throughput, probably by striking a better balance between message overhead and latency. * The number of rows to send in each CopyData message can now be changed via a new foreign data wrapper option.	2023-04-04 15:35:54 +02:00
Rafia Sabih	ff5959f8f9	Handle missing FROM clause in continuous aggregate definition. Such a definition now raises an error. Fixes #5500.	2023-03-29 22:29:16 +02:00
Konstantina Skovola	cb81c331ae	Allow named time_bucket arguments in Cagg definition. Fixes #5450.	2023-03-28 18:45:41 +03:00
Rafia Sabih	98218c1d07	Enable joins for hierarchical continuous aggregates. The joins can be between a continuous aggregate and a hypertable, a continuous aggregate and a regular Postgres table, or a continuous aggregate and a regular Postgres view.	2023-03-28 15:12:54 +02:00
Mats Kindahl	777c599a34	Do not segfault on large histogram() parameters. A bug in `width_bucket()` causes an overflow and a subsequent NaN value as a result of dividing by `+inf`. The NaN value is converted to an integer and hence produces a bucket index that is out of range. This commit fixes this by raising an error, rather than segfaulting, for bucket indexes that are out of range.	2023-03-28 12:47:02 +02:00
Erik Nordström	a51d21efbe	Fix issue creating dimensional constraints. During chunk creation, the chunk's dimensional CHECK constraints are created via an "upcall" to PL/pgSQL code. However, creating dimensional constraints in PL/pgSQL code sometimes fails, especially during high-concurrency inserts, because the PL/pgSQL code scans metadata using a snapshot that might not see the same metadata as the C code. As a result, chunk creation sometimes fails during constraint creation. To fix this issue, implement dimensional CHECK-constraint creation in C code. Other constraints (FK, PK, etc.) are still created via an upcall, but should probably also be rewritten in C. However, since these constraints don't depend on recently updated metadata, this is left to a future change. Fixes #5456.	2023-03-24 10:55:08 +01:00
Nikhil Sontakke	7e43f45ccb	Ensure superuser perms during copy/move chunk. There is a security loophole in current core Postgres that makes it possible for a non-superuser to gain superuser access by attaching dependencies such as expression indexes, triggers, etc. before logical replication commences. To avoid this, we now ensure that the chunk objects created for the subscription are created as a superuser. This prevents regular users from attaching malicious dependencies.	2023-03-23 13:26:47 +05:30
Fabrízio de Royes Mello	38fcd1b76b	Improve Realtime Continuous Aggregate performance. When calling the `cagg_watermark` function to get the watermark of a Continuous Aggregate, we execute a `SELECT MAX(time_dimension)` query on the underlying materialization hypertable. The problem is that a `SELECT MAX(time_dimension)` query can be expensive because it scans all hypertable chunks, increasing the planning time of Realtime Continuous Aggregates. This is improved by creating a new catalog table that serves as a cache for the current Continuous Aggregate watermark, updated in the following situations: - Create CAgg: store the minimum value of the hypertable time dimension data type; - Refresh CAgg: store the last value of the time dimension materialized in the underlying materialization hypertable (or the minimum value of the materialization hypertable time dimension data type if no data has been materialized); - Drop CAgg Chunks: the same as Refresh CAgg. Closes #4699, #5307.	2023-03-22 16:35:23 -03:00
shhnwz	699fcf48aa	Stats improvement for Uncompressed Chunks. During compression, autovacuum used to be disabled for the uncompressed chunk and re-enabled after decompression. This led to Postgres maintenance issues. Autovacuum is no longer disabled for uncompressed chunks, so Postgres maintains their statistics in its natural way. Fixes #309.	2023-03-22 23:51:13 +05:30
Erik Nordström	63b416b6b0	Use consistent snapshots when scanning metadata. Invalidate the catalog snapshot in the scanner to ensure that any lookups into `pg_catalog` use a snapshot that is consistent with the snapshot used to scan TimescaleDB metadata. This fixes an issue where a chunk could be looked up without a proper relid filled in, causing an assertion failure (`ASSERT_IS_VALID_CHUNK`). When a chunk is scanned and found (in `chunk_tuple_found()`), the Oid of the chunk table is filled in using `get_relname_relid()`, which could return InvalidOid because a different snapshot was used when scanning `pg_class`. Calling `InvalidateCatalogSnapshot()` before starting the metadata scan in `Scanner` ensures that the pg_catalog snapshot is refreshed. Due to the difficulty of reproducing this MVCC issue, no regression or isolation test is provided, but it is easy to hit this bug when doing highly concurrent COPYs into a distributed hypertable.	2023-03-21 10:34:23 +01:00
Bharathy	cc51e20e87	Add support for ON CONFLICT DO UPDATE for compressed hypertables. This patch fixes execution of INSERT with ON CONFLICT DO UPDATE by removing the error and allowing the UPDATE to happen on the given compressed hypertable.	2023-03-20 22:55:27 +05:30
Mats Kindahl	67ff84e8f2	Add check for malloc failure in libpq calls. The functions `PQconndefaults` and `PQmakeEmptyPGresult` call `malloc` and can return NULL if they fail to allocate memory for the defaults and the empty result, respectively. This was checked with an `Assert`, but asserts are removed in production builds. Replace the `Assert` with checks that generate an error in production builds rather than dereferencing the pointer and causing a crash.	2023-03-16 14:20:54 +01:00
Zoltan Haindrich	790b322b24	Fix DEFAULT value handling in decompress_chunk. The SQL function decompress_chunk did not fill in default values during its operation. Fixes #5412.	2023-03-16 09:16:50 +01:00
Sven Klemm	65562f02e8	Support unique constraints on compressed chunks. This patch allows unique constraints on compressed chunks. When trying to INSERT into compressed chunks with unique constraints, any potentially conflicting compressed batches will be decompressed to let Postgres do constraint checking on the INSERT. With this patch, only INSERT ON CONFLICT DO NOTHING is supported. For decompression, only segment by information is considered to determine conflicting batches. This will be enhanced in a follow-up patch to also include orderby metadata so that fewer batches need to be decompressed.	2023-03-13 12:04:38 +01:00
Jan Nidzwetzki	356a20777c	Handle user-defined FDW options properly. This patch changes the way user-defined FDW options (e.g., startup costs, per-tuple costs) are handled. Previously, these values were retrieved in apply_fdw_and_server_options() but reset to default values afterward.	2023-03-13 10:39:52 +01:00
Maheedhar PV	5e0391392a	Out of on_proc_exit slots on GUC license change. Problem: When the GUC timescaledb.license = 'timescale' is set in the configuration file and a SIGHUP is sent to the postgres process, a reload of the TSL module is triggered. This reload happens in two phases: 1. tsl_module_load is called, which loads the module only if it is not already loaded. 2. ts_module_init is called on every ts_license_guc_assign_hook call, regardless of whether the module was newly loaded. This ts_module_init initialization function also registers an on_proc_exit function to be called on exit. The on_proc_exit callbacks are kept in a fixed-size array, on_proc_exit_list, of size MAX_ON_EXITS (20), which fills up after repeated SIGHUPs and hence causes an error. Fix: Make ts_module_init() register the on_proc_exit callback only when the module is reloaded, not on every init call. Closes #5233.	2023-03-13 06:24:01 +05:30
Jan Nidzwetzki	7b8177aa74	Fix file trailer handling in the COPY fetcher. The COPY fetcher fetches tuples in batches. When the last element in a batch was the file trailer, the trailer was not handled correctly: the existing logic did not call PQgetCopyData in that case. Therefore, the state of the fetcher was not set to EOF and the copy operation was not correctly finished at this point. Fixes #5323.	2023-03-09 14:29:06 +01:00
Bharathy	f54dd7b05d	Fix SEGMENTBY column predicates not being pushed down. WHERE clauses that apply non-equality operators to SEGMENTBY columns of type text/bytea were not pushed down to the Seq Scan node of the compressed chunk. This patch fixes this issue. Fixes #5286.	2023-03-08 19:17:43 +05:30
Erik Nordström	c76a0cff68	Add parallel support for partialize_agg(). Make `partialize_agg()` support parallel query execution. To make this work, the finalize node needs to combine the individual partials from each parallel worker, but the final step that turns the resulting partial into the finished aggregate should not happen. Thus, in the case of distributed hypertables, each data node can run a parallel query to compute a partial, and the access node can later combine and finalize these partials into the final aggregate. Essentially, there will be one combine step (minus final) on each data node, and then another one plus final on the access node. To implement this, the finalize aggregate plan is simply modified to elide the final step and to reserialize the partial. It is only possible to do this at the plan stage; if done at the path stage, the PostgreSQL planner will hit assertions that assume that the node has certain values (e.g., it doesn't expect combine Paths to skip the final step).	2023-03-08 14:14:25 +01:00
Konstantina Skovola	5a3cacd06f	Fix sub-second intervals in hierarchical caggs. Previously, we used date_part("epoch", interval) and integer division internally to determine whether the top cagg's interval is a multiple of its parent's. This led to precision loss and wrong results for intervals with sub-second components. Fixed by using the `ts_interval_value_to_internal` function to convert intervals to an appropriate integer representation for division. Fixes #5277.	2023-03-07 13:25:49 +02:00
Sven Klemm	d386aa1def	Release 2.10.1. This release contains bug fixes since the 2.10.0 release. We recommend that you upgrade at the next available opportunity. Bugfixes * #5159 Support Continuous Aggregate names in hypertable_(detailed_)size * #5226 Fix concurrent locking with chunk_data_node table * #5317 Fix some incorrect memory handling * #5336 Use NameData and namestrcpy for names * #5343 Set PortalContext when starting job * #5360 Fix uninitialized bucket_info variable * #5362 Make copy fetcher more async * #5364 Fix num_chunks inconsistency in hypertables view * #5367 Fix column name handling in old-style continuous aggregates * #5378 Fix multinode DML HA performance regression * #5384 Fix Hierarchical Continuous Aggregates chunk_interval_size Thanks * @justinozavala for reporting an issue with PL/Python procedures in the background worker * @Medvecrab for discovering an issue with copying NameData when forming heap tuples. * @pushpeepkmonroe for discovering an issue in upgrading old-style continuous aggregates with renamed columns	2023-03-07 01:23:38 +01:00
Dmitry Simonenko	830c37b5b0	Fix concurrent locking with chunk_data_node table. A concurrent insert into a distributed hypertable after a data node was marked as unavailable would produce a `tuple concurrently deleted` error. The problem occurs because of missing tuple-level locking during the scan and a subsequent concurrent delete from the chunk_data_node table, which should be treated as a `SELECT … FOR UPDATE` case instead. Based on the fix by @erimatnor. Fix #5153.	2023-03-06 18:40:59 +02:00
Fabrízio de Royes Mello	32046832d3	Fix Hierarchical CAgg chunk_interval_size. When a Continuous Aggregate is created, its `chunk_interval_size` is defined by the `chunk_interval_size` of the original hypertable multiplied by a fixed factor of 10. The problem is that when a Hierarchical Continuous Aggregate is created, the same factor is applied again, which leads to an exponentially growing `chunk_interval_size`. Fixed by simply copying the `chunk_interval_size` from the base Continuous Aggregate for a Hierarchical Continuous Aggregate. Fixes #5382.	2023-03-03 12:31:24 -03:00
Mats Kindahl	a6ff7ba6cc	Rename columns in old-style continuous aggregates. For continuous aggregates with old-style partial aggregates, renaming columns that are not in the GROUP BY clause generates an error when upgrading to a later version. The reason is that the name of the column is implicitly assumed to be the same as in the direct view. This holds true for new-style continuous aggregates, but not always for old-style continuous aggregates. In particular, columns that are not part of the `GROUP BY` clause can have an internally generated name. This commit fixes that by extracting the name of the column from the partial view and using it when renaming the partial view column and the materialized table column.	2023-03-03 14:02:37 +01:00
Erik Nordström	386d31bc6e	Make copy fetcher more async. Make the copy fetcher more asynchronous by separating the sending of the request for data from the receiving of the response. By doing that, the async append node can send the request to each data node before it starts reading the first response. This can massively improve the performance because the response isn't returned until the remote node has finished executing the query and is ready to return the first tuple.	2023-03-02 15:07:23 +01:00
shhnwz	e6f6eb3ab8	Fix for inconsistent num_chunks. Different num_chunks values were reported by timescaledb_information.hypertables and timescaledb_information.chunks because the view definition of hypertables did not filter out dropped and OSM chunks. Fixes #5338.	2023-02-28 16:32:03 +05:30
Jan Nidzwetzki	e0be9eaa28	Allow pushdown of reference table joins. This patch adds the functionality that is needed to perform distributed, parallel joins on reference tables on access nodes. This code allows the pushdown of a join if: * (1) The setting "ts_guc_enable_per_data_node_queries" is enabled * (2) The outer relation is a distributed hypertable * (3) The inner relation is marked as a reference table * (4) The join is a left join or an inner join	2023-02-23 14:32:12 +01:00
Dmitry Simonenko	f12a361ef7	Add timeout argument to the ping_data_node(). This PR introduces a timeout argument and new logic to the timescale_internal.ping_data_node() function, which allows handling I/O timeouts for unresponsive nodes. Fix #5312.	2023-02-21 19:52:03 +02:00
Mats Kindahl	8a51a76d00	Fix changelog message for NameData issue	2023-02-20 14:26:56 +01:00
Oleg Tselebrovskiy	0746517c77	Fix some incorrect memory handling. While running TimescaleDB under Valgrind, I found two cases of incorrect memory handling. Case 1: When creating the timescaledb extension, the memory used while inserting metadata contains junk that is not zeroed before being written to. Changes in metadata.c fix this. Case 2: When executing GRANT ... ON ALL TABLES IN SCHEMA some_schema and deconstructing this statement into grants on individual tables, the names of those tables are copied incorrectly: instead of the data itself, the code copies an address of the data on a page in a buffer. When the page in this buffer changes, the copied address leads to wrong data. Changes in process_utility.c fix this by allocating memory and then copying the needed relname there. Fixes #5311.	2023-02-20 14:26:56 +01:00
Mats Kindahl	0cbd7407a6	Get PortalContext when starting job. When executing functions, SPI assumes that `TopTransactionContext` is used for atomic execution contexts and `PortalContext` is used for non-atomic contexts. Since jobs need to be able to commit and start transactions, they execute in a non-atomic context, hence `PortalContext` will be used, but `PortalContext` is not set when starting the job. This is not a problem for the PL/pgSQL executor, but for other executors (such as PL/Python) it would be. This commit fixes the issue by setting the `PortalContext` variable to the portal context created for the portal and restoring it (to NULL) after execution. Fixes #5326.	2023-02-20 10:54:05 +01:00
Maheedhar PV	91b4a66eb9	Release 2.10.0 (#5324). This release contains new features and bug fixes since the 2.9.3 release. This release is high priority for upgrade. We strongly recommend that you upgrade as soon as possible. Features * #5241 Allow RETURNING clause when inserting into compressed chunks * #5245 Manage life-cycle of connections via memory contexts * #5246 Make connection establishment interruptible * #5253 Make data node command execution interruptible * #5243 Enable real-time aggregation for continuous aggregates with joins * #5262 Extend enabling compression on a continuous aggregate with 'compress_segmentby' and 'compress_orderby' parameters Bugfixes * #4926 Fix corruption when inserting into compressed chunks * #5218 Add role-level security to job error log * #5214 Fix use of prepared statement in async module * #5290 Compression can't be enabled on continuous aggregates when segmentby/orderby columns need quotation * #5239 Fix next_start calculation for fixed schedules	2023-02-20 11:06:05 +05:30
Mats Kindahl	38b71d0e70	Use NameData and namestrcpy for names. Using `strlcpy` to copy variables holding PostgreSQL names can cause issues since names are fixed-size types of length 64. This means that any data that follows the initial null-terminated string will also be part of the data. Instead of using `const char*` for PostgreSQL names, use `NameData` type for PostgreSQL names and use `namestrcpy` to copy them rather than `strlcpy`.	2023-02-17 10:43:46 +01:00
Zoltan Haindrich	9d3866a50e	Accept all compression options on caggs. Properly handle the 'compress_segmentby' and 'compress_orderby' compression options on continuous aggregates. ALTER MATERIALIZED VIEW test_table_cagg SET ( timescaledb.compress = true, timescaledb.compress_segmentby = 'device_id' ); Fixes #5161.	2023-02-13 22:21:18 +01:00
Rafia Sabih	ece15d66a4	Enable real time aggregation for caggs with joins	2023-02-10 22:12:29 +05:30
Konstantina Skovola	348796f9d9	Fix next_start calculation for fixed schedules. This patch fixes several issues with next_start calculation. - Previously, the offset was added twice in some cases. This patch fixes that. - Additionally, schedule intervals with month components were not handled correctly. Internally, time_bucket with origin is used to calculate the next start. However, for month intervals, the timestamp calculated for a bucket is always aligned on the first day of the month, regardless of origin. Therefore, the result was previously aligned with origin by adding the difference between origin and its respective time bucket. This difference was computed as a fixed-length interval in terms of days and time, which occasionally led to an incorrect next start, for example when a job should be executed on the last day of a month. That is fixed by adding an appropriate interval of months to initial_start and letting Postgres handle this computation properly. Fixes #5216.	2023-02-09 17:57:17 +02:00
Zoltan Haindrich	cad2440b58	Compression can't be enabled on caggs. Continuous aggregate creation failed when segmentby/orderby columns needed quoting.	2023-02-07 21:01:56 +01:00
Lakshmi Narayanan Sreethar	fb3ad7d6c6	Release 2.9.3. This release contains bug fixes since the 2.9.2 release. This release is high priority for upgrade. We strongly recommend that you upgrade as soon as possible. Bugfixes * #4804 Skip bucketing when start or end of refresh job is null * #5108 Fix column ordering in compressed table index not following the order of a multi-column segment by definition * #5187 Don't enable clang-tidy by default * #5255 Fix year not being considered as a multiple of day/month in hierarchical continuous aggregates * #5259 Lock down search_path in SPI calls	2023-02-03 20:04:18 +05:30
Erik Nordström	b81033b835	Make data node command execution interruptible. The function to execute remote commands on data nodes used a blocking libpq API that doesn't integrate with PostgreSQL interrupt handling, making it impossible for a user or statement timeout to cancel a remote command. Refactor the remote command execution function to use a non-blocking API and integrate with PostgreSQL signal handling via WaitEventSets. Partial fix for #4958. Refactor remote command execution function.	2023-02-03 13:15:28 +01:00
Konstantina Skovola	6bc8980216	Fix year not multiple of day/month in nested CAgg. Previously, all intervals were converted to seconds using "epoch" with date_part. However, this treats a year as 365.25 days to account for leap years, leading to the unexpected situation that a year is not a multiple of a day or a month. Fixed by treating month-only intervals as multiples of 30 days. Fixes #5231.	2023-02-02 12:14:37 +02:00
Sven Klemm	789bb26dfb	Lock down search_path in SPI calls	2023-02-01 07:54:03 +01:00
Erik Nordström	d489ed6f32	Fix use of prepared statement in async module. Broken code caused the async connection module to never send queries using prepared statements. Instead, queries were sent as parameterized query statements. Fix this so that prepared statements are used when created.	2023-01-31 12:01:03 +01:00
Erik Nordström	5d12a3883d	Make connection establishment interruptible. Refactor the data node connection establishment so that it is interruptible, e.g., by ctrl-c or `statement_timeout`. Previously, the connection establishment used blocking libpq calls. By instead using asynchronous connection APIs and integrating with PostgreSQL interrupt handling, the connection establishment can be canceled by an interrupt caused by a statement timeout or a user. Fixes #2757.	2023-01-30 17:48:59 +01:00
Erik Nordström	cce0e18c36	Manage life-cycle of connections via memory contexts. Tie the life cycle of a data node connection to the memory context it is created on. Previously, a data node connection was automatically closed at the end of a transaction, although often a connection needs to live beyond a single transaction. For example, a connection cache is maintained for data node connections, and, for such cases, a flag was set on a connection to avoid closing it automatically. Instead of being tied to transactions, connections are now life-cycle managed via memory contexts. This simplifies the handling of connections and avoids having to create exceptions to closing connections at transaction end.	2023-01-30 15:46:21 +01:00
Mats Kindahl	5661ff1523	Add role-level security to job error log. Since the job error log can contain information from many different sources and also from many different jobs, it is important to restrict the visibility of job error log entries to job owners. This commit extends the view `timescaledb_information.job_errors` with role-based checks so that a user can only see entries for jobs that she has permission to view, and restricts the permissions on `_timescaledb_internal.job_errors` so that users can only view the job error log through the view. A special case is added so that the superuser and the database owner can see all log entries, even if there is no associated job id with the log entry. Closes #5217.	2023-01-30 12:13:00 +01:00
Sven Klemm	334864127d	Stop blocking RETURNING for compressed chunks. Recent refactorings in the INSERT into compressed chunk code path allowed us to support this feature, but the check preventing users from using it was not removed as part of that patch. This patch removes the blocker code and adds a minimal test case.	2023-01-30 09:49:52 +01:00
Rafia Sabih	043092a97f	Fix timestamp out of range. When the start or end of a refresh job is null, bucketing raises an error because start and end are already set to the minimum and maximum timestamp values before bucketing. Hence, skip bucketing in this case. Partly fixes #4117.	2023-01-26 20:11:21 +05:30
Rafia Sabih	a67b90e977	Allow joins in continuous aggregates. Enable support for joins in the query used to create continuous aggregates, with the following restrictions: 1. The join can involve only one hypertable and one normal table. 2. The join must be an inner join. 3. The join condition can only be an equality.	2023-01-24 19:57:24 +05:30


369 Commits
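
Commits 5a3cacd06f and 6bc8980216 above both concern checking whether a hierarchical continuous aggregate's bucket interval is a multiple of its parent's. The C sketch below illustrates the two pitfalls they describe. It is not TimescaleDB's code. `SketchInterval` is hypothetical and modeled on PostgreSQL's month/day/microsecond interval layout, and so are all the functions. `old_epoch_seconds` follows the way `date_part('epoch', interval)` counts a year as 365.25 days and then truncates to whole seconds. `interval_usecs` keeps microseconds and counts a month as 30 days.

```c
#include <stdbool.h>
#include <stdint.h>

#define USECS_PER_SEC INT64_C(1000000)
#define USECS_PER_DAY (INT64_C(86400) * USECS_PER_SEC)

/* Hypothetical interval layout: months, days and a microsecond part are
 * stored separately, as in PostgreSQL's Interval. */
typedef struct SketchInterval
{
    int64_t time;  /* microseconds */
    int32_t day;
    int32_t month;
} SketchInterval;

/* Old approach (sketch): epoch seconds with a year counted as 365.25 days,
 * then truncated to an integer, which drops any sub-second part. */
static int64_t old_epoch_seconds(SketchInterval iv)
{
    double secs = iv.time / 1e6
                + (iv.month / 12) * 365.25 * 86400.0
                + (iv.month % 12) * 30.0 * 86400.0
                + iv.day * 86400.0;
    return (int64_t) secs;
}

/* New approach (sketch): exact microseconds, with a month counted as 30 days. */
static int64_t interval_usecs(SketchInterval iv)
{
    return iv.time
         + (int64_t) iv.day * USECS_PER_DAY
         + (int64_t) iv.month * 30 * USECS_PER_DAY;
}

static bool old_is_multiple(SketchInterval child, SketchInterval parent)
{
    int64_t p = old_epoch_seconds(parent);
    return p != 0 && old_epoch_seconds(child) % p == 0;
}

static bool new_is_multiple(SketchInterval child, SketchInterval parent)
{
    int64_t p = interval_usecs(parent);
    return p != 0 && interval_usecs(child) % p == 0;
}
```

The old check goes wrong in two ways with these helpers:

- It accepts a 2-second interval as a multiple of 1.5 seconds, because both values are truncated to whole seconds (2 and 1).
- It rejects a year (12 months) as a multiple of either a month or a day.

The microsecond-based check handles all three cases correctly.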
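
The overflow described in 777c599a34 can be reproduced with ordinary IEEE 754 doubles. With bounds near `DBL_MAX`, both `operand - low` and `high - low` overflow to `+inf`, and `inf / inf` is NaN. Converting NaN to `int` is undefined behavior in C, which is how a garbage bucket index can arise. Below is a minimal sketch of a guarded bucket computation. `sketch_bucket_index` is hypothetical and assumes `low < high`. It returns -1 where a real implementation would raise an error.

```c
#include <math.h>

/* Hypothetical width_bucket-style index (not TimescaleDB's code).
 * Returns 0 below the range, count + 1 at or above high, 1..count inside,
 * and -1 when the arithmetic does not produce a usable index. */
static int sketch_bucket_index(double operand, double low, double high, int count)
{
    if (operand < low)
        return 0;
    if (operand >= high)
        return count + 1;

    /* With extreme bounds, (operand - low) and (high - low) can both overflow
     * to +inf, and inf / inf is NaN. Check before casting to int. */
    double pos = (double) count * ((operand - low) / (high - low));
    if (isnan(pos) || isinf(pos))
        return -1;  /* the caller should report an error here */

    /* Guard against rounding pushing pos up to count. */
    if (pos >= (double) count)
        return count;
    return (int) pos + 1;
}
```

For example, `sketch_bucket_index(5.0, 0.0, 10.0, 10)` returns 6. By contrast, `sketch_bucket_index(1.0e308, -1.7e308, 1.7e308, 10)` returns -1 instead of indexing out of range.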
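
The failure in 5e0391392a comes from a fixed-size callback array that is appended to on every module init. The sketch below is hypothetical and shows why guarding the registration keeps repeated SIGHUP-triggered inits from exhausting the 20 slots. Two stand-ins are used:

- `sketch_on_proc_exit` stands in for PostgreSQL's `on_proc_exit()`.
- A static flag stands in for the commit's condition of registering only when the module is actually reloaded.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_ON_EXITS 20  /* mirrors MAX_ON_EXITS (20) from the commit message */

typedef void (*exit_callback)(int code);

static exit_callback exit_callbacks[SKETCH_MAX_ON_EXITS];
static size_t n_exit_callbacks = 0;

/* Hypothetical stand-in for on_proc_exit(): fails once the fixed array is full. */
static bool sketch_on_proc_exit(exit_callback cb)
{
    if (n_exit_callbacks >= SKETCH_MAX_ON_EXITS)
        return false;
    exit_callbacks[n_exit_callbacks++] = cb;
    return true;
}

static void sketch_module_cleanup(int code)
{
    (void) code;  /* module resources would be released here */
}

/* Hypothetical module init, called on every license GUC assignment (for
 * example after each SIGHUP). The static flag ensures the exit callback is
 * registered only once, so repeated calls cannot fill the array. */
static bool sketch_module_init(void)
{
    static bool exit_callback_registered = false;

    if (!exit_callback_registered)
    {
        if (!sketch_on_proc_exit(sketch_module_cleanup))
            return false;
        exit_callback_registered = true;
    }
    return true;
}
```

Calling `sketch_module_init()` 100 times leaves exactly one registered callback. Without the flag, the 21st call would fail.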
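
The NameData issue in 38b71d0e70 is easy to see in isolation. PostgreSQL names are fixed-size buffers (`NAMEDATALEN`, 64 bytes by default). Code that copies or stores the whole buffer also picks up whatever follows the terminator. This hypothetical sketch compares two copy routines:

- `sketch_strlcpy_name` behaves like `strlcpy`: it writes only the string and one terminator.
- `sketch_namestrcpy` is modeled on PostgreSQL's `namestrcpy`, which uses `strncpy` and therefore zero-pads the rest of the buffer.

```c
#include <string.h>

#define SKETCH_NAMEDATALEN 64  /* PostgreSQL's default NAMEDATALEN */

/* Hypothetical stand-in for PostgreSQL's fixed-size NameData. */
typedef struct SketchNameData
{
    char data[SKETCH_NAMEDATALEN];
} SketchNameData;

/* strlcpy-style copy (sketch): writes the string and one terminator and
 * leaves the remaining bytes of the destination untouched. */
static void sketch_strlcpy_name(SketchNameData *dst, const char *src)
{
    size_t len = strlen(src);
    if (len >= SKETCH_NAMEDATALEN)
        len = SKETCH_NAMEDATALEN - 1;
    memcpy(dst->data, src, len);
    dst->data[len] = '\0';
}

/* namestrcpy-style copy (sketch): strncpy zero-pads the whole fixed-size
 * buffer, so no stale bytes follow the terminator. */
static void sketch_namestrcpy(SketchNameData *dst, const char *src)
{
    strncpy(dst->data, src, SKETCH_NAMEDATALEN);
    dst->data[SKETCH_NAMEDATALEN - 1] = '\0';
}
```

Both copies compare equal as C strings. Only the zero-padded one is clean across all 64 bytes, and that is what matters when the full `NameData` is copied, for example into a heap tuple.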