34 Commits

Author SHA1 Message Date
Nikhil
2ffa1bf436 Implement cleanup for chunk copy/move
A chunk copy/move operation is carried out in stages, and it can
fail in any of them. We track the last completed stage in the
"chunk_copy_operation" catalog table. In case of failure, a
"chunk_copy_cleanup" function can be invoked to bring the chunk back
to its original state on the source data node; all transient objects
such as the replication slot, publication, subscription, empty chunk,
and metadata updates are cleaned up.

Includes test case changes for failures induced at each and every stage.

To avoid confusion between chunk copy activity and chunk copy
operation, this patch also consistently uses "operation" everywhere
instead of "activity".
2021-07-29 16:53:12 +03:00
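
As an illustration, a failed operation might be cleaned up roughly
like this (a minimal sketch; the catalog schema, column names, and the
exact cleanup function name/signature are assumptions based on this
message):

    -- Find the failed operation and its last completed stage (assumed columns)
    SELECT operation_id, completed_stage
      FROM _timescaledb_catalog.chunk_copy_operation;

    -- Roll the chunk back to its original state on the source data node
    CALL chunk_copy_cleanup('ts_copy_1_31');
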
Erik Nordström
bea2613455 Add constraints when copying chunks across data nodes
Chunk constraints are now added after a chunk has been copied from one
data node to another. The constraints are added when the chunk is made
visible on the destination node, i.e., after data has been copied and
the chunk's metadata is created.

As an alternative, the constraints could be added when the chunk table
is first created, but before the metadata for the chunk is added. This
would have the benefit of validating each copied (inserted) row
against the constraints during the data copy phase. However, this
would also necessitate decoupling the step of creating the constraint
metadata from the creation of the actual constraints since the other
chunk metadata that is referenced does not yet exist. Such decoupling
would require validating that the metadata actually matches the
constraints of the table when the metadata is later created.

One downside of adding the constraints after data copying is that it
necessitates validating all the chunk's rows against the constraints
after insertion as opposed to during insertion. If this turns out to
be a performance issue, validation could be initially deferred. This
is left as a future optimization.
2021-07-29 16:53:12 +03:00
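
For reference, the deferred-validation idea maps onto stock PostgreSQL
DDL: a constraint can be added as NOT VALID while data is copied and
validated as a separate step afterwards. A minimal sketch with a
hypothetical chunk and constraint name:

    -- Add the constraint without checking existing rows
    ALTER TABLE _timescaledb_internal._hyper_1_1_chunk
      ADD CONSTRAINT constraint_1
      CHECK ("time" >= '2021-01-01' AND "time" < '2021-01-08') NOT VALID;

    -- Validate the copied rows later, in a separate pass
    ALTER TABLE _timescaledb_internal._hyper_1_1_chunk
      VALIDATE CONSTRAINT constraint_1;
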
Dmitry Simonenko
38c1781748 Copy/move chunk refactoring
Remove the copy_chunk_data() function and the code needed to support
it, such as the 'transactional' argument.

Rework copy chunk logic using separate stages.

Introduce the copy_chunk() API function as an internal wrapper for
move_chunk().
2021-07-29 16:53:12 +03:00
Nikhil
f6b0250557 Implement wrapper API for copy/move chunk
The building blocks required for implementing end-to-end copy/move
chunk functionality have now been wrapped in a procedure.

A procedure is required because multiple transactions are needed to
carry out the activity across the access node and the two involved
data nodes.

The following steps are encapsulated in this procedure:

1) Create an empty chunk table on the destination data node

2) Copy the data from the source data node chunk to this newly created
destination node chunk. This is done via built-in PostgreSQL logical
replication functionality

3) Attach this chunk to the hypertable on the destination data node

4) Remove this chunk from the source data node to complete the move, if
requested

A new catalog table "chunk_copy_activity" has been added to track
the progress of the above stages. A unique id is assigned to each
activity, and the entry is updated with the completed stages as things
progress.
2021-07-29 16:53:12 +03:00
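
A sketch of calling the procedure (schema and argument names are
assumptions based on this message; a procedure invoked with CALL is
used because the operation spans multiple transactions):

    -- Move a chunk between two data nodes; copy_chunk takes the same arguments
    CALL timescaledb_experimental.move_chunk(
        chunk => '_timescaledb_internal._hyper_1_1_chunk',
        source_node => 'data_node_1',
        destination_node => 'data_node_2');
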
Erik Nordström
b8ff780c50 Add ability to create chunk from existing table
The `create_chunk` API has been extended to allow creating a chunk
from an existing relational table. The table is turned into a chunk by
attaching it to the root hypertable via inheritance.

The purpose of this functionality is to allow copying a chunk to
another node. First, the chunk table and data are copied. After that,
`create_chunk` can be executed to make the new table part of the
hypertable.

Currently, the relational table used to create the chunk has to match
the hypertable in terms of constraints, triggers, etc. PostgreSQL
itself enforces the existence of same-named CHECK constraints, but no
enforcement currently exists for other objects, including triggers and
UNIQUE, PRIMARY KEY, or FOREIGN KEY constraints. Such enforcement can
be implemented in the future, if deemed necessary. Another option is
to automatically add all the required objects (triggers, constraints)
based on the hypertable equivalents. However, that might also lead to
duplicate objects in case some of them exist on the table prior to
creating the chunk.
2021-07-29 16:53:12 +03:00
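
A hypothetical invocation, assuming the extended `create_chunk`
accepts the existing table alongside the hypertable and its dimension
slices (the argument names and the slice encoding are illustrative):

    -- Turn a pre-populated, schema-compatible table into a chunk
    SELECT * FROM _timescaledb_internal.create_chunk(
        'conditions',
        '{"time": [1609459200000000, 1610064000000000]}',
        chunk_table => 'conditions_w1');
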
Ruslan Fomkin
404f1cdbad Create chunk table from access node
Creates a table for a chunk replica on the given data node. The table
gets the same schema and name as the chunk. The created chunk replica
table is not added to the metadata on either the access node or the
data node.

The primary goal is to use it during chunk copy/move.
2021-07-29 16:53:12 +03:00
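
A minimal sketch of the call, assuming the function takes the
hypertable, the dimension slices, and the chunk's schema and table
name (names and slice encoding are illustrative):

    -- Create the bare chunk table on a data node; no metadata is recorded
    SELECT _timescaledb_internal.create_chunk_table(
        'conditions',
        '{"time": [1609459200000000, 1610064000000000]}',
        '_timescaledb_internal',
        '_hyper_1_1_chunk');
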
Ruslan Fomkin
34e99a1c23 Return error for NULL input to create_chunk_table
Gives errors if any argument of create_chunk_table is NULL, instead of
marking the function STRICT. Utilizes newly added macros for this.
2021-07-29 16:53:12 +03:00
Ruslan Fomkin
28ccecbe7c Create an empty chunk table
Adds an internal API function to create an empty chunk table according
to the given hypertable for the given chunk table name and dimension
slices. This function creates a chunk table inheriting from the
hypertable, so it guarantees the same schema. No TimescaleDB metadata
is updated.

To be able to create the chunk table in a tablespace attached to the
hypertable, this commit allows calculating the tablespace id without
the dimension slice existing in the catalog.

If there is already a chunk that collides on dimension slices, the
function fails to create the chunk table.

The function will be used internally in multi-node to be able to
replicate a chunk from one data node to another.
2021-07-29 16:53:12 +03:00
Sven Klemm
ff5d7e42bb Adjust code to PG14 reltuples changes
PG14 changes the initial value of pg_class.reltuples to -1 to allow
differentiating between an empty relation and a relation where
ANALYZE has not yet run.

https://github.com/postgres/postgres/commit/3d351d916b
2021-06-29 16:35:35 +02:00
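
The new semantics can be observed directly in `pg_class`; for example:

    -- In PG14, -1 means "never analyzed", while 0 means "analyzed and empty"
    SELECT relname, reltuples
      FROM pg_class
     WHERE relname = '_hyper_1_1_chunk';
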
Erik Nordström
98110af75b Constify parameters and return values of core APIs
Harden core APIs by adding the `const` qualifier to pointer parameters
and return values passed by reference. Adding `const` to APIs has
several benefits and potentially reduces bugs.

* Allows core APIs to be called using `const` objects.
* Callers know that objects passed by reference are not modified as a
  side-effect of a function call.
* Returning `const` pointers enforces "read-only" usage of pointers to
  internal objects, forcing users to copy objects when mutating them
  or using explicit APIs for mutations.
* Allows compiler to apply optimizations and helps static analysis.

Note that these changes are so far only applied to core API
functions. Further work can be done to improve other parts of the
code.
2021-06-14 22:09:10 +02:00
Sven Klemm
fb863f12c7 Remove support for PG11
Remove support for compiling against PostgreSQL 11. This patch also
removes PG11 specific compatibility macros.
2021-06-01 20:21:06 +02:00
Sven Klemm
6e437c2d95 Fix use after free in chunk_api_get_chunk_stats 2021-04-13 15:31:41 +02:00
Ruslan Fomkin
639aef76a4 Refactor chunk creation for future extension
Separates chunk preparation and metadata update. Separates preparation
of constraint names, since there is no overlap between preparing
names for dimension constraints and other constraints. Factors out
creation of json string describing dimension slices of a chunk.

This refactoring is preparation for implementing new functionalities.
2021-04-06 14:02:22 +02:00
Sven Klemm
e79c0648cf Fix member access within misaligned address in chunk_update_colstats
The array argument passed to array_length is treated as AnyArrayType,
which is a union of ArrayType and ExpandedArrayHeader. This led to
member access within a misaligned address when used on the plain
ArrayType argument passed to array_length by chunk_update_colstats.
2020-10-13 21:05:23 +02:00
Mats Kindahl
c321fe0ca0 Check insert privileges to create chunk
To create a chunk in a hypertable, it is currently necessary to be the
owner of the chunk's hypertable. If a user has insert privileges only,
the insert will fail with an error message, which causes problems when
inserting data into distributed hypertables since the user cannot
create new chunks.

This commit changes this so that insert privileges on the chunk's
hypertable are sufficient to allow creation of a new chunk.

Closes #2393
2020-09-21 18:16:40 +02:00
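
With this change, granting INSERT is enough for chunk-creating
inserts; a sketch with hypothetical table and role names:

    -- INSERT privilege alone now allows implicit chunk creation
    GRANT INSERT ON conditions TO app_writer;

    -- Run as app_writer: this insert may create a brand-new chunk
    INSERT INTO conditions VALUES ('2020-09-21 18:00:00', 'dev1', 22.5);
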
Brian Rowe
6b62ed543c Fetch collations from data nodes during ANALYZE
This change fixes the stats collecting code to also return the slot
collation fields for PG12. This fixes a bug (#2093) where running an
ANALYZE in PG12 would break queries on distributed tables.
2020-07-20 10:54:44 -07:00
Erik Nordström
596515eb0f Fix ANALYZE on replicated distributed hypertable
With replicated chunks, the function to import column stats would
experience errors when updating `pg_statistic`, since it tried to
write identical stats from several replica chunks.

This change fixes this issue by filtering duplicate stats rows
received from data nodes.

In the future, this could be improved by only requesting stats from
"primary" chunks on each data node, thus avoiding duplicates without
having to filter the result. However, this would complicate the
function interface as it would require sending a list of chunks
instead of just getting the stats for all chunks in a hypertable.
2020-06-18 12:38:18 +02:00
Sven Klemm
db617bf1d6 Fix typos in comments and documentation 2020-06-10 15:09:31 +02:00
Erik Nordström
8887f26baf Fix array construction issue for remote colstats
When fetching remote column statistics (`pg_statistic`) from data
nodes, the `stanumbers` field was not turned into an array
correctly. This caused values to be corrupted when importing them to
the access node. This issue has been fixed along with some compiler
warning issues (e.g., mixed declaration and code).
2020-05-27 17:31:09 +02:00
Brian Rowe
fad33fe954 Collect column stats for distributed tables.
This change adds a new command to return a subset of the column
stats for a hypertable (column width, percent null, and percent
distinct).  As part of the execution of this command on an access
node, these stats will be collected for distributed chunks and
updated on the access node.
2020-05-27 17:31:09 +02:00
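
The three collected statistics correspond to familiar `pg_stats`
columns, so after the command runs the imported values can be
inspected on the access node (chunk name hypothetical):

    -- avg_width, null_frac, and n_distinct are the imported subset
    SELECT attname, avg_width, null_frac, n_distinct
      FROM pg_stats
     WHERE tablename = '_hyper_1_1_chunk';
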
Dmitry Simonenko
71e2c35d48 Run distributed VACUUM/ANALYZE without FDW API
Run VACUUM/ANALYZE and automatically import the updated stats
using the distributed DDL functionality instead of FDW
analyze wrappers.
2020-05-27 17:31:09 +02:00
niksa
02f7d7aa48 Fetch data using either cursor or row-by-row
This change introduces two ways of fetching data from data nodes: one
using cursors and another one using row-by-row mode.  The major
benefit of row-by-row mode is that it enables running parallel plans
on data nodes. The default data fetcher uses row-by-row mode. A new
GUC `timescaledb.remote_data_fetcher` has been added to enable
switching between these two implementations (rowbyrow or cursor).
2020-05-27 17:31:09 +02:00
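
Switching fetchers is a plain GUC change:

    -- Default: row-by-row mode, which permits parallel plans on data nodes
    SET timescaledb.remote_data_fetcher = 'rowbyrow';

    -- Fall back to cursor-based fetching
    SET timescaledb.remote_data_fetcher = 'cursor';
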
Erik Nordström
6a9db8a621 Add function to fetch remote chunk relation stats
A new function, `get_chunk_relstats()`, allows fetching relstats
(basically `pg_class.{relpages,reltuples}`) from remote chunks on data
nodes and writing it to the `pg_class` entry for the corresponding
local chunk. The function expects either a chunk or a hypertable as
input and returns the relstats for the given chunk or all chunks for
the given hypertable, respectively.

Importing relstats as described is useful as part of a distributed
ANALYZE/VACUUM that won't require fetching all data into the access
node for local sampling (like the current implementation does).

In a future change, this function will be called as part of a local
ANALYZE on the access node that runs ANALYZE on all data nodes
followed by importing of the resulting relstats for the analyzed
chunks.
2020-05-27 17:31:09 +02:00
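
A sketch of both call forms (the internal schema is an assumption):

    -- Import relstats for every chunk of a hypertable...
    SELECT * FROM _timescaledb_internal.get_chunk_relstats('conditions');

    -- ...or for a single chunk
    SELECT * FROM _timescaledb_internal.get_chunk_relstats(
        '_timescaledb_internal._hyper_1_1_chunk');
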
Ruslan Fomkin
4e004c5564 Unify to use a constant in array declarations
Replaces a variable array length with a constant, which is commonly
used in the code. The change is guarded by an assertion.
2020-05-27 17:31:09 +02:00
Dmitry Simonenko
79f6223631 Replace UserMappings with a connection ID
This change replaces UserMappings with the newly introduced
TSConnectionId object, which represents a pair of a foreign server id
and a local user id.

Authentication has been moved to be non-password based, since the
original UserMappings were also used to store data node user
passwords. This is a temporary step until the introduction of
certificate-based authentication.

List of changes:

* add_data_node() password and bootstrap_password arguments removed

* introduced authentication using a pgpass file

* the RemoteTxn format string representing a transaction changed to
  tx-version-xid-server_id-user_id

* data_node_dispatch, remote transaction cache, and connection cache
  hash table keys switched to TSConnectionId instead of user mappings

* remote_connection_open() has been reworked to exclude user options

* tests upgraded; user mapping and password usage has been excluded
2020-05-27 17:31:09 +02:00
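
After this change, adding a data node no longer takes password
arguments; connections instead rely on a pgpass entry for the current
user. A sketch with hypothetical host and database names:

    -- No password/bootstrap_password arguments; auth comes from the pgpass file
    SELECT add_data_node('dn1', host => 'dn1.example.com', database => 'tsdb');
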
Erik Nordström
b07461ec00 Refactor and harden data node management
This change refactors and hardens parts of data node management
functionality.

* A number of permission checks have been added to data node
  management functions. This includes checking that the user has
  proper permissions for both table and server objects.
* Permissions checks are now done when creating remote chunks on data
  nodes.
* The add_data_node() API function has been simplified and now returns
  more intuitive status about created objects (foreign server,
  database, extension). It is no longer necessary to specify a user to
  connect with as this is always assumed to be the current user. The
  bootstrap user can still be specified explicitly, however, as that
  user might require elevated permissions on the remote node to
  bootstrap.
* Functions that capture exceptions without re-throwing, such as
  `ping_data_node()` and `get_user_mapping()`, have been refactored to
  not do this as the transaction state and memory contexts are not in
  states where it is safe to proceed as normal.
* Data node management functions now consistently check that any
  foreign servers operated on are actually TimescaleDB server objects.
* Tests now run with a superuser and a regular user specific to
  clustering. These users have password auth enabled in `pg_hba.conf`,
  which is required by the connection library when connecting as a
  non-superuser. Tests have been refactored to bootstrap data nodes
  using these user roles.
2020-05-27 17:31:09 +02:00
Brian Rowe
79fb46456f Rename server to data node
The timescale clustering code so far has been written referring to the
remote databases as 'servers'.  This terminology is a bit overloaded,
and in particular we don't enforce any network topology limitations
that the term 'server' would suggest.  In light of this we've decided
to change to use the term 'node' when referring to the different
databases in a distributed database.  Specifically we refer to the
frontend as an 'access node' and to the backends as 'data nodes',
though we may omit the access or data qualifier where it's unambiguous.

As the vast bulk of the code so far has been written for the case where
there was a single access node, almost all instances of 'server' were
references to data nodes.  This change has updated the code to rename
those instances.
2020-05-27 17:31:09 +02:00
Brian Rowe
dd3847a7e0 Rename files in preparation for large refactor
This change includes the only rename changes required by the renaming
of server to data node across the clustering codebase.  This change
is being committed separately from the bulk of the rename changes to
prevent git from losing the file history of renamed files (merging the
rename with extensive code modifications resulted in git treating some
of the file moves as a file delete and new file creation).
2020-05-27 17:31:09 +02:00
Dmitry Simonenko
a99ae05723 Cleanup PG version checks for multinode
Since distributed hypertables will only be supported on PG11 or greater,
ensure that we do not compile multinode-related files on previous
versions. Also raise appropriate errors when trying to invoke
multinode-related functionality on versions prior to PG11.
2020-05-27 17:31:09 +02:00
niksa
86858e36e9 Support multiple async requests per connection
The idea here is to allow multiple async requests to be created for
the same connection. Since a connection can process only one request
at a time, only one request can be running while the rest must be
deferred. A deferred async request is started when a response is
fetched, provided the connection is not in use by a running async
request.
This support should pave the way for async creation of cursors.
2020-05-27 17:31:09 +02:00
Erik Nordström
3ddbc386f0 Only support multinode on PG11 and greater
Multinode-related APIs now raise errors when called on any PostgreSQL
version below 11, as these versions do not have the required features
to support multinode or have different behavior.

Raising errors at runtime on affected APIs is preferred over excluding
these functions altogether. Having a different user-facing SQL API
would severely complicate the upgrade process for the extension.

A new CMake check has been added to disable multinode features on
unsupported PostgreSQL versions. It also generates a macro in
`config.h` that can be used in code to check for multinode support.
2020-05-27 17:31:09 +02:00
Erik Nordström
e2371558f7 Create chunks on remote servers
This change ensures that chunk replicas are created on remote (data
node) servers whenever a chunk is created in a local distributed
hypertable.

Remote chunks are created using the `create_chunk()` function, which
has been slightly refactored to allow specifying an explicit chunk
table name. The node making the remote call also records the resulting
remote chunk IDs in its `chunk_server` mappings table.

Since remote command invocation without superuser permissions
requires password authentication, the test configuration files have
been updated to require password authentication for a cluster test
user that is used in tests.
2020-05-27 17:31:09 +02:00
Erik Nordström
596be8cda1 Add mappings table for remote chunks
A frontend node will now maintain mappings from a local chunk to the
corresponding remote chunks in a `chunk_server` table.

The frontend creates local chunks as foreign tables and adds entries
to `chunk_server` for each chunk it creates on remote data nodes.

Currently, the creation of remote chunks is not implemented, so a
dummy chunk_id for the remote chunk will be added instead for testing
purposes.
2020-05-27 17:31:09 +02:00
Erik Nordström
ae587c9964 Add API function for explicit chunk creation
This adds an internal API function to create a chunk using explicit
constraints (dimension slices). A function to export a chunk in a
format consistent with the chunk creation function is also added.

The chunk export/create functions are needed for distributed
hypertables so that an access node can create chunks on data nodes
according to its own (global) partitioning configuration.
2020-05-27 17:31:09 +02:00
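
A sketch of the export/create pair, assuming internal-schema functions
and a JSON encoding of the dimension slices (names illustrative):

    -- Export a chunk's definition in a format create_chunk understands
    SELECT * FROM _timescaledb_internal.show_chunk(
        '_timescaledb_internal._hyper_1_1_chunk');

    -- Create a chunk with explicit dimension slices (e.g., on a data node)
    SELECT * FROM _timescaledb_internal.create_chunk(
        'conditions',
        '{"time": [1609459200000000, 1610064000000000]}');
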