timescaledb

mirror of https://github.com/timescale/timescaledb.git synced 2025-05-24 06:53:59 +08:00

Author	SHA1	Message	Date
David Kohn	66544c7564	Reset restoring gucs rather than explicitly setting 'off' Setting the `timescaledb.restoring` guc explicitly to 'off' for the db meant that the setting got exported in `pg_dumpall` and some other cases where that setting would then conflict with the setting set by the pre_restore function causing it to be overridden and causing errors on restore. This changes to `RESET` so that instead it will take the system default and not be dumped separately as an override.	2020-06-22 11:43:27 -04:00
gayyappan	b93b30b0c2	Add counts to compression statistics Store information related to compressed and uncompressed row counts after compressing a chunk. This is saved in compression_chunk_size table.	2020-06-19 15:58:04 -04:00
Mats Kindahl	a089843ffd	Make table mandatory for drop_chunks The `drop_chunks` function is refactored to make the table name mandatory. As a result, the function was also refactored to accept the `regclass` type instead of table name plus schema name, and the parameters were reordered to match the order for `show_chunks`. The commit also refactors the code to pass the hypertable structure between internal functions rather than the hypertable relid and moves error checks to the PostgreSQL function. This allows the internal functions to avoid some lookups and use the information in the structure directly, and also to give errors earlier instead of first dropping chunks and then raising an error and rolling back the transaction.	2020-06-17 06:56:50 +02:00
Erik Nordström	9d533f31c2	Improve connection handling during transactions This change refactors how connections are handled during remote transactions. In particular, the connection cache now stays consistent during transactions, even during rollbacks. Previously, the connection cache was replaced on every rollback, even if the rollback was intentional (i.e., not due to an error). This made it hard to debug connections since the cache became completely empty. Connections could also be left in the cache in a bad state after failed transactions. This has been fixed by moving connection checks to the cache and tying transaction state changes to each connection. This ensures that such checks are done in one canonical place instead of being spread out throughout the code. Given how tightly coupled a remote transaction is with its connection, it might make sense to remove the separate remote transaction store and instead put this information in each connection. This is left to a future change, however. In addition to the above changes, this commit includes: * Showing transaction depth and invalidation in the transaction store * Invalidation on individual connections instead of replacing the whole cache * Closing of connections to a local database that is being dropped to prevent "in use" errors. * Ability to add callbacks to async requests that are executed when a response is received. This is used by remote transactions to mark connections as having successfully completed a transaction. Thus, on errors, it is easy to detect connections that are in bad states. * Error checks on each connection instead of having global error tracking for each remote transaction. This change removes the global error state for distributed transactions.	2020-06-13 12:05:41 +02:00
Erik Nordström	31d5254c2e	Add internal function to show connection cache The connection cache for remote transactions can now be examined using a function that shows all connections in the cache. This allows easier debugging and validation both in tests and on live systems. In particular, we'd like to know that connections are in good state post commit or rollback and that we don't leave bad connections in the cache. The remote transaction test (`remote_txn`) has been updated to show the connection cache as remote transactions are executed. Unfortunately, the whole cache is replaced on every (sub-)transaction rollback, which makes it hard to debug the connection state of a particular remote transaction. Further, some connections are left in the cache in a bad state after, e.g., connection loss. These issues will be fixed with an upcoming change.	2020-06-13 12:05:41 +02:00
Sven Klemm	c39989bca9	Remove check for PG 10 in update script generation Since PG 9.6 is no longer supported, the check that the version is not less than 10 is now always true, so the check can be removed and remote_txn.sql can always be added.	2020-06-05 13:34:43 +02:00
Sven Klemm	36d43503c1	Change update script generation to not use scratch files This patch changes the update script generation to not use scratch files and removes the sql fragments to set and unset the post_update_stage from CMakeLists.txt and puts them into dedicated files.	2020-06-04 15:05:31 +02:00
Sven Klemm	663463771b	Use EXECUTE FUNCTION instead of EXECUTE PROCEDURE Replace EXECUTE PROCEDURE with EXECUTE FUNCTION because the former is deprecated in PG11+. Unfortunately some test output will still have EXECUTE PROCEDURE because pg_get_triggerdef in PG11 still generates a definition with EXECUTE PROCEDURE.	2020-06-02 17:33:05 +02:00
Mats Kindahl	92b6c03e43	Remove cascade option from drop_chunks This commit removes the `cascade` option from the function `drop_chunks` and `add_drop_chunk_policy`, which will now never cascade drops to dependent objects. The tests are fixed accordingly and verbosity turned up to ensure that the dependent objects are printed in the error details.	2020-06-02 16:08:51 +02:00
Ruslan Fomkin	effdc478ae	Check replication factor for exceeding data nodes set_replication_factor will check whether the replication factor is greater than the number of attached data nodes. It returns an error in that case.	2020-05-27 17:31:09 +02:00
Ruslan Fomkin	c44a202576	Implement altering replication factor Implements SQL function set_replication_factor, which changes replication factor of a distributed hypertable. The change of the replication factor doesn't affect existing chunks. Newly created chunks are replicated according to new replication factor.	2020-05-27 17:31:09 +02:00
Brian Rowe	d49e9a5739	Add repartition option on detach/delete_data_node This change adds a new parameter to the detach_data_node and delete_data_node functions that will allow the user to automatically shrink their space dimension to match the number of nodes.	2020-05-27 17:31:09 +02:00
Brian Rowe	fad33fe954	Collect column stats for distributed tables. This change adds a new command to return a subset of the column stats for a hypertable (column width, percent null, and percent distinct). As part of the execution of this command on an access node, these stats will be collected for distributed chunks and updated on the access node.	2020-05-27 17:31:09 +02:00
Mats Kindahl	222bf75910	Use template1 as secondary connection database The `postgres` database might not exist on a data node, but `template1` will always exist, so if a connection using `postgres` fails, we use `template1` as a secondary database. This is similar to how `connectMaintenanceDatabase` in the PostgreSQL code base works.	2020-05-27 17:31:09 +02:00
Erik Nordström	6a9db8a621	Add function to fetch remote chunk relation stats A new function, `get_chunk_relstats()`, allows fetching relstats (basically `pg_class.{relpages,reltuples}`) from remote chunks on data nodes and writing it to the `pg_class` entry for the corresponding local chunk. The function expects either a chunk or a hypertable as input and returns the relstats for the given chunk or all chunks for the given hypertable, respectively. Importing relstats as described is useful as part of a distributed ANALYZE/VACUUM that won't require fetching all data into the access node for local sampling (like the current implementation does). In a future change, this function will be called as part of a local ANALYZE on the access node that runs ANALYZE on all data nodes followed by importing of the resulting relstats for the analyzed chunks.	2020-05-27 17:31:09 +02:00
Mats Kindahl	c2366ece59	Don't clear dist_uuid in delete_data_node When deleting a data node, the code currently clears the `dist_uuid` in the database on the data node, which requires it to be able to connect to the data node and would also mean that it is possible to re-add the data node to a new cluster without checking that it is in a consistent state. This commit removes the code that clears the `dist_uuid` and hence does not need to connect to the data node. All tests are updated to reflect the fact that no connection will be made to the data node and that the `dist_uuid` is not cleared.	2020-05-27 17:31:09 +02:00
niksa	94979412ef	Fix chunks_in function declaration We need to mark this function as stable and parallel safe so the planner can pick the most optimal plan.	2020-05-27 17:31:09 +02:00
Mats Kindahl	0d71f952f8	Add bootstrap option to add_data_node When the access node executes `add_data_node`, bootstrapping the data node is done by: 1. Optionally creating the database on the remote server. 2. Creating a schema for the TimescaleDB extension objects. 3. Creating the TimescaleDB extension in the database. After bootstrapping, the `dist_uuid` of the data node and access node is set to the `uuid` of the access node. If `bootstrap` is `true`, bootstrapping of the data node is done. If `bootstrap` is `false`, bootstrapping is not done, but the procedure attempts to connect to the database and verify that the TimescaleDB extension is loaded and that the `dist_uuid` is clear. If it is not possible to connect to the database, or if `dist_uuid` is set, `add_data_node` will fail.	2020-05-27 17:31:09 +02:00
Erik Nordström	7f3bc09eb6	Generalize deparsing of remote function calls Certain functions invoked on an access node need to be passed on to data nodes to ensure any mutations happen also on those nodes. Examples of such functions are `drop_chunks`, `add_dimension`, `set_chunk_time_interval`, etc. So far, the approach has been to deparse these "manually" on a case-by-case basis. This change implements a generalized deparsing function that deparses the function based on the function call info (`FunctionCallInfo`) that holds the information about any invoked function that can be used to deparse the function call. The `drop_chunks` function has been updated to use this generalized deparsing functionality when it is invoking remote nodes.	2020-05-27 17:31:09 +02:00
Mats Kindahl	8145d75c3f	Remove bootstrap_user from add_data_node This commit changes `add_data_node` so that the same user is used both on the access node and the data nodes when executing it, which means that the `bootstrap_user` parameter is removed. Since most tests assume that you can pass a separate user with superuser privileges to `add_data_node`, this affected a lot of tests.	2020-05-27 17:31:09 +02:00
Mats Kindahl	6e9f644714	Require host parameter in add_data_node Change `add_data_node` so that the host parameter is required. If the host parameter is not provided, or is `NULL`, an error will be printed. Also change the logic for how the default value for `port` is picked. Now it will by default use the port given in the configuration file. The commit updates all the result files, adds the `host` parameter to all calls of `add_data_node`, and adds a few tests to check that an error is given when `host` is not provided.	2020-05-27 17:31:09 +02:00
Mats Kindahl	33923548c7	Remove cascade option from delete_data_node The `cascade` option was added earlier since it was necessary to allow cascading the delete of user mappings when removing the server objects. Since the user mappings are removed from the code, the `cascade` option is not needed any more. This commit removes the option and fixes all the tests.	2020-05-27 17:31:09 +02:00
Mats Kindahl	77776faf20	Fix port usage for add_data_node() For a statement which only specifies the database, we expect the data node to be created on the same Postgres instance as the one where the statement is executed. SELECT * FROM add_data_node('data1', database => 'base1'); However, if the port for the server is changed in the configuration file to not use the default port, the command will try to connect to the wrong Postgres server, namely the one listening on port 5432. This commit fixes this by letting `host` and `port` parameters be NULL by default and use the following logic to decide what port should be used. - If a port is explicitly provided, use that. - If a port is not provided but a host is provided, it is assumed that the intention is to connect to a default-installed Postgres server on a different address, so use the default Postgres port (5432). - If neither port nor host is provided, it is assumed that the intention is to connect to the same server as where the command is executed, so use the port that was written in the configuration file. The default host to use is still 'localhost', but it is not written explicitly in the function definition in `ddl_api.sql`. The commit also fixes one warning where an uninitialized variable could be used.	2020-05-27 17:31:09 +02:00
Dmitry Simonenko	c8563b2d46	Add distributed_exec() function This function allows users to execute a SQL query on a list of data nodes. The purpose is to provide users a way to, e.g., create roles on data nodes. The current implementation is quite straightforward. Just execute any provided query on a list of data nodes. The query will execute with the current user role. The function does not return or print any result values. In case of error, it will print the data node name and a related error message.	2020-05-27 17:31:09 +02:00
Brian Rowe	a50db32c18	Check data node for valid postgres version This change will check if the postgres version of the data node is 11 or greater during the add_data_node call. It will also now print a more meaningful error message if the data node validation fails.	2020-05-27 17:31:09 +02:00
Dmitry Simonenko	79f6223631	Replace UserMappings with a connection ID This change replaces UserMappings with the newly introduced TSConnectionId object, which represents a pair of foreign server id and local user id. Authentication has been moved to a non-password-based scheme, since the original UserMappings were also used to store data node user passwords. This is a temporary step, until the introduction of certificate-based authentication. List of changes: * add_data_node() password and bootstrap_password arguments removed * introduced authentication using pgpass file * RemoteTxn format string which represents tx changed to tx-version-xid-server_id-user_id * data_node_dispatch, remote transaction cache, connection cache hash tables keys switched to TSConnectionId instead of user mappings * remote_connection_open() has been reworked to exclude user options * Tests upgraded, user mappings and passwords usage has been excluded	2020-05-27 17:31:09 +02:00
Brian Rowe	31953f0dc6	Verify configuration before adding data node This change will call a function on a remote database to validate its configuration before following through with an add_data_node call. Right now the check will ensure that the data node is able to use prepared transactions, but further checks can be easily added in the future. Since this uses the timescaledb extension to validate the remote database, it runs at the end of bootstrapping. We may want to consider adding code to undo our bootstrapping changes if this check fails.	2020-05-27 17:31:09 +02:00
Brian Rowe	3d3824dbc1	Fix some issues with num_dist_tables This change fixes a couple issues with the num_dist_tables column of the timescaledb_information.data_node view. The first fix will allow the column to correctly report 0 when no tables are yet created (it currently will count a NULL table as 1 in this case). The second fix addresses a bug in the dist_util_remote_hypertable_info function which was causing the code to only see the first hypertable returned. This second bug will also cause incorrect results for many of our usage reporting views and utilities when there are multiple distributed hypertables.	2020-05-27 17:31:09 +02:00
Mats Kindahl	ac3f0bcb92	Change order of parameters in attach_data_node All data node functions except `attach_data_node` take the node name as the first parameter. This commit changes the order of the two first parameters to `attach_data_node` so that the node name is the first parameter and the hypertable is the second parameter.	2020-05-27 17:31:09 +02:00
Erik Nordström	5309cd6c5f	Repartition hypertables when attaching data node Distributed hypertables are now repartitioned when attaching new data nodes and the current number of partitions (slices) in the first closed (space) dimension is less than the number of data nodes. Increasing the number of partitions is necessary to make use of a newly attached data node. However, repartitioning is optional and can be avoided via a boolean parameter in `attach_server()`. In addition to the above repartitioning, this change also adds informational messages to `create_hypertable` and `set_number_partitions` to raise awareness of situations when the number of partitions in the space dimensions is lower than the number of attached data nodes.	2020-05-27 17:31:09 +02:00
Erik Nordström	9108ddad15	Fix corner cases when detaching data nodes This change fixes the following: * Refactor the code for setting the default data node for a chunk. The `set_chunk_default_data_node()` API function now takes a `regclass`/`oid` instead of separate schema + table names and returns `true` when a new data node is set and `false` if called with a data node that is already the default. Like before, exceptions are thrown on errors. It also does proper permissions checks. The related code has been moved from `data_node.c` to `chunk.c` since this is an operation on a chunk, and the code now also lives in the `tsl` directory since this is non-trivial logic that should fall under the TSL license. * When setting the default data node on a chunk (failing over to another data node), it is now verified that the new data node actually has a replica of the chunk and that the corresponding foreign server belongs to the "right" foreign data wrapper. * Error messages and permissions handling have been tweaked.	2020-05-27 17:31:09 +02:00
Erik Nordström	b07461ec00	Refactor and harden data node management This change refactors and hardens parts of data node management functionality. * A number of permissions checks have been added to data node management functions. This includes checking that the user has proper permissions for both table and server objects. * Permissions checks are now done when creating remote chunks on data nodes. * The add_data_node() API function has been simplified and now returns more intuitive status about created objects (foreign server, database, extension). It is no longer necessary to specify a user to connect with as this is always assumed to be the current user. The bootstrap user can still be specified explicitly, however, as that user might require elevated permissions on the remote node to bootstrap. * Functions that capture exceptions without re-throwing, such as `ping_data_node()` and `get_user_mapping()`, have been refactored to not do this as the transaction state and memory contexts are not in states where it is safe to proceed as normal. * Data node management functions now consistently check that any foreign servers operated on are actually TimescaleDB server objects. * Tests now run with a superuser and a regular user specific to clustering. These users have password auth enabled in `pg_hba.conf`, which is required by the connection library when connecting as a non-superuser. Tests have been refactored to bootstrap data nodes using these user roles.	2020-05-27 17:31:09 +02:00
Brian Rowe	79fb46456f	Rename server to data node The timescale clustering code so far has been written referring to the remote databases as 'servers'. This terminology is a bit overloaded, and in particular we don't enforce any network topology limitations that the term 'server' would suggest. In light of this we've decided to change to use the term 'node' when referring to the different databases in a distributed database. Specifically we refer to the frontend as an 'access node' and to the backends as 'data nodes', though we may omit the access or data qualifier where it's unambiguous. As the vast bulk of the code so far has been written for the case where there was a single access node, almost all instances of 'server' were references to data nodes. This change has updated the code to rename those instances.	2020-05-27 17:31:09 +02:00
Brian Rowe	dd3847a7e0	Rename files in preparation for large refactor This change includes only the file renames required by the renaming of server to data node across the clustering codebase. This change is being committed separately from the bulk of the rename changes to prevent git from losing the file history of renamed files (merging the rename with extensive code modifications resulted in git treating some of the file moves as a file delete and new file creation).	2020-05-27 17:31:09 +02:00
Brian Rowe	e110a42a2b	Add space usage utilities to distributed database This change adds a new utility function for postgres `server_hypertable_info`. This function will contact a provided node and pull down the space information for all the distributed hypertables on that node. Additionally, a new view `distributed_server_info` has been added to timescaledb_information. This view leverages the new remote_hypertable_data function to display a list of nodes, along with counts of tables, chunks, and total bytes used by distributed data. Finally, this change also adds a `hypertable_server_relation_size` function, which, given the name of a distributed hypertable, will print the space information for that hypertable on each node of the distributed database.	2020-05-27 17:31:09 +02:00
niksa	0da34e840e	Fix server detach/delete corner cases Prevent server delete if the server contains data, unless the user specifies `force => true`. In case the server is the only data replica, we don't allow delete/detach unless the table/chunks are dropped. The idea is to have the same semantics for delete as for detach, since delete actually calls detach. We also try to update pg_foreign_table when we delete a server if there is another server containing the same chunk. An internal function is added to enable updating the foreign table server, which might be useful in some cases since the foreign table server is considered a default server for that particular chunk. Since this command needs to work even if the server we're trying to remove is unresponsive, we're not removing any data on the remote data node.	2020-05-27 17:31:09 +02:00
niksa	2fd99c6f4b	Block new chunks on data nodes This functionality enables users to block or allow creation of new chunks on a data node for one or more hypertables. Use cases for this include the ability to block new chunks when a data node is running low on disk space or to affect chunk distribution across data nodes. Sometimes blocking data nodes for new chunks can make a hypertable under-replicated. For that case an additional argument `force => true` can be supplied to force blocking new chunks. Here are some examples. Block for a specific hypertable: `SELECT * FROM block_new_chunks_on_server('server_1', 'disttable');` Block for all hypertables on the server: `SELECT * FROM block_new_chunks_on_server('server_1', force =>true);` Unblock: `SELECT * FROM allow_new_chunks_on_server('server_1', true);` This change adds the `force` argument to `detach_server` as well. If detaching or blocking new chunks will make a hypertable under-replicated then `force => true` needs to be used.	2020-05-27 17:31:09 +02:00
niksa	d8d13d9475	Allow detaching servers from hypertables A server can now be detached from one or more distributed hypertables so that it is no longer in use. We only allow detaching a server if there is no data on the server and detaching it doesn't risk making a hypertable under-replicated. A user can detach a server for a specific hypertable, or for all hypertables to which the server is attached. `SELECT * FROM detach_server('server1', 'my_hypertable');` `SELECT * FROM detach_server('server2');`	2020-05-27 17:31:09 +02:00
Dmitry Simonenko	96727fa5c4	Add support for distributed peer ID This change makes it possible for a data node to distinguish between regular client connections and distributed database connections (from the access node). This functionality will be needed for decision making based on the connection type, for example allowing or blocking DDL commands on a data node.	2020-05-27 17:31:09 +02:00
Brian Rowe	59e3d7f1bd	Add create_distributed_hypertable command This change adds a variant of the create_hypertable command that will ensure the created table is distributed.	2020-05-27 17:31:09 +02:00
niksa	6f3848e744	Add function to check server liveness Try connecting to a server and running `SELECT 1`. The function returns true on success and false on failure. There can be many reasons for failure: no valid UserMapping, the server is down, or running `SELECT 1` failed. More information about the failure is written to the server log. The `timescaledb_information.server` view is updated to show server status.	2020-05-27 17:31:09 +02:00
Brian Rowe	5c643e0ac4	Add distributed group id and enforce topology This change adds a distributed database id to the installation data for a database. It also provides a number of utilities that can be used for getting/setting/clearing this value or using it to determine if a database is a frontend, backend, or not a member of a distributed database. This change also includes modifications to the add_server and delete_server functions to check the distributed id to ensure the operation is allowed, and then update or clear it appropriately. After this change it will no longer be possible to add a database as a backend to multiple frontend databases, nor will it be possible to add a frontend database as a backend to any other database.	2020-05-27 17:31:09 +02:00
Dmitry Simonenko	11aab55094	Add support for basic distributed DDL This is a straightforward implementation that allows executing a limited set of DDL commands on a distributed hypertable.	2020-05-27 17:31:09 +02:00
Brian Rowe	b1c6172d0a	Add attach_server function This adds an attach_server function which is used to associate a server with an existing hypertable.	2020-05-27 17:31:09 +02:00
Dmitry Simonenko	d8982c3e15	Add add_server() support for remote server bootstrapping This patch adds functionality for automatic database and extension creation on a remote server. New function arguments: bootstrap_database, bootstrap_user and bootstrap_password.	2020-05-27 17:31:09 +02:00
Matvey Arye	e7ba327f4c	Add resolve and heal infrastructure for 2PC This commit adds the ability to resolve whether or not 2PC transactions have been committed or aborted and also adds a heal function to resolve transactions that have been prepared but not committed or rolled back. This commit also removes the server id from the primary key on the remote_txn table and adds another index. This was done because the `remote_txn_persistent_record_exists` should not rely on the server being contacted but should rather just check for the existence of the id. This makes the resolution safe for setups where two frontend server definitions point to the same database. While this may not be a properly configured setup, it's better if the resolution process is robust to this case.	2020-05-27 17:31:09 +02:00
Matvey Arye	0e109d209d	Add tables for saving 2pc persistent records The remote_txn table records commit decisions for 2pc transactions. A successful 2pc transaction will have one row per remote connection recorded in this table. In effect it is a mapping between the distributed transaction and an identifier for each remote connection. The records are needed to protect against crashes after a frontend sends a `COMMIT TRANSACTION` to one node but not all nodes involved in the transaction. Towards this end, the commitment of remote_txn rows represents a crash-safe irrevocable promise that all participating datanodes will eventually get a `COMMIT TRANSACTION` and occurs before any datanodes get a `COMMIT TRANSACTION`. The irrevocable nature of the commit of these records means that this can only happen after the system is sure all participating transactions will succeed. Thus it can only happen after all datanodes have succeeded on a `PREPARE TRANSACTION` and will happen as part of the frontend's transaction commit.	2020-05-27 17:31:09 +02:00
Erik Nordström	e2371558f7	Create chunks on remote servers This change ensures that chunk replicas are created on remote (datanode) servers whenever a chunk is created in a local distributed hypertable. Remote chunks are created using the `create_chunk()` function, which has been slightly refactored to allow specifying an explicit chunk table name. The one making the remote call also records the resulting remote chunk IDs in its `chunk_server` mappings table. Since remote command invocation without super-user permissions requires password authentication, the test configuration files have been updated to require password authentication for a cluster test user that is used in tests.	2020-05-27 17:31:09 +02:00
Erik Nordström	125f793307	Add password parameter to add_server() Establishing a remote connection requires a password, unless the connection is made as a superuser. Therefore, this change adds the option to specify a password in the `add_server()` command. This is a required parameter unless called as a superuser.	2020-05-27 17:31:09 +02:00
Matvey Arye	3779af400d	Change license header to new format in SQL files The license header for SQL test files has been updated, but some tests haven't had this new header applied. This change makes sure the new header is applied to all test files.	2020-05-27 17:31:09 +02:00
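
The port-selection rules described in commit 77776faf20 ("Fix port usage for add_data_node()") can be sketched as below. This is a minimal illustration in Python, not the extension's actual C code: the function name `resolve_data_node_port` and the `configured_port` parameter are hypothetical, and the sketch covers only the three rules stated in that commit message (the later commit 6e9f644714 makes `host` required).

```python
from typing import Optional

# Stock PostgreSQL port, as named in the commit message.
DEFAULT_POSTGRES_PORT = 5432


def resolve_data_node_port(
    host: Optional[str],
    port: Optional[int],
    configured_port: int,
) -> int:
    """Pick the port for a new data node (hypothetical helper).

    `configured_port` stands in for the port read from the local
    server's configuration file.
    """
    # Rule 1: an explicitly provided port always wins.
    if port is not None:
        return port
    # Rule 2: a host without a port is assumed to be a default-installed
    # Postgres server on another address, so use the default port.
    if host is not None:
        return DEFAULT_POSTGRES_PORT
    # Rule 3: neither given, so target the same server the command
    # runs on and use its configured port.
    return configured_port
```

With a local server configured on port 5433, for example, calling the helper with no host and no port yields 5433 rather than 5432, which is the case the commit set out to fix.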

456 Commits