Before this commit, an existing extension would cause an error and
abort the addition of a data node if `bootstrap` was `true`. To allow
the extension to already exist on the data node, this commit first
checks whether the extension exists on the data node. If the extension
exists, it is validated; otherwise the extension is created on the
data node.
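For illustration, a minimal sketch of the call this affects (node name, host, and database are hypothetical; the exact `add_data_node` parameters may differ):

```sql
-- If the extension already exists in 'dn1_db' it is now validated
-- instead of raising an error; otherwise it is created during bootstrap.
SELECT add_data_node('dn1',
                     host => 'dn1.example.com',
                     database => 'dn1_db',
                     bootstrap => true);
```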
To better understand the choices the planner makes, we need to print
all the paths (with costs) that the planner considered. Otherwise it
might be hard to understand why a certain path is not picked (e.g., due
to high startup/total costs), since it will never show up in the
relation path list that we print. This should help while working on
improving the distributed cost model. This fix focuses only on paths
that involve data nodes.
If a table contains a column with the same name as the table, the
query parser gets confused when parsing the `chunks_in` function: it
thinks we are passing in the column instead of the table. Using a
qualified table name fixes this problem. Note that we needed to expand
the table using `.*` in order to avoid parser confusion caused by the
`schema.table` syntax.
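As a rough, hedged sketch of the deparsed query shape (table name and chunk IDs are made up), the qualified, star-expanded form avoids the ambiguity:

```sql
-- Ambiguous if the table also has a column named "metrics":
--   ... WHERE _timescaledb_internal.chunks_in(metrics, ARRAY[1, 2])
-- Unambiguous: qualify the table and expand it with .*
SELECT metrics.*
FROM public.metrics
WHERE _timescaledb_internal.chunks_in(public.metrics.*, ARRAY[1, 2]);
```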
Currently, if the major version of the extension on the access node is
later than the version of the extension on the data node, the data node
is accepted. Since major versions are not compatible, it should not be
accepted.
Changed the check to only accept the data node if:
- The major version is the same on the data node and the access node.
- The minor version on the data node is the same as or earlier than
that on the access node.
In addition, the code will print a warning if the version on the data
node is older than the version on the access node.
This change includes telemetry fixes that extend HypertablesStat
with num_hypertables_compressed. It also updates how the number of
regular hypertables is calculated: a regular hypertable is now one
that is neither compressed nor related to continuous aggregates.
When a data node needs bootstrapping and the database to be
bootstrapped already exists, it was blindly assumed to be configured
correctly. With this commit, we validate an already existing database
before proceeding and raise an error if it is not correctly
configured.
When validating the data node and bootstrap is `true`, we are connected
to the `postgres` database rather than the database to validate.
This means that we cannot use `current_database()` and instead pass the
database name as a parameter to `data_node_validate_database`.
Before this commit, grants and revokes were not propagated to data
nodes. After this commit, grants and revokes on a distributed
hypertable are propagated to the data nodes of the hypertable.
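For example (hypothetical table and role names), statements like the following are now also executed on the data nodes holding chunks of the hypertable:

```sql
-- Propagated to all data nodes of the distributed hypertable 'conditions'.
GRANT SELECT, INSERT ON conditions TO analyst_role;
REVOKE INSERT ON conditions FROM analyst_role;
```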
When creating a hypertable, grants were not propagated to the table on
the remote node, which causes later statements to fail when not
executed as the owner of the table.
This commit deparses grant statements from the table definition and
adds the grants to the deparsed statement sent when creating the table
on the data node.
Since a NULL value for the replication factor in SQL DDL now
corresponds to HYPERTABLE_REGULAR, which is different from
HYPERTABLE_DISTRIBUTED_MEMBER, there is no need to check for a
non-NULL value; comparing with HYPERTABLE_DISTRIBUTED_MEMBER is
enough.
Test code in remote_exec fails to build due to a maybe-uninitialized
error on a string variable with the 32-bit Alpine package. This fix
moves initialization to the string variable's declaration and
refactors a loop to have a single exit condition, which checks for
both a NULL value and an empty string.
This change will make `create_distributed_hypertable` attach only
those data nodes that the creating user has USAGE permission on
instead of trying to attach all data nodes (even when permissions do
not allow it). This prevents an ERROR when a user tries to create a
distributed hypertable and lacks USAGE on one or more data nodes. This
new behavior *only* applies when no explicit set of data nodes is
specified. When an explicit set is specified the behavior remains
unchanged, i.e., USAGE is required on all the explicitly specified
data nodes.
Note that, with distributed hypertables, USAGE permission on a data
node (foreign server) only governs the ability to attach it to a
hypertable. This is analogous to the regular behavior of foreign
servers where USAGE governs who can create foreign tables using the
server. The actual usage of the server once associated with a table is
not affected as that would break the table if permissions are revoked.
The new behavior introduced with this change makes it simpler to use
`create_distributed_hypertable` in a multi-user and multi-permission
environment where users have different permissions on data nodes
(DNs). For instance, imagine user A being allowed to attach DN1 and
DN2, while user B can attach DN2 and DN3. Without this change,
`create_distributed_hypertable` will always fail since it tries to
attach DN1, DN2, and DN3 irrespective of the user that calls the
function. Even worse, if only DN1 and DN2 existed initially, user A
would be able to create distributed hypertables without errors, but,
as soon as DN3 is added, `create_distributed_hypertable` would start
to fail for user A.
The only way to avoid the above described errors when creating
distributed hypertables is for users to pass an explicit set of data
nodes that only includes the data nodes they have USAGE on. If a
user is forced to do that, the result would in any case be the same as
that introduced with this change. Unfortunately, users currently have
no way of knowing which data nodes they have USAGE on unless they
query the PostgreSQL catalogs. In many cases, a user might not even
care about the data nodes they are using if they aren't DBAs
themselves.
To summarize the new behavior:
* If public (everyone) has USAGE on all data nodes, there is no
change in behavior.
* If a user has USAGE on a subset of data nodes, the function will by
default attach only those data nodes, since the user cannot attach
the other ones anyway.
* If the user specifies an explicit set of data nodes to attach, all
of those nodes still require USAGE permissions. (Behavior
unchanged.)
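A sketch of the two cases summarized above (table, column, and data node names are hypothetical):

```sql
-- Implicit: attaches only the data nodes the calling user has USAGE on.
SELECT create_distributed_hypertable('conditions', 'time');

-- Explicit: USAGE is required on every listed data node, otherwise an
-- error is raised (behavior unchanged).
SELECT create_distributed_hypertable('conditions', 'time',
                                     data_nodes => ARRAY['dn1', 'dn2']);
```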
Fixes the test helper function remote_exec by:
- Always freeing the memory allocated by open_memstream and skipping
printing when there is no result. This fixes a memory leak.
- Replacing strtok with strtok_r to avoid the risk of a data race.
The collation used in `data_node_bootstrap.sql` was not available for
the installation used by the sanitizer, causing the sanitizer tests to
fail.
This commit changes the collation and ctype to values available on
that platform.
**For beta releases**, upgrading from an earlier version of the
extension (including previous beta releases) is not supported.
This release improves performance for queries executed on distributed
hypertables, fixes minor issues, and blocks a few SQL API functions
that are not supported on distributed hypertables. It also adds
information about distributed databases to the telemetry.
Remote estimation increases total query time since it involves
additional round trips to data nodes. Now that we can fetch chunk
stats from data nodes more efficiently (using get_chunk_relstats),
fetching and parsing remote estimates has become obsolete.
This change introduces two ways of fetching data from data nodes: one
using cursors and another one using row-by-row mode. The major
benefit of row-by-row mode is that it enables running parallel plans
on data nodes. The default data fetcher uses row-by-row mode. A new
GUC `timescaledb.remote_data_fetcher` has been added to enable
switching between these two implementations (rowbyrow or cursor).
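For example, to switch fetcher implementations in a session (values as described above):

```sql
-- Default: row-by-row mode, which allows parallel plans on data nodes.
SET timescaledb.remote_data_fetcher = 'rowbyrow';

-- Fall back to the cursor-based fetcher.
SET timescaledb.remote_data_fetcher = 'cursor';
```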
This change extends the telemetry report with a new 'distributed_db'
section, which includes the following keys: 'distributed_member',
'data_nodes_count' and 'distributed_hypertables_count'.
Planning of a data node rel during a distributed query should use the
accumulated stats from the individual chunks that the data node rel
represents. Since the data node rel is not a real base rel (i.e., it
doesn't correspond to a real relation) it doesn't have any statistics
in the `pg_catalog` that can be used for planning. Thus, some
functions, such as `set_baserel_size_estimates` will return strange
estimates for data node rels when the planner believes it has stats
(e.g., after an ANALYZE).
This change fixes this issue by not relying on the planner to compute
rel estimates for data node rels. Instead the accumulated estimates
based on the chunks queried by the data node rel are used. This also
obviates the need to compute these stats again.
Given the new size estimates that this change enables, some plan/test
outputs have changed and the tests have been updated accordingly.
The access node and the data node need to have the same encoding, the
same ctype, and the same collation, so the following changes were made
when bootstrapping the database on the data node:
- Explicitly use `template0`
- Explicitly set the encoding to the encoding of the database on the
access node
- Explicitly set the `LC_CTYPE` to the `LC_CTYPE` of the database on
the access node
- Explicitly set the `LC_COLLATE` to the collation of the database on
the access node
When not bootstrapping, it is checked that the encoding, `LC_CTYPE`,
and collation match what the access node is using.
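A rough sketch of the bootstrap statement now issued on the data node (the database name and locale values are examples; the actual encoding, `LC_CTYPE`, and `LC_COLLATE` are taken from the access node's database):

```sql
CREATE DATABASE dn1_db
    TEMPLATE template0
    ENCODING 'UTF8'
    LC_CTYPE 'C.UTF-8'
    LC_COLLATE 'C.UTF-8';
```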
This change prepares the repo for developing the 2.0.0-beta3.
The CMake configuration required an update to handle both
a beta and dev suffix in the version string.
**For beta releases**, upgrading from an earlier version of the
extension (including previous beta releases) is not supported.
This release introduces *distributed hypertables*, a major new feature
that allows hypertables to scale out across multiple nodes for
increased performance and fault tolerance. Please review the
documentation to learn how to configure and use distributed
hypertables, including current limitations.
For some queries, with only one node "hit", push down can be forced by
matching the partitioning attributes with the group by expression.
However, push down didn't happen in some of these cases because of a
bug in `force_group_by_push_down` that didn't properly update the
number of attributes in the partitioning expression.
This change fixes the bug to always set the number of partitioning
attributes to match the group by clause.
Set max retries to 0 for the drop chunks background job in the
background worker reorder drop chunk test, so it will not do a retry,
which is not expected by the test, and thus the test passes on ARM.
The number of output rows estimated for "remote" upper rels could
sometimes erroneously be zero. This happened when computing the
estimate for upper rels with different pathkeys: in case of several
different path keys the estimation was not recalculated and instead
relied on cached values from the first calculation on the same
rel. However, the number of output rows was never cached, so the
estimate for the second pathkey of the upper rel would always produce
zero output rows. This has now been fixed by storing the output rows
in the upper
rel after the first estimation.
This fix affects some query plans so a number of tests are affected.
If `remote_txn_check_for_leaked_prepared_statements` does not have a
working connection, it will abort by crashing the server. Since there
are tests that kill the remote server in different phases of the 2PC,
bad timing might cause the server to crash rather than generate an
error.
This commit replaces the assertion with both a status check and a
check that the correct number of rows and columns are returned, and
generates a status message with the error message, if any.
A new function, `get_chunk_relstats()`, allows fetching relstats
(basically `pg_class.{relpages,reltuples}`) from remote chunks on data
nodes and writing it to the `pg_class` entry for the corresponding
local chunk. The function expects either a chunk or a hypertable as
input and returns the relstats for the given chunk or all chunks for
the given hypertable, respectively.
Importing relstats as described is useful as part of a distributed
ANALYZE/VACUUM that won't require fetching all data into the access
node for local sampling (like the current implementation does).
In a future change, this function will be called as part of a local
ANALYZE on the access node that runs ANALYZE on all data nodes
followed by importing of the resulting relstats for the analyzed
chunks.
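A hedged usage sketch (the hypertable name is hypothetical and the function's schema may differ):

```sql
-- Fetch relstats for all chunks of the distributed hypertable 'conditions'
-- and import them into the corresponding local pg_class entries.
SELECT * FROM _timescaledb_internal.get_chunk_relstats('conditions');
```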
Debug printouts are added at two locations:
* Inside `get_foreign_upper_paths` a notice with the existing paths is
sent back together with the stage, if `show_upper` is set to show
that stage.
* Inside `set_rel_pathlist` a notice is printed with the existing paths
if the `show_rel` debug flag is set.
The debug printouts are sent back to the client as debug messages
(`DEBUG2`), allowing developers to quickly experiment and interactively
see what effect different statements have with respect to the paths
generated.
In addition, the handling of `debug_optimizer_flag` was not correct and
is fixed here.
* If no `=` was provided to `show_upper`, it triggered a segmentation
fault.
* If the flag was reset, the internal data structure was not updated.
* If just one flag was updated in a `SET` command, the other flag was
kept intact.
This patch upgrades existing dist_cmd functions to support
executing commands which cannot be run inside an active transaction,
such as the VACUUM command.
This patch adds a way to check and print a notice message
to a user who wants to drop an access node database.
Since a database drop can only be done using a different database
connection, this functionality is implemented inside the loader
extension.
The security label functionality is used to mark a distributed
database and make this information accessible to other databases via
the pg_shseclabel table.
This change allows certain mutable functions to be whitelisted so that
they can be safely pushed down to data nodes. Additionally, this change
will no longer prevent queries containing the `now` function from being
pushed down to data nodes, but will instead replace the function call
with the transaction start time (which is the same value that would
have been used had the query been run solely on the access node).
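As an illustration (hypothetical table), a filter like the following can now be pushed down, with `now()` replaced by the transaction start time before the query is sent to the data nodes:

```sql
SELECT device_id, avg(temperature)
FROM conditions
WHERE time > now() - INTERVAL '1 day'
GROUP BY device_id;
```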
This fixes a problem in the connection tests when converting ints to
datums on certain platforms (e.g., ARM). The appropriate datum
conversion macros weren't used when returning datum results, which
caused errors on ARM platforms.
When deleting a data node, we currently clear the `dist_uuid` in the
database on the data node, which requires being able to connect to the
data node and also means that it is possible to re-add the data node
to a new cluster without checking that it is in a consistent state.
This commit removes the code that clears the `dist_uuid`, so there is
no need to connect to the data node. All tests are updated to reflect
the fact that no connection will be made to the data node and that the
`dist_uuid` is not cleared.
For the tests that use the `remote_node_set_kill_event` function
to perform a hard termination at precise stages of the 2PC, the
connection will close with different warning messages depending on the
timing of the connection close and the SSL close.
To avoid a flaky test, this commit sets `client_min_messages` to
`ERROR` for the duration of the transaction. Since the checks that the
transaction is properly rolled back in the event of a crash are done
after the transaction, the warning messages do not offer any
additional benefit.
Add a new `debug_optimizer_flags` option where you can request
optimization debug notices. With this commit, the flags `show_upper`
and `show_rel` are added.
The flag `show_upper` will permit sending back a message with the
resulting relation after creating upper paths. Since this is called at
different stages, the flag supports setting specific stages where
printouts should be done using the format `show_upper=window,final`.
The flag `show_rel` will permit sending back the resulting relation
after executing `set_rel_pathlist` to add new paths for consideration.
The actual implementation to send the notices will be in a separate
commit.
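A hedged sketch of how the flags might be used in a session (the GUC is assumed to live under the `timescaledb.` prefix):

```sql
-- Debug messages are sent at DEBUG2, so raise the client verbosity first.
SET client_min_messages TO debug2;

-- Print paths for every relation considered in set_rel_pathlist.
SET timescaledb.debug_optimizer_flags = 'show_rel';

-- Print upper paths, but only for specific stages.
SET timescaledb.debug_optimizer_flags = 'show_upper=window,final';
```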
Process the arguments of the data node allow/block new chunks SQL API
functions separately, since the number of optional arguments differs
between the allow and block functions. This fixes a bug with memory
access.
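For reference, hedged examples of the two affected functions (argument names and optional parameters are assumptions and may differ from the actual SQL API):

```sql
-- Block new chunks of 'conditions' from being placed on data node 'dn1'.
SELECT block_new_chunks('dn1', 'conditions');

-- Allow new chunks on the data node again, for all hypertables.
SELECT allow_new_chunks('dn1');
```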
This change removes collation checks for foreign expressions. With
distributed hypertables, we assume that all participating nodes have
the same collation configuration so there should be no difference
(from a collation perspective) between executing an expression locally
and remotely.
We want to get more insight into what plans are executed on data
nodes. If a user runs EXPLAIN with the VERBOSE option, we will connect
to each data node, run EXPLAIN there, and print the output together
with the existing EXPLAIN output.
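For example (hypothetical table), the per-data-node plans are now included in the output of:

```sql
EXPLAIN (VERBOSE)
SELECT device_id, avg(temperature)
FROM conditions
GROUP BY device_id;
```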
If something is wrong with a connection, we would not be able to start
a transaction on the remote node. So when an abort happens, we should
not try to abort a transaction that hasn't started.
We use xact_depth to check whether the transaction actually started.
If we find one that hasn't, we remove it from our transaction cache,
which should result in removing the invalid connection as well.
In order to debug the optimizer it is necessary to provide the
`OPTIMIZER_DEBUG` preprocessor symbol, so an option was added to
enable this for the code.
It still requires the PostgreSQL source code to be built with this
flag.