Before this commit, an existing extension would cause an error and
abort the addition of a data node if `bootstrap` was `true`. To allow
the extension to already exist on the data node, this commit first
checks whether the extension exists on the data node. If the extension
exists, it is validated; otherwise the extension is created on the
data node.
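For illustration, a minimal sketch of the call this affects (node name, host, and database are hypothetical; the exact `add_data_node` parameters may differ):

```sql
-- If the extension already exists in 'dn1_db' it is now validated
-- instead of raising an error; otherwise it is created during bootstrap.
SELECT add_data_node('dn1',
                     host => 'dn1.example.com',
                     database => 'dn1_db',
                     bootstrap => true);
```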
To better understand the choices the planner makes, we need to print
all the paths (with costs) that the planner considered. Otherwise it
might be hard to understand why a certain path is not picked (e.g., due
to high startup/total costs), since it will never show up in the
relation path list that we print. This should help while working on
improving the distributed cost model. This fix focuses only on paths
that involve data nodes.
If a table contains a column with the same name as the table, the
query parser gets confused when parsing the `chunks_in` function: it
thinks we are passing in the column instead of the table. Using a
qualified table name fixes this problem. Note that we needed to expand
the table using `.*` in order to avoid parser confusion caused by the
`schema.table` syntax.
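As a rough, hedged sketch of the deparsed query shape (table name and chunk IDs are made up), the qualified, star-expanded form avoids the ambiguity:

```sql
-- Ambiguous if the table also has a column named "metrics":
--   ... WHERE _timescaledb_internal.chunks_in(metrics, ARRAY[1, 2])
-- Unambiguous: qualify the table and expand it with .*
SELECT metrics.*
FROM public.metrics
WHERE _timescaledb_internal.chunks_in(public.metrics.*, ARRAY[1, 2]);
```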
Currently, if the major version of the extension on the access node is
later than the version of the extension on the data node, the data node
is accepted. Since major versions are not compatible, it should not be
accepted.
Changed the check to only accept the data node if:
- The major version is the same on the data node and the access node.
- The minor version on the data node is the same as or earlier than
that on the access node.
In addition, the code will print a warning if the version on the data
node is older than the version on the access node.
This change includes telemetry fixes that extend HypertablesStat
with num_hypertables_compressed. It also updates how the number of
regular hypertables is calculated: a regular hypertable is now one
that is neither compressed nor related to continuous aggregates.
When a data node needs bootstrapping and the database to be
bootstrapped already exists, it was blindly assumed to be configured
correctly. With this commit, we validate an already existing database
before proceeding and raise an error if it is not correctly
configured.
When validating the data node and bootstrap is `true`, we are connected
to the `postgres` database rather than the database to validate.
This means that we cannot use `current_database()` and instead pass the
database name as a parameter to `data_node_validate_database`.
Before this commit, grants and revokes were not propagated to data
nodes. After this commit, grants and revokes on a distributed
hypertable are propagated to the data nodes of the hypertable.
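For example (hypothetical table and role names), statements like the following are now also executed on the data nodes holding chunks of the hypertable:

```sql
-- Propagated to all data nodes of the distributed hypertable 'conditions'.
GRANT SELECT, INSERT ON conditions TO analyst_role;
REVOKE INSERT ON conditions FROM analyst_role;
```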
When creating a hypertable, grants were not propagated to the table on
the remote node, which causes later statements to fail when not
executed as the owner of the table.
This commit deparses grant statements from the table definition and
adds the grants to the deparsed statement sent when creating the table
on the data node.
Since a NULL value for the replication factor in SQL DDL now
corresponds to HYPERTABLE_REGULAR, which is different from
HYPERTABLE_DISTRIBUTED_MEMBER, there is no need to check for a
non-NULL value; comparing with HYPERTABLE_DISTRIBUTED_MEMBER is
enough.
Test code in remote_exec fails to build due to a maybe-uninitialized
error on a string variable with the 32-bit Alpine package. This fix
moves initialization to the string variable's declaration and
refactors a loop to have a single exit condition, which checks for
both a NULL value and an empty string.
This change will make `create_distributed_hypertable` attach only
those data nodes that the creating user has USAGE permission on
instead of trying to attach all data nodes (even when permissions do
not allow it). This prevents an ERROR when a user tries to create a
distributed hypertable and lacks USAGE on one or more data nodes. This
new behavior *only* applies when no explicit set of data nodes is
specified. When an explicit set is specified the behavior remains
unchanged, i.e., USAGE is required on all the explicitly specified
data nodes.
Note that, with distributed hypertables, USAGE permission on a data
node (foreign server) only governs the ability to attach it to a
hypertable. This is analogous to the regular behavior of foreign
servers where USAGE governs who can create foreign tables using the
server. The actual usage of the server once associated with a table is
not affected as that would break the table if permissions are revoked.
The new behavior introduced with this change makes it simpler to use
`create_distributed_hypertable` in a multi-user and multi-permission
environment where users have different permissions on data nodes
(DNs). For instance, imagine user A being allowed to attach DN1 and
DN2, while user B can attach DN2 and DN3. Without this change,
`create_distributed_hypertable` will always fail since it tries to
attach DN1, DN2, and DN3 irrespective of the user that calls the
function. Even worse, if only DN1 and DN2 existed initially, user A
would be able to create distributed hypertables without errors, but,
as soon as DN3 is added, `create_distributed_hypertable` would start
to fail for user A.
The only way to avoid the above described errors when creating
distributed hypertables is for users to pass an explicit set of data
nodes that only includes the data nodes they have USAGE on. If a
user is forced to do that, the result would in any case be the same as
that introduced with this change. Unfortunately, users currently have
no way of knowing which data nodes they have USAGE on unless they
query the PostgreSQL catalogs. In many cases, a user might not even
care about the data nodes they are using if they aren't DBAs
themselves.
To summarize the new behavior:
* If public (everyone) has USAGE on all data nodes, there is no
change in behavior.
* If a user has USAGE on a subset of data nodes, the function will by
default attach only those data nodes, since the user cannot attach
the other ones anyway.
* If the user specifies an explicit set of data nodes to attach, all
of those nodes still require USAGE permissions. (Behavior
unchanged.)
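A sketch of the two cases summarized above (table, column, and data node names are hypothetical):

```sql
-- Implicit: attaches only the data nodes the calling user has USAGE on.
SELECT create_distributed_hypertable('conditions', 'time');

-- Explicit: USAGE is required on every listed data node, otherwise an
-- error is raised (behavior unchanged).
SELECT create_distributed_hypertable('conditions', 'time',
                                     data_nodes => ARRAY['dn1', 'dn2']);
```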
Fixes the test helper function remote_exec by:
- Always freeing the memory allocated by open_memstream and skipping
printing when there is no result. This fixes a memory leak.
- Replacing strtok with strtok_r to avoid the risk of a data race.
The collation used in `data_node_bootstrap.sql` was not available for
the installation used by the sanitizer, causing the sanitizer tests to
fail.
This commit changes the collation and ctype to values available on
that platform.
**For beta releases**, upgrading from an earlier version of the
extension (including previous beta releases) is not supported.
This release improves performance for queries executed on distributed
hypertables, fixes minor issues, and blocks a few SQL API functions
that are not supported on distributed hypertables. It also adds
information about distributed databases to the telemetry.
Remote estimation increases total query time since it involves
additional round trips to data nodes. Now that we can fetch chunk
stats from data nodes more efficiently (using get_chunk_relstats),
fetching and parsing remote estimates has become obsolete.
This change introduces two ways of fetching data from data nodes: one
using cursors and another one using row-by-row mode. The major
benefit of row-by-row mode is that it enables running parallel plans
on data nodes. The default data fetcher uses row-by-row mode. A new
GUC `timescaledb.remote_data_fetcher` has been added to enable
switching between these two implementations (rowbyrow or cursor).
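For example, to switch fetcher implementations in a session (values as described above):

```sql
-- Default: row-by-row mode, which allows parallel plans on data nodes.
SET timescaledb.remote_data_fetcher = 'rowbyrow';

-- Fall back to the cursor-based fetcher.
SET timescaledb.remote_data_fetcher = 'cursor';
```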
This change extends the telemetry report with a new 'distributed_db'
section, which includes the following keys: 'distributed_member',
'data_nodes_count' and 'distributed_hypertables_count'.
Planning of a data node rel during a distributed query should use the
accumulated stats from the individual chunks that the data node rel
represents. Since the data node rel is not a real base rel (i.e., it
doesn't correspond to a real relation) it doesn't have any statistics
in the `pg_catalog` that can be used for planning. Thus, some
functions, such as `set_baserel_size_estimates` will return strange
estimates for data node rels when the planner believes it has stats
(e.g., after an ANALYZE).
This change fixes this issue by not relying on the planner to compute
rel estimates for data node rels. Instead the accumulated estimates
based on the chunks queried by the data node rel are used. This also
obviates the need to compute these stats again.
Given the new size estimates that this change enables, some plan/test
outputs have changed and the tests have been updated accordingly.
The access node and the data node need to have the same encoding, the
same ctype, and the same collation, so the following changes were made
when bootstrapping the database on the data node:
- Explicitly use `template0`
- Explicitly set the encoding to the encoding of the database on the
access node
- Explicitly set the `LC_CTYPE` to the `LC_CTYPE` of the database on
the access node
- Explicitly set the `LC_COLLATE` to the collation of the database on
the access node
When not bootstrapping, it is checked that the encoding, `LC_CTYPE`,
and collation match what the access node is using.
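A rough sketch of the bootstrap statement now issued on the data node (the database name and locale values are examples; the actual encoding, `LC_CTYPE`, and `LC_COLLATE` are taken from the access node's database):

```sql
CREATE DATABASE dn1_db
    TEMPLATE template0
    ENCODING 'UTF8'
    LC_CTYPE 'C.UTF-8'
    LC_COLLATE 'C.UTF-8';
```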
This change prepares the repo for developing the 2.0.0-beta3.
The CMake configuration required an update to handle both
a beta and dev suffix in the version string.
**For beta releases**, upgrading from an earlier version of the
extension (including previous beta releases) is not supported.
This release introduces *distributed hypertables*, a major new feature
that allows hypertables to scale out across multiple nodes for
increased performance and fault tolerance. Please review the
documentation to learn how to configure and use distributed
hypertables, including current limitations.
For some queries, with only one node "hit", push down can be forced by
matching the partitioning attributes with the group by expression.
However, push down didn't happen in some of these cases because of a
bug in `force_group_by_push_down` that didn't properly update the
number of attributes in the partitioning expression.
This change fixes the bug to always set the number of partitioning
attributes to match the group by clause.
Set max retries to 0 for the drop chunks background job in the
background worker reorder drop chunk test, so it will not do a retry,
which is not expected by the test, and thus the test passes on ARM.
The number of output rows estimated for "remote" upper rels could
sometimes erroneously be zero. This happened when computing the
estimate for upper rels with different pathkeys: in case of several
different path keys the estimation was not recalculated and instead
relied on cached values from the first calculation on the same
rel. However, the number of output rows was never cached, so the
estimate for the second pathkey of the upper rel would always produce
zero output rows. This has now been fixed by storing the output rows
in the upper
rel after the first estimation.
This fix affects some query plans so a number of tests are affected.
If `remote_txn_check_for_leaked_prepared_statements` does not have a
working connection, it will abort by crashing the server. Since there
are tests that kill the remote server in different phases of the 2PC,
bad timing might cause the server to crash rather than generate an
error.
This commit replaces the assertion with both a status check and a
check that the correct number of rows and columns are returned, and
generates a status message with the error message, if any.
A new function, `get_chunk_relstats()`, allows fetching relstats
(basically `pg_class.{relpages,reltuples}`) from remote chunks on data
nodes and writing it to the `pg_class` entry for the corresponding
local chunk. The function expects either a chunk or a hypertable as
input and returns the relstats for the given chunk or all chunks for
the given hypertable, respectively.
Importing relstats as described is useful as part of a distributed
ANALYZE/VACUUM that won't require fetching all data into the access
node for local sampling (like the current implementation does).
In a future change, this function will be called as part of a local
ANALYZE on the access node that runs ANALYZE on all data nodes
followed by importing of the resulting relstats for the analyzed
chunks.
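A hedged usage sketch (the hypertable name is hypothetical and the function's schema may differ):

```sql
-- Fetch relstats for all chunks of the distributed hypertable 'conditions'
-- and import them into the corresponding local pg_class entries.
SELECT * FROM _timescaledb_internal.get_chunk_relstats('conditions');
```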
Debug printouts are added at two locations:
* Inside `get_foreign_upper_paths` a notice with the existing paths is
sent back together with the stage, if `show_upper` is set to show
that stage.
* Inside `set_rel_pathlist` a notice is printed with the existing paths
if the `show_rel` debug flag is set.
The debug printouts are sent back to the client as debug messages
(`DEBUG2`), allowing developers to quickly experiment and interactively
see what effect different statements have with respect to the paths
generated.
In addition, the handling of `debug_optimizer_flag` was not correct and
is fixed here.
* If no `=` was provided to `show_upper`, it triggered a segmentation
fault.
* If the flag was reset, the internal data structure was not updated.
* If just one flag was updated in a `SET` command, the other flag was
kept intact.
This patch upgrades existing dist_cmd functions to support
executing commands which cannot be run inside an active transaction,
such as the VACUUM command.
This patch adds a way to check and print a notice message
to a user who wants to drop an access node database.
Since a database drop can only be done using a different database
connection, this functionality is implemented inside the loader
extension.
The security label functionality is used to mark a distributed
database and make this information accessible to other databases via
the pg_shseclabel table.
This change allows certain mutable functions to be whitelisted so that
they can be safely pushed down to data nodes. Additionally, this change
will no longer prevent queries containing the `now` function from being
pushed down to data nodes, but will instead replace the function call
with the transaction start time (which is the same value that would
have been used had the query been run solely on the access node).
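As an illustration (hypothetical table), a filter like the following can now be pushed down, with `now()` replaced by the transaction start time before the query is sent to the data nodes:

```sql
SELECT device_id, avg(temperature)
FROM conditions
WHERE time > now() - INTERVAL '1 day'
GROUP BY device_id;
```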
This fixes a problem in the connection tests when converting ints to
datums on certain platforms (e.g., ARM). The appropriate datum
conversion macros weren't used when returning datum results, which
caused errors on ARM platforms.
When deleting a data node, we currently clear the `dist_uuid` in the
database on the data node, which requires being able to connect to the
data node and also means that it is possible to re-add the data node
to a new cluster without checking that it is in a consistent state.
This commit removes the code that clears the `dist_uuid`, so there is
no need to connect to the data node. All tests are updated to reflect
the fact that no connection will be made to the data node and that the
`dist_uuid` is not cleared.
For the tests that use the `remote_node_set_kill_event` function
to perform a hard termination at precise stages of the 2PC, the
connection will close with different warning messages depending on the
timing of the connection close and the SSL close.
To avoid a flaky test, this commit sets `client_min_messages` to
`ERROR` for the duration of the transaction. Since the checks that the
transaction is properly rolled back in the event of a crash are done
after the transaction, the warning messages do not offer any
additional benefit.
Add a new `debug_optimizer_flags` option where you can request
optimization debug notices. With this commit, the flags `show_upper`
and `show_rel` are added.
The flag `show_upper` will permit sending back a message with the
resulting relation after creating upper paths. Since this is called at
different stages, the flag supports setting specific stages where
printouts should be done using the format `show_upper=window,final`.
The flag `show_rel` will permit sending back the resulting relation
after executing `set_rel_pathlist` to add new paths for consideration.
The actual implementation to send the notices will be in a separate
commit.
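A hedged sketch of how the flags might be used in a session (the GUC is assumed to live under the `timescaledb.` prefix):

```sql
-- Debug messages are sent at DEBUG2, so raise the client verbosity first.
SET client_min_messages TO debug2;

-- Print paths for every relation considered in set_rel_pathlist.
SET timescaledb.debug_optimizer_flags = 'show_rel';

-- Print upper paths, but only for specific stages.
SET timescaledb.debug_optimizer_flags = 'show_upper=window,final';
```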
Process the arguments of the data node allow/block new chunks SQL API
functions separately, since the number of optional arguments differs
between the allow and block functions. This fixes a bug with memory
access.
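For reference, hedged examples of the two affected functions (argument names and optional parameters are assumptions and may differ from the actual SQL API):

```sql
-- Block new chunks of 'conditions' from being placed on data node 'dn1'.
SELECT block_new_chunks('dn1', 'conditions');

-- Allow new chunks on the data node again, for all hypertables.
SELECT allow_new_chunks('dn1');
```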
This change removes collation checks for foreign expressions. With
distributed hypertables, we assume that all participating nodes have
the same collation configuration so there should be no difference
(from a collation perspective) between executing an expression locally
and remotely.
We want to get more insight into what plans are executed on data
nodes. If a user runs EXPLAIN with the VERBOSE option, we will connect
to each data node, run EXPLAIN there, and print the output together
with the existing EXPLAIN output.
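For example (hypothetical table), the per-data-node plans are now included in the output of:

```sql
EXPLAIN (VERBOSE)
SELECT device_id, avg(temperature)
FROM conditions
GROUP BY device_id;
```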
If something is wrong with a connection, we would not be able to start
a transaction on the remote node. So when an abort happens, we should
not try to abort a transaction that hasn't started.
We use xact_depth to check whether the transaction actually started.
If we find one that hasn't, we remove it from our transaction cache,
which should result in removing the invalid connection as well.
In order to debug the optimizer it is necessary to provide the
`OPTIMIZER_DEBUG` preprocessor symbol, so an option was added to
enable this for the code.
It still requires the PostgreSQL source code to be built with this
flag.