Refactor the data node connection establishment so that it is
interruptible, e.g., by Ctrl-C or `statement_timeout`.
Previously, the connection establishment used blocking libpq calls. By
instead using asynchronous connection APIs and integrating with
PostgreSQL interrupt handling, the connection establishment can be
canceled by an interrupt caused by a statement timeout or a user.
Fixes #2757
When attaching a data node and specifying `repartition=>false`, the
current number of partitions should be preserved rather than
recalculated based on the number of data nodes.
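For example (the hypertable name `conditions` is illustrative):
```sql
-- Attach a new data node without redefining the space partitioning;
-- the hypertable keeps its current number of partitions.
SELECT attach_data_node('dn3', 'conditions', repartition => false);
```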
Fixes #5157
When deleting a data node with the option `drop_database=>true`, the
database is deleted even if the command fails.
Fix this behavior by dropping the remote database at the end of the
delete data node operation so that other checks fail first.
This function drops chunks on a specified data node if those chunks are
not known by the access node.
Call drop_stale_chunks() automatically when a data node becomes
available again.
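A sketch of the manual call (the internal schema qualification and
exact signature are assumptions here):
```sql
-- Drop chunks on data node 'dn1' that the access node no longer
-- knows about.
SELECT _timescaledb_internal.drop_stale_chunks('dn1');
```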
Fixes #4848
PG15 introduced a ProcSignalBarrier mechanism in the drop database
implementation to force all backends to close the file handles for
dropped tables. The backend that is executing the drop database command
will emit a new process signal barrier and wait for other backends to
absorb it. But the backend which is executing the delete_data_node
function will not be able to process the above-mentioned signal, as it
will be stuck waiting for the drop database query to return. Thus the
two backends end up waiting for each other, causing a deadlock.
Fix this by using the async API to execute the drop database command
from delete_data_node instead of the blocking remote_connection_cmdf_ok
call.
Fixes #4838
Add a new function, `alter_data_node()`, which can be used to change
the data node's configuration originally set up via `add_data_node()`
on the access node.
The new function introduces a new option "available" that allows
configuring the availability of the data node. Setting
`available=>false` means that the node should no longer be used for
reads and writes. Only read "failover" is implemented as part of this
change, however.
To fail over reads, the alter data node function finds all the chunks
for which the unavailable data node is the "primary" query target and
"fails over" to a chunk replica on another data node instead. If some
chunks do not have a replica to fail over to, a warning will be
raised.
When a data node is available again, the function can be used to
switch back to using the data node for queries.
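For example (node name illustrative):
```sql
-- Mark the data node as unavailable; chunks for which it is the
-- "primary" query target fail over to replicas on other data nodes.
SELECT alter_data_node('dn1', available => false);
-- When the node is reachable again, switch queries back to it.
SELECT alter_data_node('dn1', available => true);
```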
Closes #2104
The Postgres source code defines the macro `OidIsValid()` to check if an
Oid is valid or not (comparing against `InvalidOid`). See
`src/include/c.h` in the Postgres source tree.
Change all direct comparisons against `InvalidOid` to use the
`OidIsValid()` macro, and add a Coccinelle check to make sure future
changes use it correctly.
Add a new metadata table `dimension_partition` which explicitly and
statefully details how a space dimension is split into partitions, and
(in the case of multi-node) which data nodes are responsible for
storing chunks in each partition. Previously, partition and data nodes
were assigned dynamically based on the current state when creating a
chunk.
This is the first in a series of changes that will add more advanced
functionality over time. For now, the metadata table simply writes out
what was previously computed dynamically in code. Future code changes
will alter the behavior to do smarter updates to the partitions when,
e.g., adding and removing data nodes.
The idea of the `dimension_partition` table is to minimize changes in
the partition-to-data-node mappings across various events, such as
changes in the number of data nodes, number of partitions, or the
replication factor, all of which affect the mappings. For example, increasing
the number of partitions from 3 to 4 currently leads to redefining all
partition ranges and data node mappings to account for the new
partition. Complete repartitioning can be disruptive to multi-node
deployments. With stateful mappings, it is possible to split an
existing partition without affecting the other partitions (similar to
partitioning using consistent hashing).
Note that the dimension partition table expresses the current state of
space partitions; i.e., the space-dimension constraints and data nodes
to be assigned to new chunks. Existing chunks are not affected by
changes in the dimension partition table, although an external job
could rewrite, move, or copy chunks as desired to comply with the
current dimension partition state. As such, the dimension partition
table represents the "desired" space partitioning state.
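As a sketch, the desired state can be inspected directly (the catalog
schema qualification is an assumption):
```sql
-- Show how each space dimension is split into partitions and which
-- data nodes are assigned to each partition.
SELECT * FROM _timescaledb_catalog.dimension_partition;
```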
Part of #4125
The current check deems a data node incompatible if it's on a newer
version, which is exactly the opposite of what we want. Fix that and
also add relevant test cases.
If a superuser is used to invoke attach_data_node on a hypertable,
then we need to ensure that the object created on the data node has
the same original ownership permissions.
Fixes #4433
Add a parameter `drop_remote_data` to `detach_data_node()` which
allows dropping the hypertable on the data node when detaching
it. This is useful when detaching a data node and then immediately
attaching it again. If the data remains on the data node, the
re-attach will fail with an error complaining that the hypertable
already exists.
The new parameter is analogous to the `drop_database` parameter of
`delete_data_node`. The new parameter is `false` by default for
compatibility and ensures that a data node can be detached without
requiring communicating with the data node (e.g., if the data node is
not responding due to a failure).
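For example (hypertable name illustrative):
```sql
-- Detach the data node and drop the hypertable's data on it, so the
-- node can be re-attached later without a "hypertable exists" error.
SELECT detach_data_node('dn1', 'conditions', drop_remote_data => true);
```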
Closes #4414
Compilers are not smart enough to check that `conn` is initialized
inside the loop, so not initializing it gives an error. Added an
initializer to the auto variable to get rid of the error.
Starting with PG15, default permissions on the public schema are
restricted for any non-superuser non-owner. This causes test failures
since tables can no longer be created without explicitly adding
permissions, so we remove the grant when bootstrapping the data nodes
and instead grant permissions to the users in the regression tests. This
keeps the default permissions on data nodes but allows regression tests
to run.
Fixes #3957
Reference: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=b073c3cc
Since PG15, by default, non-superuser accounts are not allowed to create
tables in the public schema of databases they don't own. This default can
be changed manually. This patch ensures that the permissions are the same
regardless of the PostgreSQL version used.
Without this patch, none of our tests pass on PG15 because they fail with
the "access denied to schema public" error. This is why runner.sh was
modified.
Some other tests keep failing because when we call
create_distributed_hypertable() we create a new database on each of the
data nodes, also without granting enough permissions to non-privileged
users. This is what the fix in data_node.c addresses.
This is not necessarily the best approach possible, but it preserves the
same behavior on PostgreSQL >= 15 and PostgreSQL < 15. Maybe one day we
will come up with something better (especially once there is no need to
support PG < 15), but until then the patch seems good enough.
Reference: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=b073c3cc
When deleting a data node, it is often convenient to be able to also
drop the database on the data node so that the node can be added again
using the same database name. However, dropping the database is
optional since it should be possible to delete a data node even if it
is no longer responding.
With the new functionality, a data node's database can be dropped as
follows:
```sql
SELECT delete_data_node('dn1', drop_database=>true);
```
Note that the default behavior is still to not drop the database in
order to be compatible with the old behavior. Enabling the option also
makes the function non-transactional, since dropping a database is not
transactional. Therefore, it is not possible to use this option in a
transaction block.
Closes #3876
* A few tweaks to the remote txn resolution logic
* Add logic to delete a specific record in remote_txn table by GID
* Allow heal logic to move on to other cleanup if one specific GID
fails
* Do not rely on ongoing txns while cleaning up entries from remote_txn
table
Includes test case changes to try out various failure scenarios in the
healing function.
Fixes #3219
This patch refactors and reworks the logic behind the
dist_ddl_preprocess() function.
The idea is to simplify the process by splitting each DDL command's
logic into a separate function and to avoid relying on the hypertable
list count to make decisions.
This change makes it easier to process more complex commands
(such as GRANT), which may require query rewriting or execution on
different data nodes. Additionally, this makes the code easier to
follow and more similar to the main code path in
src/process_utility.c.
This change fixes a crash that occurred when calling
`distributed_exec` via a direct function call.
The crash was triggered by a dynamic lookup of the function name via
the function Oid in the `FunctionCallInfo` struct in order to generate
error messages for read-only and transaction block checks. However,
this information is provided by the parsing stage, which is not
executed when doing direct function calls, thus leading to a
segmentation fault when trying to dereference a pointer in the
`FunctionCallInfo` that wasn't set.
Note that this problem is not limited to `distributed_exec`; it is
present in all SQL-callable functions that use the same pattern and
macros.
To fix the problem, update the macros and patterns used for checking
for read-only mode and transaction blocks to avoid doing the function
name lookup when the pointer is not set. Instead, fall back to the C
function name in that case (via the C macro `__func__`).
A test case is added in C code to call `distributed_exec` via a direct
function call within a transaction block in order to hit the
previously crashing error message.
The `distributed_exec` function has also been updated with better
handling of input parameters, like empty arrays of data nodes, or
arrays containing NULL elements.
This change removes a check for `USAGE` privileges on data nodes
required to query the data node using utility commands, such as
`hypertable_size`. Normally, PostgreSQL doesn't require `USAGE` on a
foreign server to query its remote tables. Also, size utilities like
`pg_table_size` can be used by anyone, even roles without any
privileges on a table. The behavior on distributed hypertables is now
consistent with PostgreSQL.
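For example, a role without `USAGE` on the data nodes' foreign servers
can now run (hypertable name illustrative):
```sql
SELECT hypertable_size('conditions');
```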
Fixes #3698
Creates a table for a chunk replica on the given data node. The table
gets the same schema and name as the chunk. The created chunk replica
table is not added to the metadata on the access node or the data node.
The primary goal is to use it during copy/move chunk.
Harden core APIs by adding the `const` qualifier to pointer parameters
and return values passed by reference. Adding `const` to APIs has
several benefits and potentially reduces bugs.
* Allows core APIs to be called using `const` objects.
* Callers know that objects passed by reference are not modified as a
side-effect of a function call.
* Returning `const` pointers enforces "read-only" usage of pointers to
internal objects, forcing users to copy objects when mutating them
or using explicit APIs for mutations.
* Allows the compiler to apply optimizations and helps static analysis.
Note that these changes are so far only applied to core API
functions. Further work can be done to improve other parts of the
code.
This change ensures the database and extension are validated whenever
these objects aren't created, instead of only doing validation when
`bootstrap=>false` is passed when adding a data node.
This fixes a corner case where a data node could be added and removed
several times, even though the data node's database was already marked
as having been part of a multi-node setup.
A new test checks that a data node cannot be re-added after deleting
it on the access node, irrespective of whether one bootstraps the data
node or not when it is added.
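For example, both of the following now validate an existing data node
database and extension (host value illustrative):
```sql
-- Validation has always run when bootstrapping is skipped explicitly.
SELECT add_data_node('dn1', host => 'dn1.example.com', bootstrap => false);
-- Validation now also runs when bootstrapping finds existing objects.
SELECT add_data_node('dn1', host => 'dn1.example.com');
```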
When the access node bootstraps a data node and creates the extension,
it should use the extension version of the access node. This change
adds the `VERSION` option to the `CREATE EXTENSION` statement sent to
a data node so that the extension versions on the access node and data
nodes will be the same. Without the version option, data nodes will be
bootstrapped with the latest version installed, potentially leading to
data nodes running different versions of the extension compared to the
access node.
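The statement sent to the data node now has roughly this form (the
version string is hypothetical):
```sql
CREATE EXTENSION timescaledb VERSION '2.5.0';
```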
When `add_data_node` fails, it often gives an opaque error that it
couldn't connect to the data node. This change adds the libpq
connection error as a detailed message in the error.
This change fixes two issues with `add_data_node`:
1. In one case, a check for a valid connection pointer was not done,
causing a segmentation fault when connection attempts failed.
2. Connections were made with a blocking API that hangs
indefinitely when the receiving end is not responding. The user
couldn't cancel the connection attempt with CTRL-C, since no wait
latch or interrupt checking was used. The code is now updated to
use a non-blocking connection API, where it is possible to wait on
the socket and latch, respecting interrupts.
Initialize a boolean variable used to check for a compatible extension
on a data node. Leaving it uninitialized might lead to a potential
read of a garbage value and unpredictable behavior.
Add an optional password parameter to `add_data_node` so that users
that don't have a password in a `passfile` on the access node can add
data nodes using password authentication. Together with user mappings,
this allows full multi-node configuration without relying on passwords
or certificates provided in external/on-disk files.
While passwords can be provided in the database via a user mapping
object, such a mapping is created on a per-server basis and requires
the foreign server to exist prior to creating the mapping. When adding
a data node, however, bootstrapping and/or validation of the data node
happens at the same time as the server object is created, which means
no user mapping can be created prior to adding the data
node. Therefore, the password must be provided as an argument to add
data node instead of via a user mapping.
Fortunately, using a function parameter might be preferred to a user
mapping since the (plaintext) password won't be stored in the
database. A user mapping for the user that created the data node can
optionally be added after the data node has been added. But it might
be desirable to only create user mappings for unprivileged users that
will mostly interact only with specific distributed hypertables.
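For example (host and password values are illustrative):
```sql
SELECT add_data_node('dn1', host => 'dn1.example.com',
                     password => 'node-password');
```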
We want to check all available extension versions
and not just the installed one. This is because we
might be setting up a cluster for a database that
has a different extension version than the `postgres`
or `template1` database, which we actually use to
perform this validation.
So instead of using the `pg_available_extensions` view,
we use `pg_available_extension_versions`, which should
return the same list of extension versions no matter
which database we connect to.
This should also make it possible to add a data node
that has run `ALTER EXTENSION UPDATE`.
This gives no guarantee that the installed version will
be compatible, because we currently use the default
version (the one specified in the control file) when
installing an extension.
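The check is now along these lines (a sketch, not the exact query used):
```sql
SELECT version FROM pg_available_extension_versions
WHERE name = 'timescaledb';
```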
The function name is hard-coded in some cases in the C function, so
this commit instead defines and uses a macro that extracts the
function name from the `fcinfo` structure. This prevents mismatches
between the hard-coded names and the actual function name.
Closes #2579
Errors and messages are overhauled to conform to the official
PostgreSQL style guide. In particular, the following things from the
guide have been given special attention:
* Correct capitalization of the first letter: capitalize only for hint
and detail messages.
* Correct handling of periods at the end of messages (should be elided
for the primary message, but not for detail and hint messages).
* The primary message should be short, factual, and avoid reference to
implementation details such as specific function names.
Some messages have also been reworded for clarity and to better
conform with the last bullet above (short primary message). In other
cases, messages have been updated to fix references to, e.g., function
parameters that used the wrong parameter name.
Closes #2364
This change makes the detach_data_node() function consistent with
other data node management functions by adding the missing
`if_attached` argument.
The function will not raise an error if the data node is not attached
and `if_attached` is set to true.
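For example (hypertable name illustrative):
```sql
-- Succeeds without an error even if 'dn1' is not attached.
SELECT detach_data_node('dn1', 'conditions', if_attached => true);
```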
Issue: #2506
If the database exists on the data node when executing `add_data_node`,
it will generate an error in the data node log, which can cause
problems since there is an error indication in the log but no
failing operations.
This commit fixes this by first validating the database and creating
it only if it does not exist.
Closes #2503
We stop enforcing that the extension owner be the same as the user
adding a data node since that's not strictly necessary. In multi-node
setups it is common that a data node is pre-bootstrapped and an
extension owner is already set. This prevents an error when a user who
is not the extension owner tries to add a data node.
If the access node is adding itself as a data node using `add_data_node`,
it will deadlock since transactions will be opened on both the access
node and the data node, each trying to update the metadata.
This commit fixes this by updating `set_dist_id` to check if the UUID
being added as `dist_uuid` is the same as the `uuid` of the node. If
that is the case, it raises an error.
Fixes #2133
This change ensures that API functions and DDL operations
which modify data respect the read-only transaction state
set by the default_transaction_read_only option.
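For example (a sketch; the data node name is illustrative):
```sql
SET default_transaction_read_only TO on;
-- A data-modifying API call now errors out instead of proceeding:
SELECT delete_data_node('dn1');
```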
Since the connection cache is no longer replaced on a transaction
rollback, it is not necessary to pin the connection cache (this wasn't
done correctly in some places in any case, e.g.,
`data_node_get_connection`).
This patch removes code support for PG9.6 and PG10. In addition to
removing the PG96 and PG10 macros, the following changes are done:
* Remove HAVE_INT64_TIMESTAMP since this is always true on PG10+.
* Remove PG_VERSION_SUPPORTS_MULTINODE.
If the extension is not available on the data node, a strange error
message will be displayed since the extension cannot be installed. This
commit checks for the availability of the extension before trying to
bootstrap the node and prints a more helpful informational message if
the extension is not available.