This change refactors how connections are handled during remote
transactions. In particular, the connection cache now stays consistent
during transactions, even during rollbacks. Previously, the connection
cache was replaced on every rollback, even if the rollback was
intentional (i.e., not due to an error). This made it hard to debug
connections since the cache became completely empty.
Connections could also be left in the cache in a bad state after
failed transactions. This has been fixed by moving connection checks
to the cache and tying transaction state changes to each
connection. This ensures that such checks are done in one canonical
place instead of being spread throughout the code.
Given how tightly coupled a remote transaction is with its connection,
it might make sense to remove the separate remote transaction store
and instead put this information in each connection. This is left to a
future change, however.
In addition to the above changes, this commit includes:
* Showing transaction depth and invalidation in the transaction store
* Invalidation on individual connections instead of replacing the
whole cache
* Closing of connections to a local database that is being dropped to
prevent "in use" errors.
* Ability to add callbacks to async requests, executed when a
  response is received. This is used by remote transactions to mark
  connections as having successfully completed a transaction. Thus, on
  errors, it is easy to detect connections that are in bad states.
* Error checks on each connection instead of having global error
tracking for each remote transaction. This change removes the global
error state for distributed transactions.
The connection cache for remote transactions can now be examined using
a function that shows all connections in the cache. This allows easier
debugging and validation both in tests and on live systems.
In particular, we'd like to know that connections are in a good state
after a commit or rollback and that we don't leave bad connections in
the cache.
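For illustration, assuming a debug function along the lines of
`show_connection_cache()` (the function name, its columns, and the
table used below are placeholders, not the actual API), the cache
could be inspected around a rollback:

```sql
-- Hypothetical example: inspect the remote-transaction connection cache.
SELECT node_name, transaction_status FROM show_connection_cache();

BEGIN;
INSERT INTO conditions VALUES (now(), 'office', 21.3);
ROLLBACK;

-- The same connections should still be listed, and in a good state,
-- rather than the whole cache having been replaced.
SELECT node_name, transaction_status FROM show_connection_cache();
```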
The remote transaction test (`remote_txn`) has been updated to show
the connection cache as remote transactions are
executed. Unfortunately, the whole cache is replaced on every
(sub-)transaction rollback, which makes it hard to debug the
connection state of a particular remote transaction. Further, some
connections are left in the cache in a bad state after, e.g.,
connection loss.
These issues will be fixed with an upcoming change.
When deleting a data node, the access node currently clears the
`dist_uuid` in the database on the data node. This requires being able
to connect to the data node and also means that the data node could be
re-added to a new cluster without checking that it is in a consistent
state.
This commit removes the code that clears the `dist_uuid`, so there is
no longer a need to connect to the data node. All tests are updated to
reflect the fact that no connection is made to the data node and that
the `dist_uuid` is not cleared.
Certain functions invoked on an access node need to be passed on to
data nodes to ensure any mutations happen also on those
nodes. Examples of such functions are `drop_chunks`, `add_dimension`,
`set_chunk_time_interval`, etc. So far, the approach has been to
deparse these "manually" on a case-by-case basis.
This change implements a generalized deparsing function that
reconstructs the statement from the function call info
(`FunctionCallInfo`), which holds all the information about the
invoked function that is needed to deparse the call.
The `drop_chunks` function has been updated to use this generalized
deparsing functionality when invoking the function on remote data
nodes.
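As a sketch of what this means in practice (the `drop_chunks`
arguments and the forwarded statement below are illustrative, not
taken verbatim from the implementation):

```sql
-- Invoked on the access node:
SELECT drop_chunks(interval '3 months', 'conditions');

-- The call is deparsed from its FunctionCallInfo and an equivalent
-- statement is sent to each data node, along the lines of:
--   SELECT drop_chunks('3 mons'::interval, 'conditions');
```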
This change will check if the PostgreSQL version of the data node is
11 or greater during the `add_data_node` call. It will also now print
a more meaningful error message if the data node validation fails.
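For example (the signature and error behavior shown here are
illustrative):

```sql
-- add_data_node now verifies the remote PostgreSQL version up front.
SELECT add_data_node('dn1', host => 'dn1.example.com');
-- If 'dn1' runs PostgreSQL 10 or older, the call fails with a
-- descriptive error instead of an obscure one later on.
```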
This change will call a function on a remote database to validate its
configuration before following through with an `add_data_node` call.
Right now the check ensures that the data node is able to use prepared
transactions, but further checks can be easily added in the future.
Since this uses the timescaledb extension to validate the remote
database, it runs at the end of bootstrapping. We may want to
consider adding code to undo our bootstrapping changes if this check
fails.
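The prepared-transactions requirement corresponds to a standard
PostgreSQL setting that the validation can check on the remote
database, for example:

```sql
-- On the data node: prepared transactions (two-phase commit) are
-- disabled when max_prepared_transactions is 0, so the check requires
-- a non-zero value.
SHOW max_prepared_transactions;
-- Example postgresql.conf setting (the value is just an example):
--   max_prepared_transactions = 150
```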
The timescale clustering code so far has been written referring to the
remote databases as 'servers'. This terminology is a bit overloaded,
and in particular we don't enforce any network topology limitations
that the term 'server' would suggest. In light of this we've decided
to use the term 'node' when referring to the different databases in a
distributed database. Specifically we refer to the
frontend as an 'access node' and to the backends as 'data nodes',
though we may omit the access or data qualifier where it's unambiguous.
As the vast bulk of the code so far has been written for the case where
there was a single access node, almost all instances of 'server' were
references to data nodes. This change has updated the code to rename
those instances.
This change adds a new PostgreSQL utility function,
`server_hypertable_info`. This function will contact a provided node
and pull down the space information for all the distributed hypertables
on that node.
Additionally, a new view `distributed_server_info` has been added to
timescaledb_information. This view leverages the new
`remote_hypertable_data` function to display a list of nodes, along with
counts of tables, chunks, and total bytes used by distributed data.
Finally, this change also adds a `hypertable_server_relation_size`
function, which, given the name of a distributed hypertable, will print
the space information for that hypertable on each node of the
distributed database.
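Example usage (output columns and exact signatures may differ; the
hypertable name is a placeholder):

```sql
-- List data nodes with counts of tables, chunks, and total bytes of
-- distributed data.
SELECT * FROM timescaledb_information.distributed_server_info;

-- Show per-node space usage for a given distributed hypertable.
SELECT * FROM hypertable_server_relation_size('conditions');
```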
This change makes it possible for a data node to distinguish between
regular client connections and distributed database connections (from
the access node).
This functionality will be needed for decision making based on the
connection type, for example to allow or block DDL commands on a data
node.
This change adds a distributed database id to the installation data for a
database. It also provides a number of utilities that can be used for
getting/setting/clearing this value or using it to determine if a
database is a frontend, a backend, or not a member of a distributed
database.
This change also includes modifications to the `add_server` and
`delete_server` functions to check the distributed id to ensure the
operation is allowed, and then update or clear it appropriately. After
this change it will no longer
be possible to add a database as a backend to multiple frontend databases, nor
will it be possible to add a frontend database as a backend to any other
database.
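For illustration, assuming the distributed id is stored as a
`dist_uuid` key in the catalog metadata table (the storage location is
an assumption made for this example), membership can be inspected
with:

```sql
-- A non-NULL dist_uuid indicates that the database is a member
-- (frontend or backend) of a distributed database.
SELECT value AS dist_uuid
FROM _timescaledb_catalog.metadata
WHERE key = 'dist_uuid';
```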