34 Commits

Author SHA1 Message Date
Nikhil
2ffa1bf436 Implement cleanup for chunk copy/move
A chunk copy/move operation is carried out in stages, and it can
fail in any of them. We track the last completed stage in the
"chunk_copy_operation" catalog table. In case of failure, a
"chunk_copy_cleanup" function can be invoked to bring the chunk back
to its original state on the source data node; all transient objects
such as the replication slot, publication, subscription, empty chunk,
and metadata updates are cleaned up.

Includes test case changes for failures induced at each and every stage.

To avoid confusion between chunk copy activity and chunk copy
operation, this patch also consistently uses "operation" everywhere
instead of "activity".
2021-07-29 16:53:12 +03:00
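
As an illustration, a failed operation might be cleaned up roughly
like this (a minimal sketch; the catalog schema, column names, and the
exact cleanup function name/signature are assumptions based on this
message):

    -- Find the failed operation and its last completed stage (assumed columns)
    SELECT operation_id, completed_stage
      FROM _timescaledb_catalog.chunk_copy_operation;

    -- Roll the chunk back to its original state on the source data node
    CALL chunk_copy_cleanup('ts_copy_1_31');
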
Erik Nordström
bea2613455 Add constraints when copying chunks across data nodes
Chunk constraints are now added after a chunk has been copied from one
data node to another. The constraints are added when the chunk is made
visible on the destination node, i.e., after data has been copied and
the chunk's metadata is created.

As an alternative, the constraints could be added when the chunk table
is first created, but before the metadata for the chunk is added. This
would have the benefit of validating each copied (inserted) row
against the constraints during the data copy phase. However, this
would also necessitate decoupling the step of creating the constraint
metadata from the creation of the actual constraints since the other
chunk metadata that is referenced does not yet exist. Such decoupling
would require validating that the metadata actually matches the
constraints of the table when the metadata is later created.

One downside of adding the constraints after data copying is that it
necessitates validating all the chunk's rows against the constraints
after insertion as opposed to during insertion. If this turns out to
be a performance issue, validation could be initially deferred. This
is left as a future optimization.
2021-07-29 16:53:12 +03:00
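
For reference, the deferred-validation idea maps onto stock PostgreSQL
DDL: a constraint can be added as NOT VALID while data is copied and
validated as a separate step afterwards. A minimal sketch with a
hypothetical chunk and constraint name:

    -- Add the constraint without checking existing rows
    ALTER TABLE _timescaledb_internal._hyper_1_1_chunk
      ADD CONSTRAINT constraint_1
      CHECK ("time" >= '2021-01-01' AND "time" < '2021-01-08') NOT VALID;

    -- Validate the copied rows later, in a separate pass
    ALTER TABLE _timescaledb_internal._hyper_1_1_chunk
      VALIDATE CONSTRAINT constraint_1;
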
Dmitry Simonenko
38c1781748 Copy/move chunk refactoring
Remove the copy_chunk_data() function and the code needed to support
it, such as the 'transactional' argument.

Rework copy chunk logic using separate stages.

Introduce the copy_chunk() API function as an internal wrapper for
move_chunk().
2021-07-29 16:53:12 +03:00
Nikhil
f6b0250557 Implement wrapper API for copy/move chunk
The building blocks required for implementing end-to-end copy/move
chunk functionality have now been wrapped in a procedure.

A procedure is required because multiple transactions are needed to
carry out the activity across the access node and the two involved
data nodes.

The following steps are encapsulated in this procedure:

1) Create an empty chunk table on the destination data node

2) Copy the data from the source data node chunk to this newly created
destination node chunk. This is done via built-in PostgreSQL logical
replication functionality

3) Attach this chunk to the hypertable on the destination data node

4) Remove this chunk from the source data node to complete the move, if
requested

A new catalog table "chunk_copy_activity" has been added to track
the progress of the above stages. A unique id is assigned to each
activity, and the entry is updated with the completed stages as things
progress.
2021-07-29 16:53:12 +03:00
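
A sketch of calling the procedure (schema and argument names are
assumptions based on this message; a procedure invoked with CALL is
used because the operation spans multiple transactions):

    -- Move a chunk between two data nodes; copy_chunk takes the same arguments
    CALL timescaledb_experimental.move_chunk(
        chunk => '_timescaledb_internal._hyper_1_1_chunk',
        source_node => 'data_node_1',
        destination_node => 'data_node_2');
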
Erik Nordström
b8ff780c50 Add ability to create chunk from existing table
The `create_chunk` API has been extended to allow creating a chunk
from an existing relational table. The table is turned into a chunk by
attaching it to the root hypertable via inheritance.

The purpose of this functionality is to allow copying a chunk to
another node. First, the chunk table and data are copied. After that,
`create_chunk` can be executed to make the new table part of the
hypertable.

Currently, the relational table used to create the chunk has to match
the hypertable in terms of constraints, triggers, etc. PostgreSQL
itself enforces the existence of same-named CHECK constraints, but no
enforcement currently exists for other objects, including triggers and
UNIQUE, PRIMARY KEY, or FOREIGN KEY constraints. Such enforcement can
be implemented in the future, if deemed necessary. Another option is
to automatically add all the required objects (triggers, constraints)
based on the hypertable equivalents. However, that might also lead to
duplicate objects in case some of them exist on the table prior to
creating the chunk.
2021-07-29 16:53:12 +03:00
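
A hypothetical invocation, assuming the extended `create_chunk`
accepts the existing table alongside the hypertable and its dimension
slices (the argument names and the slice encoding are illustrative):

    -- Turn a pre-populated, schema-compatible table into a chunk
    SELECT * FROM _timescaledb_internal.create_chunk(
        'conditions',
        '{"time": [1609459200000000, 1610064000000000]}',
        chunk_table => 'conditions_w1');
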
Ruslan Fomkin
404f1cdbad Create chunk table from access node
Creates a table for a chunk replica on the given data node. The table
gets the same schema and name as the chunk. The created chunk replica
table is not added to the metadata on either the access node or the
data node.

The primary goal is to use it during chunk copy/move.
2021-07-29 16:53:12 +03:00
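
A minimal sketch of the call, assuming the function takes the
hypertable, the dimension slices, and the chunk's schema and table
name (names and slice encoding are illustrative):

    -- Create the bare chunk table on a data node; no metadata is recorded
    SELECT _timescaledb_internal.create_chunk_table(
        'conditions',
        '{"time": [1609459200000000, 1610064000000000]}',
        '_timescaledb_internal',
        '_hyper_1_1_chunk');
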
Ruslan Fomkin
34e99a1c23 Return error for NULL input to create_chunk_table
Gives errors if any argument of create_chunk_table is NULL, instead of
marking the function STRICT. Utilizes newly added macros for this.
2021-07-29 16:53:12 +03:00
Ruslan Fomkin
28ccecbe7c Create an empty chunk table
Adds an internal API function to create an empty chunk table according
to the given hypertable for the given chunk table name and dimension
slices. This function creates a chunk table inheriting from the
hypertable, so it guarantees the same schema. No TimescaleDB metadata
is updated.

To be able to create the chunk table in a tablespace attached to the
hypertable, this commit allows calculating the tablespace id without
the dimension slice existing in the catalog.

If there is already a chunk that collides on dimension slices, the
function fails to create the chunk table.

The function will be used internally in multi-node to be able to
replicate a chunk from one data node to another.
2021-07-29 16:53:12 +03:00
Sven Klemm
ff5d7e42bb Adjust code to PG14 reltuples changes
PG14 changes the initial value of pg_class.reltuples to -1 to allow
differentiating between an empty relation and a relation where
ANALYZE has not yet run.

https://github.com/postgres/postgres/commit/3d351d916b
2021-06-29 16:35:35 +02:00
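
The new semantics can be observed directly in `pg_class`; for example:

    -- In PG14, -1 means "never analyzed", while 0 means "analyzed and empty"
    SELECT relname, reltuples
      FROM pg_class
     WHERE relname = '_hyper_1_1_chunk';
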
Erik Nordström
98110af75b Constify parameters and return values of core APIs
Harden core APIs by adding the `const` qualifier to pointer parameters
and return values passed by reference. Adding `const` to APIs has
several benefits and potentially reduces bugs.

* Allows core APIs to be called using `const` objects.
* Callers know that objects passed by reference are not modified as a
  side-effect of a function call.
* Returning `const` pointers enforces "read-only" usage of pointers to
  internal objects, forcing users to copy objects when mutating them
  or using explicit APIs for mutations.
* Allows compiler to apply optimizations and helps static analysis.

Note that these changes are so far only applied to core API
functions. Further work can be done to improve other parts of the
code.
2021-06-14 22:09:10 +02:00
Sven Klemm
fb863f12c7 Remove support for PG11
Remove support for compiling against PostgreSQL 11. This patch also
removes PG11 specific compatibility macros.
2021-06-01 20:21:06 +02:00
Sven Klemm
6e437c2d95 Fix use after free in chunk_api_get_chunk_stats 2021-04-13 15:31:41 +02:00
Ruslan Fomkin
639aef76a4 Refactor chunk creation for future extension
Separates chunk preparation and metadata update. Separates preparation
of constraint names, since there is no overlap between preparing
names for dimension constraints and other constraints. Factors out
creation of json string describing dimension slices of a chunk.

This refactoring is preparation for implementing new functionalities.
2021-04-06 14:02:22 +02:00
Sven Klemm
e79c0648cf Fix member access within misaligned address in chunk_update_colstats
The array argument passed to array_length is treated as AnyArrayType,
which is a union of ArrayType and ExpandedArrayHeader. This led to
member access within a misaligned address when used on the plain
ArrayType argument passed to array_length by chunk_update_colstats.
2020-10-13 21:05:23 +02:00
Mats Kindahl
c321fe0ca0 Check insert privileges to create chunk
To create a chunk in a hypertable, it is currently necessary to be the
owner of the chunk's hypertable. If a user has insert privileges only,
the insert will fail with an error message, which causes problems when
inserting data into distributed hypertables since the user cannot
create new chunks.

This commit changes this so that insert privileges on the chunk's
hypertable are sufficient to allow creation of a new chunk.

Closes #2393
2020-09-21 18:16:40 +02:00
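
With this change, granting INSERT is enough for chunk-creating
inserts; a sketch with hypothetical table and role names:

    -- INSERT privilege alone now allows implicit chunk creation
    GRANT INSERT ON conditions TO app_writer;

    -- Run as app_writer: this insert may create a brand-new chunk
    INSERT INTO conditions VALUES ('2020-09-21 18:00:00', 'dev1', 22.5);
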
Brian Rowe
6b62ed543c Fetch collations from data nodes during ANALYZE
This change fixes the stats collecting code to also return the slot
collation fields for PG12. This fixes a bug (#2093) where running an
ANALYZE in PG12 would break queries on distributed tables.
2020-07-20 10:54:44 -07:00
Erik Nordström
596515eb0f Fix ANALYZE on replicated distributed hypertable
With replicated chunks, the function to import column stats would
experience errors when updating `pg_statistic`, since it tried to
write identical stats from several replica chunks.

This change fixes this issue by filtering duplicate stats rows
received from data nodes.

In the future, this could be improved by only requesting stats from
"primary" chunks on each data node, thus avoiding duplicates without
having to filter the result. However, this would complicate the
function interface as it would require sending a list of chunks
instead of just getting the stats for all chunks in a hypertable.
2020-06-18 12:38:18 +02:00
Sven Klemm
db617bf1d6 Fix typos in comments and documentation 2020-06-10 15:09:31 +02:00
Erik Nordström
8887f26baf Fix array construction issue for remote colstats
When fetching remote column statistics (`pg_statistic`) from data
nodes, the `stanumbers` field was not turned into an array
correctly. This caused values to be corrupted when importing them to
the access node. This issue has been fixed along with some compiler
warning issues (e.g., mixed declaration and code).
2020-05-27 17:31:09 +02:00
Brian Rowe
fad33fe954 Collect column stats for distributed tables.
This change adds a new command to return a subset of the column
stats for a hypertable (column width, percent null, and percent
distinct).  As part of the execution of this command on an access
node, these stats will be collected for distributed chunks and
updated on the access node.
2020-05-27 17:31:09 +02:00
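
The three collected statistics correspond to familiar `pg_stats`
columns, so after the command runs the imported values can be
inspected on the access node (chunk name hypothetical):

    -- avg_width, null_frac, and n_distinct are the imported subset
    SELECT attname, avg_width, null_frac, n_distinct
      FROM pg_stats
     WHERE tablename = '_hyper_1_1_chunk';
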
Dmitry Simonenko
71e2c35d48 Run distributed VACUUM/ANALYZE without FDW API
Run VACUUM/ANALYZE and automatically import the updated stats
using the distributed DDL functionality instead of FDW
analyze wrappers.
2020-05-27 17:31:09 +02:00
niksa
02f7d7aa48 Fetch data using either cursor or row-by-row
This change introduces two ways of fetching data from data nodes: one
using cursors and another one using row-by-row mode.  The major
benefit of row-by-row mode is that it enables running parallel plans
on data nodes. The default data fetcher uses row-by-row mode. A new
GUC `timescaledb.remote_data_fetcher` has been added to enable
switching between these two implementations (rowbyrow or cursor).
2020-05-27 17:31:09 +02:00
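
Switching fetchers is a plain GUC change:

    -- Default: row-by-row mode, which permits parallel plans on data nodes
    SET timescaledb.remote_data_fetcher = 'rowbyrow';

    -- Fall back to cursor-based fetching
    SET timescaledb.remote_data_fetcher = 'cursor';
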
Erik Nordström
6a9db8a621 Add function to fetch remote chunk relation stats
A new function, `get_chunk_relstats()`, allows fetching relstats
(basically `pg_class.{relpages,reltuples}`) from remote chunks on data
nodes and writing it to the `pg_class` entry for the corresponding
local chunk. The function expects either a chunk or a hypertable as
input and returns the relstats for the given chunk or all chunks for
the given hypertable, respectively.

Importing relstats as described is useful as part of a distributed
ANALYZE/VACUUM that won't require fetching all data into the access
node for local sampling (like the current implementation does).

In a future change, this function will be called as part of a local
ANALYZE on the access node that runs ANALYZE on all data nodes
followed by importing of the resulting relstats for the analyzed
chunks.
2020-05-27 17:31:09 +02:00
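
A sketch of both call forms (the internal schema is an assumption):

    -- Import relstats for every chunk of a hypertable...
    SELECT * FROM _timescaledb_internal.get_chunk_relstats('conditions');

    -- ...or for a single chunk
    SELECT * FROM _timescaledb_internal.get_chunk_relstats(
        '_timescaledb_internal._hyper_1_1_chunk');
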
Ruslan Fomkin
4e004c5564 Unify to use a constant in array declarations
Replaces a variable array length with a constant, which is commonly
used in the code. The change is guarded by an assertion.
2020-05-27 17:31:09 +02:00
Dmitry Simonenko
79f6223631 Replace UserMappings with a connection ID
This change replaces UserMappings with the newly introduced
TSConnectionId object, which represents a pair of a foreign server id
and a local user id.

Authentication has been moved to be non-password based, since the
original UserMappings were also used to store data node user
passwords. This is a temporary step until the introduction of
certificate-based authentication.

List of changes:

* add_data_node() password and bootstrap_password arguments removed

* introduced authentication using a pgpass file

* the RemoteTxn format string representing a transaction changed to
  tx-version-xid-server_id-user_id

* data_node_dispatch, remote transaction cache, and connection cache
  hash table keys switched to TSConnectionId instead of user mappings

* remote_connection_open() has been reworked to exclude user options

* tests upgraded; user mapping and password usage has been excluded
2020-05-27 17:31:09 +02:00
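
After this change, adding a data node no longer takes password
arguments; connections instead rely on a pgpass entry for the current
user. A sketch with hypothetical host and database names:

    -- No password/bootstrap_password arguments; auth comes from the pgpass file
    SELECT add_data_node('dn1', host => 'dn1.example.com', database => 'tsdb');
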
Erik Nordström
b07461ec00 Refactor and harden data node management
This change refactors and hardens parts of data node management
functionality.

* A number of permission checks have been added to data node
  management functions. This includes checking that the user has
  proper permissions for both table and server objects.
* Permissions checks are now done when creating remote chunks on data
  nodes.
* The add_data_node() API function has been simplified and now returns
  more intuitive status about created objects (foreign server,
  database, extension). It is no longer necessary to specify a user to
  connect with as this is always assumed to be the current user. The
  bootstrap user can still be specified explicitly, however, as that
  user might require elevated permissions on the remote node to
  bootstrap.
* Functions that capture exceptions without re-throwing, such as
  `ping_data_node()` and `get_user_mapping()`, have been refactored to
  not do this as the transaction state and memory contexts are not in
  states where it is safe to proceed as normal.
* Data node management functions now consistently check that any
  foreign servers operated on are actually TimescaleDB server objects.
* Tests now run with a superuser and a regular user specific to
  clustering. These users have password auth enabled in `pg_hba.conf`,
  which is required by the connection library when connecting as a
  non-superuser. Tests have been refactored to bootstrap data nodes
  using these user roles.
2020-05-27 17:31:09 +02:00
Brian Rowe
79fb46456f Rename server to data node
The timescale clustering code so far has been written referring to the
remote databases as 'servers'.  This terminology is a bit overloaded,
and in particular we don't enforce any network topology limitations
that the term 'server' would suggest.  In light of this we've decided
to change to use the term 'node' when referring to the different
databases in a distributed database.  Specifically we refer to the
frontend as an 'access node' and to the backends as 'data nodes',
though we may omit the access or data qualifier where it's unambiguous.

As the vast bulk of the code so far has been written for the case where
there was a single access node, almost all instances of 'server' were
references to data nodes.  This change has updated the code to rename
those instances.
2020-05-27 17:31:09 +02:00
Brian Rowe
dd3847a7e0 Rename files in preparation for large refactor
This change includes the only rename changes required by the renaming
of server to data node across the clustering codebase.  This change
is being committed separately from the bulk of the rename changes to
prevent git from losing the file history of renamed files (merging the
rename with extensive code modifications resulted in git treating some
of the file moves as a file delete and new file creation).
2020-05-27 17:31:09 +02:00
Dmitry Simonenko
a99ae05723 Cleanup PG version checks for multinode
Since distributed hypertables will only be supported on PG11 or greater,
ensure that we do not compile multinode-related files on previous
versions. Also raise appropriate errors when trying to invoke
multinode-related functionality on versions prior to PG11.
2020-05-27 17:31:09 +02:00
niksa
86858e36e9 Support multiple async requests per connection
The idea here is to allow multiple async requests to be created for
the same connection. Since a connection can process only one request
at a time, only one request can be running while the rest must be
deferred. A deferred async request is started when a response is
fetched, provided the connection is not in use by a running async
request.
This support should pave the way for async creation of cursors.
2020-05-27 17:31:09 +02:00
Erik Nordström
3ddbc386f0 Only support multinode on PG11 and greater
Multinode-related APIs now raise errors when called on any PostgreSQL
version below 11, as these versions do not have the required features
to support multinode or have different behavior.

Raising errors at runtime on affected APIs is preferred over excluding
these functions altogether. Having a different user-facing SQL API
would severely complicate the upgrade process for the extension.

A new CMake check has been added to disable multinode features on
unsupported PostgreSQL versions. It also generates a macro in
`config.h` that can be used in code to check for multinode support.
2020-05-27 17:31:09 +02:00
Erik Nordström
e2371558f7 Create chunks on remote servers
This change ensures that chunk replicas are created on remote (data
node) servers whenever a chunk is created in a local distributed
hypertable.

Remote chunks are created using the `create_chunk()` function, which
has been slightly refactored to allow specifying an explicit chunk
table name. The node making the remote call also records the resulting
remote chunk IDs in its `chunk_server` mappings table.

Since remote command invocation without superuser permissions
requires password authentication, the test configuration files have
been updated to require password authentication for a cluster test
user that is used in tests.
2020-05-27 17:31:09 +02:00
Erik Nordström
596be8cda1 Add mappings table for remote chunks
A frontend node will now maintain mappings from a local chunk to the
corresponding remote chunks in a `chunk_server` table.

The frontend creates local chunks as foreign tables and adds entries
to `chunk_server` for each chunk it creates on remote data nodes.

Currently, the creation of remote chunks is not implemented, so a
dummy chunk_id for the remote chunk will be added instead for testing
purposes.
2020-05-27 17:31:09 +02:00
Erik Nordström
ae587c9964 Add API function for explicit chunk creation
This adds an internal API function to create a chunk using explicit
constraints (dimension slices). A function to export a chunk in a
format consistent with the chunk creation function is also added.

The chunk export/create functions are needed for distributed
hypertables so that an access node can create chunks on data nodes
according to its own (global) partitioning configuration.
2020-05-27 17:31:09 +02:00
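
A sketch of the export/create pair, assuming internal-schema functions
and a JSON encoding of the dimension slices (names illustrative):

    -- Export a chunk's definition in a format create_chunk understands
    SELECT * FROM _timescaledb_internal.show_chunk(
        '_timescaledb_internal._hyper_1_1_chunk');

    -- Create a chunk with explicit dimension slices (e.g., on a data node)
    SELECT * FROM _timescaledb_internal.create_chunk(
        'conditions',
        '{"time": [1609459200000000, 1610064000000000]}');
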