Adds an internal API function to create an empty chunk table according
to the given hypertable for the given chunk table name and dimension
slices. This function creates a chunk table inheriting from the
hypertable, so it guarantees the same schema. No TimescaleDB
metadata is updated.
To be able to create the chunk table in a tablespace attached to the
hypertable, this commit allows calculating the tablespace id without
requiring the dimension slice to exist in the catalog.
If there is already a chunk that collides on dimension slices, the
function fails to create the chunk table.
The function will be used internally in multi-node to be able to
replicate a chunk from one data node to another.
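As a rough illustration, an invocation could look like the following; the
schema-qualified name, the argument order, and the JSON format of the
slices are assumptions, not the final interface:

```sql
-- Minimal sketch (function name, signature, and slices format assumed):
-- create an empty chunk table for a hypertable from explicit dimension
-- slices without updating TimescaleDB metadata.
SELECT _timescaledb_internal.create_chunk_table(
    'conditions',                                        -- hypertable
    '{"time": [1514764800000000, 1515369600000000]}',    -- dimension slices
    '_timescaledb_internal',                             -- chunk schema
    '_hyper_1_10_chunk'                                  -- chunk table name
);
```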
This change fixes the stats collecting code to also return the slot
collation fields for PG12. This fixes a bug (#2093) where running an
ANALYZE in PG12 would break queries on distributed tables.
The chunk_api test sometimes fails because of inconsistent result set
ordering in one of the queries. This patch adds the missing ORDER BY
clause to that query.
With replicated chunks, the function to import column stats would
experience errors when updating `pg_statistic`, since it tried to
write identical stats from several replica chunks.
This change fixes this issue by filtering duplicate stats rows
received from data nodes.
In the future, this could be improved by only requesting stats from
"primary" chunks on each data node, thus avoiding duplicates without
having to filter the result. However, this would complicate the
function interface as it would require sending a list of chunks
instead of just getting the stats for all chunks in a hypertable.
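Conceptually, the fix amounts to keeping a single stats row per key; a
plain-SQL analogue of that filtering (the staging relation and its columns
are purely hypothetical) would be:

```sql
-- Illustration only: keep one row per (chunk, column, inherited) key when
-- several replica chunks report identical statistics.
SELECT DISTINCT ON (chunk_id, attnum, inherited) *
FROM received_chunk_colstats   -- hypothetical staging of rows from data nodes
ORDER BY chunk_id, attnum, inherited;
```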
Tests that run `ANALYZE` on distributed hypertables are susceptible to
non-deterministic behavior because `ANALYZE` does random sampling and
might use a different seed depending on the data node. This change fixes
this issue by running `setseed()` on all data node sessions before
`ANALYZE` is run.
Unfortunately, while this makes the behavior on a specific machine
more deterministic, it doesn't produce the exact same statistics
across different machines and/or C libraries, since those might have
different PRNG implementations for `random()`.
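For instance, the seeding pattern applied on each data node session looks
roughly like this (the table name is illustrative):

```sql
-- Fix the seed of PostgreSQL's PRNG before sampling so that ANALYZE
-- produces repeatable statistics on a given machine.
SELECT setseed(0.5);
ANALYZE conditions;
```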
This change adds a new command to return a subset of the column
stats for a hypertable (column width, percent null, and percent
distinct). As part of the execution of this command on an access
node, these stats will be collected for distributed chunks and
updated on the access node.
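The exact interface is not spelled out here; as a sketch, assuming the
command is exposed as an internal function taking a hypertable (or a
chunk), fetching the stats might look like:

```sql
-- Hypothetical invocation (function name and signature assumed): collect
-- column width, null fraction, and distinct fraction for the chunks of a
-- distributed hypertable and update them on the access node.
SELECT * FROM _timescaledb_internal.get_chunk_colstats('conditions');
```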
A new function, `get_chunk_relstats()`, allows fetching relstats
(basically `pg_class.{relpages,reltuples}`) from remote chunks on data
nodes and writing them to the `pg_class` entry for the corresponding
local chunk. The function expects either a chunk or a hypertable as
input and returns the relstats for the given chunk or all chunks for
the given hypertable, respectively.
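For example, usage might look like the following (the schema-qualified
name and example chunk name are assumptions):

```sql
-- Sketch: import relpages/reltuples for all chunks of a hypertable,
-- updating the corresponding local pg_class entries.
SELECT * FROM _timescaledb_internal.get_chunk_relstats('conditions');

-- Or fetch the relstats for a single chunk.
SELECT * FROM _timescaledb_internal.get_chunk_relstats(
    '_timescaledb_internal._hyper_1_1_chunk');
```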
Importing relstats as described is useful as part of a distributed
ANALYZE/VACUUM that won't require fetching all data into the access
node for local sampling (like the current implementation does).
In a future change, this function will be called as part of a local
ANALYZE on the access node that runs ANALYZE on all data nodes
followed by importing of the resulting relstats for the analyzed
chunks.
This change refactors and hardens parts of data node management
functionality.
* A number of permission checks have been added to data node
management functions. This includes checking that the user has
proper permissions for both table and server objects.
* Permissions checks are now done when creating remote chunks on data
nodes.
* The add_data_node() API function has been simplified and now returns
more intuitive status about created objects (foreign server,
database, extension); see the sketch after this list. It is no longer
necessary to specify a user to connect with, as this is always assumed
to be the current user. The bootstrap user can still be specified
explicitly, however, as that user might require elevated permissions
on the remote node to bootstrap.
* Functions that capture exceptions without re-throwing, such as
`ping_data_node()` and `get_user_mapping()`, have been refactored to
no longer do this, since the transaction state and memory contexts are
not in a state where it is safe to proceed as normal.
* Data node management functions now consistently check that any
foreign servers operated on are actually TimescaleDB server objects.
* Tests now run with a superuser and a regular user specific to
clustering. These users have password authentication enabled in
`pg_hba.conf`, which is required by the connection library when
connecting as a non-superuser. Tests have been refactored to bootstrap
data nodes using these user roles.
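As a sketch of the simplified add_data_node() interface, a call might look
like the following; the parameter names, the optional bootstrap-user
argument, and the reported status columns are assumptions:

```sql
-- Hypothetical call: the connecting user is always the current user, while
-- a bootstrap user with elevated permissions on the remote node can still
-- be given explicitly.
SELECT * FROM add_data_node(
    'dn1',                            -- node name
    host => 'dn1.example.com',
    database => 'tsdb',
    bootstrap_user => 'postgres'      -- assumed optional parameter
);
-- Expected to report which objects were created on the remote node, e.g.
-- whether the foreign server, database, and extension had to be created.
```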
This change ensures that chunk replicas are created on remote
(data node) servers whenever a chunk is created in a local distributed
hypertable.
Remote chunks are created using the `create_chunk()` function, which
has been slightly refactored to allow specifying an explicit chunk
table name. The node making the remote call also records the resulting
remote chunk IDs in its `chunk_server` mappings table.
Since remote command invocation without super-user permissions
requires password authentication, the test configuration files have
been updated to require password authentication for a cluster test
user that is used in tests.
The license header for SQL test files has been updated, but some tests
haven't had this new header applied. This change makes sure the new
header is applied to all test files.
This adds an internal API function to create a chunk using explicit
constraints (dimension slices). A function to export a chunk in a
format consistent with the chunk creation function is also added.
The chunk export/create functions are needed for distributed
hypertables so that an access node can create chunks on data nodes
according to its own (global) partitioning configuration.
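As an illustration of how the pair might be used together (the export
function's name, the slices format, and the exact signatures are
assumptions):

```sql
-- On the access node: export a chunk's definition in a format that matches
-- the creation function's input (function name assumed).
SELECT * FROM _timescaledb_internal.show_chunk(
    '_timescaledb_internal._hyper_1_1_chunk');

-- On a data node: create the chunk from the exported constraints so it
-- follows the access node's global partitioning configuration.
SELECT * FROM _timescaledb_internal.create_chunk(
    'conditions',                                        -- hypertable
    '{"time": [1514764800000000, 1515369600000000]}',    -- dimension slices
    '_timescaledb_internal',                             -- chunk schema
    '_hyper_1_1_chunk'                                   -- chunk table name
);
```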