For a statement that only specifies the database, we expect the data
node to be created on the same Postgres instance as the one where the
statement is executed.
`SELECT * FROM add_data_node('data1', database => 'base1');`
However, if the port for the server is changed in the configuration
file to not use the default port, the command will try to connect to
the wrong Postgres server, namely the one listening on port 5432.
This commit fixes this by letting the `host` and `port` parameters be NULL
by default and using the following logic to decide which port should be
used:
- If a port is explicitly provided, use that.
- If a port is not provided but a host is provided, it is assumed that
the intention is to connect to a default-installed Postgres server on
a different address, so use the default Postgres port (5432).
- If neither port nor host is provided, it is assumed that the intention
is to connect to the same server as where the command is executed, so
use the port that was written in the configuration file.
The default host to use is still 'localhost', but it is not written
explicitly in the function definition in `ddl_api.sql`.
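For illustration, using the same call shape as above (the port and host
values are placeholders):
`SELECT * FROM add_data_node('data1', database => 'base1', port => 6543);` -- explicit port is used
`SELECT * FROM add_data_node('data2', database => 'base1', host => 'db.example.com');` -- default port 5432 is used
`SELECT * FROM add_data_node('data3', database => 'base1');` -- port from the local configuration file is used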
The commit also fixes one warning where an uninitialized variable could
be used.
This function allows users to execute a SQL query on a list of data
nodes. The purpose is to provide users a way to, e.g., create roles on
data nodes.
The current implementation is straightforward: it simply executes the
provided query on each data node in the list. The query executes with
the current user role. The function does not return or print any
result values. In case of error, it prints the data node name and
a related error message.
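A hypothetical call shape; this message does not name the function or
its exact signature, so `distributed_exec` and `node_list` here are
illustrative:
`SELECT distributed_exec('CREATE ROLE dist_user LOGIN', node_list => '{data_node_1, data_node_2}');`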
This change replaces UserMappings with the newly introduced TSConnectionId
object, which represents a pair of a foreign server id and a local user id.
Authentication has been moved to non-password-based methods, since the
original UserMappings were also used to store data node user
passwords. This is a temporary step until the introduction of
certificate-based authentication.
List of changes:
* add_data_node() password and bootstrap_password arguments removed
* introduced authentication using pgpass file
* RemoteTxn format string representing a tx changed to
tx-version-xid-server_id-user_id
* data_node_dispatch, remote transaction cache, and connection cache hash
table keys switched to TSConnectionId instead of user mappings
* remote_connection_open() has been reworked to exclude user options
* Tests updated; user mapping and password usage has been removed
All data node functions except `attach_data_node` take the node name as
the first parameter. This commit changes the order of the first two
parameters to `attach_data_node` so that the node name is the first
parameter and the hypertable is the second parameter.
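With the new order, a call looks as follows (node and table names are
illustrative):
`SELECT * FROM attach_data_node('node_1', 'my_hypertable');`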
Distributed hypertables are now repartitioned when attaching new data
nodes and the current number of partitions (slices) in the first closed
(space) dimension is less than the number of data nodes. Increasing
the number of partitions is necessary to make use of a newly attached
data node. However, repartitioning is optional and can be avoided via
a boolean parameter in `attach_server()`, as sketched below.
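A sketch of opting out; the parameter name `repartition` and the
argument order are hypothetical, as this message does not give them:
`SELECT * FROM attach_server('disttable', 'server_2', repartition => false);`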
In addition to the above repartitioning, this change also adds
informational messages to `create_hypertable` and
`set_number_partitions` to raise awareness of situations when the
number of partitions in the space dimensions is lower than the number
of attached data nodes.
This change refactors and hardens parts of data node management
functionality.
* A number of permissions checks have been added to data node
management functions. This includes checking that the user has
proper permissions for both table and server objects.
* Permissions checks are now done when creating remote chunks on data
nodes.
* The add_data_node() API function has been simplified and now returns
more intuitive status about created objects (foreign server,
database, extension). It is no longer necessary to specify a user to
connect with as this is always assumed to be the current user. The
bootstrap user can still be specified explicitly, however, as that
user might require elevated permissions on the remote node to
bootstrap.
* Functions that capture exceptions without re-throwing, such as
`ping_data_node()` and `get_user_mapping()`, have been refactored to
not do this as the transaction state and memory contexts are not in
states where it is safe to proceed as normal.
* Data node management functions now consistently check that any
foreign servers operated on are actually TimescaleDB server objects.
* Tests now run with a superuser and a regular user specific to
clustering. These users have password auth enabled in `pg_hba.conf`,
which is required by the connection library when connecting as a
non-superuser. Tests have been refactored to bootstrap data nodes
using these user roles.
The timescale clustering code so far has been written referring to the
remote databases as 'servers'. This terminology is a bit overloaded,
and in particular we don't enforce any network topology limitations
that the term 'server' would suggest. In light of this we've decided
to change to use the term 'node' when referring to the different
databases in a distributed database. Specifically we refer to the
frontend as an 'access node' and to the backends as 'data nodes',
though we may omit the access or data qualifier where it's unambiguous.
As the vast bulk of the code so far has been written for the case where
there was a single access node, almost all instances of 'server' were
references to data nodes. This change has updated the code to rename
those instances.
Prevent server delete if the server contains data, unless the user
specifies `force => true`. In case the server is the only data
replica, we don't allow delete/detach unless tables/chunks are dropped.
The idea is to have the same semantics for delete as for detach, since
delete actually calls detach.
We also try to update pg_foreign_table when we delete a server if there
is another server containing the same chunk.
An internal function is added to enable updating a foreign table's
server, which might be useful in some cases since the foreign table's
server is considered the default server for that particular chunk.
Since this command needs to work even if the server we're trying to
remove is unresponsive, we're not removing any data on the remote
data node.
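For example, forcing deletion of a server that still holds data (the
server name is illustrative):
`SELECT * FROM delete_server('server_1', force => true);`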
This functionality enables users to block or allow creation of new
chunks on a data node for one or more hypertables. Use cases for this
include the ability to block new chunks when a data node is running
low on disk space or to affect chunk distribution across data nodes.
Sometimes blocking data nodes for new chunks can make a hypertable
under-replicated. For that case an additional argument `force => true`
can be supplied to force blocking new chunks.
Here are some examples.
Block for a specific hypertable:
`SELECT * FROM block_new_chunks_on_server('server_1', 'disttable');`
Block for all hypertables on the server:
`SELECT * FROM block_new_chunks_on_server('server_1', force => true);`
Unblock:
`SELECT * FROM allow_new_chunks_on_server('server_1', true);`
This change adds the `force` argument to `detach_server` as well. If
detaching or blocking new chunks would make a hypertable
under-replicated, then `force => true` needs to be used.
A server can now be detached from one or more distributed hypertables
so that it is no longer in use. We only allow detaching a server if there
is no data on the server and detaching it doesn't risk making a
hypertable under-replicated.
A user can detach a server for a specific hypertable, or for all
hypertables to which the server is attached.
`SELECT * FROM detach_server('server1', 'my_hypertable');`
`SELECT * FROM detach_server('server2');`
This patch adds functionality for automatic database and extension
creation on the remote server. New function arguments:
`bootstrap_database`, `bootstrap_user`, and `bootstrap_password`.
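A sketch of how these arguments might be used, presumably with
`add_server()` (all values are placeholders):
`SELECT * FROM add_server('server_1', bootstrap_database => 'postgres', bootstrap_user => 'postgres', bootstrap_password => 'secret');`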
This change ensures that chunk replicas are created on remote
(datanode) servers whenever a chunk is created in a local distributed
hypertable.
Remote chunks are created using the `create_chunk()` function, which
has been slightly refactored to allow specifying an explicit chunk
table name. The node making the remote call also records the resulting
remote chunk IDs in its `chunk_server` mappings table.
Since remote command invocation without super-user permissions
requires password authentication, the test configuration files have
been updated to require password authentication for a cluster test
user that is used in tests.
Establishing a remote connection requires a password, unless the
connection is made as a superuser. Therefore, this change adds the
option to specify a password in the `add_server()` command. This is a
required parameter unless called as a superuser.
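For example (the password value is a placeholder):
`SELECT * FROM add_server('server_1', password => 'secret');`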
In a multi-node (clustering) setup, TimescaleDB needs to track which
remote servers have data for a particular distributed hypertable. It
also needs to know which servers to place new chunks on and to use in
queries against a distributed hypertable.
A new metadata table, `hypertable_server`, is added to map a local
hypertable ID to a hypertable ID on a remote server. We require that
the remote hypertable has the same schema and name as the local
hypertable.
When a local server is removed (using `DROP SERVER` or our
`delete_server()`), all remote hypertable mappings for that server
should also be removed.
Servers for a scale-out clustering setup can now be added and deleted
with `add_server()` and `delete_server()`, providing a convenience API
for server management.
While similar functionality can be achieved using the standard
PostgreSQL `CREATE SERVER` and `CREATE USER MAPPING` commands, this
new API makes it easier to add clustering servers and user mappings
consistent with the needs of TimescaleDB's particular clustering setup.
The API currently works with the `postgres_fdw` foreign data
wrapper. It will be updated to use our own foreign data wrapper once
it is available.
Previously, drop_chunks returned an empty table, giving the user
no indication of what (if anything) had happened.
Now, drop_chunks returns a list of the chunk identifiers in the
same style as show_chunks, with the chunk's schema and table name.
Notably, when show_chunks is called directly before drop_chunks, the
output should be the same.
For hypertables that have continuous aggregates, calling drop_chunks now
drops all of the rows in the materialization table that were based on
the dropped chunks. Since we don't know what the correct default
behavior for drop_chunks is, we've added a new argument,
cascade_to_materializations, which must be set to true in order to call
drop_chunks on a hypertable which has a continuous aggregate.
drop_chunks is blocked on the materialization tables of continuous
aggregates.
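A sketch of the new argument (the hypertable name and interval are
illustrative):
`SELECT drop_chunks(interval '4 weeks', 'conditions', cascade_to_materializations => true);`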
TimescaleDB has always supported functions on closed (space)
dimension, i.e., for hash partitioning. However, functions have not
been supported on open (time) dimensions, instead requiring columns to
have a supported time type (e.g., integer or timestamp). This restricts
the tables that can be time partitioned. Tables with custom "time"
types, which can be transformed by a function expression into a
supported time type, are not supported.
This change generalizes partitioning so that both open and closed
dimensions can have an associated partitioning function that
calculates a dimensional value. Fortunately, since we already support
functions on closed dimensions, the changes necessary to support this
on any dimension are minimal. Thus, open dimensions now support an
(optional) partitioning function that transforms the input type to a
supported time type (e.g., integer or timestamp type). Any indexes on
such dimensional columns become expression indexes.
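For illustration, assuming a hypothetical `time_partitioning_func`
argument to `create_hypertable()` (this message does not name the
argument) and a user-defined function `extract_time` that converts the
column to a supported time type:
`SELECT create_hypertable('events', 'payload', time_partitioning_func => 'extract_time');`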
Tests have been added for chunk expansion and the hashagg and sort
transform optimizations on tables that are using a time partitioning
function.
Currently, not all of these optimizations are well supported, but this
could potentially be fixed in the future.
Remove the existing PLPGSQL function that implements drop_chunks, replacing it with a direct call to the C function, which also implements the old PLPGSQL checks in C. Refactor out much of the code shared between the C implementations of show_chunks and drop_chunks.
Timescale provides an efficient and easy-to-use API to drop individual
chunks from a TimescaleDB database through drop_chunks. This PR builds on
that functionality and, through a new show_chunks function, gives the
opportunity to see the chunks that would be dropped if drop_chunks were run.
Additionally, it adds a newer_than option to drop_chunks (also supported
by show_chunks) that allows seeing/dropping chunks in an interval or newer
than a point in time.
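For example, assuming a call shape like the following (the hypertable
name is illustrative):
`SELECT show_chunks('conditions', newer_than => INTERVAL '2 days');`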
This commit includes:
- Implementation of show_chunks in C
- Additional helper functions to work with chunks
- New version of drop_chunks in SQL that uses show_chunks. This
also adds a newer_than option to drop_chunks
- More extensive tests of drop_chunks and new tests for show_chunks
Among other reasons, show_chunks was implemented in C in order
to be able to have both older_than and newer_than arguments be null. This
was not possible in SQL because the arguments had to have polymorphic
types, and whether they are used in the function body or not, PL/pgSQL
requires these arguments to typecheck.
Add a bool `created` to the return value of create_hypertable and
add_dimension. When if_not_exists is true and creation is skipped because
the object already exists, `created` will be false; otherwise it will be
true. This modifies the functions to return metadata even when no object
was created.
Change the return value of add_dimension to return a record consisting
of dimension_id, schema_name, table_name, column_name. This improves
user feedback about success of the operation but also gives the function
an API returning useful information for non-human consumption.
Change create_hypertable to return a record consisting of
(hypertable_id, schema_name, table_name). This improves user feedback
about success of the operation but also gives the function an API
returning useful information for non-human consumption.
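For example (table and column names are illustrative):
`SELECT * FROM create_hypertable('conditions', 'time');`
now returns a record with `hypertable_id`, `schema_name`, `table_name`,
and the `created` flag described above.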
We've decided to adopt the ts_ prefix on all exported C functions in
order to avoid having symbol conflicts with future postgres functions.
We've already started using this prefix on new functions, and this commit
adds the prefix to the old functions.
Adaptive chunking uses the min and max value of previous chunks
to estimate their "fill factor". Ideally, min and max should be
retrieved using an index, but if no index exists we fall back
to a heap scan. A heap scan can be very expensive, so we now
raise a WARNING if no index exists.
This change also renames set_adaptive_chunk_sizing() to simply
set_adaptive_chunking().
Users can now (optionally) set a target chunk size and TimescaleDB
will try to adapt the interval length of the first open ("time")
dimension in order to reach that target chunk size. If a hypertable
has more than one open dimension, only the first one will have a
dynamically adapting interval.
Users can optionally specify their own function that calculates the
new dimension interval. They can also set a target size of 0 in order
to estimate a suitable target size for a chunk based on available
memory.
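For example (the target size is illustrative):
`SELECT * FROM set_adaptive_chunking('conditions', '100MB');`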
Tables can now hold existing data, which is optionally migrated from
the main table to chunks when create_hypertable() is called.
The data migration is similar to the COPY path, with the single
difference that the inserted/copied tuples come from an existing table
instead of being read from a file. After the data has been migrated,
the main table is truncated.
One potential downside of this approach is that all of this happens in
a single transaction, which means that the table is blocked while
migration is ongoing, preventing inserts by other transactions.
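A sketch of the migration path, assuming a hypothetical `migrate_data`
flag (this message does not name the option):
`SELECT create_hypertable('conditions', 'time', migrate_data => true);`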
This change improves the handling of tablespaces as follows:
- Add if_not_attached / if_attached options to attach_tablespace() and
detach_tablespace(), respectively (see the example after this list)
- Block DROP tablespace if it is still attached to a table
- Block REVOKE if it means the table owner no longer has CREATE
permissions on an attached tablespace
- Make error messages follow the PostgreSQL style guide
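For example (tablespace and table names are illustrative):
`SELECT attach_tablespace('disk2', 'conditions', if_not_attached => true);`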
This is a continuation of prior efforts to refactor API functions in C
to:
- improve usage of proper error codes
- use error messages that better conform with the PostgreSQL standard.
- improve security by avoiding running large amounts of code under SECURITY DEFINER
- move towards doing all metadata updates using a consistent catalog API
Most importantly, `create_hypertable()` has been refactored in C,
which simplifies a lot of code that previously required
upcalls/downcalls between C code and plpgsql code, or duplicated
functionality between the two environments.
The functions for adding and updating dimensions have been refactored
in C to:
- improve usage of proper error codes
- use error messages that better conform with the PostgreSQL standard
- improve security by avoiding running large amounts of code under SECURITY DEFINER
A new if_not_exists option has also been added to add_dimension(), and
the number of partitions can now be set using the new
set_number_partitions() function.
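For example (table and column names are illustrative):
`SELECT add_dimension('conditions', 'device', number_partitions => 2, if_not_exists => true);`
`SELECT set_number_partitions('conditions', 4);`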
A bug in the validation of smallint time intervals has been fixed. The
previous code didn't check for intervals > 0 and smallint intervals
accepted values up to UINT16_MAX instead of INT16_MAX.
A hypertable's associated schema is used to create and store internal
data tables (chunks). A hypertable creates tables in that schema,
typically with full superuser permissions, regardless of whether the
hypertable's owner or the current user have permissions for the schema.
If the schema doesn't exist, the hypertable will create it when
creating the first chunk, even though the user or table owner does
not have permissions to create schemas in the database.
This change adds proper permissions checks to create_hypertable() so
that users cannot create hypertables with a custom associated schema
unless they have the proper permissions on the schema or the database.
Chunks are also no longer created with internal schema permissions if
the associated schema is something different from the internal schema.
Compatibility with pg_upgrade required 2 changes:
1) search_path on functions cannot be blank for pg_upgrade.
2) The timescaledb.restoring GUC had to apply to more code (now moved to
higher-level check)
`pg_upgrade` must be passed the following option: `-O "-c timescaledb.restoring='on'"`
Tablespaces can now be detached from hypertables using
`tablespace_detach()`. This function can either detach
a tablespace from all tables or only a specific table.
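For example, following the naming in this message (tablespace and table
names are illustrative):
`SELECT tablespace_detach('disk1', 'conditions');` -- detach from a specific table
`SELECT tablespace_detach('disk1');` -- detach from all tables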
Having the ability to detach tablespaces allows more
advanced storage management; for instance, one can detach
tablespaces that are running low on disk space while attaching
new ones to replace the old ones.
Attaching tablespaces to hypertables is now handled
in native code, with improved permissions checking and
caching of tablespaces in the Hypertable data object.
The user should be able to add time dimensions using INTERVAL when
the column type is TIMESTAMP/TIMESTAMPTZ/DATE, so this change adds
that support.
It also adds further tests and checks for
add_dimension, e.g., a nice error when the table is not a
hypertable.
For convenience, the user should be able to specify the new
chunk time intervals using INTERVAL datatype if the hypertable is
using a TIMESTAMP/TIMESTAMPTZ/DATE datatype for its time column.
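For example, presumably via `set_chunk_time_interval()` (this message
does not name the function; the table name is illustrative):
`SELECT set_chunk_time_interval('conditions', INTERVAL '1 day');`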
This PR fixes the handling of drop_chunks when the hypertable's
time field is a TIMESTAMP or DATE field. Previously, such
hypertables needed drop_chunks to be given a timestamptz in UTC.
Now, drop_chunks can take a DATE or TIMESTAMP. Also, the INTERVAL
version of drop_chunks correctly handles these cases.
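For example (names are illustrative):
`SELECT drop_chunks(DATE '2017-01-01', 'conditions');`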
A consequence of this change is that drop_chunks cannot be called
on multiple tables (with table_name = NULL or schema_name = NULL)
if the tables have different time column types.
This change reduces the usage of SECURITY DEFINER on SQL
functions and fixes related permissions issues. It also
properly checks hypertable permissions relative to the current_user
instead of the session_user, which otherwise breaks SET ROLE,
among other things.