72 Commits

Author SHA1 Message Date
Mats Kindahl
77776faf20 Fix port usage for add_data_node()
For a statement which only specify the database, we expect the data
node to be created on the same Postgres instance as the one where the
statement is executed.

    SELECT * FROM add_data_node('data1', database => 'base1');

However, if the port for the server is changed in the configuration
file to not use the default port, the command will try to connect to
the wrong Postgres server, namly the one listening on port 5432.

This commit fixes this by letting `host` and `port` parameters be NULL
by default and use the following logic to decide what port should be
used.

- If a port is explicitly provided, use that.

- If a port is not provided but a host is provided, it is assumed that
  the intention is to connect to a default-installed Postgres server on
  a different address, so use the default Postgres port (5432).

- If neither port nor host is provided, it assumed that the intention
  is to connect to the same server as where the command is executed, so
  use the port that was written in the configuration file.

The default host to use is still 'localhost', but it is not written
explicitly in the function definition in `ddl_api.sql`.

The commit also fixes one warning where an uninitialized variable could
be used.
2020-05-27 17:31:09 +02:00
Dmitry Simonenko
c8563b2d46 Add distributed_exec() function
This function allows users to execute a SQL query on a list of data
nodes. The purpose is to provide users a way to, e.g., create roles on
data nodes.

The current implementation is quite straightforward. Just execute any
provided query on a list of data nodes. The query will execute with
the current user role. The function does not return or print any
result values. In case of error, it will print the data node name and
a related error message.
2020-05-27 17:31:09 +02:00
Dmitry Simonenko
79f6223631 Replace UserMappings with a connection ID
This change replace UserMappings with newly introduced TSConnectionId
object, which represent a pair of foreign server id and local user id.

Authentication has been moved to non-password based, since original
UserMappings were used to store a data node user passwords as
well. This is a temporary step, until introduction of certificate
based authentication.

List of changes:

* add_data_node() password and bootstrap_password arguments removed

* introduced authentication using pgpass file

* RemoteTxn format string which represents tx changed to
  tx-version-xid-server_id-user_id

* data_node_dispatch, remote transaction cache, connection cache hash
  tables keys switched to TSConnectionId instead of user mappings

* remote_connection_open() been rework to exclude user options

* Tests upgraded, user mappings and passwords usage has been excluded
2020-05-27 17:31:09 +02:00
Mats Kindahl
ac3f0bcb92 Change order of parameters in attach_data_node
All data node functions except `attach_data_node` take the node name as
the first parameter. This commit changes the order of the two first
parameters to `attach_data_node` so that the node name is the first
parameter and the hypertable is the second parameter.
2020-05-27 17:31:09 +02:00
Erik Nordström
5309cd6c5f Repartition hypertables when attaching data node
Distributed hypertables are now repartitioned when attaching new data
nodes and the current number of partition (slices) in the first closed
(space) dimension is less than the number of data nodes. Increasing
the number of partitions is necessary to make use of a newly attached
data node. However, repartitioning is optional and can be avoided via
a boolean parameter in `attach_server()`.

In addition to the above repartitioning, this change also adds
informational messages to `create_hypertable` and
`set_number_partitions` to raise awareness of situations when the
number of partitions in the space dimensions is lower than the number
of attached data nodes.
2020-05-27 17:31:09 +02:00
Erik Nordström
b07461ec00 Refactor and harden data node management
This change refactors and hardens parts of data node management
functionality.

* A number of of permissions checks have been added to data node
  management functions. This includes checking that the user has
  proper permissions for both table and server objects.
* Permissions checks are now done when creating remote chunks on data
  nodes.
* The add_data_node() API function has been simplified and now returns
  more intuitive status about created objects (foreign server,
  database, extension). It is no longer necessary to specify a user to
  connect with as this is always assumed to be the current user. The
  bootstrap user can still be specified explicitly, however, as that
  user might require elevated permissions on the remote node to
  bootstrap.
* Functions that capture exceptions without re-throwing, such as
  `ping_data_node()` and `get_user_mapping()`, have been refactored to
  not do this as the transaction state and memory contexts are not in
  states where it is safe to proceed as normal.
* Data node management functions now consistently check that any
  foreign servers operated on are actually TimescaleDB server objects.
* Tests now run with a superuser a regular user specific to
  clustering. These users have password auth enabled in `pg_hba.conf`,
  which is required by the connection library when connecting as a
  non-superuser. Tests have been refactored to bootstrap data nodes
  using these user roles.
2020-05-27 17:31:09 +02:00
Brian Rowe
79fb46456f Rename server to data node
The timescale clustering code so far has been written referring to the
remote databases as 'servers'.  This terminology is a bit overloaded,
and in particular we don't enforce any network topology limitations
that the term 'server' would suggest.  In light of this we've decided
to change to use the term 'node' when referring to the different
databases in a distributed database.  Specifically we refer to the
frontend as an 'access node' and to the backends as 'data nodes',
though we may omit the access or data qualifier where it's unambiguous.

As the vast bulk of the code so far has been written for the case where
there was a single access node, almost all instances of 'server' were
references to data nodes.  This change has updated the code to rename
those instances.
2020-05-27 17:31:09 +02:00
niksa
0da34e840e Fix server detach/delete corner cases
Prevent server delete if the server contains data, unless user
specifies `force => true`. In case the server is the only data
replica, we don't allow delete/detach unless table/chunks are dropped.
The idea is to have the same semantics for delete as for detach since
delete actually calls detach

We also try to update pg_foreign_table when we delete server if there
is another server containing the same chunk.

An internal function is added to enable updating foreign table server
which might be useful in some cases since foreign table server is
considered a default server for that particular chunk.

Since this command needs to work even if the server we're trying to
remove is non responsive, we're not removing any data on the remote
data node.
2020-05-27 17:31:09 +02:00
niksa
2fd99c6f4b Block new chunks on data nodes
This functionality enables users to block or allow creation of new
chunks on a data node for one or more hypertables. Use cases for this
include the ability to block new chunks when a data node is running
low on disk space or to affect chunk distribution across data nodes.

Sometimes blocking data nodes for new chunks can make a hypertable
under-replicated. For that case an additional argument `force => true`
can be supplied to force blocking new chunks.

Here are some examples.

Block for a specific hypertable:
`SELECT * FROM block_new_chunks_on_server('server_1', 'disttable');`

Block for all hypertables on the server:
`SELECT * FROM block_new_chunks_on_server('server_1', force =>true);`

Unblock:
`SELECT * FROM allow_new_chunks_on_server('server_1', true);`

This change adds the `force` argument to `detach_server` as well.  If
detaching or blocking new chunks will make a hypertable
under-replicated then `force => true` needs to used.
2020-05-27 17:31:09 +02:00
niksa
d8d13d9475 Allow detaching servers from hypertables
A server can now be detached from one or more distributed hypertables
so that it no longer in use. We only allow detaching a server if there
is no data on the server and detaching it doesn't risk making a
hypertable under-replicated.

A user can detach a server for a specific hypertable, or for all
hypertables to which the server is attached.

`SELECT * FROM detach_server('server1', 'my_hypertable');`
`SELECT * FROM detach_server('server2');`
2020-05-27 17:31:09 +02:00
Brian Rowe
59e3d7f1bd Add create_distributed_hypertable command
This change adds a variant of the create_hypertable command that will
ensure the created table is distributed.
2020-05-27 17:31:09 +02:00
Brian Rowe
b1c6172d0a Add attach_server function
This adds an attach_server function which is used to associate a
server with an existing hypertable.
2020-05-27 17:31:09 +02:00
Dmitry Simonenko
d8982c3e15 Add add_server() support for remote server bootstrapping
This patch adds functionality for automatic database and extension
creation on remote server. New function arguments: bootstrap_database, bootstrap_user
and bootstrap_password.
2020-05-27 17:31:09 +02:00
Erik Nordström
e2371558f7 Create chunks on remote servers
This change ensures that chunk replicas are created on remote
(datanode) servers whenever a chunk is created in a local distributed
hypertable.

Remote chunks are created using the `create_chunk()` function, which
has been slightly refactored to allow specifying an explicit chunk
table name. The one making the remote call also records the resulting
remote chunk IDs in its `chunk_server` mappings table.

Since remote command invokation without super-user permissions
requires password authentication, the test configuration files have
been updated to require password authentication for a cluster test
user that is used in tests.
2020-05-27 17:31:09 +02:00
Erik Nordström
125f793307 Add password parameter to add_server()
Establishing a remote connection requires a password, unless the
connection is made as a superuser. Therefore, this change adds the
option to specify a password in the `add_server()` command.  This is a
required parameter unless called as a superuser.
2020-05-27 17:31:09 +02:00
Erik Nordström
ece582d458 Add mappings table for remote hypertables
In a multi-node (clustering) setup, TimescaleDB needs to track which
remote servers have data for a particular distributed hypertable. It
also needs to know which servers to place new chunks on and to use in
queries against a distributed hypertable.

A new metadata table, `hypertable_server` is added to map a local
hypertable ID to a hypertable ID on a remote server. We require that
the remote hypertable has the same schema and name as the local
hypertable.

When a local server is removed (using `DROP SERVER` or our
`delete_server()`), all remote hypertable mappings for that server
should also be removed.
2020-05-27 17:31:09 +02:00
Erik Nordström
eca7cc337a Add server management API and functionality
Servers for a scale-out clustering setup can now be added and deleted
with `add_server()` and `delete_server()`, providing a convenience API
for server management.

While similar functionality can be achieved using the standard
PostgreSQL `CREATE SERVER` and `CREATE USER MAPPING` commands, this
new API makes it easier to add clustering servers and user mappings
consistent with the needs of TimescaleDBs particular clustering setup.

The API currently works with the `postgres_fdw` foreign data
wrapper. It will be updated to use our own foreign data wrapper once
it is available.
2020-05-27 17:31:09 +02:00
Sven Klemm
a11910b5d5 Mark drop_chunks as VOLATILE and PARALLEL UNSAFE
The drop_chunks function was incorrectly marked as stable and
parallel safe this patch fixes the attributes.
2019-07-22 07:29:33 +02:00
Stephen Polcyn
1dc1850793 Drop_chunks returns list of dropped chunks
Previously, drop_chunks returned an empty table, giving the user
no indication of what (if anything) had happened.
Now, drop_chunks returns a list of the chunks identifiers in the
same style as show_chunks, with the chunk's schema and table name.

Notably, when show_chunks is called directly before drop_chunks, the
output should be the same.
2019-07-19 12:13:24 -04:00
Joshua Lockerman
45fb1fc2c8 Handle drop_chunks on tables that have cont aggs
For hypetables that have continuous aggregates, calling drop_chunks now
drops all of the rows in the materialization table that were based on
the dropped chunks. Since we don't know what the correct default
behavior for drop_chunks is, we've added a new argument,
cascade_to_materializations, which must be set to true in order to call
drop_chunks on a hypertable which has a continuous aggregate.
drop_chunks is blocked on the materialization tables of continuous
aggregates
2019-04-26 13:08:00 -04:00
Sven Klemm
f89fd07c5b Remove year from SQL file license text
This changes the license text for SQL files to be identical
with the license text for C files.
2019-01-13 23:30:22 +01:00
Joshua Lockerman
47b5b7d553 Log which chunks are dropped by background workers
We don't want to do this silently, so that users are
able to debug where their chunks went.
2019-01-10 13:53:38 -05:00
Erik Nordström
e4a4f8e2f8 Add support for functions on open (time) dimensions
TimescaleDB has always supported functions on closed (space)
dimension, i.e., for hash partitioning. However, functions have not
been supported on open (time) dimensions, instead requiring columns to
have a supported time type (e.g, integer or timestamp). This restricts
the tables that can be time partitioned. Tables with custom "time"
types, which can be transformed by a function expression into a
supported time type, are not supported.

This change generalizes partitioning so that both open and closed
dimensions can have an associated partitioning function that
calculates a dimensional value. Fortunately, since we already support
functions on closed dimensions, the changes necessary to support this
on any dimension are minimal. Thus, open dimensions now support an
(optional) partitioning function that transforms the input type to a
supported time type (e.g., integer or timestamp type). Any indexes on
such dimensional columns become expression indexes.

Tests have been added for chunk expansion and the hashagg and sort
transform optimizations on tables that are using a time partitioning
function.

Currently, not all of these optimizations are well supported, but this
could potentially be fixed in the future.
2018-12-12 10:14:31 +01:00
Amy Tai
83014ee2b0 Implement drop_chunks in C
Remove the existing PLPGSQL function that implements drop_chunks, replacing it with a direct call to the C function, which also implements the old PLPGSQL checks in C. Refactor out much of the code shared between the C implementations of show_chunks and drop_chunks.
2018-12-06 13:27:12 -05:00
Narek Galstyan
9a3402809f Implement show_chunks in C and have drop_chunks use it
Timescale provides an efficient and easy to use api to drop individual
chunks from timescale database through drop_chunks. This PR builds on
that functionality and through a new show_chunks function gives the
opportunity to see the chunks that would be dropped if drop_chunks was run.
Additionally, it adds a newer_than option to drop_chunks (also supported
by show_chunks) that allows to see/drop chunks in an interval or newer
than a point in time.

This commit includes:
    - Implementation of show_chunks in C
    - Additional helper functions to work with chunks
    - New version of drop_chunks in sql that uses show_chunks. This
      	  also adds a newer_than option to drop_chunks
    - More enhanced tests of drop_chunks and new tests for show_chunks

Among other reasons, show_chunks was implemented in C in order
to be able to have both older_than and newer_than arguments be null. This
was not possible in SQL because the arguments had to have polymorphic types
and whether they are used in function body or not, PL/pgSQL requires these
arguments to typecheck.
2018-11-28 13:46:07 -05:00
Joshua Lockerman
e06733acf0 Fix casing in SQL license header to be consistent with elsewhere 2018-11-15 15:18:58 -05:00
Joshua Lockerman
20ec6914c0 Add license headers to SQL files and test code 2018-10-29 13:28:19 -04:00
Sven Klemm
3e3bb0c796 Add bool created to create_hypertable and add_dimension return value
Add bool created to return value of create_hypertable and add_dimension.
When if_not_exists is true and creation is skipped because the object
already exists created will be false, otherwise it will be true. This
modifies the functions to return meta data even when no object was
created.
2018-10-17 17:12:53 +02:00
Sven Klemm
d9b2dfed6b Change return value of add_dimension to TABLE
Change the return value of add_dimension to return a record consisting
of dimension_id, schema_name, table_name, column_name. This improves
user feedback about success of the operation but also gives the function
an API returning useful information for non-human consumption.
2018-10-15 17:53:00 +02:00
Sven Klemm
a83e2838c9 Change return value of create_hypertable to TABLE
Change create_hypertable to return a record consisting of
(hypertable_id, schema_name, table_name). This improves user feedback
about success of the operation but also gives the function an API
returning useful information for non-human consumption.
2018-10-12 17:40:10 +02:00
Joshua Lockerman
974788516a Prefix public C functions with ts_
We've decided to adopt the ts_ prefix on all exported C functions in
order to avoid having symbol conflicts with future postgres functions.
We've already started using this prefix on new functions and this commit
adds the prefix to to the old functions.
2018-09-27 11:45:04 -04:00
Erik Nordström
2e7b32cd91 Add WARNING when doing min-max heap scan for adaptive chunking
Adaptive chunking uses the min and max value of previous chunks
to estimate their "fill factor". Ideally, min and max should be
retreived using an index, but if no index exists we fall back
to a heap scan. A heap scan can be very expensive, so we now
raise a WARNING if no index exists.

This change also renames set_adaptive_chunk_sizing() to simply
set_adaptive_chunking().
2018-08-08 17:01:31 +02:00
Erik Nordström
9c9cdca6d3 Add support for adaptive chunk sizing
Users can now (optionally) set a target chunk size and TimescaleDB
will try to adapt the interval length of the first open ("time")
dimension in order to reach that target chunk size. If a hypertable
has more than one open dimension, only the first one will have a
dynamically adapting interval.

Users can optionally specify their own function that calculates the
new dimension interval. They can also set a target size of 0 in order
to estimate a suitable target size for a chunk based on available
memory.
2018-08-08 17:01:31 +02:00
Mike Futerko
4f2f1a6eb7 Update the error messages to conform with the style guide; Fix tests
An attempt to unify the error messages to conform with the PostgreSQL error
messages style guide. See the link below:
https://www.postgresql.org/docs/current/static/error-style-guide.html
2018-07-10 12:55:02 -04:00
Narek Galstyan
4b4211fe94 Fix some external functions when setting a custom schema
Make sure internal references to timescale functions use the correct schema to refer to these functions. Fixes #554
2018-06-19 10:15:19 -04:00
Erik Nordström
ef744916a8 Migrate table data when creating a hypertable
Tables can now hold existing data, which is optionally migrated from
the main table to chunks when create_hypertable() is called.

The data migration is similar to the COPY path, with the single
difference that the inserted/copied tuples come from an existing table
instead of being read from a file. After the data has been migrated,
the main table is truncated.

One potential downside of this approach is that all of this happens in
a single transaction, which means that the table is blocked while
migration is ongoing, preventing inserts by other transactions.
2018-02-15 23:30:54 +01:00
Erik Nordström
d6baccb9d7 Improve tablespace handling, including blocks for DROP and REVOKE
This change improves the handling of tablespaces as follows:

- Add if_not_attached / if_attached options to attach_tablespace() and
  detach_tablespace(), respectively
- Block DROP tablespace if it is still attached to a table
- Block REVOKE if it means the table owner no longer has CREATE
  permissions on an attached tablespace
- Make error messages follow the PostgreSQL style guide
2018-02-05 23:16:20 +01:00
Erik Nordström
6e011d12fb Refactor hypertable-related API functions
This is a continuation of prior efforts to refactor API functions in C
to:

- improve usage of proper error codes
- use error messages that better conform with the PostgreSQL standard.
- improve security by avoiding that lots of code run under SECURITY DEFINER
- move towards doing all metadata updates using a consistent catalog API

Most importantly, `create_hypertable()` has been refactored in C,
which simplifies a lot of code that previously required
upcalls/downcalls between C code and plpgsql code, or duplicated
functionality between the two environments.
2018-01-26 18:42:20 +01:00
Erik Nordström
71962b86ec Refactor dimension-related API functions
The functions for adding and updating dimensions have been refactored
in C to:

- improve usage of proper error codes
- make messages that better conform with the PostgreSQL standard.
- improve security by avoiding that lots of code run under SECURITY DEFINER

A new if_not_exists option has also been added to add_dimension() and
a the number of partitions can now be set using the new
set_number_partitions() function.

A bug in the validation of smallint time intervals has been fixed. The
previous code didn't check for intervals > 0 and smallint intervals
accepted values up to UINT16_MAX instead of INT16_MAX.
2018-01-25 19:02:34 +01:00
Erik Nordström
4df8f287a6 Add proper permissions handling for associated (chunk) schemas
A hypertable's associated schema is used to create and store internal
data tables (chunks). A hypertable creates tables in that schema,
typically with full superuser permissions, regardless of whether the
hypertable's owner or the current user have permissions for the schema.
If the schema doesn't exist, the hypertable will create it when
creating the first chunk, even though the user or table owner does
not have permissions to create schemas in the database.

This change adds proper permissions checks to create_hypertable() so
that users cannot create hypertables with a custom associated schema
unless they have the proper permissions on the schema or the database.

Chunks are also no longer created with internal schema permissions if
the associated schema is something different from the internal schema.
2017-12-28 11:24:29 +01:00
Matvey Arye
2fe447ba14 Make TimescaleDB work with pg_upgrade
Compatibility with pg_upgrade required 2 changes:
1) search_path on functions cannot be blank for pg_upgrade.
2) The timescaledb.restoring GUC had to apply to more code (now moved to
   higher-level check)

`pg_upgrade` must be passed the following option: `-O "-c timescaledb.restoring='on'"`
2017-12-19 11:47:49 -05:00
Erik Nordström
176b75e43d Add command to show tablespaces attached to a hypertable
Users can now call `show_tablespaces()` to list the tablespaces
attached to a particular hypertable.
2017-12-09 18:27:50 +01:00
Erik Nordström
6e92383592 Add function to detach tablespaces from hypertables
Tablespaces can now be detached from hypertables using
`tablespace_detach()`. This function can either detach
a tablespace from all tables or only a specific table.

Having the ability to detach tablespace allows more
advanced storage management, for instance, one can detach
tablespaces that are running low on diskspace while attaching
new ones to replace the old ones.
2017-12-09 18:27:50 +01:00
Erik Nordström
e593876cb0 Refactor tablespace handling
Attaching tablespaces to hypertables is now handled
in native code, with improved permissions checking and
caching of tablespaces in the Hypertable data object.
2017-12-09 18:27:50 +01:00
Rob Kiefer
e44e47ed88 Update add_dimension to take INTERVAL times
The user should be able to add time dimensions using INTERVAL when
the column type is TIMESTAMP/TIMESTAMPTZ/DATE, so this change adds
that support.

Additionally it adds some additional tests and checks for
add_dimension, e.g., a nice error when the table is not a
hypertable.
2017-12-07 12:09:35 -05:00
Rob Kiefer
0763e62f8f Update set_chunk_time_interval to take INTERVAL times
For convenience, the user should be able to specify the new
chunk time intervals using INTERVAL datatype if the hypertable is
using a TIMESTAMP/TIMESTAMPTZ/DATE datatype for its time column.
2017-12-07 12:09:35 -05:00
Michael J. Freedman
51854ac5c9 Fix error message to reflect that drop_chunks can take a DATE interval 2017-12-05 11:26:01 -05:00
Matvey Arye
8b772be994 Change time handling in drop_chunks for TIMESTAMP times
This PR fixes the handling of drop_chunks when the hypertable's
time field is a TIMESTAMP or DATE field. Previously, such
hypertables needed drop_chunks to be given a timestamptz in UTC.
Now, drop_chunks can take a DATE or TIMESTAMP. Also, the INTERVAL
version of drop_chunks correctly handles these cases.

A consequence of this change is that drop_chunks cannot be called
on multiple tables (with table_name = NULL or schema_name = NULL)
if the tables have different time column types.
2017-11-27 16:17:42 -05:00
Erik Nordström
1e947da456 Permission fixes and allow SET ROLE
This change reduces the usage of SECURITY DEFINER on SQL
functions and fixes related permissions issues. It also
properly checks hypertable permissions relative the current_user
instead of the session_user, which otherwise breaks SET ROLE,
among other things.
2017-11-27 15:55:26 +01:00
jwdeitch
6594018e32 Handle when create_hypertable is invoked on partitioned table
- create_hypertable will raise exception on invocation
2017-11-21 12:28:16 -05:00