119 Commits

Author SHA1 Message Date
Erik Nordström
7b64bf20f5 Use num data nodes as default num partitions
For convenience, this adds the option to create a distributed
hypertable without specifying the number of partitions in the space
dimension; the number of partitions then defaults to the number of
data nodes, even when no data nodes are explicitly specified (in which
case it defaults to the data nodes added to the database).
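
As an illustration (the exact creation call is assumed from the distributed hypertable API rather than shown in this commit), omitting the number of space partitions might look like:
`SELECT create_distributed_hypertable('disttable', 'time', 'device');`
Here the number of partitions for `device` would default to the number of attached data nodes.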
2020-05-27 17:31:09 +02:00
Erik Nordström
9108ddad15 Fix corner cases when detaching data nodes
This change fixes the following:

* Refactor the code for setting the default data node for a chunk. The
  `set_chunk_default_data_node()` API function now takes a
  `regclass`/`oid` instead of separate schema + table names and
  returns `true` when a new data node is set and `false` if called
  with a data node that is already the default. Like before,
  exceptions are thrown on errors. It also does proper permissions
  checks. The related code has been moved from `data_node.c` to
  `chunk.c` since this is an operation on a chunk, and the code now
  also lives in the `tsl` directory since this is non-trivial logic
  that should fall under the TSL license (see the example call after this list).
* When setting the default data node on a chunk (failing over to
  another data node), it is now verified that the new data node
  actually has a replica of the chunk and that the corresponding
  foreign server belongs to the "right" foreign data wrapper.
* Error messages and permissions handling have been tweaked.
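
A sketch of the refactored `set_chunk_default_data_node()` call, assuming a chunk `regclass` and a data node name as the two arguments:
`SELECT set_chunk_default_data_node('_timescaledb_internal._hyper_1_1_chunk', 'data_node_2');`
This would return `true` if `data_node_2` becomes the new default and `false` if it is already the default.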
2020-05-27 17:31:09 +02:00
Erik Nordström
b07461ec00 Refactor and harden data node management
This change refactors and hardens parts of data node management
functionality.

* A number of permissions checks have been added to data node
  management functions. This includes checking that the user has
  proper permissions for both table and server objects.
* Permissions checks are now done when creating remote chunks on data
  nodes.
* The add_data_node() API function has been simplified and now returns
  more intuitive status about created objects (foreign server,
  database, extension). It is no longer necessary to specify a user to
  connect with as this is always assumed to be the current user. The
  bootstrap user can still be specified explicitly, however, as that
  user might require elevated permissions on the remote node to
  bootstrap (an example call is shown after this list).
* Functions that capture exceptions without re-throwing, such as
  `ping_data_node()` and `get_user_mapping()`, have been refactored to
  not do this as the transaction state and memory contexts are not in
  states where it is safe to proceed as normal.
* Data node management functions now consistently check that any
  foreign servers operated on are actually TimescaleDB server objects.
* Tests now run with a superuser and a regular user specific to
  clustering. These users have password auth enabled in `pg_hba.conf`,
  which is required by the connection library when connecting as a
  non-superuser. Tests have been refactored to bootstrap data nodes
  using these user roles.
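
A minimal sketch of the simplified `add_data_node()` call; the named arguments here are assumptions, not taken from this commit:
`SELECT * FROM add_data_node('data_node_1', host => 'dn1.example.com', database => 'dn1');`
The connection is made as the current user, and the returned row indicates which objects (foreign server, database, extension) were created.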
2020-05-27 17:31:09 +02:00
Brian Rowe
79fb46456f Rename server to data node
The timescale clustering code so far has been written referring to the
remote databases as 'servers'.  This terminology is a bit overloaded,
and in particular we don't enforce any network topology limitations
that the term 'server' would suggest.  In light of this we've decided
to change to use the term 'node' when referring to the different
databases in a distributed database.  Specifically we refer to the
frontend as an 'access node' and to the backends as 'data nodes',
though we may omit the access or data qualifier where it's unambiguous.

As the vast bulk of the code so far has been written for the case where
there was a single access node, almost all instances of 'server' were
references to data nodes.  This change has updated the code to rename
those instances.
2020-05-27 17:31:09 +02:00
Brian Rowe
e110a42a2b Add space usage utilities to distributed database
This change adds a new utility function for postgres
`server_hypertable_info`.  This function will contact a provided node
and pull down the space information for all the distributed hypertables
on that node.

Additionally, a new view `distributed_server_info` has been added to
timescaledb_information.  This view leverages the new
remote_hypertable_data function to display a list of nodes, along with
counts of tables, chunks, and total bytes used by distributed data.

Finally, this change also adds a `hypertable_server_relation_size`
function, which, given the name of a distributed hypertable, will print
the space information for that hypertable on each node of the
distributed database.
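
For illustration (argument form assumed), the new objects might be queried as:
`SELECT * FROM timescaledb_information.distributed_server_info;`
`SELECT * FROM hypertable_server_relation_size('disttable');`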
2020-05-27 17:31:09 +02:00
niksa
0da34e840e Fix server detach/delete corner cases
Prevent server delete if the server contains data, unless the user
specifies `force => true`. In case the server is the only data
replica, we don't allow delete/detach unless the tables/chunks are dropped.
The idea is to have the same semantics for delete as for detach, since
delete actually calls detach.

We also try to update pg_foreign_table when we delete a server if there
is another server containing the same chunk.

An internal function is added to enable updating a foreign table's
server, which might be useful in some cases since the foreign table's
server is considered the default server for that particular chunk.

Since this command needs to work even if the server we're trying to
remove is non-responsive, we're not removing any data on the remote
data node.
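
A sketch of the forced variants (the `force` argument name comes from this message; everything else is illustrative):
`SELECT * FROM delete_server('server_1', force => true);`
`SELECT * FROM detach_server('server_1', force => true);`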
2020-05-27 17:31:09 +02:00
niksa
2fd99c6f4b Block new chunks on data nodes
This functionality enables users to block or allow creation of new
chunks on a data node for one or more hypertables. Use cases for this
include the ability to block new chunks when a data node is running
low on disk space or to affect chunk distribution across data nodes.

Sometimes blocking data nodes for new chunks can make a hypertable
under-replicated. For that case an additional argument `force => true`
can be supplied to force blocking new chunks.

Here are some examples.

Block for a specific hypertable:
`SELECT * FROM block_new_chunks_on_server('server_1', 'disttable');`

Block for all hypertables on the server:
`SELECT * FROM block_new_chunks_on_server('server_1', force => true);`

Unblock:
`SELECT * FROM allow_new_chunks_on_server('server_1', true);`

This change adds the `force` argument to `detach_server` as well. If
detaching or blocking new chunks would make a hypertable
under-replicated, then `force => true` needs to be used.
2020-05-27 17:31:09 +02:00
niksa
d8d13d9475 Allow detaching servers from hypertables
A server can now be detached from one or more distributed hypertables
so that it is no longer in use. We only allow detaching a server if there
is no data on the server and detaching it doesn't risk making a
hypertable under-replicated.

A user can detach a server for a specific hypertable, or for all
hypertables to which the server is attached.

`SELECT * FROM detach_server('server1', 'my_hypertable');`
`SELECT * FROM detach_server('server2');`
2020-05-27 17:31:09 +02:00
Erik Nordström
2f43408eb5 Push down partitionwise aggregates to servers
This change adds support for pushing down FULL partitionwise
aggregates to remote servers. Partial partitionwise aggregates cannot
yet be pushed down since that requires a way to tell the remote server
to compute a specific partial.

NOTE: Push-down aggregates are a PG11 only feature as it builds on top
of partitionwise aggregate push-down only available in
PG11. Therefore, a number of query-running tests now only run on PG11,
since these have different output on PG10.

To make push-downs work on a per-server basis, hypertables are now
first expanded into chunk append plans. This is useful to let the
planner do chunk exclusion and cost estimation of individual
chunks. The append path is then converted into a per-server plan by
grouping chunks by servers, with reduced cost because there is only
one startup cost per server instead of per chunk.

Future optimizations might consider avoiding the original per-chunk
plan computation, in order to increase planning speed.

To make use of existing PostgreSQL planning code for partitionwise
aggregates, we need to create range table entries for the server
relations even though these aren't "real" tables in the system. This
is because the planner code expects those entries to be present for
any "partitions" it is planning aggregates on (note that in
"declarative partitioning" all partitions are system tables). For this
purpose, we create range table entries for each server that points to
the root hypertable relation. This is in a sense "correct" since each
per-server relation is an identical (albeit partial) hypertable on the
remote server. The upside of pointing the server rel's range table
entry to the root hypertable is that the planner can make use of the
indexes on the hypertable for planning purposes. This leads to more
efficient remote queries when, e.g., ordering is important (i.e., we
get push down sorts for free).
2020-05-27 17:31:09 +02:00
Dmitry Simonenko
96727fa5c4 Add support for distributed peer ID
This change makes it possible for a data node to distinguish between
regular client connections and distributed database connections (from
the access node).

This functionality will be needed for decision making based on the
connection type, for example allowing or blocking DDL commands on a
data node.
2020-05-27 17:31:09 +02:00
niksa
6f3848e744 Add function to check server liveness
Try connecting to a server and running `SELECT 1`. The function
returns true on success and false on failure. There can be many
reasons to fail: no valid UserMapping, the server is down, or running
`SELECT 1` failed. More information about the failure is written to
the server log.

The `timescaledb_information.server` view is updated to show server status.
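
For example, server status can then be inspected with:
`SELECT * FROM timescaledb_information.server;`
(The exact set of columns is not described in this commit.)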
2020-05-27 17:31:09 +02:00
Brian Rowe
5c643e0ac4 Add distributed group id and enforce topology
This change adds a distributed database id to the installation data for a
database.  It also provides a number of utilities that can be used for
getting/setting/clearing this value or using it to determine if a database is
a frontend, a backend, or not a member of a distributed database.

This change also includes modifications to the add_server and delete_server
functions to check the distributed id to ensure the operation is allowed, and
then update or clear it appropriately. After this change it will no longer
be possible to add a database as a backend to multiple frontend databases, nor
will it be possible to add a frontend database as a backend to any other
database.
2020-05-27 17:31:09 +02:00
Brian Rowe
106a5a4bc5 Implement support for remote copy operations
With this change a COPY operation that comes into a timescale frontend
for a distributed hypertable will parse the incoming rows and pass them
to the backends hosting the chunks into which the data will be written.
This requires that the incoming COPY operation is in text or CSV
format (no support for binary yet).
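
A sketch of such an operation against a hypothetical distributed hypertable, using standard COPY syntax:
`COPY disttable FROM '/tmp/data.csv' WITH (FORMAT csv);`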
2020-05-27 17:31:09 +02:00
Erik Nordström
ed1b9d19f1 Implement per-server batching for remote INSERTs
This change ensures inserted tuples are sent in batches to servers
instead of tuple-by-tuple. This should reduce a lot of back-and-forth
communication that otherwise incurs significant overhead. The
per-server batch size can be set using the GUC
`timescaledb.max_insert_batch_size`, which defaults to 1000. Note that
this number is the maximum number of tuples stored per server before
they are flushed, and that the original INSERT statement's tuples will
be split across these servers. That is, if the INSERT statement has
3000 tuples, and there are three backend datanodes, then they will
roughly get 1000 tuples each.

The batch size can affect latency by, e.g., spreading the work
across a number of smaller batches, as opposed to deferring inserts to
one big batch at the end of the transaction. Note that batched tuples
are flushed at the end of execution irrespective of whether the flush
threshold is reached or not.
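
For example, the per-server batch size could be lowered for smaller, more frequent flushes:
`SET timescaledb.max_insert_batch_size = 100;`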
2020-05-27 17:31:09 +02:00
Matvey Arye
9880bc84e7 Query hypertables on a per-server instead of a per-chunk basis
This optimization enhances the query when querying a hypertable
using the timescale fdw. Previously such queries created execution
nodes and queries on a per-chunk basis. This PR combines all the chunks
belonging to the same hypertable and server together so that only one
query and executor node are created per hypertable-server.

This is accomplished by first changing the chunk expansion code to
not do the table expansion for remote hypertables and instead simply
save the chunk oids in a metadata field in TimescaleDBPrivate.

Next, we intercept the set rel pathlist hook to create paths for the
hypertable-server nodes. This uses the new server chunk assignment
module to choose which chunks will be fetched from which servers. For
now, we have only one assignment strategy but later we can have multiple
strategies, each creating its own paths and having the planner choose
the cheapest path using standard methodologies.

Finally, during the plan creation phase we pass down the server chunk
assignment to the deparser so that it can add a `chunks_in` call to the
where clause. This tells the data node which chunks to use and is
necessary when we have replicated chunks so that two servers don't
return data for the same chunk. An alternative approach for deparsing
`chunks_in` could have been to add a clause to remote_exprs and have
the deparser include it in the WHERE clause that way. However, the
standard way to deparse whole-row expressions is to use the ROW syntax
(to protect against different schema definitions on local and remote
nodes), whereas `chunks_in` requires a record, not a row reference, so
that approach was rejected as too awkward. Some additional
deparsing changes were made to handle base relations that don't have
an associated foreign-table (i.e. the hypertable-server case).
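
For illustration only (the schema qualification and argument form here are assumptions, not taken from this commit), a deparsed remote query might look roughly like:
`SELECT "time", device, temp FROM disttable WHERE _timescaledb_internal.chunks_in(disttable, ARRAY[1, 3]);`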

This commit also changes the way fdw_private is treated on RelOptInfo.
Previously, this could have been NULL, a TimescaleDBPrivate object
or a TsFwdRelationInfo. This led to some bugs and awkwardness as
well as a lack of type safety. Now, this field can only
be NULL or TimescaleDBPrivate, and the TsFwdRelationInfo object
is now an optional and type-safe member of TimescaleDBPrivate.
2020-05-27 17:31:09 +02:00
Brian Rowe
b1c6172d0a Add attach_server function
This adds an attach_server function which is used to associate a
server with an existing hypertable.
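
A sketch of the call (argument order is assumed for illustration):
`SELECT * FROM attach_server('server_1', 'disttable');`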
2020-05-27 17:31:09 +02:00
Matvey Arye
e7ba327f4c Add resolve and heal infrastructure for 2PC
This commit adds the ability to resolve whether or not 2PC
transactions have been committed or aborted and also adds a heal
function to resolve transactions that have been prepared but not
committed or rolled back.

This commit also removes the server id from the primary key on the
remote_txn table and adds another index. This was done because
`remote_txn_persistent_record_exists` should not rely on the server
being contacted but should rather just check for the existence of the
id. This makes the resolution safe for setups where two frontend server
definitions point to the same database. While this may not be a
properly configured setup, it's better if the resolution process is
robust to this case.
2020-05-27 17:31:09 +02:00
Erik Nordström
3ddbc386f0 Only support multinode on PG11 and greater
Multinode-related APIs now raise errors when called on any PostgreSQL
version below 11, as these versions do not have the required features
to support multinode or have different behavior.

Raising errors at runtime on affected APIs is preferred over excluding
these functions altogether. Having a different user-facing SQL API
would severely complicate the upgrade process for the extension.

A new CMake check has been added to disable multinode features on
unsupported PostgreSQL versions. It also generates a macro in
`config.h` that can be used in code to check for multinode support.
2020-05-27 17:31:09 +02:00
Matvey Arye
0e109d209d Add tables for saving 2pc persistent records
The remote_txn table records commit decisions for 2pc transactions.
A successful 2pc transaction will have one row per remote connection
recorded in this table. In effect it is a mapping between the
distributed transaction and an identifier for each remote connection.

The records are needed to protect against crashes after a
frontend sends a `COMMIT TRANSACTION` to one node
but not to all nodes involved in the transaction. Towards this end,
the commitment of remote_txn rows represents a crash-safe, irrevocable
promise that all participating datanodes will eventually get a `COMMIT
TRANSACTION`, and it occurs before any datanodes get a `COMMIT TRANSACTION`.

The irrevocable nature of the commit of these records means that this
can only happen after the system is sure all participating transactions
will succeed. Thus it can only happen after all datanodes have succeeded
on a `PREPARE TRANSACTION` and will happen as part of the frontend's
transaction commit.
2020-05-27 17:31:09 +02:00
Erik Nordström
e2371558f7 Create chunks on remote servers
This change ensures that chunk replicas are created on remote
(datanode) servers whenever a chunk is created in a local distributed
hypertable.

Remote chunks are created using the `create_chunk()` function, which
has been slightly refactored to allow specifying an explicit chunk
table name. The one making the remote call also records the resulting
remote chunk IDs in its `chunk_server` mappings table.

Since remote command invocation without super-user permissions
requires password authentication, the test configuration files have
been updated to require password authentication for a cluster test
user that is used in tests.
2020-05-27 17:31:09 +02:00
Matvey Arye
d2b4b6e22e Add remote transaction ID module
The remote transaction ID is used in two phase commit. It is the
identifier sent to the datanodes in PREPARE TRANSACTION and related
postgresql commands.

This is the first in a series of commits for adding two phase
commit support to our distributed txn infrastructure.
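
For context, the two-phase commands a data node receives are standard PostgreSQL; the transaction identifier shown here is only a placeholder format:
`PREPARE TRANSACTION 'ts-1-42-100';`
`COMMIT PREPARED 'ts-1-42-100';`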
2020-05-27 17:31:09 +02:00
Matvey Arye
02c178d9ca Add connection caching infrastructure
This commit adds the ability to cache remote connections
across commands and transactions. This is needed since establishing
new connections is expensive. The cache is invalidated
when the foreign server or user mapping is changed. Because of this
the cache is tied to a user mapping (it is keyed by the user mapping's
oid and requires a user_mapping for invalidation).

We use the syscache invalidation mechanism since foreign servers and
user mappings are already invalidated using this mechanism. This
requires some extra plumbing in our cache invalidation handling.

This cache will be used in txn callback handling and so the regular
auto-release of caches on (sub)txn commits/aborts that happens
with most caches is inappropriate. Therefore we added a new flag
to the cache called `handle_txn_callbacks` that allows a cache
to turn off the auto-release mechanism.
2020-05-27 17:31:09 +02:00
Erik Nordström
ece582d458 Add mappings table for remote hypertables
In a multi-node (clustering) setup, TimescaleDB needs to track which
remote servers have data for a particular distributed hypertable. It
also needs to know which servers to place new chunks on and to use in
queries against a distributed hypertable.

A new metadata table, `hypertable_server` is added to map a local
hypertable ID to a hypertable ID on a remote server. We require that
the remote hypertable has the same schema and name as the local
hypertable.

When a local server is removed (using `DROP SERVER` or our
`delete_server()`), all remote hypertable mappings for that server
should also be removed.
2020-05-27 17:31:09 +02:00
Erik Nordström
ae587c9964 Add API function for explicit chunk creation
This adds an internal API function to create a chunk using explicit
constraints (dimension slices). A function to export a chunk in a
format consistent with the chunk creation function is also added.

The chunk export/create functions are needed for distributed
hypertables so that an access node can create chunks on data nodes
according to its own (global) partitioning configuration.
2020-05-27 17:31:09 +02:00
niksa
538e27d140 Add Noop Foreign Data Wrapper
This adds a skeleton TimescaleDB foreign data wrapper (FDW) for
scale-out clustering. It currently works as a noop FDW that can be
used for testing, although the intention is to develop it into a
full-blown implementation.
2020-05-27 17:31:09 +02:00
Erik Nordström
eca7cc337a Add server management API and functionality
Servers for a scale-out clustering setup can now be added and deleted
with `add_server()` and `delete_server()`, providing a convenience API
for server management.

While similar functionality can be achieved using the standard
PostgreSQL `CREATE SERVER` and `CREATE USER MAPPING` commands, this
new API makes it easier to add clustering servers and user mappings
consistent with the needs of TimescaleDB's particular clustering setup.

The API currently works with the `postgres_fdw` foreign data
wrapper. It will be updated to use our own foreign data wrapper once
it is available.
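
A minimal sketch of the convenience API (the `host` argument name is an assumption, not taken from this commit):
`SELECT * FROM add_server('server_1', host => 'server1.example.com');`
`SELECT * FROM delete_server('server_1');`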
2020-05-27 17:31:09 +02:00
gayyappan
91fe723d3a Drop chunks from materialized hypertables
Add support for dropping chunks from materialized
hypertables. drop_chunks_policy can now be set up
for materialized hypertables.
2020-02-26 11:50:58 -05:00
Erik Nordström
9da50cc686 Move enterprise features to community
As of this change, a number of enterprise features are accessible to
community users. These features include:

* reorder
* policies around reorder and drop chunks

The move chunks feature is still covered by enterprise. Gapfill no
longer warns about expired enterprise license.

Tests have been updated to reflect these changes.
2020-02-20 17:08:03 +01:00
gayyappan
783c8e80ea Drop chunks for materialized hypertable
When drop_chunks is called with cascade_to_materialization = true,
 the materialized data is deleted from the materialization
hypertable, but the chunks are not dropped. This fix drops chunks
if possible and deletes the data only if the materialized chunk
cannot be dropped (which is the case if the materialized chunk
contains data from multiple raw chunks and some of the raw chunks
are not dropped).

Fixes #1644
2020-01-27 15:27:50 -05:00
Matvey Arye
5eb047413b Allow drop_chunks while keeping continuous aggs
Allow dropping raw chunks on the raw hypertable while keeping
the continuous aggregate. This allows for downsampling data
and allows users to save on TCO. We only allow dropping
such data when the dropped data is older than the
`ignore_invalidation_older_than` parameter on all the associated
continuous aggs. This ensures that any modifications to the
region of data which was dropped should never be reflected
in the continuous agg and thus avoids semantic ambiguity
if chunks are dropped but then again recreated due to an
insert.

Before we drop a chunk we need to make sure to process any
continuous aggregate invalidations that were registered on
data inside the chunk. Thus we add an option to materialization
to perform materialization transactionally, to only process
invalidations, and to process invalidation only before a timestamp.

We fix drop_chunks and policy to properly process
`cascade_to_materialization` as a tri-state variable (unknown,
true, false); Existing policy rows should change false to NULL
(unknown) and true stays as true since it was explicitly set.
Remove the form data for bgw_policy_drop_chunk because there
is no good way to represent the tri-state variable in the
form data.

When dropping chunks with cascade_to_materialization = false, all
invalidations on the chunks are processed before dropping the chunk.
If we are so far behind that even the completion threshold is inside
the chunks being dropped, we error. There are 2 reasons that we error:
1) We can't safely process new ranges transactionally without taking
   heavy weight locks and potentially locking the entire system
2) If a completion threshold is that far behind, the system probably has
   some serious issues anyway.
2019-12-30 09:10:44 -05:00
Matvey Arye
0f3e74215a Split segment meta min_max into two columns
This simplifies the code and the access to the min/max
metadata. Before we used a custom type, but now the min/max
are just the same type as the underlying column and stored as two
columns.

This also removes the custom type that was used before.
2019-10-29 19:02:58 -04:00
Sven Klemm
4d12f5b8f3 Fix transparent decompression sort interaction
The sort optimization adds a new index path to the pathlist of rel
with the modified pathkeys. This optimization needs to happen before
the DecompressChunk paths get generated otherwise those paths will
survive in pathlist and a query on a compressed chunk will target
the empty chunk of the uncompressed hypertable.
2019-10-29 19:02:58 -04:00
Matvey Arye
a4773adb58 Make compression feature use the community license
Compression is a community, not enterprise feature.
2019-10-29 19:02:58 -04:00
Matvey Arye
2bf97e452d Push down quals to segment meta columns
This commit pushes down quals or order_by columns to make
use of the SegmentMetaMinMax objects. Namely, =, <, <=, >, >= quals
can now be pushed down.

We also remove filters from decompress node for quals that
have been pushed down and don't need a recheck.

This commit also changes tests to add more segment by and
order-by columns.

Finally, we rename the segment meta accessor functions to have shorter names.
2019-10-29 19:02:58 -04:00
gayyappan
6e60d2614c Add compress chunks policy support
Add and drop compress chunks policy using bgw
infrastructure.
2019-10-29 19:02:58 -04:00
Matvey Arye
b9674600ae Add segment meta min/max
Add the type for min/max segment meta object. Segment metadata
objects keep metadata about data in segments (compressed rows).
The min/max variant keeps the min and max values inside the compressed
object. It will be used on compression order by columns to allow
queries that have quals on those columns to be able to exclude entire
segments if no uncompressed rows in the segment may match the qual.

We also add generalized infrastructure for datum serialization
/ deserialization for arbitrary types to and from memory as well
as binary strings.
2019-10-29 19:02:58 -04:00
Matvey Arye
a078781c2e Add decompress_chunk function
This is the inverse of compress_chunk.
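
A sketch of the call (the chunk name is a hypothetical example):
`SELECT decompress_chunk('_timescaledb_internal._hyper_1_1_chunk');`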
2019-10-29 19:02:58 -04:00
Sven Klemm
47c1d7e323 Add set_rel_pathlist hook for tsl code
Will be needed for compression.
2019-10-29 19:02:58 -04:00
gayyappan
44941f7bd2 Add UI for compress_chunks functionality
Add support for compress_chunks function.

This also adds support for compress_orderby and compress_segmentby
parameters in ALTER TABLE. These parameters are used by the
compress_chunks function.

The parsing code will most likely be changed to use the PG raw_parser
function.
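
A sketch of the workflow described above; the `timescaledb.`-prefixed option spellings and the object names are assumptions for illustration:
`ALTER TABLE conditions SET (timescaledb.compress, timescaledb.compress_segmentby = 'device', timescaledb.compress_orderby = 'time DESC');`
`SELECT compress_chunk('_timescaledb_internal._hyper_1_1_chunk');`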
2019-10-29 19:02:58 -04:00
gayyappan
1c6aacc374 Add ability to create the compressed hypertable
This happens when compression is turned on for regular hypertables.
2019-10-29 19:02:58 -04:00
Joshua Lockerman
584f5d1061 Implement time-series compression algorithms
This commit introduces 4 compression algorithms
as well as 3 ADTs to support them. The compression
algorithms are time-series optimized. The following
algorithms are implemented:

- DeltaDelta compresses integer and timestamp values
- Gorilla compresses floats
- Dictionary compression handles any data type
  and is optimized for low-cardinality datasets.
- Array stores any data type in an array-like
  structure and does not actually compress it (though
  TOAST-based compression can be applied on top).

These compression algorithms are fully described in
tsl/src/compression/README.md.

The Abstract Data Types that are implemented are
- Vector - A dynamic vector that can store any type.
- BitArray - A dynamic vector to store bits.
- SimpleHash - A hash table implementation from PG12.

More information can be found in
src/adts/README.md
2019-10-29 19:02:58 -04:00
Sven Klemm
d82ad2c8f6 Add ts_ prefix to all exported functions
This patch adds the `ts_` prefix to exported functions that didn't
have it and removes exports that are not needed.
2019-10-15 14:42:02 +02:00
Matvey Arye
d2f68cbd64 Move the set_integer_now func into Apache2
We decided this should be an OSS capability.
2019-10-11 13:00:55 -04:00
David Kohn
897fef42b6 Add support for moving chunks to different tablespaces
Adds a move_chunk function, which moves a chunk to a different
tablespace. This is implemented as an extension to the reorder command.
Given that the heap, toast tables, and indexes are being rewritten
during the reorder operation, adding the ability to modify the tablespace
is relatively simple and mostly requires adding parameters to the relevant
functions for the destination tablespace (and index tablespace). The tests
do not focus on further exercising the reorder infrastructure, but instead
ensure that tablespace movement and permissions checks properly occur.
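
A sketch of the call; the argument names are assumptions for illustration:
`SELECT move_chunk(chunk => '_timescaledb_internal._hyper_1_1_chunk', destination_tablespace => 'tablespace2', index_destination_tablespace => 'tablespace2');`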
2019-08-21 12:07:28 -04:00
Narek Galstyan
62de29987b Add a notion of now for integer time columns
This commit implements functionality for users to give a custom
definition of now() for integer open dimension typed hypertables.
Such a now() function enables us to talk about intervals in the context
of hypertables with integer time columns. In order to simplify future
code, this commit defines a custom ts_interval type that unites the
usual postgres intervals and integer time dimension intervals under a
single composite type.

The commit also enables adding drop chunks policy on hypertables with
integer time dimensions if a custom now() function has been set.
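
A sketch of how such a custom now() might be registered; the function name and signature are assumptions, not taken from this commit:
`SELECT set_integer_now_func('my_hypertable', 'my_current_epoch_func');`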
2019-08-19 23:23:28 +04:00
Joshua Lockerman
3895e5ce0e Add a setting for max an agg materializes per run
Add a setting max_materialized_per_run which can be set to prevent a
continuous aggregate from materializing too much of the table in a
single run. This will prevent a single run from locking the hypertable
for too long, when running on a large data set.
2019-04-26 13:08:00 -04:00
Joshua Lockerman
45fb1fc2c8 Handle drop_chunks on tables that have cont aggs
For hypertables that have continuous aggregates, calling drop_chunks now
drops all of the rows in the materialization table that were based on
the dropped chunks. Since we don't know what the correct default
behavior for drop_chunks is, we've added a new argument,
cascade_to_materializations, which must be set to true in order to call
drop_chunks on a hypertable which has a continuous aggregate.
drop_chunks is blocked on the materialization tables of continuous
aggregates.
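
A sketch of such a call; the argument order and the exact parameter spelling are assumptions for illustration:
`SELECT drop_chunks(interval '6 months', 'conditions', cascade_to_materializations => true);`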
2019-04-26 13:08:00 -04:00
Matvey Arye
cc862a3c5a Implement WITH options for continuous aggs
1) Change with clause name to 'timescaledb.continuous'

It used to be timescaledb.continuous_agg as a text field; now it is a bool.

2) Add more WITH options for continuous aggs

- Refresh lag controls the amount by which the materialization will lag
  behind the maximum current time value.

- Refresh interval controls how often the background materializer is run.

3) Handle ALTER VIEW on continuous aggs

Handle setting WITH options on continuous views using ALTER VIEW.
Block all other ALTER VIEW commands on user and partial views.
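
A sketch combining the options above; the exact option spellings, table, and column names are assumptions for illustration:
`CREATE VIEW conditions_summary WITH (timescaledb.continuous, timescaledb.refresh_lag = '1 hour', timescaledb.refresh_interval = '30 minutes') AS SELECT time_bucket('1 hour', time) AS bucket, avg(temperature) FROM conditions GROUP BY bucket;`
`ALTER VIEW conditions_summary SET (timescaledb.refresh_interval = '15 minutes');`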
2019-04-26 13:08:00 -04:00
Joshua Lockerman
bf44985ac3 Add REFRESH MATERIALIZED VIEW for continuous aggs
This commit also tests end-to-end materialization.

This commit finishes the materialization path for continuous aggregates,
and adds the ability to use REFRESH MATERIALIZED VIEW <continuous
aggregate> to invoke it.

Materialization is invoked via continous_agg_materialize, and happens in
two transactions:
1. In the first transaction we lock the relevant tables, determine the
   point below which we will end materialization, and update the
   invalidation threshold.
2. In the second transaction, we read the actual data and perform the
  actual deletions and updates to the materialization table.

We materialize in this manner because in order to allow mutations to the
underlying hypertable concurrently with materialization, the
invalidation threshold must be updated strictly before materialization
starts; anything else could cause us to lose invalidations. (Simply
blocking all mutations to the table while materialization is occurring
is a non-starter)

More precisely, the operations we perform are as follows:
Transaction 1: 1. read the completed threshold for the continuous
                  aggregate
               2. find the maximum timestamp in the hypertable that is
               greater than the old completed threshold, scanning the
               entire table if this is the first materialization
               3. if we found a new maximum value, and said value is
               sufficiently old (exact definition of sufficiently TBD in
               a later PR), update the invalidation threshold to point
               at this new value (said update is performed under an
               AccessExclusiveLock to ensure there are no concurrent
               mutations)

Transaction 2: 1. drain the invalidation log
               2. read the invalidation threshold
               3. delete from the materialization table everything below
                  the invalidation threshold that was invalidated
               4. scan the raw table and insert new materializations for
                  everything invalidated, and everything between the
                  completed threshold and the invalidation threshold
               5. update the completed threshold to equal the invalidation
                  threshold
2019-04-26 13:08:00 -04:00
David Kohn
f17aeea374 Initial cont agg INSERT/materialization support
This commit adds initial support for the continuous aggregate materialization
and INSERT invalidations.

INSERT path:
  On INSERT, DELETE and UPDATE we log the [max, min] time range that may be
  invalidated (that is, newly inserted, updated, or deleted) to
  _timescaledb_catalog.continuous_aggs_hypertable_invalidation_log. This log
  will be used to re-materialize these ranges, to ensure that the aggregate
  is up-to-date. Currently these invalidations are recorded by a trigger
  _timescaledb_internal.continuous_agg_invalidation_trigger, which should be
  added to the hypertable when the continuous aggregate is created. This trigger
  stores a cache of min/max values per-hypertable, and on transaction commit
  writes them to the log, if needed. At the moment, we consider them to always
  be needed, unless we're in ReadCommitted mode or weaker, and the min
  invalidated value is greater than the hypertable's invalidation threshold
  (found in _timescaledb_catalog.continuous_aggs_invalidation_threshold)

Materialization path:
  Materialization currently happens in multiple phases: in phase 1 we determine
  the timestamp at which we will end the new set of materializations, then we
  update the hypertable's invalidation threshold to that point, and finally we
  read the current invalidations, then materialize any invalidated rows and the new
  range between the continuous aggregate's completed threshold (found in
  _timescaledb_catalog.continuous_aggs_completed_threshold) and the hypertable's
  invalidation threshold. After all of this is done we update the completed
  threshold to the invalidation threshold. The portion of this protocol from
  after the invalidations are read, until the completed threshold is written
  (that is, actually materializing, and writing the completion threshold) is
  included with this commit, with the remainder to follow in subsequent ones.
  One important caveat is that since the thresholds are exclusive, we invalidate
  all values _less_ than the invalidation threshold, and since we store the time value
  as an int64 internally, we cannot ever determine if the row at PG_INT64_MAX is
  invalidated. To avoid this problem, we never materialize the time bucket
  containing PG_INT64_MAX.
2019-04-26 13:08:00 -04:00