475 Commits

Author SHA1 Message Date
Erik Nordström
4538fc6c40 Optimize continuous aggregate refresh
This change ensures a refresh of a continuous aggregate only
re-materializes the part of the aggregate that has been
invalidated. This makes refreshing much more efficient, and sometimes
eliminates the need to materialize data entirely (i.e., in case there
are no invalidations in the refresh window).

The ranges to refresh are the remainders of invalidations after they
are cut by the refresh window (i.e., all invalidations, or parts of
invalidations, that fall within the refresh window). The invalidations
used for a refresh are collected in a tuple store (which spills to
disk) so as not to allocate too much memory in case of many
invalidations. Invalidations are, however, merged and deduplicated
before being added to the tuplestore, similar to how invalidations are
processed in the invalidation logs.

Currently, refreshing simply materializes all invalidated ranges in
the order they appear in the tuple store; the ordering does not
matter since all invalidated regions are refreshed in the same
transaction.
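
For illustration (hypothetical values): given a refresh window of
[10, 30) and invalidations [0, 15) and [25, 40), only the remainders
[10, 15) and [25, 30) are re-materialized.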
2020-08-31 10:22:32 +02:00
Erik Nordström
5b8ff384dd Add infinite invalidations to cagg log
In its initial state, a continuous aggregate should be completely
invalidated. Therefore, this change adds an infinite invalidation
`[-Infinity, +Infinity]` when a continuous aggregate is created.
2020-08-31 10:22:32 +02:00
Sven Klemm
7f93faad02 Fix dist_hypertable test to use unique data node names
Change dist_hypertable test to use unique data node names.
2020-08-29 23:15:20 +02:00
Sven Klemm
0fa778d1de Fix dist_compression test to enable parallel execution
Change dist_compression test to use unique data node name so it can
be run in parallel.
2020-08-29 23:15:20 +02:00
Sven Klemm
90a5995dfb Merge deparse and deparse_fail test 2020-08-29 23:15:20 +02:00
Sven Klemm
6ad98a45bb Move license change test to regresscheck-shared 2020-08-29 23:15:20 +02:00
Sven Klemm
9ae409259a Merge gapfill tests into single test 2020-08-29 23:15:20 +02:00
Erik Nordström
c5a202476e Fix timestamp overflow in time_bucket optimization
An optimization for `time_bucket` transforms expressions of the form
`time_bucket(10, time) < 100` to `time < 100 + 10` in order to do
chunk exclusion and make better use of indexes on the time
column. However, since one bucket is added to the timestamp when doing
this transformation, the timestamp can overflow.

While a check for such overflows already exists, it uses `+Infinity`
(INT64_MAX/DT_NOEND) as the upper bound instead of the actual end of
the valid timestamp range. A further complication arises because
TimescaleDB internally converts timestamps to UNIX epoch time, thus
losing a little bit of the valid timestamp range in the process. Dates
are further restricted by the fact that they are internally first
converted to timestamps (thus limited by the timestamp range) and then
converted to UNIX epoch.

This change fixes the overflow issue by only applying the
transformation if the resulting timestamps or dates stay within the
valid (TimescaleDB-specific) ranges.

A test has also been added to show the valid timestamp and date
ranges, both PostgreSQL and TimescaleDB-specific ones.
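
A sketch of the transformation (relation and predicate are
hypothetical):

-- Original predicate on the bucketed time column:
SELECT * FROM conditions
WHERE time_bucket(INTERVAL '10 minutes', time) < '2020-01-01 00:00';

-- Rewritten form, applied only when the result stays in range:
SELECT * FROM conditions
WHERE time < TIMESTAMP '2020-01-01 00:00' + INTERVAL '10 minutes';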
2020-08-27 19:16:24 +02:00
Mats Kindahl
c054b381c6 Change syntax for continuous aggregates
We change the syntax for defining continuous aggregates to use `CREATE
MATERIALIZED VIEW` rather than `CREATE VIEW`. The command still
creates a view, even though `CREATE MATERIALIZED VIEW` normally
creates a table. An error is raised if `CREATE VIEW` is used to
create a continuous aggregate, redirecting the user to `CREATE
MATERIALIZED VIEW`.

In a similar vein, `DROP MATERIALIZED VIEW` is used for continuous
aggregates and continuous aggregates cannot be dropped with `DROP
VIEW`.

Continuous aggregates are altered using `ALTER MATERIALIZED VIEW`
rather than `ALTER VIEW`, so we ensure that it works for `ALTER
MATERIALIZED VIEW` and gives an error if you try to use `ALTER VIEW` to
change a continuous aggregate.

Note that we allow `ALTER VIEW ... SET SCHEMA` to be used with the
partial view as well as with the direct view, so this is handled as a
special case.
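
A sketch of the new syntax, assuming a hypothetical `conditions`
hypertable:

CREATE MATERIALIZED VIEW conditions_summary
WITH (timescaledb.continuous) AS
SELECT time_bucket(INTERVAL '1 day', time) AS bucket,
       avg(temperature)
FROM conditions
GROUP BY bucket;

DROP MATERIALIZED VIEW conditions_summary;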

Fixes #2233

Co-authored-by: Erik Nordström <erik@timescale.com>
Co-authored-by: Mats Kindahl <mats@timescale.com>
2020-08-27 17:16:10 +02:00
Dmitry Simonenko
5300b68208 Add test for hypertable_approximate_row_count() on dist hypertable
Issues: #1902
2020-08-26 12:07:04 +03:00
Erik Nordström
f8727756a6 Cleanup drop and show chunks
This change removes, simplifies, and unifies code related to
`drop_chunks` and `show_chunks`. As a result of prior changes to
`drop_chunks`, e.g., making table relid mandatory and removing
cascading options, there's an opportunity to clean up and simplify the
rather complex code for dropping and showing chunks.

In particular, `show_chunks` is now consistent with `drop_chunks`; the
relid argument is mandatory, a continuous aggregate can be used in
place of a hypertable, and the input time ranges are checked and
handled in the same way.

Unused code is also removed; for instance, code that cascaded drop
chunks to continuous aggregates remained in the code base even though
the option no longer exists.
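
For example (relation names are hypothetical), both calls now accept
the same arguments and handle time ranges identically:

SELECT show_chunks('conditions', older_than => INTERVAL '1 week');
SELECT show_chunks('conditions_summary', older_than => INTERVAL '1 week');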
2020-08-25 14:36:15 +02:00
Brian Rowe
8e1e6036af Preserve pg_stats on chunks before compression
This change will ensure that the pg_statistics on a chunk are
updated immediately prior to compression. It also ensures that
these stats are not overwritten as part of a global or hypertable
targeted ANALYZE.

This addresses the issue that a chunk will no longer generate valid
statistics during an ANALYZE once the data has been moved to the
compressed table. Unfortunately, any compressed rows will not be
captured in the parent hypertable's pg_statistics as there is no way
to change how PostgreSQL samples child tables in PG11.

This approach assumes that the compressed table remains static, which
is mostly correct in the current implementation (though it is
possible to remove compressed segments). Once we start allowing more
operations on compressed chunks this solution will need to be
revisited. Note that in PG12 an approach leveraging table access
methods will not have a problem analyzing compressed tables.
2020-08-21 10:48:15 -07:00
Dmitry Simonenko
33d5d11821 Check CREATE INDEX with transaction per chunk using dist hypertable
Issue: #836
2020-08-21 12:42:59 +03:00
Sven Klemm
c281dcdb26 Fix segfault in alter_job
When trying to alter a job with a NULL config, alter_job did not
set the isnull field for the config and would segfault when trying
to build the result tuple.
2020-08-21 08:45:58 +02:00
Mats Kindahl
aec7c59538 Block data migration for distributed hypertables
Option `migrate_data` does not currently work for distributed
hypertables, so we block it for the time being and generate an error if
an attempt is made to migrate data when creating a distributed
hypertable.

Fixes #2230
2020-08-20 15:07:01 +02:00
Sven Klemm
043c29ba48 Block policy API commands in read_only transaction 2020-08-20 11:23:49 +02:00
Sven Klemm
a9c087eb1e Allow scheduling custom functions as bgw jobs
This patch adds functionality to schedule arbitrary functions
or procedures as background jobs.

New functions:

add_job(
  proc REGPROC,
  schedule_interval INTERVAL,
  config JSONB DEFAULT NULL,
  initial_start TIMESTAMPTZ DEFAULT NULL,
  scheduled BOOL DEFAULT true
)

Add a job that runs proc every schedule_interval. Proc can
be either a function or a procedure implemented in any language.

delete_job(job_id INTEGER)

Deletes the job.

run_job(job_id INTEGER)

Execute a job in the current session.
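
A usage sketch (the procedure name and body are hypothetical; the
job is assumed to be invoked with its id and config):

CREATE PROCEDURE custom_log(job_id INT, config JSONB)
LANGUAGE plpgsql AS
$$
BEGIN
  RAISE NOTICE 'job % running with config %', job_id, config;
END
$$;

SELECT add_job('custom_log', INTERVAL '1 hour');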
2020-08-20 11:23:49 +02:00
Mats Kindahl
9bc5c711f4 Fix retention policy on distributed hypertables
If a retention policy is set up on a distributed hypertable, it will
not propagate the drop chunks call to the data nodes, since the drop
is performed through an internal call.

This commit fixes this by creating a drop chunks call internally and
executing it as a function. This will then propagate to the data nodes.

Fixes timescale/timescaledb-private#833
Fixes #2040
2020-08-14 07:21:02 +02:00
Erik Nordström
418f283443 Merge continuous aggregate invalidations
This change implements deduplication and merging of invalidation
entries for continuous aggregates in order to reduce the number of
redundant entries in the continuous aggregate invalidation
log. Merging is done both when copying over entries from the
hypertable to the continuous aggregate invalidation log and when
cutting already existing invalidations in the latter log. Doing this
merging in both steps helps reduce the number of invalidations also
for the continuous aggregates that don't get refreshed by the active
refresh command.

Merging works by scanning invalidations in order of the lowest
modified value, and given this ordering it is possible to merge the
current and next entry into one large entry if they are
overlapping. This can continue until the current and next invalidation
are disjoint or there are no more invalidations to process.

Note, however, that only the continuous aggregate that gets refreshed
will be fully deduplicated. Some redundant entries might exist for
other aggregates since their entries in the continuous aggregate log
aren't cut against the refresh window.

Full deduplication for the refreshed continuous aggregate is only
possible if the continuous aggregate invalidation log is processed
last, since that also includes "old" entries. Therefore, this change
also changes the ordering of how the logs are processed. This also
makes it possible to process the hypertable invalidation log in the
first transaction of the refresh.
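
For example (hypothetical values): scanning in order of lowest
modified value, entries [5, 15] and [10, 20] overlap and merge into
[5, 20], while a subsequent entry [30, 40] is disjoint and starts a
new merged entry.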
2020-08-13 12:35:23 +02:00
Erik Nordström
c01faa72f0 Set invalidation threshold during refresh
The invalidation threshold governs the window of data from the head of
a hypertable that shouldn't be subject to invalidations in order to
reduce write amplification during inserts on the hypertable.

When a continuous aggregate is refreshed, the invalidation threshold
must be moved forward (or initialized if it doesn't previously exist)
whenever the refresh window stretches beyond the current threshold.

Tests for setting the invalidation threshold are also added, including
new isolation tests for concurrency.
2020-08-12 11:16:23 +02:00
Erik Nordström
80720206df Make refresh_continuous_aggregate a procedure
When a continuous aggregate is refreshed, it also needs to move the
invalidation threshold in case the refresh window stretches beyond the
current threshold. The new invalidation threshold must be set in its
own transaction during the refresh, which can only be done if the
refresh command is a procedure.
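
A usage sketch, assuming a hypothetical `conditions_summary`
continuous aggregate:

CALL refresh_continuous_aggregate('conditions_summary',
                                  '2020-01-01', '2020-08-01');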
2020-08-12 11:16:23 +02:00
Erik Nordström
b8ce74921a Fix refresh of integer-time continuous aggregates
The calculation of the max-size refresh window for integer-based
continuous aggregates used the range of 64-bit integers for all
integer types, while the max ranges for 16- and 32-bit integers are
lower. This change adds the missing range boundaries.
2020-08-12 11:16:23 +02:00
Sven Klemm
d547d61516 Refactor continuous aggregate policy
This patch modifies the continuous aggregate policy to store its
configuration in the jobs table.
2020-08-11 22:57:02 +02:00
Dmitry Simonenko
1a8d0eae06 Add check for distributed hypertable to reorder/move_chunk
Ensure that move_chunk() and reorder_chunk() functions cannot
be used with distributed hypertables.
2020-08-11 16:12:54 +03:00
gayyappan
eecc93f3b6 Add hypertable_index_size function
Add a function to compute the size of a specific
index of a hypertable.
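
A usage sketch (index name is hypothetical):

SELECT hypertable_index_size('conditions_device_id_time_idx');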
2020-08-10 18:00:51 -04:00
Sven Klemm
4409bff025 Add unreferenced test files to CMakeLists
The with_clause_parser and continuous_aggs_drop_chunks tests were
not referenced in the CMakeLists files, so those tests were never
run. This patch adds them to the appropriate file and adjusts the
output.
2020-08-07 15:40:57 +02:00
Dmitry Simonenko
0f60b5b33b Add check for distributed hypertable to continuous aggs
Show an error message in case a distributed hypertable is
being used.
2020-08-07 15:31:29 +03:00
Ruslan Fomkin
56b4c10a74 Fix error messages for compression policy
Error messages are improved and formulated in terms of compression
policy.
2020-08-06 19:17:44 +02:00
Ruslan Fomkin
393e5b9c1a Remove enabling enterprise from compression test
Compression is no longer an enterprise feature, so enabling
enterprise is not needed in tests.
2020-08-05 14:25:27 +02:00
Erik Nordström
9a7b4aa003 Process invalidations when refreshing continuous aggregate
This change adds initial support for invalidation processing when
refreshing a continuous aggregate. Note that, currently, invalidations
are only cleared during a refresh, but not yet used to optimize
refreshes. There are two steps to this processing:

1. Invalidations are moved from hypertable invalidation log to the
   cagg invalidation log
2. The cagg invalidation entries are then processed for the continuous
   aggregate that gets refreshed.

The second step involves finding all invalidations that overlap with
the given refresh window and then either deleting them or cutting
them, depending on how they overlap.

Currently, the "invalidation threshold" is not moved up during a
refresh. This would only be required if the refresh window crosses
that threshold and will be addressed in a future change.
2020-08-04 14:22:04 +02:00
Sven Klemm
bb891cf4d2 Refactor retention policy
This patch changes the retention policy to store its configuration
in the bgw_job table and removes the bgw_policy_drop_chunks table.
2020-08-03 22:33:54 +02:00
Mats Kindahl
9049a5d3cb Remove requirement of CASCADE from DROP VIEW
To drop a continuous aggregate it was necessary to use the `CASCADE`
keyword, which would then cascade to the materialized hypertable. Since
this can cascade the drop to other objects that are dependent on the
continuous aggregate, this could accidentally drop more objects than
intended.

This commit fixes this by removing the check for `CASCADE` and adding
the materialized hypertable to the list of objects to drop.

Fixes timescale/timescaledb-private#659
2020-08-03 22:01:21 +02:00
gayyappan
9f13fb9906 Add functions for compression stats
Add chunk_compression_stats and hypertable_compression_stats
functions to get before/after compression sizes
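
A usage sketch (hypertable name is hypothetical):

SELECT * FROM hypertable_compression_stats('conditions');
SELECT * FROM chunk_compression_stats('conditions');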
2020-08-03 10:19:55 -04:00
Mats Kindahl
590446c6a7 Remove cascade_to_materialization parameter
The parameter `cascade_to_materialization` is removed from
`drop_chunks` and `add_drop_chunks_policy` as well as associated tables
and test functions.

Fixes #2137
2020-07-31 11:21:36 +02:00
gayyappan
c93f963709 Remove chunk_relation_size
Remove the chunk_relation_size and chunk_relation_size_pretty
functions.
Fix row_number in the chunks view.
2020-07-30 16:06:04 -04:00
Mats Kindahl
03d2f32178 Add self-reference check to add_data_node
If the access node is adding itself as a data node using
`add_data_node`, it will deadlock since transactions will be opened
on both the access node and the data node, each trying to update the
metadata.

This commit fixes this by updating `set_dist_id` to check if the UUID
being added as `dist_uuid` is the same as the `uuid` of the node.  If
that is the case, it raises an error.

Fixes #2133
2020-07-30 21:19:33 +02:00
Sven Klemm
0d5f1ffc83 Refactor compress chunk policy
This patch changes the compression policy to store its configuration
in the bgw_job table and removes the bgw_policy_compress_chunks table.
2020-07-30 19:58:37 +02:00
Brian Rowe
68aee5144c Rename add_drop_chunks_policy
This change replaces the add_drop_chunks_policy function with
add_retention_policy. It also renames the older_than parameter
of that function to retention_window. Likewise,
remove_drop_chunks_policy is renamed to
remove_retention_policy.
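
A before/after sketch per this change (hypertable name is
hypothetical):

-- Old API:
SELECT add_drop_chunks_policy('conditions', INTERVAL '6 months');
-- New API:
SELECT add_retention_policy('conditions',
                            retention_window => INTERVAL '6 months');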

Fixes #2119
2020-07-30 09:53:21 -07:00
Ruslan Fomkin
5696668500 Test detach_tablespaces on distributed hypertable
Adds a test to call detach_tablespaces on a distributed hypertable.
Since no tablespaces can be attached to distributed hypertables, the
test detaches 0 tablespaces. A test to detach tablespaces on a data
node is also added.
2020-07-30 10:05:25 +02:00
Erik Nordström
84fd3b09b4 Add refresh function for continuous aggregates
This change adds a new refresh function called
`refresh_continuous_aggregate` that allows refreshing a continuous
aggregate over a given window of data, called the "refresh window".

This is the first step in a larger overhaul of the continuous
aggregate feature with the goal of cleaning up the API and separating
policy from the core functionality.

Currently, the refresh function does a brute-force refresh of a window
and it bypasses the whole invalidation framework. Future updates
intend to integrate with this framework (with modifications) to
optimize refreshes. An exclusive lock is taken on the continuous
aggregate's internal materialized hypertable in order to protect
against concurrent refreshing. However, as this serializes refreshes,
we might want to relax this locking in the future to allow, e.g.,
concurrent refreshes of non-overlapping windows.

The new refresh functionality includes basic tests for bad input and
refreshing across different windows. Unfortunately, a bug in the
optimization code for `time_bucket` causes timestamps to overflow the
allowed MAX time. Therefore, refresh windows that are close to the MAX
allowed size are not yet supported or tested.
2020-07-30 01:04:32 +02:00
Sven Klemm
5a410736a9 Only run chunk_api test on debug build
The chunk_api test requires a debug build for certain test functions,
so this patch changes the chunk_api test to only run for debug builds.
2020-07-30 00:00:57 +02:00
gayyappan
7d3b4b5442 New size utils functions
Add hypertable_detailed_size, chunk_detailed_size, and
hypertable_size functions.
Remove hypertable_relation_size,
hypertable_relation_size_pretty, and indexes_relation_size_pretty.
Remove size information from the hypertables view.
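
A usage sketch for the new functions (relation names are
hypothetical):

SELECT hypertable_size('conditions');
SELECT * FROM hypertable_detailed_size('conditions');
SELECT * FROM chunk_detailed_size('_timescaledb_internal._hyper_1_1_chunk');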
2020-07-29 15:30:39 -04:00
Sven Klemm
3e83577916 Refactor reorder policy
This patch changes the reorder policy to store its configuration
in the bgw_job table and removes the bgw_policy_reorder table.
2020-07-29 12:07:13 +02:00
Mats Kindahl
6f64f959db Propagate privileges from hypertables to chunks
Whenever chunks are created, no privileges are added to the chunks.
For accesses that go through the hypertable, permission checks are
ignored, so reads and writes will succeed anyway. However, for direct
accesses to the chunks, permission checks are done, which creates
problems for, e.g., `pg_dump`.

This commit fixes this by propagating `GRANT` and `REVOKE` statements
to the chunks when executed on the hypertable, and whenever new chunks
are created, privileges are copied from the hypertable.

This commit does not propagate privileges for distributed
hypertables; that is handled in a separate commit.
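
For example (role and table names are hypothetical), the following
statement now also propagates to all existing and future chunks of
the hypertable:

GRANT SELECT ON conditions TO readonly_user;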
2020-07-28 17:42:52 +02:00
gayyappan
dc61466aef Add chunks and dimensions view
The timescaledb_information.chunks view shows metadata
related to chunks.
The timescaledb_information.dimensions view shows metadata
related to a hypertable's dimensions.
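
A usage sketch (hypertable name is hypothetical):

SELECT * FROM timescaledb_information.chunks
WHERE hypertable_name = 'conditions';
SELECT * FROM timescaledb_information.dimensions
WHERE hypertable_name = 'conditions';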
2020-07-26 17:10:05 -04:00
Dmitry Simonenko
fca7e36898 Support moving compressed chunks
Allow move_chunk() to work with an uncompressed chunk and
automatically move the associated compressed chunk to the specified
tablespace.

Block move_chunk() execution for compressed chunks.
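
A usage sketch (chunk, tablespace, and index names are hypothetical):

SELECT move_chunk(
  chunk => '_timescaledb_internal._hyper_1_1_chunk',
  destination_tablespace => 'tablespace_2',
  index_destination_tablespace => 'tablespace_2',
  reorder_index => '_timescaledb_internal._hyper_1_1_chunk_conditions_time_idx'
);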

Issue: #2067
2020-07-24 19:26:15 +03:00
gayyappan
926a1c9850 Add compression settings view
Add informational view that lists the settings
used while enabling compression on a hypertable.
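
A usage sketch (hypertable name is hypothetical):

SELECT * FROM timescaledb_information.compression_settings
WHERE hypertable_name = 'conditions';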
2020-07-23 12:40:12 -04:00
Brian Rowe
6b62ed543c Fetch collations from data nodes during ANALYZE
This change fixes the stats collecting code to also return the slot
collation fields for PG12. This fixes a bug (#2093) where running an
ANALYZE in PG12 would break queries on distributed tables.
2020-07-20 10:54:44 -07:00
Ruslan Fomkin
bdced2b722 Add test of drop_chunks on distributed hypertable
Testing that drop_chunks works correctly on a distributed hypertable.
Tests of different argument combinations are assumed to be covered
by the existing tests on a regular hypertable.
2020-07-20 16:21:45 +02:00
Sven Klemm
3d1a7ca3ac Fix delete on tables involving hypertables with compression
The DML blocker to block INSERTs and UPDATEs on compressed hypertables
would trigger if the UPDATE or DELETE referenced any hypertable with
compressed chunks. This patch changes the logic to only block if the
target of the UPDATE or DELETE is a compressed chunk.
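
For illustration (table names are hypothetical): `summary_log` has no
compressed chunks, so the following DELETE is now allowed even though
it references `conditions`, a hypertable with compressed chunks:

DELETE FROM summary_log USING conditions
WHERE summary_log.device_id = conditions.device_id;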
2020-07-20 13:22:49 +02:00