This patch allows unique constraints on compressed chunks. When
trying to INSERT into a compressed chunk with unique constraints,
any potentially conflicting compressed batches will be decompressed
to let PostgreSQL do constraint checking on the INSERT.
With this patch only INSERT ON CONFLICT DO NOTHING will be supported.
For decompression, only segmentby information is considered to
determine conflicting batches. This will be enhanced in a follow-up
patch to also include orderby metadata so that fewer batches need
to be decompressed.
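A minimal usage sketch (table and column names are illustrative, not from the patch):

```sql
-- Hypertable with a unique constraint, compressed with a segmentby column.
CREATE TABLE metrics (time timestamptz, device_id int, value float,
    UNIQUE (device_id, time));
SELECT create_hypertable('metrics', 'time');
ALTER TABLE metrics SET (timescaledb.compress,
    timescaledb.compress_segmentby = 'device_id');

-- With this patch, the INSERT below works even when the target chunk is
-- compressed: batches with a matching segmentby value are decompressed
-- first so PostgreSQL can perform the uniqueness check.
INSERT INTO metrics VALUES ('2023-01-01', 1, 3.14)
ON CONFLICT DO NOTHING;
```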
When a TidRangeScan is a child of a ChunkAppend or ConstraintAwareAppend
node, an error is reported as "invalid child of chunk append: Node (26)".
This patch fixes the issue by recognizing TidRangeScan as a valid child.
Fixes: #4872
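A query shape that can plan a TidRangeScan under a ChunkAppend node (table name illustrative):

```sql
SELECT * FROM metrics WHERE ctid < '(10,0)'::tid;
```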
The sequence number of the compressed tuple is per segmentby grouping
and should be reset when the grouping changes to prevent overflows with
many segmentby columns.
Chunks that are dropped but preserve the catalog entries
have an incorrect status when they are marked as dropped.
This happens if the chunk was previously compressed and then
gets dropped - the status in the catalog tuple reflects the
compression status. This should be reset since the data is now
dropped.
1. Add a compression_state column to the hypertable catalog
table by renaming the existing compressed column.
compression_state is a tri-state column that indicates
whether the hypertable has compression enabled (value = 1)
or is an internal compression table (value = 2).
2. Save compression settings on access node when compression
is turned on for a distributed hypertable
For a distributed hypertable that has compression enabled,
compression_state is set; we don't create any internal tables
on the access node.
Fixes #2660
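An illustrative catalog query using the new column (the 1 and 2 values follow the description above; 0, presumably, means compression is not enabled):

```sql
SELECT table_name, compression_state
FROM _timescaledb_catalog.hypertable;
-- compression_state: 1 = compression enabled,
--                    2 = internal compression table
```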
This commit modifies analyze behavior as follows:
1. When an internal compression table is analyzed,
statistics from the compressed chunk (such as page
count and tuple count) are used to update the
statistics of the corresponding chunk parent, if
they are missing.
2. When the command ANALYZE <hypertable> is executed,
a) uncompressed chunks are analyzed and b) for compressed
chunks, the raw chunk is skipped and the compressed chunk
is analyzed instead.
This change makes sure that ANALYZE and VACUUM commands run without
specifying any relations will not clear the stats on compressed chunks
that were saved at compression time. It will also skip any distributed
tables.
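For example (hypertable name illustrative):

```sql
ANALYZE metrics;
-- uncompressed chunks: analyzed directly
-- compressed chunks: the raw chunk is skipped and the internal
--                    compressed chunk is analyzed instead
```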
Fixes #2576
This change captures the reltuples and relpages (and relallvisible)
statistics from the pg_class table for chunks immediately before
truncating them during the compression code path. It then restores
the values after truncating, as there is no way to keep PostgreSQL
from clearing these values during this operation. It also uses
these values properly during planning, working around some
PostgreSQL code which substitutes in arbitrary sizing for tables
which don't seem to hold data.
Fixes #2524
Tests are updated to no longer use continuous aggregate options that
will be removed, such as `refresh_lag`, `max_interval_per_job` and
`ignore_invalidation_older_than`. `REFRESH MATERIALIZED VIEW` has also
been replaced with `CALL refresh_continuous_aggregate()` using ranges
that try to replicate the previous refresh behavior.
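For example, a refresh that was previously done with `REFRESH MATERIALIZED VIEW` now looks like this (view name and refresh window illustrative):

```sql
CALL refresh_continuous_aggregate('conditions_summary',
    '2020-01-01', '2020-02-01');
```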
The materializer test (`continuous_aggregate_materialize`) has been
removed, since this tested the "old" materializer code, which is no
longer used without `REFRESH MATERIALIZED VIEW`. The new API using
`refresh_continuous_aggregate` already allows manual materialization
and there are two previously added tests (`continuous_aggs_refresh`
and `continuous_aggs_invalidate`) that cover the new refresh path in
similar ways.
When updated to use the new refresh API, some of the concurrency
tests, like `continuous_aggs_insert` and `continuous_aggs_multi`, have
slightly different concurrency behavior. This is explained by
different and sometimes more conservative locking. For instance, the
first transaction of a refresh serializes around an exclusive lock on
the invalidation threshold table, even if no new threshold is
written. The previous code took the heavier lock only once, and only
if a new threshold was written. This new, stricter locking means that
insert processes that read the invalidation threshold will block for a
short time when there are concurrent refreshes. However, since this
blocking only occurs during the first transaction of the refresh
(which is quite short), it probably doesn't matter too much in
practice. The relaxing of locks to improve concurrency and performance
can be implemented in the future.
This commit adds support for `WITH NO DATA` when creating a
continuous aggregate; the continuous aggregate is refreshed on
creation unless `WITH NO DATA` is provided.
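A sketch of the syntax (table, view, and column names illustrative):

```sql
CREATE MATERIALIZED VIEW conditions_summary
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 day', time) AS day, avg(temperature)
FROM conditions
GROUP BY day
WITH NO DATA;  -- skip the initial refresh; omit to refresh on creation
```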
All test cases are also updated to use `WITH NO DATA`, and an additional
test case is added to verify that both `WITH DATA` and `WITH NO DATA`
work as expected.
Closes #2341
This change renames the function to approximate_row_count() and adds
support for regular tables. It returns a row count estimate for a
table instead of a table list.
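Usage sketch (table name illustrative):

```sql
SELECT approximate_row_count('conditions');
```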
We change the syntax for defining continuous aggregates to use `CREATE
MATERIALIZED VIEW` rather than `CREATE VIEW`. For a continuous aggregate
the command still creates a view, whereas a regular `CREATE MATERIALIZED
VIEW` creates a table. We raise an error if `CREATE VIEW` is used to
create a continuous aggregate and redirect to `CREATE MATERIALIZED VIEW`.
In a similar vein, `DROP MATERIALIZED VIEW` is used for continuous
aggregates and continuous aggregates cannot be dropped with `DROP
VIEW`.
Continuous aggregates are altered using `ALTER MATERIALIZED VIEW`
rather than `ALTER VIEW`, so we ensure that it works for `ALTER
MATERIALIZED VIEW` and gives an error if you try to use `ALTER VIEW` to
change a continuous aggregate.
Note that we allow `ALTER VIEW ... SET SCHEMA` to be used with the
partial view as well as with the direct view, so this is handled as a
special case.
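In summary (view name illustrative):

```sql
-- Allowed for continuous aggregates:
ALTER MATERIALIZED VIEW conditions_summary
    SET (timescaledb.materialized_only = true);
DROP MATERIALIZED VIEW conditions_summary;
-- ALTER VIEW / DROP VIEW on a continuous aggregate raise an error,
-- except ALTER VIEW ... SET SCHEMA on the partial and direct views.
```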
Fixes #2233
Co-authored-by: Erik Nordström <erik@timescale.com>
Co-authored-by: Mats Kindahl <mats@timescale.com>
This change will ensure that the pg_statistics on a chunk are
updated immediately prior to compression. It also ensures that
these stats are not overwritten as part of a global or
hypertable-targeted ANALYZE.
This addresses the issue that a chunk will no longer generate valid
statistics during an ANALYZE once the data has been moved to the
compressed table. Unfortunately, any compressed rows will not be
captured in the parent hypertable's pg_statistics, as there is no way
to change how PostgreSQL samples child tables in PG11.
This approach assumes that the compressed table remains static, which
is mostly correct in the current implementation (though it is
possible to remove compressed segments). Once we start allowing more
operations on compressed chunks this solution will need to be
revisited. Note that in PG12 an approach leveraging table access
methods will not have a problem analyzing compressed tables.
The DML blocker to block INSERTs and UPDATEs on compressed hypertables
would trigger if the UPDATE or DELETE referenced any hypertable with
compressed chunks. This patch changes the logic to only block if the
target of the UPDATE or DELETE is a compressed chunk.
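For example (table names illustrative), a statement like the following merely references the compressed hypertable and is no longer blocked, since its target is not a compressed chunk:

```sql
UPDATE plain_table p
SET value = m.value
FROM metrics m   -- hypertable with compressed chunks
WHERE p.id = m.device_id;
```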
When enabling compression on a hypertable, the existing
constraints are cloned to the new compressed hypertable.
During validation of existing constraints, a loop
through the conkey array is performed, and the constraint name
was erroneously added to the list multiple times. This fix
moves the addition to the list outside the conkey loop.
Fixes #2000
The `drop_chunks` function is refactored to make table name mandatory
for the function. As a result, the function was also refactored to
accept the `regclass` type instead of table name plus schema name and
the parameters were reordered to match the order for `show_chunks`.
The commit also refactors the code to pass the hypertable structure
between internal functions rather than the hypertable relid, and moves
error checks to the PostgreSQL function. This allows the internal
functions to avoid some lookups and use the information in the
structure directly, and also gives errors earlier instead of first
dropping chunks and then erroring and rolling back the transaction.
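The refactored signature takes the relation first, as a regclass, matching `show_chunks` (names illustrative):

```sql
SELECT drop_chunks('conditions', older_than => INTERVAL '3 months');
```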
Unless otherwise listed, the TODO was converted to a comment or put
into an issue tracker.
test/sql/
- triggers.sql: Made required change
tsl/test/
- CMakeLists.txt: TODO complete
- bgw_policy.sql: TODO complete
- continuous_aggs_materialize.sql: TODO complete
- compression.sql: TODO complete
- compression_algos.sql: TODO complete
tsl/src/
- compression/compression.c:
- row_compressor_decompress_row: Expected complete
- compression/dictionary.c: FIXME complete
- materialize.c: TODO complete
- reorder.c: TODO complete
- simple8b_rle.h:
- compressor_finish: Removed (obsolete)
src/
- extension.c: Removed due to age
- adts/simplehash.h: TODOs are from copied Postgres code
- adts/vec.h: TODO is non-significant
- planner.c: Removed
- process_utility.c
- process_altertable_end_subcmd: Removed (PG will handle case)
Drop foreign key constraints from uncompressed chunks during
compression. This allows data deletion in FK-referenced
tables to cascade to compressed chunks. The foreign key constraints
are restored during decompression.
This change refactors our main planner hooks in `planner.c` with the
intention of providing a consistent way to classify planned relations
across hooks. In our hooks, we'd like to know whether a planned
relation (`RelOptInfo`) is one of the following:
* Hypertable
* Hypertable child (a hypertable can appear as a child of itself)
* Chunk as a child of hypertable (from expansion)
* Chunk as standalone (operation directly on chunk)
* Any other relation
Previously, there was no way to consistently know which of these one
was dealing with. Instead, a mix of various functions was used without
"remembering" the classification for reuse in later sections of the
code.
When classifying relations according to the above categories, the only
source of truth about a relation is our catalog metadata. In case of
hypertables, this is cached in the hypertable cache. However, this
cache is read-through, so, in case of a cache miss, the metadata will
always be scanned to resolve a new entry. To avoid unnecessary
metadata scans, this change introduces a way to do cache-only
queries. This requires maintaining a single warmed cache throughout
planning and is enabled by using a planner-global cache object. The
pre-planning query processing warms the cache by populating it with
all hypertables in the to-be-planned query.
This change includes a major refactoring to support PostgreSQL
12. Note that many tests aren't passing at this point. Changes
include, but are not limited to:
- Handle changes related to table access methods
- New way to expand hypertables since expansion has changed in
PostgreSQL 12 (more on this below).
- Handle changes related to table expansion for UPDATE/DELETE
- Fixes for various TimescaleDB optimizations that were affected by
planner changes in PostgreSQL (gapfill, first/last, etc.)
Before PostgreSQL 12, planning was organized roughly as
follows:
1. construct `RelOptInfo` for base and appendrels
2. add restrict info, joins, etc.
3. perform the actual planning with `make_one_rel`
For our optimizations we would expand hypertables in the middle of
step 1; since nothing in the query planner before `make_one_rel` cared
about the inheritance children, we didn't have to be too precise
about where we were doing it.
However, with PG12, and the optimizations around declarative
partitioning, PostgreSQL now does care about when the children are
expanded, since it wants as much information as possible to perform
partition-pruning. Now planning is organized like:
1. construct RelOptInfo for base rels only
2. add restrict info, joins, etc.
3. expand appendrels, removing irrelevant declarative partitions
4. perform the actual planning with make_one_rel
Step 3 always expands appendrels, so when we also expand them during
step 1, the hypertable gets expanded twice, and things in the planner
break.
The changes to support PostgreSQL 12 attempt to solve this problem by
keeping the hypertable root marked as a non-inheritance table until
`make_one_rel` is called, and only then revealing to PostgreSQL that
it does in fact have inheritance children. While this strategy entails
the least code change on our end, the fact that the first hook we can
use to re-enable inheritance is `set_rel_pathlist_hook` entails
a number of annoyances:
1. this hook is called after the sizes of tables are calculated, so we
must recalculate the sizes of all hypertables, as they will not
have taken the chunk sizes into account
2. the table upon which the hook is called will have its paths planned
   under the assumption it has no inheritance children, so if it's a
   hypertable we have to replan its paths
Unfortunately, the code for doing these steps is static, so we need to
copy it into our own codebase, instead of just using PostgreSQL's.
In PostgreSQL 12, UPDATE/DELETE on inheritance relations have also
changed and are now planned in two stages:
- In stage 1, the statement is planned as if it was a `SELECT` and all
leaf tables are discovered.
- In stage 2, the original query is planned against each leaf table,
discovered in stage 1, directly, not part of an Append.
Unfortunately, this means we cannot look in the appendrelinfo during
UPDATE/DELETE planning, in particular to determine if a table is a
chunk, as the appendrelinfo is not initialized at the point we wish
to do so. This has consequences for how we identify operations on
chunks (sometimes for blocking and sometimes for enabling
functionality).
The test `continuous_aggs_ddl` failed on PostgreSQL 9.6 because it had
a line that tested compression on a hypertable when this feature is
not supported in 9.6. This prevented a large portion of the test
from running on 9.6.
This change moves the testing of compression on a continuous aggregate
to the `compression` test instead, which only runs on supported
PostgreSQL versions. A permission check on a view is also removed,
since similar tests are already in the `continuous_aggs_permissions`
tests.
The permission check was the only thing that caused different output
across PostgreSQL versions, so therefore the test no longer requires
version-specific output files and has been simplified to use the same
output file irrespective of PostgreSQL version.
When trying to compress a chunk that had a column of datatype
interval, delta-delta compression would be selected for the column,
but our delta-delta compression does not support interval and
would throw an error when trying to compress the chunk.
This PR changes the compression selected for interval to dictionary
compression.
This simplifies the code and the access to the min/max
metadata. Before we used a custom type, but now the min/max
are just the same type as the underlying column and stored as two
columns.
This also removes the custom type that was used before.
- Fix declaration of functions wrt TSDLLEXPORT consistency
- Empty structs need to be created with '{ 0 }' syntax.
- Alignment sentinels have to use uint64 instead of a struct
with a 0-size member
- Add some more ORDER BY clauses in the tests to constrain
the order of results
- Add ANALYZE after running compression in
transparent-decompression test
This commit pushes down quals on orderby columns to make
use of the SegmentMetaMinMax objects. Namely, =, <, <=, >, >= quals
can now be pushed down.
We also remove filters from the decompress node for quals that
have been pushed down and don't need a recheck.
This commit also changes tests to add more segmentby and
orderby columns.
Finally, we rename the segment meta accessor functions to be shorter.
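An illustrative query benefiting from the pushdown (names are not from the commit):

```sql
EXPLAIN SELECT * FROM metrics
WHERE device_id = 1 AND time >= '2020-01-01';
-- The time qual can be checked against the per-batch min/max
-- metadata, so non-matching compressed batches are filtered out
-- without decompression.
```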
Add a sequence id to the compressed table. This id increments
monotonically for each compressed row in a way that follows
the order by clause. We leave gaps to allow for the
possibility to fill in rows due to e.g. inserts down
the line.
The sequence id is global to the entire chunk and does not reset
for each segment-by-group change, since this has the potential
to allow some micro-optimizations when ordering by segmentby
columns as well.
The sequence number is an INT32, which allows up to about 200 billion
uncompressed rows per chunk to be supported (assuming 1000 rows
per compressed row and a gap of 10). Overflow is checked in the
code and will raise an error if breached.
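A rough capacity check of the stated limit (assuming ~1000 rows per compressed row and sequence gaps of 10):

```sql
SELECT (2147483647 / 10) * 1000 AS approx_max_uncompressed_rows;
-- about 214 billion, consistent with the ~200 billion figure above
```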
This commit integrates the SegmentMetaMinMax into the
compression logic. It adds metadata columns to the compressed table
and correctly sets it upon compression.
We also fix several errors with datum detoasting in SegmentMetaMinMax.
This patch adds a DecompressChunk custom scan node, which will be
used when querying hypertables with compressed chunks to transparently
decompress chunks.
This is useful if some or all compressed columns are NULL.
The count reflects the number of uncompressed rows that are
in the compressed row. Stored as a 32-bit integer.