This commit modifies analyze behavior as follows:
1. When an internal compression table is analyzed,
statistics from the compressed chunk (such as page
count and tuple count) are used to update the
statistics of the corresponding chunk parent, if
they are missing.
2. Analyze compressed chunks instead of raw chunks.
When the command ANALYZE <hypertable> is executed,
a) uncompressed chunks are analyzed as usual, and
b) for compressed chunks, the raw chunk is skipped and the
corresponding compressed chunk is analyzed instead (see the
example below).
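A minimal illustration of the new behavior (the hypertable name
`metrics` is an assumption for this sketch):

```sql
-- Assumed hypertable "metrics" with a mix of compressed and uncompressed
-- chunks. Uncompressed chunks are analyzed directly; for compressed chunks
-- the raw chunk is skipped and its compressed counterpart is analyzed.
ANALYZE metrics;
```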
This patch fixes a segfault in decompress_chunk for chunks with dropped
columns. Since dropped columns don't exist in the compressed chunk,
the values for those columns were undefined in the decompressed tuple,
leading to a segfault when trying to build the heap tuple.
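A sketch of the previously failing scenario (table and column names are
illustrative):

```sql
-- Drop a column, then compress and decompress a chunk; decompression
-- previously segfaulted because the dropped column had no value in the
-- compressed chunk.
CREATE TABLE metrics(time timestamptz NOT NULL, device int, temp float, junk int);
SELECT create_hypertable('metrics', 'time');
ALTER TABLE metrics DROP COLUMN junk;
ALTER TABLE metrics SET (timescaledb.compress, timescaledb.compress_segmentby = 'device');
INSERT INTO metrics VALUES ('2020-01-01', 1, 22.5);
SELECT compress_chunk(c) FROM show_chunks('metrics') c;
SELECT decompress_chunk(c) FROM show_chunks('metrics') c;
```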
There is a bug in some versions of PG11 where ANALYZE was not
calling CommandCounterIncrement. This is causing us to fail to
update pg_class statistics during compression for those versions.
To work around this, this change adds an explicit
CommandCounterIncrement call after ExecVacuum in PG11.
Fixes #2581
Errors and messages are overhauled to conform to the official
PostgreSQL style guide. In particular, the following things from the
guide have been given special attention:
* Correct capitalization of the first letter: capitalize only hint
and detail messages.
* Correct handling of periods at the end of messages (should be elided
for primary message, but not detail and hint messages).
* The primary message should be short, factual, and avoid reference to
implementation details such as specific function names.
Some messages have also been reworded for clarity and to better
conform with the last bullet above (short primary message). In other
cases, messages have been updated to fix references to, e.g., function
parameters that used the wrong parameter name.
Closes #2364
This change captures the reltuples and relpages (and relallvisible)
statistics from the pg_class table for chunks immediately before
truncating them during the compression code path. It then restores
the values after truncating, as there is no way to keep PostgreSQL
from clearing these values during this operation. It also uses these
values properly during planning, working around some PostgreSQL code
that substitutes arbitrary sizes for tables that do not appear to
hold any data.
Fixes #2524
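For reference, these are the pg_class columns preserved across the
truncate by the change above (the chunk name below is illustrative):

```sql
SELECT relname, relpages, reltuples, relallvisible
FROM pg_class
WHERE relname = '_hyper_1_1_chunk';  -- example chunk name
```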
This patch removes enterprise license support and moves the
move_chunk() function under the community license (TSL).
The license validation code has been reworked and simplified.
The previously used timescaledb.license_key GUC has been renamed to
timescaledb.license.
This change also makes the testing code stricter about the license
in use. The Apache test suite can now test only Apache-licensed
functions.
Fixes #2359
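A quick sketch of the renamed GUC; the accepted values shown in the
comments are assumptions based on this change:

```sql
-- Inspect the current license setting.
SHOW timescaledb.license;
-- In postgresql.conf, e.g.:
--   timescaledb.license = 'timescale'   -- TSL/community functionality enabled
--   timescaledb.license = 'apache'      -- Apache-licensed functionality only
```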
We have some debug messages that are printed as notices but are more
suitable for the `DEBUG1` level. This commit demotes a notice about
indexes being added to a `DEBUG1` message.
This change will ensure that the pg_statistics on a chunk are
updated immediately prior to compression. It also ensures that
these stats are not overwritten as part of a global or hypertable
targeted ANALYZE.
This addresses the issue that a chunk will no longer generate valid
statistics during an ANALYZE once its data has been moved to the
compressed table. Unfortunately, any compressed rows will not be
captured in the parent hypertable's pg_statistics as there is no way
to change how PostgreSQL samples child tables in PG11.
This approach assumes that the compressed table remains static, which
is mostly correct in the current implementation (though it is
possible to remove compressed segments). Once we start allowing more
operations on compressed chunks this solution will need to be
revisited. Note that in PG12 an approach leveraging table access
methods will not have a problem analyzing compressed tables.
When enabling compression on a hypertable, the existing
constraints are cloned to the new compressed hypertable.
During validation of the existing constraints, a loop
through the conkey array is performed, and the constraint name
was erroneously added to the list multiple times. This fix
moves the addition to the list outside the conkey loop.
Fixes #2000
This patch removes code support for PG9.6 and PG10. In addition to
removing the PG96 and PG10 macros, the following changes are made:
- remove HAVE_INT64_TIMESTAMP since this is always true on PG10+
- remove PG_VERSION_SUPPORTS_MULTINODE
This patch changes the order in which locks are taken during
compression to avoid taking strong locks for long periods on referenced
tables.
Previously, constraints from the uncompressed chunk were copied to the
compressed chunk before compressing the data. When the uncompressed
chunk had foreign key constraints, this resulted in a
ShareRowExclusiveLock being held on the referenced table for the
remainder of the transaction, which includes the (potentially long)
period while the data is compressed, and prevented any
INSERTs/UPDATEs/DELETEs on the referenced table during the remainder of
the time it took the compression transaction to complete.
Copying constraints after completing the actual data compression does
not pose safety issues (as any updates to referenced keys are caught by
the FK constraint on the uncompressed chunk), and it enables the
compression job to minimize the time during which strong locks are held
on referenced tables.
Fixes #1614
This change replaces the existing `clang-tidy` linter target with
CMake's built-in support for it. The old way of invoking the linter
relied on the `run-clang-tidy` wrapper script, which is not installed
by default on some platforms. Discovery of the `clang-tidy` tool has
also been improved to work with more installation locations.
As a result, linting now happens at compile time and is enabled
automatically when `clang-tidy` is installed and found.
In enabling `clang-tidy`, several non-trivial issues were discovered
in compression-related code. These might be false positives, but,
until a proper solution can be found, "warnings-as-errors" has been
disabled for that code to allow compilation to succeed with the linter
enabled.
Initial support for compression on distributed hypertables. This
_only_ includes the ability to run `compress_chunk` and
`decompress_chunk` on a distributed hypertable. There is no support
for automation, at least not beyond what one can do individually on
each data node.
Note that an access node keeps no local metadata about which
distributed hypertables have compressed chunks. This information needs
to be fetched directly from data nodes, although such functionality is
not yet implemented. For example, informational views on the access
nodes will not yet report the correct compression states for
distributed hypertables.
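A hedged usage sketch (the distributed hypertable name `dist_metrics`
is an assumption); there is no policy support yet, so chunks are
compressed explicitly from the access node:

```sql
ALTER TABLE dist_metrics SET (timescaledb.compress,
                              timescaledb.compress_orderby = 'time DESC');
SELECT compress_chunk(c) FROM show_chunks('dist_metrics') c;
```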
In distributed hypertables, chunks are foreign tables and such tables
do not support (or should not support) indexes, certain constraints,
and triggers. Therefore, such objects should not recurse to foreign
table chunks nor add mappings in the `chunk_constraint` or
`chunk_index` tables.
This change ensures that we properly filter out the indexes, triggers,
and constraints that should not recurse to chunks on distributed
hypertables.
A frontend node will now maintain mappings from a local chunk to the
corresponding remote chunks in a `chunk_server` table.
The frontend creates local chunks as foreign tables and adds entries
to `chunk_server` for each chunk it creates on a remote data node.
Currently, the creation of remote chunks is not implemented, so a
dummy chunk_id for the remote chunk will be added instead for testing
purposes.
Unless otherwise listed, each TODO was converted to a comment or put
into an issue tracker.
test/sql/
- triggers.sql: Made required change
tsl/test/
- CMakeLists.txt: TODO complete
- bgw_policy.sql: TODO complete
- continuous_aggs_materialize.sql: TODO complete
- compression.sql: TODO complete
- compression_algos.sql: TODO complete
tsl/src/
- compression/compression.c:
- row_compressor_decompress_row: Expected complete
- compression/dictionary.c: FIXME complete
- materialize.c: TODO complete
- reorder.c: TODO complete
- simple8b_rle.h:
- compressor_finish: Removed (obsolete)
src/
- extension.c: Removed due to age
- adts/simplehash.h: TODOs are from copied Postgres code
- adts/vec.h: TODO is non-significant
- planner.c: Removed
- process_utility.c
- process_altertable_end_subcmd: Removed (PG will handle case)
The internal chunk API is updated to avoid returning `Chunk` objects
that are marked `dropped=true`, along with some refactoring, hardening,
and cleanup of the internal chunk APIs. In particular, apart from
being returned in a dropped state, chunks could also be returned in a
partial state (without all fields set, partial constraints,
etc.). None of this is allowed as of this change. Further, lock
handling was unclear when joining chunk metadata from different
catalog tables. This is made clear by having chunks built within
nested scan loops so that proper locks are held when joining in
additional metadata (such as constraints).
This change also fixes issues with dropped chunks that caused chunk
metadata to be processed many times instead of just once, leading to
potential bugs or bad performance.
In particular, since the introduction of the `dropped` flag, chunk
metadata can exist in two states:
1. `dropped=false`
2. `dropped=true`
When dropping chunks (e.g., via `drop_chunks`, `DROP TABLE <chunk>`,
or `DROP TABLE <hypertable>`) there are also two modes of dropping:
1. DELETE the row
2. UPDATE the row and SET dropped=true
The deletion mode and the current state of the chunk lead to a
cross-product of four cases when dropping/deleting a chunk:
1. DELETE row when dropped=false
2. DELETE row when dropped=true
3. UPDATE row when dropped=false
4. UPDATE row when dropped=true
Unfortunately, the code didn't distinguish between these cases. In
particular, case (4) should not be able to happen, but since it did, it
led to a recursive loop in which an UPDATE created a new tuple that the
same loop then recursed into, and so on.
To fix this recursive loop and make the code for dropping chunks less
error-prone, a number of assertions have been added, along with some
new lightweight scan functions to access chunk information without
building a full-blown chunk.
This change also removes the need to provide the number of constraints
when scanning for chunks. This was really just a hint, and it is no
longer needed since all constraints are joined in regardless.
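For reference, the two chunk states discussed above are visible in the
internal catalog (the column list below is abbreviated):

```sql
SELECT id, hypertable_id, schema_name, table_name, dropped
FROM _timescaledb_catalog.chunk;
```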
This change fixes various compiler warnings that show up on different
compilers and platforms. In particular, MSVC is sensitive to functions
that do not return a value after throwing an error since it doesn't
realize that the code path is not reachable.
Drop foreign key constraints from uncompressed chunks during
compression. This allows data deletion in FK-referenced tables to
cascade to compressed chunks. The foreign key constraints are restored
during decompression.
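A hedged sketch of the resulting behavior (table and column names are
illustrative; on the compressed side, foreign keys are limited to
segment_by columns):

```sql
CREATE TABLE devices(id int PRIMARY KEY);
CREATE TABLE metrics(time timestamptz NOT NULL,
                     device_id int REFERENCES devices(id) ON DELETE CASCADE,
                     temp float);
SELECT create_hypertable('metrics', 'time');
ALTER TABLE metrics SET (timescaledb.compress,
                         timescaledb.compress_segmentby = 'device_id');
-- ... insert data and compress chunks ...
-- Per this change, the delete below also cascades to rows stored in
-- compressed chunks.
DELETE FROM devices WHERE id = 1;
```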
Cache queries support multiple optional behaviors, such as "missing
ok" (do not fail on cache miss) and "no create" (do not create a new
entry if one doesn't exist in the cache). With multiple boolean
parameters, the query API has become unwieldy so this change turns
these booleans into one flag parameter.
Correcting conditions in #ifdefs, adding missing includes, removing
and rearranging existing includes, replacing PG12 with PG12_GE for
forward compatibility. Fixed a number of places where relation_close
should have been table_close, which were missed earlier.
relation_open is a general function that is wrapped by more specific
functions per relation type. This commit replaces calls to it with the
specific functions, which check for the correct relation type.
This change includes a major refactoring to support PostgreSQL
12. Note that many tests aren't passing at this point. Changes
include, but are not limited to:
- Handle changes related to table access methods
- New way to expand hypertables since expansion has changed in
PostgreSQL 12 (more on this below).
- Handle changes related to table expansion for UPDATE/DELETE
- Fixes for various TimescaleDB optimizations that were affected by
planner changes in PostgreSQL (gapfill, first/last, etc.)
Before PostgreSQL 12, planning was organized roughly as follows:
1. construct and add `RelOptInfo`s for base rels and appendrels
2. add restrict info, joins, etc.
3. perform the actual planning with `make_one_rel`
For our optimizations we would expand hypertables in the middle of
step 1; since nothing in the query planner before `make_one_rel` cared
about the inheritance children, we didn’t have to be too precise
about where we were doing it.
However, with PG12, and the optimizations around declarative
partitioning, PostgreSQL now does care about when the children are
expanded, since it wants as much information as possible to perform
partition-pruning. Now planning is organized like:
1. construct and add RelOptInfo for base rels only
2. add restrict info, joins, etc.
3. expand appendrels, removing irrelevant declarative partitions
4. perform the actual planning with make_one_rel
Step 3 always expands appendrels, so when we also expand them during
step 1, the hypertable gets expanded twice, and things in the planner
break.
The changes to support PostgreSQL 12 attempt to solve this problem by
keeping the hypertable root marked as a non-inheritance table until
`make_one_rel` is called, and only then revealing to PostgreSQL that
it does in fact have inheritance children. While this strategy entails
the least code change on our end, the fact that the first hook we can
use to re-enable inheritance is `set_rel_pathlist_hook` entails
a number of annoyances:
1. this hook is called after the sizes of tables are calculated, so we
must recalculate the sizes of all hypertables, as they will not
have taken the chunk sizes into account
2. the table upon which the hook is called will have its paths planned
under the assumption it has no inheritance children, so if it's a
hypertable we have to replan its paths
Unfortunately, the PostgreSQL functions for doing this are static, so
we need to copy them into our own codebase instead of just calling
PostgreSQL's.
In PostgreSQL 12, UPDATE/DELETE on inheritance relations have also
changed and are now planned in two stages:
- In stage 1, the statement is planned as if it was a `SELECT` and all
leaf tables are discovered.
- In stage 2, the original query is planned directly against each leaf
table discovered in stage 1, not as part of an Append.
Unfortunately, this means we cannot look in the appendrelinfo during
UPDATE/DELETE planning, in particular to determine if a table is a
chunk, as the appendrelinfo is not yet initialized at the point we
wish to do so. This has consequences for how we identify operations on
chunks (sometimes for blocking and sometimes for enabling
functionality).
This change fixes a number of typos and issues with inconsistent
formatting for compression-related code. A couple of other fixes for
variable names, etc. have also been applied.
When trying to compress a chunk that had a column of datatype
interval, delta-delta compression would be selected for the column,
but our delta-delta compression does not support interval and
would throw an error when trying to compress the chunk.
This PR changes the compression selected for interval columns to
dictionary compression.
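A minimal sketch of the previously failing case (names are
illustrative):

```sql
CREATE TABLE events(time timestamptz NOT NULL, duration interval);
SELECT create_hypertable('events', 'time');
ALTER TABLE events SET (timescaledb.compress, timescaledb.compress_orderby = 'time');
INSERT INTO events VALUES ('2020-01-01', interval '5 minutes');
-- Previously errored because delta-delta was chosen for the interval
-- column; dictionary compression is now used instead.
SELECT compress_chunk(c) FROM show_chunks('events') c;
```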
Refactors multiple implementations of finding hypertables in the cache
and failing with different error messages if not found. The
implementations are replaced with calls to functions that encapsulate
a single error message. This provides a unified error message and
removes the need for copy-paste.
Previously we could have a dangling policy and job referring
to a now-dropped hypertable.
We also block changing the compression options if a policy exists.
Fixes #1570
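A hedged sketch of the second point; the policy function name and
arguments below reflect this era and are assumptions:

```sql
SELECT add_compress_chunks_policy('metrics', INTERVAL '7 days');
-- With the policy in place, changing compression options is now blocked:
ALTER TABLE metrics SET (timescaledb.compress,
                         timescaledb.compress_segmentby = 'device');  -- errors
```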
The constraint check previously assumed that the col_meta
offset for a column was equal to that column's attribute
offset. This is incorrect in the presence of dropped columns.
Fixed to match on column names.
Fixes #1590
If a chunk is dropped but has a continuous aggregate that is
not dropped, we want to preserve the chunk catalog row instead of
deleting it. This is to prevent dangling identifiers in the
materialization hypertable. It also preserves the dimension slice
and chunk constraint rows for the chunk, since those will be necessary
when enabling this with multinode and are necessary to recreate the
chunk. The postgres objects associated with the chunk are all
dropped (table, constraints, indexes).
If data is ever reinserted to the same data region, the chunk is
recreated with the same dimension definitions as before. The postgres
objects are simply recreated.
Previously, the Chunk struct was used to represent both a full
chunk and the stub used for joins. The stub used for joins
only contained valid values for some chunk fields and not others.
After the join determined that a Chunk was complete, it filled
in the rest of the chunk's fields. The fact that a chunk could have
only some fields filled out and not others at different times
made the code hard to follow and error prone.
So we separate out the stub state of the chunk into a separate
struct that doesn't contain the not-filled-out fields inside
of it. This leverages the type system to prevent errors that
try to access invalid fields during the join phase and makes
the code easier to follow.
We want compressed data to be stored out-of-line whenever possible so
that the headers are colocated and scans on the metadata and segmentbys
are cheap. This commit lowers toast_tuple_target to 128 bytes so that
this happens for more tables; with the default size, a non-trivial
portion of the data very often ends up in the main table, and only
very few rows are stored in a page.
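The effect corresponds to setting this storage parameter on the
internal compressed table; shown here on a standalone table purely for
illustration (the table name is an assumption):

```sql
-- toast_tuple_target is a standard PostgreSQL storage parameter (PG11+);
-- 128 bytes is its minimum allowed value.
ALTER TABLE example_compressed SET (toast_tuple_target = 128);
```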
This commit adds tests for DATE, TIMESTAMP, and FLOAT compression and
decompression, NULL compression and decompression in dictionaries and
fixes a bug where the database would refuse to decompress DATEs. This
commit also removes the fallback allowing any binary compatible 8-byte
types to be compressed by our integer compressors as I believe I found
a bug in said fallback last time I reviewed it, and cannot recall what
the bug was. These can be re-added later, with appropriate tests.
Queries with the first/last optimization on compressed chunks
would not properly decompress data but instead access the uncompressed
chunk. This patch fixes the behaviour and also unifies the check for
whether a hypertable has compression.
This commit fixes issues reported by coverity. Of these, the only real
issue is an integer overflow in bitarray, which can never happen in its
current usages. This also adds a PG_USED_FOR_ASSERTS_ONLY for a
variable only used for Assert.
Since enabling compression places restrictions on the hypertable
(e.g. the types of constraints allowed), even if there are no
compressed chunks, we add the ability to turn off compression.
This is only possible if there are no compressed chunks.
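A minimal sketch (the hypertable name is an assumption):

```sql
-- Allowed only while no compressed chunks exist.
ALTER TABLE metrics SET (timescaledb.compress = false);
```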
This commit improves the API of compress_chunk and decompress_chunk
(see the example below):
- have it return the regclass of the chunk processed (or NULL in the
idempotent case)
- mark it as STRICT
- add if_not_compressed/if_compressed options for idempotency
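A hedged usage sketch of the updated interface (the hypertable name is
an assumption):

```sql
-- Returns each chunk's regclass, or NULL for chunks that were already
-- compressed (instead of raising an error).
SELECT compress_chunk(c, if_not_compressed => true)
FROM show_chunks('metrics') c;
```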
Some small improvements:
- allow ALTER TABLE with an empty segment_by if the original definition
had an empty segment_by; improve error messages
- block compression on tables with OIDs
- block compression on tables with RLS
For tablespaces with compressed chunks the semantics are the following:
- compressed chunks get put into the same tablespace as the
uncompressed chunk on compression.
- set tablespace on an uncompressed hypertable cascades to the compressed hypertable+chunks
- set tablespace on all chunks is blocked (same as w/o compression)
- move chunks on an uncompressed chunk errors
- move chunks on a compressed chunk works
In the future we will:
- add tablespace option to compress_chunk function and policy (this will override the setting
of the uncompressed chunk). This will allow changing tablespaces upon compression
- Note: The current plan is to never listen to the setting on compressed hypertable. In fact,
we will block setting tablespace on compressed hypertables
The statistics on segmentby and metadata columns are very important as
they describe data that expands roughly a thousand-fold when
decompressed. Statistics on the compressed columns are irrelevant, as
the regular postgres planner cannot understand the compressed
columns. This commit sets the statistics for compressed tables based
on this, weighting the uncompressed columns heavily and the compressed
columns not at all.
Primary and unique constraints are limited to segment_by and order_by
columns, and foreign key constraints are limited to segment_by columns,
when creating a compressed hypertable. There are no restrictions on
check constraints.
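A minimal sketch of a definition that satisfies these rules (names are
illustrative):

```sql
CREATE TABLE readings(time timestamptz NOT NULL, device int, val float,
                      UNIQUE (device, time));
SELECT create_hypertable('readings', 'time');
-- The unique constraint's columns are covered by segment_by + order_by.
ALTER TABLE readings SET (timescaledb.compress,
                          timescaledb.compress_segmentby = 'device',
                          timescaledb.compress_orderby = 'time');
```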
This simplifies the code and the access to the min/max
metadata. Before, we used a custom type; now the min/max values
are just the same type as the underlying column and are stored as two
columns.
This also removes the custom type that was used before.