106 Commits

Author SHA1 Message Date
gayyappan
4f865f7870 Add recompress_chunk function
After inserts go into a compressed chunk, the chunk is marked as
unordered. This PR adds a new function recompress_chunk that
compresses the data and sets the status back to compressed. Further
optimizations for this function are planned but not part of this PR.

This function can be invoked by calling
SELECT recompress_chunk(<chunk_name>).

The recompress_chunk function is automatically invoked by the
compression policy job when it sees that a chunk is in an unordered
state.
2021-05-24 18:03:47 -04:00
Sven Klemm
fc5f10d454 Remove chunk_dml_blocker trigger
Remove the chunk_dml_blocker trigger which was used to prevent
INSERTs into compressed chunks.
2021-05-24 18:03:47 -04:00
Sven Klemm
5f6e492474 Adjust pathkeys generation for unordered compressed chunks
Compressed chunks with inserts after being compressed have batches
that are not ordered according to compress_orderby. For those
chunks we cannot set pathkeys on the DecompressChunk node, and we
need an extra sort step if we require ordered output from those
chunks.
2021-05-24 18:03:47 -04:00
gayyappan
d9839b9b61 Support defaults, sequences, check constraints for compressed chunks
Support defaults, sequences and check constraints with inserts
into compressed chunks
2021-05-24 18:03:47 -04:00
gayyappan
93be235d33 Support for inserts into compressed hypertables
Add CompressRowSingleState.
This has functions to compress a single row.
2021-05-24 18:03:47 -04:00
Sven Klemm
eef71fdfb1 Replace StrNCpy with strlcpy
PG14 removes StrNCpy and some Name helper functions.

https://github.com/postgres/postgres/commit/1784f278a6
2021-05-20 08:54:54 +02:00
Markos Fountoulakis
bc740a32fb Add distributed hypertable compression policies
Add support for compression policies on Access Nodes. Extend the
compress_chunk() function to maintain compression state per chunk
on the Access Node.
2021-05-07 16:50:12 +03:00
Sven Klemm
d26c744115 Use %u to format Oid instead of %d
Since Oid is unsigned int we have to use %u to print it otherwise
oids >= 2^31 will not work correctly. This also switches the places
that print type oid to use format helper functions to resolve the
oids.
2021-04-14 21:11:20 +02:00
gayyappan
5be6a3e4e9 Support column rename for hypertables with compression enabled
ALTER TABLE <hypertable> RENAME <column_name> TO <new_column_name>
is now supported for hypertables that have compression enabled.

Note: Column renaming is not supported for distributed hypertables.
So this will not work on distributed hypertables that have
compression enabled.
2021-02-19 10:21:50 -05:00
Sven Klemm
8bc2568f7d Adjust compression code to PG13 AlterTable changes
PG13 changes the signature of AlterTable so we switch to using
AlterTableInternal instead.
2021-01-15 13:07:52 +01:00
gayyappan
f649736f2f Support ADD COLUMN for compressed hypertables
Support ALTER TABLE .. ADD COLUMN <colname> <typname>
for hypertables with compressed chunks.
2021-01-14 09:32:50 -05:00
Sven Klemm
002510cb01 Add compatibility wrapper functions for base64 encoding/decoding
PG13 adds a destination-length fourth argument to the pg_b64_decode
and pg_b64_encode functions, so this patch adds a macro that
translates to the 3-argument or 4-argument call depending on the
PostgreSQL version. This patch also adds checking of the return
values of those functions.

https://github.com/postgres/postgres/commit/cfc40d384a
2020-12-10 18:40:37 +01:00
gayyappan
bca1e35a52 Support disabling compression on distributed hypertables
This change allows a user to execute
ALTER TABLE hyper SET (timescaledb.compress = false)
on distributed hypertables.

Fixes #2716
2020-12-08 12:08:54 -05:00
gayyappan
d41aa2aff5 Rename macro TS_HYPERTABLE_HAS_COMPRESSION
This PR cleans up some macro names.
Rename macro TS_HYPERTABLE_HAS_COMPRESSION
to TS_HYPERTABLE_HAS_COMPRESSION_TABLE.
2020-12-02 15:44:48 -05:00
gayyappan
7c76fd4d09 Save compression settings on access node for distributed hypertables
1. Add a compression_state column to the hypertable catalog
by renaming the catalog's compressed column.
compression_state is a tri-state column.
This column indicates if the hypertable has
compression enabled (value = 1) or if it is an internal
compression table (value = 2).

2. Save compression settings on the access node when compression
is turned on for a distributed hypertable.
For a distributed hypertable that has compression enabled,
compression_state is set. We don't create any internal tables
on the access node.

Fixes #2660
2020-12-02 10:42:57 -05:00
gayyappan
05319cd424 Support analyze of internal compression table
This commit modifies analyze behavior as follows:
1. When an internal compression table is analyzed,
statistics from the compressed chunk (such as page
count and tuple count) are used to update the
statistics of the corresponding chunk parent if
they are missing.

2. Analyze compressed chunks instead of raw chunks.
When the command ANALYZE <hypertable> is executed,
a) uncompressed chunks are analyzed, and b) for compressed
chunks, the raw chunk is skipped and the compressed chunk
is analyzed instead.
2020-11-11 15:05:14 -05:00
Sven Klemm
97254783d4 Fix segfault in decompress_chunk for chunks with dropped columns
This patch fixes a segfault in decompress_chunk for chunks with dropped
columns. Since dropped columns don't exist in the compressed chunk,
the values for those columns were undefined in the decompressed tuple,
leading to a segfault when trying to build the heap tuple.
2020-11-10 10:13:45 +01:00
Brian Rowe
525e821055 Add missing increment for PG11 decompression
There is a bug in some versions of PG11 where ANALYZE does not
call CommandCounterIncrement. This causes us to fail to
update pg_class statistics during compression on those versions.
To work around this, this change adds an explicit
CommandCounterIncrement call after ExecVacuum in PG11.

Fixes #2581
2020-10-20 11:54:35 -07:00
Erik Nordström
3cf9c857c4 Make errors and messages conform to style guide
Errors and messages are overhauled to conform to the official
PostgreSQL style guide. In particular, the following things from the
guide have been given special attention:

* Correct capitalization of the first letter: capitalize only for
  hint and detail messages.
* Correct handling of periods at the end of messages (they should be
  elided for the primary message, but not for detail and hint
  messages).
* The primary message should be short, factual, and avoid reference to
  implementation details such as specific function names.

Some messages have also been reworded for clarity and to better
conform with the last bullet above (short primary message). In other
cases, messages have been updated to fix references to, e.g., function
parameters that used the wrong parameter name.

Closes #2364
2020-10-20 16:49:32 +02:00
Brian Rowe
5acf3343b5 Ensure reltuples are preserved during compression
This change captures the reltuples and relpages (and relallvisible)
statistics from the pg_class table for chunks immediately before
truncating them during the compression code path. It then restores
the values after truncating, as there is no way to keep PostgreSQL
from clearing these values during this operation. It also properly
uses these values during planning, working around some PostgreSQL
code which substitutes arbitrary sizing for tables which don't seem
to hold data.

Fixes #2524
2020-10-19 07:21:38 -07:00
Dmitry Simonenko
a51aa6d04b Move enterprise features to community
This patch removes enterprise license support and moves the
move_chunk() function under the community license (TSL).

The license validation code has been reworked and simplified.
The previously used timescaledb.license_key GUC has been renamed to
timescaledb.license.

This change also makes the testing code stricter about the license
in use. The Apache test suite can now test only Apache-licensed
functions.

Fixes #2359
2020-09-30 15:14:17 +03:00
Mats Kindahl
02ad8b4e7e Turn debug messages into DEBUG1
We have some debug messages that are printed as notices but are more
suitable at the `DEBUG1` level. This commit takes a notice about
indexes being added and turns it into a `DEBUG1` message.
2020-09-29 11:04:07 +02:00
Brian Rowe
8e1e6036af Preserve pg_stats on chunks before compression
This change will ensure that the pg_statistics on a chunk are
updated immediately prior to compression. It also ensures that
these stats are not overwritten as part of a global or
hypertable-targeted ANALYZE.

This addresses the issue that a chunk will no longer generate valid
statistics during an ANALYZE once the data has been moved to the
compressed table. Unfortunately, any compressed rows will not be
captured in the parent hypertable's pg_statistics, as there is no way
to change how PostgreSQL samples child tables in PG11.

This approach assumes that the compressed table remains static, which
is mostly correct in the current implementation (though it is
possible to remove compressed segments). Once we start allowing more
operations on compressed chunks, this solution will need to be
revisited. Note that in PG12 an approach leveraging table access
methods will not have a problem analyzing compressed tables.
2020-08-21 10:48:15 -07:00
Sven Klemm
7d230290b9 Remove unnecessary exports in tsl library
Since almost all the functions in the tsl library are accessed via
cross-module functions, there is no need to export the individual
functions.
2020-08-17 18:58:18 +02:00
Sven Klemm
0d5f1ffc83 Refactor compress chunk policy
This patch changes the compression policy to store its configuration
in the bgw_job table and removes the bgw_policy_compress_chunks table.
2020-07-30 19:58:37 +02:00
Oleg Smirnov
0e9f1ee9f5 Enable compression for tables with compound foreign key
When enabling compression on a hypertable, the existing
constraints are cloned to the new compressed hypertable.
During validation of the existing constraints, a loop
through the conkey array is performed, and the constraint name
is erroneously added to the list multiple times. This fix
moves the addition to the list outside the conkey loop.

Fixes #2000
2020-07-02 12:22:30 +02:00
gayyappan
b93b30b0c2 Add counts to compression statistics
Store information related to compressed and uncompressed row
counts after compressing a chunk. This is saved in
compression_chunk_size table.
2020-06-19 15:58:04 -04:00
Sven Klemm
c90397fd6a Remove support for PG9.6 and PG10
This patch removes code support for PG9.6 and PG10. In addition to
removing the PG96 and PG10 macros, the following changes are made:

- remove HAVE_INT64_TIMESTAMP since this is always true on PG10+
- remove PG_VERSION_SUPPORTS_MULTINODE
2020-06-02 23:48:35 +02:00
Stephen Polcyn
d1aacdccad Change compression locking order
This patch changes the order in which locks are taken during
compression to avoid taking strong locks for long periods on referenced
tables.

Previously, constraints from the uncompressed chunk were copied to the
compressed chunk before compressing the data. When the uncompressed
chunk had foreign key constraints, this resulted in a
ShareRowExclusiveLock being held on the referenced table for the
remainder of the transaction, which includes the (potentially long)
period while the data is compressed, and prevented any
INSERTs/UPDATEs/DELETEs on the referenced table for the entire time
the compression transaction took to complete.

Copying constraints after completing the actual data compression does
not pose safety issues (as any updates to referenced keys are caught by
the FK constraint on the uncompressed chunk), and it enables the
compression job to minimize the time during which strong locks are held
on referenced tables.

Fixes #1614.
2020-06-01 16:16:05 -04:00
Erik Nordström
ccc1018f44 Fix various linter-found issues
This fixes various issues found by clang-tidy. A number of
compression-related issues still remain, however.
2020-05-29 14:04:25 +02:00
Erik Nordström
1dd9314f4d Improve linting support with clang-tidy
This change replaces the existing `clang-tidy` linter target with
CMake's built-in support for it. The old way of invoking the linter
relied on the `run-clang-tidy` wrapper script, which is not installed
by default on some platforms. Discovery of the `clang-tidy` tool has
also been improved to work with more installation locations.

As a result, linting now happens at compile time and is enabled
automatically when `clang-tidy` is installed and found.

In enabling `clang-tidy`, several non-trivial issues were discovered
in compression-related code. These might be false positives, but,
until a proper solution can be found, "warnings-as-errors" have been
disabled for that code to allow compilation to succeed with the linter
enabled.
2020-05-29 14:04:25 +02:00
Erik Nordström
686860ea23 Support compression on distributed hypertables
Initial support for compression on distributed hypertables. This
_only_ includes the ability to run `compress_chunk` and
`decompress_chunk` on a distributed hypertable. There is no support
for automation, at least not beyond what one can do individually on
each data node.

Note that an access node keeps no local metadata about which
distributed hypertables have compressed chunks. This information needs
to be fetched directly from data nodes, although such functionality is
not yet implemented. For example, informational views on the access
nodes will not yet report the correct compression states for
distributed hypertables.
2020-05-27 17:31:09 +02:00
Erik Nordström
33f1601e6f Handle constraints, triggers, and indexes on distributed hypertables
In distributed hypertables, chunks are foreign tables and such tables
do not support (or should not support) indexes, certain constraints,
and triggers. Therefore, such objects should not recurse to foreign
table chunks nor add mappings in the `chunk_constraint` or
`chunk_index` tables.

This change ensures that we properly filter out the indexes, triggers,
and constraints that should not recurse to chunks on distributed
hypertables.
2020-05-27 17:31:09 +02:00
Erik Nordström
596be8cda1 Add mappings table for remote chunks
A frontend node will now maintain mappings from a local chunk to the
corresponding remote chunks in a `chunk_server` table.

The frontend creates local chunks as foreign tables and adds entries
to `chunk_server` for each chunk it creates on a remote data node.

Currently, the creation of remote chunks is not implemented, so a
dummy chunk_id for the remote chunk will be added instead for testing
purposes.
2020-05-27 17:31:09 +02:00
Stephen Polcyn
b57d2ac388 Cleanup TODOs and FIXMEs
Unless otherwise listed, the TODO was converted to a comment or put
into an issue tracker.

test/sql/
- triggers.sql: Made required change

tsl/test/
- CMakeLists.txt: TODO complete
- bgw_policy.sql: TODO complete
- continuous_aggs_materialize.sql: TODO complete
- compression.sql: TODO complete
- compression_algos.sql: TODO complete

tsl/src/
- compression/compression.c:
  - row_compressor_decompress_row: Expected complete
- compression/dictionary.c: FIXME complete
- materialize.c: TODO complete
- reorder.c: TODO complete
- simple8b_rle.h:
  - compressor_finish: Removed (obsolete)

src/
- extension.c: Removed due to age
- adts/simplehash.h: TODOs are from copied Postgres code
- adts/vec.h: TODO is non-significant
- planner.c: Removed
- process_utility.c
  - process_altertable_end_subcmd: Removed (PG will handle case)
2020-05-18 20:16:03 -04:00
Erik Nordström
28e9a443b3 Improve handling of "dropped" chunks
The internal chunk API is updated to avoid returning `Chunk` objects
that are marked `dropped=true` along with some refactoring, hardening,
and cleanup of the internal chunk APIs. In particular, apart from
being returned in a dropped state, chunks could also be returned in a
partial state (without all fields set, partial constraints,
etc.). None of this is allowed as of this change. Further, lock
handling was unclear when joining chunk metadata from different
catalog tables. This is made clear by having chunks built within
nested scan loops so that proper locks are held when joining in
additional metadata (such as constraints).

This change also fixes issues with dropped chunks that caused chunk
metadata to be processed many times instead of just once, leading to
potential bugs or bad performance.

In particular, since the introduction of the “dropped” flag, chunk
metadata can exist in two states: 1. `dropped=false`
2. `dropped=true`. When dropping chunks (e.g., via `drop_chunks`,
`DROP TABLE <chunk>`, or `DROP TABLE <hypertable>`) there are also two
modes of dropping: 1. DELETE row and 2. UPDATE row and SET
dropped=true.

The deletion mode and the current state of chunk lead to a
cross-product resulting in 4 cases when dropping/deleting a chunk:

1. DELETE row when dropped=false
2. DELETE row when dropped=true
3. UPDATE row when dropped=false
4. UPDATE row when dropped=true

Unfortunately, the code didn't distinguish between these cases. In
particular, case (4) should not be able to happen, but since it did,
it led to a recursion loop where an UPDATE created a new tuple that
was then recursed to in the same loop, and so on.

To fix this recursing loop and make the code for dropping chunks less
error prone, a number of assertions have been added, including some
new light-weight scan functions to access chunk information without
building a full-blown chunk.

This change also removes the need to provide the number of constraints
when scanning for chunks. This was really just a hint anyway, and it
is no longer needed since all constraints are joined in regardless.
2020-04-28 13:49:14 +02:00
Erik Nordström
0e9461251b Silence various compiler warnings
This change fixes various compiler warnings that show up on different
compilers and platforms. In particular, MSVC is sensitive to functions
that do not return a value after throwing an error since it doesn't
realize that the code path is not reachable.
2020-04-27 15:02:18 +02:00
Ruslan Fomkin
16897d2238 Drop FK constraints on chunk compression
Drop foreign key constraints from uncompressed chunks during
compression. This allows cascading data deletion in FK-referenced
tables to compressed chunks. The foreign key constraints are restored
during decompression.
2020-04-14 23:12:15 +02:00
Ruslan Fomkin
ed32d093dc Use table_open/close and PG aggregated directive
Fix more places to use table_open and table_close, introduced in
PG12. Unify PG version directives to use the aggregated macro.
2020-04-14 23:12:15 +02:00
Erik Nordström
36af23ec94 Use flags for cache query options
Cache queries support multiple optional behaviors, such as "missing
ok" (do not fail on cache miss) and "no create" (do not create a new
entry if one doesn't exist in the cache). With multiple boolean
parameters, the query API has become unwieldy so this change turns
these booleans into one flag parameter.
2020-04-14 23:12:15 +02:00
Ruslan Fomkin
1ddc62eb5f Refactor header inclusion
Correct conditions in #ifdefs, add missing includes, remove
and rearrange existing includes, and replace PG12 with PG12_GE for
forward compatibility. Also fix a number of places to use table_close
instead of relation_close, which were missed earlier.
2020-04-14 23:12:15 +02:00
Ruslan Fomkin
e57ee45fcf Replace general relation_open with specific
relation_open is a general function, which is called from more
specific functions per relation type. This commit replaces such
calls with the specific functions, which check for the correct
relation type.
2020-04-14 23:12:15 +02:00
Joshua Lockerman
949b88ef2e Initial support for PostgreSQL 12
This change includes a major refactoring to support PostgreSQL
12. Note that many tests aren't passing at this point. Changes
include, but are not limited to:

- Handle changes related to table access methods
- New way to expand hypertables since expansion has changed in
  PostgreSQL 12 (more on this below).
- Handle changes related to table expansion for UPDATE/DELETE
- Fixes for various TimescaleDB optimizations that were affected by
  planner changes in PostgreSQL (gapfill, first/last, etc.)

Before PostgreSQL 12, planning was organized something like as
follows:

 1. construct and add `RelOptInfo`s for base and appendrels
 2. add restrict info, joins, etc.
 3. perform the actual planning with `make_one_rel`

For our optimizations we would expand hypertables in the middle of
step 1; since nothing in the query planner before `make_one_rel` cared
about the inheritance children, we didn’t have to be too precise
about where we were doing it.

However, with PG12, and the optimizations around declarative
partitioning, PostgreSQL now does care about when the children are
expanded, since it wants as much information as possible to perform
partition-pruning. Now planning is organized like:

 1. construct and add RelOptInfo for base rels only
 2. add restrict info, joins, etc.
 3. expand appendrels, removing irrelevant declarative partitions
 4. perform the actual planning with make_one_rel

Step 3 always expands appendrels, so when we also expand them during
step 1, the hypertable gets expanded twice, and things in the planner
break.

The changes to support PostgreSQL 12 attempt to solve this problem by
keeping the hypertable root marked as a non-inheritance table until
`make_one_rel` is called, and only then revealing to PostgreSQL that
it does in fact have inheritance children. While this strategy entails
the least code change on our end, the fact that the first hook we can
use to re-enable inheritance is `set_rel_pathlist_hook` does entail
a number of annoyances:

 1. this hook is called after the sizes of tables are calculated, so we
    must recalculate the sizes of all hypertables, as they will not
    have taken the chunk sizes into account
 2. the table upon which the hook is called will have its paths planned
    under the assumption it has no inheritance children, so if it's a
    hypertable we have to replan its paths

Unfortunately, the PostgreSQL functions for doing these steps are
static, so we need to copy them into our own codebase instead of just
calling PostgreSQL's.

In PostgreSQL 12, UPDATE/DELETE on inheritance relations have also
changed and are now planned in two stages:

- In stage 1, the statement is planned as if it was a `SELECT` and all
  leaf tables are discovered.
- In stage 2, the original query is planned against each leaf table,
  discovered in stage 1, directly, not part of an Append.

Unfortunately, this means we cannot look in the appendrelinfo during
UPDATE/DELETE planning, in particular to determine if a table is a
chunk, as the appendrelinfo is not initialized at the point we wish
to do so. This has consequences for how we identify operations on
chunks (sometimes for blocking and sometimes for enabling
functionality).
2020-04-14 23:12:15 +02:00
Erik Nordström
a4fb0cec3f Cleanup compression-related errors
This change fixes a number of typos and issues with inconsistent
formatting for compression-related code. A couple of other fixes for
variable names, etc. have also been applied.
2020-03-11 13:27:16 +01:00
Sven Klemm
030443a8e2 Fix compressing interval columns
When trying to compress a chunk that had a column of datatype
interval, delta-delta compression would be selected for the column,
but our delta-delta compression does not support interval and
would throw an error when trying to compress the chunk.

This PR changes the compression selected for interval to dictionary
compression.
2020-03-06 21:44:31 +01:00
gayyappan
565cca795a Support disabling compression when foreign keys are present
Fix a failure with disabling compression when no
compressed chunks are present and the table has foreign
key constraints.
2020-02-14 08:54:58 -05:00
Ruslan Fomkin
4dc0693d1f Unify error message if hypertable not found
Refactors multiple implementations of finding hypertables in cache
and failing with different error messages if not found. The
implementations are replaced with calling functions, which encapsulate
a single error message. This provides a unified error message and
removes the need for copy-paste.
2020-01-29 08:10:27 +01:00
gayyappan
b1b840f00e Use timescaledb prefix for compression errors
Modify compression parameter related error messages
to make them consistent.
2020-01-28 09:08:58 -05:00
Matvey Arye
d52b48e0c3 Delete compression policy when drop hypertable
Previously we could have a dangling policy and job referring
to a now-dropped hypertable.

We also block changing the compression options if a policy exists.

Fixes #1570
2020-01-02 16:40:59 -05:00
Matvey Arye
6122e08fcb Fix error in compression constraint check
The constraint check previously assumed that the col_meta
offset for a column was equal to that column's attribute
offset. This is incorrect in the presence of dropped columns.

Fixed to match on column names.

Fixes #1590
2020-01-02 13:56:46 -05:00