91 Commits

Author SHA1 Message Date
gayyappan
05319cd424 Support analyze of internal compression table
This commit modifies analyze behavior as follows:
1. When an internal compression table is analyzed,
statistics from the compressed chunk (such as page
count and tuple count) are used to update the
statistics of the corresponding chunk parent when
they are missing.

2. Analyze compressed chunks instead of raw chunks.
When the command ANALYZE <hypertable> is executed,
a) uncompressed chunks are analyzed as usual, and
b) for compressed chunks, the raw chunk is skipped
and the internal compressed chunk is analyzed instead.
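
A sketch of the resulting behavior, assuming a hypothetical
hypertable named "metrics" with both compressed and uncompressed
chunks:

    ANALYZE metrics;
    -- uncompressed chunks are analyzed directly; for compressed
    -- chunks the raw chunk is skipped and the internal compressed
    -- table is analyzed, with its page/tuple counts rolled up to
    -- the chunk parent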
2020-11-11 15:05:14 -05:00
Sven Klemm
97254783d4 Fix segfault in decompress_chunk for chunks with dropped columns
This patch fixes a segfault in decompress_chunk for chunks with dropped
columns. Since dropped columns don't exist in the compressed chunk,
the values for those columns were undefined in the decompressed tuple,
leading to a segfault when trying to build the heap tuple.
2020-11-10 10:13:45 +01:00
Brian Rowe
525e821055 Add missing increment for PG11 decompression
There is a bug in some versions of PG11 where ANALYZE does not
call CommandCounterIncrement. This caused us to fail to
update pg_class statistics during compression on those versions.
To work around this, this change adds an explicit
CommandCounterIncrement call after ExecVacuum in PG11.

Fixes #2581
2020-10-20 11:54:35 -07:00
Erik Nordström
3cf9c857c4 Make errors and messages conform to style guide
Errors and messages are overhauled to conform to the official
PostgreSQL style guide. In particular, the following things from the
guide have been given special attention:

* Correct capitalization of the first letter: capitalize only for hint
  and detail messages.
* Correct handling of periods at the end of messages (should be elided
  for primary message, but not detail and hint messages).
* The primary message should be short, factual, and avoid reference to
  implementation details such as specific function names.

Some messages have also been reworded for clarity and to better
conform with the last bullet above (short primary message). In other
cases, messages have been updated to fix references to, e.g., function
parameters that used the wrong parameter name.

Closes #2364
2020-10-20 16:49:32 +02:00
Brian Rowe
5acf3343b5 Ensure reltuples are preserved during compression
This change captures the reltuples and relpages (and relallvisible)
statistics from the pg_class table for chunks immediately before
truncating them during the compression code path.  It then restores
the values after truncating, as there is no way to keep PostgreSQL
from clearing these values during this operation. It also uses
these values properly during planning, working around some
PostgreSQL code which substitutes arbitrary sizes for tables
which don't seem to hold data.

Fixes #2524
2020-10-19 07:21:38 -07:00
Dmitry Simonenko
a51aa6d04b Move enterprise features to community
This patch removes enterprise license support and moves the
move_chunk() function under the community license (TSL).

Licensing validation code has been reworked and simplified.
The previously used timescaledb.license_key GUC has been
renamed to timescaledb.license.

This change also makes the testing code stricter about the
license in use. The Apache test suite can now test only
Apache-licensed functions.
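
For illustration, the GUC rename amounts to the following (values
shown are examples only):

    -- before this change
    SET timescaledb.license_key = 'CommunityLicense';
    -- after this change
    SET timescaledb.license = 'timescale';  -- or 'apache'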

Fixes #2359
2020-09-30 15:14:17 +03:00
Mats Kindahl
02ad8b4e7e Turn debug messages into DEBUG1
We have some debug messages that are printed as notices but are more
suitable at the `DEBUG1` level. This commit takes a notice about
indexes being added and turns it into a `DEBUG1` message.
2020-09-29 11:04:07 +02:00
Brian Rowe
8e1e6036af Preserve pg_stats on chunks before compression
This change will ensure that the pg_statistics on a chunk are
updated immediately prior to compression. It also ensures that
these stats are not overwritten as part of a global or
hypertable-targeted ANALYZE.

This addresses the issue that a chunk will no longer generate valid
statistics during an ANALYZE once its data has been moved to the
compressed table. Unfortunately, any compressed rows will not be
captured in the parent hypertable's pg_statistics, as there is no way
to change how PostgreSQL samples child tables in PG11.

This approach assumes that the compressed table remains static, which
is mostly correct in the current implementation (though it is
possible to remove compressed segments). Once we start allowing more
operations on compressed chunks this solution will need to be
revisited. Note that in PG12 an approach leveraging table access
methods will not have a problem analyzing compressed tables.
2020-08-21 10:48:15 -07:00
Sven Klemm
7d230290b9 Remove unnecessary exports in tsl library
Since almost all the functions in the tsl library are accessed via
cross-module functions, there is no need to export the individual
functions.
2020-08-17 18:58:18 +02:00
Sven Klemm
0d5f1ffc83 Refactor compress chunk policy
This patch changes the compression policy to store its configuration
in the bgw_job table and removes the bgw_policy_compress_chunks table.
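
A sketch of where the configuration now lives; the table name is per
this change, while the exact column names are assumptions:

    -- compression policy settings are now stored in the generic
    -- job table instead of bgw_policy_compress_chunks
    SELECT id, config
    FROM _timescaledb_config.bgw_job;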
2020-07-30 19:58:37 +02:00
Oleg Smirnov
0e9f1ee9f5 Enable compression for tables with compound foreign key
When enabling compression on a hypertable, the existing
constraints are cloned to the new compressed hypertable.
During validation of existing constraints, a loop
through the conkey array is performed, and the constraint name
was erroneously added to the list multiple times. This fix
moves the addition to the list outside the conkey loop.

Fixes #2000
2020-07-02 12:22:30 +02:00
gayyappan
b93b30b0c2 Add counts to compression statistics
Store information related to compressed and uncompressed row
counts after compressing a chunk. This is saved in the
compression_chunk_size table.
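
For example, the new counts can be read alongside the existing size
statistics (column names assumed from this change):

    SELECT chunk_id,
           numrows_pre_compression,
           numrows_post_compression
    FROM _timescaledb_catalog.compression_chunk_size;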
2020-06-19 15:58:04 -04:00
Sven Klemm
c90397fd6a Remove support for PG9.6 and PG10
This patch removes code support for PG9.6 and PG10. In addition to
removing PG96 and PG10 macros the following changes are done:

- remove HAVE_INT64_TIMESTAMP since this is always true on PG10+
- remove PG_VERSION_SUPPORTS_MULTINODE
2020-06-02 23:48:35 +02:00
Stephen Polcyn
d1aacdccad Change compression locking order
This patch changes the order in which locks are taken during
compression to avoid taking strong locks for long periods on referenced
tables.

Previously, constraints from the uncompressed chunk were copied to the
compressed chunk before compressing the data. When the uncompressed
chunk had foreign key constraints, this resulted in a
ShareRowExclusiveLock being held on the referenced table for the
remainder of the transaction, which includes the (potentially long)
period while the data is compressed, and prevented any
INSERTs/UPDATEs/DELETEs on the referenced table during the remainder of
the time it took the compression transaction to complete.

Copying constraints after completing the actual data compression does
not pose safety issues (as any updates to referenced keys are caught by
the FK constraint on the uncompressed chunk), and it enables the
compression job to minimize the time during which strong locks are held
on referenced tables.

Fixes #1614.
2020-06-01 16:16:05 -04:00
Erik Nordström
ccc1018f44 Fix various linter-found issues
This fixes various issues found by clang-tidy. A number of
compression-related issues still remain, however.
2020-05-29 14:04:25 +02:00
Erik Nordström
1dd9314f4d Improve linting support with clang-tidy
This change replaces the existing `clang-tidy` linter target with
CMake's built-in support for it. The old way of invoking the linter
relied on the `run-clang-tidy` wrapper script, which is not installed
by default on some platforms. Discovery of the `clang-tidy` tool has
also been improved to work with more installation locations.

As a result, linting now happens at compile time and is enabled
automatically when `clang-tidy` is installed and found.

In enabling `clang-tidy`, several non-trivial issues were discovered
in compression-related code. These might be false positives, but,
until a proper solution can be found, "warnings-as-errors" have been
disabled for that code to allow compilation to succeed with the linter
enabled.
2020-05-29 14:04:25 +02:00
Erik Nordström
686860ea23 Support compression on distributed hypertables
Initial support for compression on distributed hypertables. This
_only_ includes the ability to run `compress_chunk` and
`decompress_chunk` on a distributed hypertable. There is no support
for automation, at least not beyond what one can do individually on
each data node.

Note that an access node keeps no local metadata about which
distributed hypertables have compressed chunks. This information needs
to be fetched directly from data nodes, although such functionality is
not yet implemented. For example, informational views on the access
nodes will not yet report the correct compression states for
distributed hypertables.
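
Usage is the same as on a regular hypertable, e.g. (hypothetical
table name):

    -- run on the access node; compression is executed on the
    -- data nodes owning each chunk
    SELECT compress_chunk(chunk)
    FROM show_chunks('conditions') AS chunk;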
2020-05-27 17:31:09 +02:00
Erik Nordström
33f1601e6f Handle constraints, triggers, and indexes on distributed hypertables
In distributed hypertables, chunks are foreign tables and such tables
do not support (or should not support) indexes, certain constraints,
and triggers. Therefore, such objects should not recurse to foreign
table chunks nor add mappings in the `chunk_constraint` or
`chunk_index` tables.

This change ensures that we properly filter out the indexes, triggers,
and constraints that should not recurse to chunks on distributed
hypertables.
2020-05-27 17:31:09 +02:00
Erik Nordström
596be8cda1 Add mappings table for remote chunks
A frontend node will now maintain mappings from a local chunk to the
corresponding remote chunks in a `chunk_server` table.

The frontend creates local chunks as foreign tables and adds entries
to `chunk_server` for each chunk it creates on a remote data node.

Currently, the creation of remote chunks is not implemented, so a
dummy chunk_id for the remote chunk will be added instead for testing
purposes.
2020-05-27 17:31:09 +02:00
Stephen Polcyn
b57d2ac388 Cleanup TODOs and FIXMEs
Unless otherwise listed, the TODO was converted to a comment or put
into an issue tracker.

test/sql/
- triggers.sql: Made required change

tsl/test/
- CMakeLists.txt: TODO complete
- bgw_policy.sql: TODO complete
- continuous_aggs_materialize.sql: TODO complete
- compression.sql: TODO complete
- compression_algos.sql: TODO complete

tsl/src/
- compression/compression.c:
  - row_compressor_decompress_row: Expected complete
- compression/dictionary.c: FIXME complete
- materialize.c: TODO complete
- reorder.c: TODO complete
- simple8b_rle.h:
  - compressor_finish: Removed (obsolete)

src/
- extension.c: Removed due to age
- adts/simplehash.h: TODOs are from copied Postgres code
- adts/vec.h: TODO is non-significant
- planner.c: Removed
- process_utility.c
  - process_altertable_end_subcmd: Removed (PG will handle case)
2020-05-18 20:16:03 -04:00
Erik Nordström
28e9a443b3 Improve handling of "dropped" chunks
The internal chunk API is updated to avoid returning `Chunk` objects
that are marked `dropped=true` along with some refactoring, hardening,
and cleanup of the internal chunk APIs. In particular, apart from
being returned in a dropped state, chunks could also be returned in a
partial state (without all fields set, partial constraints,
etc.). None of this is allowed as of this change. Further, lock
handling was unclear when joining chunk metadata from different
catalog tables. This is made clear by having chunks built within
nested scan loops so that proper locks are held when joining in
additional metadata (such as constraints).

This change also fixes issues with dropped chunks that caused chunk
metadata to be processed many times instead of just once, leading to
potential bugs or bad performance.

In particular, since the introduction of the “dropped” flag, chunk
metadata can exist in two states: 1. `dropped=false`
2. `dropped=true`. When dropping chunks (e.g., via `drop_chunks`,
`DROP TABLE <chunk>`, or `DROP TABLE <hypertable>`) there are also two
modes of dropping: 1. DELETE row and 2. UPDATE row and SET
dropped=true.

The deletion mode and the current state of chunk lead to a
cross-product resulting in 4 cases when dropping/deleting a chunk:

1. DELETE row when dropped=false
2. DELETE row when dropped=true
3. UPDATE row when dropped=false
4. UPDATE row when dropped=true

Unfortunately, the code didn't distinguish between these cases. In
particular, case (4) should not be able to happen, but since it did,
it led to a recursing loop where an UPDATE created a new tuple that
was then recursed to in the same loop, and so on.

To fix this recursing loop and make the code for dropping chunks less
error prone, a number of assertions have been added, including some
new light-weight scan functions to access chunk information without
building a full-blown chunk.

This change also removes the need to provide the number of constraints
when scanning for chunks. This was really just a hint, and it is no
longer needed since all constraints are joined in anyway.
2020-04-28 13:49:14 +02:00
Erik Nordström
0e9461251b Silence various compiler warnings
This change fixes various compiler warnings that show up on different
compilers and platforms. In particular, MSVC is sensitive to functions
that do not return a value after throwing an error since it doesn't
realize that the code path is not reachable.
2020-04-27 15:02:18 +02:00
Ruslan Fomkin
16897d2238 Drop FK constraints on chunk compression
Drop foreign key constraints from uncompressed chunks during
compression. This allows data deletion in FK-referenced tables
to cascade to compressed chunks. The foreign key constraints are
restored during decompression.
2020-04-14 23:12:15 +02:00
Ruslan Fomkin
ed32d093dc Use table_open/close and PG aggregated directive
Fixing more places to use table_open and table_close, introduced in
PG12. Unifies PG version directives to use the aggregated macro.
2020-04-14 23:12:15 +02:00
Erik Nordström
36af23ec94 Use flags for cache query options
Cache queries support multiple optional behaviors, such as "missing
ok" (do not fail on cache miss) and "no create" (do not create a new
entry if one doesn't exist in the cache). With multiple boolean
parameters, the query API has become unwieldy so this change turns
these booleans into one flag parameter.
2020-04-14 23:12:15 +02:00
Ruslan Fomkin
1ddc62eb5f Refactor header inclusion
Correcting conditions in #ifdefs, adding missing includes, removing
and rearranging existing includes, replacing PG12 with PG12_GE for
forward compatibility. Fixed a number of places that were missed
earlier, replacing relation_close with table_close.
2020-04-14 23:12:15 +02:00
Ruslan Fomkin
e57ee45fcf Replace general relation_open with specific
relation_open is a general function, which is wrapped by more
specific functions per relation type. This commit replaces calls to it
with the specific functions, which check for the correct relation type.
2020-04-14 23:12:15 +02:00
Joshua Lockerman
949b88ef2e Initial support for PostgreSQL 12
This change includes a major refactoring to support PostgreSQL
12. Note that many tests aren't passing at this point. Changes
include, but are not limited to:

- Handle changes related to table access methods
- New way to expand hypertables since expansion has changed in
  PostgreSQL 12 (more on this below).
- Handle changes related to table expansion for UPDATE/DELETE
- Fixes for various TimescaleDB optimizations that were affected by
  planner changes in PostgreSQL (gapfill, first/last, etc.)

Before PostgreSQL 12, planning was organized as follows:

 1. construct `RelOptInfo`s for base rels and appendrels
 2. add restrict info, joins, etc.
 3. perform the actual planning with `make_one_rel`

For our optimizations we would expand hypertables in the middle of
step 1; since nothing in the query planner before `make_one_rel` cared
about the inheritance children, we didn’t have to be too precise
about where we were doing it.

However, with PG12, and the optimizations around declarative
partitioning, PostgreSQL now does care about when the children are
expanded, since it wants as much information as possible to perform
partition-pruning. Now planning is organized as follows:

 1. construct RelOptInfos for base rels only
 2. add restrict info, joins, etc.
 3. expand appendrels, removing irrelevant declarative partitions
 4. perform the actual planning with make_one_rel

Step 3 always expands appendrels, so when we also expand them during
step 1, the hypertable gets expanded twice, and things in the planner
break.

The changes to support PostgreSQL 12 attempt to solve this problem by
keeping the hypertable root marked as a non-inheritance table until
`make_one_rel` is called, and only then revealing to PostgreSQL that
it does in fact have inheritance children. While this strategy entails
the least code change on our end, the fact that the first hook we can
use to re-enable inheritance is `set_rel_pathlist_hook` entails
a number of annoyances:

 1. this hook is called after the sizes of tables are calculated, so we
    must recalculate the sizes of all hypertables, as they will not
    have taken the chunk sizes into account
 2. the table upon which the hook is called will have its paths planned
    under the assumption it has no inheritance children, so if it's a
    hypertable we have to replan its paths

Unfortunately, the functions for doing these are static, so we need to copy
them into our own codebase, instead of just using PostgreSQL's.

In PostgreSQL 12, UPDATE/DELETE on inheritance relations have also
changed and are now planned in two stages:

- In stage 1, the statement is planned as if it was a `SELECT` and all
  leaf tables are discovered.
- In stage 2, the original query is planned against each leaf table,
  discovered in stage 1, directly, not as part of an Append.

Unfortunately, this means we cannot look in the appendrelinfo during
UPDATE/DELETE planning, in particular to determine if a table is a
chunk, as the appendrelinfo is not initialized at the point we wish
to do so. This has consequences for how we identify operations on
chunks (sometimes for blocking and sometimes for enabling
functionality).
2020-04-14 23:12:15 +02:00
Erik Nordström
a4fb0cec3f Cleanup compression-related errors
This change fixes a number of typos and issues with inconsistent
formatting for compression-related code. A couple of other fixes for
variable names, etc. have also been applied.
2020-03-11 13:27:16 +01:00
Sven Klemm
030443a8e2 Fix compressing interval columns
When trying to compress a chunk that had a column of datatype
interval, delta-delta compression would be selected for the column,
but our delta-delta compression does not support interval and
would throw an error when trying to compress the chunk.

This PR changes the compression selected for interval to dictionary
compression.
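
A minimal reproduction of the scenario this fixes (hypothetical
schema):

    CREATE TABLE events (time timestamptz NOT NULL, dur interval);
    SELECT create_hypertable('events', 'time');
    ALTER TABLE events SET (timescaledb.compress);
    -- previously errored because delta-delta was chosen for "dur";
    -- now "dur" is compressed with the dictionary algorithm
    SELECT compress_chunk(chunk) FROM show_chunks('events') AS chunk;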
2020-03-06 21:44:31 +01:00
gayyappan
565cca795a Support disabling compression when foreign keys are present
Fix failure with disabling compression when no
compressed chunks are present and the table has foreign
key constraints.
2020-02-14 08:54:58 -05:00
Ruslan Fomkin
4dc0693d1f Unify error message if hypertable not found
Refactors multiple implementations of finding hypertables in the
cache and failing with different error messages if not found. The
implementations are replaced with calls to functions that encapsulate
a single error message. This provides a unified error message and
removes the need for copy-paste.
2020-01-29 08:10:27 +01:00
gayyappan
b1b840f00e Use timescaledb prefix for compression errors
Modify compression-parameter-related error messages
to make them consistent.
2020-01-28 09:08:58 -05:00
Matvey Arye
d52b48e0c3 Delete compression policy when drop hypertable
Previously we could have a dangling policy and job referring
to a now-dropped hypertable.

We also block changing the compression options if a policy exists.

Fixes #1570
2020-01-02 16:40:59 -05:00
Matvey Arye
6122e08fcb Fix error in compression constraint check
The constraint check previously assumed that the col_meta
offset for a column was equal to that column's attribute
offset. This is incorrect in the presence of dropped columns.

Fixed to match on column names.

Fixes #1590
2020-01-02 13:56:46 -05:00
Matvey Arye
2c594ec6f9 Keep catalog rows for some dropped chunks
If a chunk is dropped but it has a continuous aggregate that is
not dropped, we want to preserve the chunk catalog row instead of
deleting the row. This is to prevent dangling identifiers in the
materialization hypertable. It also preserves the dimension slice
and chunk constraint rows for the chunk, since those will be necessary
when enabling this with multinode and are necessary to recreate the
chunk. The postgres objects associated with the chunk are all
dropped (table, constraints, indexes).

If data is ever reinserted to the same data region, the chunk is
recreated with the same dimension definitions as before. The postgres
objects are simply recreated.
2019-12-30 09:10:44 -05:00
Matvey Arye
d9d1a44d2e Refactor chunk handling to separate out stub
Previously, the Chunk struct was used to represent both a full
chunk and the stub used for joins. The stub used for joins
only contained valid values for some chunk fields and not others.
After the join determined that a Chunk was complete, it filled
in the rest of the chunk field. The fact that a chunk could have
only some fields filled out and not others at different times,
made the code hard to follow and error prone.

So we separate out the stub state of the chunk into a separate
struct that doesn't contain the not-filled-out fields inside
of it.  This leverages the type system to prevent errors that
try to access invalid fields during the join phase and makes
the code easier to follow.
2019-12-06 15:04:51 -05:00
Joshua Lockerman
48ef701fa9 Set toast_tuple_target to 128B when able
We want compressed data to be stored out-of-line whenever possible so
that the headers are colocated and scans on the metadata and segmentbys
are cheap. This commit lowers toast_tuple_target to 128 bytes, so that
more tables will have this occur; using the default size, very often a
non-trivial portion of the data ends up in the main table, and only
very few rows are stored in a page.
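
The setting corresponds to the standard storage parameter, shown here
on a stand-in table for illustration (the commit applies it to the
internal compressed tables):

    ALTER TABLE some_compressed_table SET (toast_tuple_target = 128);
    -- rows wider than ~128 bytes get their compressed column data
    -- toasted out-of-line, keeping heap pages dense with headers,
    -- metadata, and segmentby values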
2019-10-29 19:02:58 -04:00
Joshua Lockerman
efb131dd6f Add missing tests discovered by Codecov 2
This commit adds tests for DATE, TIMESTAMP, and FLOAT compression and
decompression, NULL compression and decompression in dictionaries and
fixes a bug where the database would refuse to decompress DATEs. This
commit also removes the fallback allowing any binary compatible 8-byte
types to be compressed by our integer compressors as I believe I found
a bug in said fallback last time I reviewed it, and cannot recall what
the bug was. These can be re-added later, with appropriate tests.
2019-10-29 19:02:58 -04:00
Sven Klemm
e2df62c81c Fix transparent decompression interaction with first/last
Queries with the first/last optimization on compressed chunks
would not properly decompress data but instead access the uncompressed
chunk. This patch fixes the behaviour and also unifies the check
for whether a hypertable has compression.
2019-10-29 19:02:58 -04:00
Joshua Lockerman
07841670a7 Fix issues discovered by coverity
This commit fixes issues reported by coverity. Of these, the only real
issue is an integer overflow in bitarray, which can never happen in its
current usages. This also adds a PG_USED_FOR_ASSERTS_ONLY for a
variable only used for Assert.
2019-10-29 19:02:58 -04:00
Matvey Arye
85d30e404d Add ability to turn off compression
Since enabling compression creates limits on the hypertable
(e.g. types of constraints allowed) even if there are no
compressed chunks, we add the ability to turn off compression.
This is only possible if there are no compressed chunks.
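
For example (hypothetical hypertable; this errors if compressed
chunks exist):

    ALTER TABLE metrics SET (timescaledb.compress = false);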
2019-10-29 19:02:58 -04:00
Matvey Arye
2fe51d2735 Improve (de)compress_chunk API
This commit improves the API of compress_chunk and decompress_chunk:

- have it return the chunk regclass processed (or NULL in the
  idempotent case);
- mark it as STRICT;
- add if_not_compressed/if_compressed options for idempotency.
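
A sketch of the resulting API (hypothetical chunk name):

    -- returns the chunk's regclass, or NULL if it was already
    -- compressed and if_not_compressed => true
    SELECT compress_chunk('_timescaledb_internal._hyper_1_1_chunk',
                          if_not_compressed => true);
    SELECT decompress_chunk('_timescaledb_internal._hyper_1_1_chunk',
                            if_compressed => true);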
2019-10-29 19:02:58 -04:00
Matvey Arye
92aa77247a Improve minor UIUX
Some small improvements:

- allow ALTER TABLE with an empty segmentby if the original definition
  had an empty segmentby; improve error messages (see the example below)
- block compression on tables with OIDs
- block compression on tables with RLS
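
A sketch of the empty-segmentby case (hypothetical table name):

    -- allowed when the original definition also had an empty segmentby
    ALTER TABLE metrics SET (timescaledb.compress,
                             timescaledb.compress_segmentby = '');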
2019-10-29 19:02:58 -04:00
Matvey Arye
b8a98c1f18 Make compressed chunks use same tablespace as uncompressed
For tablespaces with compressed chunks the semantics are the following
(see the example below):
 - compressed chunks get put into the same tablespace as the
   uncompressed chunk on compression.
 - set tablespace on the uncompressed hypertable cascades to the
   compressed hypertable and its chunks
 - set tablespace on all chunks is blocked (same as without compression)
 - move_chunk on an uncompressed chunk errors
 - move_chunk on a compressed chunk works

In the future we will:
 - add a tablespace option to the compress_chunk function and policy
   (this will override the setting of the uncompressed chunk). This
   will allow changing tablespaces upon compression.
 - Note: the current plan is to never listen to the setting on the
   compressed hypertable. In fact, we will block setting tablespace
   on compressed hypertables.
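
For example (hypothetical tablespace and chunk names):

    -- per the semantics above: the compressed chunk is created in
    -- the same tablespace as the uncompressed chunk
    SELECT compress_chunk('_timescaledb_internal._hyper_1_1_chunk');
    -- and setting the tablespace on the hypertable cascades to the
    -- compressed hypertable and its chunks
    ALTER TABLE metrics SET TABLESPACE history;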
2019-10-29 19:02:58 -04:00
Joshua Lockerman
91a73c3e17 Set statistics on compressed chunks
The statistics on segmentby and metadata columns are very important as
they affect the decompressed data a thousand-fold. Statistics on the
compressed columns are irrelevant, as the regular postgres planner
cannot understand the compressed columns. This commit sets the
statistics for compressed tables based on this, weighting the
uncompressed columns greatly, and the compressed columns not-at-all.
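
The effect is comparable to hand-tuning per-column statistics targets
on the compressed table (illustrative only; column names are
hypothetical and the commit does this internally):

    ALTER TABLE compressed_chunk
        ALTER COLUMN device_id SET STATISTICS 1000;  -- segmentby column
    ALTER TABLE compressed_chunk
        ALTER COLUMN value SET STATISTICS 0;         -- compressed column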
2019-10-29 19:02:58 -04:00
gayyappan
72588a2382 Restrict constraints on compressed hypertables.
Primary key and unique constraints are limited to segment_by and
order_by columns, and foreign key constraints are limited to
segment_by columns when creating a compressed hypertable. There are
no restrictions on check constraints.
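
For example, a unique constraint must be covered by the segment_by
and order_by columns (hypothetical schema):

    CREATE TABLE readings (
        time   timestamptz NOT NULL,
        device int,
        value  float,
        UNIQUE (device, time)
    );
    SELECT create_hypertable('readings', 'time');
    -- accepted: UNIQUE (device, time) is covered by
    -- segmentby 'device' plus orderby 'time'
    ALTER TABLE readings SET (timescaledb.compress,
                              timescaledb.compress_segmentby = 'device',
                              timescaledb.compress_orderby = 'time');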
2019-10-29 19:02:58 -04:00
Matvey Arye
0f3e74215a Split segment meta min_max into two columns
This simplifies the code and the access to the min/max
metadata. Before we used a custom type, but now the min/max
are just the same type as the underlying column and stored as two
columns.

This also removes the custom type that was used before.
2019-10-29 19:02:58 -04:00
Joshua Lockerman
6687189a6c Free memory earlier in decompress_chunk
This was supposed to be part of an earlier commit, but seems to have
been lost. This should reduce peak memory usage of that function.
2019-10-29 19:02:58 -04:00
Joshua Lockerman
64f56d5088 Create indexes on segmentby columns
This commit creates indexes on all segmentby columns of the compressed
hypertable.
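
One way to observe this (hypothetical compressed chunk name):

    SELECT indexname, indexdef
    FROM pg_indexes
    WHERE tablename = 'compress_hyper_2_4_chunk';
    -- expect one index per segmentby column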
2019-10-29 19:02:58 -04:00