Users might want to implement their own partitioning function
or use the legacy one included with TimescaleDB. This change
adds support for setting the partitioning function in
create_hypertable() and add_dimension().
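For illustration, a minimal sketch of passing a custom partitioning
function (the table, column, and function names here are
hypothetical; a partitioning function is assumed to take anyelement
and return an integer):

    -- Hypothetical custom partitioning function.
    CREATE FUNCTION my_partition_hash(val anyelement)
    RETURNS integer LANGUAGE SQL IMMUTABLE AS $$
        SELECT hashtext(val::text); -- hash the string form of the value
    $$;

    -- Pass it via the partitioning_func argument.
    SELECT create_hypertable('conditions', 'time', 'device_id', 4,
                             partitioning_func => 'my_partition_hash');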
Hash partitioning previously relied on coercing (casting) values to
strings before calculating a hash value, including creating CHECK
constraints with casts. This approach is suboptimal from a
performance perspective and might have issues related to different
character encodings depending on the system.
Hash partitioning now instead uses a partitioning function that
takes an anyelement argument and calls type-dependent hash functions
internal to PostgreSQL. This should provide more efficient hashing,
both by avoiding unnecessary string conversions and by using
type-specific hash functions.
Support for the previous hash partitioning function is preserved for
backwards compatibility. Hypertables created with the previous
function will continue to use the old hashing strategy, while new
tables will default to the updated hash partitioning.
For safety, this change also blocks changing types on hash-partitioned
columns, since it seems hard to guarantee the same hash result between
different types.
This is part of the ongoing effort to simplify the metadata tables
and remove any triggers on them that cause side effects.
This change includes the following:
- Remove the on_change_hypertable() trigger on the hypertable catalog
table.
- Remove the TRUNCATE blocking triggers on all metadata tables. If
we think such blocking is important, we should do this in an
event trigger or the processUtility hook.
- Put all SQL files in a single load_order.txt instead of splitting
across three distinct files. Now all SQL files are included in
update scripts as well for simplicity and consistency.
- As a result of removing triggers and related functions, the
setup_main() and restore_timescaledb() functions are no longer
needed. This also further simplifies the database restore process
as calling restore_timescaledb() is no longer needed (or possible).
- Refactor create_hypertable_row() to do more validation before
allocating a new hypertable ID. This avoids incrementing the serial
ID unnecessarily in case some validations fail.
This change refactors the chunk index handling to make better use
of standard PostgreSQL catalog information, while removing the
hypertable_index metadata table and associated triggers, including
those on the chunk_index table. The chunk_index table itself is
also simplified.
A benefit of this refactoring is that indexes are no longer
created using string mangling to construct the CREATE INDEX command
for a chunk, based on the string definition of the hypertable
index. Instead, indexes are created in C using proper index-related
internal data structures.
Chunk indexes can now also be renamed and are added in the parent
index tablespace. Changing tablespace on a hypertable index also
recurses to chunks, as expected. Default indexes that are added when
creating a hypertable use the hypertable's tablespace.
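For example (index, table, and tablespace names hypothetical), these
now behave as expected on a hypertable:

    -- Renaming a hypertable index also renames the chunk indexes.
    ALTER INDEX conditions_device_idx RENAME TO conditions_dev_idx;

    -- Changing tablespace recurses to the corresponding chunk indexes.
    ALTER INDEX conditions_time_idx SET TABLESPACE fast_ssd;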
Creating hypertable indexes with the CONCURRENTLY modifier is
currently blocked, due to unclear semantics regarding concurrent
creation over many tables, including how to deal with snapshots.
Applying triggers to chunks requires taking the definition
of a trigger on a hypertable and executing it on a chunk. Previously
this was done with string replacement in the trigger definition.
This was not especially safe, and thus we moved the logic to C
where we can do proper parsing/deparsing and replacement of the table
name. Another positive aspect is that we got rid of some DDL triggers.
Streamline code and remove triggers from chunk and
chunk_constraint. Lots of additional cleanup. Also removes the need
to CASCADE hypertable drops (fixes #88).
This PR adds support for primary-key, foreign-key, unique, and
exclusion constraints. CHECK and NOT NULL constraints were already
supported. Foreign key constraints where a hypertable references a
plain table are now supported (while the reverse, where a plain
table references a hypertable, is still not).
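A minimal sketch of the newly supported cases (table names
hypothetical):

    -- devices is a plain (non-hypertable) table.
    CREATE TABLE devices (id INTEGER PRIMARY KEY);

    CREATE TABLE conditions (
        time   TIMESTAMPTZ NOT NULL,
        device INTEGER REFERENCES devices (id), -- hypertable -> plain table
        temp   DOUBLE PRECISION,
        UNIQUE (time, device)                   -- unique constraint
    );
    SELECT create_hypertable('conditions', 'time');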
Previously, the default chunk time interval in microseconds was too
large for a SMALLINT or INTEGER field. Now, we only assign a default
value if the type is TIMESTAMP or TIMESTAMPTZ. Integer timestamps,
such as SMALLINT, INTEGER, and BIGINT, need to be set explicitly,
since only the user knows what units the numbers represent.
Further, we check that the chunk time interval is not too large for
SMALLINT and INTEGER, to avoid confusing problems later when the
user inserts data.
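For example, with an integer time column the interval must be given
explicitly (a sketch; the table name and units are hypothetical):

    CREATE TABLE events (
        time INTEGER NOT NULL, -- e.g., seconds since some epoch
        val  DOUBLE PRECISION
    );
    -- Each chunk covers 86400 units (one day, if units are seconds).
    SELECT create_hypertable('events', 'time',
                             chunk_time_interval => 86400);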
With this change, hypertables no longer rely on an INSERT trigger to
dispatch tuples to chunks. While an INSERT trigger worked well for
both INSERTs and COPYs, it caused issues with supporting some regular
triggers on hypertables, and didn't support RETURNING statements and
upserts (ON CONFLICT DO UPDATE).
INSERTs are now handled by modifying the plan for INSERT statements. A
custom plan node is inserted as a subplan to a ModifyTable plan node,
taking care of dispatching tuples to chunks by setting the result
table for every tuple scanned.
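As a result, statements like the following now work on a hypertable
(a sketch; the table is hypothetical and assumes a unique index on
(time, device)):

    INSERT INTO conditions (time, device, temp)
    VALUES (now(), 1, 23.5)
    ON CONFLICT (time, device) DO UPDATE SET temp = EXCLUDED.temp
    RETURNING *;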
COPYs are handled by modifying the regular copy code. Unfortunately,
this required copying a significant amount of regular PostgreSQL
source code since there are no hooks to add modifications. However,
since the modifications are small it should be fairly easy to keep the
code in sync with upstream changes.
This adds support for all types of triggers on a hypertable except
AFTER INSERT triggers. UPDATE and DELETE ROW triggers are
automatically copied from a hypertable onto the chunks. Therefore,
any trigger defined on the parent hypertable will apply to any row
in any of the chunks as well. STATEMENT-level triggers and INSERT
triggers need not be copied in this way.
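A minimal sketch of a row trigger that now propagates to chunks
(function and table names hypothetical):

    CREATE TABLE deleted_rows (time TIMESTAMPTZ, device INTEGER);

    CREATE FUNCTION record_deletion() RETURNS trigger
    LANGUAGE plpgsql AS $$
    BEGIN
        INSERT INTO deleted_rows VALUES (OLD.time, OLD.device);
        RETURN OLD;
    END
    $$;

    -- Defined once on the hypertable, copied onto each chunk.
    CREATE TRIGGER log_delete BEFORE DELETE ON conditions
        FOR EACH ROW EXECUTE PROCEDURE record_deletion();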
A new public-facing API `add_dimension(table, column, ...)`
makes it possible to add additional dimensions (partitioning
columns) to a hypertable.
Currently, new dimensions can only be added to empty tables.
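A sketch of the call (table and column names hypothetical):

    -- Add a hash-partitioned device_id dimension with four partitions.
    SELECT add_dimension('conditions', 'device_id',
                         number_partitions => 4);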
The code has also been refactored with a corresponding
internal function that is called by both `add_dimension()`
and `create_hypertable()`.
Tablespaces can now be associated with a hypertable
using the new user-facing API attach_tablespace().
Alternatively, if the main table that is to be converted into a
hypertable was created with an associated tablespace, that
tablespace will automatically be associated with the hypertable as
well.
Whenever a chunk is created, a tablespace will be chosen from the
ones associated with the chunk's hypertable (if any). This is done
in a way that ensures consistency in one dimension, i.e., if a
hypertable has a closed (space) dimension with a fixed number of
slices (ranges), chunks that fall in the same slice will also be
stored in the same tablespace.
If a hypertable has more than one closed dimension, the first one
will be used to assign tablespaces to chunks. If the table has no
closed dimensions, but one or more open (time) dimensions, then the
first time dimension will be used. However, since open dimensions
do not have a fixed number of slices, tablespaces will be assigned
in a round-robin fashion as new slices are created. Still, chunks
in the same time slice will be stored in the same tablespace.
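A sketch of attaching tablespaces (names hypothetical; the argument
order shown here is an assumption and may differ between versions):

    -- Associate two tablespaces with a hypertable; chunks will be
    -- spread across them as described above.
    SELECT attach_tablespace('disk1', 'conditions');
    SELECT attach_tablespace('disk2', 'conditions');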
Clean up the table schema to get rid of legacy tables and functionality
that makes it more difficult to provide an upgrade path.
Notable changes:
* Get rid of legacy tables and code
* Simplify directory structure for SQL code
* Simplify table hierarchy: remove root table and make chunk tables
  inherit directly from main table
* Change chunk table suffix from _data to _chunk
* Simplify schema usage: _timescaledb_internal for internal
  functions, _timescaledb_catalog for metadata tables
* Remove postgres_fdw dependency
* Improve code comments in SQL code