Add a bool created to the return value of create_hypertable and
add_dimension. When if_not_exists is true and creation is skipped
because the object already exists, created will be false; otherwise it
will be true. This modifies the functions to return metadata even when
no object was created.
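For example (table name is illustrative), repeating a call with
if_not_exists skips creation and reports created = false:

```sql
-- The first call creates the hypertable (created = t); repeating it
-- with if_not_exists => TRUE returns the same metadata, created = f.
SELECT * FROM create_hypertable('conditions', 'time', if_not_exists => TRUE);
--  hypertable_id | schema_name | table_name | created
-- ---------------+-------------+------------+---------
--              1 | public      | conditions | f
```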
Previously, we did not intercept the ALTER SCHEMA [name] RENAME
command, which meant all TimescaleDB catalog tables that store a
schema name were not getting updated properly. This caused problems
when users tried to drop a hypertable in a renamed schema, and would
also have caused problems with other commands (which we now add tests
for in this PR).
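A sketch of the previously failing sequence (names are illustrative):

```sql
CREATE SCHEMA measurements;
CREATE TABLE measurements.conditions(time timestamptz NOT NULL, temp float);
SELECT create_hypertable('measurements.conditions', 'time');
ALTER SCHEMA measurements RENAME TO sensor_data;
-- Previously failed because catalog entries still referenced the old
-- schema name; the rename is now intercepted and the catalog updated.
DROP TABLE sensor_data.conditions;
```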
Change create_hypertable to return a record consisting of
(hypertable_id, schema_name, table_name). This improves user feedback
about the success of the operation and also gives the function an API
that returns useful information for programmatic consumption.
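For example (table name is illustrative):

```sql
SELECT * FROM create_hypertable('conditions', 'time');
--  hypertable_id | schema_name | table_name
-- ---------------+-------------+------------
--              1 | public      | conditions
```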
We've decided to adopt the ts_ prefix on all exported C functions in
order to avoid having symbol conflicts with future postgres functions.
We've already started using this prefix on new functions, and this
commit adds the prefix to the old functions.
This fixes a number of compilation and test issues when OpenSSL is
not available. While we still default to building with OpenSSL
enabled, we allow explicitly setting -DUSE_OPENSSL=false to compile
and run tests without OpenSSL installed. If the flag is not specified,
CMake will fail when OpenSSL is not available on the system.
Add the telemetry background worker (BGW) and all auxiliary functions,
such as generating a UUID, creating the internal metadata table for
storing UUIDs, and parsing the server-side response that contains the
latest version of TimescaleDB.
Previously, the pg_dump test was broken because it is not possible to reference psql variables
from inside bash commands run through psql. This is fixed by hardcoding the username passed to
the bash commands inside the test.
Also, we changed the insert-blocking trigger that prevents inserts
into a hypertable's root table from an internal trigger to a
non-internal one, because internal triggers are not dumped by pg_dump.
We need to dump the trigger so that it is already in place after a
pg_restore, to prevent users from accidentally inserting rows into a
hypertable while timescaledb.restoring is on.
Use the PostgreSQL AttrNumberGetAttrOffset() macro instead of manual
subtraction everywhere, to be more readable and to gain the macro's
Assert.
Done with
`find ./src -type f -exec sed -i -e 's/Anum_\(.*\) - 1/AttrNumberGetAttrOffset(Anum_\1)/' {} \;`
When adaptive chunking is enabled and no `chunk_time_interval` is
set, it is better to start with a small chunk size rather than one
that is too big, since it will adapt faster. This change sets the
`chunk_time_interval` to 1 day if adaptive chunking is enabled on a
hypertable. Note that this only happens if adaptive chunking is
enabled when `create_hypertable()` is called. Otherwise, the existing
`chunk_time_interval` will be used.
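A sketch, assuming adaptive chunking is enabled via the
chunk_target_size option at creation time (names are illustrative):

```sql
-- No chunk_time_interval given, so it starts at 1 day and then
-- adapts toward the 1GB target.
SELECT create_hypertable('conditions', 'time', chunk_target_size => '1GB');
```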
Adaptive chunking uses the min and max value of previous chunks
to estimate their "fill factor". Ideally, min and max should be
retrieved using an index, but if no index exists we fall back
to a heap scan. A heap scan can be very expensive, so we now
raise a WARNING if no index exists.
This change also renames set_adaptive_chunk_sizing() to simply
set_adaptive_chunking().
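Example call using the renamed function (hypertable name and target
size are illustrative):

```sql
SELECT * FROM set_adaptive_chunking('conditions', '1GB');
```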
Users can now (optionally) set a target chunk size and TimescaleDB
will try to adapt the interval length of the first open ("time")
dimension in order to reach that target chunk size. If a hypertable
has more than one open dimension, only the first one will have a
dynamically adapting interval.
Users can optionally specify their own function that calculates the
new dimension interval. They can also set a target size of 0 in order
to estimate a suitable target size for a chunk based on available
memory.
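A sketch of both options; `my_chunk_sizing_func` is a hypothetical
user-defined sizing function:

```sql
-- A target size of 0 asks TimescaleDB to estimate a suitable target
-- size based on available memory; chunk_sizing_func overrides the
-- built-in interval calculation.
SELECT create_hypertable('conditions', 'time',
                         chunk_target_size => '0',
                         chunk_sizing_func => 'my_chunk_sizing_func');
```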
A hypertable's root table should never have any tuples, but it can
acquire tuples by accident if the TimescaleDB extension is not
preloaded or `timescaledb.restoring` is set to ON.
To avoid the above issue, a hypertable's root table now has a
(internal) trigger that generates an error when tuples are
inserted. This preserves the integrity of the hypertable even when
restoring or the extension is not preloaded.
An internal trigger has the advantage of being mostly transparent to
users (e.g., it doesn't show up with \d) and it is not inherited by
chunks, so it needs no special handling to avoid adding it to chunks.
The blocking trigger is added in the update script; if the update
process detects that a hypertable's root table contains data, the
update will fail with an error and instructions on how to fix the
problem.
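An illustration of the trigger's effect (table name is illustrative;
the exact error text may differ):

```sql
SET timescaledb.restoring = 'on';
-- Without the trigger, this tuple would silently land in the root
-- table; with it, the insert fails with an error.
INSERT INTO conditions VALUES ('2018-01-01 00:00', 20.0);
SET timescaledb.restoring = 'off';
```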
This refactor does three things:
1) Upgrades the lock taken to AccessExclusive. This is
to prevent upgrading locks during data migration.
2) Explicitly release the lock in the IF NOT EXISTS case.
This is more in line with what PG itself does. Also,
optimize the easy IF NOT EXISTS case.
3) Exposes a rel inside create_hypertable itself
so that checks can use one rel instead of opening and closing
a bunch of them.
Constraints with the NO INHERIT option do not make sense on a
hypertable's root table since they will not be enforced.
Previously, NO INHERIT constraints were blocked only on chunks, so
creating such a constraint on an empty hypertable would succeed but
then cause a failure at chunk-creation time. Instead, NO INHERIT
constraints are now properly blocked at the hypertable level.
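For example (names are illustrative):

```sql
-- Now rejected immediately at the hypertable level instead of
-- failing later when the first chunk is created.
ALTER TABLE conditions
  ADD CONSTRAINT temp_positive CHECK (temp > 0) NO INHERIT;
```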
Previously, cache lookups were run on the cache's memory
context. While simple, this risked allocating transient (work) data on
that memory context, e.g., when scanning for new cache data during
cache misses.
This change makes scan functions take a memory context on which the
found data should be allocated. All other data is allocated on the
current memory context (typically the transaction's). With this
functionality, a cache can pass its own memory context to the scan,
thus avoiding taking on unnecessary memory allocations.
This PR moves table, schema, and trigger drop handling into the event
trigger system. The event trigger system is a more reliable method of
intercepting object drops especially as they can CASCADE via other
object drops.
This PR also adds a test for DROP OWNED which was previously broken.
Tables can now hold existing data, which is optionally migrated from
the main table to chunks when create_hypertable() is called.
The data migration is similar to the COPY path, with the single
difference that the inserted/copied tuples come from an existing table
instead of being read from a file. After the data has been migrated,
the main table is truncated.
One potential downside of this approach is that all of this happens in
a single transaction, which means that the table is blocked while
migration is ongoing, preventing inserts by other transactions.
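A sketch, assuming the option is exposed as a migrate_data flag on
create_hypertable() (table name is illustrative):

```sql
-- Rows already in 'conditions' are moved into chunks; the main table
-- is truncated afterwards. Runs in a single transaction.
SELECT create_hypertable('conditions', 'time', migrate_data => TRUE);
```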
This fixes at least two bugs:
1) A drop of a referenced table used to drop the associated
FK constraint but not the metadata associated with the constraint.
Fixes #43.
2) A drop of a column removed any indexes associated with the column
but not the metadata associated with the index.
This change improves the handling of tablespaces as follows:
- Add if_not_attached / if_attached options to attach_tablespace() and
detach_tablespace(), respectively
- Block DROP tablespace if it is still attached to a table
- Block REVOKE if it means the table owner no longer has CREATE
permissions on an attached tablespace
- Make error messages follow the PostgreSQL style guide
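Example usage of the new options (tablespace and table names are
illustrative):

```sql
SELECT attach_tablespace('disk1', 'conditions', if_not_attached => TRUE);
SELECT detach_tablespace('disk1', 'conditions', if_attached => TRUE);
```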
Dropping a schema that a hypertable depends on should clean up
dependent metadata. There are two schemas that matter for hypertables:
the hypertable's schema and the associated schema where chunks are
stored.
This change deals with the above as follows:
- If the hypertable schema is dropped, the hypertable and all chunks
should be deleted as well, including metadata.
- If an associated schema is dropped, the hypertables that use that
associated schema will have their associated schemas reset to the
internal schema.
- Even if no hypertable currently uses the dropped schema as its
associated schema, there might be chunks that reside in the dropped
schema (e.g., if the associated schema was changed for their
hypertables), so those chunks should have their metadata deleted.
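For example (schema name is illustrative):

```sql
-- Drops the hypertables in the schema along with their chunks and
-- all associated metadata.
DROP SCHEMA measurements CASCADE;
```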
Previously, stdint.h was not included on Windows, so INT16_MAX and
friends were not defined. Additionally, having tablespace_attach
with PG_FUNCTION_ARGS in the header file caused issues during
linking, so a direct call version of the function is now exported
for others to use instead of the PG_FUNCTION_ARGS version.
Two minor warnings regarding not having a return in all cases are
also addressed.
Deletes on metadata in the TimescaleDB catalog have so far been a mix
of native deletes using the C-based catalog API and SQL-based DELETE
statements that CASCADE.
This mixed environment is confusing, and SQL-based DELETEs do not
consistently clean up objects that are related to the deleted
metadata.
This change moves towards a C-based API for deletes that also
consistently deletes the dependent objects (such as indexes, tables,
and constraints). Ideally, we should prohibit direct manipulation of
catalog tables using SQL statements to avoid ending up in a bad state.
Once all catalog manipulation happens via the native API, we can also
remove the cache invalidation triggers on the catalog tables.
This is a continuation of prior efforts to refactor API functions in C
to:
- improve usage of proper error codes
- use error messages that better conform with the PostgreSQL standard
- improve security by avoiding running large amounts of code under SECURITY DEFINER
- move towards doing all metadata updates using a consistent catalog API
Most importantly, `create_hypertable()` has been refactored in C,
which simplifies a lot of code that previously required
upcalls/downcalls between C code and plpgsql code, or duplicated
functionality between the two environments.
The functions for adding and updating dimensions have been refactored
in C to:
- improve usage of proper error codes
- use error messages that better conform with the PostgreSQL standard
- improve security by avoiding running large amounts of code under SECURITY DEFINER
A new if_not_exists option has also been added to add_dimension(), and
the number of partitions can now be set using the new
set_number_partitions() function.
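Example usage of the new functions (names are illustrative):

```sql
SELECT add_dimension('conditions', 'device_id',
                     number_partitions => 4, if_not_exists => TRUE);
SELECT set_number_partitions('conditions', 8);
```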
A bug in the validation of smallint time intervals has been fixed. The
previous code didn't check for intervals > 0 and smallint intervals
accepted values up to UINT16_MAX instead of INT16_MAX.
Currently, chunk indexes are always created in the tablespace of the
index on the main table (which could be none/the default one), even if
the chunks themselves are created in different tablespaces. This is
problematic in a multi-disk setting where each disk is a separate
tablespace in which chunks are placed. The chunk indexes might exhaust
the space on the common (often default) tablespace, which might not
have a lot of disk space. This also prevents the database, including
index storage, from growing by adding new tablespaces.
Instead, chunk indexes are now created in the "next" tablespace after
that of their chunks to both spread indexes across tablespaces and
avoid colocating indexes with their chunks (for I/O throughput
reasons). To optionally avoid this spreading, one can pin chunk
indexes to a specific tablespace by setting an explicit tablespace on
a main table index.
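For example, to pin chunk indexes to one tablespace (names are
illustrative):

```sql
-- Chunk indexes for this index are created in disk1 instead of being
-- spread across the hypertable's tablespaces.
CREATE INDEX conditions_device_idx ON conditions (device_id)
  TABLESPACE disk1;
```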
Source code indentation has been updated in PostgreSQL 10 to fix a
number of issues. This update applies this new indentation to the
entire code base.
The new indentation requires a new version of pg_bsd_indent, which can
be found here:
https://git.postgresql.org/git/pg_bsd_indent.git
We add better accounting for the number of items stored in a subspace
to allow better pruning. Instead of pruning based on the number of
dimension_slices in subsequent dimensions, we now track the total
number of items in the subspace store and prune based on that.
We add two GUC variables:
1) max_open_chunks_per_insert (default: work_mem in bytes / 512,
assuming an entry is 512 bytes)
2) max_cached_chunks_per_hypertable (default: 100), the maximum number
of cached chunks per hypertable.
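Both GUCs can be tuned, e.g.:

```sql
SET timescaledb.max_open_chunks_per_insert = 1024;
SET timescaledb.max_cached_chunks_per_hypertable = 100;
```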
Previously, the cache in chunk_dispatch was limited to holding only
the chunk_insert_state for the last time dimension as a consequence
of logic in the subspace store. This has now been relaxed so that a
chunk_dispatch holds the cache for any chunk_insert_states that it
encounters. Logic for the hypertable chunk cache has not been changed.
The rule that we should follow is to limit the subspace store size for
caches that survive across commands. But caches within commands can be
allowed to grow.
A hypertable's associated schema is used to create and store internal
data tables (chunks). A hypertable creates tables in that schema,
typically with full superuser permissions, regardless of whether the
hypertable's owner or the current user has permissions for the schema.
If the schema doesn't exist, the hypertable will create it when
creating the first chunk, even if the user or table owner does not
have permissions to create schemas in the database.
This change adds proper permissions checks to create_hypertable() so
that users cannot create hypertables with a custom associated schema
unless they have the proper permissions on the schema or the database.
Chunks are also no longer created with internal schema permissions if
the associated schema is something different from the internal schema.
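A sketch of the new check, assuming the associated schema is set via
the associated_schema_name option (names are illustrative):

```sql
-- Fails unless the caller has CREATE permission on sensor_chunks or
-- on the database (which allows creating the schema).
SELECT create_hypertable('conditions', 'time',
                         associated_schema_name => 'sensor_chunks');
```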
This change is part of an effort to create a consistent way
of dealing with metadata catalog updates, which is currently
a mix of C API and INSERT/UPDATE/DELETE statements from SQL
code. This mix makes catalog handling unnecessarily complex as
there are multiple ways to update metadata, increasing the risk
of security issues with publicly exposed SQL functions. It also
complicates things like cache invalidation, requiring different
mechanisms for C and SQL code. Catalog updates from SQL code
require triggers on metadata tables for cache invalidation that
do not work with native catalog updates.
The creation of chunks has been particularly messy in this regard,
making the code hard to follow. This was especially true for the
handling of a chunk's constraints, where dimensional and other
constraints were handled differently. With this change, constraint
handling is now consistent across constraint types, with a single API
for updating metadata.
Reduce memory usage for out-of-order inserts
The chunk_result_relation_info should be put on the chunk memory
context. This causes the ResultRelInfo's constraint expressions to
also go onto that context and be correctly freed when the chunk
insert state is destroyed.
A hypertable's tablespaces are now always retrieved from
the tablespace metadata table instead of being cached
with the hypertable. This avoids having to do cache invalidation
when updating the tablespace table.
Tablespaces can now be detached from hypertables using
`tablespace_detach()`. This function can either detach
a tablespace from all tables or only a specific table.
Having the ability to detach tablespaces allows more
advanced storage management; for instance, one can detach
tablespaces that are running low on disk space while attaching
new ones to replace them.
Attaching tablespaces to hypertables is now handled
in native code, with improved permissions checking and
caching of tablespaces in the Hypertable data object.
Windows 64-bit binaries should now be buildable using the cmake
build system either from the command line or from Visual Studio.
Previous issues regarding unresolved symbols have been resolved
with compatibility header files that properly export symbols or
retrieve GUCs via the normal APIs.
This change reduces the usage of SECURITY DEFINER on SQL
functions and fixes related permissions issues. It also
properly checks hypertable permissions relative to the current_user
instead of the session_user, which otherwise breaks SET ROLE,
among other things.
reindex allows you to reindex the indexes of only certain chunks,
filtering by time. This is a common use case because a user may
want to reindex chunks once they are no longer receiving new data.
reindex also has a recreate option which will not use REINDEX
but will rather CREATE a new index, then DROP the old one and
RENAME the new index to the old name. This approach has the advantage
of blocking reads for a much shorter period of time. However,
it does more work and will use more disk space during the operation.
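A hypothetical invocation; the parameter names below are assumptions
based on the description above, not a confirmed signature:

```sql
-- Recreate indexes only on chunks older than one month.
SELECT reindex('conditions', older_than => INTERVAL '1 month',
               recreate => TRUE);
```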
The extension now works with PostgreSQL 10, while
retaining compatibility with version 9.6.
PostgreSQL 10 has numerous internal changes to functions and
APIs, which necessitates various glue code and compatibility
wrappers to seamlessly retain backwards compatibility with older
versions.
Test output might also differ between versions. In particular,
the psql client generates version-specific output with `\d` and
EXPLAINs might differ due to new query optimizations. The test
suite has been modified as follows to handle these issues. First,
tests now use version-independent functions to query system
catalogs instead of using `\d`. Second, changes have been made to
the test suite to be able to verify some test outputs against
version-dependent reference files.
The chunk cache needs to free chunk memory as
it evicts chunks from the cache. This was previously
done by pfree-ing the chunk memory, but this didn't
account for sub-allocated objects, like the chunk's
hypercube. This led to some chunk objects remaining
in the cache's memory context, thus inflating memory
usage, although the objects were no longer associated
with a chunk.
This change adds a per-chunk memory context in the cache
that allows all chunk memory to be easily freed when
the cache entry is evicted or when the chunk cache
is destroyed.
This change refactors the chunk index handling to make better use
of standard PostgreSQL catalog information, while removing the
hypertable_index metadata table and associated triggers, including
those on the chunk_index table. The chunk_index table itself is
also simplified.
A benefit of this refactoring is that indexes are no longer
created using string mangling to construct the CREATE INDEX command
for a chunk, based on the string definition of the hypertable
index. Instead, indexes are created in C using proper index-related
internal data structures.
Chunk indexes can now also be renamed and are added in the parent
index tablespace. Changing tablespace on a hypertable index also
recurses to chunks, as expected. Default indexes that are added when
creating a hypertable use the hypertable's tablespace.
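For example (index and tablespace names are illustrative):

```sql
-- Moving a hypertable index now also moves the corresponding indexes
-- on all of its chunks.
ALTER INDEX conditions_time_idx SET TABLESPACE disk2;
```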
Creating hypertable indexes with the CONCURRENTLY modifier is
currently blocked, due to unclear semantics regarding concurrent
creation over many tables, including how to deal with snapshots.
The ProcessUtility hook doesn't give any information on applied DDL
commands, which makes it hard to implement DDL processing that
requires the result of a DDL command on a hypertable (for instance,
adding a constraint or index without an explicit name).
This change splits the DDL processing over start and end hooks,
handling DDL commands before and after regular PostgreSQL processing,
respectively.
The start DDL hook is still based on the ProcessUtility hook, while
the end DDL hook is based on an event trigger that allows getting
information on the created/dropped/altered objects.