This adds an internal API function to create a chunk using explicit
constraints (dimension slices). A function to export a chunk in a
format consistent with the chunk creation function is also added.
The chunk export/create functions are needed for distributed
hypertables so that an access node can create chunks on data nodes
according to its own (global) partitioning configuration.
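A rough sketch of the shape of such an internal, SQL-callable function
(the name, arguments, and return value here are hypothetical):

```c
#include <postgres.h>
#include <fmgr.h>

PG_FUNCTION_INFO_V1(chunk_create_with_constraints);

/* Create a chunk from an explicit set of dimension slices (e.g., exported
 * by an access node), rather than computing the slices locally. */
Datum
chunk_create_with_constraints(PG_FUNCTION_ARGS)
{
    /* 1. identify the hypertable from the first argument
     * 2. deserialize the dimension slices from the second argument
     * 3. create matching dimension slices/constraints and the chunk table,
     *    registering everything in the catalog
     * 4. return the created chunk's metadata */
    PG_RETURN_NULL(); /* placeholder in this sketch */
}
```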
Cache queries support multiple optional behaviors, such as "missing
ok" (do not fail on cache miss) and "no create" (do not create a new
entry if one doesn't exist in the cache). With multiple boolean
parameters, the query API has become unwieldy, so this change turns
these booleans into a single flags parameter.
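For illustration, a minimal sketch of what such a flags parameter could
look like (the names here are hypothetical, not the exact ones in the
code):

```c
/* Hypothetical cache-query flags replacing separate boolean parameters. */
typedef enum CacheQueryFlags
{
    CACHE_FLAG_NONE = 0,
    CACHE_FLAG_MISSING_OK = 1 << 0, /* do not fail on a cache miss */
    CACHE_FLAG_NOCREATE = 1 << 1,   /* do not create a new entry on a miss */
} CacheQueryFlags;

/* Before: cache_get_entry(cache, key, missing_ok, no_create)
 * After:  cache_get_entry(cache, key, CACHE_FLAG_MISSING_OK | CACHE_FLAG_NOCREATE) */
```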
This change refactors our main planner hooks in `planner.c` with the
intention of providing a consistent way to classify planned relations
across hooks. In our hooks, we'd like to know whether a planned
relation (`RelOptInfo`) is one of the following:
* Hypertable
* Hypertable child (a hypertable can appear as a child of itself)
* Chunk as a child of hypertable (from expansion)
* Chunk as standalone (operation directly on chunk)
* Any other relation
Previously, there was no consistent way to know which of these
categories a relation belonged to. Instead, a mix of various functions
was used ad hoc, without remembering the classification for reuse in
later parts of the code.
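To make the classification explicit and reusable, an enum along the
following lines can be carried with the planned relation (a sketch; the
exact names may differ):

```c
/* One value per category of planned relation that the hooks care about. */
typedef enum TsRelType
{
    TS_REL_HYPERTABLE,       /* a hypertable's root table */
    TS_REL_CHUNK,            /* a chunk operated on directly (standalone) */
    TS_REL_HYPERTABLE_CHILD, /* a hypertable appearing as a child of itself */
    TS_REL_CHUNK_CHILD,      /* a chunk that is a child of an expanded hypertable */
    TS_REL_OTHER             /* any other relation */
} TsRelType;
```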
When classifying relations according to the above categories, the only
source of truth about a relation is our catalog metadata. In case of
hypertables, this is cached in the hypertable cache. However, this
cache is read-through, so, in case of a cache miss, the metadata will
always be scanned to resolve a new entry. To avoid unnecessary
metadata scans, this change introduces a way to do cache-only
queries. This requires maintaining a single warmed cache throughout
planning and is enabled by using a planner-global cache object. The
pre-planning query processing warms the cache by populating it with
all hypertables in the to-be-planned query.
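A rough sketch of this warm-then-lookup pattern, with hypothetical
names (the Cache and Hypertable types are the extension's own):

```c
#include <postgres.h>

/* Planner-global, warmed hypertable cache. */
static Cache *planner_hcache = NULL;

/* Pre-planning: pin a cache and populate it with every hypertable
 * referenced by the to-be-planned query. */
static void
planner_hcache_warm(Query *parse)
{
    planner_hcache = hypertable_cache_pin();
    /* walk the query's range table, doing one regular (read-through)
     * lookup per relation so all hypertables end up in the cache */
}

/* During planning: cache-only lookup that never triggers a metadata scan. */
static Hypertable *
planner_get_hypertable(Oid relid)
{
    return hypertable_cache_get_entry(planner_hcache, relid,
                                      CACHE_FLAG_MISSING_OK | CACHE_FLAG_NOCREATE);
}
```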
Refactors the multiple implementations of looking up a hypertable in
the cache and failing with different error messages when it isn't
found. These are replaced by calls to a single function that
encapsulates the error message, giving a unified error message and
removing the need for copy-pasted code.
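A minimal sketch of such a helper (the function name and error code
shown here are illustrative, not the exact ones used):

```c
#include <postgres.h>
#include <utils/lsyscache.h>

/* Look up a hypertable in the cache and fail with one canonical error
 * message if the table is not a hypertable. */
static Hypertable *
cache_get_hypertable_or_fail(Cache *cache, Oid relid)
{
    Hypertable *ht = hypertable_cache_get_entry(cache, relid);

    if (ht == NULL)
        ereport(ERROR,
                (errcode(ERRCODE_UNDEFINED_TABLE),
                 errmsg("table \"%s\" is not a hypertable",
                        get_rel_name(relid))));

    return ht;
}
```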
This commit adds initial support for continuous aggregate
materialization and INSERT invalidations.
INSERT path:
On INSERT, DELETE and UPDATE we log the [min, max] time range that may be
invalidated (that is, newly inserted, updated, or deleted) to
_timescaledb_catalog.continuous_aggs_hypertable_invalidation_log. This log
will be used to re-materialize these ranges, to ensure that the aggregate
is up-to-date. Currently these invalidations are recorded by a trigger,
_timescaledb_internal.continuous_agg_invalidation_trigger, which should be
added to the hypertable when the continuous aggregate is created. This trigger
stores a cache of min/max values per hypertable and, on transaction commit,
writes them to the log, if needed. At the moment, we consider them to always
be needed, unless we're in ReadCommitted mode or weaker and the min
invalidated value is greater than the hypertable's invalidation threshold
(found in _timescaledb_catalog.continuous_aggs_invalidation_threshold).
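In rough C terms, the commit-time decision above amounts to the
following (names are illustrative, not the actual trigger code):

```c
#include <postgres.h>

/* Decide whether the cached [min, max] invalidation range must be written
 * to the invalidation log at transaction commit. */
static bool
invalidation_needs_logging(int64 min_invalidated, int64 invalidation_threshold,
                           bool read_committed_or_weaker)
{
    /* The range may be skipped only under ReadCommitted (or weaker) when
     * everything modified lies above the invalidation threshold. */
    if (read_committed_or_weaker && min_invalidated > invalidation_threshold)
        return false;

    return true;
}
```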
Materialization path:
Materialization currently happens in multiple phases: in phase 1 we determine
the timestamp at which the new set of materializations will end, then we
update the hypertable's invalidation threshold to that point, and finally we
read the current invalidations and materialize any invalidated rows as well
as the new range between the continuous aggregate's completed threshold
(found in _timescaledb_catalog.continuous_aggs_completed_threshold) and the
hypertable's invalidation threshold. After all of this is done we update the
completed threshold to the invalidation threshold. The portion of this
protocol from after the invalidations are read until the completed threshold
is written (that is, actually materializing and writing the completed
threshold) is included in this commit, with the remainder to follow in
subsequent ones.
One important caveat: since the thresholds are exclusive, we invalidate
all values _less_ than the invalidation threshold, and since we store time
values as int64 internally, we can never determine whether the row at
PG_INT64_MAX is invalidated. To avoid this problem, we never materialize
the time bucket containing PG_INT64_MAX.
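A minimal sketch of that clamping rule, assuming a simple bucketing
scheme with origin 0 (the real bucketing logic is more involved):

```c
#include <postgres.h>

/* Clamp the end of a materialization so the bucket containing PG_INT64_MAX
 * is never materialized; since thresholds are exclusive, that bucket could
 * never be invalidated again. Assumes bucket_width > 0 and origin 0. */
static int64
clamp_materialization_end(int64 candidate_end, int64 bucket_width)
{
    int64 last_bucket_start = PG_INT64_MAX - (PG_INT64_MAX % bucket_width);

    return candidate_end < last_bucket_start ? candidate_end : last_bucket_start;
}
```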
New cluster-like command that writes to a new index and then swaps,
much like is done for the data table, and only acquires exclusive
locks for said swap. This trades off disk usage for lower contention:
we hold locks for a much shorter period of time, allowing reads to
proceed concurrently, but both the old and new versions of the table
exist at once, approximately doubling storage usage while the reorder
is running.
Currently only works on chunks.
Introduce PG11 support by adding compatibility functions for any
functions whose signatures have changed in PG11. Additionally, refactor
the structure of the compatibility functions found in compat.h by
breaking them out by function (or small set of similar functions),
so that it is easier to see what changed between versions and to
maintain the changes as more versions are supported.
In general, the philosophy has been to aim for forward compatibility
wherever possible, so that we use the latest version of a function's
interface where we can, or where reasonably convenient, and mimic its
behavior on older versions as much as possible.
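As an illustration of the per-function compat style (the macro name
here is hypothetical), consider heap_attisnull(), which gained a
TupleDesc argument in PG11:

```c
/* Canonical (newest) signature; older versions simply ignore the
 * extra argument. */
#if PG_VERSION_NUM >= 110000
#define heap_attisnull_compat(tup, attnum, tupledesc) \
    heap_attisnull(tup, attnum, tupledesc)
#else
#define heap_attisnull_compat(tup, attnum, tupledesc) \
    heap_attisnull(tup, attnum)
#endif
```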
Future-proofing: if we ever want to make our functions available to
others, they’d need to be prefixed to prevent name collisions. In
order to avoid having some functions with the ts_ prefix and
others without, we’re adding the prefix to all non-static
functions now.
This change adds proper result types for the scanner's filter and
tuple handling callbacks. Previously, these callbacks were supposed to
return bool, which was hard to interpret. For instance, for the tuple
handler callback, true meant continue processing the next tuple while
false meant finish the scan. However, this wasn't always clear. Having
proper return types also makes it easier to see from a function's
signature that it is a scanner callback handler, rather than some
other function that can be called directly.
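The result types look roughly like the following, with comments
describing the intent of each value:

```c
/* Returned by the tuple handler callback. */
typedef enum ScanTupleResult
{
    SCAN_DONE,     /* stop the scan after this tuple */
    SCAN_CONTINUE  /* continue with the next tuple */
} ScanTupleResult;

/* Returned by the filter callback. */
typedef enum ScanFilterResult
{
    SCAN_EXCLUDE, /* skip this tuple */
    SCAN_INCLUDE  /* pass this tuple on to the tuple handler */
} ScanFilterResult;
```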
Previously, cache lookups were run on the cache's memory
context. While simple, this risked allocating transient (work) data on
that memory context, e.g., when scanning for new cache data during
cache misses.
This change makes scan functions take a memory context on which the
found data should be allocated. All other data is allocated on the
current memory context (typically the transaction's memory
context). With this functionality, a cache can pass its own memory
context to the scan, thus avoiding unnecessary allocations on the
cache's context.
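A sketch of the pattern (the function and struct names are
hypothetical; Hypertable is the extension's cached catalog struct):

```c
#include <postgres.h>

/* Scan for a hypertable by relid, allocating only the result on the
 * caller-provided memory context; all transient scan data stays on
 * CurrentMemoryContext (typically transaction-lifetime memory). */
static Hypertable *
hypertable_scan_by_relid(Oid relid, MemoryContext result_mcxt)
{
    Hypertable *ht = NULL;
    MemoryContext old;

    /* ... perform the heap/index scan on CurrentMemoryContext ... */

    old = MemoryContextSwitchTo(result_mcxt);
    ht = palloc0(sizeof(Hypertable));
    /* ... copy the found tuple's fields into ht ... */
    MemoryContextSwitchTo(old);

    return ht;
}
```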
The functions for adding and updating dimensions have been refactored
in C to:
- improve the usage of proper error codes
- make messages better conform to the PostgreSQL standard
- improve security by avoiding running large amounts of code under
  SECURITY DEFINER
A new if_not_exists option has also been added to add_dimension(), and
the number of partitions can now be set using the new
set_number_partitions() function.
A bug in the validation of smallint time intervals has been fixed. The
previous code didn't check that intervals were > 0, and smallint
intervals accepted values up to UINT16_MAX instead of INT16_MAX.
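For illustration, the corrected check amounts to something like this
(the function name is hypothetical):

```c
#include <postgres.h>

/* Validate a chunk time interval for a SMALLINT time column: it must be
 * strictly positive and fit in the signed 16-bit range. */
static void
validate_smallint_interval(int64 interval)
{
    if (interval <= 0 || interval > PG_INT16_MAX)
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                 errmsg("invalid chunk time interval for SMALLINT time column")));
}
```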
Attaching tablespaces to hypertables is now handled
in native code, with improved permissions checking and
caching of tablespaces in the Hypertable data object.
TimescaleDB cache invalidation happens as a side effect of doing a
full SQL statement (INSERT/UPDATE/DELETE) on a catalog table (via
table triggers). However, triggers aren't invoked when using
PostgreSQL's internal catalog API for updates, since that API bypasses
the full statement parsing, planning, and execution that trigger
firing requires.
Since we are now using the regular PostgreSQL catalog update API for
some TimescaleDB catalog operations, we need to do cache invalidation
also on such operations.
This change adds cache invalidation when updating catalogs using the
internal (C) API and also makes the cache invalidation more fine
grained. For instance, caches are no longer invalidated on some
INSERTS that do not affect the validity of objects already in the
cache, such as adding a new chunk.
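A sketch of the pattern on the internal update path (the proxy relation
used to broadcast the invalidation is a hypothetical stand-in here):

```c
#include <postgres.h>
#include <access/htup.h>
#include <catalog/indexing.h>
#include <utils/inval.h>
#include <utils/rel.h>

/* Update a TimescaleDB catalog tuple via the internal API and explicitly
 * signal cache invalidation, since no table triggers fire on this path. */
static void
catalog_update_with_invalidation(Relation catalog_rel, HeapTuple tuple,
                                 Oid cache_inval_proxy_relid)
{
    CatalogTupleUpdate(catalog_rel, &tuple->t_self, tuple);
    CacheInvalidateRelcacheByRelid(cache_inval_proxy_relid);
}
```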
The extension now works with PostgreSQL 10, while
retaining compatibility with version 9.6.
PostgreSQL 10 has numerous internal changes to functions and
APIs, which necessitates various glue code and compatibility
wrappers to seamlessly retain backwards compatibility with older
versions.
Test output might also differ between versions. In particular,
the psql client generates version-specific output with `\d` and
EXPLAINs might differ due to new query optimizations. The test
suite has been modified as follows to handle these issues. First,
tests now use version-independent functions to query system
catalogs instead of using `\d`. Second, changes have been made to
the test suite to be able to verify some test outputs against
version-dependent reference files.
This change refactors the chunk index handling to make better use
of standard PostgreSQL catalog information, while removing the
hypertable_index metadata table and associated triggers, including
those on the chunk_index table. The chunk_index table itself is
also simplified.
A benefit of this refactoring is that indexes are no longer
created using string mangling to construct the CREATE INDEX command
for a chunk, based on the string definition of the hypertable
index. Instead, indexes are created in C using proper index-related
internal data structures.
Chunk indexes can now also be renamed and are added in the parent
index tablespace. Changing tablespace on a hypertable index also
recurses to chunks, as expected. Default indexes that are added when
creating a hypertable use the hypertable's tablespace.
Creating hypertable indexes with the CONCURRENTLY modifier is
currently blocked, due to unclear semantics regarding concurrent
creation over many tables, including how to deal with snapshots.
Previously, when issued on a hypertable, database maintenance
commands, like VACUUM and REINDEX, only affected the main
table and did not recurse to chunks.
This change fixes that issue, allowing database maintainers
to issue single commands on hypertables that affect all
the data stored in the hypertable.
These commands (VACUUM, REINDEX) only work at the table level
for hypertables. If issued at other levels, e.g., schema or
database, the behavior is the same as in standard PostgreSQL,
as all tables are covered by default.
REINDEX commands that specify a hypertable index do not
recurse as that requires mapping the hypertable
index to the corresponding index on the chunk. This might
be fixed in a future update.
Also in this commit:
- Rename time/space to open/closed for more generality.
- Create a Point data type for mapping a tuple to an
  N-dimensional space (see the sketch below).
- Numerous fixes and cleanups.
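A rough sketch of such a Point (field names are illustrative):

```c
#include <postgres.h>

/* A point in the N-dimensional partitioning space: one coordinate per
 * dimension (open or closed) of the hypertable. */
typedef struct Point
{
    int16 cardinality; /* number of dimensions in the space */
    uint8 num_coords;  /* number of coordinates currently set */
    /* variable-length array of coordinates, one per dimension */
    int64 coordinates[FLEXIBLE_ARRAY_MEMBER];
} Point;
```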
This is the first stab at updating the table and data type
definitions in the catalog module in the C code. This also
adds functions for natively scanning the dimension and
dimension_slice tables.
Clean up the table schema to get rid of legacy tables and functionality
that makes it more difficult to provide an upgrade path.
Notable changes:
* Get rid of legacy tables and code
* Simplify directory structure for SQL code
* Simplify table hierarchy: remove root table and make chunk tables
  inherit directly from main table
* Change chunk table suffix from _data to _chunk
* Simplify schema usage: _timescaledb_internal for internal functions,
  _timescaledb_catalog for metadata tables
* Remove postgres_fdw dependency
* Improve code comments in SQL code
This PR disables query optimizations on regular tables by default.
The option timescaledb.optimize_plain_tables = 'on' enables them
again. timescaledb.disable_optimizations = 'on' disables all
optimizations (note the change from 'true' to 'on').
DROP EXTENSION didn't properly reset caches and other saved state,
causing various errors related to bad state when the extension was
dropped and/or recreated later.
This patch adds functionality to track the state of the extension and
also signals DROP EXTENSION to other backends that might be running,
allowing them to reset their internal extension state.
The hypertable cache stores negative cache entries to speed up checks
for tables that aren't hypertables. However, a full hypertable cache
entry was allocated for these negative entries, thus wasting cache
storage. With this update, the full entry is only allocated for
positive entries that actually represent hypertables.
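A sketch of the slimmer entry layout (names are hypothetical):

```c
#include <postgres.h>

/* A cache entry keyed by relid; ht is NULL for negative entries, so no
 * full Hypertable payload is allocated for tables that aren't hypertables. */
typedef struct HypertableCacheEntry
{
    Oid relid;      /* cache key: the main table's OID */
    Hypertable *ht; /* NULL for a negative entry */
} HypertableCacheEntry;
```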
Previously, the planner used a direct query via the SPI interface to
retrieve metadata info needed for query planner functions like query
rewriting. This commit updates the planner to use our caching system.
This is a performance improvement for pretty much all operations,
both data modifications and queries.
For hypertables, this adds a cache keyed by the main table OID,
including negative entries (because the planner often needs to know
whether a table is /not/ a hypertable).
This patch continues work to consolidate catalog information
and definitions to the catalog module.
It also refactors the naming of some data types to adhere
to camelcase naming convention (Hypertable, PartitionEpoch).
Previously, chunk replicas were retrieved with an SPI query. Now, all
catalog items are retrieved with native scans, with the exception of
newly created chunks.
This commit also refactors the chunk (replica) cache, removing some
data structures that were duplicating information. Now chunks are
cached by their ID (including their replicas) instead of just the
set of replicas. This removes the need for additional data structures,
such as the replica set which looked like a chunk minus time info,
and the cache entry wrapper struct. Another upside is that chunks
can be retrieved from the cache directly by ID.
This change is a performance improvement. Previously, each insert called
a plpgsql function to check whether the chunk needed to be closed. This
patch implements a C-only fast path for the case when the table size is
less than the configured chunk size.
The chunk catalog table is now scanned with a native
scan rather than SPI call.
The scanner module is also updated with the option of
taking locks on found tuples. In the case of chunk
scanning, chunks are typically returned with a share
lock on the tuple.
Aborting scans can be useful when scanning large tables or indexes and
the scan function knows that a condition has been met (e.g., all
tuples needed have been found). For instance, when scanning for
hypertable partitions, one can abort the scan when the found tuple
count equals the number of partitions in the partition epoch.
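For instance, a tuple-handler callback (using the boolean protocol the
scanner had at this point; the type and field names are hypothetical)
could abort the scan early like this:

```c
/* Called for each matching tuple; returning false aborts the scan. */
static bool
partition_tuple_found(TupleInfo *ti, void *data)
{
    PartitionScanState *state = data;

    state->num_found++;

    /* All partitions of the epoch found: no need to keep scanning. */
    if (state->num_found == state->num_partitions)
        return false;

    return true;
}
```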
This patch refactors the code to use native heap/index scans for
finding partition epochs and partitions. It also moves the
partitioning-related code and data structures to partitioning.{c,h}.
This is a first stab at moving from SPI queries
to direct heap/index scans for cleaner and more
efficient code. By doing direct scans, there is no
need to prepare and cache a bunch of query plans
that make the code both slower and more complex.
This patch also adds a catalog module that keeps
cached OIDs and other things for catalog tables.
The cached information is updated every time the
backend switches to a new database. A permission
check is also implemented when accessing the catalog
information, but should probably be extended to
tables and schemas in the future.
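A sketch of the catalog OID lookup behind that caching (the helper name
is hypothetical; the lookup functions are standard PostgreSQL ones):

```c
#include <postgres.h>
#include <catalog/namespace.h>

#define CATALOG_SCHEMA_NAME "_timescaledb_catalog"

/* Resolve the OID of a TimescaleDB catalog table by name. The catalog
 * module would call this once per database and cache the result, so that
 * later scans can open the relation by OID without repeated name lookups. */
static Oid
catalog_table_get_oid(const char *tablename)
{
    Oid nspid = get_namespace_oid(CATALOG_SCHEMA_NAME, false);

    return get_relname_relid(tablename, nspid);
}
```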