Improve the performance of metadata scanning during hypertable
expansion.
When a hypertable is expanded to include all child chunks, only the
chunks that match the query restrictions are included. To find the
matching chunks, the planner first scans for all matching dimension
slices. The chunks that reference those slices are the chunks to
include in the expansion.
This change optimizes the scanning for slices by avoiding repeated
open/close of the dimension slice metadata table and index.
At the same time, related dimension slice scanning functions have been
refactored along the same lines.
An index on the chunk constraint metadata table is also changed to
allow scanning on dimension_slice_id. Previously, dimension_slice_id
was the second key in the index, which made scans on this key less
efficient.
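As a rough illustration of the open/close optimization (a sketch, not the
actual scanner code in the codebase), the idea is to open the dimension
slice table and its index once and restart the index scan with a new scan
key for each dimension. The relation OIDs and the attribute number below
are placeholders:

```c
#include "postgres.h"
#include "access/genam.h"
#include "access/skey.h"
#include "access/stratnum.h"
#include "access/table.h"
#include "access/tableam.h"
#include "executor/tuptable.h"
#include "utils/fmgroids.h"
#include "utils/snapmgr.h"

/* Scan the dimension slice metadata for several dimensions while keeping
 * the table and index open. slice_rel_oid, slice_index_oid, and the
 * attribute number 2 for dimension_id are illustrative placeholders. */
static void
scan_slices_for_dimensions(Oid slice_rel_oid, Oid slice_index_oid,
						   const int32 *dimension_ids, int ndimensions)
{
	Relation	rel = table_open(slice_rel_oid, AccessShareLock);
	Relation	idx = index_open(slice_index_oid, AccessShareLock);
	TupleTableSlot *slot = table_slot_create(rel, NULL);
	IndexScanDesc scan = index_beginscan(rel, idx, GetTransactionSnapshot(), 1, 0);

	for (int i = 0; i < ndimensions; i++)
	{
		ScanKeyData key;

		ScanKeyInit(&key, 2 /* dimension_id */, BTEqualStrategyNumber,
					F_INT4EQ, Int32GetDatum(dimension_ids[i]));

		/* Restart the scan with a new key instead of re-opening the
		 * table and index for every dimension. */
		index_rescan(scan, &key, 1, NULL, 0);

		while (index_getnext_slot(scan, ForwardScanDirection, slot))
		{
			/* Collect the matching dimension slice from the slot. */
		}
	}

	index_endscan(scan);
	ExecDropSingleTupleTableSlot(slot);
	index_close(idx, AccessShareLock);
	table_close(rel, AccessShareLock);
}
```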
Harden core APIs by adding the `const` qualifier to pointer parameters
and return values passed by reference. Adding `const` to APIs has
several benefits and potentially reduces bugs.
* Allows core APIs to be called using `const` objects.
* Callers know that objects passed by reference are not modified as a
side-effect of a function call.
* Returning `const` pointers enforces "read-only" usage of pointers to
internal objects, forcing users to copy objects when mutating them
or using explicit APIs for mutations.
* Allows the compiler to apply optimizations and helps static analysis.
Note that these changes are so far only applied to core API
functions. Further work can be done to improve other parts of the
code.
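As a purely illustrative example (the function and type here are
hypothetical, not an actual API), a const-qualified accessor looks like
this:

```c
/* Hypothetical accessor; Hypertable is only forward-declared here. */
typedef struct Hypertable Hypertable;

/*
 * Before: nothing tells the caller that the object is left untouched, and
 * a const Hypertable cannot be passed in:
 *
 *   extern char *ts_example_hypertable_get_name(Hypertable *ht);
 */

/* After: const-qualified input and a read-only return value. Callers must
 * copy the string (e.g., with pstrdup) before mutating it, or go through
 * an explicit mutation API. */
extern const char *ts_example_hypertable_get_name(const Hypertable *ht);
```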
GROUP BY aggregates are now fully pushed down in more cases.
1. GROUP BY expressions that cover all partitioning dimensions are
always fully pushed down. This is safe to do regardless of
repartitioning issues.
2. All GROUP BYs on (just) the closed "space" dimension will be fully
pushed down as long as no slices in that dimension overlap across
servers.
3. GROUP BYs that include a bucketing expression on time (e.g.,
`date_trunc` or `time_bucket`) will be fully pushed down like case
(1), as bucketing expressions are now treated as a "compatible"
partitioning key.
4. GROUP BYs are always pushed down--irrespective of partitioning--if
only one server is involved in the query.
Special handling has been implemented for pushing down bucketing
functions like `date_trunc`. The first parameter of the `date_trunc`
function is collatable, which normally prohibits push-down. This has
been handled specifically for bucketing functions until a more
cohesive handling of collations across servers has been
implemented. Further, the `date_trunc(text, timestamptz)` function is
marked STABLE, due to the timezone-dependent second parameter, which
also prohibits push-down. This has been handled by implementing our
own `contain_mutable_functions` check that filters these functions,
thus allowing push-down. To make this safe, we have to guarantee that
the connection session uses the same time zone setting as the access
node. Note, however, that we currently do not handle changes to the
time zone within a session.
A couple of costing issues for server rels have also been fixed.
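A minimal sketch of what such a mutable-function check could look like,
assuming a whitelist of bucketing functions over PostgreSQL's expression
walker (the function names and whitelist are illustrative, and a complete
check also has to inspect operators and other expression types):

```c
#include "postgres.h"
#include "catalog/pg_proc.h"
#include "nodes/nodeFuncs.h"
#include "nodes/primnodes.h"
#include "utils/lsyscache.h"

/* Treat known bucketing functions as safe to push down even though they
 * are marked STABLE. The whitelist is illustrative. */
static bool
is_bucketing_func(Oid funcid)
{
	char	   *name = get_func_name(funcid);
	bool		result = name != NULL &&
		(strcmp(name, "date_trunc") == 0 || strcmp(name, "time_bucket") == 0);

	if (name != NULL)
		pfree(name);
	return result;
}

/* Simplified stand-in for a contain_mutable_functions-style walker: return
 * true if the expression contains a non-immutable function that is not a
 * whitelisted bucketing function. */
static bool
contains_unsafe_functions_walker(Node *node, void *context)
{
	if (node == NULL)
		return false;

	if (IsA(node, FuncExpr))
	{
		FuncExpr   *func = (FuncExpr *) node;

		if (!is_bucketing_func(func->funcid) &&
			func_volatile(func->funcid) != PROVOLATILE_IMMUTABLE)
			return true;
	}

	return expression_tree_walker(node, contains_unsafe_functions_walker,
								  context);
}
```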
This optimization replaces the MergeAppendPath with an ordered
AppendPath for queries on hypertables that are ordered by the time
partitioning column and have a LIMIT clause. It removes the need for
last-point queries to access every chunk of a hypertable.
This commit also adds the struct TimescaleDBPrivate, which is stored
in RelOptInfo->fdw_private and holds TimescaleDB-specific plan state
between different planner hook invocations. It is currently needed to
carry a flag indicating whether or not to use ordered append between
different parts of the planner.
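Sketched out, the private state is just a small struct hung off the
relation's fdw_private; the field name here is illustrative rather than
the exact one used in the code:

```c
#include "postgres.h"
#include "nodes/pathnodes.h"

typedef struct TimescaleDBPrivate
{
	/* True when the hypertable's chunks can be appended in time order,
	 * allowing an ordered AppendPath instead of a MergeAppendPath. */
	bool		appends_ordered;
} TimescaleDBPrivate;

/* Attached to the hypertable's RelOptInfo early in planning so that later
 * hook invocations can read the flag. */
static void
attach_private(RelOptInfo *rel)
{
	rel->fdw_private = palloc0(sizeof(TimescaleDBPrivate));
}
```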
Future-proofing: if we ever want to make our functions available to
others, they’d need to be prefixed to prevent name collisions. In
order to avoid having some functions with the ts_ prefix and
others without, we’re adding the prefix to all non-static
functions now.
If an IN/ANY/ALL operator is used with explicit values, we can
effectively restrict the chunks that need to be scanned.
Here are some examples of supported queries:
- SELECT * FROM hyper_with_space_dim WHERE time < 10 AND device_id IN ('dev5','dev6','dev7','dev8');
- SELECT * FROM hyper_with_space_dim WHERE device_id = ANY(ARRAY['dev5','dev6']) AND device_id = ANY(ARRAY['dev6','dev7']);
There are some cases that are not optimized:
- subqueries within IN/ANY/ALL
- open dimension (e.g., time) when using IN/ANY with multiple args
- NOT operator
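Under the hood, an IN/ANY/ALL list of explicit values shows up in the
planner as a ScalarArrayOpExpr. A hedged sketch of the kind of check
involved (illustrative, not the actual exclusion code) is to accept the
clause only when its array argument is a constant array or an array of
constants:

```c
#include "postgres.h"
#include "nodes/pg_list.h"
#include "nodes/primnodes.h"

/* Return true if the IN/ANY/ALL clause's array argument consists of
 * explicit values and can therefore be used to restrict chunks. */
static bool
is_excludable_array_clause(const ScalarArrayOpExpr *saop)
{
	Node	   *array_arg = (Node *) lsecond(saop->args);

	if (IsA(array_arg, Const))
		return true;			/* a pre-evaluated array constant */

	if (IsA(array_arg, ArrayExpr))
	{
		ListCell   *lc;

		/* e.g., IN ('dev5','dev6','dev7'): every element must be a Const */
		foreach(lc, ((ArrayExpr *) array_arg)->elements)
		{
			if (!IsA(lfirst(lc), Const))
				return false;
		}
		return true;
	}

	return false;				/* other forms are not used for exclusion */
}
```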
SubspaceStore keeps a running count of the number of objects added to
it called `descendants`. This patch fixes that count, so that it always
keeps track of the number of objects sitting at the leaves of the
SubspaceStore. (The current version treats `descendants` as keeping
track of the number of leaves in some places, and the number of objects
sitting at the next level in others, resulting in the counter containing
neither.)
Also fixes UB in dimension vector: memcpy cannot be used on overlapping memory.
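For context, the memcpy issue is the classic overlapping-copy case; a
minimal, self-contained illustration of the fix (not the actual dimension
vector code) is to use memmove when shifting elements within the same
array:

```c
#include <stdint.h>
#include <string.h>

/* Remove element `idx` by shifting the tail of the array left by one.
 * Source and destination overlap, so memmove must be used; memcpy here
 * would be undefined behavior. */
static void
remove_element(int64_t *values, int num_values, int idx)
{
	memmove(&values[idx], &values[idx + 1],
			(num_values - idx - 1) * sizeof(int64_t));
}
```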
We add better accounting for the number of items stored in a subspace
to allow better pruning. Instead of pruning based on the number of
dimension_slices in subsequent dimensions, we now track the total
number of items in the subspace store and prune based on that.
We add two GUC variables:
1) max_open_chunks_per_insert (default: work_mem in bytes / 512, which
assumes an entry is 512 bytes), the maximum number of open chunks per
insert.
2) max_cached_chunks_per_hypertable (default: 100), the maximum number
of cached chunks per hypertable.
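For reference, integer GUCs like these are registered through
PostgreSQL's DefineCustomIntVariable; the sketch below uses the variable
names above but illustrative defaults, bounds, and descriptions (in
practice the first default is derived from work_mem at startup):

```c
#include "postgres.h"
#include "utils/guc.h"

int			ts_guc_max_open_chunks_per_insert = 10;	/* placeholder default */
int			ts_guc_max_cached_chunks_per_hypertable = 100;

/* Called from the extension's _PG_init() */
static void
example_guc_init(void)
{
	DefineCustomIntVariable("timescaledb.max_open_chunks_per_insert",
							"Maximum open chunks per insert",
							"Maximum number of open chunk insert states per insert",
							&ts_guc_max_open_chunks_per_insert,
							10,	/* in practice computed from work_mem / 512 */
							0, 65536,
							PGC_USERSET, 0,
							NULL, NULL, NULL);

	DefineCustomIntVariable("timescaledb.max_cached_chunks_per_hypertable",
							"Maximum cached chunks per hypertable",
							"Maximum number of chunks kept in the cache per hypertable",
							&ts_guc_max_cached_chunks_per_hypertable,
							100, 0, 65536,
							PGC_USERSET, 0,
							NULL, NULL, NULL);
}
```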
This change is part of an effort to create a consistent way
of dealing with metadata catalog updates, which is currently
a mix of C API and INSERT/UPDATE/DELETE statements from SQL
code. This mix makes catalog handling unnecessarily complex as
there are multiple ways to update metadata, increasing the risk
of security issues with publicly exposed SQL functions. It also
complicates things like cache invalidation, requiring different
mechanisms for C and SQL code. Catalog updates from SQL code
require triggers on metadata tables for cache invalidation that
do not work with native catalog updates.
The creation of chunks has been particularly messy in this regard,
making the code hard to follow. This was especially true for the
handling of a chunk's constraints, where dimensional and other
constraints were handled differently. With this change, constraint
handling is now consistent across constraint types, with a single API
for updating metadata.
Reduce memory usage for out-of-order inserts
The chunk_result_relation_info should be put on the chunk memory
context. This will cause the rri constraint expr to also go onto
that context and be correctly freed when the chunk insert state
is destroyed.
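The pattern, sketched under the assumption of a dedicated per-chunk
memory context (the context argument and function name are illustrative),
is simply to switch to that context while building the ResultRelInfo so
that everything derived from it is freed together:

```c
#include "postgres.h"
#include "nodes/execnodes.h"
#include "utils/memutils.h"

/* chunk_mctx: the chunk insert state's memory context (illustrative name) */
static ResultRelInfo *
create_chunk_result_relation_info(MemoryContext chunk_mctx)
{
	MemoryContext old = MemoryContextSwitchTo(chunk_mctx);
	ResultRelInfo *rri = makeNode(ResultRelInfo);

	/* ... initialize rri here; allocations made while this context is
	 * current, such as cached constraint expressions, are released when
	 * the chunk insert state's context is destroyed ... */

	MemoryContextSwitchTo(old);
	return rri;
}
```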
We now use INT64_MAX and INT64_MIN as the max and min values for
dimension_slice ranges. If a dimension_slice has a range_start of
INT64_MIN or the range_end is INT64_MAX, we remove the corresponding
check constraint on the chunk since it signifies that this end of the
range is infinite. Closed ranges now always have INT64_MIN as the
range_start of the first slice and INT64_MAX as the range_end of the
last slice.
Also, points corresponding to INT64_MAX are always
put in the same slice as INT64_MAX-1 to avoid problems with the
semantics that coordinate < range_end.
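In code form, the rules above amount to something like this small sketch
(illustrative, not the actual implementation):

```c
#include <stdbool.h>
#include <stdint.h>

/* INT64_MIN as range_start means the slice is open below, so no CHECK
 * constraint is emitted for that end; likewise for INT64_MAX above. */
static bool
range_start_needs_constraint(int64_t range_start)
{
	return range_start > INT64_MIN;
}

static bool
range_end_needs_constraint(int64_t range_end)
{
	return range_end < INT64_MAX;
}

/* A point at INT64_MAX is placed in the same slice as INT64_MAX - 1 so
 * that the `coordinate < range_end` convention still holds. */
static int64_t
clamp_coordinate(int64_t coordinate)
{
	return coordinate == INT64_MAX ? INT64_MAX - 1 : coordinate;
}
```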
When new chunks are created, the calculated chunk hypercube might
collide or not align with existing chunks when partitioning has
changed in one or more dimensions. In such cases, the chunk should be
cut to fit the alignment criteria and any collisions should be
resolved. Unfortunately, alignment and collision detection weren't
properly handled.
This refactoring adds proper axis-aligned bounding box collision
detection generalized to N dimensions. It also correctly handles
dimension alignment.
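The core of N-dimensional axis-aligned bounding box collision detection
is small; as a hedged sketch (types and names here are illustrative), two
hypercubes collide only if their half-open ranges overlap in every
dimension:

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct RangeSketch
{
	int64_t		start;
	int64_t		end;			/* exclusive */
} RangeSketch;

/* Half-open ranges [start, end) overlap iff each starts before the other
 * ends. */
static bool
ranges_overlap(RangeSketch a, RangeSketch b)
{
	return a.start < b.end && b.start < a.end;
}

/* Two N-dimensional hypercubes collide only if they overlap in every
 * dimension; being disjoint in a single dimension is enough to rule out
 * a collision. */
static bool
hypercubes_collide(const RangeSketch *a, const RangeSketch *b, int num_dimensions)
{
	for (int i = 0; i < num_dimensions; i++)
	{
		if (!ranges_overlap(a[i], b[i]))
			return false;
	}
	return true;
}
```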