If the joininfo for a rel is not available, the index path
cannot compute the correct filters for parameterized paths
as the RelOptInfo's ppilist is set up using information
from the joininfo.
Fixes #1558
If a chunk is dropped but it has a continuous aggregate that is
not dropped, we want to preserve the chunk catalog row instead of
deleting it. This prevents dangling identifiers in the
materialization hypertable. It also preserves the dimension slice
and chunk constraint rows for the chunk, since those will be necessary
when enabling this with multinode and are needed to recreate the
chunk. The postgres objects associated with the chunk are all
dropped (table, constraints, indexes).
If data is ever reinserted into the same data region, the chunk is
recreated with the same dimension definitions as before. The postgres
objects are simply recreated.
Allow dropping raw chunks on the raw hypertable while keeping
the continuous aggregate. This allows downsampling data
and lets users save on TCO. We only allow dropping
such data when the dropped data is older than the
`ignore_invalidation_older_than` parameter on all the associated
continuous aggs. This ensures that any modifications to the
dropped region of data are never reflected in the continuous
agg, and thus avoids semantic ambiguity if chunks are dropped
but later recreated due to an insert.
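For illustration, a minimal sketch of this usage (the hypertable name and the
exact parameter spelling are placeholders, not taken from this change):
  -- Only allowed if every continuous agg on `conditions` has
  -- ignore_invalidation_older_than of at most 2 months.
  SELECT drop_chunks(interval '2 months', 'conditions',
                     cascade_to_materializations => false);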
Before we drop a chunk, we need to make sure we process any
continuous aggregate invalidations that were registered on
data inside the chunk. Thus we add options to materialization
to perform the materialization transactionally, to only process
invalidations, and to process invalidations only up to a given timestamp.
We fix drop_chunks and the policy to properly treat
`cascade_to_materialization` as a tri-state variable (unknown,
true, false). Existing policy rows change false to NULL
(unknown), while true stays true since it was explicitly set.
Remove the form data for bgw_policy_drop_chunk because there
is no good way to represent the tri-state variable in the
form data.
When dropping chunks with cascade_to_materialization = false, all
invalidations on the chunks are processed before dropping the chunk.
If we are so far behind that even the completion threshold is inside
the chunks being dropped, we error. There are two reasons we error:
1) We can't safely process new ranges transactionally without taking
heavyweight locks and potentially locking the entire system.
2) If the completion threshold is that far behind, the system probably
has some serious issues anyway.
Reorder the GROUP BY clause to match the ordering of the
ORDER BY clause. SELECTs on continuous aggregate queries
often have an ORDER BY clause on the time_bucket column.
If the ORDER BY does not match the grouping clause,
the planner inserts an additional Sort node on top of
the view evaluation (Aggregate node).
preprocess_groupclause does this reordering if the ORDER BY clause
is part of the current subquery being processed. But when we have
a continuous aggregate view, the ORDER BY needs to be derived
from the outer query. This PR allows the ORDER BY from the
outer query to propagate to the GROUP BY clause of the continuous
aggregate view.
The rewrite is:
SELECT * FROM (SELECT a, b, max(c), min(d) FROM ... GROUP BY a, b)
ORDER BY b;
is transformed into
SELECT * FROM (SELECT a, b, max(c), min(d) FROM ... GROUP BY b, a)
ORDER BY b;
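As a concrete sketch with a continuous aggregate (all names are placeholders):
  CREATE VIEW device_summary
  WITH (timescaledb.continuous) AS
  SELECT time_bucket('1 hour', time) AS bucket, device_id, max(temp) AS max_temp
  FROM conditions
  GROUP BY device_id, bucket;

  -- With this change the view's GROUP BY is evaluated as (bucket, device_id),
  -- matching the outer ORDER BY, so no extra Sort node is needed:
  SELECT * FROM device_summary ORDER BY bucket;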
The locf treat_null_as_missing option would not trigger the lookup
query if there was a row for the first bucket and the value in that
row was NULL. This patch fixes the behaviour and triggers the lookup
query for the first row too.
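A sketch of the affected shape (table and column names are placeholders; the
prev expression is the lookup query referred to above):
  SELECT time_bucket_gapfill('1 hour', time) AS hour,
         locf(avg(value),
              prev => (SELECT value FROM metrics m
                       WHERE m.time < '2020-01-01' ORDER BY m.time DESC LIMIT 1),
              treat_null_as_missing => true)
  FROM metrics
  WHERE time >= '2020-01-01' AND time < '2020-01-02'
  GROUP BY hour;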
Previously, the Chunk struct was used to represent both a full
chunk and the stub used for joins. The stub used for joins
only contained valid values for some chunk fields and not others.
After the join determined that a Chunk was complete, it filled
in the rest of the chunk fields. The fact that a chunk could have
only some fields filled out and not others at different times
made the code hard to follow and error-prone.
So we separate the stub state of the chunk into its own
struct that does not contain the fields that are not yet filled
out. This leverages the type system to prevent code from
accessing invalid fields during the join phase and makes
the code easier to follow.
Removes duplicate call to setup_append_rel_array and avoids allocating
another append_rel_array with the same values when planning queries
with hypertables.
We added a timescaledb.ignore_invalidation_older_than parameter for
continuous aggregates. This parameter accepts a time interval (e.g. 1
month). If set, it limits the amount of time for which invalidations
are processed. Thus, if
timescaledb.ignore_invalidation_older_than = '1 month'
then any modifications for data older than 1 month from the current
timestamp at insert time will not cause updates to the continuous
aggregate. This limits the amount of work that a backfill can trigger.
This parameter must be >= 0. A value of 0 means that invalidations are
never processed.
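For example, on an existing continuous aggregate (the view name is a
placeholder):
  ALTER VIEW conditions_summary
    SET (timescaledb.ignore_invalidation_older_than = '1 month');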
When recording invalidations for the hypertable at insert time, we use
the maximum ignore_invalidation_older_than of any continuous agg attached
to the hypertable as a cutoff for whether to record the invalidation
at all. When materializing a particular continuous agg, we use that
agg's ignore_invalidation_older_than cutoff. However, we have to apply
that cutoff relative to the insert time, not the materialization
time, to make it easier for users to reason about. Therefore,
we record the insert time as part of the invalidation entry.
On older point releases (e.g. 10.2) the step size in isolation
tests is smaller, leading to "SQL step too long" errors. This
PR splits up the setup step to avoid this error.
Continuous aggregate views like
  select time_bucket(), sum(col)
  from ...
  group by time_bucket(), grpcol;
where grpcol is missing from the select targetlist produce an
incorrect select targetlist in the partialize query, and the view
cannot be materialized. This PR fixes this issue.
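A concrete instance of the problematic shape (names are placeholders):
  CREATE VIEW bucket_only
  WITH (timescaledb.continuous) AS
  SELECT time_bucket('1 hour', time) AS bucket, sum(col) AS total
  FROM metrics
  GROUP BY bucket, device_id;  -- device_id is grouped on but not selected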
This changes the continuous aggregate materialization logic so that
max_interval_per_job (MIPJ) applies to invalidation entries as well
as to new ranges in the materialization. The new logic is that the
MIPJ setting limits the sum of work done by the invalidations
and new ranges. Invalidations take precedence so new ranges
are only processed if there is time left over in the MIPJ
budget after all invalidations are done.
This forces us to calculate the invalidation range during the first
transaction. We still delete and/or cut the invalidation entries
in the second transaction. This change also more neatly separates concerns:
all decisions on work to be done happen in the first txn, while only
execution happens in the second. Further refactoring could make
this clearer by passing a list of InternalRanges to represent the
work. But this PR is big enough, so that's left to a future refactor.
Note: There is remaining work to be done in breaking up invalidation
entries as created during inserts to constrain the length of the entries.
But that's a separate issue to be addressed in the future.
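For reference, the per-job budget discussed above is the continuous
aggregate's max_interval_per_job option, e.g. (view name is a placeholder):
  ALTER VIEW conditions_summary
    SET (timescaledb.max_interval_per_job = '12 hours');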
Refactor the continuous aggregate validation to use our function cache
to check for the bucketing function. This simplifies the code and allows
adding support for other bucketing functions like date_trunc later on.
In some cases the _temp variable will not be set because pg_config does
not return any output for a specific flag. This results in an error
in the STREQUAL comparison and a build failure.
Wrapping the variable in double quotes fixes the problem.
Previously, refresh_lag in continuous aggs was calculated
relative to the maximum timestamp in the table. Change the
semantics so that it is relative to now(). This is more
intuitive.
Requires an integer_now function to be set on hypertables
with integer-based time dimensions.
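A sketch of that requirement, following the documented pattern (the hypertable
name is a placeholder):
  CREATE OR REPLACE FUNCTION unix_now() RETURNS bigint
    LANGUAGE SQL STABLE AS $$ SELECT extract(epoch FROM now())::bigint $$;
  SELECT set_integer_now_func('metrics', 'unix_now');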
This maintenance release contains bugfixes since the 1.5.0 release. We deem it low
priority for upgrading.
In particular, the fixes contained in this maintenance release address potential
segfaults and no other security vulnerabilities. The bugfixes are related to bloom
indexes and updates from previous versions.
**Bugfixes**
* #1523 Fix bad SQL updates from previous updates
* #1526 Fix hypertable model
* #1530 Set active snapshots in multi-xact index create
**Thanks**
* @84660320 for reporting an issue with bloom indexes
Type functions have to be CREATE OR REPLACED on every update
since they need to point to the correct .so. Thus,
split the type definitions into pre, functions,
and post parts, and rerun the functions part both on
pre_install and on every update.
Set active snapshots when creating txns during index
create with timescaledb.transaction_per_chunk. This
is needed for some index types like `bloom`.
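For illustration, the kind of statement this affects (table and column names
are placeholders; bloom comes from the contrib module):
  CREATE EXTENSION IF NOT EXISTS bloom;
  CREATE INDEX ON conditions USING bloom (device_id, location)
    WITH (timescaledb.transaction_per_chunk);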
Tests not added since we don't want dependencies on contrib modules
like bloom.
Fixes #1521.
1. This commit introduces changes to existing plans due
to the addition of new chunks to metrics_ordered_idx.
2. Add tests for constraint-aware appends on compressed
tables.
The update logic from 1.4.2 to 1.5.0 had an error where
the _timescaledb_catalog.hypertable table was altered in such
a way that the table was not rewritten. This caused
bugs in the catalog processing code. A CLUSTER rewrites the
table. We also backpatch this change to the 1.4.2--1.5.0
script to help anyone building from source.
Also fixes a similar error on _timescaledb_catalog.metadata
introduced in the 1.3.2--1.4.0 update.
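That is, something along these lines (the index name is an assumption for
illustration only):
  CLUSTER _timescaledb_catalog.hypertable USING hypertable_pkey;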
PG11 added an optimization where columns that were added by
an ALTER TABLE that had a DEFAULT value did not cause a table
re-write. Instead, those columns are filled with the default
value on read.
But this mechanism does not apply to catalog tables and does
not work with our catalog scanning code. This test makes
sure we never have such alters in our updates.
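For example, on PG11+ the following does not rewrite the table; existing rows
get the default filled in at read time (the table name is a placeholder):
  ALTER TABLE metrics ADD COLUMN flags integer DEFAULT 0;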
The construct used for pushing down produces a warning on certain
older compilers, so while it was correct, this patch changes it to
get rid of the warning and to prevent introducing an imbalance later.
The `test_sanitizer.sh` test failed because source code was being
copied from the host to the container as user `postgres` and this user
did not have read permissions on the mounted directory. This is fixed
by copying the files as `root` and then changing the owner to
`postgres`.
The commit also removes `wait_for_pg` because the PostgreSQL server
status is not relevant for the tests, since they start their own
temporary instance.
The commit also switches to here documents (heredocs) for running
the commands, for readability.
The main reason to run ARM tests was not to identify issues with
ARM but to identify 32-bit issues, e.g. int64 passed as a pointer
instead of by value. Those issues don't need ARM emulation and can be
tested with i386, which is much faster.
Fix tests that fail like so:
test=# CREATE CAST (customtype AS bigint)
test-# WITHOUT FUNCTION AS ASSIGNMENT;
ERROR: source and target data types are not physically compatible
A previous change made `UNIX` and `APPLE` build flags mutually
exclusive instead of complementary. This broke builds on, e.g., Mac OS
X.
The changes in this commit will make builds work on Mac OS X again.
When linking the extensions as shared libraries, the linker flags from
`pg_config` are not used. This means that if `PG_PATH` is provided and
refers to a locally compiled Postgres installation, shared libraries
from that installation will not be used. Instead, any default-installed
version of Postgres will be used.
This commit adds `PG_LDFLAGS` to `CMAKE_SHARED_LINKER_FLAGS` and
`CMAKE_MODULE_LINKER_FLAGS`.
To handle that Windows sets some fields to "not recorded" when they are
not available, this commit introduces a CMake function `get_pg_config`
that replaces such values with `<var>-NOTFOUND` so that they are treated
as undefined by CMake.
This release adds major new features and bugfixes since the 1.4.2 release.
We deem it moderate priority for upgrading.
This release adds compression as a major new feature.
Multiple type-specific compression options are available in this release
(including DeltaDelta with run-length-encoding for integers and
timestamps; Gorilla compression for floats; dictionary-based compression
for any data type, but specifically for low-cardinality datasets;
and other LZ-based techniques). Individual columns can be compressed with
type-specific compression algorithms as rows in Postgres' native row-based
format are rolled up into columnar-like arrays on a per-chunk basis.
The query planner then handles transparent decompression for compressed
chunks at execution time.
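For illustration, the user-facing workflow looks roughly like this (names and
settings are placeholders; see the compression docs for details):
  ALTER TABLE conditions SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'device_id',
    timescaledb.compress_orderby = 'time DESC'
  );
  SELECT compress_chunk(chunk)
  FROM show_chunks('conditions', older_than => interval '7 days') AS chunk;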
This release also adds support for basic data tiering by supporting
the migration of chunks between tablespaces, as well as support for
parallel query coordination to the ChunkAppend node.
Previously ChunkAppend would rely on parallel coordination in the
underlying scans for parallel plans.
Histogram's combine function threw a segfault if both state1
and state2 were NULL. I could only reproduce this case in
PG 10. Add a test that hits this with PG 10.4.
Fixes #1490
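For reference, the aggregate involved (table and column names are
placeholders); under a parallel plan the combine step can be called with NULL
partial states from workers that processed no rows:
  SELECT histogram(temperature, 0, 100, 10) FROM conditions;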
When restoring a database, people would encounter errors if
the restore happened after telemetry had run. This is because
an `exported_uuid` field would then exist and people would encounter
a "duplicate key value" error when the restore tried to overwrite it.
We fix this by moving this metadata to a different key
in pre_restore and moving it back in post_restore.
If the restore creates an exported_uuid, that restored
value is used and the moved version is simply deleted.
We also remove the error redirection in restore so that errors
will show up in tests in the future.
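The intended restore flow is therefore (sketch):
  SELECT timescaledb_pre_restore();
  -- run pg_restore here
  SELECT timescaledb_post_restore();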
Fixes #1409.
Several fixes:
- Change an incorrect variable name in CMakeLists that prevented tests
  from running.
- Add a PG 10.10 test to codecov
- Remove unused CODECOV_FLAGS in travis.yml
The following fields are added:
- num_compressed_hypertables
- compressed_KIND_size
- uncompressed_KIND_size
Where KIND = heap, index, toast.
The `num_hypertables` field does NOT count the internal hypertables
used for compressed data.
We also removed internal continuous aggs tables from the
`num_hypertables` count.
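For reference, the generated report (including these fields) can be inspected
locally; the parameter name below is taken from the docs but should be treated
as an assumption here:
  SELECT get_telemetry_report(always_display_report => true);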
We want compressed data to be stored out-of-line whenever possible so
that the headers are colocated and scans on the metadata and segmentbys
are cheap. This commit lowers toast_tuple_target to 128 bytes so that
out-of-line storage kicks in for more tables; with the default size, a
non-trivial portion of the data often ends up in the main table, and
only very few rows fit on a page.
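toast_tuple_target is a standard storage parameter (PG11+), so the effect is
roughly equivalent to the following on an internal compressed table (the table
name is a placeholder):
  ALTER TABLE _timescaledb_internal._compressed_hypertable_2
    SET (toast_tuple_target = 128);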