The error message used to state that the interval must be defined in
terms of days or smaller. This was confusing because we really meant any
fixed-duration interval (e.g., weeks, days, hours, or minutes), as opposed
to an interval without a fixed duration (e.g., months or years).
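As a rough illustration of the distinction, here is a minimal Python sketch. It assumes an interval is stored as separate month, day, and microsecond fields (as PostgreSQL does) and treats any interval with a month component as not fixed. The class, the function names, and the error wording are hypothetical and not taken from the patch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Hypothetical stand-in for a PostgreSQL interval (month/day/time fields)."""
    months: int = 0        # years are stored as 12 months
    days: int = 0          # weeks are stored as 7 days
    microseconds: int = 0  # hours, minutes, seconds


def is_fixed_duration(interval: Interval) -> bool:
    # Months and years vary in length, so any month component makes the
    # interval non-fixed for the purposes of this sketch.
    return interval.months == 0


def check_fixed_interval(interval: Interval) -> Interval:
    if not is_fixed_duration(interval):
        # Hypothetical wording, not the actual error message.
        raise ValueError("interval must have a fixed duration (no months or years)")
    return interval
```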
Some small improvements:
- allow ALTER TABLE with an empty segmentby if the original definition
had an empty segmentby. Improve error messages.
- block compression on tables with OIDs
- block compression on tables with RLS
This patch adds support for producing ordered output. All
segmentby columns need to be a prefix of the pathkeys, and the orderby
specified for the compression needs to exactly match the rest of the
pathkeys.
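The condition can be sketched in Python as follows. This is only a model of the rule as stated above: pathkeys and orderby entries are represented as hypothetical `(column, direction)` tuples, and the sketch assumes the segmentby columns may appear in any order within the prefix, which the message does not specify.

```python
def can_produce_ordered_output(pathkeys, segmentby, orderby):
    """Return True if the requested pathkeys can be served in order.

    pathkeys:  list of (column, direction) tuples requested by the query
    segmentby: list of segmentby column names
    orderby:   list of (column, direction) tuples configured for compression
    """
    n = len(segmentby)
    prefix, rest = pathkeys[:n], pathkeys[n:]
    # Assumption: the segmentby columns may come in any order in the prefix.
    prefix_columns = {column for column, _direction in prefix}
    return prefix_columns == set(segmentby) and rest == list(orderby)
```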
The get_function_oid function was a reimplementation of PostgreSQL
LookupFuncName. This patch removes the function and switches
all callers to use LookupFuncName instead.
The Microsoft compiler can't figure out that elog(ERROR) doesn't
return and warns about functions not returning a value in all code
paths. This patch adds pg_unreachable calls to those functions.
When ordered append tried to push down the targetlist to child paths,
it assumed the children would be scans on rels, which is not true for
space partitioning, where children might be MergeAppend nodes.
This patch also no longer applies the ordered append optimization
to partial paths because it's not safe to do so.
This patch also adds more tests for space partitioned hypertables.
This commit implements functionality for users to give a custom
definition of now() for integer open dimension typed hypertables.
Such a now() function enables us to talk about intervals in the context
of hypertables with integer time columns. In order to simplify future
code, this commit defines a custom ts_interval type that unites the
usual postgres intervals and integer time dimension intervals under a
single composite type.
The commit also enables adding drop chunks policy on hypertables with
integer time dimensions if a custom now() function has been set.
An int64 should be passed to functions that take a Datum parameter using Int64GetDatum.
Depending on the platform, Postgres either passes int64 by value or allocates memory
and passes a pointer to the value.
Without this change, we get a SEGV on Raspberry Pi.
This commit switches the remaining JOIN in the continuous_aggs_stats
view to LEFT JOIN. This way we'll still see info from the other columns
even when the background worker has not run yet.
This commit also switches the time fields to output text in the correct
format for the underlying time type.
This commit adds the actual background worker job that runs the continuous
aggregate automatically. This job gets created when the continuous aggregate is
created and is deleted when the aggregate is DROPped. By default this job will
attempt to run every two bucket widths, and attempts to materialize up to two
bucket widths behind the end of the table.
This commit adds initial support for the continuous aggregate materialization
and INSERT invalidations.
INSERT path:
On INSERT, DELETE and UPDATE we log the [min, max] time range that may be
invalidated (that is, newly inserted, updated, or deleted) to
_timescaledb_catalog.continuous_aggs_hypertable_invalidation_log. This log
will be used to re-materialize these ranges, to ensure that the aggregate
is up-to-date. Currently these invalidations are recorded by a trigger,
_timescaledb_internal.continuous_agg_invalidation_trigger, which should be
added to the hypertable when the continuous aggregate is created. This trigger
stores a cache of min/max values per hypertable, and on transaction commit
writes them to the log, if needed. At the moment, we consider them to always
be needed, unless we're in ReadCommitted mode or weaker and the min
invalidated value is greater than the hypertable's invalidation threshold
(found in _timescaledb_catalog.continuous_aggs_invalidation_threshold).
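The caching and commit-time behaviour described above can be modelled roughly as follows. This is a Python sketch, not the trigger's C code: the class, its method names, and the plain dictionary and list standing in for the threshold and log catalog tables are all hypothetical.

```python
class InvalidationCache:
    """Per-transaction cache of the [min, max] time range touched per hypertable."""

    def __init__(self):
        self.ranges = {}  # hypertable id -> (min_time, max_time)

    def record(self, hypertable_id, time_value):
        # Called for every inserted, updated, or deleted row.
        lo, hi = self.ranges.get(hypertable_id, (time_value, time_value))
        self.ranges[hypertable_id] = (min(lo, time_value), max(hi, time_value))

    def flush_on_commit(self, thresholds, log, read_committed_or_weaker):
        # thresholds: hypertable id -> invalidation threshold (stand-in for the catalog)
        # log: list receiving (hypertable_id, min, max) entries
        for hypertable_id, (lo, hi) in self.ranges.items():
            beyond_threshold = lo > thresholds[hypertable_id]
            if read_committed_or_weaker and beyond_threshold:
                continue  # nothing materialized yet can be affected
            log.append((hypertable_id, lo, hi))
        self.ranges.clear()
```

In this model, a range is skipped only when both conditions hold; under a stricter isolation level the range is always logged.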
Materialization path:
Materialization currently happens in multiple phases: in phase 1 we determine
the timestamp at which we will end the new set of materializations, then we
update the hypertable's invalidation threshold to that point, and finally we
read the current invalidations, then materialize any invalidated rows and the new
range between the continuous aggregate's completed threshold (found in
_timescaledb_catalog.continuous_aggs_completed_threshold) and the hypertable's
invalidation threshold. After all of this is done we update the completed
threshold to the invalidation threshold. The portion of this protocol from
after the invalidations are read, until the completed threshold is written
(that is, actually materializing, and writing the completion threshold) is
included with this commit, with the remainder to follow in subsequent ones.
One important caveat is that since the thresholds are exclusive, we invalidate
all values _less_ than the invalidation threshold, and we store timevalue
as an int64 internally, we cannot ever determine if the row at PG_INT64_MAX is
invalidated. To avoid this problem, we never materialize the time bucket
containing PG_INT64_MAX.
Before this PR only SELECTs would be optimized to exclude unneeded
chunks by our planner. This PR enables such optimizations on SELECTs
found within an INSERT as well. This should speed up commands of the
form
INSERT INTO <hypertable> (SELECT ... FROM <hypertable> WHERE ...)
We would like to enable this for all commands, but currently DELETE and
UPDATE cannot handle them and cause errors when the optimizations are
enabled.
This commit also fixes an issue that would occur if we tried to exclude
chunks based off of infinite time values.
This patch adds support for chunk exclusion for time_bucket
expressions in the WHERE clause. The following transformation
is done when building RestrictInfo:
Transform time_bucket calls of the following form in the WHERE clause:
    time_bucket(width, column) OP value
Since time_bucket always returns the lower bound of the bucket,
the width is not relevant for lower bound comparisons and the
following transformation can be applied:
    time_bucket(width, column) > value
becomes
    column > value
Example with values:
    time_bucket(10, column) > 109
becomes
    column > 109
For upper bound comparisons the width needs to be taken into account,
and we need to extend the upper bound by width to capture all
possible values:
    time_bucket(width, column) < value
becomes
    column < value + width
Example with values:
    time_bucket(10, column) < 100
becomes
    column < 100 + 10
This allows chunk exclusions to work for views with aggregations.
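To see why the rewritten condition is safe for chunk exclusion, the following Python sketch models an integer time_bucket and checks that every value matching the original predicate also matches the transformed one. The function names are illustrative only, and handling `>=` and `<=` in the same way is an assumption that follows from the same reasoning rather than something stated in the patch.

```python
def time_bucket(width, value):
    """Integer time_bucket: the lower bound of the bucket containing value."""
    return value - (value % width)


def transform_restriction(op, width, value):
    """Rewrite `time_bucket(width, column) OP value` into `column OP bound`.

    Returns (op, bound). The result is a weaker condition on the raw column:
    it never rejects a row that the original predicate accepts.
    """
    if op in (">", ">="):
        return op, value            # bucket start <= column, width is irrelevant
    if op in ("<", "<="):
        return op, value + width    # column < bucket start + width
    raise ValueError(f"unsupported operator: {op}")
```

For example, `transform_restriction("<", 10, 100)` yields `("<", 110)`, matching the example above.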
We find ourselves needing to store intervals (specifically time_bucket widths) in
upcoming PRs, so this commit adds that functionality, along with tests that we
perform the conversion in a sensible, round-trippable manner.
This commit fixes a longstanding bug in plan_hashagg where negative time values
would prevent us from using a hashagg. The old logic for to_internal had a flag
that caused the function to return -1 instead of throwing an error if it could
not perform the conversion. This logic was incorrect, as -1 is a valid time value.
The new logic throws the error unconditionally, and forces the user to CATCH it
if they wish to handle that case. Switching plan_hashagg to using the new logic
fixed the bug.
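The underlying problem is the classic in-band error sentinel. The Python sketch below is a simplified model, not the actual to_internal code: both functions and the choice of "anything that is not an int" as the unconvertible case are hypothetical.

```python
def to_internal_old(value):
    """Old behaviour (modelled): signal failure with the sentinel -1."""
    if isinstance(value, int):
        return value
    return -1  # ambiguous: -1 is also a perfectly valid time value


def to_internal_new(value):
    """New behaviour (modelled): always raise when conversion is impossible."""
    if isinstance(value, int):
        return value
    raise TypeError(f"cannot convert {value!r} to an internal time value")
```

With the old function a caller cannot tell `to_internal_old(-1)` apart from a failed conversion, whereas the new one leaves it to the caller to catch the error explicitly.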
The commit adds a single SQL file, c_unit_tests.sql, to be the driver for all such
pure-C unit tests. Since the tests run quickly, and there is very little work to
be done at the SQL level, it does not seem like each group of such tests requires
its own SQL file.
This commit also updates test/sql/.gitignore, as some generated files were
missing.
In some cases a user might already know which chunks need to be scanned to answer
a particular query. Using the `chunks_in` function, we can skip calculating the chunks
involved in a particular query, which should result in better performance as well.
A simple example:
`SELECT * FROM hypertable WHERE chunks_in(hypertable, ARRAY[1,2])`
Something is causing heap corruption upon setting the license key to
default when we try to use the GUC extra on Windows. For now, stop using
it and just rerun the validation function; if we get to the assign hook
we must have a valid key, so it will never fail.
Also fixes an error message on Windows:
it turns out Windows does not like to print NULL strings,
so we no longer do that.
Fixes other minor Windows bugs.
Remove the following unused functions:
_timescaledb_internal.to_microseconds(TIMESTAMPTZ)
_timescaledb_internal.to_timestamp_pg(BIGINT)
_timescaledb_internal.time_to_internal(anyelement)
Introduce PG11 support by introducing compatibility functions for
any whose signatures have changed in PG11. Additionally, refactor
the structure of the compatibility functions found in compat.h by
breaking them out by function (or small set of similar functions)
so that it is easier to see what changed between versions and maintain
changes as more versions are supported.
In general, the philosophy has been to try for forward compatibility
wherever possible, so that we use the latest versions of function interfaces
where we can or where reasonably convenient and mimic the behavior
in older versions as much as possible.
If possible replace aggregate functions FIRST/LAST with subqueries of the form
(SELECT value FROM table WHERE sort IS NOT NULL AND existing-quals ORDER BY sort ASC/DESC
LIMIT 1).
Given a suitable index on the sort column, this plan can be much faster than scanning all the
rows and running an aggregate function.
The optimization can't be performed if:
- query uses GROUP BY or WINDOW function
- query contains CTEs
- query contains other aggregate functions (e.g., combining MIN/MAX with FIRST/LAST; we can't
optimize across different aggregate functions)
- query uses JOIN
- FIRST/LAST used in ORDER BY
Optimization also works with subqueries, or if FIRST/LAST is used in CTE subquery.
In order to standardize existing FIRST/LAST aggregate function with PostgreSQL and
FIRST/LAST optimization, we exclude NULL values in sort by column.
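The equivalence that the optimization relies on can be illustrated with a small Python model. Rows are hypothetical `(value, sort)` tuples; one function scans every row like an aggregate, the other mimics the `WHERE sort IS NOT NULL ... ORDER BY sort ... LIMIT 1` subquery. Neither is the planner code itself.

```python
def first_last_aggregate(rows, last=False):
    """Scan all rows, as the FIRST/LAST aggregate would, skipping NULL sort keys."""
    best = None
    for value, sort in rows:
        if sort is None:
            continue
        if best is None or (sort > best[1] if last else sort < best[1]):
            best = (value, sort)
    return None if best is None else best[0]


def first_last_subquery(rows, last=False):
    """Model of: WHERE sort IS NOT NULL ORDER BY sort ASC/DESC LIMIT 1."""
    candidates = [row for row in rows if row[1] is not None]
    candidates.sort(key=lambda row: row[1], reverse=last)
    return candidates[0][0] if candidates else None
```

With an index on the sort column, the second form only needs to read one index entry instead of scanning every row, which is where the speedup comes from.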
Future proofing: if we ever want to make our functions available to
others they’d need to be prefixed to prevent name collisions. In
order to avoid having some functions with the ts_ prefix and
others without, we’re adding the prefix to all non-static
functions now.
Timescale provides an efficient and easy-to-use API to drop individual
chunks from a Timescale database through drop_chunks. This PR builds on
that functionality and, through a new show_chunks function, gives the
opportunity to see the chunks that would be dropped if drop_chunks was run.
Additionally, it adds a newer_than option to drop_chunks (also supported
by show_chunks) that allows seeing or dropping chunks in an interval or newer
than a point in time.
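The relationship between the two functions can be sketched in Python as below. Chunks are modelled as hypothetical `(name, range_start, range_end)` tuples, and the exact boundary semantics (a chunk is "older" when its end is at or before the point, "newer" when its start is at or after it) are an assumption made for this sketch.

```python
def show_chunks(chunks, older_than=None, newer_than=None):
    """Return the names of chunks matching the optional time bounds.

    Both bounds may be None, in which case every chunk is returned.
    """
    selected = []
    for name, range_start, range_end in chunks:
        if older_than is not None and not range_end <= older_than:
            continue
        if newer_than is not None and not range_start >= newer_than:
            continue
        selected.append(name)
    return selected


def drop_chunks(chunks, older_than=None, newer_than=None):
    """Drop exactly the chunks that show_chunks would list; return the survivors."""
    doomed = set(show_chunks(chunks, older_than, newer_than))
    return [chunk for chunk in chunks if chunk[0] not in doomed]
```

Because drop_chunks is built on show_chunks, calling show_chunks with the same arguments previews exactly what would be removed.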
This commit includes:
- Implementation of show_chunks in C
- Additional helper functions to work with chunks
- New version of drop_chunks in sql that uses show_chunks. This
also adds a newer_than option to drop_chunks
- More enhanced tests of drop_chunks and new tests for show_chunks
Among other reasons, show_chunks was implemented in C in order
to be able to have both older_than and newer_than arguments be null. This
was not possible in SQL because the arguments had to have polymorphic types
and whether they are used in function body or not, PL/pgSQL requires these
arguments to typecheck.
Refactored the boilerplate that allocates and copies over data from a tuple to a struct. This is typically used in the scanner context in order to read rows from a SQL table in C.
Macro is used for 2 reasons:
1) It's more correct in that it doesn't mix Timestamp and TimestampTz
types. There is no implicit conversion of the two beneath the hood.
2) It is slightly faster as it avoids an extra function call. This
is a very performance sensitive function for OLAP queries.
Since Monday is the ISO start of the week, it makes sense to move
the time_bucket epoch to start on a Monday. Previously, the epoch was the
same as the Postgres epoch (2000-01-01, a Saturday).
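The effect on weekly buckets can be checked with a short Python sketch. It assumes the new origin is 2000-01-03, the first Monday after the Postgres epoch; the message above only says that the epoch moves to a Monday, so that date and the function name are assumptions.

```python
from datetime import datetime, timedelta

POSTGRES_EPOCH = datetime(2000, 1, 1)  # a Saturday
MONDAY_ORIGIN = datetime(2000, 1, 3)   # assumed new origin: the following Monday


def bucket_start(width, ts, origin=MONDAY_ORIGIN):
    """Start of the bucket of the given width that contains ts."""
    buckets_since_origin = (ts - origin) // width  # floor division, yields an int
    return origin + buckets_since_origin * width
```

With a one-week width, buckets aligned to `MONDAY_ORIGIN` start on Mondays, while buckets aligned to `POSTGRES_EPOCH` start on Saturdays.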
We've decided to adopt the ts_ prefix on all exported C functions in
order to avoid having symbol conflicts with future postgres functions.
We've already started using this prefix on new functions and this commit
adds the prefix to the old functions.
Users can now (optionally) set a target chunk size and TimescaleDB
will try to adapt the interval length of the first open ("time")
dimension in order to reach that target chunk size. If a hypertable
has more than one open dimension, only the first one will have a
dynamically adapting interval.
Users can optionally specify their own function that calculates the
new dimension interval. They can also set a target size of 0 in order
to estimate a suitable target size for a chunk based on available
memory.
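To give an intuition for how an interval might adapt, here is a deliberately simple Python sketch that scales the interval by the ratio of target size to observed chunk size. This proportional rule is an assumption made for illustration only; the built-in calculation and the signature of user-supplied functions are not described in this message, and the target-size-0 case (estimating from available memory) is outside the sketch.

```python
def next_chunk_interval(current_interval, chunk_size_bytes, target_size_bytes):
    """Hypothetical proportional rule for adapting a chunk's time interval.

    current_interval:  interval length of the most recent chunk (any integer unit)
    chunk_size_bytes:  observed size of that chunk
    target_size_bytes: desired chunk size, must be positive in this sketch
    """
    if target_size_bytes <= 0:
        raise ValueError("this sketch requires a positive target size")
    if chunk_size_bytes <= 0:
        return current_interval  # no data yet, nothing to adapt to
    scaled = current_interval * target_size_bytes // chunk_size_bytes
    return max(1, scaled)
```

A chunk that came out at half the target size doubles the next interval; one that came out at twice the target halves it.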
This PR fixes all the formatting to be in line with the latest version of
pgindent. Since pgindent does not like variables named `type`, those
have been appropriately renamed.
This optimization adds a HashAggregate plan to many group by queries.
In plain postgres, many time-series queries will not use the hash
aggregate because the planner will incorrectly assume that the number of
rows is much larger than it actually is and will use the less efficient
GroupAggregate instead of a HashAggregate to prevent running out of
memory.
The planner will assume a large number of rows because the statistics
planner for grouping assumes that the number of distinct items produced
by a function is the same as the number of distinct items going in. This
is not true for functions like time_bucket and date_trunc. This
optimization fixes the statistics and adds the HashAggregate plan if
appropriate.
The statistics now rely on evaluating the spread of a variable and
dividing it by the interval in the time_bucket or date_trunc. This is
still an overestimate of the total number of groups but is better than
before. A further improvement on this will be to evaluate the quals
(WHERE clauses) on the query to try to derive a tighter spread on the
variable. This is left to a future optimization.
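The estimate described above amounts to dividing the spread of the variable by the bucket width. A minimal Python sketch, assuming the spread is taken as max minus min and that the result is rounded up and floored at one group (both rounding choices are assumptions):

```python
import math


def estimate_groups(min_value, max_value, bucket_width):
    """Estimate the number of groups produced by bucketing a variable.

    min_value and max_value bound the variable (e.g. from column statistics);
    bucket_width is the time_bucket or date_trunc interval in the same unit.
    """
    spread = max_value - min_value
    return max(1, math.ceil(spread / bucket_width))
```

For one day of data in seconds bucketed by the hour, `estimate_groups(0, 86400, 3600)` gives 24, far fewer than the number of distinct input timestamps the planner would otherwise assume.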
The functions for adding and updating dimensions have been refactored
in C to:
- improve usage of proper error codes
- make messages that better conform with the PostgreSQL standard.
- improve security by reducing the amount of code that runs under SECURITY DEFINER
A new if_not_exists option has also been added to add_dimension() and
the number of partitions can now be set using the new
set_number_partitions() function.
A bug in the validation of smallint time intervals has been fixed. The
previous code didn't check for intervals > 0 and smallint intervals
accepted values up to UINT16_MAX instead of INT16_MAX.
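The difference between the old and the fixed check can be shown with a short Python model. Both function names are hypothetical; only the bounds come from the description above.

```python
INT16_MAX = 2**15 - 1   # 32767, the largest smallint value
UINT16_MAX = 2**16 - 1  # 65535


def old_smallint_interval_check(interval):
    """Modelled old behaviour: no lower bound, upper bound too large."""
    return interval <= UINT16_MAX


def is_valid_smallint_interval(interval):
    """Fixed behaviour: the interval must be positive and fit in a smallint."""
    return 0 < interval <= INT16_MAX
```

Values such as 0, -5, or 40000 pass the old check but are rejected by the fixed one.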
Source code indentation has been updated in PostgreSQL 10 to fix a
number of issues. This update applies this new indentation to the
entire code base.
The new indentation requires a new version of pg_bsd_indent, which can
be found here:
https://git.postgresql.org/git/pg_bsd_indent.git
Windows 64-bit binaries should now be buildable using the cmake
build system either from the command line or from Visual Studio.
Previous issues regarding unresolved symbols have been resolved
with compatibility header files to properly export symbols or
getting GUCs via normal APIs.
reindex allows you to reindex the indexes of only certain chunks,
filtering by time. This is a common use case because a user may
want to reindex chunks once they are no longer getting new data.
reindex also has a recreate option which will not use REINDEX
but will rather CREATE INDEX a new index and then
DROP INDEX / RENAME new_index to old_name. This approach has advantages
in terms of blocking reads for a much shorter period of time. However,
it does more work and will use more disk space during the operation.