There is a security loophole in current core Postgres, due to which
it's possible for a non-superuser to gain superuser access by attaching
dependencies like expression indexes, triggers, etc. before logical
replication commences.
To avoid this, we now ensure that the chunk objects that get created
for the subscription are done so as a superuser. This avoids malicious
dependencies by regular users.
When calling the `cagg_watermark` function to get the watermark of a
Continuous Aggregate we execute a `SELECT MAX(time_dimension)` query
in the underlying materialization hypertable.
The problem is that a `SELECT MAX(time_dimention)` query can be
expensive because it will scan all hypertable chunks increasing the
planning time for a Realtime Continuous Aggregates.
Improved it by creating a new catalog table to serve as a cache table
to store the current Continous Aggregate watermark in the following
situations:
- Create CAgg: store the minimum value of hypertable time dimension
data type;
- Refresh CAgg: store the last value of the time dimension materialized
in the underlying materialization hypertable (or the minimum value of
materialization hypertable time dimension data type if there's no
data materialized);
- Drop CAgg Chunks: the same as refresh cagg.
Closes#4699, #5307
During the compression autovacuum use to be disabled for uncompressed
chunk and enable after decompression. This leads to postgres
maintainence issue. Let's not disable autovacuum for uncompressed
chunk anymore. Let postgres take care of the stats in its natural way.
Fixes#309
No functional changes, mostly just reshuffles the code to prepare for
batch decompression.
Also removes unneeded repeated column value stores and ExecStoreTuple,
to save 3-5% execution time on some queries.
The functions `PQconndefaults` and `PQmakeEmptyPGresult` calls
`malloc` and can return NULL if it fails to allocate memory for the
defaults and the empty result. It is checked with an `Assert`, but this
will be removed in production builds.
Replace the `Assert` with an checks to generate an error in production
builds rather than trying to de-reference the pointer and cause a
crash.
This patch allows unique constraints on compressed chunks. When
trying to INSERT into compressed chunks with unique constraints
any potentially conflicting compressed batches will be decompressed
to let postgres do constraint checking on the INSERT.
With this patch only INSERT ON CONFLICT DO NOTHING will be supported.
For decompression only segment by information is considered to
determine conflicting batches. This will be enhanced in a follow-up
patch to also include orderby metadata to require decompressing
less batches.
Reindexing a relation requires AccessExclusiveLock which prevents
queries on that chunk. This patch changes decompress_chunk to update
the index during decompression instead of reindexing. This patch
does not change the required locks as there are locking adjustments
needed in other places to make it safe to weaken that lock.
This patch changes the way user-defined FDW options (e.g., startup
costs, per-tuple costs) are handled. So far, these values were retrieved
in apply_fdw_and_server_options() but reset to default values afterward.
Problem:
When the guc timescaledb.license = 'timescale' is set in the conf file
and a SIGHUP is sent to postgress process and a reload of the tsl
module is triggered.
This reload happens in 2 phases 1. tsl_module_load is called which
will load the module only if not already loaded and 2.The
ts_module_init is called for every ts_license_guc_assign_hook
irrespective of if it is new load.This ts_module_init initialization
function also registers a on_proc_exit function to be called on exit.
The list of on_proc_exit methods are maintained in a fixed array
on_proc_exit_list of size MAX_ON_EXITS (20) which gets filled up on
repeated SIGHUPs and hence an error.
Fix:
The fix is to make the ts_module_init() register the on_proc_exit
callback, only in case the module is reloaded and not in every init
call.
Closes#5233
The commit 96574a7 changes the handling of the file_trailer_received
flag. It is now only used in asserts and not in any other kind of logic.
This patch encapsulates the file_trailer_received in a
USE_ASSERT_CHECKING macro.
The copy fetcher fetches tuples in batches. When the last element in the
batch is the file trailer, the trailer was not handled correctly. The
existing logic did not perform a PQgetCopyData in that case. Therefore
the state of the fetcher was not set to EOF and the copy operation was
not correctly finished at this point.
Fixes: #5323
WHERE clause with SEGMENTBY column of type text/bytea
non-equality operators are not pushed down to Seq Scan
node of compressed chunk. This patch fixes this issue.
Fixes#5286
Make `partialize_agg()` support parallel query execution. To make this
work, the finalize node need combine the individual partials from each
parallel worker, but the final step that turns the resulting partial
into the finished aggregate should not happen. Thus, in the case of
distributed hypertables, each data node can run a parallel query to
compute a partial, and the access node can later combine and finalize
these partials into the final aggregate. Esssentially, there will be
one combine step (minus final) on each data node, and then another one
plus final on the access node.
To implement this, the finalize aggregate plan is simply modified to
elide the final step, and to reserialize the partial. It is only
possible to do this at the plan stage; if done at the path stage, the
PostgreSQL planner will hit assertions that assume that the node has
certain values (e.g., it doesn't expect combine Paths to skip the
final step).
Previously we used date_part("epoch", interval) and integer division
internally to determine whether the top cagg's interval is a
multiple of its parent's.
This led to precision loss and wrong results
in the case of intervals with sub-second components.
Fixed by using the `ts_interval_value_to_internal` function to convert
intervals to appropriate integer representation for division.
Fixes#5277
Concurrent insert into dist hypertable after a data node marked as
unavailable would produce 'tuple concurrently deleted` error.
The problem occurs because of missing tuple level locking during
scan and concurrent delete from chunk_data_node table afterwards,
which should be treated as `SELECT … FOR UPDATE` case instead.
Based on the fix by @erimatnor.
Fix#5153
When a Continuous Aggregate is created the `chunk_interval_size` is
defined my the `chunk_interval_size` of the original hypertable
multiplied by a fixed factor of 10.
The problem is currently when we create a Hierarchical Continuous
Aggregate the same factor is applied and it lead to an exponential
`chunk_interval_size`.
Fixed it by just copying the `chunk_interval_size` from the base
Continuous Aggregate for an Hierachical Continuous Aggreagate.
Fixes#5382
We added checks via #4846 to handle DML HA when replication factor is
greater than 1 and a datanode is down. Since each insert can go to a
different chunk with a different set of datanodes, we added checks
on every insert to check if DNs are unavailable. This increased CPU
consumption on the AN leading to a performance regression for RF > 1
code paths.
This patch fixes this regression. We now track if any DN is marked as
unavailable at the start of the transaction and use that information to
reduce unnecessary checks for each inserted row.
Make the copy fetcher more asynchronous by separating the sending of
the request for data from the receiving of the response. By doing
that, the async append node can send the request to each data node
before it starts reading the first response. This can massively
improve the performance because the response isn't returned until the
remote node has finished executing the query and is ready to return
the first tuple.
Renamed:
tsl/test/sql/size_utils.sql
tsl/test/expected/size_utils.out
To:
tsl/test/sql/size_utils_tsl.sql
tsl/test/expected/size_utils_tsl.out
because conflicting with test/sql/size_utils.sql
At the moment, the MERGE command is not supported on distributed
hypertables. This patch ensures that the join pushdown code ignores the
invocation by the MERGE command.
Different num_chunks values reported by
timescaledb_information.hypertables and
timescaledb_information.chunks.
View definition of hypertables was
not filtering dropped and osm_chunks.
Fixes#5338
The `bucket_info` variable is initialized by `caggtimebucketinfo_init`
function called inside the following branch:
`if (rte->relkind == RELKIND_RELATION || rte->relkind == RELKIND_VIEW)`
If for some reason we don't enter in this branch then the `bucket_info`
will not be initialized leading to an uninitialized variable when
returning `bucket_info` at the end of the `cagg_validate_query`
function.
Fixed it by initializing with zeros the `bucket_info` variable when
declaring it.
Found by coverity scan.
This small patch adds support for continuous aggregates to the
`hypertable_detailed_size` (and with that `hypertable_size`).
It adds an additional check to see if a continuous aggregate exists
if a hypertable with the given regclass name isn't found.
This patch adds the functionality that is needed to perform distributed,
parallel joins on reference tables on access nodes. This code allows the
pushdown of a join if:
* (1) The setting "ts_guc_enable_per_data_node_queries" is enabled
* (2) The outer relation is a distributed hypertable
* (3) The inner relation is marked as a reference table
* (4) The join is a left join or an inner join
This PR introduces a timeout argument and a new logic to the
timescale_internal.ping_data_node() function which allows
to handle io timeouts for nodes being unresponsive.
Fix#5312
When executing functions, SPI assumes that `TopTransactionContext` is
used for atomic execution contexts and `PortalContext` is used for
non-atomic contexts. Since jobs need to be able to commit and start
transactions, they are executing in a non-atomic context hence
`PortalContext` will be used, but `PortalContext` is not set when
starting the job. This is not a problem for PL/PgSQL executor, but for
other executors (such as PL/Python) it would be.
This commit fixes the issue by setting the `PortalContext` variable to
the portal context created for the portal and restores it (to NULL)
after execution.
Fixes#5326
To don't make developers confused the right name for Continuous
Aggregates on top of another Continuous Aggregates is `Hierarchical
Continuous Aggregates`, so changed the usage of term `nested` for
`hierarchical`.
TextDatumGetCString() was made typesafe in upstream HEAD (16devel), so
now the compiler catches this. As Tom puts it in ac50f84866:
"TextDatumGetCString(PG_GETARG_TEXT_P(x))" is formally wrong: a text*
is not a Datum. Although this coding will accidentally fail to fail on
all known platforms, it risks leaking memory if a detoast step is needed,
unlike "TextDatumGetCString(PG_GETARG_DATUM(x))" which is what's used
elsewhere.
Just noticed abysmal INSERT performance when experimenting with one of
our customers' data set, and turns out my cache sizes were
misconfigured, leading to constant hypertable chunk cache thrashing.
Show a warning to detect this misconfiguration. Also use more generous
defaults, we're not supposed to run on a microwave (unlike Postgres).
This patch fixes several issues with next_start calculation.
- Previously, the offset was added twice in some cases.
This is fixed by this patch.
- Additionally, schedule intervals with month components
were not handled correctly.
Internally, time_bucket with origin is used to calculate
the next start. However, in the case of month intervals, the
timestamp calculated for a bucket is always aligned on the first
day of the month, regardless of origin.
Therefore, previously the result was aligned with origin by adding
the difference between origin and its respective time bucket.
This difference was computed as a fixed length interval in terms
of days and time. That computation led to incorrect computation of
next start occasionally, for example when a job should be executed on
the last day of a month.
That is fixed by adding an appropriate interval of months to
initial_start and letting Postgres handle this computation properly.
Fixes#5216
The hypertable_compression test had on implicit reliance on the
ordering of tuples when querying the materialized results.
This patch makes the ordering explicit in this test.
1) Simplify the path generation for the parameterized data node scans.
1) Adjust the data node scan cost if it's an index scan, instead of always
treating it as a sequential scan.
1) Hard-code the grouping estimation for distributed hypertable, instead
of using the totally bogus per-column ndistinct value.
1) Add the GUC to disable parameterized
data node scan.
1) Add more tests.
A previous change accidentally broke the dist_hypertable test so that
it prematurely exited. This change restores the test so that it
executes properly.
The function to execute remote commands on data nodes used a blocking
libpq API that doesn't integrate with PostgreSQL interrupt handling,
making it impossible for a user or statement timeout to cancel a
remote command.
Refactor the remote command execution function to use a non-blocking
API and integrate with PostgreSQL signal handling via WaitEventSets.
Partial fix for #4958.
Refactor remote command execution function