If there are any indexes on the compressed chunk, insert into them while
inserting the heap data rather than reindexing the relation at the
end. This reduces the amount of locking on the compressed chunk
indexes, which caused issues when merging chunks, and should help
with future updates of compressed data.
So far, we have set the number of desired workers for decompression to
1. If a query touches only one chunk, we end up with one worker in a
parallel plan. PostgreSQL spins up multiple workers only if the query
touches multiple chunks; these workers could then be used to process
the data of a single chunk.
This patch removes our custom worker calculation and relies on the
PostgreSQL logic to calculate the desired degree of parallelism.
Co-authored-by: Jan Kristof Nidzwetzki <jan@timescale.com>
This patch does the following (an example MERGE statement follows the list):
1. Planner changes to create a ChunkDispatch node when the MERGE command
has an INSERT action.
2. Changes to map partition attributes from a tuple returned from the
child node of ChunkDispatch against the physical targetlist, so that the
ChunkDispatch node can read the correct value from the partition column.
3. Fixed issues with MERGE on compressed hypertables.
4. Added more testcases.
5. MERGE on distributed hypertables is not supported.
6. Since there is no Custom Scan (HypertableModify) node for MERGE
with UPDATE/DELETE on compressed hypertables, we don't support this.
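For illustration, the shape now handled is a MERGE whose only action is
INSERT (table and column names below are hypothetical):

  MERGE INTO metrics m
  USING staging s
  ON m.time = s.time AND m.device_id = s.device_id
  WHEN NOT MATCHED THEN
    INSERT (time, device_id, value) VALUES (s.time, s.device_id, s.value);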
Fixes #5139
This patch adds an optimization to the DecompressChunk node. If the
query 'order by' and the compression 'order by' are compatible (the
query 'order by' is equal to, or a prefix of, the compression 'order
by'), the compressed batches of the segments are decompressed in
parallel and merged using a binary heap. This preserves the ordering,
so sorting the result can be avoided. LIMIT queries especially benefit
from this optimization because only the first tuples of some batches
have to be decompressed. Previously, all segments were completely
decompressed and sorted.
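As an illustration, a query like the following benefits when the
hypertable is compressed with a matching 'order by' (the table name and
compression settings below are hypothetical):

  ALTER TABLE metrics SET (timescaledb.compress,
    timescaledb.compress_segmentby = 'device_id',
    timescaledb.compress_orderby = 'time DESC');

  SELECT * FROM metrics
  WHERE device_id = 1
  ORDER BY time DESC
  LIMIT 10;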
Fixes: #4223
Co-authored-by: Sotiris Stamokostas <sotiris@timescale.com>
All children of an append path are required to have the same
parameterization, so we have to reparameterize when the selected path
does not have the right parameterization.
The function to execute remote commands on data nodes used a blocking
libpq API that doesn't integrate with PostgreSQL interrupt handling,
making it impossible for a user or statement timeout to cancel a
remote command.
Refactor the remote command execution function to use a non-blocking
API and integrate with PostgreSQL signal handling via WaitEventSets.
Partial fix for #4958.
Refactor remote command execution function
SELECT from a partially compressed chunk crashes due to a reference to a
NULL pointer. When generating paths for DecompressChunk,
uncompressed_partial_path is NULL but not checked, causing a crash. This
patch checks for NULL before calling create_append_path().
Fixes #5134
On caggs with real-time aggregation, renaming a column does not
update all the column aliases inside the view metadata.
This patch changes the code that creates the compression
configuration for caggs to get the column name from the materialization
hypertable instead of the view internals.
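For reference, the rename that left the aliases stale is the standard
one below (the cagg and column names are hypothetical):

  ALTER MATERIALIZED VIEW conditions_summary RENAME COLUMN avg_temp TO mean_temp;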
Fixes #5100
The cursor_fetcher_rewind method assumes that the data node cursor is
rewound either after EOF or when there is an associated request. But the
rewind can also occur once the server has generated the required number
of rows by joining the relation being scanned with another regular
relation. In this case, the fetch would not have reached EOF and there
would be no associated request, as the rows would have already been
loaded into the cursor, causing the assertion in cursor_fetcher_rewind
to fail. Fix this by removing the Assert and updating
cursor_fetcher_rewind to discard the response only if there is an
associated request.
Fixes #5053
Ensure the COPY fetcher implementation reads data until EOF with
`PQgetCopyData()`. Also ensure the malloc'ed copy data is freed with
`PQfreemem()` if an error is thrown in the processing loop.
Previously, the COPY fetcher didn't read until EOF and instead
assumed EOF when the COPY file trailer was received. Since EOF wasn't
reached, it required terminating the COPY with an extra call to the
(deprecated) `PQendcopy()` function.
Still, there are cases when a COPY needs to be prematurely terminated,
for example, when querying with a LIMIT clause. Therefore, distinguish
between "normal" end (when receiving EOF) and forceful end (cancel the
ongoing query).
An INSERT .. SELECT query containing distributed hypertables generates a
plan with a DataNodeCopy node, which is not supported. The issue is in the
function tsl_create_distributed_insert_path(), where we decide whether to
generate a DataNodeCopy or a DataNodeDispatch node based on the kind of
query. On PG15, the timescaledb planner generates DataNodeCopy for an
INSERT .. SELECT query because rte->subquery is set to NULL. This is due
to a commit in PG15 where rte->subquery is set to NULL as part of a fix.
This patch checks whether the SELECT subquery has distributed hypertables
by looking into root->parse->jointree, which represents the subquery.
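A sketch of the affected query shape, assuming dist_metrics and
dist_readings are distributed hypertables:

  INSERT INTO dist_metrics (time, device_id, value)
  SELECT time, device_id, value
  FROM dist_readings
  WHERE time > now() - INTERVAL '1 hour';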
Fixes #4983
We don't want to support BitmapScans below DecompressChunk,
as this adds additional complexity to support and there
is little benefit in doing so.
This fixes a bug that could happen when we have a BitmapScan that
is parameterized on a compressed column and would lead to an
execution failure with an error about incorrect attribute types
in the expression.
On PG15, CustomScan is not projection capable by default, so PostgreSQL
wraps the node in a Result node. This change in PG15 causes test result
files that contain EXPLAIN output to fail. This patch fixes the plan
outputs.
Fixes #4833
This name better reflects its characteristics, and I'm thinking about
resurrecting the old row-by-row fetcher later, because it can be useful
for parameterized queries.
INSERT into a compressed hypertable with a number of open chunks greater
than ts_guc_max_open_chunks_per_insert causes a segmentation fault.
A new row that needs to be inserted into a compressed chunk has to be
compressed first. The memory required to compress a row is allocated
from the RowCompressor::per_row_ctx memory context. Once the row is
compressed, ExecInsert() is called, where memory from the same context
is allocated and freed instead of using the "Executor State" context.
This causes memory corruption.
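A rough reproduction sketch, assuming the GUC backing
ts_guc_max_open_chunks_per_insert is timescaledb.max_open_chunks_per_insert
and metrics is a hypertable with compressed chunks (both are assumptions
of this example):

  SET timescaledb.max_open_chunks_per_insert = 1;
  -- INSERT spanning many chunks, forcing chunks to be closed and reopened
  INSERT INTO metrics (time, device_id, value)
  SELECT t, 1, random()
  FROM generate_series(now() - INTERVAL '30 days', now(),
                       INTERVAL '1 day') AS t;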
Fixes: #4778
Group the incoming rows into batches on the access node before COPYing
them to the data nodes. This gives a 2x-5x speedup on various COPY
queries to distributed hypertables.
Also fix the text format passthrough, and prefer the text transfer
format for text input to be able to use this passthrough. This saves a
lot of CPU on the access node.
The optimization that constifies certain now() expressions before
hypertable expansion did not apply to CURRENT_TIMESTAMP even
though it is functionally similar to now(). This patch extends the
optimization to CURRENT_TIMESTAMP.
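For example, a filter like the following can now have chunks excluded
at planning time (the hypertable name is illustrative):

  SELECT * FROM metrics
  WHERE time > CURRENT_TIMESTAMP - INTERVAL '1 day';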
Calling the `ts_dist_cmd_invoke_on_data_nodes_using_search_path()` function
without an active transaction allows a connection invalidation event to
happen between applying `search_path` and the actual command
execution, which leads to an error.
This change introduces a way to ignore connection cache invalidations
using the `remote_connection_cache_invalidation_ignore()` function.
This work is based on @nikkhils' original fix and problem research.
Fix #4022
The constify code that constifies TIMESTAMPTZ expressions when doing
chunk exclusion did not account for daylight saving time switches,
leading to different calculation outcomes when the timezone changes.
This patch adds a 4-hour safety buffer to any such calculations.
The code added to support VIEWs did not account for the fact that
varno could be from a different nesting level and therefore not
be present in the current range table.
Allow planner chunk exclusion in subqueries. When we decide whether a
query may benefit from constifying now() and encounter a subquery, peek
into the subquery and check if the constraint references a hypertable
partitioning column.
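A sketch of a subquery shape that now qualifies (the hypertable name is
hypothetical):

  SELECT device_id, avg(value)
  FROM (
    SELECT * FROM metrics
    WHERE time > now() - INTERVAL '1 day'
  ) AS recent
  GROUP BY device_id;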
Fixes #4524
This patch adjusts the operator logic for valid space dimension
constraints to no longer look for an exact datatype match on both sides
of the operator but instead allow mismatched datatypes.
Previously, a constraint like `col = value` required `col` and `value`
to have matching datatypes; with this change, `col` and `value` can have
different datatypes as long as they have an equality operator in a btree
operator family.
Mismatched datatypes commonly occur when using int8 columns and
comparing them with integer literals. Integer literals default to int4,
so the datatypes would not match unless special care had been taken in
writing the constraints, and the optimization would therefore never
apply in those cases.
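For example, given a bigint (int8) space partitioning column, the
following constraint can now enable the optimization (the table is
illustrative):

  -- device_id is int8; the literal 1 defaults to int4
  SELECT * FROM metrics WHERE device_id = 1;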
Since we do not use our own hypertable expansion for SELECT FOR UPDATE
queries, we need to make sure to add the extra information necessary to
get hashed space partitions working with the native postgres inheritance
expansion.
This patch adds a new time_bucket_gapfill function that
allows bucketing in a specific timezone.
You can gapfill with an explicit timezone like so:
`SELECT time_bucket_gapfill('1 day', time, 'Europe/Berlin') ...`
Unfortunately this introduces an ambiguity with some previous
call variations when an untyped start/finish argument was passed
to the function. Some queries might need to be adjusted and either
explicitly name the positional argument or resolve the type ambiguity
by casting to the intended type.
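For instance, assuming the parameter names are start and finish, an
ambiguous call can be fixed either way (the table is illustrative):

  -- name the arguments explicitly
  SELECT time_bucket_gapfill('1 day', time,
           start => '2023-01-01', finish => '2023-02-01') AS day,
         avg(value)
  FROM metrics GROUP BY 1;

  -- or cast to the intended type
  SELECT time_bucket_gapfill('1 day', time,
           '2023-01-01'::timestamptz, '2023-02-01'::timestamptz) AS day,
         avg(value)
  FROM metrics GROUP BY 1;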
This patch changes get_git_commit to always return the full hash.
Since different git versions do not agree on the length of the
abbreviated hash, the reported length was flaky. To make the length
consistent, always return the full hash.
When a query has multiple distributed hypertables, the row-by-row
fetcher cannot be used. This patch changes the fetcher selection
logic to throw a better error message in those situations.
Previously, the following error would be produced in those situations:
unexpected PQresult status 7 when starting COPY mode
The gapfill mechanism to detect an aggregation group change was
using datumIsEqual to compare the group values. datumIsEqual does
not detoast values, so when one value is toasted and the other is
not, it will not return the correct result. This patch changes
the gapfill code to use the correct equality operator for the type
of the group column instead of datumIsEqual.
This patch fixes the param handling in prepared statements for generic
plans in ChunkAppend, making those params usable in chunk exclusion.
Previously, those params would not be resolved and therefore not used
for chunk exclusion.
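A sketch of the now-working case, using a hypothetical hypertable and
forcing a generic plan for demonstration:

  SET plan_cache_mode = force_generic_plan;
  PREPARE recent(timestamptz) AS
    SELECT * FROM metrics WHERE time > $1;
  EXECUTE recent(now() - INTERVAL '1 day');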
Fixes #3719
When executing multinode queries that initialize the row-by-row fetcher
but never execute it, the node cleanup code would hit an assertion
checking the state of the fetcher. Found by sqlsmith.
The "empty" bytea value in a column of a distributed table when
selected was being returned as "null". The actual value on the
datanodes was being stored appropriately but just the return code path
was converting it into "null" on the AN. This has been handled via the
use of PQgetisnull() function now.
Fixes#3455
This patch transforms constraints on hash-based space partitions to make
them usable by postgres constraint exclusion.
If we have an equality condition on a space partitioning column, we add
a corresponding condition on get_partition_hash on this column. These
conditions match the constraints on chunks, so postgres' constraint
exclusion is able to use them and exclude the chunks.
The following transformations are done:
device_id = 1
becomes
((device_id = 1) AND (_timescaledb_internal.get_partition_hash(device_id) = 242423622))
s1 = ANY ('{s1_2,s1_2}'::text[])
becomes
((s1 = ANY ('{s1_2,s1_2}'::text[])) AND
(_timescaledb_internal.get_partition_hash(s1) = ANY ('{1583420735,1583420735}'::integer[])))
These transformations are not visible in EXPLAIN output as we remove
them again after hypertable expansion is done.
For certain inserts on a distributed hypertable, e.g., involving CTEs
and upserts, plans can be generated that weren't properly handled by
the DataNodeCopy and DataNodeDispatch execution nodes. In particular,
the nodes expect ChunkDispatch as a child node, but PostgreSQL can
sometimes insert a Result node above ChunkDispatch, causing the crash.
Further, behavioral changes in PG14 also caused the DataNodeCopy node
to sometimes wrongly believe a RETURNING clause was present. The check
for returning clauses has been updated to fix this issue.
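An illustrative query shape that could produce such a plan (the table
names are hypothetical):

  WITH staged AS (
    SELECT * FROM incoming_measurements
  )
  INSERT INTO dist_metrics (time, device_id, value)
  SELECT time, device_id, value FROM staged
  ON CONFLICT DO NOTHING;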
Fixes #4339