1065 Commits

Konstantina Skovola
6e65172cd8 Fix tablespace for compressed hypertable and corresponding toast
If a hypertable uses a non-default tablespace, the compressed hypertable
and its corresponding toast table and index are still created in the
default tablespace.
This PR fixes this unexpected behavior and creates the compressed
hypertable and its toast table and index in the same tablespace as
the hypertable.
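
A minimal sketch of the scenario, assuming a tablespace named "history"
already exists (table and column names are made up):

  CREATE TABLE metrics (time timestamptz NOT NULL, device_id int, value float)
    TABLESPACE history;
  SELECT create_hypertable('metrics', 'time');
  ALTER TABLE metrics SET (timescaledb.compress,
    timescaledb.compress_segmentby = 'device_id');
  SELECT compress_chunk(c) FROM show_chunks('metrics') c;
  -- with this fix, the compressed hypertable plus its toast table and index
  -- are created in "history" instead of the default tablespace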

Fixes #5520
2023-05-02 15:28:00 +03:00
Jan Nidzwetzki
df32ad4b79 Optimize compressed chunk resorting
This patch adds an optimization to the DecompressChunk node. If the
query 'order by' and the compression 'order by' are compatible (query
'order by' is equal or a prefix of compression 'order by'), the
compressed batches of the segments are decompressed in parallel and
merged using a binary heap. This preserves the ordering, so an explicit
sort of the result can be avoided. LIMIT queries especially benefit from
this optimization because only the first tuples of some batches have to
be decompressed. Previously, all segments were completely decompressed
and sorted.
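
For illustration, a sketch of a query shape that can use this optimization
(table and column names are made up):

  ALTER TABLE metrics SET (timescaledb.compress,
    timescaledb.compress_segmentby = 'device_id',
    timescaledb.compress_orderby = 'time DESC');
  -- the query ORDER BY matches the compression ORDER BY, so the batches can
  -- be merged instead of being fully decompressed and sorted
  SELECT * FROM metrics WHERE device_id = 1 ORDER BY time DESC LIMIT 10;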

Fixes: #4223

Co-authored-by: Sotiris Stamokostas <sotiris@timescale.com>
2023-05-02 10:46:15 +02:00
Nikhil Sontakke
ed8ca318c0 Quote username identifier appropriately
Need to use quote_ident() on the user roles. Otherwise the
extension scripts will fail.
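
For example (hypothetical role and schema names), a role name containing
upper-case or special characters breaks when interpolated unquoted, while
quote_ident() makes it safe:

  SELECT quote_ident('Grafana-Reader');   -- returns "Grafana-Reader"
  SELECT format('GRANT USAGE ON SCHEMA my_schema TO %s',
                quote_ident('Grafana-Reader'));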

Co-authored-by: Mats Kindahl <mats@timescale.com>
2023-04-28 16:53:43 +05:30
Bharathy
2ce4bbc432 Enable continuous_aggs tests on all PG versions. 2023-04-28 07:40:59 +05:30
Zoltan Haindrich
1d092560f4 Fix on-insert decompression after schema changes
On compressed hypertables 3 schema levels are in use simultaneously
 * main - hypertable level
 * chunk - inheritance level
 * compressed chunk

In the build_scankeys method all of them appear: the slot has its
fields laid out as for a row of the main hypertable.

Accessing the slot by the attribute numbers of the chunks may lead to
indexing mismatches if there are differences between the schemas.

Fixes: #5577
2023-04-27 16:33:36 +02:00
Mats Kindahl
be28794384 Enable run_job() for telemetry job
Since the telemetry job has a special code path so that it can be used
both from Apache code and from TSL code, trying to execute the telemetry job
with run_job() will fail.

This code will allow run_job() to be used with the telemetry job to
trigger a send of telemetry data. You have to belong to the group that
owns the telemetry job (or be the owner of the telemetry job) to be
able to use it.
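
A sketch of the intended usage; the telemetry job id is assumed to be 1 here
and should be looked up in timescaledb_information.jobs:

  SELECT job_id, application_name FROM timescaledb_information.jobs;
  -- trigger an immediate telemetry send, as the job owner or as a member of
  -- the role that owns the job
  CALL run_job(1);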

Closes #5605
2023-04-27 16:00:03 +02:00
Fabrízio de Royes Mello
3c8d7cef77 Fix cagg_repair for the old CAgg format
In commit 4a6650d1 we fixed cagg_repair running for broken
Continuous Aggregates with JOINs, but we accidentally removed the
code path for running against the old format (finalized=false),
leaving dead code that was pointed out by Coverity Scan:
https://scan4.scan.coverity.com/reports.htm#v54116/p12995/fileInstanceId=131706317&defectInstanceId=14569420&mergedDefectId=384044

Fixed it by restoring the old code path for running the cagg_repair
for Continuous Aggregates in the old format (finalized=false).
2023-04-26 14:29:58 -03:00
Mats Kindahl
d3730a4f6a Add permission checks to run_job()
There were no permission checks when calling run_job(), so it was
possible to execute any job regardless of who owned it. This commit
adds such checks.
2023-04-26 11:56:56 +02:00
Fabrízio de Royes Mello
4a6650d170 Fix broken CAgg with JOIN repair function
The internal `cagg_rebuild_view_definition` function was trying to cast
a pointer to `RangeTblRef` but it actually is a `RangeTblEntry`.

Fixed it by using the already existing `direct_query` data struct to
check if there are JOINs in the CAgg to be repaired.
2023-04-25 15:20:48 -03:00
Ante Kresic
910663d0be Reduce decompression during UPDATE/DELETE
When updating or deleting tuples from a compressed chunk, we first
need to decompress the matching tuples then proceed with the operation.
This optimization reduces the amount of data decompressed by using
compressed metadata to decompress only the affected segments.
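
A sketch of a statement that benefits, assuming made-up names and compression
configured with segmentby = 'device_id' and orderby = 'time':

  -- only compressed batches whose segmentby value and min/max time metadata
  -- match the predicates need to be decompressed before the update
  UPDATE metrics SET value = 0
  WHERE device_id = 7
    AND time >= '2023-01-01' AND time < '2023-01-02';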
2023-04-25 15:49:59 +02:00
Jan Nidzwetzki
c54d8bd946 Add missing order by to compression_ddl tests
Some queries in compression_ddl had no order by. Therefore the output
order was not defined, which led to flaky tests.
2023-04-21 15:54:25 +02:00
Ante Kresic
23b3f8d7a6 Block unique idx creation on compressed hypertable
This block was removed by accident. In order to support unique index
creation we would need to ensure uniqueness within the compressed data,
which is something we should do in the future before removing this block
again.
2023-04-20 16:22:03 +02:00
Zoltan Haindrich
a0df8c8e6d Fix on-insert decompression for unique constraints
Inserting multiple rows into a compressed chunk could have bypassed the
constraint check if the table had segment_by columns.

Decompression is narrowed to consider only candidates matching the actual
segment_by value. Because of caching, decompression was skipped for
follow-up rows of the same chunk.
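
A rough sketch of the affected pattern (names are made up), assuming a
UNIQUE (time, device_id) constraint and segmentby = 'device_id':

  -- the chunk already holds a compressed row for ('2023-01-01 00:00', 1)
  INSERT INTO metrics VALUES
    ('2023-01-01 00:05', 1, 1.0),  -- first row decompresses the segment
    ('2023-01-01 00:00', 1, 2.0);  -- follow-up row must not skip the check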

Fixes #5553
2023-04-20 13:47:47 +02:00
Ante Kresic
a49fdbcffb Reduce decompression during constraint checking
When inserting into a compressed chunk with constraints present,
we need to decompress relevant tuples in order to do speculative
insertion. Previously we used segmentby column values to limit the
number of compressed segments to decompress. This change expands
on that by also using segment metadata to further filter the
compressed rows that need to be decompressed.
2023-04-20 12:17:12 +02:00
Konstantina Skovola
5633960f8b Enable indexscan on uncompressed part of partially compressed chunks
This was previously disabled as no data resided on the
uncompressed chunk once it was compressed, but this is not
the case anymore with partially compressed chunks, so we
enable indexscan for the uncompressed chunk again.

Fixes #5432

Co-authored-by: Ante Kresic <ante.kresic@gmail.com>
2023-04-18 17:29:08 +03:00
Mats Kindahl
9a64385f34 Use regrole for job owner
Instead of using a user name to register the owner of a job, we use
regrole. This allows renames to work properly since the underlying OID
does not change when the owner name changes.

We add a check when calling `DROP ROLE` that there is no job with that
owner and generate an error if there is.
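
A sketch of the behavior, with made-up role and procedure names:

  -- register a job while connected as role "job_owner"
  SELECT add_job('custom_proc', INTERVAL '1 hour');
  -- renaming the role keeps the job working, since the owner is now stored
  -- as an OID (regrole) rather than as a name
  ALTER ROLE job_owner RENAME TO batch_owner;
  -- dropping a role that still owns a job now raises an error
  DROP ROLE batch_owner;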
2023-04-18 08:57:52 +02:00
Alexander Kuzmenkov
20db884bd7 Increase remote tuple and startup costs
Our cost model should be self-consistent, and the relative values for
the remote tuple and startup costs should reflect their real cost,
relative to costs of other operations like CPU tuple cost.

For example, now remote costs are set even lower than the parallel tuple
and startup cost. Contrary to that, their real world cost is going to be
an order of magnitude higher or more, because parallel tuples are sent
through shared memory, and remote tuples are sent over the network.

Increasing these costs leads to query plan improvements, e.g. we start
to favor the GROUP BY pushdown in some cases.
2023-04-17 22:22:08 +04:00
Lakshmi Narayanan Sreethar
a383c8dd4f Copy scheduled_jobs list before sorting it
The start_scheduled_jobs function mistakenly sorts the scheduled_jobs
list in-place. As a result, when the ts_update_scheduled_jobs_list
function compares the updated list of scheduled jobs with the existing
scheduled jobs list, it is comparing a list that is sorted by job_id to
one that is sorted by next_start time. Fix that by properly copying the
scheduled_jobs list into a new list and using that for sorting.

Fixes #5537
2023-04-14 15:59:57 +05:30
Lakshmi Narayanan Sreethar
b10139ba48 Update bgw_custom testcase
Added a few cleanup steps and updated the test logic to make the testcase
run more reliably.
2023-04-14 15:59:57 +05:30
Ante Kresic
464d20fb41 Propagate vacuum/analyze to compressed chunks
With recent changes, we enabled analyze on uncompressed chunk tables
for compressed chunks. This change includes analyzing the compressed
chunks table when analyzing the hypertable and its chunks,
allowing us to stop generating stats when compressing chunks.
2023-04-13 12:15:32 +02:00
Rafia Sabih
3f9cb3c27a Pass join related structs to the cagg rte
In case of joins in the continuous aggregates, pass the required
structs to the new rte created. These values are required by the
planner to finally query the materialized view.

Fixes #5433
2023-04-13 04:57:33 +02:00
Fabrízio de Royes Mello
09565acae4 Fix timescaledb_experimental.policies duplicates
Commit 16fdb6ca5e introduced `timescaledb_experimental.policies` view
to expose the Continuous Aggregate policies, but the current JOINs over
our catalog are not accurate.

Fixed it by properly JOINing the underlying catalog tables to expose the
correct information about the Continuous Aggregate policies without
duplicates.

Fixes #5492
2023-04-12 15:28:29 -03:00
Fabrízio de Royes Mello
f6c8468ee6 Fix timestamp out of range refreshing CAgg
When refreshing from the beginning (window_start=NULL) of a
Continuous Aggregate with variable time bucket we were getting a
`timestamp out of range` error.

Fixed it by using `-Infinity` when `window_start=NULL` is passed while
refreshing a Continuous Aggregate with a variable time bucket.
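
For reference, the call shape that hit the error (continuous aggregate name
and end bound are made up):

  -- refresh from the very beginning of the data up to 2023-01-01;
  -- window_start = NULL is now mapped to -Infinity internally
  CALL refresh_continuous_aggregate('conditions_daily', NULL, '2023-01-01');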

Fixes #5474, #5534
2023-04-12 14:50:04 -03:00
Mats Kindahl
3cc8a4ca34 Fix error message for continuous aggregates
Several error messages for continuous aggregates are not following the
error message style guidelines at
https://www.postgresql.org/docs/current/error-style-guide.html

In particular, they do not write the hints and detailed messages as
full sentences.
2023-04-12 18:34:56 +02:00
Sven Klemm
f0623a8c38 Skip Ordered Append when only 1 child node is present
This is mostly a cosmetic change. When only 1 child is present there
is no need for ordered append. In this situation we might still
benefit from a ChunkAppend node due to runtime chunk exclusion when we
have non-immutable constraints, so we still add the ChunkAppend node even
with only 1 child.
2023-04-12 13:19:16 +02:00
Sven Klemm
0595ff0888 Move type support functions into _timescaledb_functions schema 2023-04-12 12:48:34 +02:00
Ante Kresic
dc5bf3b32e Test compression DML with physical layout changes
These tests verify that changing the physical layout of chunks
(either compressed or uncompressed) yields consistent results.
They also verify that the index mapping on compressed chunks is
handled correctly.
2023-04-11 17:30:08 +02:00
Zoltan Haindrich
975e9ca166 Fix segfault after column drop on compressed table
Decompression produces records which have all the decompressed data set,
but it also retains fields which are used internally during decompression.
These didn't cause any problem unless an operation was done with the whole
row, in which case all the fields that ended up being non-null could be a
potential segfault source.
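
A rough sketch of the kind of query shape that was affected (names are
made up):

  ALTER TABLE metrics DROP COLUMN note;
  -- a whole-row operation on decompressed tuples, e.g. converting the row
  -- to json, could touch the internally retained fields and crash
  SELECT to_jsonb(m) FROM metrics m LIMIT 1;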

Fixes #5458 #5411
2023-04-06 08:49:54 +02:00
Bharathy
1fb058b199 Support UPDATE/DELETE on compressed hypertables.
This patch does the following:

1. Executor changes to parse qual ExprState to check if a SEGMENTBY
   column is specified in the WHERE clause.
2. Based on step 1, we build scan keys.
3. Executor changes to do a heap scan on the compressed chunk based on
   the scan keys and move only those rows which match the WHERE clause
   to the staging area, i.e. the uncompressed chunk.
4. Mark the affected chunk as partially compressed.
5. Perform regular UPDATE/DELETE operations on the staging area.
6. Since there is no Custom Scan (HypertableModify) node for
   UPDATE/DELETE operations on PG versions < 14, we don't support this
   feature on PG12 and PG13.
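
A sketch of statements enabled by this patch, assuming made-up names and
compression configured with segmentby = 'device_id':

  -- matching compressed rows are moved to the uncompressed (staging) chunk,
  -- the chunk is marked partially compressed, and regular DML runs there
  DELETE FROM metrics WHERE device_id = 42;
  UPDATE metrics SET value = value * 2 WHERE device_id = 42;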
2023-04-05 17:19:45 +05:30
Erik Nordström
2e6c6b5c58 Refactor and optimize distributed COPY
Refactor the code path that handles remote distributed COPY. The
main changes include:

* Use a hash table to lookup data node connections instead of a list.
* Refactor the per-data node buffer code that accumulates rows into
  bigger CopyData messages.
* Reduce the default number of rows in a CopyData message to 100. This
  seems to improve throughput, probably striking a better balance
  between message overhead and latency.
* The number of rows to send in each CopyData message can now be
  changed via a new foreign data wrapper option.
2023-04-04 15:35:54 +02:00
Rafia Sabih
ff5959f8f9 Handle when FROM clause is missing in continuous aggregate definition
It now errors out in such a case.

Fixes #5500
2023-03-29 22:29:16 +02:00
Konstantina Skovola
cb81c331ae Allow named time_bucket arguments in Cagg definition
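
A sketch of a definition that is now accepted (table and column names are
made up; bucket_width and ts are the parameter names of time_bucket):

  CREATE MATERIALIZED VIEW metrics_daily WITH (timescaledb.continuous) AS
  SELECT time_bucket(bucket_width => INTERVAL '1 day', ts => time) AS bucket,
         avg(value) AS avg_value
  FROM metrics
  GROUP BY bucket
  WITH NO DATA;
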
Fixes #5450
2023-03-28 18:45:41 +03:00
Rafia Sabih
98218c1d07 Enable joins for hierarchical continuous aggregates
The joins can be between a continuous aggregate and a hypertable,
a continuous aggregate and a regular Postgres table, or
a continuous aggregate and a regular Postgres view.
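
A sketch of such a definition, with made-up names, where hourly_totals is
itself a continuous aggregate and devices is a plain Postgres table:

  CREATE MATERIALIZED VIEW daily_by_device WITH (timescaledb.continuous) AS
  SELECT time_bucket(INTERVAL '1 day', h.bucket) AS bucket,
         d.name,
         sum(h.total) AS total
  FROM hourly_totals h
  JOIN devices d ON d.device_id = h.device_id
  GROUP BY bucket, d.name
  WITH NO DATA;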
2023-03-28 15:12:54 +02:00
Konstantina Skovola
72c0f5b25e Rewrite recompress_chunk in C for segmentwise processing
This patch introduces a C-function to perform the recompression at
a finer granularity instead of decompressing and subsequently
compressing the entire chunk.

This improves performance for the following reasons:
- it needs to sort less data at a time and
- it avoids recreating the decompressed chunk and the heap
inserts associated with that by decompressing each segment
into a tuplesort instead.

If no segmentby is specified when enabling compression or if an
index does not exist on the compressed chunk then the operation is
performed as before, decompressing and subsequently
compressing the entire chunk.
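
For reference, the procedure is invoked on a chunk that received new data
after being compressed (the hypertable and chunk names below are made up):

  SELECT chunk_name, compression_status FROM chunk_compression_stats('metrics');
  CALL recompress_chunk('_timescaledb_internal._hyper_1_10_chunk');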
2023-03-23 11:39:43 +02:00
Fabrízio de Royes Mello
38fcd1b76b Improve Realtime Continuous Aggregate performance
When calling the `cagg_watermark` function to get the watermark of a
Continuous Aggregate we execute a `SELECT MAX(time_dimension)` query
in the underlying materialization hypertable.

The problem is that a `SELECT MAX(time_dimension)` query can be
expensive because it will scan all hypertable chunks, increasing the
planning time for Realtime Continuous Aggregates.

Improved it by creating a new catalog table to serve as a cache table
to store the current Continuous Aggregate watermark in the following
situations:
- Create CAgg: store the minimum value of hypertable time dimension
  data type;
- Refresh CAgg: store the last value of the time dimension materialized
  in the underlying materialization hypertable (or the minimum value of
  materialization hypertable time dimension data type if there's no
  data materialized);
- Drop CAgg Chunks: the same as refresh cagg.

Closes #4699, #5307
2023-03-22 16:35:23 -03:00
shhnwz
699fcf48aa Stats improvement for Uncompressed Chunks
During compression, autovacuum used to be disabled for the uncompressed
chunk and re-enabled after decompression. This leads to a postgres
maintenance issue. Let's not disable autovacuum for the uncompressed
chunk anymore and let postgres take care of the stats in its natural way.

Fixes #309
2023-03-22 23:51:13 +05:30
Bharathy
cc51e20e87 Add support for ON CONFLICT DO UPDATE for compressed hypertables
This patch fixes execution of INSERT with ON CONFLICT DO UPDATE by
removing the error and allowing the UPDATE to happen on the given
compressed hypertable.
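
A sketch of the now-working statement, assuming a hypertable with a
UNIQUE (time, device_id) constraint and compressed chunks (names are made up):

  INSERT INTO metrics (time, device_id, value)
  VALUES ('2023-01-01 00:00', 1, 2.0)
  ON CONFLICT (time, device_id)
  DO UPDATE SET value = EXCLUDED.value;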
2023-03-20 22:55:27 +05:30
Zoltan Haindrich
790b322b24 Fix DEFAULT value handling in decompress_chunk
The SQL function decompress_chunk did not fill in
default values during its operation.
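
A sketch of the affected sequence (names are made up):

  -- column added with a default after some chunks were already compressed
  ALTER TABLE metrics ADD COLUMN unit text DEFAULT 'celsius';
  SELECT decompress_chunk(c) FROM show_chunks('metrics') c;
  -- decompressed rows now get 'celsius' for "unit" instead of NULL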

Fixes #5412
2023-03-16 09:16:50 +01:00
Alexander Kuzmenkov
827684f3e2 Use prepared statements for parameterized data node scans
This allows us to avoid replanning the inner query on each new loop,
speeding up the joins.
2023-03-15 18:22:01 +04:00
Dmitry Simonenko
f8022eb332 Add additional tests for compression with HA
Make sure inserts into compressed chunks work when a data node is down.

Fix #5039
2023-03-13 17:43:48 +02:00
Sven Klemm
65562f02e8 Support unique constraints on compressed chunks
This patch allows unique constraints on compressed chunks. When
trying to INSERT into compressed chunks with unique constraints, any
potentially conflicting compressed batches will be decompressed
to let postgres do constraint checking on the INSERT.
With this patch only INSERT ON CONFLICT DO NOTHING is supported.
For decompression only segmentby information is considered to
determine conflicting batches. This will be enhanced in a follow-up
patch to also include orderby metadata so that fewer batches need
to be decompressed.
2023-03-13 12:04:38 +01:00
Jan Nidzwetzki
356a20777c Handle user-defined FDW options properly
This patch changes the way user-defined FDW options (e.g., startup
costs, per-tuple costs) are handled. So far, these values were retrieved
in apply_fdw_and_server_options() but reset to default values afterward.
2023-03-13 10:39:52 +01:00
Alexander Kuzmenkov
e92d5ba748 Add more tests for compression
Unit tests for different data sequences, and SQL test for float4.
2023-03-10 20:34:17 +04:00
Jan Nidzwetzki
7b8177aa74 Fix file trailer handling in the COPY fetcher
The copy fetcher fetches tuples in batches. When the last element in the
batch is the file trailer, the trailer was not handled correctly. The
existing logic did not perform a PQgetCopyData in that case. Therefore
the state of the fetcher was not set to EOF and the copy operation was
not correctly finished at this point.

Fixes: #5323
2023-03-09 14:29:06 +01:00
Bharathy
f54dd7b05d Fix SEGMENTBY columns predicates to be pushed down
WHERE clause predicates with non-equality operators on SEGMENTBY columns
of type text/bytea were not pushed down to the Seq Scan node of the
compressed chunk. This patch fixes this issue.

Fixes #5286
2023-03-08 19:17:43 +05:30
Erik Nordström
c76a0cff68 Add parallel support for partialize_agg()
Make `partialize_agg()` support parallel query execution. To make this
work, the finalize node needs to combine the individual partials from each
parallel worker, but the final step that turns the resulting partial
into the finished aggregate should not happen. Thus, in the case of
distributed hypertables, each data node can run a parallel query to
compute a partial, and the access node can later combine and finalize
these partials into the final aggregate. Essentially, there will be
one combine step (minus final) on each data node, and then another one
plus final on the access node.

To implement this, the finalize aggregate plan is simply modified to
elide the final step, and to reserialize the partial. It is only
possible to do this at the plan stage; if done at the path stage, the
PostgreSQL planner will hit assertions that assume that the node has
certain values (e.g., it doesn't expect combine Paths to skip the
final step).
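
A sketch of the kind of query this affects; the _timescaledb_internal schema
for partialize_agg is an assumption here and the table name is made up:

  -- each parallel worker produces partials, which are combined (but not
  -- finalized) by the modified finalize node
  SELECT time_bucket(INTERVAL '1 hour', time) AS bucket,
         _timescaledb_internal.partialize_agg(avg(value))
  FROM metrics
  GROUP BY bucket;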
2023-03-08 14:14:25 +01:00
Konstantina Skovola
5a3cacd06f Fix sub-second intervals in hierarchical caggs
Previously we used date_part("epoch", interval) and integer division
internally to determine whether the top cagg's interval is a
multiple of its parent's.
This led to precision loss and wrong results
in the case of intervals with sub-second components.

Fixed by using the `ts_interval_value_to_internal` function to convert
intervals to an appropriate integer representation for division.
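
A sketch of a hierarchy with sub-second buckets that previously tripped the
multiple-of check (names are made up):

  CREATE MATERIALIZED VIEW agg_500ms WITH (timescaledb.continuous) AS
  SELECT time_bucket(INTERVAL '500 milliseconds', time) AS bucket,
         avg(value) AS avg_value
  FROM metrics GROUP BY 1 WITH NO DATA;

  CREATE MATERIALIZED VIEW agg_1s WITH (timescaledb.continuous) AS
  SELECT time_bucket(INTERVAL '1 second', bucket) AS bucket,
         avg(avg_value) AS avg_value
  FROM agg_500ms GROUP BY 1 WITH NO DATA;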

Fixes #5277
2023-03-07 13:25:49 +02:00
Ildar Musin
4c0075010d Add hooks for hypertable drops
To properly clean up the OSM catalog we need a way to reliably track
hypertable deletion (including internal hypertables for CAGGS).
2023-03-06 15:10:49 +01:00
Fabrízio de Royes Mello
32046832d3 Fix Hierarchical CAgg chunk_interval_size
When a Continuous Aggregate is created, the `chunk_interval_size` is
defined by the `chunk_interval_size` of the original hypertable
multiplied by a fixed factor of 10.

The problem is that currently, when we create a Hierarchical Continuous
Aggregate, the same factor is applied again, which leads to an exponential
`chunk_interval_size`.

Fixed it by just copying the `chunk_interval_size` from the base
Continuous Aggregate for a Hierarchical Continuous Aggregate.

Fixes #5382
2023-03-03 12:31:24 -03:00
Sotiris Stamokostas
750e69ede1 Renamed size_utils.sql
Renamed:
tsl/test/sql/size_utils.sql
tsl/test/expected/size_utils.out
To:
tsl/test/sql/size_utils_tsl.sql
tsl/test/expected/size_utils_tsl.out
because they conflicted with test/sql/size_utils.sql
2023-03-02 13:20:08 +02:00