mirror of https://github.com/timescale/timescaledb.git synced 2025-05-20 04:35:00 +08:00

237 Commits

Author SHA1 Message Date
Erik Nordström
a51d21efbe Fix issue creating dimensional constraints
During chunk creation, the chunk's dimensional CHECK constraints are
created via an "upcall" to PL/pgSQL code. However, creating
dimensional constraints in PL/pgSQL code sometimes fails, especially
during high-concurrency inserts, because PL/pgSQL code scans metadata
using a snapshot that might not see the same metadata as the C
code. As a result, chunk creation sometimes fails during constraint
creation.

To fix this issue, implement dimensional CHECK-constraint creation in
C code. Other constraints (FK, PK, etc.) are still created via an
upcall, but should probably also be rewritten in C. However, since
these constraints don't depend on recently updated metadata, this is
left to a future change.
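
As a rough illustration (the chunk, constraint, and column names below are
hypothetical), a dimensional CHECK constraint on a chunk looks like:

    ALTER TABLE _timescaledb_internal._hyper_1_1_chunk
      ADD CONSTRAINT constraint_1
      CHECK ("time" >= '2023-03-01 00:00:00+00' AND "time" < '2023-03-08 00:00:00+00');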

Fixes 
2023-03-24 10:55:08 +01:00
Konstantina Skovola
72c0f5b25e Rewrite recompress_chunk in C for segmentwise processing
This patch introduces a C-function to perform the recompression at
a finer granularity instead of decompressing and subsequently
compressing the entire chunk.

This improves performance for the following reasons:
- it needs to sort less data at a time and
- it avoids recreating the decompressed chunk and the heap
inserts associated with that by decompressing each segment
into a tuplesort instead.

If no segmentby is specified when enabling compression or if an
index does not exist on the compressed chunk then the operation is
performed as before, decompressing and subsequently
compressing the entire chunk.
2023-03-23 11:39:43 +02:00
Fabrízio de Royes Mello
38fcd1b76b Improve Realtime Continuous Aggregate performance
When calling the `cagg_watermark` function to get the watermark of a
Continuous Aggregate we execute a `SELECT MAX(time_dimension)` query
in the underlying materialization hypertable.

The problem is that a `SELECT MAX(time_dimension)` query can be
expensive because it will scan all hypertable chunks, increasing the
planning time for Realtime Continuous Aggregates.

Improved it by creating a new catalog table to serve as a cache table
to store the current Continuous Aggregate watermark in the following
situations:
- Create CAgg: store the minimum value of hypertable time dimension
  data type;
- Refresh CAgg: store the last value of the time dimension materialized
  in the underlying materialization hypertable (or the minimum value of
  materialization hypertable time dimension data type if there's no
  data materialized);
- Drop CAgg Chunks: the same as refresh cagg.
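
A minimal sketch of the lookup this caches (the materialization
hypertable id 2 is hypothetical):

    -- previously answered with a SELECT MAX(time_dimension) over all chunks,
    -- now served from the new watermark catalog table
    SELECT _timescaledb_internal.cagg_watermark(2);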

Closes , 
2023-03-22 16:35:23 -03:00
Sven Klemm
03a799b874 Mention that new status values need handling in downgrade script
When adding new status values we must make sure to add special
handling for these values to the downgrade script as previous
versions will not know how to deal with those.
2023-03-14 23:59:10 +01:00
Sven Klemm
65562f02e8 Support unique constraints on compressed chunks
This patch allows unique constraints on compressed chunks. When
trying to INSERT into compressed chunks with unique constraints
any potentially conflicting compressed batches will be decompressed
to let postgres do constraint checking on the INSERT.
With this patch only INSERT ON CONFLICT DO NOTHING will be supported.
For decompression, only segmentby information is considered to
determine conflicting batches. This will be enhanced in a follow-up
patch to also include orderby metadata so that fewer batches need to
be decompressed.
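
A sketch of the newly supported statement (table and columns are
hypothetical):

    -- conflicting compressed batches, matched on segmentby values, are
    -- decompressed first so PostgreSQL can perform the uniqueness check
    INSERT INTO metrics (time, device_id, value)
    VALUES ('2023-03-01 00:00:00+00', 1, 42.0)
    ON CONFLICT DO NOTHING;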
2023-03-13 12:04:38 +01:00
Ildar Musin
4c0075010d Add hooks for hypertable drops
To properly clean up the OSM catalog we need a way to reliably track
hypertable deletion (including internal hypertables for CAGGS).
2023-03-06 15:10:49 +01:00
Sven Klemm
4527f51e7c Refactor INSERT into compressed chunks
This patch changes INSERTs into compressed chunks to no longer
be immediately compressed but stored in the uncompressed chunk
instead and later merged with the compressed chunk by a separate
job.

This greatly simplifies the INSERT code path as we no longer have
to rewrite the target of INSERTs and compress on the fly, leading
to a roughly 2x improvement in INSERT rate into compressed chunks.
Additionally, this improves TRIGGER support for INSERTs into
compressed chunks.

This is a necessary refactoring to allow UPSERT/UPDATE/DELETE on
compressed chunks in follow-up patches.
2022-12-21 12:53:29 +01:00
Sven Klemm
3b94b996f2 Use custom node to block frozen chunk modifications
This patch changes the code that blocks frozen chunk
modifications to no longer use triggers but a custom
node instead. Frozen chunks are a timescaledb-internal object
and should therefore not be protected by a TRIGGER, which is
external and creates several hazards. First, TRIGGERs created to
protect internal state contend with user-created triggers. Second,
the trigger created to protect frozen chunks does not work
well with our restoring GUC, which we use when restoring
logical dumps. Third, triggers do not cover internal operations
but only work in code paths that explicitly added trigger support.
2022-11-25 19:56:48 +01:00
Nikhil Sontakke
c92e29ba3a Fix DML HA in multi-node
If a datanode goes down for whatever reason then DML activity to
chunks residing on (or targeted to) that DN will start erroring out.
We now handle this by marking the target chunk as "stale" for this
DN by changing the metadata on the access node. This allows us to
continue to do DML to replicas of the same chunk data on other DNs
in the setup. This obviously will only work for chunks which have
"replication_factor" > 1. Note that chunks which do not
undergo any change will continue to carry the appropriate DN-related
metadata on the AN.

This means that such "stale" chunks will become underreplicated and
need to be re-balanced by using the copy_chunk functionality via a
microservice or similar process.

Fixes 
2022-11-25 17:42:26 +05:30
Dmitry Simonenko
5813173e07 Introduce drop_stale_chunks() function
This function drops chunks on a specified data node if those chunks are
not known by the access node.

Call drop_stale_chunks() automatically when a data node becomes
available again.
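
A hypothetical invocation (the schema and signature shown are
assumptions and may differ from the actual function):

    -- drop chunks on data_node_1 that the access node no longer knows about
    SELECT _timescaledb_internal.drop_stale_chunks('data_node_1');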

Fix 
2022-11-23 19:21:05 +02:00
Jan Nidzwetzki
380464df9b Perform frozen chunk status check via trigger
The commit 9f4dcea30135d1e36d1c452d631fc8b8743b3995 introduces frozen
chunks. Checking whether a chunk is frozen or not has been done so far
in the query planner. If it is not possible to determine which chunks
are affected by a query in the planner (e.g., due to a cast in the WHERE
condition), all chunks are checked. This leads (1) to an increased
planning time and (2) to the situation that a single frozen chunk could
reject queries, even if the frozen chunk is not addressed by the query.
2022-11-18 15:29:49 +01:00
gayyappan
b9ca06d6e3 Move freeze/unfreeze chunk to tsl
Move code for freeze and unfreeze chunk to tsl directory.
2022-11-17 15:28:47 -05:00
Sven Klemm
3059290bea Add new chunk state CHUNK_STATUS_COMPRESSED_PARTIAL
A chunk is in this state when it is compressed but also has
uncompressed data in the uncompressed chunk. Individual tuples
can only ever exist in either area. This is a preparation patch
to add support for an uncompressed staging area for DML operations.
2022-11-07 13:32:37 +01:00
Sven Klemm
f289ef8828 Remove unused function ts_chunk_is_uncompressed_or_unordered 2022-11-07 11:03:34 +01:00
Ante Kresic
2475c1b92f Roll up uncompressed chunks into compressed ones
This change introduces a new option to the compression procedure which
decouples the uncompressed chunk interval from the compressed chunk
interval. It does this by allowing multiple uncompressed chunks into one
compressed chunk as part of the compression procedure. The main use-case
is to allow much smaller uncompressed chunks than compressed ones. This
has several advantages:
- Reduce the size of btrees on uncompressed data (thus allowing faster
inserts because those indexes are memory-resident).
- Decrease disk-space usage for uncompressed data.
- Reduce number of chunks over historical data.

From a UX point of view, we simply add a compression WITH-clause option
`compress_chunk_time_interval`. The user should set that according to
their needs for constraint exclusion over historical data. Ideally, it
should be a multiple of the uncompressed chunk interval and so we throw
a warning if it is not.
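
For example (the hypertable and segmentby column are hypothetical):

    -- roll uncompressed 1-hour chunks into 24-hour compressed chunks
    ALTER TABLE metrics SET (
      timescaledb.compress,
      timescaledb.compress_segmentby = 'device_id',
      timescaledb.compress_chunk_time_interval = '24 hours'
    );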
2022-11-02 15:14:18 +01:00
Alexander Kuzmenkov
840f144e09 Enable and fix -Wclobbered
The one in job_stat.c could probably lead to errors.
2022-11-01 18:01:26 +04:00
gayyappan
e08e0a59db Add hook for chunk creation
After data is tiered using OSM, we cannot insert data into the same
range. We need a callback that can be invoked by timescaledb to check
for range overlaps before creating a new chunk.
2022-10-28 12:43:31 -04:00
Alexander Kuzmenkov
313845a882 Enable -Wextra
Our code mostly has warnings about comparison with different
signedness.
2022-10-27 16:06:58 +04:00
Alexander Kuzmenkov
864da20cee Build on Ubuntu 22.04
It has newer GCC which should detect more warnings.
2022-10-26 23:32:05 +04:00
Alexander Kuzmenkov
4e47302c2c Speed up chunk search by restriction clauses
We don't have to look up the dimension slices for dimensions for which
we don't have restrictions.

Also sort chunks by ids before looking up the metadata, because this
gives more favorable table access patterns (closer to sequential).

This fixes a planning time regression introduced in 2.7.
2022-09-12 13:44:18 +03:00
gayyappan
7c55d0d5dc Modify OSM chunk's constraint info in chunk catalog
The OSM chunk registers a dummy primary dimension range
in the TimescaleDB catalog. Use the max interval of
the dimension instead of the min interval, i.e.,
use a range like [Dec 31 294246 PST, infinity).
Otherwise, policies could end up being applied to an OSM
chunk.

Add test with policies for OSM chunks
2022-08-24 17:14:36 -04:00
Alexander Kuzmenkov
51259b31c4 Fix OOM in large INSERTs
Do not allocate various temporary data in PortalContext, such as the
hyperspace point corresponding to the row, or the intermediate data
required for chunk lookup.
2022-08-23 19:40:51 +03:00
gayyappan
6beda28965 Modify chunk exclusion to include OSM chunks
OSM chunks manage their own ranges, and the timescaledb
catalog has dummy ranges for these dimensions.
So the chunk exclusion logic cannot rely on the
timescaledb catalog metadata to exclude an OSM chunk.
2022-08-18 09:32:21 -04:00
gayyappan
847919a05f Add osm_chunk field to chunk catalog table
Setting this field to true indicates that
this is an OSM chunk.
2022-08-18 09:32:21 -04:00
Fabrízio de Royes Mello
500c225999 Handle properly default privileges on CAggs
If a default privilege is configured and applied to a given Continuous
Aggregate during its creation, only the user view gets the ACL properly
configured but not the underlying materialization hypertable, leading to
permission errors.

Fixed it by copying the privileges from the user view to the
materialization hypertable during the Continuous Aggregate creation.

Fixes 
2022-08-12 14:30:10 -03:00
gayyappan
95cc330e0c Add inherited check constraints to OSM chunk
When a table is added to an inheritance hierarchy, PG checks
if all check constraints are present on this table. When an OSM chunk
is added as a child of a hypertable with constraints,
make sure that all check constraints are replicated on the child OSM
chunk as well.
2022-08-10 10:20:14 -04:00
Erik Nordström
025bda6a81 Add stateful partition mappings
Add a new metadata table `dimension_partition` which explicitly and
statefully details how a space dimension is split into partitions, and
(in the case of multi-node) which data nodes are responsible for
storing chunks in each partition. Previously, partition and data nodes
were assigned dynamically based on the current state when creating a
chunk.

This is the first in a series of changes that will add more advanced
functionality over time. For now, the metadata table simply writes out
what was previously computed dynamically in code. Future code changes
will alter the behavior to do smarter updates to the partitions when,
e.g., adding and removing data nodes.

The idea of the `dimension_partition` table is to minimize changes in
the partition to data node mappings across various events, such as
changes in the number of data nodes, number of partitions, or the
replication factor, which affect the mappings. For example, increasing
the number of partitions from 3 to 4 currently leads to redefining all
partition ranges and data node mappings to account for the new
partition. Complete repartitioning can be disruptive to multi-node
deployments. With stateful mappings, it is possible to split an
existing partition without affecting the other partitions (similar to
partitioning using consistent hashing).

Note that the dimension partition table expresses the current state of
space partitions; i.e., the space-dimension constraints and data nodes
to be assigned to new chunks. Existing chunks are not affected by
changes in the dimension partition table, although an external job
could rewrite, move, or copy chunks as desired to comply with the
current dimension partition state. As such, the dimension partition
table represents the "desired" space partitioning state.
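
A hedged sketch of inspecting the new metadata (the column names shown
are assumptions):

    -- one row per space partition, with the data nodes responsible for it
    SELECT dimension_id, range_start, data_nodes
    FROM _timescaledb_catalog.dimension_partition
    ORDER BY dimension_id, range_start;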

Part of 
2022-08-02 11:38:32 +02:00
Dmitry Simonenko
65b5dc900f Support add_dimension() with existing data
This change allows creating new dimensions even when
chunks already exist.

It does not modify any existing data or perform a migration;
instead, it creates a full-range (-inf/+inf) dimension slice for
existing chunks in order to be compatible with the newly created
dimension.

All new chunks created after this will follow the logic of the new
dimension and its partitioning.
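
A minimal example (hypertable and column names are hypothetical):

    -- succeeds now even if 'metrics' already has chunks; existing chunks get
    -- a full-range slice for the new dimension
    SELECT add_dimension('metrics', 'device_id', number_partitions => 4);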

Fix: 
2022-08-01 10:52:03 +03:00
Sven Klemm
90c7c652b1 Fix chunk creation on hypertables with non-default statistics
When chunk creation is triggered on a hypertable with non-default
statistics targets by a user other than the hypertable owner,
the chunk creation will fail with a permission error. This patch
changes the chunk table creation to run the attribute modification
as the table owner.
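
A rough reproduction sketch (table, column, and role names are
hypothetical):

    -- owner raises the statistics target, a different user then triggers
    -- chunk creation via INSERT; previously this failed with a permission error
    ALTER TABLE metrics ALTER COLUMN value SET STATISTICS 1000;
    SET ROLE app_writer;
    INSERT INTO metrics VALUES ('2022-07-01', 1, 1.0);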

Fixes 
2022-07-22 16:59:00 +02:00
gayyappan
6b0a9937c5 Fix attach_osm_table_chunk
Add chunk inheritance when attaching an OSM
table as a chunk of the hypertable.
2022-07-21 19:27:24 -04:00
Jan Nidzwetzki
a608d7db61 Fix race conditions during chunk (de)compression
This patch introduces a further check to compress_chunk_impl and
decompress_chunk_impl. After all locks are acquired, a check is made
to see if the chunk is still (un-)compressed. If the chunk was
(de-)compressed while waiting for the locks, the (de-)compression
operation is stopped.

In addition, the chunk locks in decompress_chunk_impl
are upgraded to AccessExclusiveLock to ensure the chunk is not deleted
while other transactions are using it.

Fixes: 
2022-07-05 15:13:10 +02:00
gayyappan
6c20e74674 Block drop chunk if chunk is in frozen state
A chunk in frozen state cannot be dropped.
drop_chunks will skip over frozen chunks without erroring.
The internal API drop_chunk will error if you attempt to drop
a chunk without unfreezing it first.

This PR also adds a new internal API to unfreeze a chunk.
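
A sketch of the intended sequence (the chunk name is hypothetical):

    -- unfreeze first, then the internal drop succeeds
    SELECT _timescaledb_internal.unfreeze_chunk('_timescaledb_internal._hyper_1_1_chunk');
    SELECT _timescaledb_internal.drop_chunk('_timescaledb_internal._hyper_1_1_chunk');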
2022-06-30 09:56:50 -04:00
gayyappan
131f58ee60 Add internal api for foreign table chunk
Add _timescaledb_internal.attach_osm_table_chunk.
This treats a pre-existing foreign table as a
hypertable chunk by adding dummy metadata to the
catalog tables.
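
A hypothetical call (hypertable and foreign table names are assumptions):

    -- register an existing foreign table as a chunk of 'metrics'
    SELECT _timescaledb_internal.attach_osm_table_chunk('metrics', 'osm_metrics_2022_06');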
2022-06-23 10:11:56 -04:00
Sven Klemm
308ce8c47b Fix various misspellings 2022-06-13 10:53:08 +02:00
Alexander Kuzmenkov
3c56d3eceb Faster lookup of chunks by point
Don't keep the chunk constraints while searching. The number of
candidate chunks can be very large, so keeping these constraints is a
lot of work and uses a lot of memory. For finding the matching chunk,
it is enough to track the number of dimensions that matched a given
chunk id. After finding the chunk id, we can look up only the matching
chunk data with the usual function.

This saves some work when doing INSERTs.
2022-06-07 18:10:20 +05:30
Erik Nordström
9b91665162 Fix crashes in functions using AlterTableInternal
A number of TimescaleDB functions internally call `AlterTableInternal`
to modify tables or indexes. For instance, `compress_chunk` and
`attach_tablespace` act as DDL commands to modify
hypertables. However, crashes occur when these functions are called
via `SELECT * INTO FROM <function_name>` or the equivalent `CREATE
TABLE AS` statement. The crashes happen because these statements are
considered process utility commands and therefore set up an event
trigger context for collecting commands. However, the event trigger
context is not properly set up to record alter table statements in
this code path, thus causing the crashes.

To prevent crashes, wrap `AlterTableInternal` with the event trigger
functions to properly initialize the event trigger context.
2022-05-19 17:37:09 +02:00
gayyappan
5d56b1cdbc Add api _timescaledb_internal.drop_chunk
Add an internal api to drop a single chunk.
This function drops the storage and metadata
associated with the chunk.
Note that chunk dependencies are not affected; e.g., continuous
aggregates are not updated when this chunk is dropped.
2022-05-11 15:10:38 -04:00
gayyappan
9f4dcea301 Add _timescaledb_internal.freeze_chunk API
This is an internal function to freeze a chunk
for PG14 and later.

This function sets a chunk status to frozen.
Operations that modify the chunk data
(like insert, update, delete) are not
supported. Frozen chunks can be dropped.

Additionally, chunk status is cached as part of
classify_relation.
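
A minimal example (the chunk name is hypothetical; requires PG14 or later):

    -- mark the chunk frozen; INSERT/UPDATE/DELETE on it are then rejected
    SELECT _timescaledb_internal.freeze_chunk('_timescaledb_internal._hyper_1_1_chunk');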
2022-05-10 14:00:32 -04:00
Alexander Kuzmenkov
935684c83a Cache whether a rel is a chunk in classify_relation
Use a per-query hash table for this. This speeds up the repeated calls
to classify_relation by avoiding the costly chunk lookup.
2022-03-23 16:49:02 +05:30
Alexander Kuzmenkov
ae79ba6eb4 Scan less chunk metadata when planning ForeignModify
Instead of loading the entire Chunk struct, just look up the data
nodes.
2022-03-23 14:03:34 +05:30
Erik Nordström
c1cf067c4f Improve restriction scanning during hypertable expansion
Improve the performance of metadata scanning during hypertable
expansion.

When a hypertable is expanded to include all children chunks, only the
chunks that match the query restrictions are included. To find the
matching chunks, the planner first scans for all matching dimension
slices. The chunks that reference those slices are the chunks to
include in the expansion.

This change optimizes the scanning for slices by avoiding repeated
open/close of the dimension slice metadata table and index.

At the same time, related dimension slice scanning functions have been
refactored along the same line.

An index on the chunk constraint metadata table is also changed to
allow scanning on dimension_slice_id. Previously, dimension_slice_id
was the second key in the index, which made scans on this key less
efficient.
2022-03-21 15:18:44 +01:00
Erik Nordström
14deea6bd5 Improve chunk scan performance
Chunk scan performance during querying is improved by avoiding
repeated open and close of relations and indexes when joining chunk
information from different metadata tables.

When executing a query on a hypertable, it is expanded to include all
its children chunks. However, during the expansion, the chunks that
don't match the query constraints should also be excluded. The
following changes are made to make the scanning and exclusion more
efficient:

* Ensure metadata relations and indexes are only opened once even
  though metadata for multiple chunks are scanned. This avoids doing
  repeated open and close of tables and indexes for each chunk
  scanned.
* Avoid interleaving scans of different relations, ensuring better
  data locality, and having, e.g., indexes warm in cache.
* Avoid unnecessary scans that repeat work already done.
* Ensure chunks are locked in a consistent order (based on Oid).

To enable the above changes, some refactoring was necessary. The chunk
scans that happen during constraint exclusion are moved into separate
source files (`chunk_scan.c`) for better structure and readability.

Some test outputs are affected due to the new ordering of chunks in
append relations.
2022-02-28 16:53:01 +01:00
gayyappan
264540610e Fix tablespace for compressed chunk's index
When a hypertable uses a non-default tablespace, based
on attach_tablespace settings, the compressed chunk's
index is still created in the default tablespace.
This PR fixes this behavior and creates the compressed
chunk and its indexes in the same tablespace.

When move_chunk is executed on a compressed chunk,
move the indexes to the specified destination tablespace.
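
A hedged sketch (tablespace and chunk names are hypothetical, and the
exact move_chunk parameters may differ):

    SELECT attach_tablespace('history_space', 'metrics');
    -- after this fix, the compressed chunk's indexes follow the chunk's tablespace
    SELECT move_chunk(
      chunk => '_timescaledb_internal.compress_hyper_2_3_chunk',
      destination_tablespace => 'history_space',
      index_destination_tablespace => 'history_space'
    );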

Fixes 
2022-02-14 11:06:10 -05:00
gayyappan
e5db6a9eec Fix status for dropped chunks that have catalog entries
Chunks that are dropped but preserve the catalog entries
have an incorrect status when they are marked as dropped.
This happens if the chunk was previously compressed and then
gets dropped - the status in the catalog tuple reflects the
compression status. This should be reset since the data is now
dropped.
2022-01-31 17:39:39 -05:00
Alexander Kuzmenkov
22f9cf689d Don't leak Chunks in classify_relation
This function is called often, at least 4 times per chunk, so these add
up. Freeing these chunks allows us to save memory. Ideally, we should
fix the function not to look up the chunk anew each time.

Also make a couple other tweaks that reduce memory usage for planning.
2022-01-31 15:02:50 +03:00
gayyappan
9f64df8567 Add ts_catalog subdirectory
Move files that are related to timescaledb catalog
access to this subdirectory
2022-01-24 16:58:09 -05:00
Fabrízio de Royes Mello
342f848d90 Refactor invalidation log inclusion
Commit 97c2578ffa6b08f733a75381defefc176c91826b overcomplicated the
`invalidate_add_entry` API by adding parameters related to the remote
function call for multi-node on materialization hypertables.

Refactored it by simplifying the function interface and adding a new
function to deal with materialization hypertables in a multi-node
environment.

Fixes 
2022-01-17 11:45:12 -03:00
Sven Klemm
d989e61b56 Improve show_chunks and drop_chunks error handling
This patch fixes a segfault when calling show_chunks on an internal
compressed hypertable and a cache lookup failure when calling
drop_chunks on an internal compressed hypertable.
2021-12-20 10:02:57 +01:00
gayyappan
d8d392914a Support for compression on continuous aggregates
Enable ALTER MATERIALIZED VIEW (timescaledb.compress)
This enables compression on the underlying materialized
hypertable. The segmentby and orderby columns for
compression are based on the GROUP BY clause and time_bucket
clause used while setting up the continuous aggregate.

Change the timescaledb_information.continuous_aggregate view
definition.

Add support for compression policy on continuous
aggregates

Move code from job.c to policy_utils.c
Add support functions to check compression
policy validity for continuous aggregates.
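
For example (the continuous aggregate name and interval are hypothetical):

    ALTER MATERIALIZED VIEW metrics_hourly SET (timescaledb.compress);
    SELECT add_compression_policy('metrics_hourly', compress_after => '30 days'::interval);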
2021-12-17 10:51:33 -05:00
Mats Kindahl
aae19319c0 Rewrite recompress_chunk as procedure
When executing `recompress_chunk` and a query at the same time, a
deadlock can be generated because the chunk relation, the chunk
index, and the compressed and uncompressed chunks are locked in different
orders. In particular, when `recompress_chunk` is executing, it will
first decompress the chunk and as part of that lock the uncompressed
chunk index in AccessExclusive mode and when trying to compress the
chunk again it will try to lock the uncompressed chunk in
AccessExclusive as part of truncating it.

Note that `decompress_chunk` and `compress_chunk` lock the relations in
the same order, and the issue arises because the procedures are combined
into a single transaction.

To avoid the deadlock, this commit rewrites the `recompress_chunk` to
be a procedure and adds a commit between the decompression and
compression. Committing the transaction after the decompress will allow
reads and inserts to proceed by working on the uncompressed chunk, and
the compression part of the procedure will take the necessary locks in
strict order, thereby avoiding a deadlock.

In addition, the isolation test is rewritten so that instead of adding
a waitpoint in the PL/pgSQL function, we implement the isolation test by
taking a lock on the compressed table after the decompression.
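
As a procedure, the call now looks like (the chunk name is hypothetical):

    -- must be invoked with CALL, since the procedure commits internally
    CALL recompress_chunk('_timescaledb_internal._hyper_1_1_chunk');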

Fixes 
2021-12-09 19:42:12 +01:00