173 Commits

Author SHA1 Message Date
Mats Kindahl
e5e94960d0 Change parameter name to enable Hypercore TAM
Change from the `compress_using` parameter, which took a table access
method name, to the boolean parameter `hypercore_use_access_method`, so
that users do not have to provide an access method name when using the
table access method for compression.
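
A minimal usage sketch under this change, assuming the new boolean
parameter is accepted by compress_chunk() in place of compress_using
(the chunk name is hypothetical):

  -- Enable the Hypercore table access method when compressing a chunk,
  -- instead of naming the access method via compress_using.
  SELECT compress_chunk('_timescaledb_internal._hyper_1_1_chunk',
                        hypercore_use_access_method => true);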
2024-11-10 10:50:48 +01:00
Mats Kindahl
e0a7a6f6e1 Hyperstore renamed to hypercore
This changes the names of all symbols, comments, files, and functions
to use "hypercore" rather than "hyperstore".
2024-10-16 13:13:34 +02:00
Erik Nordström
f5eae6dc70 Support hyperstore in compression policy
Make sure that hyperstore can be used in a compression policy by
setting `compress_using => 'hyperstore'` in the policy configuration.
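
A sketch of such a policy, assuming the option is exposed as a parameter
of add_compression_policy() (table name and interval are hypothetical):

  -- Create a compression policy that converts chunks to hyperstore.
  SELECT add_compression_policy('metrics', compress_after => INTERVAL '7 days',
                                compress_using => 'hyperstore');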
2024-10-16 13:13:34 +02:00
Erik Nordström
d1a2ea4961 Make compress_chunk() work with Hyperstore
The `compress_chunk()` function can now be used to create hyperstores
by passing the option `compress_using => 'hyperstore'` to the
function.

Using the `compress_chunk()` function is an alternative to using
`ALTER TABLE my_hyperstore SET ACCESS METHOD` that is compatible with
the existing way of compressing hypertable chunks. It will also make
it easier to support hyperstore compression via compression policies.

Additionally, implement "fast migration" to hyperstore when a table is
already compressed. In that case, simply update the PG catalog to say
that the table is using hyperstore as TAM without rewriting the
table. This fast migration works both with `...SET ACCESS METHOD` and
with `compress_chunk()`.
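
Both paths are sketched below (the chunk name is hypothetical):

  -- Convert a chunk to hyperstore via compress_chunk().
  SELECT compress_chunk('_timescaledb_internal._hyper_1_2_chunk',
                        compress_using => 'hyperstore');

  -- Equivalent conversion using the table access method directly.
  ALTER TABLE _timescaledb_internal._hyper_1_2_chunk
      SET ACCESS METHOD hyperstore;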
2024-10-16 13:13:34 +02:00
Erik Nordström
e5fd18728c Add VACUUM support in hyperstore
Implement vacuum by internally calling vacuum on both the compressed
and non-compressed relations.

Since hyperstore indexes are defined on the non-compressed relation,
vacuuming the compressed relation won't clean up compressed tuples
from those indexes. To handle this, a proxy index is defined on each
compressed relation in order to direct index vacuum calls to the
corresponding indexes on the hyperstore relation. The proxy index also
translates the encoded TIDs stored in the index to proper TIDs for the
compressed relation.
2024-10-16 13:13:34 +02:00
Mats Kindahl
ab9f072df7 Replace compressionam with hyperstore
Replace "compressionam" in all functions and symbols with "hyperstore".
2024-10-16 13:13:34 +02:00
Mats Kindahl
00999801e2 Add test for SELECT FOR UPDATE
Add a check that SELECT FOR UPDATE does not crash, as well as an
isolation test to make sure that it locks rows properly.

Also add a debug function to check whether a TID refers to a compressed
tuple.
2024-10-16 13:13:34 +02:00
Mats Kindahl
1373ec31f8 Rename compression TAM to hyperstore
The access method and associated tests are renamed to "hyperstore".
2024-10-16 13:13:34 +02:00
Erik Nordström
cb8c756a1d Add initial compression TAM
Implement the table-access method API around compression in order to
have, among other things, seamless index support on compressed data.

The current functionality is rudimentary but common operations work,
including sequence scans.
2024-10-16 13:13:34 +02:00
Pallavi Sontakke
5858892d54
Release 2.17.0
This release adds support for PostgreSQL 17, significantly improves the
performance of continuous aggregate refreshes,
and contains performance improvements for analytical queries and delete
operations over compressed hypertables.
We recommend that you upgrade at the next available opportunity.

**Highlighted features in TimescaleDB v2.17.0**

* Full PostgreSQL 17 support for all existing features. TimescaleDB
v2.17 is available for PostgreSQL 14, 15, 16, and 17.

* Significant performance improvements for continuous aggregate
policies: continuous aggregate refresh now uses
`merge` instead of deleting old materialized data and re-inserting.

This update can dramatically decrease the amount of data that must be
written to the continuous aggregate when only a small number of changes
are present, reduce the I/O cost of refreshing a continuous aggregate,
and generate less Write-Ahead Log (`WAL`).
Overall, continuous aggregate policies will be more lightweight, use
less system resources, and complete faster.

* Increased performance for real-time analytical queries over compressed
hypertables:
we are excited to introduce additional Single Instruction, Multiple Data
(`SIMD`) vectorization optimizations to our
engine by supporting vectorized execution for queries that group by
the `segment_by` column(s) and
aggregate using the basic aggregate functions (`sum`, `count`, `avg`,
`min`, `max`); see the query sketch after this list.

Stay tuned for more to come in follow-up releases! Support for grouping
on additional columns, filtered aggregation,
  vectorized expressions, and `time_bucket` is coming soon.

* Improved performance of deletes on compressed hypertables when a large
amount of data is affected.

This improvement speeds up operations that delete whole segments by
skipping the decompression step.
It is enabled for all deletes that filter by the `segment_by` column(s);
see the delete sketch after this list.
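
The sketches referenced above, assuming a compressed hypertable `metrics`
segmented by a hypothetical `device_id` column:

  -- Benefits from vectorized (SIMD) execution: grouping on the segment_by
  -- column with basic aggregate functions.
  SELECT device_id, min(temperature), max(temperature), avg(temperature)
  FROM metrics
  GROUP BY device_id;

  -- Deletes whole compressed segments without decompressing them, because
  -- the filter is on the segment_by column only.
  DELETE FROM metrics WHERE device_id = 42;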

**PostgreSQL 14 deprecation announcement**

We will continue supporting PostgreSQL 14 until April 2025. Closer to
that time, we will announce the specific
version of TimescaleDB in which PostgreSQL 14 support will not be
included going forward.

**Features**
* #6882: Allow delete of full segments on compressed chunks without
decompression.
* #7033: Use `merge` statement on continuous aggregates refresh.
* #7126: Add functions to show the compression information.
* #7147: Vectorize partial aggregation for `sum(int4)` with grouping on
`segment by` columns.
* #7204: Track additional extensions in telemetry.
* #7207: Refactor the `decompress_batches_scan` functions for easier
maintenance.
* #7209: Add a function to drop the `osm` chunk.
* #7275: Add support for the `returning` clause for `merge`.
* #7200: Vectorize common aggregate functions like `min`, `max`, `sum`,
`avg`, `stddev`, `variance` for compressed columns
of arithmetic types, when there is grouping on `segment by` columns or
no grouping.

**Bug fixes**
* #7187: Fix the string literal length for the `compressed_data_info`
function.
* #7191: Fix creating default indexes on chunks when migrating the data.
* #7195: Fix the `segment by` and `order by` checks when dropping a
column from a compressed hypertable.
* #7201: Use the generic extension description when building `apt` and
`rpm` loader packages.
* #7227: Add an index to the `compression_chunk_size` catalog table.
* #7229: Fix the foreign key constraints where the index and the
constraint column order are different.
* #7230: Do not propagate the foreign key constraints to the `osm`
chunk.
* #7234: Release the cache after accessing the cache entry.
* #7258: Force English in the `pg_config` command executed by `cmake` to
avoid unexpected build errors.
* #7270: Fix the memory leak in compressed DML batch filtering.
* #7286: Fix the index column check while searching for the index.
* #7290: Add check for null offset for continuous aggregates built on
top of continuous aggregates.
* #7301: Make foreign key behavior for hypertables consistent.
* #7318: Fix chunk skipping range filtering.
* #7320: Set the license specific extension comment in the install
script.

**Thanks**
* @MiguelTubio for reporting and fixing the Windows build error.
* @posuch for reporting the misleading extension description in the generic loader packages.
* @snyrkill for discovering and reporting the issue with continuous
aggregates built on top of continuous aggregates.

---------

Signed-off-by: Pallavi Sontakke <pallavi@timescale.com>
Signed-off-by: Yannis Roussos <iroussos@gmail.com>
Signed-off-by: Sven Klemm <31455525+svenklemm@users.noreply.github.com>
Co-authored-by: Yannis Roussos <iroussos@gmail.com>
Co-authored-by: atovpeko <114177030+atovpeko@users.noreply.github.com>
Co-authored-by: Sven Klemm <31455525+svenklemm@users.noreply.github.com>
2024-10-08 15:37:13 +02:00
Ante Kresic
0ac3e3429f Removal of sequence number in compression
Sequence numbers were an optimization for ordering batches based on the
orderby configuration setting. It was used for ordered append and
avoiding sorting compressed data when it matched the query ordering.
However, now that changes to compressed data are enabled, bookkeeping of
sequence numbers has become more of a burden. Removing them and
using the metadata columns for ordering reduces that burden while
keeping all the existing optimizations that relied on the sequences
in place.
2024-09-30 13:45:47 +02:00
Ildar Musin
01231bafd4 Add function to drop the OSM chunk
The function is used by OSM to disable tiering. It removes the catalog
records associated with the OSM chunk and resets the hypertable status.
2024-09-04 11:29:32 +02:00
Mats Kindahl
e1eeedb276 Add index to compression_chunk_size catalog table
During upgrade, the function `remove_dropped_chunk_metadata` is used to
update the metadata tables and remove data for chunks marked as
dropped. The function iterates over the chunks of the provided hypertable
and internally does a sequential scan of the `compression_chunk_size`
table to locate the `compressed_chunk_id`, resulting in quadratic
execution time. This is usually not noticed for a small number of
chunks, but for a large number of chunks it becomes a problem.

This commit fixes this by adding an index to the `compression_chunk_size`
catalog table, turning the sequential scan into an index scan.
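
A sketch of the kind of index added (the exact index name and definition
in the catalog may differ):

  -- Index the compressed_chunk_id column so the lookup becomes an index
  -- scan instead of a sequential scan.
  CREATE INDEX compression_chunk_size_compressed_chunk_id_idx
      ON _timescaledb_catalog.compression_chunk_size (compressed_chunk_id);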
2024-09-04 10:28:13 +02:00
Erik Nordström
19239ff8dd Add function to show compression information
Add a function that can be used on a compressed data value to show
some metadata information, such as the compression algorithm used and
the presence of any null values.
2024-08-05 17:34:41 +02:00
Sven Klemm
801d32c63c Post-release adjustments for 2.16.0 2024-08-01 07:08:34 +02:00
Fabrízio de Royes Mello
a4a023e89a Rename {enable|disable}_column_stats API
For better understanding we've decided to rename the public API from
`{enable|disable}_column_stats` to `{enable|disable}_chunk_skipping`.
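
Under the new names, usage looks roughly like this (table and column
names are hypothetical, and the argument list is assumed to match the
original column-stats API):

  SELECT enable_chunk_skipping('orders', 'order_id');
  SELECT disable_chunk_skipping('orders', 'order_id');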
2024-07-26 18:28:17 -03:00
Sven Klemm
af6b4a3911 Change hypertable foreign key handling
Don't copy foreign key constraints to the individual chunks; instead,
modify the lookup query to propagate to individual chunks, mimicking
how PostgreSQL does this for partitioned tables.
This patch also removes the requirement for foreign key columns
to be segmentby columns.
2024-07-22 14:33:00 +02:00
Nikhil Sontakke
50bca31130 Add support for chunk column statistics tracking
Allow users to specify that ranges (min/max values) be tracked
for a specific column using the enable_column_stats() API. We
will store such min/max ranges in a new timescaledb catalog table
_timescaledb_catalog.chunk_column_stats. As of now we support tracking
min/max ranges for smallint, int, bigint, serial, bigserial, date,
timestamp, and timestamptz data types. Support for other statistics,
such as bloom filters, will be added in the future.

We add an entry of the form (ht_id, invalid_chunk_id, col, -INF, +INF)
into this catalog to indicate that min/max values need to be calculated
for this column in a given hypertable for chunks. We also iterate
through existing chunks and add -INF, +INF entries for them in the
catalog. This allows for selection of these chunks by default since no
min/max values have been calculated for them.

The actual min/max start/end range is calculated later. One of the
entry points is during compression for now. The range is stored in
start (inclusive) and end (exclusive) form. If DML happens into a
compressed chunk then as part of marking it as partial, we also mark
the corresponding catalog entries as "invalid". So partial chunks do
not get excluded further. When recompression happens we get the new
min/max ranges from the uncompressed portion and try to reconcile the
ranges in the catalog based on these new values. This is safe to do in
case of INSERTs and UPDATEs. In case of DELETEs, since we are deleting
rows, it's possible that the min/max ranges change, but as of now we
err on the side of caution and retain the earlier values which can be
larger than the actual range.

We can thus store the min/max values for such columns in this catalog
table at the per-chunk level. Note that these min/max range values do
not participate in partitioning of the data. Such data ranges will be
used for chunk pruning if the WHERE clause of an SQL query specifies
ranges on such a column.

Note that Executor startup time chunk exclusion logic is also able to
use this metadata effectively.

A "DROP COLUMN" on a column with a statistics tracking enabled on it
ends up removing all relevant entries from the catalog tables.

A "decompress_chunk" on a compressed chunk removes its entries from the
"chunk_column_stats" catalog table since now it's available for DML.

Also a new "disable_column_stats" API has been introduced to allow
removal of min/max entries from the catalog for a specific column.
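
A sketch of how the API and the resulting chunk pruning are meant to be
used (table and column names are hypothetical):

  -- Track min/max ranges for a column on a hypertable.
  SELECT enable_column_stats('orders', 'order_id');

  -- Queries with a range predicate on that column can then prune chunks
  -- using the stored per-chunk min/max ranges.
  SELECT * FROM orders WHERE order_id BETWEEN 1000 AND 2000;

  -- Stop tracking and remove the catalog entries for the column.
  SELECT disable_column_stats('orders', 'order_id');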
2024-07-12 14:43:16 +05:30
Fabrízio de Royes Mello
cdfa1560e5 Refactor code for getting time bucket function Oid
This is a small refactoring for getting time bucket function Oid from
a view definition. It will be necessary for following PRs that
completely remove the unnecessary catalog metadata table
`continuous_aggs_bucket_function`.

Also added a new SQL function `cagg_get_bucket_function_info` to return
all `time_bucket` information based on a user view definition.
2024-06-26 10:33:23 -03:00
Fabrízio de Royes Mello
438736f6bd Post release 2.15.1 2024-05-30 14:08:38 -03:00
Fabrízio de Royes Mello
8b994c717d Remove regprocedure oid type from catalog
In #6624 we refactored the time bucket catalog table to make it more
generic and save information for all Continuous Aggregates. Previously
it stored only variable bucket size information.

The problem is we used the `regprocedure` type to store the OID of the
given time bucket function but unfortunately it is not supported by
`pg_upgrade`.

Fixed it by changing the column to TEXT and resolving to/from the OID
using the built-in `regprocedurein` and `format_procedure_qualified`
functions.

Fixes #6935
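
The idea, sketched in plain SQL (the actual code calls the C-level
functions directly, and `format_procedure_qualified` always
schema-qualifies the output):

  -- TEXT -> OID: parse a procedure signature via the regprocedure type
  -- (this uses regprocedurein under the hood).
  SELECT 'pg_catalog.now()'::regprocedure::oid;

  -- OID -> TEXT: render a stored OID back as a procedure signature.
  SELECT 'pg_catalog.now()'::regprocedure::oid::regprocedure::text;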
2024-05-22 11:01:56 -03:00
Fabrízio de Royes Mello
ca125cf620 Post-release changes for 2.15.0. 2024-05-07 16:44:43 -03:00
Sven Klemm
e298ecd532 Don't reuse job id
We shouldn't reuse job ids, so that it stays easy to recognize the job
log entries for a given job. We also need to keep the old job around
so as not to break loading dumps from older versions.
2024-05-03 09:05:57 +02:00
Jan Nidzwetzki
f88899171f Add migration for CAggs using time_bucket_ng
The function time_bucket_ng is deprecated. This PR adds a migration path
for existing CAggs. Since time_bucket and time_bucket_ng use different
origin values, a custom origin is set if needed to let time_bucket
create the same buckets as created by time_bucket_ng so far.
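
For illustration, a custom origin with time_bucket looks roughly like
this (bucket width, origin value, and table are hypothetical):

  -- time_bucket with an explicit origin, as used by the migration to
  -- reproduce the buckets previously created by time_bucket_ng.
  SELECT time_bucket(INTERVAL '1 month', ts, origin => '2000-01-01'::timestamptz)
  FROM metrics;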
2024-04-25 16:08:48 +02:00
Fabrízio de Royes Mello
66c0702d3b Refactor job execution history table
In #6767 we introduced the ability to track job execution history
including succeeded and failed jobs.

The new metadata table `_timescaledb_internal.bgw_job_stat_history` has
two JSONB columns: `config` (stores config information) and `error_data`
(stores the ErrorData information). The problem is that this approach is
not flexible for future changes to history recording, so this PR refactors
the current implementation to use only one JSONB column named `data`
that will store more job information in the following form:

{
  "job": {
    "owner": "fabrizio",
    "proc_name": "error",
    "scheduled": true,
    "max_retries": -1,
    "max_runtime": "00:00:00",
    "proc_schema": "public",
    "retry_period": "00:05:00",
    "initial_start": "00:05:00",
    "fixed_schedule": true,
    "schedule_interval": "00:00:30"
  },
  "config": {
    "bar": 1
  },
  "error_data": {
    "domain": "postgres-16",
    "lineno": 841,
    "context": "SQL statement \"SELECT 1/0\"\nPL/pgSQL function error(integer,jsonb) line 3 at PERFORM",
    "message": "division by zero",
    "filename": "int.c",
    "funcname": "int4div",
    "proc_name": "error",
    "sqlerrcode": "22012",
    "proc_schema": "public",
    "context_domain": "plpgsql-16"
  }
}
2024-04-19 09:19:23 -03:00
Fabrízio de Royes Mello
52094a3103 Track job execution history
In #4678 we added an interface for troubleshooting job failures by
logging them in the metadata table `_timescaledb_internal.job_errors`.

With this PR we extended the existing interface to also store successful
executions. A new GUC named `timescaledb.enable_job_execution_logging`
was added to control this new behavior; its default value is `false`.

We renamed the metadata table to `_timescaledb_internal.bgw_job_stat_history`
and added a new view `timescaledb_information.job_history` so that users
with sufficient permissions can check the job execution history.
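
A minimal usage sketch (depending on its context, the GUC may need to be
set at the system level rather than per session):

  -- Opt in to recording successful executions as well as failures.
  SET timescaledb.enable_job_execution_logging = on;

  -- Inspect the recorded history.
  SELECT * FROM timescaledb_information.job_history;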
2024-04-04 10:39:28 -03:00
Jan Nidzwetzki
8dcb6eed99 Populate CAgg bucket catalog table for all CAggs
This changes the behavior of the CAgg catalog tables. From now on, all
CAggs that use a time_bucket function create an entry in the catalog
table continuous_aggs_bucket_function. In addition, the duplicate
bucket_width attribute is removed from the catalog table continuous_agg.
2024-03-13 16:40:56 +01:00
Sven Klemm
c87be4ab84 Remove get_chunk_colstats and get_chunk_relstats
These two functions were used in the multinode context and are no
longer needed.
2024-03-03 23:14:02 +01:00
Jan Nidzwetzki
fdf3aa3bfa Use NULL in CAgg bucket function catalog table
Historically, we have used an empty string for undefined values in the
catalog table continuous_aggs_bucket_function. Since #6624, the optional
arguments can be NULL. This patch cleans up the empty strings and
changes the logic to work with NULL values.
2024-02-23 20:58:32 +01:00
Jan Nidzwetzki
b01c8e7377 Unify handling of CAgg bucket_origin
So far, bucket_origin was defined as a Timestamp but used as a
TimestampTz in many places. This commit changes this and unifies the
usage of the variable.
2024-02-16 18:28:21 +01:00
Jan Nidzwetzki
ab7a09e876 Make CAgg time_bucket catalog table more generic
The catalog table continuous_aggs_bucket_function is currently only used
for variable bucket sizes. Information about the fixed-size buckets is
stored in the table continuous_agg only. This causes some problems
(e.g., we have redundant fields for the bucket_size, fixed-size buckets
with offsets are not supported, ...).

This commit is the first in a series of commits that refactor the catalog
for the CAgg time_bucket function. The goals are:

* Remove the CAgg redundant attributes in the catalog
* Create an entry in continuous_aggs_bucket_function for all CAggs
  that use time_bucket

This first commit refactors the continuous_aggs_bucket_function table
and prepares it for more generic use. Not all attributes are used yet,
but this will change in follow-up PRs.
2024-02-16 15:39:49 +01:00
Fabrízio de Royes Mello
5a359ac660 Remove metadata when dropping chunk
Historically we preserve chunk metadata because the old format of the
Continuous Aggregate has the `chunk_id` column in the materialization
hypertable, so in order not to leave chunk ids dangling there we just
mark chunks as dropped when dropping them.

In #4269 we introduced a new Continuous Aggregate format that doesn't
store the `chunk_id` in the materialization hypertable anymore, so it's
safe to also remove the metadata when dropping a chunk, provided all
associated Continuous Aggregates are in the new format.

Also added a post-update SQL script to cleanup unnecessary dropped chunk
metadata in our catalog.

Closes #6570
2024-02-16 10:45:04 -03:00
Sven Klemm
8d8f158302 2.14.1 post release
Adjust update tests to include new version.
2024-02-15 06:15:59 +01:00
Sven Klemm
ea6d826c12 Add compression settings informational view
This patch adds two new views, hypertable_compression_settings and
chunk_compression_settings, for querying the per-hypertable and
per-chunk compression settings.
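
A sketch of querying the new views, assuming they live in the
timescaledb_information schema:

  -- Per-hypertable compression settings.
  SELECT * FROM timescaledb_information.hypertable_compression_settings;

  -- Per-chunk compression settings.
  SELECT * FROM timescaledb_information.chunk_compression_settings;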
2024-02-13 07:33:37 +01:00
Ante Kresic
ba3ccc46db Post-release fixes for 2.14.0
Bumping the previous version and adding tests for 2.14.0.
2024-02-12 09:32:40 +01:00
Sven Klemm
101e4c57ef Add recompress optional argument to compress_chunk
This patch deprecates the recompress_chunk procedure as all that
functionality is covered by compress_chunk now. This patch also adds a
new optional boolean argument to compress_chunk to force applying
changed compression settings to existing compressed chunks.
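
A usage sketch (the chunk name is hypothetical):

  -- Compress a chunk; with recompress => true, changed compression
  -- settings are applied even if the chunk is already compressed.
  SELECT compress_chunk('_timescaledb_internal._hyper_1_3_chunk',
                        recompress => true);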
2024-02-07 12:19:13 +01:00
Nikhil Sontakke
2b8f98c616 Support approximate hypertable size
If a lot of chunks are involved, the current PL/pgSQL function that
computes the size of each chunk via a nested loop is pretty slow.
Additionally, it makes a system call to get the file size on disk for
each chunk every time it is called, which slows things down further. We
now have an approximate function, implemented in C, that avoids the
issues in the PL/pgSQL function.
Additionally, this function also uses per backend caching using the
smgr layer to compute the approximate size cheaply. The PG cache
invalidation clears off the cached size for a chunk when DML happens
into it. That size cache is thus able to get the latest size in a
matter of minutes. Also, due to the backend caching, any long running
session will only fetch latest data for new or modified chunks and can
use the cached data (which is calculated afresh the first time around)
effectively for older chunks.
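
Assuming the C implementation is exposed through a SQL function named
hypertable_approximate_size (the name is not given in this message and
is used here only for illustration):

  -- Cheap, cached estimate of the total hypertable size in bytes.
  SELECT hypertable_approximate_size('metrics');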
2024-02-01 13:25:41 +05:30
Nikhil Sontakke
c715d96aa4 Don't dump unnecessary extension tables
Logging and caching related tables from the timescaledb extension
should not be dumped using pg_dump. Our scripts specify a few such
unwanted tables. Apart from being unnecessary, the "job_errors" table
had restricted permissions that caused additional problems in pg_dump.

We now don't include such tables for dumping.

Fixes #5449
2024-01-25 12:01:11 +05:30
Sven Klemm
0b23bab466 Include _timescaledb_catalog.metadata in dumps
This patch changes the dump configuration for
_timescaledb_catalog.metadata to include all entries. To allow loading
logical dumps with this configuration, an insert trigger is added that
turns uniqueness conflicts into updates so they do not block the restore.
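
A sketch of the kind of trigger described, assuming the catalog table has
`key` and `value` columns; the function name, its schema, and the actual
trigger definition are hypothetical:

  -- Turn uniqueness conflicts on the key column into updates so that a
  -- logical restore of the dumped rows does not fail.
  CREATE FUNCTION metadata_insert_trigger_func()
  RETURNS trigger AS $$
  BEGIN
      IF EXISTS (SELECT FROM _timescaledb_catalog.metadata
                 WHERE key = NEW.key) THEN
          UPDATE _timescaledb_catalog.metadata
          SET value = NEW.value WHERE key = NEW.key;
          RETURN NULL;  -- swallow the insert to avoid the unique violation
      END IF;
      RETURN NEW;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER metadata_insert_trigger BEFORE INSERT
      ON _timescaledb_catalog.metadata
      FOR EACH ROW EXECUTE FUNCTION metadata_insert_trigger_func();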
2024-01-23 12:53:48 +01:00
Matvey Arye
e89bc24af2 Add functions for determining compression defaults
Add functions to help determine defaults for segment_by and order_by.
2024-01-22 08:10:23 -05:00
Sven Klemm
754f77e083 Remove chunks_in function
This function was used to propagate chunk exclusion decisions from
an access node to data nodes and is no longer needed with the removal
of multinode.
2024-01-22 09:18:26 +01:00
Sven Klemm
f57d584dd2 Make compression settings per chunk
This patch implements changes to the compressed hypertable to allow per
chunk configuration. To enable this the compressed hypertable can no
longer be in an inheritance tree as the schema of the compressed chunk
is determined by the compression settings. While this patch implements
all the underlying infrastructure changes, the restrictions for changing
compression settings remain intact and will be lifted in a followup patch.
2024-01-17 12:53:07 +01:00
Mats Kindahl
662fcc1b1b Make extension state available through function
The extension state is not easily accessible in release builds, which
makes debugging issues with the loader very difficult. This commit
introduces a new schema `_timescaledb_debug` and makes the function
`ts_extension_get_state` available also in release builds as
`_timescaledb_debug.extension_state`.

See #1682
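
Usage sketch:

  -- Inspect the loader's view of the extension state, also in release builds.
  SELECT _timescaledb_debug.extension_state();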
2024-01-11 10:52:35 +01:00
Jan Nidzwetzki
df7a8fed6f Post-release fixes for 2.13.1
Bumping the previous version and adding tests for 2.13.1
2024-01-09 16:31:07 +01:00
Sven Klemm
8f73f95c2a Remove replication_factor field from _timescaledb_catalog.hypertable 2023-12-18 10:53:27 +01:00
Sven Klemm
11dd9af847 Remove multinode catalog objects
This patch removes the following objects:

tables:
- _timescaledb_catalog.chunk_data_node
- _timescaledb_catalog.dimension_partition
- _timescaledb_catalog.hypertable_data_node
- _timescaledb_catalog.remote_txn

views:
- timescaledb_information.data_nodes

functions:
- _timescaledb_functions.hypertable_remote_size
- _timescaledb_functions.chunks_remote_size
- _timescaledb_functions.indexes_remote_size
- _timescaledb_functions.compressed_chunk_remote_stats
2023-12-18 10:53:27 +01:00
Sven Klemm
6395b249a9 Remove remote connection handling code
Remove the code used by multinode to handle remote connections.
This patch completely removes tsl/src/remote and any remaining
distributed hypertable checks.
2023-12-15 19:13:08 +01:00
Sven Klemm
06867af966 Remove multinode functions from crossmodule struct
This commit removes the multinode specific entries from the cross
module function struct. It also removes the function
`set_chunk_default_data_node`.
2023-12-14 21:32:14 +01:00
Sven Klemm
11df1dd648 Remove experimental multinode functions
This commit removes the following functions:
- timescaledb_experimental.block_new_chunks
- timescaledb_experimental.allow_new_chunks
- timescaledb_experimental.subscription_exec
- timescaledb_experimental.move_chunk
- timescaledb_experimental.copy_chunk
- timescaledb_experimental.cleanup_copy_chunk_operation
2023-12-13 23:38:32 +01:00
Sven Klemm
8a2029f569 Remove rxid type and distributed size util functions 2023-12-13 23:38:32 +01:00