New `merge_chunks` procedures are introduced that can merge an arbitrary number of chunks when the right conditions apply. Basic checks are done to ensure that the chunks can be merged from a partitioning perspective. Some more advanced cases that are potentially mergeable, such as chunks with non-adjacent partitioning and compressed chunks, are not supported at this time. Merging compressed chunks requires additional work, although the same basic rewrite approach should also work on the internal compressed relations. One still needs to handle merges of a compressed chunk with a non-compressed chunk, merges of two compressed chunks with different compression settings, partially compressed chunks, and so forth. This is left as a future enhancement.
Currently, the merge defaults to taking an AccessExclusive lock on the merged chunks to prevent deadlocks and concurrent modifications. Weaker locking is supported via an anonymous settings variable, and it is used in tests to illustrate various deadlock scenarios. Alternative locking approaches, including multi-transactional merges, can be considered in the future.
The actual merging is done by rewriting all the data from the source chunks into a (temporary) merged heap, using the same approach as the one implemented to support VACUUM FULL and CLUSTER. The new heap is then swapped into one of the original chunk relations while the rest are dropped. This approach is MVCC compliant, implements correct visibility under higher isolation levels, and also cleans up garbage tuples.
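A minimal usage sketch, assuming the procedures are invoked with CALL and accept either two chunk relations or an array of them; the chunk names below are hypothetical and the exact signatures may differ:
-- Merge two chunks (hypothetical chunk names; list real chunk relations with show_chunks()).
CALL merge_chunks('_timescaledb_internal._hyper_1_1_chunk',
                  '_timescaledb_internal._hyper_1_2_chunk');
-- Merge an arbitrary number of chunks by passing an array of chunk relations.
CALL merge_chunks(ARRAY['_timescaledb_internal._hyper_1_1_chunk',
                        '_timescaledb_internal._hyper_1_2_chunk',
                        '_timescaledb_internal._hyper_1_3_chunk']::regclass[]);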
TimescaleDB is a PostgreSQL extension for high-performance real-time analytics on time-series and event data.
Install TimescaleDB
Install from a Docker container:
- Run the TimescaleDB container:
docker run -d --name timescaledb -p 5432:5432 -e POSTGRES_PASSWORD=password timescale/timescaledb:latest-pg17
- Connect to a database:
docker exec -it timescaledb psql -d "postgres://postgres:password@localhost/postgres"
See other installation options or try Timescale Cloud for free.
Create a hypertable
You create a regular table and then convert it into a hypertable. A hypertable automatically partitions data into chunks based on your configuration.
-- Create timescaledb extension
CREATE EXTENSION IF NOT EXISTS timescaledb;
-- Create a regular SQL table
CREATE TABLE conditions (
   time        TIMESTAMPTZ      NOT NULL,
   location    TEXT             NOT NULL,
   temperature DOUBLE PRECISION NULL,
   humidity    DOUBLE PRECISION NULL
);
-- Convert the table into a hypertable that is partitioned by time
SELECT create_hypertable('conditions', by_range('time'));
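By default, each chunk covers a fixed time interval; the interval used for newly created chunks can be adjusted afterwards. A short sketch using set_chunk_time_interval (the one-day interval is just an example):
-- Use one-day chunks for data inserted from now on (existing chunks are unchanged)
SELECT set_chunk_time_interval('conditions', INTERVAL '1 day');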
Enable columnstore
TimescaleDB's hypercore is a hybrid row-columnar store that boosts analytical query performance on your time-series and event data, while reducing data size by more than 90%. This keeps your queries operating at lightning speed and ensures low storage costs as you scale. Data is inserted in row format in the rowstore and converted to columnar format in the columnstore based on your configuration.
- Configure the columnstore on a hypertable:
ALTER TABLE conditions SET ( timescaledb.compress, timescaledb.compress_segmentby = 'location' );
- Create a policy to automatically convert chunks in row format that are older than seven days to chunks in the columnar format:
SELECT add_compression_policy('conditions', INTERVAL '7 days');
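The policy converts chunks in the background; you can also convert eligible chunks immediately. A sketch combining show_chunks and compress_chunk with the same seven-day cutoff as the policy above:
-- Convert all chunks older than seven days to columnar format right away
SELECT compress_chunk(c) FROM show_chunks('conditions', older_than => INTERVAL '7 days') AS c;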
Insert and query data
Insert and query data in a hypertable via regular SQL commands. For example:
- Insert data into a hypertable named conditions:
INSERT INTO conditions
   VALUES
      (NOW(), 'office',   70.0, 50.0),
      (NOW(), 'basement', 66.5, 60.0),
      (NOW(), 'garage',   77.0, 65.2);
- Return the number of entries written to the table conditions in the last 12 hours:
SELECT COUNT(*) FROM conditions WHERE time > NOW() - INTERVAL '12 hours';
Create time buckets
Time buckets enable you to aggregate data in hypertables by time interval and calculate summary values.
For example, calculate the average daily temperature in a table named conditions. The table has time and temperature columns:
SELECT
   time_bucket('1 day', time) AS bucket,
   AVG(temperature) AS avg_temp
FROM
   conditions
GROUP BY
   bucket
ORDER BY
   bucket ASC;
Create continuous aggregates
Continuous aggregates are designed to make queries on very large datasets run faster. They continuously and incrementally refresh a query in the background, so that when you run such a query, only the data that has changed needs to be computed, not the entire dataset. This is what makes them different from regular PostgreSQL materialized views, which cannot be incrementally materialized and have to be rebuilt from scratch every time you want to refresh them.
For example, create a continuous aggregate view for daily weather data in two simple steps:
- Create a materialized view:
CREATE MATERIALIZED VIEW conditions_summary_daily
WITH (timescaledb.continuous) AS
SELECT
   location,
   time_bucket(INTERVAL '1 day', time) AS bucket,
   AVG(temperature),
   MAX(temperature),
   MIN(temperature)
FROM conditions
GROUP BY location, bucket;
- Create a policy to refresh the view every hour:
SELECT add_continuous_aggregate_policy('conditions_summary_daily',
   start_offset => INTERVAL '1 month',
   end_offset => INTERVAL '1 day',
   schedule_interval => INTERVAL '1 hour');
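Once the view and the refresh policy are in place, query the continuous aggregate like any other view. For example, to read the latest pre-aggregated daily values for one location:
-- Read the most recent seven daily rows for the 'office' location
SELECT * FROM conditions_summary_daily
WHERE location = 'office'
ORDER BY bucket DESC
LIMIT 7;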
Want TimescaleDB hosted and managed for you? Try Timescale Cloud
Timescale Cloud is a cloud-based PostgreSQL platform for resource-intensive workloads. We help you build faster, scale further, and stay under budget. A Timescale Cloud service is a single optimized 100% PostgreSQL database instance that you use as is, or extend with capabilities specific to your business needs. The available capabilities are:
- Time-series and analytics: PostgreSQL with TimescaleDB. The PostgreSQL you know and love, supercharged with functionality for storing and querying time-series data at scale for analytics and other use cases. Get faster time-based queries with hypertables, continuous aggregates, and columnar storage. Save on storage with native compression, data retention policies, and bottomless data tiering to Amazon S3.
- AI and vector: PostgreSQL with vector extensions. Use PostgreSQL as a vector database with purpose-built extensions for building AI applications from start to scale. Get fast and accurate similarity search with the pgvector and pgvectorscale extensions. Create vector embeddings and perform LLM reasoning on your data with the pgai extension.
- PostgreSQL: the trusted industry-standard RDBMS. Ideal for applications requiring strong data consistency, complex relationships, and advanced querying capabilities. Get ACID compliance, extensive SQL support, JSON handling, and extensibility through custom functions, data types, and extensions. All services include all the cloud tooling you'd expect for production use: automatic backups, high availability, read replicas, data forking, connection pooling, tiered storage, usage-based storage, and much more.
Check build status
Linux/macOS | Linux i386 | Windows | Coverity | Code Coverage | OpenSSF
Get involved
We welcome contributions to TimescaleDB! See Contributing and Code style guide for details.
Learn about Timescale
Timescale is PostgreSQL made powerful. To learn more about the company and its products, visit timescale.com.