timescaledb

postgres/timescaledb

mirror of https://github.com/timescale/timescaledb.git synced 2025-05-18 03:23:37 +08:00

Commit Graph

Erik Nordström

417b66e974

Fix boundary handling in time types and constraints

Time types, like date and timestamps, have limits that aren't the same
as the underlying storage type. For instance, while a timestamp is
stored as an `int64` internally, its max supported time value is not
`INT64_MAX`. Instead, `INT64_MAX` represents `+Infinity` and the
actual largest possible timestamp is close to `INT64_MAX` (but not
`INT64_MAX-1` either). The same applies to min values.
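To make the sentinel-versus-limit distinction concrete, here is a minimal C sketch that classifies a raw `int64` timestamp (microseconds since 2000-01-01, PostgreSQL's internal epoch). The constants mirror `DT_NOBEGIN`/`DT_NOEND`, `MIN_TIMESTAMP`, and `END_TIMESTAMP` from PostgreSQL's `datatype/timestamp.h` in recent PostgreSQL versions, as we understand them. The `TS_*` names, `classify_timestamp()`, and `timestamp_max_finite()` are hypothetical and are not TimescaleDB functions.

```c
#include <assert.h>
#include <stdint.h>

/* Values mirror PostgreSQL's datatype/timestamp.h; the TS_ prefix is ours. */
#define TS_NOBEGIN       INT64_MIN                     /* DT_NOBEGIN: -infinity */
#define TS_NOEND         INT64_MAX                     /* DT_NOEND: +infinity */
#define TS_MIN_TIMESTAMP INT64_C(-211813488000000000)  /* MIN_TIMESTAMP: 4714-11-24 BC */
#define TS_END_TIMESTAMP INT64_C(9223371331200000000)  /* END_TIMESTAMP: 294277-01-01, exclusive */

/* The finite range ends well below the +infinity sentinel. */
static_assert(TS_END_TIMESTAMP < TS_NOEND - 1, "finite range must end below INT64_MAX - 1");

typedef enum { TS_NEG_INFINITY, TS_FINITE, TS_POS_INFINITY, TS_OUT_OF_RANGE } TsClass;

/* Classify a raw int64 timestamp (microseconds since 2000-01-01 00:00:00). */
static TsClass classify_timestamp(int64_t usec)
{
    if (usec == TS_NOBEGIN)
        return TS_NEG_INFINITY;
    if (usec == TS_NOEND)
        return TS_POS_INFINITY;
    if (usec >= TS_MIN_TIMESTAMP && usec < TS_END_TIMESTAMP)
        return TS_FINITE;
    return TS_OUT_OF_RANGE; /* e.g. INT64_MAX - 1: neither infinite nor a valid time */
}

/* Largest finite timestamp: a type-specific limit, not INT64_MAX - 1. */
static int64_t timestamp_max_finite(void)
{
    return TS_END_TIMESTAMP - 1;
}
```

`INT64_MAX - 1` falls into neither the infinite nor the finite category, which is why comparing against `INT64_MAX` alone cannot separate valid timestamps from out-of-range values.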

Unfortunately, time handling code does not check for these boundaries;
in most cases, overflow checks (e.g., when bucketing) are performed
against the max integer values instead of type-specific boundaries. In
other cases, overflows simply throw errors instead of clamping to the
boundary values, which makes more sense in many situations.

Using integer time suffers from similar issues. To take one example,
simply inserting a valid `smallint` value close to the max into a
table with a `smallint` time column fails:

```
INSERT INTO smallint_table VALUES ('32765', 1, 2.0);
ERROR:  value "32770" is out of range for type smallint
```

This happens because the code that adds dimensional constraints always
checks for overflow against `INT64_MAX` instead of the type-specific
max value. Therefore, it tries to create a chunk constraint that ends
at `32770`, which is outside the allowed range of `smallint`.
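The failure can be reproduced with plain integer arithmetic. This sketch assumes a chunk interval of 10 with chunks aligned at 0, which is consistent with the `32770` in the error message but is not stated in the commit. `chunk_start()` and the two checks are hypothetical helpers, not TimescaleDB's dimension code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Start of the fixed-width chunk containing value (assumes value >= 0, aligned at 0). */
static int64_t chunk_start(int64_t value, int64_t interval)
{
    assert(value >= 0 && interval > 0);
    return value - value % interval;
}

/* Pre-fix pattern (illustrative): overflow is checked only against the int64 storage range. */
static bool chunk_end_ok_int64(int64_t start, int64_t interval)
{
    return start <= INT64_MAX - interval;
}

/* Type-aware check: the chunk end must also fit the time column's own type. */
static bool chunk_end_ok_for_type(int64_t start, int64_t interval, int64_t type_max)
{
    return start <= type_max - interval;
}
```

With `value = 32765`, the chunk starts at `32760`; the `int64` check accepts an end of `32770`, while the check against `INT16_MAX` (32767) rejects it.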

To resolve these issues, several time-related utility functions have
been implemented that, e.g., return type-specific range boundaries
and perform saturated addition and subtraction, clamping results to
the supported boundaries.
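A minimal sketch of saturated addition and subtraction clamped to a type-specific range might look like the following. The `TimeRange` struct, the range constants, and both function names are assumptions for illustration; they are not the utility functions added by this commit. The comparisons are arranged so that no intermediate expression overflows `int64`, given that every range satisfies `min < 0 <= max`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-type range; every range here satisfies min < 0 <= max. */
typedef struct {
    int64_t min;
    int64_t max;
} TimeRange;

static const TimeRange SMALLINT_RANGE = { INT16_MIN, INT16_MAX };
static const TimeRange INTEGER_RANGE = { INT32_MIN, INT32_MAX };
static const TimeRange BIGINT_RANGE = { INT64_MIN, INT64_MAX };

/* a + b, clamped to [r.min, r.max]; a must already lie within r. */
static int64_t saturating_add(int64_t a, int64_t b, TimeRange r)
{
    assert(a >= r.min && a <= r.max);
    if (b > 0 && a > r.max - b) /* r.max >= 0 and b > 0: no overflow */
        return r.max;
    if (b < 0 && a < r.min - b) /* r.min < 0 and b < 0: no overflow */
        return r.min;
    return a + b;
}

/* a - b, clamped to [r.min, r.max]; avoids negating b (INT64_MIN has no negation). */
static int64_t saturating_sub(int64_t a, int64_t b, TimeRange r)
{
    assert(a >= r.min && a <= r.max);
    if (b > 0 && a < r.min + b) /* r.min < 0 and b > 0: no overflow */
        return r.min;
    if (b < 0 && a > r.max + b) /* r.max >= 0 and b < 0: no overflow */
        return r.max;
    return a - b;
}
```

A timestamp range could be added the same way, using PostgreSQL's finite timestamp limits rather than `INT64_MIN`/`INT64_MAX`, so that clamping never lands on the infinity sentinels.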

Fixes #2292

2020-09-04 23:27:22 +02:00

Joshua Lockerman

584f5d1061

Implement time-series compression algorithms

This commit introduces 4 compression algorithms
as well as 3 ADTs to support them. The compression
algorithms are time-series optimized. The following
algorithms are implemented:

- DeltaDelta compresses integer and timestamp values
- Gorilla compresses floats
- Dictionary compression handles any data type
  and is optimized for low-cardinality datasets.
- Array stores any data type in an array-like
  structure and does not actually compress it (though
  TOAST-based compression can be applied on top).
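The core idea behind DeltaDelta can be sketched in a few lines of C: store the first value, then the first delta, then the differences between consecutive deltas, and ZigZag-map each signed result so that small magnitudes become small unsigned codes. This is a hedged illustration of the technique, not TimescaleDB's on-disk format; the function names are hypothetical, and a real encoder would go on to bit-pack the small codes, a step this sketch omits. Arithmetic is done on `uint64_t` so it wraps modulo 2^64 and stays reversible even when a delta overflows `int64`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ZigZag on two's-complement bit patterns: 0->0, -1->1, 1->2, -2->3, ... */
static uint64_t zigzag(uint64_t x) { return (x << 1) ^ (UINT64_C(0) - (x >> 63)); }
static uint64_t unzigzag(uint64_t z) { return (z >> 1) ^ (UINT64_C(0) - (z & 1)); }

/* Code 0 is the first value, code 1 the first delta, later codes are delta-of-deltas. */
static void dd_encode(const int64_t *in, uint64_t *out, size_t n)
{
    assert(n == 0 || (in != NULL && out != NULL));
    uint64_t prev = 0, prev_delta = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t cur = (uint64_t) in[i];
        uint64_t delta = cur - prev; /* wraps modulo 2^64 */
        out[i] = zigzag(delta - prev_delta);
        prev = cur;
        prev_delta = (i == 0) ? 0 : delta; /* the first value is not a delta */
    }
}

static void dd_decode(const uint64_t *in, int64_t *out, size_t n)
{
    assert(n == 0 || (in != NULL && out != NULL));
    uint64_t prev = 0, prev_delta = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t delta = prev_delta + unzigzag(in[i]);
        uint64_t cur = prev + delta;
        out[i] = (int64_t) cur; /* two's-complement reinterpretation */
        prev = cur;
        prev_delta = (i == 0) ? 0 : delta;
    }
}
```

For regularly spaced timestamps such as `1000, 2000, 3000, 4000`, every code after the second is `0`, which is what makes the subsequent packing step effective.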

These compression algorithms are fully described in
tsl/src/compression/README.md.

The Abstract Data Types that are implemented are:
- Vector - A dynamic vector that can store any type.
- BitArray - A dynamic vector to store bits.
- SimpleHash - A hash table implementation from PG12.
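For a sense of what a dynamic bit vector involves, here is a minimal growable bit array in C. The `BitVec` type and its functions are hypothetical and do not reproduce the API of TimescaleDB's BitArray; they only show the usual word-packing and doubling-growth technique.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical growable bit array: bits packed 64 per word, LSB first. */
typedef struct {
    uint64_t *words;
    size_t nbits;
    size_t capacity; /* in 64-bit words */
} BitVec;

/* Append one bit; returns false if memory allocation fails. */
static bool bitvec_push(BitVec *bv, bool bit)
{
    if (bv->nbits == bv->capacity * 64) {
        size_t newcap = bv->capacity ? bv->capacity * 2 : 1; /* doubling growth */
        uint64_t *w = realloc(bv->words, newcap * sizeof *w);
        if (w == NULL)
            return false;
        bv->words = w;
        bv->capacity = newcap;
    }
    size_t word = bv->nbits / 64, off = bv->nbits % 64;
    if (off == 0)
        bv->words[word] = 0; /* fresh word: clear uninitialized memory */
    if (bit)
        bv->words[word] |= UINT64_C(1) << off;
    bv->nbits++;
    return true;
}

/* Read bit i (must be < nbits). */
static bool bitvec_get(const BitVec *bv, size_t i)
{
    assert(i < bv->nbits);
    return (bv->words[i / 64] >> (i % 64)) & 1;
}

static void bitvec_free(BitVec *bv)
{
    free(bv->words);
    bv->words = NULL;
    bv->nbits = bv->capacity = 0;
}
```

Because capacity doubles when full, appends run in amortized constant time, and a zero-initialized `BitVec` is a valid empty array since `realloc(NULL, n)` behaves like `malloc(n)`.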

More information can be found in
src/adts/README.md

2019-10-29 19:02:58 -04:00

Joshua Lockerman

e051842fee

Add interval to internal conversions, and tests for both this and time conversions

We find ourselves needing to store intervals (specifically time_bucket widths) in
upcoming PRs, so this commit adds that functionality, along with tests verifying
that the conversion is performed in a sensible, round-trippable manner.

This commit fixes a longstanding bug in plan_hashagg where negative time values
would prevent us from using a hashagg. The old logic for to_internal had a flag
that caused the function to return -1, instead of throwing an error, if it could
not perform the conversion. This logic was incorrect, as -1 is a valid time value.
The new logic throws the error unconditionally and forces the caller to CATCH it
if they wish to handle that case. Switching plan_hashagg to use the new logic
fixed the bug.

The commit adds a single SQL file, c_unit_tests.sql, to be the driver for all such
pure-C unit tests. Since the tests run quickly and there is very little work to
be done at the SQL level, there seems to be no need for each group of such tests
to have its own SQL file.

This commit also updates test/sql/.gitignore, as some generated files were
missing from it.

2019-03-29 14:47:41 -04:00

3 Commits