3 Commits

Author SHA1 Message Date
Erik Nordström
417b66e974 Fix boundary handling in time types and constraints
Time types, like date and timestamps, have limits that aren't the same
as the underlying storage type. For instance, while a timestamp is
stored as an `int64` internally, its max supported time value is not
`INT64_MAX`. Instead, `INT64_MAX` represents `+Infinity` and the
actual largest possible timestamp is close to `INT64_MAX` (but not
`INT64_MAX-1` either). The same applies to min values.

Unfortunately, time handling code does not check for these boundaries;
in most cases, overflow handling when, e.g., bucketing, are checked
against the max integer values instead of type-specific boundaries. In
other cases, overflows simply throw errors instead of clamping to the
boundary values, which makes more sense in many situations.

Using integer time suffers from similar issues. To take one example,
simply inserting a valid `smallint` value close to the max into a
table with a `smallint` time column fails:

```
INSERT INTO smallint_table VALUES ('32765', 1, 2.0);
ERROR:  value "32770" is out of range for type smallint
```

This happens because the code that adds dimensional constraints always
checks for overflow against `INT64_MAX` instead of the type-specific
max value. Therefore, it tries to create a chunk constraint that ends
at `32770`, which is outside the allowed range of `smallint`.

The resolve these issues, several time-related utility functions have
been implemented that, e.g., return type-specific range boundaries,
and perform saturated addition and subtraction while clamping to
supported boundaries.

Fixes #2292
2020-09-04 23:27:22 +02:00
Joshua Lockerman
584f5d1061 Implement time-series compression algorithms
This commit introduces 4 compression algorithms
as well as 3 ADTs to support them. The compression
algorithms are time-series optimized. The following
algorithms are implemented:

- DeltaDelta compresses integer and timestamp values
- Gorilla compresses floats
- Dictionary compression handles any data type
  and is optimized for low-cardinality datasets.
- Array stores any data type in an array-like
  structure and does not actually compress it (though
  TOAST-based compression can be applied on top).

These compression algorithms are are fully described in
tsl/src/compression/README.md.

The Abstract Data Types that are implemented are
- Vector - A dynamic vector that can store any type.
- BitArray - A dynamic vector to store bits.
- SimpleHash - A hash table implementation from PG12.

More information can be found in
src/adts/README.md
2019-10-29 19:02:58 -04:00
Joshua Lockerman
e051842fee Add interval to internal conversions, and tests for both this and time conversions
We find ourselves needing to store intervals (specifically time_bucket widths) in
upcoming PRs, so this commit adds that functionality, along with tests that we
perform the conversion in a sensible, round-tripa-able, manner.

This commit fixes a longstanding bug in plan_hashagg where negative time values
would prevent us from using a hashagg. The old logic for to_internal had a flag
that caused the function to return -1 instead of throwing an error, if it could
not perform the conversion. This logic was incorrect, as -1 is a valid time val
The new logic throws the error uncoditionally, and forces the user to CATCH it
if they wish to handle that case. Switching plan_hashagg to using the new logic
fixed the bug.

The commit adds a single SQL file, c_unit_tests.sql, to be the driver for all such
pure-C unit tests. Since the tests run quickly, and there is very little work to
be done at the SQL level, it does not seem like each group of such tests requires
their own SQL file.

This commit also upates the test/sql/.gitignore, as some generated files were
missing.
2019-03-29 14:47:41 -04:00