5103 Commits

Author SHA1 Message Date
Fabrízio de Royes Mello
3c707bf28a Release 2.17.1 on main
This release contains performance improvements and bug fixes since
the 2.17.0 release. We recommend that you upgrade at the next
available opportunity.

**Features**
* #7360 Add chunk skipping GUC

**Bugfixes**
* #7335 Change log level used in compression
* #7342 Fix collation for in-memory tuple filtering

**Thanks**
* @gmilamjr for reporting an issue with the log level of compression messages
* @hackbnw for reporting an issue with collation during tuple filtering
2024-10-21 15:16:05 -03:00
Erik Nordström
d83383615a Fix flaky Hypercore join test
The join test could sometimes pick a seqscan+sort instead of an
indexscan when doing a MergeAppend+MergeJoin. Disabling seqscan should
make it deterministic.
2024-10-20 22:29:21 +02:00
Erik Nordström
132d14fe7d Fix flaky Hypercore index test
Having multiple indexes that include the same prefix of columns caused
the planner to sometimes pick a different index for one of the queries,
which led to different test output. Temporarily remove the alternative
index to make the test predictable.
2024-10-20 22:29:21 +02:00
Sven Klemm
23b736e449 Fix flaky continuous_aggs test
Add missing ORDER BY clause to continuous_aggs test to make output
deterministic.
2024-10-19 14:03:52 +02:00
Sven Klemm
2e3cf30cbd Fix flaky rowsecurity test
Check the SQL state code instead of the error message in the row
security foreign key check.
2024-10-19 13:47:43 +02:00
Sven Klemm
5945e01456 Fix approval count workflow
When queried from within the action context, .authorAssociation is
filled in as CONTRIBUTOR instead of MEMBER, so adjust the query to
take that into account.
2024-10-19 11:34:40 +02:00
Sven Klemm
694fcf428e Remove obsolete multinode comment about chunk status 2024-10-19 11:34:40 +02:00
Sven Klemm
a732b19084 Pushdown ORDER BY for realtime caggs
Previously, ordered queries on realtime caggs would always lead to a
full table scan because the query plan had a sort with the limit on
top. With this patch, the ORDER BY can be pushed down so the query can
benefit from the ordered append optimization and no longer requires a
full table scan.

Since the internal structure is different on PG 14 and 15, this
optimization is only available on PG 16 and 17.

Fixes #4861
2024-10-18 22:06:09 +02:00
Fabrízio de Royes Mello
aa9bc607ce Use proper INVALID_{HYPERTABLE|CHUNK}_ID macros 2024-10-18 14:25:11 -03:00
Sven Klemm
de6b478208 Add workflow to check number of approvals
All PRs except trivial ones should require 2 approvals. Since this is a global
setting, we cannot allow trivial PRs to require only 1 approval through the GitHub
configuration alone. So we set the required approvals in GitHub to 1 and make this
check required, which enforces 2 approvals unless overridden or only CI files are touched.
2024-10-18 19:01:05 +02:00
Ante Kresic
8767565e3f Add chunk skipping GUC
Add the ability to completely enable/disable the new chunk skipping
functionality.
2024-10-18 17:47:22 +02:00
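For context, an on/off switch like this is typically registered through PostgreSQL's GUC machinery. Below is a minimal sketch of registering such a boolean GUC; the variable and registration function are hypothetical illustrations, not the actual TimescaleDB symbols.

```c
#include "postgres.h"
#include "utils/guc.h"

/* Illustrative only; the real flag lives inside the extension. */
static bool enable_chunk_skipping_guc = true;

void
register_chunk_skipping_guc(void)   /* hypothetical registration hook */
{
    DefineCustomBoolVariable("timescaledb.enable_chunk_skipping",
                             "Enable chunk skipping",
                             "Completely enable or disable the chunk skipping functionality.",
                             &enable_chunk_skipping_guc,
                             true,          /* enabled by default */
                             PGC_USERSET,   /* settable per session */
                             0,             /* no special flags */
                             NULL, NULL, NULL);
}
```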
Sven Klemm
b65083ef69 Pin setup-wsl version to 3.1.1
Version 3.1.2 does not seem to work, so pin to the previous version
instead of the generic v3.
2024-10-18 12:45:51 +02:00
Sven Klemm
4316f2c203 Remove multinode ssl tests 2024-10-16 18:03:18 +02:00
Fabrízio de Royes Mello
c359c16c74 PG17: Enable Windows tests on CI
We're forcing the PG17 installation since the package is still under
moderation by the Chocolatey Community:

https://community.chocolatey.org/packages/postgresql17
2024-10-16 10:46:03 -03:00
Erik Nordström
ed19e29985 Add changelog entry for Hypercore TAM
The Hypercore table access method (TAM) wraps TimescaleDB's columnar
compression engine in a table access method. The TAM API enables
several features that were previously not available on compressed
data, including (but not limited to):

- Ability to build indexes on compressed data (btree, hash).

- Proper statistics, including column stats via ANALYZE

- Better support for vacuum and vacuum full

- Skip-scans on top of compressed data

- Better support for DML (copy/insert/update) directly on compressed
  chunks

- Ability to dynamically create constraints (check, unique, etc.)

- Better lock handling including via CURSORs
2024-10-16 13:13:34 +02:00
Mats Kindahl
e0a7a6f6e1 Hyperstore renamed to hypercore
This changes the names of all symbols, comments, files, and functions
to use "hypercore" rather than "hyperstore".
2024-10-16 13:13:34 +02:00
Mats Kindahl
406901d838 Rename files using "hyperstore" to use "hypercore"
Files and directories using "hyperstore" as part of the name are
renamed to use "hypercore".
2024-10-16 13:13:34 +02:00
Mats Kindahl
5798b9f534 Add Hypercore analyze support for PG17
PG17 changed the TAM API to use the new `ReadStream` API instead of the
previous block-oriented API. This commit ports the existing
block-oriented solution to the new `ReadStream` API by setting up
separate read streams for the two relations and using the provided read
stream as a block sampler, fetching the appropriate block from either
the non-compressed or compressed relation.
2024-10-16 13:13:34 +02:00
Erik Nordström
eb2ee0bc5c Refactor hyperstore handling in compress_chunk()
Break out any hyperstore handling in `compress_chunk()` into separate
functions. This makes the code more readable.
2024-10-16 13:13:34 +02:00
Mats Kindahl
10c78f1137 Remove memory context switch macro
The macro `TS_WITH_MEMORY_CONTEXT` was used to switch memory context
for a block of code and restore it afterwards. This is now checked
using Coccinelle rules instead, and the macro is removed.
2024-10-16 13:13:34 +02:00
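The macro wrapped the standard PostgreSQL idiom for temporarily switching memory contexts, which callers now spell out directly. A minimal sketch of that idiom (the surrounding helper function is illustrative):

```c
#include "postgres.h"
#include "utils/memutils.h"

/* Illustrative helper showing the pattern the macro used to wrap. */
static char *
copy_string_into_context(MemoryContext target, const char *value)
{
    /* Switch to the target context, remembering the current one. */
    MemoryContext oldcontext = MemoryContextSwitchTo(target);

    /* Allocations made here are placed in 'target'. */
    char *copy = pstrdup(value);

    /* Restore the previous context afterwards. */
    MemoryContextSwitchTo(oldcontext);

    return copy;
}
```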
Mats Kindahl
2ab527e9e3 Fix TRUNCATE of hyperstore tables
Truncate the compressed relation when truncating a hyperstore relation.
This can happen in two situations: either in a non-transactional
context or in a transactional context.

For the transactional context, `relation_set_new_filelocator`
will be called to replace the file locator. If this happens, we need to
replace the file locator for the compressed relation as well, if there
is one.

For the non-transactional case, `relation_nontransactional_truncate`
will be called, and we will just forward the call to the compressed
relation as well, if it exists.
2024-10-16 13:13:34 +02:00
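For the non-transactional case, the forwarding essentially amounts to opening the compressed relation (if any) and calling the generic truncate entry point on it. A rough sketch under assumed helper names, not the actual Hypercore code:

```c
#include "postgres.h"
#include "access/table.h"
#include "access/tableam.h"
#include "utils/rel.h"

/* Hypothetical helper: OID of the compressed relation, or InvalidOid. */
extern Oid hyperstore_get_compressed_relid(Relation rel);

static void
truncate_compressed_relation_if_any(Relation rel)
{
    Oid      compressed_relid = hyperstore_get_compressed_relid(rel);
    Relation crel;

    if (!OidIsValid(compressed_relid))
        return;

    /* Open the compressed side and forward the non-transactional truncate. */
    crel = table_open(compressed_relid, AccessExclusiveLock);
    table_relation_nontransactional_truncate(crel);
    table_close(crel, NoLock);
}
```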
Mats Kindahl
29cb359d46 Optimize check for segmentby-only index scans
If an index scan is on segment-by columns only, the index is optimized
to only contain references to complete segments. However, deciding
whether a scan is only on segment-by columns requires checking all
columns used in the index scan. Since this does not change during a
scan, but would otherwise need to be checked for each tuple, we cache
this information for the duration of the scan.
2024-10-16 13:13:34 +02:00
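Caching a per-scan property like this is typically just a lazily initialized field in the scan state. A sketch of the shape, with hypothetical type and helper names:

```c
#include "postgres.h"

typedef struct SegmentbyOnlyCache   /* hypothetical, trimmed-down scan state */
{
    int         segmentby_only;     /* -1 = unknown, 0 = false, 1 = true */
} SegmentbyOnlyCache;

/* Hypothetical helper that inspects every column used by the index scan. */
extern bool index_scan_uses_only_segmentby_columns(void *scandesc);

static bool
scan_is_segmentby_only(SegmentbyOnlyCache *cache, void *scandesc)
{
    /* Compute once per scan; the answer cannot change between tuples. */
    if (cache->segmentby_only < 0)
        cache->segmentby_only =
            index_scan_uses_only_segmentby_columns(scandesc) ? 1 : 0;

    return cache->segmentby_only == 1;
}
```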
Erik Nordström
ee0a3afee1 Fix Hyperstore index builds with null segments
The index build function didn't properly handle the case when all
rolled-up values in a compressed column were null, resulting in a null
segment. The code has been slightly refactored to handle this case.

A test is also added for this case.
2024-10-16 13:13:34 +02:00
Erik Nordström
b5b73dc3b6 Fix handling of dropped columns in Arrow slot
Dropped columns need to be included in a tuple table slot's values
array after slot_getsomeattrs() has been called. The arrow slot didn't
do this and instead skipped dropped columns, which led to assertion
errors in some cases.
2024-10-16 13:13:34 +02:00
Erik Nordström
201cfe3b94 Fix issue when recompressing Hyperstore
When recompressing a Hyperstore after changing compression settings,
the compressed chunk could be created twice, leading to a conflict
error when inserting two compression chunk size rows.

The reason this happened was that Hyperstore creates a compressed
chunk on demand if it doesn't exist when the relation is opened, and
the recompression code had not yet associated the compressed chunk
with the main chunk when compressing the data.

Fix this by associating the compressed chunk with the main chunk
before opening the main chunk relation to compress the data.
2024-10-16 13:13:34 +02:00
Erik Nordström
ea31d4f5c2 Refactor setting attributes in Arrow getsomeattrs()
When populating an Arrow slot's tts_values array with values in the
getsomeattrs() function, the function set_attr_value() is called. This
function requires passing in an ArrowArray which is acquired via a
compression cache lookup. However, that lookup is not necessary for
segmentby columns (which aren't compressed) and, to avoid it, a
special fast path was created for segmentby columns outside
set_attr_value(). That, unfortunately, created some code duplication.

This change moves the cache lookup into set_attr_value() instead,
where it can be performed only for the columns that need it. This
leads to cleaner code and less code duplication.
2024-10-16 13:13:34 +02:00
Mats Kindahl
e73d0ceb04 Always copy into non-compressed slot of arrow slot
When copying from a non-arrow slot to an arrow slot, we should always
copy the data into the non-compressed slot and never into the
compressed slot.

The previous check for matching number of attributes fails when you
drop a column from the hyperstore.
2024-10-16 13:13:34 +02:00
Mats Kindahl
86fb747202 Disable hash agg for hypertable_index_btree
We disable hash aggregation in favor of group aggregation to get a
stable test. It was flaky because the planner could pick either group
aggregate or hash aggregate.
2024-10-16 13:13:34 +02:00
Mats Kindahl
1a9d319d4b Fix issue when copying into arrow slot
If you set the table access method for a hypertable, all new chunks
will use `ArrowTupleTableSlot`, but the copy code assumes that the
parent table has a virtual tuple table slot. This causes a crash when
copying a heap tuple since the values are stored in the "main" slot
and not in either of the child tuple table slots.

Fix this issue by storing the values in the uncompressed slot when it
is empty.
2024-10-16 13:13:34 +02:00
Mats Kindahl
d28a9fc892 Raise error when using Hyperstore with plain table
If an attempt is made to use the hyperstore table access method with a
plain table during creation, throw an error instead of allowing the
table access method to be used.

The table access method currently only supports hypertables and
expects chunks to exist for the table.
2024-10-16 13:13:34 +02:00
Erik Nordström
8f311b7844 Do simple projection in columnar scan
When a columnar scan needs to return a subset of the columns in a scan
relation, it is possible to do a "simple" projection that just copies
the column values to the projection result slot. This avoids a more
costly projection done by PostgreSQL.
2024-10-16 13:13:34 +02:00
Erik Nordström
ff940170cd Always set tts_tableOid in Arrow slot
The tableOid was not set in an Arrow slot when hyperstore was
delivering the next arrow value from the same compressed child slot,
assuming that the tableOid would remain the same since delivering the
previous value.

This is not always the case, however, as the same slot can be used in
a CREATE TABLE AS or similar statement that inserts the data into
another table. In that case, the insert function of that table will
change the slot's tableOid.

To fix this, hyperstore will always set the tableOid on the slot when
delivering new values.
2024-10-16 13:13:34 +02:00
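Setting the table OID on every delivered value is a one-liner in the slot-filling path. A minimal sketch (the function is illustrative, not the actual Hyperstore code):

```c
#include "postgres.h"
#include "executor/tuptable.h"
#include "utils/rel.h"

/* Illustrative: called whenever the arrow slot is about to return a value. */
static void
arrow_slot_prepare_for_return(TupleTableSlot *slot, Relation rel)
{
    /*
     * Do not assume the previously stored tableOid is still valid; another
     * table's insert routine may have overwritten it (e.g. CREATE TABLE AS).
     */
    slot->tts_tableOid = RelationGetRelid(rel);
}
```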
Erik Nordström
689b1bdd76 Pass on scankeys in parallel columnar scans
This fixes an issue where scankeys were not applied in parallel scans
due to PG not passing on the scan keys to the underlying table access
method when using the function `table_beginscan_parallel()`.

To test the use of scankeys in parallel scans, a test is added that
uses a filter on a segmentby column (this is, currently, the only case
where scankeys are used instead of quals).
2024-10-16 13:13:34 +02:00
Erik Nordström
d7724f348c Add fast-path for iterating an arrow slot
When consuming the values of an arrow array (via an arrow slot)
during a scan, it is best to try to increment the slot as quickly as
possible without doing other (unnecessary) work. Ensuring this "fast
path" exists gives a decent speed boost.
2024-10-16 13:13:34 +02:00
Erik Nordström
cb8f4c2e68 Avoid cache lookup for segmentby columns in arrow slot
When calling slot_getsomeattrs() on an arrow slot, the slot's values
array is populated with data, which includes a potential lookup into
the arrow cache and decompression of the values.

However, if the column is a segmentby column, there is nothing to
decompress and it is not necessary to check the decompression
cache. Avoid this cache lookup by adding a fast path for segmentby
columns that directly copies the value from the underlying
(compressed) tuple.
2024-10-16 13:13:34 +02:00
Erik Nordström
3e0daf6ad4 Refactor attribute map in Arrow slot
Refactor the function to get the attribute offset map in
ArrowTupleTableSlot so that it has an inlined fast path and a slow
path that initializes the map during the first call. After
initialization, the fast path simply returns the map.
2024-10-16 13:13:34 +02:00
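This is the usual split into an inlined fast path and an out-of-line slow path. A sketch with hypothetical struct and function names:

```c
#include "postgres.h"

typedef struct ArrowSlotMapSketch   /* hypothetical, trimmed down */
{
    int16      *attrs_offset_map;   /* NULL until first use */
} ArrowSlotMapSketch;

/* Slow path: builds the map once, on the first call. */
extern int16 *build_attrs_offset_map(ArrowSlotMapSketch *aslot);

static inline int16 *
arrow_slot_get_attrs_offset_map(ArrowSlotMapSketch *aslot)
{
    /* Fast path: the map already exists, so just return it. */
    if (likely(aslot->attrs_offset_map != NULL))
        return aslot->attrs_offset_map;

    /* Slow path: initialize the map during the first call. */
    return build_attrs_offset_map(aslot);
}
```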
Mats Kindahl
09e5aee285 Add whitelist for Hyperstore index access methods
Indexes for Hyperstore require special considerations, so we whitelist
the index access methods that are supported and allow the whitelist to
be set in the configuration file using the
`timescaledb.hyperstore_indexam_whitelist` option.
2024-10-16 13:13:34 +02:00
Mats Kindahl
a29e9acd50 Error out on expression index with Hyperstore
If an attempt was made to create an expression index, a debug build
would abort because it tried to use a system attribute number (zero or
negative).

This commit fixes this by adding a check that expression indexes or
system attribute numbers are not used when building the index, and
erroring out if that happens.
2024-10-16 13:13:34 +02:00
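The guard boils down to rejecting anything that is not a plain user column before building the index. A sketch of such a check (function name and error wording are illustrative):

```c
#include "postgres.h"
#include "access/attnum.h"

static void
check_index_column_supported(AttrNumber attno)
{
    /*
     * Expression index columns have attno == 0 and system columns are
     * negative; only plain user columns (attno > 0) are supported.
     */
    if (!AttrNumberIsForUserDefinedAttr(attno))
        ereport(ERROR,
                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                 errmsg("index on expression or system column is not supported")));
}
```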
Erik Nordström
ba9a2743c1 Declare ColumnarScan as projection capable
ColumnarScan supports projections, but didn't announce it did. Make
sure it sets CUSTOMPATH_SUPPORT_PROJECTION in the CustomPath flags so
that the planner doesn't add unnecessary Result nodes.
2024-10-16 13:13:34 +02:00
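In PostgreSQL's custom-path API this is a single flag bit on the CustomPath. A minimal sketch of where it would be set (the surrounding function is illustrative):

```c
#include "postgres.h"
#include "nodes/extensible.h"
#include "nodes/pathnodes.h"

static void
mark_columnar_scan_projection_capable(CustomPath *cpath)
{
    /*
     * Tell the planner this custom scan does its own projection, so it
     * doesn't need to add a Result node on top of it.
     */
    cpath->flags |= CUSTOMPATH_SUPPORT_PROJECTION;
}
```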
Erik Nordström
0dc1a0e645 Improve decompression cache stats explain
Make the decompression cache stats track more information, including
actual cache hits, misses, and evictions (in terms of hash-table
lookups).

One of the most interesting metrics is the number of
decompressions. However, this statistic was internally tracked as
cache hits, which was confusing since it doesn't have anything to do
with cache hits.

Conversely, every non-hit, or "avoided decompression", was tracked as
cache misses, which is also a bit ambiguous because ideally one should
never try to decompress something that is already decompressed. This
is further complicated by the fact that some columns should not be
decompressed at all, but are still counted towards this metric. For
now, simply label this as "decompress calls" and hide it by default
unless EXPLAIN is run with VERBOSE.
2024-10-16 13:13:34 +02:00
Erik Nordström
3076fd4ccb Cache typbyval in ArrowArray private for arrow slot
Calling get_typbyval() when creating datums from arrow arrays has a
non-negligible performance impact due to the syscache lookup. Optimize
this for a noticeable performance gain by caching the typbyval
information in the arrow array's private field.
2024-10-16 13:13:34 +02:00
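Caching the lookup amounts to resolving typbyval once per arrow array and reusing it for every value. A sketch with a hypothetical private-data struct:

```c
#include "postgres.h"
#include "utils/lsyscache.h"

typedef struct ArrowPrivateSketch   /* hypothetical per-array private data */
{
    Oid     typid;          /* element type of the arrow array */
    bool    typbyval_valid; /* has typbyval been looked up yet? */
    bool    typbyval;       /* cached result of get_typbyval() */
} ArrowPrivateSketch;

static bool
arrow_private_get_typbyval(ArrowPrivateSketch *priv)
{
    /* Do the syscache lookup only once per arrow array. */
    if (!priv->typbyval_valid)
    {
        priv->typbyval = get_typbyval(priv->typid);
        priv->typbyval_valid = true;
    }
    return priv->typbyval;
}
```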
Erik Nordström
db68f6eeb8 Cache text datums in hyperstore ArrowArray private
Reduce the amount of memory allocations when creating text datums from
arrow array values by creating a reusable memory area in the
ArrowArray's private data storage.
2024-10-16 13:13:34 +02:00
Erik Nordström
9d71fec1af Save ArrowColumnCache entry in Arrow slot header
The ArrowColumnCache entry is valid for an arrow slot until the next
compressed tuple is stored (or a non-compressed one). Therefore, it is
possible to avoid repeated hash-table lookups by saving the
ArrowColumnCache entry in the Arrow slot header after the first
lookup. This gives a noticeable speed-up when iterating the arrow
array in the slot.
2024-10-16 13:13:34 +02:00
Erik Nordström
9fcc3a250f Refactor getsomeattrs() in arrow slot
The getsomeattrs() function is on the "hot path" for table scans.
Simplify and optimize this function and the related subfunctions it
calls to make it more efficient. This makes the overall flow easier to
understand and ensures a quick exit if there's nothing to do.

The refactor also fixes an issue that caused unreferenced columns to
be set with getmissingattr(). In this case, getmissingattr() doesn't
do anything when the attribute is not marked with `atthasmissing`, but
it caused unnecessary function calls and checks.
2024-10-16 13:13:34 +02:00
Erik Nordström
774e742210 Use bool array for referenced attrs in arrow slot
Turning referenced_attrs from a bitmapset into a bool array has a
measurable performance impact, unlike segmentby_attrs. Furthermore,
making all attrs-tracking sets into similar bool arrays makes things
consistent.
2024-10-16 13:13:34 +02:00
Erik Nordström
90891638c8 Use bool array for segmentby_attrs in arrow slot
The segmentby_attrs in the arrow slot is a bitmapset, similar to how
valid_attrs used to be, and bitmapsets can be slow. Therefore, also
make segmentby_attrs a bool array.

Since segmentby_attrs isn't cleared and reallocated when iterating an
arrow slot (unlike valid_attrs), the performance impact isn't as big
(or even measurable) as for valid_attrs. Still, the extra overhead of
a bool array doesn't make a big difference.
2024-10-16 13:13:34 +02:00
Erik Nordström
375516376b Use bool array for valid_attrs in arrow slot
The valid_attrs in the arrow tuple table slot is a bitmapset used to
track columns/attributes that are "materialized" in the slot. This
bitmapset is cleared by freeing the set and then reallocating it again
for the next row in an arrow array. The reallocation happens on a
performance-critical "hot path" and has a significant performance
impact.

The performance is improved by making valid_attrs a bool array
instead, and preallocating the array at slot initialization. Clearing
it is a simple memset(). While a bool array takes a bit more space
than a bitmapset, it has simpler semantics and is always the size of
the number of attributes.
2024-10-16 13:13:34 +02:00
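The difference between the two representations is mainly how cheaply they can be cleared for each new row. A sketch of the bool-array approach (struct and field names are illustrative):

```c
#include "postgres.h"

typedef struct ArrowSlotSketch  /* hypothetical, trimmed down */
{
    int     natts;          /* number of attributes in the tuple descriptor */
    bool   *valid_attrs;    /* preallocated at slot initialization */
} ArrowSlotSketch;

static void
arrow_slot_clear_valid_attrs(ArrowSlotSketch *slot)
{
    /*
     * With a bitmapset this required freeing and reallocating the set;
     * with a preallocated bool array it is a single memset().
     */
    memset(slot->valid_attrs, 0, sizeof(bool) * slot->natts);
}
```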
Mats Kindahl
efdf236f26 Add tests for update
This commit adds tests for update of a segment-by column and update
using the RETURNING clause, mainly as a sanity check.
2024-10-16 13:13:34 +02:00
Mats Kindahl
5b5991c2f0 Add tests for MERGE command 2024-10-16 13:13:34 +02:00
Mats Kindahl
8be54d759d Reduce runtime of tests based on setup_hyperstore
This commit reduces the number of tuples added to the hyperstore table
to shorten the test runtime and also fixes `hyperstore_scans`. For
`hyperstore_scans`, it is necessary to reduce the number of locations
since we want to trigger dictionary compression and make sure that it
works for that as well.
2024-10-16 13:13:34 +02:00