This change replaces the existing `clang-tidy` linter target with
CMake's built-in support for it. The old way of invoking the linter
relied on the `run-clang-tidy` wrapper script, which is not installed
by default on some platforms. Discovery of the `clang-tidy` tool has
also been improved to work with more installation locations.
As a result, linting now happens at compile time and is enabled
automatically when `clang-tidy` is installed and found.
When enabling `clang-tidy`, several non-trivial issues were discovered
in compression-related code. These might be false positives but, until
a proper solution is found, warnings-as-errors has been disabled for
that code to allow compilation to succeed with the linter enabled.
Change our GUC names to use an `enable` prefix for all boolean GUCs,
similar to PostgreSQL GUC names.
This patch renames `disable_optimizations` to `enable_optimizations`
and `constraint_aware_append` to `enable_constraint_aware_append`, and
removes `optimize_non_hypertables`.
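A minimal usage sketch of the renamed settings, assuming they are set
with the usual `timescaledb.` prefix like other extension GUCs:

    -- Formerly disable_optimizations (boolean sense presumably inverted)
    SET timescaledb.enable_optimizations = on;
    -- Formerly constraint_aware_append
    SET timescaledb.enable_constraint_aware_append = on;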
This change updates the AppVeyor configuration for multinode-related
tests. These changes include, but are not limited to:
* Set `max_prepared_transactions` for 2PC.
* Add SSL/TLS configuration (although this is off for now due to
failing `loader` test when SSL is on).
* Update port settings since `add_data_node` outputs the port.
* Ignore `remote_connection` and `remote_txn` since they use a "node
killer" which does not work on Windows (SIGTERM not supported).
* Set timezone and datestyle.
If the extension is not available on the data node, a strange error
message will be displayed since the extension cannot be installed. This
commit checks for the availability of the extension before trying to
bootstrap the node and prints a more helpful informational message if
the extension is not available.
Related to issue #1702.
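A sketch of the kind of availability check that can be run against the
data node before bootstrapping, using the standard
`pg_available_extensions` view:

    -- If no row is returned, the extension cannot be installed on the
    -- node and the informational message is printed instead.
    SELECT default_version, installed_version
      FROM pg_available_extensions
     WHERE name = 'timescaledb';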
The current insert batching implementation depends on the number of
table columns and a batch size. It does not take into account the
maximum number of prepared statement arguments, which by default
can be exceeded with tables having a large number of columns.
This PR has two effects:
1) It automatically recalculates the insert batch size, instead of
using the fixed TUPLE_THRESHOLD value, if the expected total number
of prepared statement arguments would exceed the limit.
2) It fixes an integer overflow in INSERT statement deparsing if
the number of arguments is greater than 16k.
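A rough illustration of the recalculation, assuming for the sake of the
example that the relevant cap is the PostgreSQL wire protocol's limit
of 65535 bind parameters and a hypothetical 100-column table:

    -- Each row in a batched INSERT consumes one parameter per column,
    -- so the batch size must not exceed limit / columns (here 655 rows).
    SELECT 65535 / 100 AS max_rows_per_batch;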
Add test for grant propagation when attaching a data node to a table.
Function `data_node_attach` already calls
`hypertable_assign_data_nodes`, which assigns data nodes, so grants are
properly propagated to data nodes when they are attached.
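A usage sketch of the scenario being tested (table, role, and node
names are hypothetical):

    -- Grants given on the distributed hypertable...
    GRANT SELECT, INSERT ON conditions TO app_role;
    -- ...should also apply on a data node attached afterwards.
    SELECT attach_data_node('dn3', 'conditions');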
This change introduces a fix and adds support for serial columns
on distributed hypertables.
It fixes issue #1663.
A SERIAL type is essentially syntactic sugar that automatically
creates a SEQUENCE object and makes it dependent on the column. It
also sets a DEFAULT expression on the column that uses the sequence.
The idea behind the fix is to avoid using the default expression
when deparsing and recreating tables on data nodes when the column
has a dependent sequence object.
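For reference, a SERIAL column is roughly equivalent to the following
(names hypothetical); the fix skips recreating the DEFAULT expression
when the table is deparsed for the data nodes:

    CREATE SEQUENCE devices_id_seq;
    CREATE TABLE devices (
        id integer NOT NULL DEFAULT nextval('devices_id_seq'),
        time timestamptz NOT NULL
    );
    ALTER SEQUENCE devices_id_seq OWNED BY devices.id;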
User certificates and keys for logging into data nodes are stored at
the top level of the `ssl_dir` or in the data directory. This can cause
some confusion, since many files named after users, resembling existing
configuration files, will be created as users are added. This commit
therefore changes the location of the user certificates and keys to the
`timescaledb/certs` subdirectory of either the `ssl_dir` or the data
directory.
In addition, since user names can contain strange characters (quoted
names are allowed as role names and can contain anything), the commit
changes the names of certificates and keys to use the MD5 sum of the
user name, as a hex string, for the base name of the files. This
prevents strange user names from being used to access files outside the
certificate directory.
The subdirectory is currently hardcoded.
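A sketch of how the base file name can be derived from an arbitrary
role name (the exact file naming is an implementation detail):

    -- Quoted role names may contain slashes, dots, and quotes; the MD5
    -- hex digest cannot, so it cannot escape the certificate directory.
    SELECT md5('strange "role"/../name') AS cert_base_name;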
This change makes it possible to use the ALTER TABLE SET/RESET, SET
OIDS, and SET WITHOUT OIDS clauses with a distributed hypertable.
This PR has two effects:
1. It avoids having to copy storage options to foreign table chunks
when their objects are created on the access node. The command updates
only the root table options on the access node and passes the command
on for execution on the data nodes.
2. It prevents distributed hypertable chunks from being updated in the
'ddl_command_end' event trigger on the access node, because PostgreSQL
does not support altering storage options for foreign tables.
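A usage sketch on a distributed hypertable (table name hypothetical):

    ALTER TABLE conditions SET (fillfactor = 70);
    ALTER TABLE conditions RESET (fillfactor);
    ALTER TABLE conditions SET WITHOUT OIDS;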
This change makes it possible to deparse and include the main table's
storage options in the CREATE TABLE command that is executed during
the create_distributed_hypertable() call.
This change fixes an issue with port conversion in the `add_data_node`
command that results in an error when a port is not explicitly given
and PostgreSQL is configured to use a high port number. Note that this
issue does _not_ occur when the port number is given as an explicit
argument to `add_data_node`.
The underlying issue is that, without an explicit port number, the
remote port is assumed to be the same as the port configured for the
local server instance. The conversion of that port number was done
using a _signed_ two-byte integer, while the valid port range fits
within an _unsigned_ two-byte integer.
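An illustration of the ranges involved: valid port numbers go up to
65535, while a signed two-byte integer overflows above 32767:

    SELECT 32767 AS max_signed_smallint, 65535 AS max_port_number;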
To exercise higher port ranges without an explicit argument to
`add_data_node`, the default port for test instances has been changed
to a high port number that overflows a small signed integer.
This initial implementation makes it possible to deparse the LIMIT
clause and include it in the push-down query sent to the data nodes.
The current implementation is quite restrictive and allows LIMIT only
for simple queries without aggregates, or in conjunction with an
ORDER BY clause.
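A sketch of a query whose LIMIT can now be pushed down to the data
nodes (no aggregates; names hypothetical):

    SELECT time, device, temp
      FROM conditions
     ORDER BY time DESC
     LIMIT 100;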
Initial support for compression on distributed hypertables. This
_only_ includes the ability to run `compress_chunk` and
`decompress_chunk` on a distributed hypertable. There is no support
for automation, at least not beyond what one can do individually on
each data node.
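A usage sketch from the access node (names hypothetical), assuming
compression has been enabled on the distributed hypertable in the
usual way:

    ALTER TABLE conditions SET (timescaledb.compress);
    SELECT compress_chunk(chunk)
      FROM show_chunks('conditions', older_than => interval '7 days') AS chunk;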
Note that an access node keeps no local metadata about which
distributed hypertables have compressed chunks. This information needs
to be fetched directly from data nodes, although such functionality is
not yet implemented. For example, informational views on the access
nodes will not yet report the correct compression states for
distributed hypertables.
This change adds a new case to the data_node test that verifies that
attaching a data node to a hypertable on a data node will fail (as
hypertables are not marked as distributed on data nodes).
Implements the SQL function set_replication_factor, which changes the
replication factor of a distributed hypertable. The change of the
replication factor doesn't affect existing chunks. Newly created
chunks are replicated according to the new replication factor.
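A usage sketch (hypertable name hypothetical):

    -- New chunks will be replicated to two data nodes; existing chunks
    -- keep their current replication.
    SELECT set_replication_factor('conditions', 2);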
This change adds a new parameter to the detach_data_node and
delete_data_node functions that allows the user to automatically
shrink the space dimension of affected hypertables to match the
number of nodes.
Prior to this change, attempting to add a dimension to a distributed
hypertable that currently or previously contained data would fail
with an opaque error. This change properly checks distributed
hypertables when adding dimensions and prints appropriate errors.
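A sketch of the case that now reports a proper error (names
hypothetical):

    -- Fails with a clear error if the distributed hypertable currently
    -- or previously contained data.
    SELECT add_dimension('conditions', 'device', number_partitions => 3);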
It is not possible to properly reference another table from a
distributed hypertable since this would require replication of the
referenced table.
This commit adds a warning message when a distributed hypertable
attempts to reference any other table using a foreign key.
This fixes a couple of warnings about unused variables used for assert
checking that appear in release builds. The `PG_USED_FOR_ASSERTS_ONLY`
attribute has been applied to the variable declarations to quench the
warnings.
This release includes user experience improvements for managing data
nodes, more efficient statistics collection for distributed
hypertables, and miscellaneous fixes and improvements.
Previously, the memory context for the chunk insert state was freed
using a reset callback on the per-tuple context. This created an
unfortunate cyclic dependency between memory contexts, since both the
per-tuple context and chunk insert state shared the same memory
context parent (the query memory context).
Thus, when deletion happens by calling MemoryContextDelete on the
parent, without having deleted the children first, the parent could
first delete the chunk insert state child, followed by the per-tuple
context which then tried to delete the chunk insert state again.
A better way to handle this is to simply switch the parent of the
chunk insert state's memory context to be the per-tuple context as
long as it is still valid, thus breaking the cycle.
The problem lies in incorrect handling of the reset callback for the
ChunkInsertState during the transaction abort procedure, which frees
the parent memory context.
Because reset callbacks are executed only after all child memory
contexts have been deleted, it is possible to end up in a situation
where the context is already deleted before the callback function is
called.
Tests that run `ANALYZE` on distributed hypertables are susceptible to
non-deterministic behavior because `ANALYZE` does random sampling and
might have a different seed depending on the data node. This change
fixes the issue by running `setseed()` on all data node sessions before
`ANALYZE` is run.
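A sketch of what each data node session now does before statistics are
gathered (table name hypothetical):

    SELECT setseed(1);
    ANALYZE conditions;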
Unfortunately, while this makes the behavior on a specific machine
more deterministic, it doesn't produce the exact same statistics
across different machines and/or C-libraries since those might have
different PRNG implementations for `random()`.
When fetching remote column statistics (`pg_statistic`) from data
nodes, the `stanumbers` field was not turned into an array
correctly. This caused values to be corrupted when importing them to
the access node. This issue has been fixed along with some compiler
warning issues (e.g., mixed declaration and code).
This change adds a new command to return a subset of the column
stats for a hypertable (column width, percent null, and percent
distinct). As part of the execution of this command on an access
node, these stats will be collected for distributed chunks and
updated on the access node.
The `postgres` database might not exist on a data node, but
`template1` will always exist, so if a connection using `postgres`
fails, we use `template1` as a secondary database.
This is similar to how `connectMaintenanceDatabase` in the PostgreSQL
code base works.
Tests for queries on distributed hypertables are now consolidated in
the `dist_query` test. Not only does this test provide more consistent
EXPLAIN output, but it also runs all queries against different types
of tables holding the same data, including comparing the result output
with `diff`.
The different types of tables compared are:
- Regular table (for reference)
- One-dimensional distributed hypertable
- Two-dimensional distributed hypertable (which is partially
repartitioned)
EXPLAINs are provided on the two-dimensional table showing the effect
on plans when querying repartitioned time ranges. In most cases, full
push-down is not possible for such queries.
In addition to test refactoring, this change includes a fix for
handling `HAVING` clauses in remote partialized queries. Such clauses
should not be sent to the remote end in case of partial queries since
any aggregates in the `HAVING` clause must be returned in the result
target list. Fortunately, modification of the target list is already
taken care of by the main planner.
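A sketch of a partially pushed-down aggregate with a HAVING clause
(names hypothetical): the partial aggregates are computed on the data
nodes, while the HAVING qual is evaluated on the access node after
finalization:

    SELECT device, avg(temp)
      FROM conditions
     GROUP BY device
    HAVING avg(temp) > 20;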
This change makes the dist_query test PG version-specific in
preparation for test changes that will produce different output
between, e.g., PG11 and PG12.
The tuple expression context's memory is not properly reset during
chunk dispatch execution, which eventually consumes all available
memory during query execution:
INSERT INTO test_table
SELECT now() - random() * interval '2 years', (i/100)::text, random()
FROM
generate_series(1,700000) AS sub(i);
This problem does not reproduce for distributed hypertables with
batching disabled, nor for regular hypertables, because the tuple
expression context luckily gets freed during ModifyTable node
execution.
When printing paths in debug mode, the "considered" paths saved in the
private rel info is a superset of the paths in the rel's pathlist. The
result of this is that many paths are printed multiple times.
This change makes sure we only print the "diff", i.e., the pruned paths
that were considered but are no longer in the pathlist.
A number of other issues with the debug output have also been
addressed, like consistent function naming and being more robust when
printing rels that might not have `fdw_private` set.
In the case when there are no stats (number of tuples/pages), we can
use two approaches to estimate the relation size: interpolate the
relation size using stats from previous chunks (if they exist), or
estimate it using the shared buffer size (the shared buffer size
should align with the chunk size).
Before this commit, an existing extension would cause an error and
abort the addition of a data node if `bootstrap` was `true`. To allow
the extension to already exist on the data node, this commit will first
check if the extension exists on the data node. If the extension
exists, it will be validated; otherwise, the extension will be created
on the data node.
To better understand the choices that the planner makes, we need to
print all the paths (with costs) that the planner considered. Otherwise
it might be hard to understand why a certain path is not picked (e.g.,
due to high startup/total costs), since it will never show up in the
relation path list that we print. This should help while working on
improving the distributed cost model. This fix focuses only on paths
that involve data nodes.
If a table contains a column with the same name as the table name, the
query parser will get confused when parsing the `chunks_in` function.
The parser would think that we are passing in a column instead of a
table. Using a qualified table name fixes this problem. Note that we
needed to expand the table using .* in order to avoid parser confusion
caused by the schema.table syntax.
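A sketch of the deparsed form (names hypothetical), where the table
reference is schema-qualified and expanded with `.*` to avoid the
ambiguity:

    SELECT *
      FROM public.metrics
     WHERE _timescaledb_internal.chunks_in(public.metrics.*, ARRAY[1, 2]);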
Currently, if the major version of the extension on the access node is
later than the version of the extension on the data node, the data node
is accepted. Since major versions are not compatible, it should not be
accepted.
Changed the check to only accept the data node if:
- The major version is the same on the data node and the access node.
- The minor version on the data node is the same as or earlier than on
the access node.
In addition, the code will print a warning if the version on the data
node is older than the version on the access node.
This change includes telemetry fixes that extend HypertablesStat
with num_hypertables_compressed. It also updates how the number of
regular hypertables is calculated: a regular hypertable is now one that
is neither compressed nor related to continuous aggregates.
Previously, when a data node needed bootstrapping and the database to
be bootstrapped already existed, it was blindly assumed to be
configured correctly. With this commit, we validate the database if it
already exists before proceeding and raise an error if it is not
correctly configured.
When validating the data node and bootstrap is `true`, we are connected
to the `postgres` database rather than the database to validate.
This means that we cannot use `current_database()` and instead pass the
database name as a parameter to `data_node_validate_database`.
Before this commit, grants and revokes were not propagated to data
nodes. After this commit, grants and revokes on a distributed
hypertable are propagated to the data nodes of the hypertable.
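A sketch of the new behavior (names hypothetical):

    -- Now also executed on each data node of the distributed hypertable.
    GRANT SELECT, INSERT ON conditions TO app_role;
    REVOKE INSERT ON conditions FROM app_role;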
When creating a hypertable, grants were not propagated to the table on
the remote node, which causes later statements to fail when not
executed as the owner of the table.
This commit deparses grant statements from the table definition and
adds the grants to the deparsed statement sent when creating the table
on the data node.
Since a NULL value for the replication factor in SQL DDL now
corresponds to HYPERTABLE_REGULAR, which is different from
HYPERTABLE_DISTRIBUTED_MEMBER, there is no need to check for a non-NULL
value; comparing with HYPERTABLE_DISTRIBUTED_MEMBER is enough.
Test code in remote_exec fails to build due to a maybe-uninitialized
error on a string variable in the 32-bit Alpine package build. This fix
moves the initialization to the string variable's declaration and
refactors a loop to have a single place with the exit condition, which
checks for both a NULL value and an empty string.