2043 Commits

Erik Nordström
1dd9314f4d Improve linting support with clang-tidy
This change replaces the existing `clang-tidy` linter target with
CMake's built-in support for it. The old way of invoking the linter
relied on the `run-clang-tidy` wrapper script, which is not installed
by default on some platforms. Discovery of the `clang-tidy` tool has
also been improved to work with more installation locations.

As a result, linting now happens at compile time and is enabled
automatically when `clang-tidy` is installed and found.

In enabling `clang-tidy`, several non-trivial issues were discovered
in compression-related code. These might be false positives, but,
until a proper solution can be found, "warnings-as-errors" have been
disabled for that code to allow compilation to succeed with the linter
enabled.
2020-05-29 14:04:25 +02:00
Sven Klemm
1c1b3c856e Cleanup GUC names
Change our GUC names to use enable-prefix for all boolean GUCs
similar to postgres GUC names.

This patch renames disable_optimizations to enable_optimizations and
constraint_aware_append to enable_constraint_aware_append and removes
optimize_non_hypertables.
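
For illustration, the renamed GUCs are set like any other boolean GUC
(a sketch; session-level SET shown):

-- previously: SET timescaledb.disable_optimizations = 'on';
SET timescaledb.enable_optimizations = 'off';
SET timescaledb.enable_constraint_aware_append = 'on';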
2020-05-28 18:35:09 +02:00
Ruslan Fomkin
6bc4765f4d Remove regression tests on PG 9.6 and 10
The first step of removing support for PG 9.6 and 10 is to remove the
regression tests, which run against PostgreSQL versions 9.6 and 10.
2020-05-28 15:14:09 +02:00
Erik Nordström
14492cc562 Add AppVeyor configuration for multinode
This change updates the AppVeyor configuration for multinode-related
tests. These changes include, but are not limited to:

* Set `max_prepared_transactions` for 2PC.
* Add SSL/TLS configuration (although this is off for now due to
  failing `loader` test when SSL is on).
* Update port settings since `add_data_node` outputs port.
* Ignore `remote_connection` and `remote_txn` since they use a "node
  killer" which does not work on Windows (SIGTERM not supported).
* Set timezone and datestyle.
2020-05-27 17:31:09 +02:00
Mats Kindahl
c2744e13ad Show error message on unavailable extension
If the extension is not available on the data node, a strange error
message will be displayed since the extension cannot be installed. This
commit checks for the availability of the extension before trying to
bootstrap the node and prints a more helpful informational message if
the extension is not available.
2020-05-27 17:31:09 +02:00
Dmitry Simonenko
ac94947199 Fix insert batch size calculation for prepared statements
Related to issue #1702.

The current insert batching implementation depends on the number of
table columns and a batch size. It does not take into account the
maximum number of prepared statement arguments, which by default
can be exceeded with tables having a large number of columns.

This PR has two effects:

1) It automatically recalculates the insert batch size, instead of
using the fixed TUPLE_THRESHOLD value, if the expected total number
of prepared statement arguments would exceed the limit.

2) It fixes an integer overflow in INSERT statement deparsing when
the number of arguments is greater than 16k.
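
As a rough illustration (assuming the commonly cited limit of 65535
bind parameters per prepared statement; the column count is
hypothetical), the batch size is capped by the number of columns rather
than a fixed threshold:

-- e.g., a table with 800 columns can batch at most 65535 / 800 rows
SELECT 65535 / 800 AS max_rows_per_batch;  -- 81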
2020-05-27 17:31:09 +02:00
Mats Kindahl
f214b64b31 Add test for grant propagation
Add test for grant propagation when attaching a data node to a table.
Function `data_node_attach` already calls
`hypertable_assign_data_nodes`, which assigns data nodes, so grants are
properly propagated to data nodes when they are attached.
2020-05-27 17:31:09 +02:00
Dmitry Simonenko
5044f5d115 Allow SERIAL columns for distributed hypertables
This change fixes and adds support for SERIAL columns
on distributed hypertables.

It fixes issue #1663.

Basically, a SERIAL type is syntactic sugar which automatically
creates a SEQUENCE object and makes it dependent on
the column. It also sets a DEFAULT expression for using the sequence.

The idea behind the fix is to avoid using the default expression
when deparsing and recreating tables on data nodes if the
column has a dependent sequence object.
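
For example, a table like the following (names are illustrative) can
now become a distributed hypertable even though the `id` column has a
dependent sequence:

CREATE TABLE devices (
  id    SERIAL,
  time  TIMESTAMPTZ NOT NULL,
  value DOUBLE PRECISION
);
SELECT create_distributed_hypertable('devices', 'time');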
2020-05-27 17:31:09 +02:00
Mats Kindahl
7a93a2f805 Change location of user certificates and keys
User certificates and keys for logging into data nodes are stored at
the top level of the `ssl_dir` or in the data directory. This can cause
some confusion since, as users are added, a lot of files with user names
resembling existing configuration files will be created, so this commit
changes the location of the user certificates and keys to the
`timescaledb/certs` subdirectory of either the `ssl_dir` or the data
directory.

In addition, since user names can contain strange characters (quoted
names are allowed as role names, which can contain anything), the commit
changes the names for certificates and keys to use the hex-encoded MD5
sum of the user name as the base name for the files. This prevents
strange user names from accessing files outside the certificate
directory.

The subdirectory is currently hardcoded.
2020-05-27 17:31:09 +02:00
Dmitry Simonenko
7b5275e540 Allow ALTER TABLE SET on distributed hypertable
This change makes it possible to use the ALTER TABLE SET/RESET, SET
OIDS and SET WITHOUT OIDS clauses with a distributed hypertable.

This PR has two effects:

1. It avoids having to copy storage options to foreign table chunks
when their objects are created on the AN. The command updates only the
root table options on the AN and passes the command on for execution on
the data nodes.

2. It prevents distributed hypertable chunks from being updated in the
'ddl_command_end' event trigger on the AN, because PostgreSQL does not
support altering storage options for foreign tables.
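
As an illustrative sketch (table name hypothetical), a command like the
following now updates the root table on the AN and is forwarded to the
data nodes instead of being applied to foreign table chunks:

ALTER TABLE measurements SET (fillfactor = 70);
ALTER TABLE measurements RESET (fillfactor);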
2020-05-27 17:31:09 +02:00
Dmitry Simonenko
9b4aae813f Support storage options for distributed hypertables
This change deparses the main table's storage options and includes
them in the CREATE TABLE command which is executed during
the create_distributed_hypertable() call.
2020-05-27 17:31:09 +02:00
Erik Nordström
26c6e156d7 Fix port conversion issue in add_data_node
This change fixes an issue with port conversion in the `add_data_node`
command that results in an error when a port is not explicitly given
and PostgreSQL is configured to use a high port number. Note that this
issue does _not_ occur when the port number is given as an explicit
argument to `add_data_node`.

The underlying issue is that, without an explicit port number, the
remote port is assumed to be the same as the port configured for the
local server instance. The conversion of that port number was done
using a _signed_ two-byte integer, while the valid port range fits
within an _unsigned_ two-byte integer.

To test higher port ranges without an explicit argument to
`add_data_node`, the default port for test instances has been updated
to a high port number to test integer range overflow for small signed
integers.
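
The overflow is easy to illustrate in SQL, where `smallint` is the
signed two-byte integer type:

SELECT 32767::smallint;  -- fits in a signed two-byte integer
SELECT 49152::smallint;  -- fails: a valid port, but out of smallint range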
2020-05-27 17:31:09 +02:00
Dmitry Simonenko
0f2d7251cf Basic LIMIT push down support
This initial implementation makes it possible to deparse the LIMIT
clause and include it in the pushed-down query sent to the data nodes.

The current implementation is quite restrictive and allows LIMIT
push-down only for simple queries without aggregates or an ORDER BY
clause.
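
A query of roughly this shape (illustrative table and column names) can
now have its LIMIT included in the query sent to the data nodes:

SELECT time, device_id, temp
FROM measurements
WHERE time > now() - interval '1 day'
LIMIT 100;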
2020-05-27 17:31:09 +02:00
Ruslan Fomkin
effdc478ae Check replication factor for exceeding data nodes
set_replication_factor now checks if the replication factor is bigger than the number of
attached data nodes and returns an error in that case.
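
For example (a sketch, assuming only two data nodes are attached to the
hypertable), the following now raises an error:

SELECT set_replication_factor('measurements', 3);
-- errors, since 3 exceeds the number of attached data nodes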
2020-05-27 17:31:09 +02:00
Erik Nordström
686860ea23 Support compression on distributed hypertables
Initial support for compression on distributed hypertables. This
_only_ includes the ability to run `compress_chunk` and
`decompress_chunk` on a distributed hypertable. There is no support
for automation, at least not beyond what one can do individually on
each data node.

Note that an access node keeps no local metadata about which
distributed hypertables have compressed chunks. This information needs
to be fetched directly from data nodes, although such functionality is
not yet implemented. For example, informational views on the access
nodes will not yet report the correct compression states for
distributed hypertables.
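
As a usage sketch (table name illustrative), chunks of a distributed
hypertable can now be compressed and decompressed explicitly after
enabling compression on the table:

ALTER TABLE measurements SET (timescaledb.compress);
SELECT compress_chunk(chunk) FROM show_chunks('measurements') AS chunk;
SELECT decompress_chunk(chunk) FROM show_chunks('measurements') AS chunk;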
2020-05-27 17:31:09 +02:00
Brian Rowe
bf343d7718 Add a test to verify attach_node behavior
This change adds a new case to the data_node test that verifies that
attaching a data node to a hypertable on a data node will fail (as
hypertables are not marked as distributed on data nodes).
2020-05-27 17:31:09 +02:00
Ruslan Fomkin
c44a202576 Implement altering replication factor
Implements SQL function set_replication_factor, which changes
replication factor of a distributed hypertable. The change of the
replication factor doesn't affect existing chunks. Newly created
chunks are replicated according to the new replication factor.
2020-05-27 17:31:09 +02:00
Brian Rowe
d49e9a5739 Add repartition option on detach/delete_data_node
This change adds a new parameter to the detach_data_node and
delete_data_node functions that will allow the user to automatically
shrink their space dimension to match the number of nodes.
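
A sketch of the new option (node, table, and values are illustrative):

SELECT detach_data_node('dn3', 'measurements', repartition => true);
SELECT delete_data_node('dn3', repartition => true);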
2020-05-27 17:31:09 +02:00
Erik Nordström
32f3d17cde Rename hypertable_distributed test
The `hypertable_distributed` test is now renamed to `dist_hypertable`
for consistency with other distributed tests that have the `dist_`
prefix.
2020-05-27 17:31:09 +02:00
Brian Rowe
0017208368 Test dimension add on distributed hypertables
Prior to this change, attempting to add a dimension to a distributed
hypertable which currently or previously contained data would fail
with an opaque error. This change will properly test distributed
hypertables when adding dimensions and will print appropriate errors.
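
For example, adding a space dimension (illustrative names) is now
properly checked and reports a clear error if the distributed
hypertable contains data:

SELECT add_dimension('measurements', 'device_id', number_partitions => 3);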
2020-05-27 17:31:09 +02:00
Mats Kindahl
8d28fad66d Error on reference from distributed hypertable
It is not possible to properly reference another table from a
distributed hypertable since this would require replication of the
referenced table.

This commit adds a warning message when a distributed hypertable
attempts to reference another table using a foreign key.
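
For instance, a definition of roughly this shape (names illustrative)
now triggers the new message because the distributed hypertable
references another table:

CREATE TABLE measurements (
  time      TIMESTAMPTZ NOT NULL,
  device_id INTEGER REFERENCES devices (id),
  temp      DOUBLE PRECISION
);
SELECT create_distributed_hypertable('measurements', 'time');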
2020-05-27 17:31:09 +02:00
Erik Nordström
32bdf64205 Fix compiler warning in release builds
This fixes a couple of warnings about unused variables used for assert
checking that appear in release builds. The `PG_USED_FOR_ASSERTS_ONLY`
attribute has been applied to the variable declarations to quench the
warnings.
2020-05-27 17:31:09 +02:00
Erik Nordström
f20ad8231d Release 2.0.0-beta4
This release includes user experience improvements for managing data
nodes, more efficient statistics collection for distributed
hypertables, and miscellaneous fixes and improvements.
2.0.0-beta4
2020-05-27 17:31:09 +02:00
Erik Nordström
55803125f3 Better handling of chunk insert state destruction
Previously, the memory context for the chunk insert state was freed
using a reset callback on the per-tuple context. This created an
unfortunate cyclic dependency between memory contexts, since both the
per-tuple context and chunk insert state shared the same memory
context parent (the query memory context).

Thus, when deletion happens by calling MemoryContextDelete on the
parent, without having deleted the children first, the parent could
first delete the chunk insert state child, followed by the per-tuple
context which then tried to delete the chunk insert state again.

A better way to handle this is to simply switch the parent of the
chunk insert state's memory context to be the per-tuple context as
long as it is still valid, thus breaking the cycle.
2020-05-27 17:31:09 +02:00
Dmitry Simonenko
2cec573213 Fix crash when cancelling long distributed insert
The problem lies in incorrect handling of the reset callback for the
ChunkInsertState during the transaction abort procedure, which frees
the parent memory context.

Because reset callbacks are executed only after all child memory
contexts have been deleted, it is possible to end up in the situation
where the context is already deleted before this callback function is
called.
2020-05-27 17:31:09 +02:00
Erik Nordström
150041566c Fix non-determinism in distributed query tests
Tests that run `ANALYZE` on distributed hypertables are susceptible to
non-deterministic behavior because `ANALYZE` does random sampling and
might have a different seed depending on data node. This change fixes
this issue by running `setseed()` on all data node sessions before
`ANALYZE` is run.

Unfortunately, while this makes the behavior on a specific machine
more deterministic, it doesn't produce the exact same statistics
across different machines and/or C-libraries since those might have
different PRNG implementations for `random()`.
2020-05-27 17:31:09 +02:00
Erik Nordström
8887f26baf Fix array construction issue for remote colstats
When fetching remote column statistics (`pg_statistic`) from data
nodes, the `stanumbers` field was not turned into an array
correctly. This caused values to be corrupted when importing them to
the access node. This issue has been fixed along with some compiler
warning issues (e.g., mixed declaration and code).
2020-05-27 17:31:09 +02:00
Brian Rowe
fad33fe954 Collect column stats for distributed tables.
This change adds a new command to return a subset of the column
stats for a hypertable (column width, percent null, and percent
distinct).  As part of the execution of this command on an access
node, these stats will be collected for distributed chunks and
updated on the access node.
2020-05-27 17:31:09 +02:00
Mats Kindahl
222bf75910 Use template1 as secondary connection database
The `postgres` database might not exist on a data node, but
`template1` will always exist, so if a connection using `postgres`
fails, we use `template1` as a secondary database.

This is similar to how `connectMaintenanceDatabase` in the PostgreSQL
code base works.
2020-05-27 17:31:09 +02:00
Erik Nordström
7a25d4bfb3 Fix mixed declaration and code warning
This change fixes a "mixed declaration and code" warning in the remote
chunk estimation code.
2020-05-27 17:31:09 +02:00
Erik Nordström
f747e9df8b Remove the partitionwise_distributed test
The partitionwise_distributed test is now superseded by dist_query,
which is a much cleaner and better test for the same things.
2020-05-27 17:31:09 +02:00
Erik Nordström
597d04a77a Refactor distributed query tests
Tests for queries on distributed hypertables are now consolidated in
the `dist_query` test. Not only does this test provide more consistent
EXPLAIN output, but it also runs all queries against different types
of tables holding the same data, including comparing the result output
with `diff`.

The different types of tables compared are:

- Regular table (for reference)
- One-dimensional distributed hypertable
- Two-dimensional distributed hypertable (which is partially
  repartitioned)

EXPLAINs are provided on the two-dimensional table showing the effect
on plans when querying repartitioned time ranges. In most cases, FULL
push-down is not possible.

In addition to test refactoring, this change includes a fix for
handling `HAVING` clauses in remote partialized queries. Such clauses
should not be sent to the remote end in case of partial queries since
any aggregates in the `HAVING` clause must be returned in the result
target list. Fortunately, modification of the target list is already
taken care of by the main planner.
2020-05-27 17:31:09 +02:00
Erik Nordström
88d59735f9 Make dist_query test PG version specific
This change makes the dist_query test PG version-specific in
preparation for test changes that will produce different output
between, e.g., PG11 and PG12.
2020-05-27 17:31:09 +02:00
Dmitry Simonenko
6f5da9b5eb Fix memory leak during long distributed insert
The tuple expression context's memory context is not properly
reset during chunk dispatch execution, which eventually
consumes all available memory during query execution:

INSERT INTO test_table
  SELECT now() - random() * interval '2 years', (i/100)::text, random()
FROM
  generate_series(1,700000) AS sub(i);

This problem does not reproduce for distributed hypertables
with batching disabled or for regular hypertables,
because luckily the tuple expression context gets freed during
ModifyTable node execution.
2020-05-27 17:31:09 +02:00
Dmitry Simonenko
71e2c35d48 Run distributed VACUUM/ANALYZE without FDW API
Run VACUUM/ANALYZE and automatically import the updated stats
using the distributed DDL functionality instead of FDW
analyze wrappers.
2020-05-27 17:31:09 +02:00
Erik Nordström
59b35db9c9 Print only pruned paths in debug output
When printing paths in debug mode, the "considered" paths saved in the
private rel info are a superset of the paths in the rel's pathlist. The
result is that many paths are printed multiple times.

This change makes sure we only print the "diff", i.e, the pruned paths
that were considered but are no longer in the pathlist.

A number of other issues with the debug output have also been
addressed, like consistent function naming and being more robust when
printing rels that might not have `fdw_private` set.
2020-05-27 17:31:09 +02:00
niksa
c60cabd768 Improve relation size estimate
In the case when there are no stats (number of tuples/pages), we
can use two approaches to estimate relation size: interpolate the
relation size using stats from previous chunks (if they exist),
or estimate using the shared buffer size (the shared buffer size
should align with the chunk size).
2020-05-27 17:31:09 +02:00
Mats Kindahl
29ce1510a5 Allow extension on data node
Before this commit, an existing extension would cause an error and
abort the addition of a data node if `bootstrap` was `true`. To allow
the extension to already exist on the data node, this commit first
checks if the extension exists on the data node. If the extension
exists, it will be validated; otherwise, the extension will be created
on the data node.
2020-05-27 17:31:09 +02:00
niksa
66255eb5cb Improve planner debug output
To better understand the choices that the planner makes, we need to
print all the paths (with costs) that the planner considered. Otherwise
it might be hard to understand why a certain path is not picked (e.g.,
due to high startup/total costs) since it will never show up in the
relation path list that we print. This should help while working on
improving the distributed cost model. This fix focuses only on paths
that involve data nodes.
2020-05-27 17:31:09 +02:00
niksa
3bd1f914f1 Use qualified table name for chunks_in
If a table contains a column with the same name as the table, then the
query parser will get confused when parsing the `chunks_in` function.
The parser would think that we are passing in a column instead of a
table. Using a qualified table name fixes this problem. Note that we
needed to expand the table using `.*` in order to avoid parser
confusion caused by the schema.table syntax.
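
Roughly, the generated query now qualifies the table name and expands
it with `.*` (a sketch of the shape, not the exact generated SQL):

SELECT time, device, value
FROM public.metrics
WHERE _timescaledb_internal.chunks_in(public.metrics.*, ARRAY[1, 2]);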
2020-05-27 17:31:09 +02:00
Mats Kindahl
267a13ec98 Fix data node extension version check
Currently, if the major version of the extension on the access node is
later than the version of the extension on the data node, the data node
is accepted. Since major versions are not compatible, it should not be
accepted.

Changed the check to only accept the data node if:
- The major version is the same on the data node and the access node.
- The minor version on the data node is the same as or earlier than the
  access node's.

In addition, the code will print a warning if the version on the data
node is older than the version on the access node.
2020-05-27 17:31:09 +02:00
Dmitry Simonenko
11ef10332e Add number of compressed hypertables to stat
This change includes telemetry fixes that extend HypertablesStat
with num_hypertables_compressed. It also updates the way the
number of regular hypertables is calculated; they are now treated as
non-compressed and not related to continuous aggregates.
2020-05-27 17:31:09 +02:00
Mats Kindahl
d5f5d92790 Refactor to add database representation
Adding a `Database` structure to keep track of database name,
collation, encoding, and character type.
2020-05-27 17:31:09 +02:00
niksa
aa327518d6 Row-by-row fetcher hardening
Fix dangling pointers when closing async response results. Remove
unnecessary data fetch call.
2020-05-27 17:31:09 +02:00
Mats Kindahl
fdc7138bda Validate database on data node
When a data node needs bootstrapping and the database to be
bootstrapped already exists, it was blindly assumed to be configured
correctly. With this commit, we validate the database if it already
existed before proceeding and raise an error if it is not correctly
configured.

When validating the data node and bootstrap is `true`, we are connected
to the `postgres` database rather than the database to validate.
This means that we cannot use `current_database()` and instead pass the
database name as a parameter to `data_node_validate_database`.
2020-05-27 17:31:09 +02:00
Mats Kindahl
c14948ad98 Propagate grants to data nodes
Before this commit, grants and revokes were not propagated to data
nodes. After this commit, grants and revokes on a distributed hypertable
are propagated to the data nodes of the hypertable.
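
For example (role and table names illustrative), statements like the
following now take effect on the data nodes as well:

GRANT SELECT, INSERT ON measurements TO readwrite_user;
REVOKE INSERT ON measurements FROM readwrite_user;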
2020-05-27 17:31:09 +02:00
Mats Kindahl
96dd266a0b Propagate grants when creating hypertables
When creating a hypertable, grants were not propagated to the table on
the remote node, which caused later statements to fail when not
executed as the owner of the table.

This commit deparses grant statements from the table definition and
adds the grants to the deparsed statement sent when creating the table
on the data node.
2020-05-27 17:31:09 +02:00
Ruslan Fomkin
ef823a3060 Remove unnecessary check from distributed DDL
Since a NULL value for the replication factor in SQL DDL now
corresponds to HYPERTABLE_REGULAR, which is different from
HYPERTABLE_DISTRIBUTED_MEMBER, there is no need to check for a non-NULL
value; comparing with HYPERTABLE_DISTRIBUTED_MEMBER is enough.
2020-05-27 17:31:09 +02:00
Ruslan Fomkin
6aec69f9c4 Rename exported test functions to follow convention
Rename exported functions used in distributed tests to follow the
convention of the ts_ prefix, which was recently enforced in
non-distributed tests.
2020-05-27 17:31:09 +02:00
Ruslan Fomkin
78a5ba5bf2 Fix uninitialized warning in test help function
Test code in remote_exec fails to build due to a maybe-uninitialized
error on a string variable in the 32-bit Alpine package build. This fix
moves the initialization to the string variable declaration and
refactors a loop to have a single exit condition, which checks for both
a NULL value and an empty string.
2020-05-27 17:31:09 +02:00