Before this PR, only SELECTs would be optimized to exclude unneeded
chunks by our planner. This PR enables such optimizations on SELECTs
found within an INSERT as well. This should speed up commands of the
form
INSERT INTO <hypertable> (SELECT ... FROM <hypertable> WHERE ...)
We would like to enable this for all commands, but currently DELETE and
UPDATE cannot handle them, and cause errors when the optimizations are
enabled.
This commit also fixes an issue that would occur if we tried to exclude
chunks based on infinite time values.
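For example, a statement of the following shape (hypothetical table and column
names) can now have chunks of the inner SELECT excluded by the planner:

    -- Only chunks of "conditions" overlapping the requested time range are scanned.
    INSERT INTO conditions_archive
        SELECT * FROM conditions
        WHERE time >= '2017-01-01' AND time < '2017-02-01';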
TimescaleDB will want to run multiple background jobs. This PR
adds a simple scheduler so that jobs inserted into a jobs table
can be run on a schedule. This first implementation has two limitations:
1) The list of jobs to be run is read from the database when the scheduler
is first started. We do not update this list if the jobs table changes.
2) There is no prioritization for when to run jobs.
The design of the scheduler is as follows:
The scheduler itself is a background job that continuously runs and waits
for a time when jobs need to be scheduled. It then launches jobs as new
background workers that it controls through the background worker handle.
Aggregate statistics about a job are kept in the job_stat catalog table.
These statistics include the start and finish times of the last run of the job
as well as whether or not the job succeeded. The next_start is used to
figure out when next to run a job after the scheduler is restarted.
The statistics table also tracks consecutive failures and crashes for the job,
which are used to calculate the exponential backoff after a crash or failure
(the backoff in turn determines the next_start after the crash/failure). Note also
that there is a minimum delay between the db scheduler starting up and a crashed
job being restarted. This is to allow the operator enough time to disable the job
if needed.
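As an illustrative sketch only (the scheduler computes this internally in C and
the exact formula may differ), the backed-off next_start grows exponentially with
the number of consecutive failures:

    -- Hypothetical: base retry period of 5 minutes after 3 consecutive failures.
    SELECT now() + interval '5 minutes' * power(2, 3) AS next_start;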
Note that the number of crashes is an overestimate of the actual number of crashes
for a job. This is so that we are conservative and never miss a crash and fail to
use the appropriate backoff logic. Note that there is some complexity
in ensuring that all crashes are counted, since a crash in Postgres causes /all/
processes to SIGQUIT: we must commit changes to the stats table /before/ a job
starts so that, when the scheduler comes back up after a crash, we can deduce
that a job was started but not finished before the crash (meaning that it could
have been the crashing process).
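For reference, the per-job bookkeeping described above can be inspected with a
query along these lines (the catalog table and column names here are assumptions
about the schema, shown only to illustrate what is tracked):

    -- Illustrative: last run times, success flag, failure/crash counters,
    -- and the computed next_start for each job.
    SELECT job_id, last_start, last_finish, last_run_success,
           consecutive_failures, consecutive_crashes, next_start
    FROM _timescaledb_catalog.job_stat;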
SubspaceStore keeps a running count of the number of objects added to
it called `descendants`. This patch fixes that count, so that it always
keeps track of the number of objects sitting at the leaves of the
SubspaceStore. (The current version treats `descendants` as keeping
track of the number of leaves in some places, and the number of objects
sitting at the next level in others, resulting in the counter containing
neither.)
Also fixes undefined behavior (UB) in the dimension vector: memcpy cannot be used on overlapping memory.
Discovered a crash when doing an EXPLAIN ANALYZE on a CTE containing an INSERT that
did not reference the CTE in the SELECT statement. This is fixed by copying the eref
from the parent hypertable to the chunk so that columns can be described. The eref is
only copied from the hypertable if there is a range table entry available.
Also modify the copy code so it sets a dummy value as the RTE index rather than 1,
which is a valid RTE index even if there is no RTE for the hypertable.
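An example of the previously crashing shape (hypothetical hypertable and values),
where the data-modifying CTE is never referenced by the outer SELECT:

    -- Before this fix, EXPLAIN ANALYZE on a statement like this could crash
    -- because the chunk's columns could not be described.
    EXPLAIN ANALYZE
    WITH ins AS (
        INSERT INTO conditions VALUES ('2017-01-01 00:00', 'office', 70.0)
    )
    SELECT 1;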
The extension now works with PostgreSQL 10, while
retaining compatibility with version 9.6.
PostgreSQL 10 has numerous internal changes to functions and
APIs, which necessitates various glue code and compatibility
wrappers to seamlessly retain backwards compatibility with older
versions.
Test output might also differ between versions. In particular,
the psql client generates version-specific output with `\d` and
EXPLAINs might differ due to new query optimizations. The test
suite has been modified as follows to handle these issues. First,
tests now use version-independent functions to query system
catalogs instead of using `\d`. Second, changes have been made to
the test suite to be able to verify some test outputs against
version-dependent reference files.
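For example, a test can list a table's columns with a version-independent catalog
query like the following instead of `\d` (a sketch; the table name is hypothetical):

    -- Produces the same output on 9.6 and 10, unlike psql's \d.
    SELECT attname, format_type(atttypid, atttypmod) AS type
    FROM pg_attribute
    WHERE attrelid = 'public.conditions'::regclass
      AND attnum > 0 AND NOT attisdropped
    ORDER BY attnum;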
When inserting into a hypertable using a sub-select clause that
involves an aggregate, the insert fails with the error "Aggref found
in non-Agg plan node". This happens because the target list from the
aggregate sub-select expects an aggregate parent node and we simply
reuse the target list when modifying the insert plan with new
CustomScan plan nodes.
This change creates a modified target list to use with the CustomScan
node that avoids this issue.
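For example, a statement of this shape (hypothetical tables and columns) previously
failed with that error and now works:

    -- Aggregate sub-select feeding an INSERT into a hypertable.
    INSERT INTO conditions_summary
        SELECT time_bucket('1 hour', time) AS hour, device, avg(temperature)
        FROM conditions
        GROUP BY hour, device;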
With this change, hypertables no longer rely on an INSERT trigger to
dispatch tuples to chunks. While an INSERT trigger worked well for
both INSERTs and COPYs, it caused issues with supporting some regular
triggers on hypertables, and didn't support RETURNING statements and
upserts (ON CONFLICT DO UPDATE).
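For instance, an upsert with RETURNING on a hypertable now works (assuming,
hypothetically, a unique constraint on (time, device)):

    -- Upsert with RETURNING, which the trigger-based approach could not support.
    INSERT INTO conditions (time, device, temperature)
    VALUES ('2017-01-01 00:00', 'office', 70.0)
    ON CONFLICT (time, device)
    DO UPDATE SET temperature = EXCLUDED.temperature
    RETURNING *;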
INSERTs are now handled by modifying the plan for INSERT statements. A
custom plan node is inserted as a subplan to a ModifyTable plan node,
taking care of dispatching tuples to chunks by setting the result
table for every tuple scanned.
COPYs are handled by modifying the regular copy code. Unfortunately,
this required copying a significant amount of regular PostgreSQL
source code since there are no hooks to add modifications. However,
since the modifications are small it should be fairly easy to keep the
code in sync with upstream changes.
Previously the count returned by insert and copy was wrong because
the count was reset on every execute. But often there is an execute
in the middle of an insert (e.g., to create a chunk). This fixes the
logic to reset the count only at the start of the top-level statement.
Fixes #64
Clean up the table schema to get rid of legacy tables and functionality
that makes it more difficult to provide an upgrade path.
Notable changes:
* Get rid of legacy tables and code
* Simplify directory structure for SQL code
* Simplify table hierarchy: remove root table and make chunk tables inherit directly from main table
* Change chunk table suffix from _data to _chunk
* Simplify schema usage: _timescaledb_internal for internal functions, _timescaledb_catalog for metadata tables (see example below)
* Remove postgres_fdw dependency
* Improve code comments in SQL code
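For example, hypertable metadata is now read from the catalog schema (illustrative
query; the column names shown are assumptions):

    -- Metadata tables live in _timescaledb_catalog, internal functions in
    -- _timescaledb_internal.
    SELECT schema_name, table_name
    FROM _timescaledb_catalog.hypertable;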
Remove all murmur3-related source code. Alter regression tests
to reflect new hash values for inputs, and a slightly different
set of input data to ensure that sufficient chunks and partitions
are tested. Some changes were also made to .sh scripts in sql/setup
that seem to be used only to power the "unit tests", which I cannot
yet run successfully.
Previously, each test set their own (although mostly the same)
configuration for log output and error verbosity. This is now set
globally in the test runner so that tests only need to set these
configuration parameters if they need to override the defaults. The
log verbosity is also reduced so that errors aren't generated with the
line number of the source file that output the error. Line numbers in
the output can break tests when upgrading to a new PostgreSQL version
that outputs a different line number.
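A minimal sketch of the kind of defaults the runner might now set globally (the
exact settings are an assumption here, not taken from the runner itself):

    -- Assumed global defaults; individual tests override only when needed.
    SET client_min_messages TO warning;
    \set VERBOSITY terse  -- errors without source-file/line details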
If an error is generated in any of the insert triggers, the insert
state kept during a batch insert might be left in an undefined state,
breaking the next insert. This patch makes sure errors are captured in
the insert triggers so that the state can be cleaned up.
- Directory structure now matches common practices
- Regression tests now run with pg_regress via the PGXS infrastructure.
- Unit tests do not integrate well with pg_regress and have to be run
separately.
- Docker functionality is separate from main Makefile. Run with
`make -f docker.mk` to build and `make -f docker.mk run` to run
the database in a container.