50 Commits

Brian Rowe
79fb46456f Rename server to data node
The timescale clustering code so far has been written referring to the
remote databases as 'servers'.  This terminology is a bit overloaded,
and in particular we don't enforce any network topology limitations
that the term 'server' would suggest.  In light of this, we've decided
to use the term 'node' when referring to the different
databases in a distributed database.  Specifically, we refer to the
frontend as an 'access node' and to the backends as 'data nodes',
though we may omit the access or data qualifier where it's unambiguous.

As the vast bulk of the code so far has been written for the case where
there was a single access node, almost all instances of 'server' were
references to data nodes.  This change has updated the code to rename
those instances.
2020-05-27 17:31:09 +02:00
niksa
2fd99c6f4b Block new chunks on data nodes
This functionality enables users to block or allow creation of new
chunks on a data node for one or more hypertables. Use cases for this
include the ability to block new chunks when a data node is running
low on disk space or to affect chunk distribution across data nodes.

Sometimes blocking data nodes for new chunks can make a hypertable
under-replicated. For that case an additional argument `force => true`
can be supplied to force blocking new chunks.

Here are some examples.

Block for a specific hypertable:
`SELECT * FROM block_new_chunks_on_server('server_1', 'disttable');`

Block for all hypertables on the server:
`SELECT * FROM block_new_chunks_on_server('server_1', force => true);`

Unblock:
`SELECT * FROM allow_new_chunks_on_server('server_1', true);`

This change adds the `force` argument to `detach_server` as well.  If
detaching or blocking new chunks will make a hypertable
under-replicated, then `force => true` needs to be used.
2020-05-27 17:31:09 +02:00
Matvey Arye
e7ba327f4c Add resolve and heal infrastructure for 2PC
This commit adds the ability to resolve whether or not 2PC
transactions have been committed or aborted and also adds a heal
function to resolve transactions that have been prepared but not
committed or rolled back.

This commit also removes the server id from the primary key on the
remote_txn table and adds another index. This was done because
`remote_txn_persistent_record_exists` should not rely on the server
being contacted but should rather just check for the existence of the
id. This makes the resolution safe in setups where two frontend server
definitions point to the same database. While this may not be a
properly configured setup, it's better if the resolution process is
robust to this case.
2020-05-27 17:31:09 +02:00
Matvey Arye
0e109d209d Add tables for saving 2pc persistent records
The remote_txn table records commit decisions for 2pc transactions.
A successful 2pc transaction will have one row per remote connection
recorded in this table. In effect it is a mapping between the
distributed transaction and an identifier for each remote connection.

The records are needed to protect against crashes after a
frontend sends a `COMMIT TRANSACTION` to one node
but not all nodes involved in the transaction. Towards this end,
the commit of remote_txn rows represents a crash-safe, irrevocable
promise that all participating datanodes will eventually get a `COMMIT
TRANSACTION`, and it occurs before any datanode gets a `COMMIT TRANSACTION`.

The irrevocable nature of the commit of these records means that this
can only happen after the system is sure all participating transactions
will succeed. Thus it can only happen after all datanodes have succeeded
on a `PREPARE TRANSACTION` and will happen as part of the frontend's
transaction commit.
2020-05-27 17:31:09 +02:00
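
As a sketch of the ordering described above, using standard PostgreSQL
two-phase commit commands (the transaction identifier shown is illustrative):
1) On each data node: `PREPARE TRANSACTION 'ts-dist-txn-1';`
2) On the frontend: insert one remote_txn row per remote connection and
   commit the frontend's local transaction (the irrevocable commit point).
3) On each data node: `COMMIT PREPARED 'ts-dist-txn-1';`, or
   `ROLLBACK PREPARED 'ts-dist-txn-1';` if step 2 never committed.
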
Matvey Arye
d2b4b6e22e Add remote transaction ID module
The remote transaction ID is used in two-phase commit. It is the
identifier sent to the datanodes in `PREPARE TRANSACTION` and related
PostgreSQL commands.

This is the first in a series of commits for adding two-phase
commit support to our distributed txn infrastructure.
2020-05-27 17:31:09 +02:00
Erik Nordström
596be8cda1 Add mappings table for remote chunks
A frontend node will now maintain mappings from a local chunk to the
corresponding remote chunks in a `chunk_server` table.

The frontend creates local chunks as foreign tables and adds entries
to `chunk_server` for each chunk it creates on a remote data node.

Currently, the creation of remote chunks is not implemented, so a
dummy chunk_id for the remote chunk will be added instead for testing
purposes.
2020-05-27 17:31:09 +02:00
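
As an illustration, looking up the remote chunks that back a local chunk
could be a query along these lines (the column names are assumptions; only
the `chunk_server` table name comes from the commit):
`SELECT server_name, server_chunk_id FROM _timescaledb_catalog.chunk_server WHERE chunk_id = 10;`
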
Erik Nordström
ece582d458 Add mappings table for remote hypertables
In a multi-node (clustering) setup, TimescaleDB needs to track which
remote servers have data for a particular distributed hypertable. It
also needs to know which servers to place new chunks on and to use in
queries against a distributed hypertable.

A new metadata table, `hypertable_server`, is added to map a local
hypertable ID to a hypertable ID on a remote server. We require that
the remote hypertable has the same schema and name as the local
hypertable.

When a local server is removed (using `DROP SERVER` or our
`delete_server()`), all remote hypertable mappings for that server
should also be removed.
2020-05-27 17:31:09 +02:00
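
A minimal sketch of such a mapping table (the exact column definitions are
assumptions beyond what the commit states):
`CREATE TABLE _timescaledb_catalog.hypertable_server (hypertable_id integer NOT NULL, server_hypertable_id integer, server_name name NOT NULL);`
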
Sven Klemm
cbda1acd4f Record cagg view state in catalog
Record materialized_only state of continuous aggregate view in
catalog and show state in timescaledb_information.continuous_aggregates.
2020-04-14 06:57:33 +02:00
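
For example, the recorded state could be inspected with a query like the
following (the `view_name` column is assumed for illustration):
`SELECT view_name, materialized_only FROM timescaledb_information.continuous_aggregates;`
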
Matvey Arye
2c594ec6f9 Keep catalog rows for some dropped chunks
If a chunk is dropped but it has a continuous aggregate that is
not dropped, we want to preserve the chunk catalog row instead of
deleting the row. This is to prevent dangling identifiers in the
materialization hypertable. It also preserves the dimension slice
and chunk constraint rows for the chunk, since those will be necessary
when enabling this with multinode and are also needed to recreate the
chunk. The postgres objects associated with the chunk are all
dropped (table, constraints, indexes).

If data is ever reinserted to the same data region, the chunk is
recreated with the same dimension definitions as before. The postgres
objects are simply recreated.
2019-12-30 09:10:44 -05:00
Matvey Arye
5eb047413b Allow drop_chunks while keeping continuous aggs
Allow dropping raw chunks of the raw hypertable while keeping
the continuous aggregate. This allows for downsampling data
and allows users to save on TCO. We only allow dropping
such data when the dropped data is older than the
`ignore_invalidation_older_than` parameter on all the associated
continuous aggs. This ensures that any modifications to the
dropped region of data are never reflected
in the continuous agg, and thus avoids semantic ambiguity
if chunks are dropped but later recreated due to an
insert.

Before we drop a chunk, we need to make sure to process any
continuous aggregate invalidations that were registered on
data inside the chunk. Thus we add options to materialization
to perform materialization transactionally, to only process
invalidations, and to process invalidations only before a timestamp.

We fix drop_chunks and the policy to properly process
`cascade_to_materialization` as a tri-state variable (unknown,
true, false). Existing policy rows should change false to NULL
(unknown), while true stays as true since it was explicitly set.
Remove the form data for bgw_policy_drop_chunk because there
is no good way to represent the tri-state variable in the
form data.

When dropping chunks with cascade_to_materialization = false, all
invalidations on the chunks are processed before dropping the chunk.
If we are so far behind that even the completion threshold is inside
the chunks being dropped, we error. There are 2 reasons that we error:
1) We can't safely process new ranges transactionally without taking
   heavyweight locks and potentially locking the entire system
2) If the completion threshold is that far behind, the system probably has
   some serious issues anyway.
2019-12-30 09:10:44 -05:00
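
A sketch of such a call (the hypertable, interval, and exact parameter
spelling are illustrative; the commit messages use both singular and plural
forms of the parameter name):
`SELECT drop_chunks(INTERVAL '3 months', 'conditions', cascade_to_materializations => false);`
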
Matvey Arye
08ad7b6612 Add ignore_invalidation_older_than to continuous aggs
We added a timescaledb.ignore_invalidation_older_than parameter for
continuous aggregates. This parameter accepts a time interval (e.g. 1
month). If set, it limits the amount of time for which to process
invalidations. Thus, if
	timescaledb.ignore_invalidation_older_than = '1 month'
then any modifications for data older than 1 month from the current
timestamp at insert time will not cause updates to the continuous
aggregate. This limits the amount of work that a backfill can trigger.
This parameter must be >= 0. A value of 0 means that invalidations are
never processed.

When recording invalidations for the hypertable at insert time, we use
the maximum ignore_invalidation_older_than of any continuous agg attached
to the hypertable as a cutoff for whether to record the invalidation
at all. When materializing a particular continuous agg, we use that
agg's ignore_invalidation_older_than cutoff. However, we apply
that cutoff relative to the insert time, not the materialization
time, to make it easier for users to reason about. Therefore,
we record the insert time as part of the invalidation entry.
2019-12-04 15:47:03 -05:00
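
A sketch of supplying the parameter when defining a continuous aggregate
(the view, hypertable, and column names are illustrative; the `WITH` options
form follows the continuous aggregate syntax of this era):
`CREATE VIEW conditions_daily WITH (timescaledb.continuous, timescaledb.ignore_invalidation_older_than = '1 month') AS SELECT time_bucket('1 day', time) AS day, avg(temp) FROM conditions GROUP BY day;`
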
Matvey Arye
122856c1bd Fix update scripts for type functions
Type functions have to be CREATE OR REPLACED on every update
since they need to point to the correct .so. Thus,
split the type definitions into a pre, functions,
and post part, and rerun the functions part both on
pre_install and on every update.
2019-11-11 17:10:13 -05:00
Matvey Arye
0f3e74215a Split segment meta min_max into two columns
This simplifies the code and the access to the min/max
metadata. Before we used a custom type, but now the min/max
are just the same type as the underlying column and stored as two
columns.

This also removes the custom type that was used before.
2019-10-29 19:02:58 -04:00
Matvey Arye
0db50e7ffc Handle drops of compressed chunks/hypertables
This commit adds handling for dropping of chunks and hypertables
in the presence of associated compressed objects. If the uncompressed
chunk/hypertable is dropped, then drop the associated compressed object
using DROP_RESTRICT unless cascading is explicitly enabled.

Also add a compressed_chunk_id index on compressed tables for
figuring out whether a chunk is compressed or not.

Change a bunch of APIs to use DropBehavior instead of a cascade bool
to be more explicit.

Also test the drop chunks policy.
2019-10-29 19:02:58 -04:00
gayyappan
6e60d2614c Add compress chunks policy support
Add and drop compress chunks policy using bgw
infrastructure.
2019-10-29 19:02:58 -04:00
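
The commit does not spell out the user-facing functions, so the names and
arguments below are assumptions in the style of the other bgw policies:
`SELECT add_compress_chunks_policy('conditions', INTERVAL '7 days');`
`SELECT remove_compress_chunks_policy('conditions');`
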
Matvey Arye
b9674600ae Add segment meta min/max
Add the type for min/max segment meta object. Segment metadata
objects keep metadata about data in segments (compressed rows).
The min/max variant keeps the min and max values inside the compressed
object. It will be used on compression order by columns to allow
queries that have quals on those columns to be able to exclude entire
segments if no uncompressed rows in the segment may match the qual.

We also add generalized infrastructure for datum serialization
/ deserialization for arbitrary types to and from memory as well
as binary strings.
2019-10-29 19:02:58 -04:00
Matvey Arye
a078781c2e Add decompress_chunk function
This is the inverse of compress_chunk.
2019-10-29 19:02:58 -04:00
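
For example (the chunk name is illustrative):
`SELECT decompress_chunk('_timescaledb_internal._hyper_1_2_chunk');`
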
gayyappan
1f4689eca9 Record chunk sizes after compression
Compute chunk size before/after compressing a chunk and record in
catalog table.
2019-10-29 19:02:58 -04:00
gayyappan
44941f7bd2 Add UI for compress_chunks functionality
Add support for compress_chunks function.

This also adds support for compress_orderby and compress_segmentby
parameters in ALTER TABLE. These parameters are used by the
compress_chunks function.

The parsing code will most likely be changed to use the PG raw_parser
function.
2019-10-29 19:02:58 -04:00
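
A sketch of the workflow described above (the table, column, and chunk names
are illustrative, and the `timescaledb.`-prefixed option spellings are
assumptions based on the parameter names in the commit):
`ALTER TABLE conditions SET (timescaledb.compress, timescaledb.compress_segmentby = 'device_id', timescaledb.compress_orderby = 'time DESC');`
`SELECT compress_chunk('_timescaledb_internal._hyper_1_2_chunk');`
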
gayyappan
1c6aacc374 Add ability to create the compressed hypertable
This happens when compression is turned on for regular hypertables.
2019-10-29 19:02:58 -04:00
Joshua Lockerman
584f5d1061 Implement time-series compression algorithms
This commit introduces 4 compression algorithms
as well as 3 ADTs to support them. The compression
algorithms are time-series optimized. The following
algorithms are implemented:

- DeltaDelta compresses integer and timestamp values
- Gorilla compresses floats
- Dictionary compression handles any data type
  and is optimized for low-cardinality datasets.
- Array stores any data type in an array-like
  structure and does not actually compress it (though
  TOAST-based compression can be applied on top).

These compression algorithms are fully described in
tsl/src/compression/README.md.

The Abstract Data Types that are implemented are
- Vector - A dynamic vector that can store any type.
- BitArray - A dynamic vector to store bits.
- SimpleHash - A hash table implementation from PG12.

More information can be found in
src/adts/README.md
2019-10-29 19:02:58 -04:00
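
As a quick illustration of why DeltaDelta suits timestamps: for the values
`1000, 2000, 3000, 4001` the deltas are `1000, 1000, 1001` and the
delta-of-deltas are `0, 1`, which pack into very few bits when the data is
regularly spaced.
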
gayyappan
3edc016dfc Add catalog tables to support compression
This commit adds catalog tables that will be used by the
compression infrastructure.
2019-10-29 19:02:58 -04:00
Matvey Arye
7ea492f29e Add last_successful_finish to bgw_job_stats
This allows people to better monitor bgw job health. It
indicates the last time the job made progress.
2019-10-15 19:14:14 -04:00
Narek Galstyan
62de29987b Add a notion of now for integer time columns
This commit implements functionality for users to give a custom
definition of now() for integer open dimension typed hypertables.
Such a now() function enables us to talk about intervals in the context
of hypertables with integer time columns. In order to simplify future
code, this commit defines a custom ts_interval type that unites the
usual postgres intervals and integer time dimension intervals under a
single composite type.

The commit also enables adding drop chunks policy on hypertables with
integer time dimensions if a custom now() function has been set.
2019-08-19 23:23:28 +04:00
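
A sketch of how this might be wired up; the commit does not name the SQL
interface, so `set_integer_now_func` and the function below are assumptions:
`CREATE FUNCTION current_epoch() RETURNS bigint LANGUAGE SQL STABLE AS $$ SELECT extract(epoch FROM now())::bigint $$;`
`SELECT set_integer_now_func('conditions', 'current_epoch');`
A drop chunks policy can then be added to the integer-time hypertable.
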
gayyappan
e9df3bc1b6 Fix continuous agg catalog table insert failure
The primary key on continuous_aggs_materialization_invalidation_log
prevents multiple records with the same materialization id. Remove
the primary key to fix this problem.
2019-07-08 14:53:36 -04:00
gayyappan
60cfe6cc90 Support for multiple continuous aggregates
Allow multiple continuous aggregates to be defined on a hypertable.
2019-06-24 17:05:49 -04:00
Brian Rowe
aeac52aef6 Rename telemetry_metadata table to just metadata
This change renames _timescaledb_catalog.telemetry_metadata to
_timescaledb_catalog.metadata.  It also adds a new boolean column to this
table, which is used to flag data that should be included in telemetry.

It also renames the src/telemetry/metadata.{h,c} files to
src/telemetry/telemetry_metadata.{h,c} and updates the API to reflect
this.  Finally, it includes the logic to use the new boolean column
when populating the telemetry parse state.
2019-05-17 17:04:42 -07:00
Joshua Lockerman
899cd0538d Allow scheduled drop_chunks to cascade to aggs
This commit adds a cascade_to_materializations flag to the scheduled
version of drop_chunks that behaves much like the one from manual
drop_chunks: if a hypertable that has a continuous aggregate tries to
drop chunks, and this flag is not set, the chunks will not be dropped.
2019-04-30 15:46:49 -04:00
Joshua Lockerman
3895e5ce0e Add a setting for max an agg materializes per run
Add a setting max_materialized_per_run which can be set to prevent a
continuous aggregate from materializing too much of the table in a
single run. This will prevent a single run from locking the hypertable
for too long, when running on a large data set.
2019-04-26 13:08:00 -04:00
gayyappan
b8f9b91e60 Add user view query definition for cont aggs
Add the query definition to
timescaledb_information.continuous_aggregates.

The user query (specified in the CREATE VIEW stmt of a continuous
aggregate) is transformed in the process of creating a continuous
aggregate and this modified query is saved in the pg_rewrite catalog
tables. In order to display the original query, we create an internal
view which is a replica of the user query. This is used to display the
definition in timescaledb_information.continuous_aggregates.

As an alternative, we could save the original user query in our internal
catalogs, but this approach involves replicating a lot of postgres code
and causes portability problems.
2019-04-26 13:08:00 -04:00
Matvey Arye
dc0e250428 Add pg_dump/restore tests for continuous aggs
The data in caggs needs to survive dump/restore. This
test makes sure that caggs that are materialized both
before and after restore are correct.

Two code changes were necessary to make this work:
1) the valid_job_type constraint on bgw_job needed to be altered to add
'continuous_aggregate' as a valid job type

2) The user_view_query field needed to be changed to text because
dump/restore does not support pg_node_tree.
2019-04-26 13:08:00 -04:00
Matvey Arye
19d47daf23 Delete related catalog rows when continuous aggs are dropped
This PR deletes related rows from the following tables:
* completed_threshold
* invalidation threshold
* hypertable invalidation log

The latter two tables are only affected if no other continuous aggs
exist on the raw hypertable.

This commit also adds locks to prevent concurrent raw table inserts
and any access to the materialization table when dropping caggs. It
also moves all locks to the beginning of the function so that the lock
order is easier to track and reason about.

Also added a few formatting fixes.
2019-04-26 13:08:00 -04:00
gayyappan
1cbd8c74f7 Add invalidation trigger for continuous aggs
Add invalidation trigger for DML changes to the hypertable used in
the continuous aggregate query.

Also add user_view_query definition in continuous_agg catalog table.
2019-04-26 13:08:00 -04:00
Joshua Lockerman
0737b370a3 Add the actual bgw job for continuous aggregates
This commit adds the actual background worker job that runs the continuous
aggregate automatically. This job gets created when the continuous aggregate is
created and is deleted when the aggregate is DROPed. By default this job will
attempt to run every two bucket widths, and attempts to materialize up to two
bucket widths behind the end of the table.
2019-04-26 13:08:00 -04:00
David Kohn
f17aeea374 Initial cont agg INSERT/materialization support
This commit adds initial support for the continuous aggregate materialization
and INSERT invalidations.

INSERT path:
  On INSERT, DELETE and UPDATE we log the [max, min] time range that may be
  invalidated (that is, newly inserted, updated, or deleted) to
  _timescaledb_catalog.continuous_aggs_hypertable_invalidation_log. This log
  will be used to re-materialize these ranges, to ensure that the aggregate
  is up-to-date. Currently these invalidations are recorded by a trigger
  _timescaledb_internal.continuous_agg_invalidation_trigger, which should be
  added to the hypertable when the continuous aggregate is created. This trigger
  stores a cache of min/max values per-hypertable, and on transaction commit
  writes them to the log, if needed. At the moment, we consider them to always
  be needed, unless we're in ReadCommitted mode or weaker, and the min
  invalidated value is greater than the hypertable's invalidation threshold
  (found in _timescaledb_catalog.continuous_aggs_invalidation_threshold)

Materialization path:
  Materialization currently happens in multiple phases: in phase 1 we determine
  the timestamp at which we will end the new set of materializations, then we
  update the hypertable's invalidation threshold to that point, and finally we
  read the current invalidations and materialize any invalidated rows plus the new
  range between the continuous aggregate's completed threshold (found in
  _timescaledb_catalog.continuous_aggs_completed_threshold) and the hypertable's
  invalidation threshold. After all of this is done we update the completed
  threshold to the invalidation threshold. The portion of this protocol from
  after the invalidations are read, until the completed threshold is written
  (that is, actually materializing, and writing the completion threshold) is
  included with this commit, with the remainder to follow in subsequent ones.
  One important caveat is that since the thresholds are exclusive (we invalidate
  all values _less_ than the invalidation threshold) and we store time values
  as an int64 internally, we can never determine whether the row at PG_INT64_MAX is
  invalidated. To avoid this problem, we never materialize the time bucket
  containing PG_INT64_MAX.
2019-04-26 13:08:00 -04:00
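
The materialization phases above, condensed into a sketch (the threshold
table names come from the commit; the flow is a simplification):
1) Choose the timestamp at which the new set of materializations will end.
2) Update _timescaledb_catalog.continuous_aggs_invalidation_threshold to that point.
3) Read the logged invalidations, then materialize the invalidated ranges plus
   the range between the completed threshold and the invalidation threshold.
4) Update _timescaledb_catalog.continuous_aggs_completed_threshold to the
   invalidation threshold.
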
gayyappan
2dbc28df82 Create base infrastructure for continuous aggs
This PR adds a catalog table for storing metadata about
continuous aggregates. It also adds code for creating the
materialization hypertable and 2 views that are used by the
continuous aggregate system:

1) The user view - This is the actual view queried by the enduser.
   It is a query on top of the materialized hypertable and is
   responsible for finalizing and combining partials in a manner
   that returns to the user the data as defined by the original
   user-defined view.
2) The partial view - which queries the raw table and returns
   columns as defined in the materialized table. This will be used
   by the materializer to calculate the data that will be inserted
   into the materialization table. Note the data here is the partial
   state of any aggregates.
2019-04-26 13:08:00 -04:00
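
For reference, a continuous aggregate of this era is defined with
`CREATE VIEW ... WITH (timescaledb.continuous)`; a minimal sketch with
illustrative names:
`CREATE VIEW conditions_summary WITH (timescaledb.continuous) AS SELECT time_bucket('1 hour', time) AS bucket, device_id, avg(temp) FROM conditions GROUP BY bucket, device_id;`
Querying conditions_summary then goes through the user view, which finalizes
and combines the stored partials.
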
Sven Klemm
7961fc77e9 Rename installation_metadata to telemetry_metadata 2019-04-15 21:44:10 +02:00
Sven Klemm
f89fd07c5b Remove year from SQL file license text
This changes the license text for SQL files to be identical
with the license text for C files.
2019-01-13 23:30:22 +01:00
Joshua Lockerman
4e1e15f079 Add reorder command
New cluster-like command which writes to a new index, then swaps,
much like is done for the data table, and only acquires
exclusive locks for said swap. This trades off disk usage for
lower contention: we hold locks for a much shorter period of time,
allowing reads to proceed concurrently, but we have both the old
and new versions of the table existing at once, approximately
doubling storage usage while reorder is running.

Currently only works on chunks.
2019-01-02 15:43:48 -05:00
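
A sketch of invoking it on a chunk (the function name `reorder_chunk` and the
argument names are assumptions; the commit does not spell out the SQL interface):
`SELECT reorder_chunk('_timescaledb_internal._hyper_1_1_chunk', index => 'conditions_time_idx');`
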
Amy Tai
be7c74cdf3 Add logic for automatic DB maintenance functions
This commit adds logic for manipulating internal metadata tables that enable users to schedule automatic drop_chunks and recluster policies. This commit includes:

- SQL for creating policy tables and chunk stats table
- Catalog code and C code for accessing these three tables programmatically
- Implement and expose new user API functions: add_*_policy and remove_*_policy
- Stub scheduler logic for running the policies
2019-01-02 15:43:48 -05:00
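
As an illustration of the add_*_policy/remove_*_policy shape (the concrete
names, arguments, and interval are assumptions):
`SELECT add_drop_chunks_policy('conditions', INTERVAL '6 months');`
`SELECT remove_drop_chunks_policy('conditions');`
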
Joshua Lockerman
e06733acf0 Fix casing in SQL license header to be consistent with elsewhere 2018-11-15 15:18:58 -05:00
Joshua Lockerman
20ec6914c0 Add license headers to SQL files and test code 2018-10-29 13:28:19 -04:00
David Kohn
9ccda0df00 Start stopped workers on restart message
Modify the restart action to start schedulers if they do not exist. This fixes a
potential race condition where a scheduler could be started for a given
database but, before it has shut down (because the extension does not exist), a
create extension command is run; the start action then would not change the
state of the worker, but the worker would be waiting on the wrong vxid and so
not see that the extension exists. This also makes the start action truly
idempotent: it does not set the vxid on its startup, thereby respecting any restart
action that has taken place before and better defining how each interacts with
the system.

Additionally, we decided that the previous behavior, in which launchers were not
started up on, say, alter extension update actions, was not all that desirable:
it worked if the stop action had happened but the database had not restarted,
yet if the database restarted, the stop action would have no effect. We decided
that if we want the ability to disable schedulers for a particular database,
we will implement it in the future as a standalone feature that takes effect
across server restarts, rather than as somewhat ill-defined behavior implicit
in the stop action.
2018-10-18 11:28:20 -04:00
Sven Klemm
248f6621e4 Fix pg_dump for unprivileged users
When timescaledb is installed in template1 and a user with only createdb
privileges creates a database, the user won't be able to dump the
database because of lacking permissions. This patch grants the missing
permissions to PUBLIC for pg_dump to succeed.

We need to grant SELECT to PUBLIC for all tables, even those not
marked as being dumped, because pg_dump will try to access all
tables initially to detect inheritance chains and then decide
which objects actually need to be dumped.
2018-09-26 18:04:11 +02:00
Erik Nordström
18b8068ad7 Remove unnecessary index on dimension metadata table
The `dimension` metadata table had both a `(hypertable_id)` and a
`UNIQUE(hypertable_id, column_name)` index. Having only the latter
index should suffice.

This change removes the unnecessary index, which saves some space
and makes the schema clearer.
2018-09-24 13:24:22 +02:00
Matvey Arye
f662ae1191 Add telemetry job and turn off default jobs in tests
This adds the telemetry job to the job scheduler. Telemetry is
scheduled to run every 24 hours with a 1 hour exponential backoff
retry period. Additional fixes related to the telemetry job:

- Add separate ID space to the bgw_job table for default jobs. We do not dump this ID space inside pg_dump to prevent job insertion conflicts.
- Add check to update scripts for default jobs.
- Change shmem_callback so that it doesn't modify state since state transitions are not atomic with respect to interrupts and interrupt callbacks.
- Disable default telemetry job in regression and docker tests.
2018-09-10 13:29:59 -04:00
Erik Nordström
ebe0915669 Refactor telemetry and fixes
The installation metadata has been refactored:

- The installation metadata store now takes Datums of any
  type as input and output
- Move metadata functions from uuid.c -> metadata.c
- Make metadata functions return native types rather than text,
  including for tests

Telemetry tests for ssl and nossl have been combined.

Note that PG 9.6 does not have pg_backend_random(), which gives us
secure random numbers for the UUIDs that we send in telemetry. Therefore,
we fall back to generating the UUID from the timestamp if we are
on PG 9.6.

This change also fixes a number of test issues. For instance, in the
telemetry test the escape char `E` was passed on as part of the
response string when set as a variable with `\set`. This was not
detected before because the response parser didn't parse the start of
the response properly.

A number of fixes have been made to the formatting of log messages for
telemetry to conform to the PostgreSQL standard as well as being
consistent with other messages.

Numerous build issues on Windows have been resolved. There is also new
functionality to get OS version info on Windows (for telemetry),
including a SQL function get_os_info() to retrieve this information.

The net library will now allow connecting to a servicename, e.g., http
or https. A port is resolved from this service name via getaddrinfo().
An explicit port can still be given, and in that case it will not
resolve the service name.

Databases that are updated to the new version of the extension will
have an install_timestamp in their installation metadata that does not
reflect the actual original install date of the extension. To be able
to distinguish these updated installations from those that are freshly
installed, we add a bogus "epoch" install_timestamp in the update
script.

Parsing of the version string in the telemetry response has been
refactored to be more amenable to testing. Tests have been added.
2018-09-10 13:29:59 -04:00
Amy Tai
faf481b061 Add telemetry functionality
Adding the telemetry BGW and all auxiliary functions, such as generating a UUID, creating the internal metadata
table for storing UUIDs, and parsing the server-side response with the latest version of TimescaleDB.
2018-09-10 13:29:59 -04:00
Matvey Arye
5d8c7cc6f6 Add a scheduler for background jobs
TimescaleDB will want to run multiple background jobs. This PR
adds a simple scheduler so that jobs inserted into a jobs table
could be run on a schedule. This first implementation has two limitations:

1) The list of jobs to be run is read from the database when the scheduler
is first started. We do not update this list if the jobs table changes.

2) There is no prioritization for when to run jobs.

The design of the scheduler is as follows:
The scheduler itself is a background job that continuously runs and waits
for a time when jobs need to be scheduled. It then launches jobs as new
background workers that it controls through the background worker handle.

Aggregate statistics about a job are kept in the job_stat catalog table.
These statistics include the start and finish times of the last run of the job
as well as whether or not the job succeeded. The next_start is used to
figure out when next to run a job after a scheduler is restarted.

The statistics table also tracks consecutive failures and crashes for the job
which is used for calculating the exponential backoff after a crash or failure
(which is used to set the next_start after the crash/failure). Note also that
there is a minimum time after the db scheduler starts up and a crashed job
is restarted. This is to allow the operator enough time to disable the job
if needed.

Note that the number of crashes is an overestimate of the actual number of crashes
for a job. This is so that we are conservative and never miss a crash and fail to
use the appropriate backoff logic.  Note that there is some complexity
in ensuring that all crashes are counted since a crash in Postgres causes /all/
processes to SIGQUIT: we must commit changes to the stats
table /before/ a job starts so that we can then deduce after a job has crashed
and the scheduler comes back up that a job was started, and not finished before
the crash (meaning that it could have been the crashing process).
2018-09-10 13:29:59 -04:00
David Kohn
55a7141953 Implement a cluster-wide launcher for background workers
The launcher controls how TimescaleDB schedulers for each database are stopped/started,
both at server start time and when they are started or stopped while the server is running,
which can happen when, say, an update of the extension is performed.
Includes tests for multiple types of behavior within the launcher, but only a mock for the
db schedulers, which will be dealt with in future commits. This launcher code is mostly in the loader;
as such it must remain backwards compatible for the foreseeable future, so significant thought and design
has gone into making interactions with this code well defined and consistent so that maintaining
backwards compatibility is relatively easy.
2018-09-10 13:29:59 -04:00