This commit adds a cascade_to_materializations flag to the scheduled
version of drop_chunks that behaves much like the one from manual
drop_chunks: if the policy tries to drop chunks from a hypertable that
has a continuous aggregate and this flag is not set, the chunks will not
be dropped.
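For illustration, a hedged usage sketch (the policy-function argument is
assumed to mirror the manual drop_chunks flag; names are placeholders):

    -- Schedule chunk dropping on a hypertable backing a continuous
    -- aggregate; without cascade_to_materializations => true the policy
    -- run would skip dropping these chunks.
    SELECT add_drop_chunks_policy('conditions', INTERVAL '3 months',
                                  cascade_to_materializations => true);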
Add a setting max_materialized_per_run which can be set to prevent a
continuous aggregate from materializing too much of the table in a
single run. This prevents a single run from locking the hypertable for
too long when running on a large data set.
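A minimal sketch of how the setting might be applied, assuming it is
exposed as a per-aggregate option (the option syntax below is an
assumption, not the confirmed interface):

    -- Hypothetical: cap how much data one materialization run may process.
    CREATE VIEW conditions_hourly
      WITH (timescaledb.continuous,
            timescaledb.max_materialized_per_run = '4 hours')
      AS SELECT time_bucket('1 hour', time) AS bucket,
                avg(temperature) AS avg_temp
         FROM conditions
         GROUP BY time_bucket('1 hour', time);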
Add the query definition to
timescaledb_information.continuous_aggregates.
The user query (specified in the CREATE VIEW stmt of a continuous
aggregate) is transformed in the process of creating a continuous
aggregate and this modified query is saved in the pg_rewrite catalog
tables. In order to display the original query, we create an internal
view which is a replica of the user query. This is used to display the
definition in timescaledb_information.continuous_aggregates.
As an alternative we could save the original user query in our internal
catalogs. But this approach involves replicating a lot of postgres code
and causes portability problems.
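For example, the definition can now be retrieved with a query along these
lines (the exact column name is an assumption):

    SELECT view_name, view_definition
    FROM timescaledb_information.continuous_aggregates;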
The data in caggs needs to survive dump/restore. This test makes sure
that caggs materialized both before and after a restore are correct.
Two code changes were necessary to make this work:
1) The valid_job_type constraint on bgw_job needed to be altered to add
'continuous_aggregate' as a valid job type (sketched below).
2) The user_view_query field needed to be changed to text because
dump/restore does not support pg_node_tree.
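A sketch of the first change, assuming the standard ALTER TABLE form (the
config schema, constraint name, and pre-existing job-type values are
assumptions for illustration):

    ALTER TABLE _timescaledb_config.bgw_job
      DROP CONSTRAINT valid_job_type,
      ADD CONSTRAINT valid_job_type CHECK (
        job_type IN ('telemetry_and_version_check_if_enabled',
                     'reorder', 'drop_chunks', 'continuous_aggregate'));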
This PR deletes related rows from the following tables:
* completed_threshold
* invalidation threshold
* hypertable invalidation log
The latter two tables are only affected if no other continuous aggs
exist on the raw hypertable.
This commit also adds locks to prevent concurrent raw table inserts
and any access to the materialization table when dropping caggs. It
also moves all locks to the beginning of the function so that the lock
order is easier to track and reason about.
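For illustration, this cleanup and locking is exercised when a continuous
aggregate is dropped, e.g. (assuming caggs are dropped as regular views;
the view name is a placeholder):

    -- Dropping the last cagg on a raw hypertable also clears that table's
    -- invalidation threshold and invalidation log entries.
    DROP VIEW conditions_hourly CASCADE;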
Also added a few formatting fixes.
Add invalidation trigger for DML changes to the hypertable used in
the continuous aggregate query.
Also add the user_view_query definition to the continuous_agg catalog table.
This commit adds the actual background worker job that runs the continuous
aggregate automatically. This job gets created when the continuous aggregate is
created and is deleted when the aggregate is dropped. By default this job
attempts to run every two bucket widths and to materialize up to two
bucket widths behind the end of the table.
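The created job can be inspected in the jobs table, e.g. (table and column
names are assumptions for illustration):

    SELECT id, job_type, schedule_interval, max_runtime
    FROM _timescaledb_config.bgw_job
    WHERE job_type = 'continuous_aggregate';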
This commit adds initial support for the continuous aggregate materialization
and INSERT invalidations.
INSERT path:
On INSERT, UPDATE, and DELETE we log the [min, max] time range that may be
invalidated (that is, newly inserted, updated, or deleted) to
_timescaledb_catalog.continuous_aggs_hypertable_invalidation_log. This log
will be used to re-materialize these ranges, to ensure that the aggregate
is up-to-date. Currently these invalidations are recorded by a trigger
_timescaledb_internal.continuous_agg_invalidation_trigger, which should be
added to the hypertable when the continuous aggregate is created. This trigger
stores a cache of min/max values per-hypertable, and on transaction commit
writes them to the log, if needed. At the moment, we consider them always
needed unless we are in ReadCommitted mode or weaker and the minimum
invalidated value is greater than the hypertable's invalidation threshold
(found in _timescaledb_catalog.continuous_aggs_invalidation_threshold).
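For reference, the logged ranges and the per-hypertable threshold can be
inspected directly; a sketch (column names are assumptions based on the
description above):

    SELECT hypertable_id, lowest_modified_value, greatest_modified_value
    FROM _timescaledb_catalog.continuous_aggs_hypertable_invalidation_log;

    SELECT hypertable_id, watermark
    FROM _timescaledb_catalog.continuous_aggs_invalidation_threshold;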
Materialization path:
Materialization currently happens in multiple phases: in phase 1 we determine
the timestamp at which we will end the new set of materializations, then we
update the hypertable's invalidation threshold to that point, and finally we
read the current invalidations, then materialize any invalidated rows and the new
range between the continuous aggregate's completed threshold (found in
_timescaledb_catalog.continuous_aggs_completed_threshold) and the hypertable's
invalidation threshold. After all of this is done we update the completed
threshold to the invalidation threshold. The portion of this protocol from
after the invalidations are read, until the completed threshold is written
(that is, actually materializing, and writing the completion threshold) is
included with this commit, with the remainder to follow in subsequent ones.
One important caveat is that since the thresholds are exclusive (we invalidate
all values _less_ than the invalidation threshold) and we store time values
as an int64 internally, we can never determine whether the row at PG_INT64_MAX is
invalidated. To avoid this problem, we never materialize the time bucket
containing PG_INT64_MAX.
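A rough SQL-level sketch of one run of this protocol (illustrative only;
the real implementation is in C, and the column names and literal values
below are assumptions):

    BEGIN;
    -- Phase 1: move the invalidation threshold up to the chosen end point
    -- (1000000 stands in for an internal int64 time value).
    UPDATE _timescaledb_catalog.continuous_aggs_invalidation_threshold
       SET watermark = 1000000
     WHERE hypertable_id = 1;
    COMMIT;

    BEGIN;
    -- Phase 2: consume the logged invalidations, materialize those ranges
    -- plus (completed threshold, invalidation threshold), then record
    -- completion.
    DELETE FROM _timescaledb_catalog.continuous_aggs_hypertable_invalidation_log
     WHERE hypertable_id = 1;
    UPDATE _timescaledb_catalog.continuous_aggs_completed_threshold
       SET watermark = 1000000
     WHERE materialization_id = 2;
    COMMIT;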
This PR adds a catalog table for storing metadata about
continuous aggregates. It also adds code for creating the
materialization hypertable and 2 views that are used by the
continuous aggregate system:
1) The user view - This is the actual view queried by the end user.
It is a query on top of the materialized hypertable and is
responsible for finalizing and combining partials in a manner
that returns to the user the data as defined by the original
user-defined view.
2) The partial view - This queries the raw table and returns
columns as defined in the materialized table. It will be used
by the materializer to calculate the data that will be inserted
into the materialization table. Note that the data here is the
partial state of any aggregates (a conceptual sketch of both
views follows this list).
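A conceptual sketch of the pair for an hourly average aggregate (the
generated views use internal partialize/finalize helpers; the names and
signatures here are simplified assumptions, not the generated definitions):

    -- Partial view: computes partial aggregate state from the raw table.
    CREATE VIEW _timescaledb_internal._partial_view_2 AS
    SELECT time_bucket('1 hour', time) AS bucket,
           _timescaledb_internal.partialize_agg(avg(temperature)) AS agg_avg
    FROM conditions
    GROUP BY 1;

    -- User view: combines and finalizes the stored partials per bucket from
    -- the materialization hypertable (finalize call elided for brevity):
    -- SELECT bucket, finalize(agg_avg) FROM <materialization table> GROUP BY bucket;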
New cluster-like command which writes to a new index, then swaps,
much like is done for the data table, and only acquires
exclusive locks for said swap. This trades off disk usage for
lower contention: we hold locks for a much lower period of time,
allowing reads to work concurrently, but we have both the old
and new versions of the table existing at once, approximately
doubling storage usage while reorder is running.
Currently only works on chunks.
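Example usage on a single chunk (assuming the command is exposed as
reorder_chunk(); the chunk and index names are placeholders):

    SELECT reorder_chunk('_timescaledb_internal._hyper_1_1_chunk',
                         'conditions_time_idx');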
This commit adds logic for manipulating the internal metadata tables that enable users to schedule automatic drop_chunks and recluster policies. This commit includes:
- SQL for creating the policy tables and the chunk stats table
- Catalog code and C code for accessing these three tables programmatically
- Implementation and exposure of new user API functions: add_*_policy and remove_*_policy (usage example after this list)
- Stub scheduler logic for running the policies
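A usage sketch of the new API (the drop_chunks/reorder function names and
argument lists are assumptions; at this stage the reorder policy may still
carry the recluster naming):

    SELECT add_drop_chunks_policy('conditions', INTERVAL '6 months');
    SELECT add_reorder_policy('conditions', 'conditions_time_idx');
    SELECT remove_reorder_policy('conditions');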
When timescaledb is installed in template1 and a user with only createdb
privileges creates a database, the user won't be able to dump the
database due to missing permissions. This patch grants the missing
permissions to PUBLIC so that pg_dump succeeds.
We need to grant SELECT to PUBLIC for all tables, even those not
marked as being dumped, because pg_dump initially tries to access all
tables to detect inheritance chains and then decides
which objects actually need to be dumped.
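The grants take roughly this form (the exact list of schemas and objects
touched by the patch may differ):

    GRANT SELECT ON ALL TABLES IN SCHEMA _timescaledb_catalog TO PUBLIC;
    GRANT SELECT ON ALL SEQUENCES IN SCHEMA _timescaledb_catalog TO PUBLIC;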
The `dimension` metadata table had both a `(hypertable_id)` and a
`UNIQUE(hypertable_id, column_name)` index. Having only the latter
index should suffice.
This change removes the unnecessary index, which will save some space,
and make the schema more clear.
This adds the telemetry job to the job scheduler. Telemetry is
scheduled to run every 24 hours with a 1 hour exponential backoff
retry period. Additional fixes related to the telemetry job:
- Add separate ID space to the bgw_job table for default jobs. We do not dump this ID space inside pg_dump to prevent job insertion conflicts.
- Add check to update scripts for default jobs.
- Change shmem_callback so that it doesn't modify state since state transitions are not atomic with respect to interrupts and interrupt callbacks.
- Disable default telemetry job in regression and docker tests.
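For reference, the default telemetry job can be looked up in the jobs
table, e.g. (the job_type value and column names are assumptions):

    SELECT id, application_name, schedule_interval, retry_period
    FROM _timescaledb_config.bgw_job
    WHERE job_type = 'telemetry_and_version_check_if_enabled';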
The installation metadata has been refactored:
- The installation metadata store now takes Datums of any
type as input and output
- Move metadata functions from uuid.c -> metadata.c
- Make metadata functions return native types rather than text,
including for tests
Telemetry tests for ssl and nossl have been combined.
Note that PG 9.6 does not have pg_backend_random(), which gives us
secure random numbers for the UUIDs that we send in telemetry. Therefore,
we fall back to generating the UUID from the timestamp if we are
on PG 9.6.
This change also fixes a number of test issues. For instance, in the
telemetry test the escape char `E` was passed on as part of the
response string when set as a variable with `\set`. This was not
detected before because the response parser didn't parse the start of
the response properly.
A number of fixes have been made to the formatting of log messages for
telemetry to conform to the PostgreSQL standard as well as being
consistent with other messages.
Numerous build issues on Windows have been resolved. There is also new
functionality to get OS version info on Windows (for telemetry),
including a SQL function get_os_info() to retrieve this information.
The net library will now allow connecting to a servicename, e.g., http
or https. A port is resolved from this service name via getaddrinfo().
An explicit port can still be given, and in that case it will not
resolve the service name.
Databases that are updated to the new version of the extension will
have an install_timestamp in their installation metadata that does not
reflect the actual original install date of the extension. To be able
to distinguish these updated installations from those that are freshly
installed, we add a bogus "epoch" install_timestamp in the update
script.
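A sketch of the update-script insert (the metadata table and key names are
assumptions for illustration):

    INSERT INTO _timescaledb_catalog.installation_metadata (key, value)
    VALUES ('install_timestamp', 'epoch')
    ON CONFLICT (key) DO NOTHING;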
Parsing of the version string in the telemetry response has been
refactored to be more amenable to testing. Tests have been added.
Adding the telemetry BGW and all auxiliary functions, such as generating a UUID, creating the internal metadata
table for storing UUIDs, and parsing the server-side response with the latest version of TimescaleDB.
TimescaleDB will want to run multiple background jobs. This PR
adds a simple scheduler so that jobs inserted into a jobs table
can be run on a schedule. This first implementation has two limitations:
1) The list of jobs to be run is read from the database when the scheduler
is first started. We do not update this list if the jobs table changes.
2) There is no prioritization for when to run jobs.
The design of the scheduler is as follows:
The scheduler itself is a background job that continuously runs and waits
for a time when jobs need to be scheduled. It then launches jobs as new
background workers that it controls through the background worker handle.
Aggregate statistics about a job are kept in the job_stat catalog table.
These statistics include the start and finish times of the last run of the job
as well as whether or not the job succeeded. The next_start is used to
figure out when next to run a job after a scheduler is restarted.
The statistics table also tracks consecutive failures and crashes for the job
which is used for calculating the exponential backoff after a crash or failure
(which is used to set the next_start after the crash/failure). Note also that
there is a minimum delay between when the db scheduler starts up and when a
crashed job is restarted. This is to allow the operator enough time to
disable the job if needed.
Note that the number of crashes is an overestimate of the actual number of crashes
for a job. This is so that we are conservative and never miss a crash and fail to
use the appropriate backoff logic. Note that there is some complexity
in ensuring that all crashes are counted, since a crash in Postgres causes /all/
processes to SIGQUIT: we must commit changes to the stats
table /before/ a job starts so that, after a crash and once the scheduler
comes back up, we can deduce that a job was started but not finished before
the crash (meaning that it could have been the crashing process).
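These statistics can be inspected directly, e.g. (table and column names
are assumptions based on the description above):

    SELECT job_id, last_start, last_finish, last_run_success,
           consecutive_failures, consecutive_crashes, next_start
    FROM _timescaledb_internal.bgw_job_stat;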
The launcher controls how TimescaleDB schedulers for each database are stopped and started,
both at server start time and when they are started or stopped while the server is running,
which can happen when, say, an update of the extension is performed.
Includes tests for multiple types of behavior within the launcher, but only a mock for the
db schedulers, which will be dealt with in future commits. This launcher code lives mostly in the loader;
as such it must remain backwards compatible for the foreseeable future, so significant thought and design
have gone into making interactions with this code well defined and consistent so that maintaining
backwards compatibility is relatively easy.