This patch refactors the source code so that unrelated code for the
planner, process utilities, and transaction management, previously
located in the common file timescaledb.c, is broken up into separate
source files.
The hypertable cache stores negative cache entries to speed up checks
for tables that aren't hypertables. However, a full hypertable cache
entry was previously allocated even for these negative entries,
wasting cache storage. With this update, the full entry is allocated
only for positive entries that actually represent hypertables.
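As a minimal sketch of the distinction (all names here are
hypothetical, not the actual TimescaleDB structures):

    /* Hypothetical layout, for illustration only. */
    typedef struct Hypertable Hypertable;   /* full catalog payload */

    typedef struct HypertableCacheEntry
    {
        Oid         relid;          /* hash key: main table OID */
        bool        is_hypertable;  /* false => negative entry */
        Hypertable *hypertable;     /* allocated for positive entries only */
    } HypertableCacheEntry;

    static void
    cache_entry_fill(HypertableCacheEntry *entry, Oid relid)
    {
        entry->relid = relid;
        entry->is_hypertable = relid_is_hypertable(relid);  /* assumed helper */

        /* Negative entries carry no payload, so a miss costs only the
         * small entry header instead of a full Hypertable. */
        entry->hypertable = entry->is_hypertable
            ? hypertable_load_from_catalog(relid)           /* assumed helper */
            : NULL;
    }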
Previously, the planner used a direct query via the SPI interface to
retrieve the metadata needed for planner functions like query
rewriting. This commit updates the planner to use our caching system
instead. This improves performance for virtually all operations, both
data modifications and queries.
For hypertables, this commit adds a cache keyed by the main table OID,
including negative entries (because the planner often needs to know
that a table is /not/ a hypertable).
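As a hedged illustration of the planner-side check (reusing the
hypothetical HypertableCacheEntry sketched above;
hypertable_cache_get() is an assumed lookup function, not the actual
API):

    /* One hash lookup replaces the old SPI query; a cached negative
     * entry answers "not a hypertable" without touching the catalog. */
    static bool
    table_is_hypertable(Oid relid)
    {
        HypertableCacheEntry *entry = hypertable_cache_get(relid); /* assumed */

        return entry != NULL && entry->is_hypertable;
    }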
Allow non-superusers to work with the db (but also mess up the catalog)
Prior to this commit, non-superusers could not do anything inside a
database with the TimescaleDB extension loaded. Now, non-superusers
can create their own hypertables and work inside the database. There
are two big caveats:
1) All users have read/write permissions to the timescaledb
catalog.
2) Permission changes applied to the main tables are not
propagated to the associated tables.
This patch continues the work of consolidating catalog information
and definitions in the catalog module. It also renames some data
types to adhere to the CamelCase naming convention (Hypertable,
PartitionEpoch).
This patch refactors the insert path to use insert triggers instead
of a temporary copy table. The copy table previously served as an
optimization for big batches, where the cost of inserting
tuple-by-tuple into chunks was amortized by inserting all tuples for
a specific chunk in one insert. However, to avoid deadlocks the
tuples also had to be inserted in a specific chunk order, which
required adding an index to the copy table.
With trigger insertion, tuples are instead collected over a batch
into a sorting state, which is sorted in an "after" trigger, as
sketched below. This removes the overhead of the copy table and
index. It also provides a fast path for single-tuple batches that
avoids sorting altogether.
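A rough sketch of the trigger shape, assuming PostgreSQL's tuplesort
API (exact tuplesort_* signatures vary across PostgreSQL versions);
the batch state and insert_tuple_into_chunk() are hypothetical names,
not the actual implementation:

    #include "postgres.h"
    #include "fmgr.h"
    #include "access/htup_details.h"
    #include "commands/trigger.h"
    #include "utils/tuplesort.h"

    /* Assumed per-batch state, set up at statement start (e.g. via
     * tuplesort_begin_heap()) and torn down afterwards. */
    typedef struct InsertBatchState
    {
        Tuplesortstate *sort;        /* tuples collected over the batch */
        int64           ntuples;
        HeapTuple       first_tuple; /* kept aside for the fast path */
    } InsertBatchState;

    static InsertBatchState *insert_state;

    static void insert_tuple_into_chunk(HeapTuple tup);  /* assumed helper */

    /* Row-level trigger: collect each tuple. The first tuple is kept
     * aside and spilled into the sort only once a second tuple shows
     * this is a multi-tuple batch. */
    Datum
    insert_row_trigger(PG_FUNCTION_ARGS)
    {
        TriggerData *trigdata = (TriggerData *) fcinfo->context;

        if (insert_state->ntuples == 0)
            insert_state->first_tuple = heap_copytuple(trigdata->tg_trigtuple);
        else
        {
            if (insert_state->ntuples == 1)
                tuplesort_putheaptuple(insert_state->sort,
                                       insert_state->first_tuple);
            tuplesort_putheaptuple(insert_state->sort, trigdata->tg_trigtuple);
        }
        insert_state->ntuples++;

        return PointerGetDatum(NULL);
    }

    /* Statement-level "after" trigger: sort by chunk order (avoiding
     * deadlocks) and insert chunk by chunk. Single-tuple batches skip
     * the sort entirely. */
    Datum
    insert_after_trigger(PG_FUNCTION_ARGS)
    {
        if (insert_state->ntuples == 1)
            insert_tuple_into_chunk(insert_state->first_tuple);
        else
        {
            HeapTuple tup;

            tuplesort_performsort(insert_state->sort);
            while ((tup = tuplesort_getheaptuple(insert_state->sort,
                                                 true)) != NULL)
                insert_tuple_into_chunk(tup);
        }
        tuplesort_end(insert_state->sort);

        return PointerGetDatum(NULL);
    }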
Currently, TimescaleDB does not close chunks mid-insert, so large
batches will overfill a chunk. This commit adds a script that splits
large CSV files into smaller batches, allowing chunks to be closed at
reasonable sizes.
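The script itself is not reproduced here; purely as an illustration
of the batching idea (in C for consistency with the other sketches;
the batch size is arbitrary and CSV header handling is omitted):

    /* Split stdin into files of BATCH_LINES lines each:
     * batch_0000.csv, batch_0001.csv, ... Illustration only. */
    #include <stdio.h>
    #include <stdlib.h>

    #define BATCH_LINES 100000      /* illustrative batch size */

    int
    main(void)
    {
        char    line[65536];
        long    nlines = 0;
        int     nbatch = 0;
        FILE   *out = NULL;

        while (fgets(line, sizeof(line), stdin) != NULL)
        {
            if (nlines % BATCH_LINES == 0)
            {
                char fname[64];

                if (out != NULL)
                    fclose(out);
                snprintf(fname, sizeof(fname), "batch_%04d.csv", nbatch++);
                out = fopen(fname, "w");
                if (out == NULL)
                {
                    perror(fname);
                    return EXIT_FAILURE;
                }
            }
            fputs(line, out);
            nlines++;
        }
        if (out != NULL)
            fclose(out);
        return EXIT_SUCCESS;
    }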
Previously, chunk replicas were retrieved with an SPI query. Now, all
catalog items are retrieved with native scans, with the exception of
newly created chunks.
This commit also refactors the chunk (replica) cache, removing some
data structures that duplicated information. Chunks are now cached by
their ID (including their replicas) instead of by just the set of
replicas. This removes the need for additional data structures, such
as the replica set, which looked like a chunk minus time info, and
the cache entry wrapper struct. Another upside is that chunks can be
retrieved from the cache directly by ID.
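For illustration, a native catalog scan looks roughly like this under
the pre-PostgreSQL-12 heapam API (the relation, attribute number, and
decode step are placeholders):

    #include "postgres.h"
    #include "access/heapam.h"
    #include "access/skey.h"
    #include "access/stratnum.h"
    #include "utils/fmgroids.h"
    #include "utils/rel.h"
    #include "utils/snapmgr.h"

    /* Scan a catalog relation for rows whose "id" attribute (attno 1
     * here, illustrative) equals chunk_id, instead of issuing an SPI
     * query. */
    static void
    chunk_scan_by_id(Relation chunk_rel, int32 chunk_id)
    {
        ScanKeyData skey;
        HeapScanDesc scan;
        HeapTuple   tuple;

        ScanKeyInit(&skey, 1, BTEqualStrategyNumber, F_INT4EQ,
                    Int32GetDatum(chunk_id));

        scan = heap_beginscan(chunk_rel, GetTransactionSnapshot(), 1, &skey);

        while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
        {
            /* decode the tuple and fill the chunk cache entry here */
        }

        heap_endscan(scan);
    }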
This change is a performance improvement. Previously, every insert
called a PL/pgSQL function to check whether the chunk needed to be
closed. This patch implements a C-only fast path for the common case
in which the table size is less than the configured chunk size.
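A hedged sketch of the fast path (the size helper, GUC variable, and
fallback are illustrative names, not the actual TimescaleDB API):

    /* Returns true if the chunk must be closed. The common case, a
     * chunk with room to spare, never leaves C. */
    static bool
    chunk_needs_close(Oid chunk_relid)
    {
        int64 chunk_size = chunk_relation_size(chunk_relid);    /* assumed */
        int64 max_size = guc_max_chunk_size_bytes;              /* assumed */

        if (chunk_size < max_size)
            return false;   /* fast path: skip the PL/pgSQL call */

        /* Slow path: fall back to the PL/pgSQL close-check logic. */
        return chunk_close_check_via_plpgsql(chunk_relid);      /* assumed */
    }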