When a hypertable has many chunks, the current PL/pgSQL function that
computes the size of each chunk via a nested loop is quite slow.
Additionally, it makes a system call to get the on-disk file size of
each chunk every time it is called, which slows things down further.
We now provide an approximate size function implemented in C to avoid
the issues in the PL/pgSQL function. This function also uses
per-backend caching via the smgr layer to compute the approximate size
cheaply. The PG cache invalidation clears the cached size for a chunk
when DML happens on it, so the size cache picks up the latest size
within minutes. Also, thanks to the backend caching, a long-running
session only fetches fresh data for new or modified chunks and can
effectively reuse the cached data (calculated afresh the first time
around) for older chunks.
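For illustration, a minimal usage sketch, assuming the new C implementation is exposed at the SQL level as hypertable_approximate_size (function and hypertable names are illustrative):
    -- approximate, cache-backed size of the hypertable in bytes
    SELECT hypertable_approximate_size('metrics');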
The approximate_row_count function was using reltuples from compressed
chunks and multiplying it by 1000, the default batch size. This led to
a huge skew between the actual row count and the approximate one. We
now use the numrows_pre_compression value from the TimescaleDB
catalog, which accurately represents the number of rows before
compression.
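For example (hypertable name is illustrative), the estimate for a compressed hypertable is now based on the pre-compression row counts stored in the catalog:
    SELECT approximate_row_count('metrics');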
In ae21ee96 we fixed a race condition that occurred when a query
computing the hypertable sizes ran while one or more chunks were
dropped in a concurrent session, leading to an exception because the
chunks no longer exist.
In fact, the table lock introduced there is unnecessary because we
also added proper joins with the Postgres catalog tables to ensure
that the relation exists in the database when calculating the sizes.
Even worse, with this table lock, dropping chunks now waits for the
functions that calculate the hypertable sizes.
Fix it by removing the unnecessary table lock, and add isolation tests
to make sure we don't end up with race conditions again.
The approximate_row_count function was executed directly on the user
view instead of the corresponding materialized hypertable, which
returns 0 for continuous aggregates. The function is updated to look
up the materialized hypertable corresponding to the continuous
aggregate and then return the approximate_row_count for that
materialized hypertable.
Fixes #6051
To increase schema security we do not want to mix our own internal
objects with user objects. Since chunks are created in the
_timescaledb_internal schema our internal functions should live in
a different dedicated schema. This patch makes the necessary
adjustments for the following functions:
- relation_size(regclass)
- data_node_hypertable_info(name, name, name)
- data_node_chunk_info(name, name, name)
- hypertable_local_size(name, name)
- hypertable_remote_size(name, name)
- chunks_local_size(name, name)
- chunks_remote_size(name, name)
- range_value_to_pretty(bigint, regtype)
- get_approx_row_count(regclass)
- data_node_compressed_chunk_stats(name, name, name)
- compressed_chunk_local_stats(name, name)
- compressed_chunk_remote_stats(name, name)
- indexes_local_size(name, name)
- data_node_index_size(name, name, name)
- indexes_remote_size(name, name, name)
To increase schema security we do not want to mix our own internal
objects with user objects. Since chunks are created in the
_timescaledb_internal schema our internal functions should live in
a different dedicated schema. This patch makes the necessary
adjustments for the following functions:
- to_unix_microseconds(timestamptz)
- to_timestamp(bigint)
- to_timestamp_without_timezone(bigint)
- to_date(bigint)
- to_interval(bigint)
- interval_to_usec(interval)
- time_to_internal(anyelement)
- subtract_integer_from_now(regclass, bigint)
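As an illustration, such helpers now need to be called with the new schema qualification; a sketch assuming the dedicated schema is named _timescaledb_functions:
    -- convert an internal microsecond epoch value to timestamptz
    SELECT _timescaledb_functions.to_timestamp(1700000000000000);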
This patch adds support for passing continuous aggregate names to
`chunk_detailed_size`, aligning its behavior with that of other
functions such as `show_chunks`, `drop_chunks`, and `hypertable_size`.
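For example (continuous aggregate name is illustrative):
    SELECT * FROM chunk_detailed_size('daily_summary');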
Internal Server Error when loading Explorer tab (SDC #995)
This refers to a weird scenario where a chunk table entry exists in
the TimescaleDB catalog but does not exist in the PG catalog. The
stale entry blocks executing the hypertable_size function on the
hypertable.
The changes in this patch are related to improvements suggested for
the hypertable_size function, which involve:
1. Locking the hypertable in ACCESS SHARE mode in the hypertable_size
function to avoid the risk of chunks being dropped by another
concurrent process.
2. Joining the hypertable and inherited chunk tables with "pg_class"
to make sure that a stale table without an entry in the PG catalog is
not included in the hypertable size calculation.
3. Adding an additional filter (schema_name) on pg_class to avoid
calculating the size of multiple hypertables with the same name in
different schemas.
NOTE: With this change, calling the hypertable_size function requires
SELECT privilege on the table.
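A sketch of the privilege now needed before calling the function (role and table names are hypothetical):
    GRANT SELECT ON metrics TO readonly_user;
    SELECT hypertable_size('metrics');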
Disable-check: force-changelog-file
During chunk creation, the chunk's dimensional CHECK constraints are
created via an "upcall" to PL/pgSQL code. However, creating
dimensional constraints in PL/pgSQL code sometimes fails, especially
during high-concurrency inserts, because PL/pgSQL code scans metadata
using a snapshot that might not see the same metadata as the C
code. As a result, chunk creation sometimes fails during constraint
creation.
To fix this issue, implement dimensional CHECK-constraint creation in
C code. Other constraints (FK, PK, etc.) are still created via an
upcall, but should probably also be rewritten in C. However, since
these constraints don't depend on recently updated metadata, this is
left to a future change.
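For reference, the dimensional CHECK constraints created here have roughly this shape (chunk name, constraint name, and range values are illustrative):
    ALTER TABLE _timescaledb_internal._hyper_1_1_chunk
      ADD CONSTRAINT constraint_1
      CHECK ("time" >= '2023-01-01 00:00:00+00' AND "time" < '2023-01-08 00:00:00+00');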
Fixes #5456
During compression, autovacuum used to be disabled for the
uncompressed chunk and re-enabled after decompression. This leads to
Postgres maintenance issues. Let's not disable autovacuum for the
uncompressed chunk anymore and let Postgres take care of the stats in
its natural way.
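For context, the reloption previously toggled on the uncompressed chunk looked roughly like this (chunk name is illustrative) and is no longer set:
    ALTER TABLE _timescaledb_internal._hyper_1_1_chunk SET (autovacuum_enabled = false);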
Fixes #309
This small patch adds support for continuous aggregates to the
`hypertable_detailed_size` (and with that `hypertable_size`).
It adds an additional check to see if a continuous aggregate exists
if a hypertable with the given regclass name isn't found.
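For example (continuous aggregate name is illustrative):
    SELECT * FROM hypertable_detailed_size('daily_summary');
    SELECT hypertable_size('daily_summary');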
Changed queries to use LATERAL joins on size functions and views
instead of CTEs, which eliminates a lot of unnecessary projections and
gives the planner a chance to push down predicates.
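A simplified sketch of the LATERAL pattern (illustrative, not the exact query from this patch):
    SELECT h.table_name, s.total_bytes
    FROM _timescaledb_catalog.hypertable h,
         LATERAL hypertable_detailed_size(
             format('%I.%I', h.schema_name, h.table_name)::regclass) s;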
Closes #4775
Postgres will prepend pg_temp to the effective search_path if it
is not present in the search_path. While pg_temp will never be
used to look up functions or operators unless explicitly requested,
it will be used to look up relations. Explicitly putting pg_temp at
the end of the search_path makes sure objects in pg_temp will be
considered last and pg_temp cannot be used to mask existing objects.
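In effect, the search_path set by the extension ends with an explicit pg_temp, along the lines of:
    SET search_path TO pg_catalog, pg_temp;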
Reorganize the code and fix a minor bug where the size of the FSM, VM,
and INIT forks of the parent hypertable was not computed.
Fix the bug by exposing the `ts_relation_size` function at the SQL
level to encapsulate the logic for computing `heap`, `indexes` and
`toast` sizes.
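For reference, the per-fork sizes in question can be inspected with pg_relation_size (table name is illustrative):
    SELECT pg_relation_size('metrics', 'main') AS main_bytes,
           pg_relation_size('metrics', 'fsm')  AS fsm_bytes,
           pg_relation_size('metrics', 'vm')   AS vm_bytes,
           pg_relation_size('metrics', 'init') AS init_bytes;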
This patch locks down search_path in extension install and update
scripts to only contain pg_catalog; this requires that any reference
in those scripts is fully qualified. Additionally, we add explicit
create commands to all update scripts for objects added to the
public schema. This change will make update scripts fail if a
function with an identical signature already exists when installing
or upgrading, instead of reusing the existing object.
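A sketch of the pattern now used in the scripts (function name and body are illustrative):
    SET LOCAL search_path TO pg_catalog;
    -- plain CREATE (not CREATE OR REPLACE) fails if an object with this signature already exists
    CREATE FUNCTION public.example_fn(val pg_catalog.int4) RETURNS pg_catalog.int4
        LANGUAGE sql AS 'SELECT val';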
Size utility functions, such as `hypertable_size()`, excluded
non-responding data nodes from size calculations, which led to the
functions succeeding but returning the wrong size information. To
avoid reporting confusing numbers, it is better to fail.
This change updates the SQL queries for the relevant functions to no
longer exclude non-responding data nodes and also adds a TAP test to
illustrate the error when data nodes are not responding.
Fixes #3713
Simplify the CTE that recursively inspects all partitions of a
relation and calculates the sum of `pg_class.reltuples`, taking into
account the differences introduced by PG14.
Rewrite approximate_row_count in SQL instead of PL/pgSQL and remove
superfluous JOINs against pg_namespace. Adjust the tuple calculation
for PG14 since, in PG14, reltuples for partitioned tables is the sum
of its children, so we need to exclude those from the calculation to
avoid double counting.
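A simplified sketch of the idea (illustrative, not the exact query): recursively collect the relation and its inheritance children, sum reltuples, and skip partitioned parents since on PG14 their reltuples already includes their children:
    WITH RECURSIVE relations AS (
        SELECT 'metrics'::regclass::oid AS relid
        UNION ALL
        SELECT i.inhrelid
        FROM pg_inherits i
        JOIN relations r ON i.inhparent = r.relid
    )
    SELECT sum(c.reltuples)::bigint AS approx_rows
    FROM relations r
    JOIN pg_class c ON c.oid = r.relid
    WHERE c.relkind <> 'p';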
The view uses cached information from compression_chunk_size to
report the size of compressed chunks. Since compressed chunks
can be modified, we now call pg_relation_size on the compressed chunk
while reporting the size.
The view also incorrectly used the hypertable's reltoastrelid to
calculate toast bytes. It has been changed to use the chunk's
reltoastrelid.
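For reference, the chunk's own toast size can be obtained along these lines (chunk name is illustrative):
    SELECT pg_relation_size(c.reltoastrelid::regclass) AS toast_bytes
    FROM pg_class c
    WHERE c.oid = '_timescaledb_internal._hyper_1_1_chunk'::regclass
      AND c.reltoastrelid <> 0;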
Fix a number of issues with size and stats functions:
* Return `0` size instead of `NULL` in several functions when
hypertables have no chunks (e.g., `hypertable_size`,
`hypertable_detailed_size`).
* Return `NULL` when functions are called on non-hypertables instead
of simply failing with generic error `query returned no rows`.
* Include size of "root" hypertable, which can have non-zero size
indexes and other objects even if the root table holds no data.
* Make `hypertable_detailed_size` include one additional row for
storage size of objects on the access node. While the access node
stores no data, the empty hypertable may still take up some disk
space.
* Improve test coverage for all size utility functions. In particular,
add tests on regular tables as well as empty and compressed
hypertables.
* Several size utility functions that were defined as `PL/pgSQL`
functions have been converted to simple `SQL` functions since they
ran only a single SQL query.
The `dist_util` test is moved to the solo test group because,
otherwise, it gives different size output when run in parallel vs. in
isolation.
Fixes #2871
Renaming the parameter `hypertable_or_cagg` in functions `drop_chunks`
and `show_chunks` to `relation` and changing parameter name from
`main_table` to `hypertable` or `relation` depending on context.
This change renames the function to approximate_row_count() and adds
support for regular tables. Return a row count estimate for a
table instead of a table list.
The timescale clustering code so far has been written referring to the
remote databases as 'servers'. This terminology is a bit overloaded,
and in particular we don't enforce any network topology limitations
that the term 'server' would suggest. In light of this we've decided
to change to use the term 'node' when referring to the different
databases in a distributed database. Specifically we refer to the
frontend as an 'access node' and to the backends as 'data nodes',
though we may omit the access or data qualifier where it's unambiguous.
As the vast bulk of the code so far has been written for the case where
there was a single access node, almost all instances of 'server' were
references to data nodes. This change has updated the code to rename
those instances.
This change adds a new utility function for postgres
`server_hypertable_info`. This function will contact a provided node
and pull down the space information for all the distributed hypertables
on that node.
Additionally, a new view `distributed_server_info` has been added to
timescaledb_information. This view leverages the new
remote_hypertable_data function to display a list of nodes, along with
counts of tables, chunks, and total bytes used by distributed data.
Finally, this change also adds a `hypertable_server_relation_size`
function, which, given the name of a distributed hypertable, will print
the space information for that hypertable on each node of the
distributed database.
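For example (hypertable name is illustrative; exact signatures may differ slightly from this sketch):
    SELECT * FROM timescaledb_information.distributed_server_info;
    SELECT * FROM hypertable_server_relation_size('conditions');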
The hypertable_relation_size function includes chunks that were
dropped, which causes a failure when looking up the size of dropped
chunks.
This patch adds a constraint to ignore dropped chunks when determining
the size of the hypertable.
A bug in the SQL for getting the size of chunks would use the
TOAST size of the main/dummy table as the toast size for the
chunks rather than each chunks' own toast size.
Getting an approximate row count for a hypertable involves getting
estimates for all of its chunks rather than just looking up a
single value in the catalog tables. This PR provides a convenience
function for doing the JOINs/summing.
We now use INT64_MAX and INT64_MIN as the max and min values for
dimension_slice ranges. If a dimension_slice has a range_start of
INT64_MIN or the range_end is INT64_MAX, we remove the corresponding
check constraint on the chunk since it signifies that this end of the
range is infinite. Closed ranges now always have INT64_MIN as the
range_start of the first slice and INT64_MAX as the range_end of the
last slice.
Also, points corresponding to INT64_MAX are always
put in the same slice as INT64_MAX-1 to avoid problems with the
semantics that coordinate < range_end.
This change refactors the chunk index handling to make better use
of standard PostgreSQL catalog information, while removing the
hypertable_index metadata table and associated triggers, including
those on the chunk_index table. The chunk_index table itself is
also simplified.
A benefit of this refactoring is that indexes are no longer
created using string mangling to construct the CREATE INDEX command
for a chunk, based on the string definition of the hypertable
index. Instead, indexes are created in C using proper index-related
internal data structures.
Chunk indexes can now also be renamed and are added in the parent
index tablespace. Changing tablespace on a hypertable index also
recurses to chunks, as expected. Default indexes that are added when
creating a hypertable use the hypertable's tablespace.
Creating hypertable indexes with the CONCURRENTLY modifier is
currently blocked, due to unclear semantics regarding concurrent
creation over many tables, including how to deal with snapshots.
The hypertable, chunk, and index size functions are
now split into a main function and a corresponding `pretty`
function. In chunk_relation_size_pretty() the ranges are
now converted into a human-readable form when they are time types.
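For example (hypertable name is illustrative):
    SELECT * FROM chunk_relation_size_pretty('conditions');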