56 Commits

Matvey Arye
2c594ec6f9 Keep catalog rows for some dropped chunks
If a chunk is dropped but has a continuous aggregate that is not
dropped, we want to preserve the chunk catalog row instead of
deleting it. This prevents dangling identifiers in the
materialization hypertable. It also preserves the dimension slice
and chunk constraint rows for the chunk, since those will be
necessary when enabling this with multinode and are needed to
recreate the chunk. The postgres objects associated with the chunk
(table, constraints, indexes) are all dropped.

If data is ever reinserted to the same data region, the chunk is
recreated with the same dimension definitions as before. The postgres
objects are simply recreated.
2019-12-30 09:10:44 -05:00
Matvey Arye
d9d1a44d2e Refactor chunk handling to separate out stub
Previously, the Chunk struct was used to represent both a full
chunk and the stub used for joins. The stub used for joins
only contained valid values for some chunk fields and not others.
After the join determined that a Chunk was complete, it filled
in the rest of the chunk fields. The fact that a chunk could have
only some fields filled out, and different ones at different times,
made the code hard to follow and error prone.

So we separate out the stub state of the chunk into a separate
struct that doesn't contain the unfilled fields.
This leverages the type system to prevent errors that
try to access invalid fields during the join phase and makes
the code easier to follow.
2019-12-06 15:04:51 -05:00
Joshua Lockerman
48ef701fa9 Set toast_tuple_target to 128B when able
We want compressed data to be stored out-of-line whenever possible so
that the headers are colocated and scans on the metadata and segmentbys
are cheap. This commit lowers toast_tuple_target to 128 bytes, so that
more tables will have this occur; with the default size, a
non-trivial portion of the data very often ends up in the main
table, and only a few rows fit on a page.
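
For reference, toast_tuple_target is a standard PostgreSQL storage
parameter; a minimal sketch of the setting being applied, with a
hypothetical compressed-chunk table name:

    ALTER TABLE _timescaledb_internal.compress_hyper_2_4_chunk
        SET (toast_tuple_target = 128);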
2019-10-29 19:02:58 -04:00
Joshua Lockerman
efb131dd6f Add missing tests discovered by Codecov 2
This commit adds tests for DATE, TIMESTAMP, and FLOAT compression and
decompression and for NULL compression and decompression in dictionaries,
and fixes a bug where the database would refuse to decompress DATEs. This
commit also removes the fallback allowing any binary compatible 8-byte
types to be compressed by our integer compressors as I believe I found
a bug in said fallback last time I reviewed it, and cannot recall what
the bug was. These can be re-added later, with appropriate tests.
2019-10-29 19:02:58 -04:00
Sven Klemm
e2df62c81c Fix transparent decompression interaction with first/last
Queries with the first/last optimization on compressed chunks
would not properly decompress data but instead accessed the uncompressed
chunk. This patch fixes the behaviour and also unifies the check
whether a hypertable has compression.
2019-10-29 19:02:58 -04:00
Joshua Lockerman
07841670a7 Fix issues discovered by coverity
This commit fixes issues reported by coverity. Of these, the only real
issue is an integer overflow in bitarray, which can never happen in its
current usages. This also adds a PG_USED_FOR_ASSERTS_ONLY for a
variable only used for Assert.
2019-10-29 19:02:58 -04:00
Matvey Arye
85d30e404d Add ability to turn off compression
Since enabling compression places limits on the hypertable
(e.g. the types of constraints allowed) even if there are no
compressed chunks, we add the ability to turn off compression.
This is only possible if there are no compressed chunks.
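
A minimal sketch of disabling compression, assuming a hypertable named
metrics; the exact option syntax here is an assumption:

    -- only possible while the hypertable has no compressed chunks
    ALTER TABLE metrics SET (timescaledb.compress = false);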
2019-10-29 19:02:58 -04:00
Matvey Arye
2fe51d2735 Improve (de)compress_chunk API
This commit improves the API of compress_chunk and decompress_chunk:

- return the regclass of the chunk processed (or NULL in the
  idempotent case)
- mark the functions as STRICT
- add if_not_compressed/if_compressed options for idempotency
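
A sketch of the resulting call pattern; the chunk name below is
hypothetical:

    -- returns the chunk's regclass, or NULL if it was already compressed
    SELECT compress_chunk('_timescaledb_internal._hyper_1_2_chunk',
                          if_not_compressed => true);
    SELECT decompress_chunk('_timescaledb_internal._hyper_1_2_chunk',
                            if_compressed => true);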
2019-10-29 19:02:58 -04:00
Matvey Arye
92aa77247a Improve minor UIUX
Some small improvements:

- allow ALTER TABLE with an empty segmentby if the original definition
  had an empty segmentby; improve error messages
- block compression on tables with OIDs
- block compression on tables with RLS
2019-10-29 19:02:58 -04:00
Matvey Arye
b8a98c1f18 Make compressed chunks use same tablespace as uncompressed
For tablespaces with compressed chunks the semantics are the following:
 - compressed chunks get put into the same tablespace as the
   uncompressed chunk on compression
 - set tablespace on an uncompressed hypertable cascades to the compressed hypertable+chunks
 - set tablespace on all chunks is blocked (same as without compression)
 - move chunks on an uncompressed chunk errors
 - move chunks on a compressed chunk works

In the future we will:
 - add a tablespace option to the compress_chunk function and policy (this will override the
   setting of the uncompressed chunk), allowing tablespaces to be changed upon compression
 - Note: the current plan is to never listen to the setting on the compressed hypertable. In
   fact, we will block setting tablespace on compressed hypertables
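
A hypothetical illustration of the cascade rule above (hypertable and
tablespace names are made up):

    -- moves the hypertable and cascades to its compressed hypertable + chunks
    ALTER TABLE metrics SET TABLESPACE history_space;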
2019-10-29 19:02:58 -04:00
Joshua Lockerman
91a73c3e17 Set statistics on compressed chunks
The statistics on segmentby and metadata columns are very important as
they affect the decompressed data a thousand-fold. Statistics on the
compressed columns are irrelevant, as the regular postgres planner
cannot understand the compressed columns. This commit sets the
statistics for compressed tables accordingly, weighting the
uncompressed columns heavily and the compressed columns not at all.
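
Roughly, the effect is as if the statistics targets were set like this
on the compressed chunk (illustration only; table and column names are
hypothetical):

    -- segmentby/metadata columns: keep detailed statistics
    ALTER TABLE _timescaledb_internal.compress_hyper_2_4_chunk
        ALTER COLUMN device_id SET STATISTICS 1000;
    -- compressed data columns: the planner cannot use their statistics
    ALTER TABLE _timescaledb_internal.compress_hyper_2_4_chunk
        ALTER COLUMN value SET STATISTICS 0;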
2019-10-29 19:02:58 -04:00
gayyappan
72588a2382 Restrict constraints on compressed hypertables.
Primary key and unique constraints are limited to segment_by and order_by
columns and foreign key constraints are limited to segment_by columns
when creating a compressed hypertable. There are no restrictions on
check constraints.
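
A hypothetical example of a schema that satisfies these restrictions,
where the unique constraint columns are covered by segment_by and
order_by:

    CREATE TABLE metrics (
        time      timestamptz NOT NULL,
        device_id integer,
        value     double precision,
        UNIQUE (device_id, time)
    );
    SELECT create_hypertable('metrics', 'time');
    ALTER TABLE metrics SET (timescaledb.compress,
                             timescaledb.compress_segmentby = 'device_id',
                             timescaledb.compress_orderby   = 'time');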
2019-10-29 19:02:58 -04:00
Matvey Arye
0f3e74215a Split segment meta min_max into two columns
This simplifies the code and the access to the min/max
metadata. Before, we used a custom type; now the min/max values
are simply the same type as the underlying column and are stored
as two columns.

This also removes the custom type that was used before.
2019-10-29 19:02:58 -04:00
Joshua Lockerman
6687189a6c Free memory earlier in decompress_chunk
This was supposed to be part of an earlier commit, but seems to have
been lost. This should reduce peak memory usage of that function.
2019-10-29 19:02:58 -04:00
Joshua Lockerman
64f56d5088 Create indexes on segmentby columns
This commit creates indexes on all segmentby columns of the compressed
hypertable.
2019-10-29 19:02:58 -04:00
gayyappan
909b0ece78 Block updates/deletes on compressed chunks 2019-10-29 19:02:58 -04:00
gayyappan
edd3999553 Add trigger to block INSERT on compressed chunk
Prevent insert on compressed chunks by adding a trigger that blocks it.
Enable insert if the chunk gets decompressed.
2019-10-29 19:02:58 -04:00
Matvey Arye
12929fc813 Use DatumSerialize for binary strings
This is a refactor of the array and dictionary code to use binary
string functions in DatumSerialize to consolidate code. We also
made DatumSerialize more flexible in that it no longer has to use
a byte to store the encoding type (binary or text) but can instead
take that as input. This makes the encoding use less space in the
array case.
2019-10-29 19:02:58 -04:00
Matvey Arye
14f02f423e Switch the array code to use DatumSerializer
This commit switches the array compressor code to using
DatumSerializer/DatumDeserializer to reduce code duplication
and to improve efficiency.
2019-10-29 19:02:58 -04:00
Matvey Arye
300db8594a Fix detoasting bug and add tests
Previously, the detoasting in Array was incorrect, so the compressed
table stored pointers into the toast table of the uncompressed table.
This commit fixes the bug and also adds logic to the test to remove
the uncompressed table so such a bug would cause test failures in
the future.
2019-10-29 19:02:58 -04:00
Joshua Lockerman
fac8eca0b3 Free Memory Earlier in decompress_chunk
This commit alters decompress_chunk to free memory as soon as possible
instead of waiting until the function ends. This should decrease peak
memory usage from roughly the size of the dataset to roughly the size
of a single compressed row.
2019-10-29 19:02:58 -04:00
Joshua Lockerman
0606aeba9e Reduce Peak Memory Usage for compress_chunk
Before this PR, some state (most notably deTOASTed values) would persist
across compressed rows during compress_chunk, despite no longer being
needed. This increased peak memory usage of
compress_chunk. This commit adds a MemoryContext that is reset after
each compressed row is inserted, ensuring that state needed for only
one row does not hang around longer than needed.
2019-10-29 19:02:58 -04:00
Matvey Arye
6465a4e85a Switch to using get_attnum function
This is a fix for a rebase on master since `attno_find_by_attname`
was removed.
2019-10-29 19:02:58 -04:00
Matvey Arye
8250714a29 Add fixes for Windows
- Fix declaration of functions wrt TSDLLEXPORT consistency
- Empty structs need to be created with '{ 0 }' syntax.
- Alignment sentinels have to use uint64 instead of a struct
  with a 0-size member
- Add some more ORDER BY clauses in the tests to constrain
  the order of results
- Add ANALYZE after running compression in
  transparent-decompression test
2019-10-29 19:02:58 -04:00
Matvey Arye
df4c444551 Delete related rows for compression
This fixes deletion of related rows when we have compressed
hypertables. Namely, we delete rows from:

- compression_chunk_size
- hypertable_compression

We also fix hypertable_compression to handle NULLS correctly.

We add a stub for tests with continuous aggs as well as compression,
but that's broken for now so it's commented out. It will be fixed
in another PR.
2019-10-29 19:02:58 -04:00
Matvey Arye
0db50e7ffc Handle drops of compressed chunks/hypertables
This commit adds handling for dropping of chunks and hypertables
in the presence of associated compressed objects. If the uncompressed
chunk/hypertable is dropped, then the associated compressed object is
dropped using DROP_RESTRICT unless cascading is explicitly enabled.

Also add a compressed_chunk_id index on compressed tables for
figuring out whether a chunk is compressed or not.

Change a bunch of APIs to use DropBehavior instead of a cascade bool
to be more explicit.

Also test the drop chunks policy.
2019-10-29 19:02:58 -04:00
Matvey Arye
2bf97e452d Push down quals to segment meta columns
This commit pushes down quals on order_by columns to make
use of the SegmentMetaMinMax objects. Namely, =, <, <=, >, >= quals
can now be pushed down.

We also remove filters from decompress node for quals that
have been pushed down and don't need a recheck.

This commit also changes tests to add more segment by and
order-by columns.

Finally, we rename the segment meta accessor functions to have shorter names.
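
As a hypothetical example, a query like the following on an order-by
column can now skip whole compressed segments whose min/max metadata
rule out a match:

    SELECT * FROM metrics
    WHERE time >= '2019-10-01' AND time < '2019-10-02';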
2019-10-29 19:02:58 -04:00
gayyappan
6e60d2614c Add compress chunks policy support
Add and drop compress chunks policy using bgw
infrastructure.
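
A sketch of using the policy, assuming the 1.x-era function names and a
hypertable named metrics:

    -- compress chunks older than 7 days in the background
    SELECT add_compress_chunks_policy('metrics', INTERVAL '7 days');
    -- and remove the policy again
    SELECT remove_compress_chunks_policy('metrics');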
2019-10-29 19:02:58 -04:00
Matvey Arye
5c891f732e Add sequence id metadata col to compressed table
Add a sequence id to the compressed table. This id increments
monotonically for each compressed row in a way that follows
the order by clause. We leave gaps to allow for the
possibility of filling in rows later, e.g. due to inserts down
the line.

The sequence id is global to the entire chunk and does not reset
on each segmentby-group change, since this has the potential
to allow some micro-optimizations when ordering by segmentby
columns as well.

The sequence number is an INT32, which allows roughly 200 billion
uncompressed rows per chunk to be supported (assuming 1000 rows
per compressed row and a gap of 10: 2^31 / 10 * 1000 is about 214
billion). Overflow is checked in the code and raises an error if
this is breached.
2019-10-29 19:02:58 -04:00
Joshua Lockerman
6d0dfdfe1a Switch Timestamptz to use deltadelta and bugfixes
Timestamptz is an integer-like type, and thus should use deltadelta
encoding by default. Making this change uncovered a bug where RLE was
truncating values on decompression, which has also been fixed.
2019-10-29 19:02:58 -04:00
Matvey Arye
b4a7108492 Integrate segment meta into compression
This commit integrates the SegmentMetaMinMax into the
compression logic. It adds metadata columns to the compressed table
and correctly sets them upon compression.

We also fix several errors with datum detoasting in SegmentMetaMinMax
2019-10-29 19:02:58 -04:00
Matvey Arye
be199bec70 Add type cache
Add a type cache to get the OID corresponding to a particular
defined SQL type.
2019-10-29 19:02:58 -04:00
Joshua Lockerman
2b1e950df3 Store first deltadelta element in simple8b
This commit changes deltadelta compression to store the first element
in the simple8b array instead of out-of-line. Besides shrinking the
data in some cases, this also ensures that the simple8b array is never
empty, fixing the case where only a single element is stored.
2019-10-29 19:02:58 -04:00
Sven Klemm
3d55595ad0 Fix error hint for compress_chunk
The error hint for compress_chunk misspelled the option to use
for enabling compression. This patch changes the error hint and
also makes the hint a proper sentence.
2019-10-29 19:02:58 -04:00
Matvey Arye
b9674600ae Add segment meta min/max
Add the type for min/max segment meta object. Segment metadata
objects keep metadata about data in segments (compressed rows).
The min/max variant keeps the min and max values inside the compressed
object. It will be used on compression order-by columns so that
queries with quals on those columns can exclude entire
segments if no uncompressed row in the segment can match the qual.

We also add generalized infrastructure for datum serialization
/ deserialization for arbitrary types to and from memory as well
as binary strings.
2019-10-29 19:02:58 -04:00
Joshua Lockerman
8b273a5187 Fix flush when num-rows overflow
We should only free the segment-bys when we're changing groups, not when
we've got too many rows to compress; in that case we'll still need them.
2019-10-29 19:02:58 -04:00
Matvey Arye
ea7d2c7e60 Enforce license checks for compression
Enforce enterprise license check for compression. Note: these
checks are now outdated as compression is now a community,
not enterprise feature.
2019-10-29 19:02:58 -04:00
gayyappan
6832ed2ca5 Modify storage type for toast columns
This PR modifies the TOAST storage type for compressed columns based on
the algorithm used for compression.
2019-10-29 19:02:58 -04:00
Matvey Arye
bce292a64f Fix locking when altering compression options
Take an exclusive lock when altering compression options, as it is
safer.
2019-10-29 19:02:58 -04:00
Matvey Arye
0059360522 Fix indexes during compression and decompression
This rebuilds indexes during compression and decompression. Previously,
indexes were not updated during these operations. We also fix
a small bug with orderby and segmentby handling of empty
strings/lists.

Finally, we add some more tests.
2019-10-29 19:02:58 -04:00
Matvey Arye
cdf6fcb69a Allow altering compression options
We now allow changing the compression options on a hypertable
as long as there are no existing compressed chunks.
2019-10-29 19:02:58 -04:00
Matvey Arye
eba612ea2e Add time column to compressed order by list
Add the column to the order by list if it's not already there.
This is never wrong and might improve performance. This
also guarantees that we have at least one ordering column
during compression and therefore can always use tuplesort
(otherwise we'd need a non-tuplesort method of getting tuples).
2019-10-29 19:02:58 -04:00
Matvey Arye
6f22a7a68c Improve parsing of segment by and order by lists
Replace custom parsing of order by and segment by lists
with the postgres parser. The segment by list is now
parsed in the same way as the GROUP BY clause and the
order by list in the same way as the ORDER BY clause.

Also fix default for nulls first/last to follow the PG
convention: LAST for ASC, FIRST for DESC.
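
A hypothetical example of the clause-style syntax this enables; with
the PG convention, 'time DESC' below implies NULLS FIRST:

    ALTER TABLE metrics SET (timescaledb.compress,
                             timescaledb.compress_segmentby = 'device_id',
                             timescaledb.compress_orderby   = 'time DESC, value ASC NULLS LAST');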
2019-10-29 19:02:58 -04:00
Matvey Arye
f6573f9247 Add a metadata count column to compressed table
This is useful if some or all compressed columns are NULL.
The count reflects the number of uncompressed rows that are
in the compressed row. Stored as a 32-bit integer.
2019-10-29 19:02:58 -04:00
Matvey Arye
a078781c2e Add decompress_chunk function
This is the inverse of compress_chunk.
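
A minimal sketch of the call (chunk name hypothetical):

    SELECT decompress_chunk('_timescaledb_internal._hyper_1_2_chunk');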
2019-10-29 19:02:58 -04:00
Sven Klemm
bdc599793c Add helper function to get decompression iterator init function 2019-10-29 19:02:58 -04:00
Matvey Arye
9223f08d68 Truncate chunks after (de-)compression
This commit will truncate the original chunk after compression
or decompression.
2019-10-29 19:02:58 -04:00
Matvey Arye
5bdb29b8f7 Fix compression for PG96
Fixes some compilation and test errors.
2019-10-29 19:02:58 -04:00
gayyappan
1f4689eca9 Record chunk sizes after compression
Compute chunk size before/after compressing a chunk and record in
catalog table.
2019-10-29 19:02:58 -04:00
gayyappan
44941f7bd2 Add UI for compress_chunks functionality
Add support for compress_chunks function.

This also adds support for compress_orderby and compress_segmentby
parameters in ALTER TABLE. These parameters are used by the
compress_chunks function.

The parsing code will most likely be changed to use the PG raw_parser
function.
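
A sketch of the resulting workflow, with hypothetical names; the
chunk-level call is shown with the singular compress_chunk used
elsewhere in this log:

    ALTER TABLE metrics SET (timescaledb.compress,
                             timescaledb.compress_segmentby = 'device_id',
                             timescaledb.compress_orderby   = 'time DESC');
    SELECT compress_chunk(c) FROM show_chunks('metrics') c;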
2019-10-29 19:02:58 -04:00