rocksdb/unreleased_history
Jay Huh cc487ba367 Fix Compaction Stats for Remote Compaction and Tiered Storage (#13464)
Summary:
## Background

Compaction statistics are collected at various levels across different classes and structs.

* `InternalStats::CompactionStats`: Per-level Compaction Stats within a job (can be at subcompaction level which later get aggregated to the compaction level)
* `InternalStats::CompactionStatsFull`: Contains two per-level compaction stats - `output_level_stats` for primary output level stats and `proximal_level_stats` for proximal level stats. Proximal level statistics are only relevant when using Tiered Storage with the per-key placement feature enabled.
* `InternalStats::CompactionOutputsStats`: Simplified version of `InternalStats::CompactionStats`. Only has a subset of fields from `InternalStats::CompactionStats`
* `CompactionJobStats`: Job-level Compaction Stats. (can be at subcompaction level which later get aggregated to the compaction level)

Please note that some fields in Job-level stats are not in Per-level stats and they don't map 1-to-1 today.

## Issues

* In non-remote compactions, proximal level compaction statistics were not being aggregated into job-level statistics. Job level statistics were missing stats for proximal level for tiered storage compactions with per-key-replacement feature enabled.
* During remote compactions, proximal level compaction statistics were pre-aggregated into job-level statistics on the remote side. However, per-level compaction statistics were not part of the serialized compaction result, so that primary host lost that information and weren't able to populate `per_key_placement_comp_stats_` and `internal_stats_.proximal_level_stats` properly during the installation.
* `TieredCompactionTest` was only checking if (expected stats > 0 && actual stats > 0) instead actual value comparison

## Fixes

* Renamed `compaction_stats_` to `internal_stats_` for `InternalStats::CompactionStatsFull` in `CompactionJob` for better readability
* Removed the usage of `InternalStats::CompactionOutputsStats` and consolidated them to `InternalStats::CompactionStats`.
* Remote Compactions now include the internal stats in the serialized `CompactionServiceResult`. `output_level_stats` and `proximal_level_stats` get later propagated in sub_compact output stats accordingly.
* `CompactionJob::UpdateCompactionJobStats()` now takes `CompactionStatsFull` and aggregates the `proximal_level_stats` as well
* `TieredCompactionTest` is now doing the actual value comparisons for input/output file counts and record counts. Follow up is needed to do the same for the bytes read / written.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/13464

Test Plan:
Unit Tests updated to verify stats

```
./compaction_service_test
```
```
./tiered_compaction_test
```

Reviewed By: pdillinger

Differential Revision: D71220393

Pulled By: jaykorean

fbshipit-source-id: ad70bffd9614ced683f90c7570a17def9b5c8f3f
2025-03-18 16:28:18 -07:00
..

Adding release notes
--------------------

When adding release notes for the next release, add a file to one of these
directories:

unreleased_history/new_features
unreleased_history/behavior_changes
unreleased_history/public_api_changes
unreleased_history/bug_fixes

with a unique name that makes sense for your change, preferably using the .md
extension for syntax highlighting.

There is a script to help, as in

$ unreleased_history/add.sh unreleased_history/bug_fixes/crash_in_feature.md

or simply

$ unreleased_history/add.sh

will take you through some prompts.

The file should usually contain one line of markdown, and "* " is not
required, as it will automatically be inserted later if not included at the
start of the first line in the file. Extra newlines or missing trailing
newlines will also be corrected.

The only times release notes should be added directly to HISTORY are if
* A release is being amended or corrected after it is already "cut" but not
tagged, which should be rare.
* A single commit contains a noteworthy change and a patch release version bump


Ordering of entries
-------------------

Within each group, entries will be included using ls sort order, so important
entries could start their file name with a small three digit number like
100pretty_important.md.

The ordering of groups such as new_features vs. public_api_changes is
hard-coded in unreleased_history/release.sh


Updating HISTORY.md with release notes
--------------------------------------

The script unreleased_history/release.sh does this. Run the script before
updating version.h to the next development release, so that the script will pick
up the version being released. You might want to start with

$ DRY_RUN=1 unreleased_history/release.sh | less

to check for problems and preview the output. Then run

$ unreleased_history/release.sh

which will git rm some files and modify HISTORY.md. You still need to commit the
changes, or revert with the command reported in the output.


Why not update HISTORY.md directly?
-----------------------------------

First, it was common to hit unnecessary merge conflicts when adding entries to
HISTORY.md, which slowed development. Second, when a PR was opened before a
release cut and landed after the release cut, it was easy to add the HISTORY
entry to the wrong version's history. This new setup completely fixes both of
those issues, with perhaps slightly more initial work to create each entry.
There is also now an extra step in using `git blame` to map a release note
to its source code implementation, but that is a relatively rare operation.