mirror of
https://github.com/apple/foundationdb.git
synced 2025-05-14 09:58:50 +08:00
* blob: read TenantMap during recovery

Future functionality in the blob subsystem will rely on the tenant data
being loaded. This fixes the issue by loading the tenant data before
completing recovery, so that continued actions on existing blob granules
have access to the tenant data.

Example scenario with failover, where splits are restarted before the
tenant data is loaded:

BM - BlobManager
epoch 3:                        epoch 4:
BM record intent to split.
Epoch fails.
                                BM recovery begins.
BM fails to persist split.
                                BM recovery finishes.
                                BM.checkBlobWorkerList()
                                  maybeSplitRange().
                                BM.monitorClientRanges().
                                  loads tenant data.

bin/fdbserver -r simulation -f tests/slow/BlobGranuleCorrectness.toml \
    -s 223570924 -b on --crash --trace_format json

* blob: add tuple key truncation for blob granule alignment

FDB has a backup system available using the blob manager and blob
granule subsystem. If we want to audit the data in the blobs, it's a lot
easier if we can align them to something meaningful.

When a blob granule is being split, we ask the storage metrics system
for split points, as it holds approximate data distribution metrics.
These keys are then processed to determine whether they are a tuple and
should be truncated according to the new knob,
BG_KEY_TUPLE_TRUNCATE_OFFSET.

Here we keep all aligned keys together in the same granule even if it is
larger than the allowed granule size. The following commit will address
this by adding merge boundaries.

* blob: minor cleanups in merging code

1. Rename mergeNow -> seen. This is more in line with clocksweep naming
   and removes the confusion between mergeNow and canMergeNow.
2. Make clearMergeCandidate() reset to MergeCandidateCannotMerge to make
   a clear distinction about what we're accomplishing.
3. Rename canMergeNow() -> mergeEligible().

* blob: add explicit (hard) boundaries

Blob ranges can be specified either through explicit ranges or at the
tenant level. Right now this is managed implicitly. This commit aims to
make it a little more explicit.
Blobification begins in monitorClientRanges(), which parses either the
explicit blob ranges or the tenant map. As we do this and add new
ranges, let's explicitly track what is a hard boundary and what isn't.

When blob merging occurs, we respect this boundary. When a hard boundary
is encountered, we submit the eligible ranges found so far and start
looking for a new range beginning with this hard boundary.

* blob: create BlobGranuleSplitPoints struct

This is a setup for the following commit. Our goal here is to provide a
structure for split points to be passed around. We need to be able to
carry uncommitted state until it is committed, at which point we can
apply these mutations to the in-memory data structures.

* blob: implement soft boundaries

An earlier commit establishes the need to create data boundaries within
a tenant. The reality is we may encounter a set of keys that degenerate
to the same key prefix. We'll need to be able to split those across
granules, but we want to ensure we merge the split granules together
before merging with other granules.

This adds new BlobGranuleMergeBoundary items to the
BlobGranuleSplitPoints state. BlobGranuleMergeBoundary contains state
saying whether it is a left or right boundary. Like hard boundaries,
this information is used to force merging of like granules first. We
read the BlobGranuleMergeBoundary map into memory at recovery.
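
For illustration only, the hard-boundary rule described for merging
(submit the eligible ranges found so far, then start looking again from
the boundary) might be sketched as below. This is a minimal sketch, not
the BlobManager code: the Granule struct, its fields, and findMergeRuns()
are hypothetical names invented for this example, and the rule that an
ineligible granule also ends a run is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified view of a granule; not an FDB type.
struct Granule {
    std::string beginKey;
    bool mergeEligible;     // granule is currently a merge candidate
    bool hardBoundaryBegin; // beginKey is an explicit-range or tenant boundary
};

// Returns inclusive [first, last] index pairs of adjacent granules that
// could be merged together. A run never crosses a hard boundary and must
// contain at least two granules. Input must be sorted by beginKey.
std::vector<std::pair<std::size_t, std::size_t>> findMergeRuns(const std::vector<Granule>& granules) {
    assert(std::is_sorted(granules.begin(), granules.end(), [](const Granule& a, const Granule& b) {
        return a.beginKey < b.beginKey;
    }));

    std::vector<std::pair<std::size_t, std::size_t>> runs;
    std::size_t runStart = 0;
    std::size_t runLen = 0;

    // Record the current run if it is worth merging, then reset it.
    auto submit = [&]() {
        if (runLen >= 2) {
            runs.emplace_back(runStart, runStart + runLen - 1);
        }
        runLen = 0;
    };

    for (std::size_t i = 0; i < granules.size(); ++i) {
        const Granule& g = granules[i];
        if (g.hardBoundaryBegin) {
            // Submit what was found so far; a new run may begin at this boundary.
            submit();
        }
        if (!g.mergeEligible) {
            // Assumption in this sketch: an ineligible granule also ends the run.
            submit();
            continue;
        }
        if (runLen == 0) {
            runStart = i;
        }
        ++runLen;
    }
    submit(); // flush the final run
    return runs;
}
```

With six granules starting at "a", "c", "f", "h", "k", "m", where "a" and
"f" are hard boundaries and only "k" is ineligible, the sketch reports
two runs, ["a", "c"] and ["f", "h"]. The granule at "m" is left alone
because a single granule has nothing to merge with.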