Instead, in order to enforce the maximum fault tolerance for snapshots,
update getStorageWorkers to return the number of unavailable storage
servers (instead of throwing an error when unavailable storage servers
exist).
Passing nullptr to these methods is invalid, but previously the
signature didn't indicate this. We previously needed to pass pointers
due to actor compiler restrictions, but these restrictions no longer
apply.
* add storagemetadata
* add StorageWiggler;
* fix serverMetadataKey bug
* add metadata tracker in storage tracker
* finish StorageWiggler
* update next storage ID
* change pid to server id
* write metadata when seed SS
* add status json fields
* remove pid based ppw iteration
* fix time expression
* fix tss metadata nonexistence; fix transaction retry when retrieving metadata
* fix checkMetadata bug when store type is wrong
* fix remove storage status json
* format code
* refactor updateNextWigglingStoragePID
* seperate storage metadata tracker and store type tracker
* rename pid
* wiggler stats
* fix completion between waitServerListChange and storageRecruiter
* solve review comments
* rename system key
* fix database lock timeout by adding lock_aware
* format code
* status json
* resolve code format/naming comments
* delete expireNow; change PerpetualStorageWiggleID's value to KeyBackedObjectMap<UID, StorageWiggleValue>
* fix omit start rount
* format code
* status json reset
* solve status json format
* improve status json latency; replace binarywriter/reader to objectwriter/reader; refactor storagewigglerstats transactions
* status timestamp
Patch updates code documentation to reflect the recent code
refactoring where ClusterController process drives recovery
instead of sequencer/master process.
* Trigger buildTeam operation if server transition from unhealthy -> healthy
DataDistribution actor helps in building teams as server count changes
(add/removal), however, it is possible that total_healthy_server count
is insufficient to allow team formation. If happens, even healthy server
count recover, the buildTeam operation will not be triggered.
Patch proposal is to trigger `checkBuildTeam` operation if server
transitions from unhealthy -> healthy state. Incase system already
has created enough teams (desiredTeamCount/maxTeamCount), the operation
incurs a very minimal cost.