* Implemented AuditUtils.actor.cpp
Moved AuditUtils to fdbserver/
* Persist AuditStorageState.
* Passed persisted AuditStorageState test.
* Added audit_storage_error to indicate a corruption is caught.
Throw/Send audit_storage_error when there is a data corruption.
Added doAuditStorage() for resuming Audit.
* Load and resume AuditStorage when DD restarts.
* Generate audit id monotonically.
* Fixed a minor issue where AuditId/Type was not set.
* Added getLatestAuditStates.
* Improved persisted errors and added AuditStorageCommand.actor.cpp for
fdbcli.
* Added `audit_storage` fdbcli command.
* fmt.
* Fixed null shared_ptr issue.
* Improved audit data.
* Changed DDAuditFailed to SevWarn.
* Sev.
* Set SERVE_AUDIT_STORAGE_PARALLELISM to 1.
* Moved AuditUtils* to fdbclient/.
* Added getAuditStatus fdbcli command.
* Refactor audit storage fdb cli commands.
* Added auditStorage in sim.
* Cleanup.
* Resolved comments.
* Resolved comments.
* Test disabling audit for sims.
* Cleanup.
Co-authored-by: He Liu <heliu@apple.com>
* Improved SHARD_ENCODE_LOCATION_METADATA migration.
* Cleanup.
* A data move now cancels itself if it finds a conflicting data move.
Fixed a transaction reset issue.
* Cancel data move in a retry loop to avoid corrupted mutations.
Co-authored-by: He Liu <heliu@apple.com>
Previously, with encryption at rest (EaR), we always enabled authentication (e.g. when encrypting Redwood pages). Authentication acts as a form of checksum, so a dedicated page checksum was not needed. This PR adds back the xxhash page checksum for the case where authentication is disabled. It also changes the knob so that authentication is disabled by default.
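The selection rule described above can be sketched roughly as follows. This is only an illustration of the decision, not the Redwood code: `EncryptionKnobs`, `PageIntegrity`, and `choosePageIntegrity` are hypothetical names, and the real knob and page-header types are not shown in this message.

```cpp
// Hypothetical stand-in for the knob mentioned above; the real knob name
// is not given in this message. The default mirrors the new behavior:
// authentication is disabled unless explicitly turned on.
struct EncryptionKnobs {
    bool authEnabled = false;
};

// Hypothetical labels for how an encrypted page's integrity is verified.
enum class PageIntegrity {
    AuthToken, // the authentication data doubles as the checksum
    XXHash     // dedicated xxhash page checksum
};

// With authentication enabled, no separate checksum is needed. With it
// disabled, fall back to the xxhash page checksum.
inline PageIntegrity choosePageIntegrity(const EncryptionKnobs& knobs) {
    return knobs.authEnabled ? PageIntegrity::AuthToken : PageIntegrity::XXHash;
}
```

With the default-constructed knobs, the sketch selects the xxhash checksum, matching the new default.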
Bug behavior:
When DD has zero healthy machine teams but more unhealthy machine teams
than the maximum number of machine teams it plans to build, DD stops
building new machine teams. With zero healthy machine teams (and zero
healthy server teams), DD cannot find a healthy destination team to
relocate data to. Once data relocation stops, exclusion stops progressing
and gets stuck.
The bug happens when we *shrink* a k-host cluster by
first adding k/2 new hosts;
then quickly excluding all old hosts.
Fix:
Let DD build temporary extra teams to relocate data.
The extra teams will be cleaned up later by DD's remove extra teams logic.
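
The before/after team-building decision can be sketched as below. This is a simplified model under stated assumptions, not the DD implementation: `MachineTeamCounts`, `teamsToBuildBeforeFix`, `teamsToBuildAfterFix`, and the `extraTeams` parameter are hypothetical, and the model assumes that unhealthy teams count against the team budget, as the bug description implies.

```cpp
#include <cassert>

// Hypothetical snapshot of DD's machine-team bookkeeping. Field names are
// illustrative and are not taken from the FoundationDB source.
struct MachineTeamCounts {
    int healthy;   // machine teams with only healthy members
    int unhealthy; // machine teams with at least one unhealthy member
    int maxTeams;  // maximum number of machine teams DD plans to build
};

// Old behavior (assumed model): every existing team, healthy or not, counts
// against the budget, so a surplus of unhealthy teams blocks new builds.
inline int teamsToBuildBeforeFix(const MachineTeamCounts& c) {
    assert(c.healthy >= 0 && c.unhealthy >= 0 && c.maxTeams >= 0);
    int remaining = c.maxTeams - (c.healthy + c.unhealthy);
    return remaining > 0 ? remaining : 0;
}

// New behavior (assumed model): if the budget is exhausted and no healthy
// team exists, build temporary extra teams so data relocation can proceed.
// extraTeams is a hypothetical parameter; the real count is not stated here.
inline int teamsToBuildAfterFix(const MachineTeamCounts& c, int extraTeams) {
    int planned = teamsToBuildBeforeFix(c);
    if (planned == 0 && c.healthy == 0) {
        return extraTeams;
    }
    return planned;
}
```

For example, with 0 healthy teams, 12 unhealthy teams, and a budget of 10, the old rule plans no new teams, while the new rule plans the extra teams. Removing them afterwards is left to DD's existing remove-extra-teams logic and is not modeled here.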
Simulation test:
There is no simulation test that covers the cluster expansion scenario.
To simulate this behavior as closely as possible, we intentionally
overbuild all possible machine teams to trigger the condition where the
number of unhealthy teams exceeds the maximum number of teams DD wants to
build later.