Specifying the `--no-config-db` option when changing coordinators
through fdbcli will prevent the command from hanging when the
configuration database is not active. Failing to specify this option
when the configuration database is not active will not affect the
correctness of the command, but it will hang instead of returning.
Configuration database data lives on the coordinators. When a change
coordinators command is issued, the data must be sent to the new
coordinators to keep the database consistent.
* Encryption data at-rest db-config
Description
diff-1: Handle 'force' updates to encryption_at_rest db-config
Major changes proposed:
1. Introduce 'encryption_data_at_rest_mode" 'configure new'
option to enable Encryption data at-rest. The feature is disabled
by default.
2. The configuration is meant to be set at the time of database
creation, addition checks will be done to avoid updating the config
in subsequent PR.
3. DatabaseConfiguration validity check to account for "tenant_mode"
set to `required` if Encryption data at-rest is selected given
EncryptionDomain matches Tenant boundaries.
Testing
devCorrectness - 100K
* Enable configuring the next future protocol version as the current protocol version in FDB client, fdbserver, and fdbcli
* Auto format python files used in upgrade tests
* Add a test for upgrading to a future FDB version
* Emphasize that the options for using future protocol version are intended for test purposes only
* Make the global variable for current protocol version visible only locally
* Refactirng to avoid using currentProtocolVersion() in static intialization
* Update go bindings
There was a rare but possible issue where a change coordinators request
could be in flight, but reading `\xff/coordinators` would return the old
set of coordinators, even though a previous attempt to write it returned
`commit_unknown_result`. This happened if there was an unrelated
recovery between setting `\xff/coordinators` and the updated
coordinators reaching each machine. The `commit_unknown_result` caused
by this recovery meant the change coordinators retry loop would read
`\xff/coordinators`, but since the request was ongoing it would read the
old set of coordinators. The fix is to have the cluster controller
ignore any change coordinators request from an old generation, meaning
when the retry loop reads the old coordiantors from `\xff/coordinators`,
it is guaranteed that the in progress change coordinators message will
be rejected when arriving at the cluster controller.
* proof of concept
* use code-probe instead of test
* code probe working on gcc
* code probe implemented
* renamed TestProbe to CodeProbe
* fixed refactoring typo
* support filtered output
* print probes at end of simulation
* fix missed probes print
* fix deduplication
* Fix refactoring issues
* revert bad refactor
* make sure file paths are relative
* fix more wrong refactor changes
This change adds:
* ability to store the mapping from tenants to quota in the system keyspace,
* a setter and getter function
* a new workload to test this functionality
FDBCORE-2437
* Fix comments
* Add simulation value for SERVER_KNOBS->SNAP_CREATE_MAX_TIMEOUT
* A work version with correctness clean
* Remove unnecessay comments; debugging symbols
* Only check secondary address for coordinators, same as before
* Change the trace to SevError and remove the ASSERT(false)
* Remove TLogSnapRequest handling on TlogServer, which is changed to use WorkerSnapRequest
* Add retry for network failures
* Add retry limit for network failures; still allow duplicate snapshots on processes are both tlog and storage to avoid race
* Add retry limit as a knob and make backoff exponentail
* Add getDatabaseConfiguration(Transaction* tr)
* revert back to send request for each role once
* update some comments
This is to fix an issue when recovery and change coordinator key happens
together. The issue will occur when:
1. Recovery starts
2. Coordinator key change transaction started
3. During the recovery the coordinator key is read from cluster file and
stored in the storage server
4. The cluster controller received `ChangeCoordiatorsRequest`, and
updated the cluster name with the new value.
at this stage, the value related to coordinator key in storage server and
the worker is inconsistent.
5. changeQuorumChecker is called, which will verify such consistency.
Since they are different, the call is returning failure and the
caller, which could be a TEST_CASE, fails.
This is a rare race issue, and it is also noticed that when the
recovery/coordinator key change process is done, the database is in a
proper state which allows changeQuorumChecker behave properly. In this
case, a retry mechanism should be sufficiently fix corresponding test
failures.
* coordinatorsKey should not storing IP addresses.
Currently, when we do a commit of coordinator change, we are always converting hostnames to IP addresses and store the converted results in coordinatorsKey (\xff/coordinators). This result in ForwardRequest also sending IP addresses, and receivers will update their cluster files with IPs, then we lose the dynamic IP feature.
* Remove the legacy coordinators() function.
* Update async_resolve().
ip::basic_resolver::async_resolve(const query & q, ResolveHandler && handler) is deprecated.
* Clean code format.
* Fix typo.
* Remove SpecifiedQuorumChange and NoQuorumChange.
MacOS warnings are format warnings, e.g., `format specifies type 'long' but the argument has type 'Version' (aka 'long long')`.
Windows warnings are `ACTOR does not contain a wait() statement`.