The current code uses one restore interface to handle the work
for all restore roles, i.e., master, loader and applier.
This makes it harder to review or maintain or scale.
This commit split the restore into multiple roles by mimicing FDB
transaction system:
1) It uses a RestoreWorker as the process to host restore roles;
This commit assumes one restore role per RestoreWorker; but
it should be easy to extend to support multiple roles per RestoreWorker;
2) It creates 3 restore roles:
RestoreMaster: Coordinate the restore process and send commands to the other two roles;
RestoreLoader: Parse backup files to mutations and send mutations to appliers;
RestoreApplier: Sort received mutations and apply them to DB in order.
Compilable version. To be tested in correctness.
- NetworkAddress now contains IPAddress object which can be either
IPv4 or IPv6 address. 128bits are used even for IPv4 addresses,
however only 32bits are used when using/serializing IPv4 address.
- ConnectPacket is updated to store IPv6 address. Backward compatible
with old format since the first 32bits of IP address field is used
for serialization of IPv4.
- Mainly updates rest of the code to use IPAddress structure instead
of plain uint32_t.
- IPv6 address/pair ports should be represented as `[ip]:port` as per
convention. This applies to both cluster files and command line
arguments.
This fixed a bug found by upgrade test, where the configuration monitor of the
data distributor was monitoring excludedServersVersionKey, which doesn't
change in ChangeConfig workload. As a result, data distributor was not aware of
configuration changes.
Adding this new key and make sure this key is updated in configuration changes
so that the monitor can detect configuration changes.
Applier may crash in applying mutations.
Node crash may make master waits infinitely for the reply from all nodes.
Change waitForAll semantics to waitForAny when waiting for the appliers response for applying mutations to DB
This is a workaround. The long-term solution should handle the failure in a better way
add data structure to track the status of each node
add logic to let master node assign role to loader and applier
make sure the command request and reply is correct
Remove the use of relative paths. A header at foo/bar.h could be included by
files under foo/ with "bar.h", but would be included everywhere else as
"foo/bar.h". Adjust so that every include references such a header with the
latter form.
Signed-off-by: Robert Escriva <rescriva@dropbox.com>
* Detail names now all start with an uppercase character and contain no underscores. Ideally these should be head-first camel case, though that was harder to check.
* Type names have the same rules, except they allow one underscore (to support a usage pattern Context_Type). The first character after the underscore is also uppercase.
* Use seconds instead of milliseconds in details.
Added a check when events are logged in simulation that logs a message to stderr if the first two rules above aren't followed.
This probably doesn't address every instance of the above problems, but all of the events I was able to hit in simulation pass the check.