Refactor: ClusterController driving cluster-recovery state machine

diff-1: Address Jingyu's review comments

At present, the cluster recovery process consists of the following steps:
1. The ClusterController clusterWatchDatabase actor recruits the
   master/sequencer process.
2. The Sequencer process implements the cluster recovery state machine,
   which is responsible for recruiting all other processes as well as
   restoring the cluster state.

This patch proposes a scheme in which the cluster recovery state machine
is implemented and driven by the ClusterController process instead
of the Sequencer process.

Advantages of the scheme:
1. Simplified design: the ClusterController recruits the "sequencer"
   process like any other worker process, whereas in the current scheme
   the "sequencer" process gets special treatment. In the new scheme
   the sequencer is responsible for maintaining/providing the
   "committed version" (as expected).
2. The ClusterController is already responsible for recruiting worker
   processes. In the current scheme the sequencer, though it
   orchestrates the recovery state machine, needs to reach out to the
   ClusterController to recruit worker processes etc. The new scheme
   removes this indirection.
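
The proposed division of responsibilities can be illustrated with a
minimal sketch. It is not the actual FoundationDB code: the state names,
the role list and the Controller type are hypothetical, and the real
recovery state machine is a Flow actor with more states. The sketch only
shows a single controller object that both drives the state machine and
recruits the sequencer through the same path as every other role.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, heavily simplified recovery states (assumption, not the real enum).
enum class RecoveryState { Start, Recruiting, Restoring, Done };

// Hypothetical stand-in for the ClusterController.
struct Controller {
    RecoveryState state = RecoveryState::Start;
    std::vector<std::string> recruited; // roles recruited so far, in order

    // One recruitment path for every role, including the sequencer.
    void recruit(const std::string& role) { recruited.push_back(role); }

    // Advance the state machine by one step; returns false once recovery is done.
    bool step() {
        switch (state) {
        case RecoveryState::Start:
            state = RecoveryState::Recruiting;
            return true;
        case RecoveryState::Recruiting: {
            // Illustrative role list; the sequencer gets no special treatment.
            const std::vector<std::string> roles = {
                "sequencer", "tlog", "commit_proxy", "grv_proxy", "resolver"
            };
            for (const std::string& role : roles) {
                recruit(role);
            }
            state = RecoveryState::Restoring;
            return true;
        }
        case RecoveryState::Restoring:
            // Restoring the cluster state would happen here.
            state = RecoveryState::Done;
            return true;
        case RecoveryState::Done:
            return false;
        }
        return false;
    }

    // The controller itself drives recovery to completion.
    void runRecovery() {
        while (step()) {
        }
    }
};
```

Constructing a Controller and calling runRecovery() walks through all
states in order. No second process has to call back into the controller
to request recruitment.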

NOTE:
The patch moves the recovery state machine code from the
'sequencer' process to the 'cluster-controller' process. In addition
to the move, necessary updates were made for both functionality and
performance improvement reasons.

Next Steps:
Cluster recovery documentation will be updated in the near future.
Ata E Husain Bohra 2021-11-30 17:45:28 -08:00 committed by Aaron Molitor
parent dfe9d184ff
commit abd2959702
3 changed files with 3 additions and 3 deletions


@@ -6179,7 +6179,7 @@ void clusterRegisterMaster(ClusterControllerData* self, RegisterMasterRequest co
 if (db->clientInfo->get().commitProxies != req.commitProxies ||
 db->clientInfo->get().grvProxies != req.grvProxies) {
 TraceEvent("PublishNewClientInfo", self->id)
-.detail("Master", dbInfo.master.id().toString())
+.detail("Master", dbInfo.master.id())
 .detail("GrvProxies", db->clientInfo->get().grvProxies)
 .detail("ReqGrvProxies", req.grvProxies)
 .detail("CommitProxies", db->clientInfo->get().commitProxies)


@@ -934,7 +934,7 @@ ACTOR Future<Void> grvProxyServerCore(GrvProxyInterface proxy,
 addActor.send(traceRole(Role::GRV_PROXY, proxy.id()));
 TraceEvent("GrvProxyServerCore", proxy.id())
-.detail("MasterId", master.id().toString())
+.detail("MasterId", master.id())
 .detail("MasterLifetime", masterLifetime.toString())
 .detail("RecoveryCount", db->get().recoveryCount);


@@ -2020,7 +2020,7 @@ ACTOR Future<Void> doQueueCommit(TLogData* self,
 logData->recoveryComplete.send(Void());
 }
-TraceEvent("TLogCommitDurable", self->dbgid).detail("Version", ver);
+//TraceEvent("TLogCommitDurable", self->dbgid).detail("Version", ver);
 if (logData->logSystem->get() &&
 (!logData->isPrimary || logData->logRouterPoppedVersion < logData->logRouterPopToVersion)) {
 logData->logRouterPoppedVersion = ver;