This is the first part of making `TraceEvent` cheaper. The main idea is
to defer calls to any code that formats strings. These are the main
changes:
- TraceEvent::detail now takes a c-string instead of std::string for
  literals. This prevents unnecessary allocations if the trace is not
  going to be printed in the first place (for example for SevDebug).
  Previously `detail` expected a `std::string` as the key, which meant
  that any string literal was copied on each call.
- New templates Traceable and SpecialTraceMetricType. These templates
  can be specialized for any type that needs to be printed, and the
  actual formatting is deferred until after the `enabled` check (see the
  sketch after this list). This provides two benefits: (1) if a
  TraceEvent is disabled, we don't pay for the formatting, and (2)
  TraceEvent can trace types that it doesn't know about.
- TraceEvent::enabled will be set in the constructor if the Severity is
  passed, which makes sure that `TraceEvent::init` is not called for
  disabled events.
- `TraceEvent::detail` will be inlined, so for disabled TraceEvents a
  call to detail only introduces an if-branch, which is much cheaper
  than a function call.
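
A minimal sketch of the deferred-formatting pattern, assuming a
simplified event type (the Traceable name mirrors the real template, but
the bodies here are illustrative, not the actual FoundationDB code):

    #include <cstdio>
    #include <string>
    #include <type_traits>

    // Sketch: Traceable is specialized per type; toString runs only
    // after the enabled check, so disabled events never format.
    template <class T, class Enable = void>
    struct Traceable : std::false_type {};

    struct ShardRange { int begin, end; };   // example user type

    template <>
    struct Traceable<ShardRange> : std::true_type {
        static std::string toString(const ShardRange& r) {
            return std::to_string(r.begin) + "-" + std::to_string(r.end);
        }
    };

    // Hypothetical simplified event: detail() is inlined, so a
    // disabled event costs one branch, with no formatting or allocation.
    struct Event {
        bool enabled;
        void write(const char* key, const std::string& value) {
            std::printf("%s=%s ", key, value.c_str());
        }
        template <class T>
        Event& detail(const char* key, const T& value) {
            if (enabled)                                   // the only cost
                write(key, Traceable<T>::toString(value)); // deferred work
            return *this;
        }
    };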
A rare race condition:
-r simulation -f ./foundationdb/tests/slow/WriteDuringReadAtomicRestore.txt -s 114256311 -b on
- A is the ratekeeper.
- CC recruits B, and B starts.
- CC halts ratekeeper A, and A is halted.
- A registers back with CC, which then halts B and sets A to be the ratekeeper.
CC starts recruiting and finds that A is the best machine, but skips recruiting
because CC thinks A is already in use. Now the cluster is left with no ratekeeper.
Fix by disallowing ratekeeper registration with a previous ID, as sketched below.
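
A hypothetical sketch of the fix (the state layout and names are
assumptions for illustration; the real cluster controller code differs):

    #include <cstdint>
    #include <optional>
    #include <set>

    using UID = uint64_t; // stand-in for FDB's UID type

    struct RatekeeperRegistry {
        std::optional<UID> ratekeeperID; // the ratekeeper CC trusts
        std::set<UID> halted;            // IDs CC has already halted

        // A re-registration carrying a previously halted ID is refused,
        // so a halted ratekeeper cannot displace its successor.
        bool tryRegister(UID id) {
            if (halted.count(id))
                return false;
            ratekeeperID = id;
            return true;
        }
    };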
CC may think the master failed and clear the master PID, which can block both
data distributor and ratekeeper recruitment. Fix by restoring the PID during
worker registration.
While waiting to recruit a data distributor or ratekeeper, a previous one may
have already joined, in which case the unnecessary recruiting can be skipped
(see the check sketched below).
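
Illustratively (names hypothetical), the recruitment path re-checks for
an existing ratekeeper after waiting:

    #include <cstdint>
    #include <optional>

    using UID = uint64_t;                  // stand-in for FDB's UID type

    struct CCState {
        std::optional<UID> ratekeeperID;   // set when one registers
    };

    void startRecruitment(CCState&);       // hypothetical helper

    // After the wait, a previous ratekeeper may have joined already;
    // in that case the recruitment is skipped entirely.
    void maybeRecruitRatekeeper(CCState& self) {
        if (self.ratekeeperID.has_value())
            return;                        // one already joined; skip
        startRecruitment(self);
    }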
Revert the change to worker.actor.cpp for ratekeeper. Instead, ratekeeper
recruitment should avoid any process that already has one. This fixes a bug
where the ratekeeper interface became a zombie, killing other healthy
ratekeepers while doing no useful work. Found by:
-r simulation --crash -f tests/fast/WriteDuringRead.txt -s 31858110 -b on
When a ratekeeper registers, monitorRatekeeper wakes up and recruits a new
ratekeeper. Add a 0s delay to avoid this (illustrated below).
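
A hedged illustration with a generic task queue (flow's delay(0) yields
in a similar way, letting the already-queued registration be processed
before the recruitment decision):

    #include <functional>
    #include <queue>

    std::queue<std::function<void()>> tasks;

    void post(std::function<void()> fn) { tasks.push(std::move(fn)); }

    int main() {
        bool registered = false;
        // The ratekeeper's registration is already queued...
        post([&] { registered = true; });
        // ...so deferring the monitor's decision by a zero-length delay
        // lets it observe the registration and skip recruiting.
        post([&] { if (!registered) { /* would recruit here */ } });
        while (!tasks.empty()) {
            auto f = std::move(tasks.front());
            tasks.pop();
            f();
        }
    }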
If a ratekeeper is recruited on an existing machine, update the interface so
that the cluster controller can clear the ratekeeperID.
If DD, RK, and Master all run on the same process and that process fails,
recruiting a new DD or RK could try to use the old master's worker interface,
which is invalid and causes recruitment to get stuck.
Fix by adding a delay and checking that the master is valid before recruitment
(sketched below).
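
Sketch of the shape of the fix (the delay length and field names are
assumptions for illustration):

    #include <cstdint>
    #include <optional>

    using UID = uint64_t;            // stand-in for FDB's UID type

    struct MasterInfo {
        UID id;
        bool valid;                  // false once its process has failed
    };

    void waitSeconds(double);        // stand-in for flow's delay()

    // Delay, then confirm the master interface is still valid before
    // using its worker interface for DD/RK recruitment, instead of
    // reusing a stale interface from the failed process.
    bool masterUsableForRecruitment(const std::optional<MasterInfo>& master) {
        waitSeconds(1.0);            // illustrative delay
        return master.has_value() && master->valid;
    }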
Avoid multiple concurrent ratekeeper recruitments with a recruiting flag
(see the sketch below).
Fix endless recruiting when the chosen worker is a proxy or a resolver --
prefer the master's process in this case.
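
A sketch of the recruiting flag as an RAII guard (the flag and function
names are hypothetical):

    // Set the flag for the duration of one recruitment so concurrent
    // wakeups of the monitor cannot start a second one.
    struct RecruitGuard {
        bool& flag;
        explicit RecruitGuard(bool& f) : flag(f) { flag = true; }
        ~RecruitGuard() { flag = false; }
    };

    bool recruitingRatekeeper = false;

    void maybeRecruit() {
        if (recruitingRatekeeper)
            return;                            // already in flight
        RecruitGuard guard(recruitingRatekeeper);
        // ... pick a worker; if the best candidate is a proxy or a
        // resolver, prefer the master's process to avoid endless
        // re-recruiting ...
    }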
Test with:
-r simulation -f ./foundationdb/tests/slow/CommitBug.txt -s 67828576 -b on
The test has the following event sequence:
- Time 113.3s, CC noticed the DD failure and cleared the DD interface.
- 1s later, DD rejoined and registered with CC.
- Time 131.7s, the DD actor was cancelled. This old DD raced to register with
  CC, and the failure monitor was not installed because monitorDataDistributor
  was stalled waiting for the new DD.
- Time 161.4s, the new DD was running. Its recruiting had been delayed because
  no servers were available during that period.
Fix by disabling DD registration during the recruiting process (sketched below).
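
Hypothetical sketch of the registration gate:

    #include <cstdint>
    #include <optional>

    using UID = uint64_t;                   // stand-in for FDB's UID type

    struct DDState {
        bool recruiting = false;            // true while recruitment runs
        std::optional<UID> distributorID;

        // While a recruitment is in flight, registrations are refused,
        // so a cancelled old DD cannot race in between the failure
        // detection and the new DD's arrival.
        bool tryRegister(UID id) {
            if (recruiting)
                return false;
            distributorID = id;
            return true;
        }
    };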
Make sure both RateKeeper and DataDistributor are placed in the same data
center as the Master (see the candidate filter sketched below), and make sure
only one RateKeeper is live in the cluster.
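
An illustrative candidate filter (types simplified; the real code works
on worker fitness, not a plain list):

    #include <string>
    #include <vector>

    struct Worker {
        std::string dcId;
        bool alive;
    };

    // Only workers in the Master's data center are considered when
    // placing RateKeeper and DataDistributor.
    std::vector<Worker> candidates(const std::vector<Worker>& workers,
                                   const std::string& masterDcId) {
        std::vector<Worker> out;
        for (const auto& w : workers)
            if (w.alive && w.dcId == masterDcId)
                out.push_back(w);
        return out;
    }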
Since Ratekeeper and DataDistributor no longer run with the Master, they might
run on stateful processes before a new Master becomes alive, which is
undesirable.
This PR adds monitoring of both Ratekeeper and DataDistributor at the Cluster
Controller -- if the Master runs on a stateless class and RK/DD run at a worse
class, then RK/DD will be killed. I.e., RK/DD should run at their own classes
or on the same stateless process as the Master. After restart, RK/DD should be
running at a better process class.
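
A sketch of the class check (the fitness ordering here is an assumption
for illustration; FDB's real fitness enum differs):

    // Smaller is better; only an ordering is needed for the check.
    enum class Fitness { Best = 0, Good = 1, Okay = 2, Bad = 3 };

    // Kill RK/DD only when it is placed strictly worse than the
    // Master's stateless process; equal-or-better placement stays.
    bool shouldRestart(Fitness masterFitness, Fitness roleFitness) {
        return roleFitness > masterFitness;
    }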