27 Commits

Author SHA1 Message Date
chaoguang
e8b62e48f4 Rename DDMetrics to DDMetricsRef 2020-05-08 17:17:27 -07:00
tclinken
1d6ac716a1 Merge remote-tracking branch 'origin' into add-data-distribution-metrics 2020-01-15 13:20:04 -08:00
Jon Fu
40ad6f0931 created new reply types to work with flatbuffer serialization requirements 2019-09-18 13:40:18 -07:00
Jon Fu
5a877d6b14 added safety check on client to prevent removing all servers from a team 2019-08-27 14:39:43 -07:00
sramamoorthy
ba6bccce73 snap v2: DD changes - snapshot orchestration logic 2019-07-24 15:36:28 -07:00
Jon Fu
b473a8a830 changed on-the-wire format to use serialized flatbuffers, added cycletest to workload, and fixed small bug in trace 2019-06-11 15:45:06 -07:00
Jon Fu
e339ab890b added timeout and shard limiting behaviour 2019-05-21 14:13:09 -07:00
Jon Fu
0984d272e1 initial implementation of get-dd-metrics 2019-05-21 14:13:09 -07:00
mpilman
642a96807b Fixed compilation issues after rebase 2019-05-13 14:15:22 -07:00
Jingyu Zhou
99d521ef4f Monitor Ratekeeper and DataDistributor to use stateless processes
Since Ratekeeper and DataDistributor are no longer running with Master, they
might be running with stateful processes before a new Master becomes alive,
which is undesirable.

This PR adds a monitoring of both Ratekeeper and DataDistributor at Cluster
Controller -- if Master runs on a stateless class and RK/DD runs at a worse
class, then RK/DD will be killed. I.e., RK/DD should be running at their own
classes or on the same stateless process as Master. After restart, RK/DD should
be running at a better process class.
2019-03-14 15:00:57 -07:00
Jingyu Zhou
dc129207a9 Minor fix after rebase. 2019-03-07 13:16:20 -08:00
Jingyu Zhou
3c86643822 Separate Ratekeeper from data distribution.
Add a new role for ratekeeper.

Remove StorageServerChanges from data distribution.
Ratekeeper monitors storage servers, which borrows the idea from
DataDistribution.
2019-03-07 13:16:20 -08:00
Trevor Clinkenbeard
39f612d132 Merge branch 'master' of https://github.com/apple/foundationdb into add-health-metrics 2019-03-02 17:07:00 -08:00
A.J. Beamon
3e6a6a6569 Update status schema for correctness. Send the count of batch transactions started back to ratekeeper so that it can be logged with other ratekeeper metrics. 2019-02-28 12:00:58 -08:00
A.J. Beamon
a6205391ae Fix line endings 2019-02-27 12:05:32 -08:00
A.J. Beamon
a051055caf Initial implementation of adding separate limits for batch priority in ratekeeper 2019-02-27 10:31:56 -08:00
Trevor Clinkenbeard
07f800eeee Got rid of detailed field in GetRateInfoReply message 2019-02-23 17:52:11 -08:00
Trevor Clinkenbeard
f3a73963b4 Got rid of detailedLeaseDuration in GetRateInfoReply message 2019-02-23 16:42:11 -08:00
Trevor Clinkenbeard
fa96b8dd33 Merge branch 'master' of https://github.com/apple/foundationdb into add-health-metrics 2019-02-20 16:56:16 -08:00
Vishesh Yadav
e05b53d755 Merge remote-tracking branch 'apple/master' into task/tls-upgrade 2019-02-15 20:37:07 -08:00
Jingyu Zhou
5e6577cc82 Final cleanup per review comments
Make distributor interface optional in ServerDBInfo and many other small
changes.
2019-02-14 16:37:17 -08:00
Jingyu Zhou
578473a974 Various review comments fixes 2019-02-14 16:37:16 -08:00
Evan Tschannen
2db31d70a5 Update fdbserver/DataDistributorInterface.h
Co-Authored-By: jzhou77 <jingyuzhou@gmail.com>
2019-02-14 16:37:16 -08:00
Jingyu Zhou
39e4a59154 Add used worker IDs to cluster controller
This "usedIds" is updated when receiving a master registration message, so that
when recruiting new data distributor, existing assignment is known.
2019-02-14 16:37:16 -08:00
Jingyu Zhou
ef868f599c Add DataDistributorInterface to ServerDBInfo
Also change the Proxy and QuietDatabase to use the DataDistributorInterface.
2019-02-14 16:37:16 -08:00
Jingyu Zhou
0490160714 Fix according to Evan's comments
Use getRateInfo's endpoint as the ID for the DataDistributorInterface.
For now, added a "rejoined" flag for ClusterControllerData and Proxy.

TODO: move DataDistributorInterface into ServerDBInfo.
2019-02-14 16:30:13 -08:00
Jingyu Zhou
886e7ab2ba Add a new DataDistributor role.
Let cluster controller to start a new data distributor role by sending a
message to a chosen worker.
Change MasterInterface usage in DataDistribution to masterId

Add DataDistributor rejoin handling.

This allows the data distributor to tell the new cluster controller of its
existence so that the controller doesn't spawn a new one. I.e., there should
be only ONE data distributor in the cluster.

If DataDistributor (DD) doesn't join in a while, then ClusterController (CC) tries
to recruit one as DD. CC also monitors DD and restarts one if it failed.

The Proxy is also monitoring the DD. If DD failed, the Proxy will ask CC for
the new DD.

Add GetRecoveryInfo RPC to master server, which is called by data distributor
to obtain the recovery Transaction version from the master server.
2019-02-14 16:30:13 -08:00