763 Commits

Author SHA1 Message Date
sfc-gh-tclinkenbeard
91930b8040 Remove getMinReplicasRemaining PromiseStream.
Instead, in order to enforce the maximum fault tolerance for snapshots,
update getStorageWorkers to return the number of unavailable storage
servers (instead of throwing an error when unavailable storage servers
exist).
2022-04-07 23:23:23 -07:00
sfc-gh-tclinkenbeard
70f378bacc Restrict write access to getUnhealthyRelocationCount 2022-04-03 23:47:54 -07:00
sfc-gh-tclinkenbeard
33fb6ab983 Prevent coordFaultTolerance from dropping below 0 2022-04-03 23:37:42 -07:00
sfc-gh-tclinkenbeard
4f61c86b69 Add MAX_COORDINATOR_SNAPSHOT_FAULT_TOLERANCE knob 2022-04-03 23:28:57 -07:00
sfc-gh-tclinkenbeard
253db642be Add MAX_SNAPSHOT_FAULT_TOLERANCE knob 2022-04-03 22:31:45 -07:00
sfc-gh-tclinkenbeard
a71099471b Update copyright header dates 2022-03-21 13:36:23 -07:00
sfc-gh-tclinkenbeard
58de6e22cc Add BalanceOnRequests boolean parameter for ModelInterface 2022-03-16 14:25:32 -07:00
Xiaoge Su
cbd381778e Fix the includes in DataDistribution.actor.cpp
Update the comment to re-trigger failed checks
2022-03-15 18:05:55 -07:00
A.J. Beamon
250a88e682 Enforce that trace event suppression calls happen first when using trace event call chaining. Fix various instances where we weren't following this requirement. 2022-02-24 12:25:52 -08:00
Bharadwaj V.R
36c5d3a1e6
Merge branch 'main' into dd-utest 2022-02-11 12:25:31 -08:00
sfc-gh-tclinkenbeard
9158564bfc Fix formatting 2022-02-11 10:27:41 -08:00
Bharadwaj V.R
41bd39a82a Fix code formatting 2022-02-10 22:10:50 -08:00
Bharadwaj V.R
b306288c62
Merge branch 'main' into dd-utest 2022-02-10 22:00:52 -08:00
sfc-gh-tclinkenbeard
2165635478 Make printSnapshotTeamsInfo a static function of DDTeamCollection 2022-02-10 18:45:52 -08:00
sfc-gh-tclinkenbeard
9bc38ae73e Make DDTeamCollection::distributorId private 2022-02-10 18:26:06 -08:00
sfc-gh-tclinkenbeard
14c8483e9d Mark DDTeamCollection::primary private 2022-02-10 18:16:57 -08:00
sfc-gh-tclinkenbeard
641a38bd0b Make more DDTeamCollection methods private.
The methods only used by DDTeamCollection::run can now be made private.
2022-02-10 16:19:32 -08:00
sfc-gh-tclinkenbeard
c4508330d2 Make dataDistributionTeamCollection a static function of DDTeamCollection 2022-02-10 16:19:32 -08:00
sfc-gh-tclinkenbeard
5477012ad8 Change DDTeamCollection method signatures to accept references.
Passing nullptr to these methods is invalid, but previously the
signature didn't indicate this. We previously needed to pass pointers
due to actor compiler restrictions, but these restrictions no longer
apply.
2022-02-10 16:19:32 -08:00
sfc-gh-tclinkenbeard
3141698c41 Use special ASSERT_* macros for numeric comparison in data distribution
code.

This helps debugging by printing the exact input values when an
assertion fails.
2022-02-10 11:59:19 -08:00
sfc-gh-tclinkenbeard
975b9f3b32 Remove get helper function from DataDistribution.actor.cpp 2022-02-10 11:32:33 -08:00
Bharadwaj V.R
6d46b03651 Add some unit tests for DD team selection 2022-02-09 22:22:56 -08:00
sfc-gh-tclinkenbeard
04a1347df2 Merge remote-tracking branch 'origin/main' into dd-refactor 2022-02-08 00:33:27 -08:00
Xiaoxi Wang
6dc5921575
createdTime based storage wiggler (#6219)
* add storagemetadata

* add StorageWiggler;

* fix serverMetadataKey bug

* add metadata tracker in storage tracker

* finish StorageWiggler

* update next storage ID

* change pid to server id

* write metadata when seed SS

* add status json fields

* remove pid based ppw iteration

* fix time expression

* fix tss metadata nonexistence; fix transaction retry when retrieving metadata

* fix checkMetadata bug when store type is wrong

* fix remove storage status json

* format code

* refactor updateNextWigglingStoragePID

* separate storage metadata tracker and store type tracker

* rename pid

* wiggler stats

* fix completion between waitServerListChange and storageRecruiter

* solve review comments

* rename system key

* fix database lock timeout by adding lock_aware

* format code

* status json

* resolve code format/naming comments

* delete expireNow; change PerpetualStorageWiggleID's value to KeyBackedObjectMap<UID, StorageWiggleValue>

* fix omit start round

* format code

* status json reset

* solve status json format

* improve status json latency; replace binarywriter/reader with objectwriter/reader; refactor storagewigglerstats transactions

* status timestamp
2022-02-04 15:04:30 -08:00
sfc-gh-tclinkenbeard
68ec591cf9 Move DDTeamCollection into its own files 2022-02-04 00:39:42 -08:00
Ata E Husain Bohra
703364d146
Update cluster recovery documentation (#6255)
Patch updates code documentation to reflect the recent code
refactoring where ClusterController process drives recovery
instead of sequencer/master process.
2022-01-18 13:54:00 -08:00
sfc-gh-tclinkenbeard
90ced244eb Fix -Wunused-but-set-variable warnings 2021-12-01 18:15:53 -08:00
Josh Slocum
1870e07ff4 Fixed pause racing with waitUntilHealthy 2021-11-29 14:19:15 -06:00
Evan Tschannen
964d0209ca
Merge pull request #5637 from sfc-gh-ljoswiak/features/data-loss-prevention
Data loss protection when joining new cluster
2021-11-15 15:26:32 -08:00
Ata E Husain Bohra
82c3e8bf79
Trigger buildTeam operation if server transition from unhealthy -> healthy (#5930)
* Trigger buildTeam operation if server transition from unhealthy -> healthy

The DataDistribution actor builds teams as the server count changes
(addition/removal). However, the total_healthy_server count may be
insufficient to allow team formation. If that happens, the buildTeam
operation is not triggered even after the healthy server count recovers.

This patch triggers the `checkBuildTeam` operation when a server
transitions from unhealthy -> healthy. If the system has already
created enough teams (desiredTeamCount/maxTeamCount), the operation
incurs minimal cost.
2021-11-12 09:41:01 -08:00
Lukas Joswiak
15e0d5b29f Add explicit transaction options when reading cluster ID 2021-11-09 12:29:49 -08:00
Lukas Joswiak
3988b11fd6 Cleanup 2021-11-09 12:29:48 -08:00
Lukas Joswiak
30867750b5 Add protection against storage and tlog data deletion when joining a new cluster 2021-11-09 12:29:47 -08:00
sfc-gh-tclinkenbeard
30cef51746 Improve tracing in ddSnapCreateCore 2021-11-04 12:59:50 -07:00
sfc-gh-tclinkenbeard
d0c9cf4fb0 Enable mismatched-tags clang warning 2021-11-01 14:18:31 -07:00
Xiaoxi Wang
e4fd0023b7 don't disable machine team remover 2021-10-27 09:08:37 -07:00
Xiaoxi Wang
75ef854563 format 2021-10-27 09:08:37 -07:00
Xiaoxi Wang
db7ee9d389 disable team remover 2021-10-27 09:08:37 -07:00
Xiaoxi Wang
14fa32f208 change boolean 2021-10-27 09:08:37 -07:00
Xiaoxi Wang
1a2a838df3 add knob 2021-10-27 09:08:37 -07:00
Xiaoxi Wang
c320391c4c restartRecruiting 2021-10-27 09:08:37 -07:00
Xiaoxi Wang
dc630d63bd add asyncvar 2021-10-27 09:08:37 -07:00
Xiaoxi Wang
654c0a1f14 format 2021-10-27 09:08:37 -07:00
Xiaoxi Wang
8a10966126 wait extra time 2021-10-27 09:08:37 -07:00
Xiaoxi Wang
d1959122af consider wiggling when waitUntilHealthy 2021-10-27 09:08:37 -07:00
Xiaoxi Wang
69190ed04e format 2021-10-27 09:08:37 -07:00
Xiaoxi Wang
0053b4793e change knob and delete redundant doBuildTeam 2021-10-27 09:08:37 -07:00
Xiaoxi Wang
db7b48b71c wiggling teams calculation replace 2021-10-27 09:08:37 -07:00
Xiaoxi Wang
3a6359e202 minus wiggling teams when build team 2021-10-27 09:08:37 -07:00
He Liu
16ae2b76e5 Merge branch 'master' of https://github.com/apple/foundationdb into clean-sim-test-data-loss 2021-10-21 09:16:53 -07:00