110 Commits

Author SHA1 Message Date
Meng Xu
1ef4cb432b Merge branch 'master' into mengxu/fast-restore-robust-and-visibility-PR-v2 2020-03-01 20:08:07 -08:00
Meng Xu
2657d41bb2 FastRestore:Add debug msg when memory is over threshold 2020-02-27 18:32:11 -08:00
Alvin Moore
0f64505d0b Merge branch 'release-6.2' of github.com:apple/foundationdb
Needed to pull in changes to build docker
2020-02-23 23:27:53 -08:00
Evan Tschannen
96258b9809 Merge branch 'release-6.2'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbcli/fdbcli.actor.cpp
#	fdbclient/ManagementAPI.actor.cpp
#	fdbrpc/FlowTransport.actor.cpp
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/DataDistribution.actor.h
#	fdbserver/DataDistributionQueue.actor.cpp
#	fdbserver/KeyValueStoreMemory.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/QuietDatabase.actor.cpp
#	fdbserver/SkipList.cpp
#	fdbserver/StorageMetrics.actor.h
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/fdbserver.actor.cpp
#	fdbserver/storageserver.actor.cpp
#	fdbserver/workloads/KVStoreTest.actor.cpp
#	flow/CMakeLists.txt
#	flow/Knobs.cpp
#	flow/Knobs.h
#	flow/genericactors.actor.cpp
#	flow/serialize.h
2020-02-21 19:09:16 -08:00
A.J. Beamon
4c696d5bf2 Merge branch 'release-6.2' into dd-better-rebalance-logging
# Conflicts:
#	fdbserver/DataDistributionQueue.actor.cpp
2020-02-21 17:41:00 -08:00
A.J. Beamon
dfa5f76c01 Remove unused parameter. Don't put check for g_network presence in ASSERT_WE_THINK. 2020-02-21 16:28:03 -08:00
Evan Tschannen
08914a2acd Once available space ratio falls below 0.3 avoid moving data to teams with less free space than the median team 2020-02-21 15:14:32 -08:00
A.J. Beamon
2e699fef55 Don't suppress actor cancellation because we've already initialized the trace event by adding details. 2020-02-21 11:28:59 -08:00
A.J. Beamon
6810a03283 Add more logging to valley filler and mountain chopper 2020-02-21 10:55:14 -08:00
Evan Tschannen
819c55556c More aggressively attempt to find teams that do not have low disk space 2020-02-20 16:47:50 -08:00
A.J. Beamon
e1fb568fd1 Merge branch 'release-6.2' into dd-use-available-space
# Conflicts:
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/DataDistribution.actor.h
#	fdbserver/DataDistributionQueue.actor.cpp
2020-02-20 16:12:42 -08:00
A.J. Beamon
e4b483796d Combine some logic that was doing similar computations for free space ratio. 2020-02-20 14:52:08 -08:00
A.J. Beamon
4c9c736253 Data distribution uses available space instead of free space when evaluating whether processes are low on space and penalizing them. 2020-02-20 11:21:03 -08:00
A.J. Beamon
c164acb88d Add new criteria to DD's GetTeamRequest that allow you to require shards be present on the team and that the team have a minimum free ratio. This avoids scenarios where the team chosen when processing the request is later rejected by the requestor, causing rebalancing movements to get stuck. 2020-02-20 09:32:00 -08:00
Evan Tschannen
3f9d9d8b84 Merge branch 'release-6.2'
# Conflicts:
#	CMakeLists.txt
#	cmake/FlowCommands.cmake
#	documentation/sphinx/source/release-notes.rst
#	fdbclient/StorageServerInterface.h
#	fdbserver/DataDistributionTracker.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/fdbserver.actor.cpp
#	flow/Knobs.h
#	flow/Platform.cpp
#	versions.target
2020-01-16 18:37:47 -08:00
Evan Tschannen
e65760eb46
Merge pull request #2536 from etschannen/feature-commit-latency
Improved commit latency in large clusters
2020-01-13 19:12:02 -08:00
Evan Tschannen
ab7071932f Data distribution no longer attempts to pick teams which share members of the source unless the team matches exactly 2020-01-09 16:59:37 -08:00
Evan Tschannen
3157d8a375 fixed typo 2019-12-18 16:57:39 -08:00
Evan Tschannen
688940b685 merge 6.2 into master 2019-10-21 11:43:46 -07:00
Evan Tschannen
5667331729 added a buggify + minor code cleanup 2019-10-11 18:31:43 -07:00
Evan Tschannen
86bcb84b45 Raised the data distribution priority of splitting shards above restoring fault tolerance to avoid hot write shards 2019-10-11 17:50:43 -07:00
A.J. Beamon
22c3fa867c
Merge pull request #2074 from xumengpanda/mengxu/fix-correctness-bug
DD:fix:getTeam may fail to get a team when it can
2019-10-07 09:33:57 -07:00
Meng Xu
3c4dc1003d DD:clang-format the PR 2019-09-11 11:16:29 -07:00
Meng Xu
a83afa5f64 DD:BgDDValleyFiller:Do not move shard to a team with no healthy space 2019-09-11 11:15:00 -07:00
Meng Xu
bd80a67d46 Merge branch 'master' into mengxu/storage-engine-switch-PR-v2 2019-09-03 14:11:33 -07:00
Meng Xu
a377261740 StorageEngineSwitch:Remove questions in comments 2019-08-22 11:49:39 -07:00
Evan Tschannen
ac68c8e4fd added sources servers to the warning message 2019-08-21 11:48:29 -07:00
Meng Xu
39680fa515 StorageEngineSwitch:Clean up unnecessary trace
And do not trigger storage recruitment unnecessarily.
2019-08-19 14:11:57 -07:00
Evan Tschannen
d30d4cb955 Added a duration to regular relocateShard trace events 2019-08-16 15:15:36 -07:00
Evan Tschannen
297b65236f added additional trace events to warn when different parts of shard relocations take more than 10 minutes 2019-08-16 14:56:58 -07:00
Meng Xu
a588710376 StorageEngineSwitch:Graceful switch
When fdbcli change storeType for storage engines,
we switch the store type of storage servers one by one gracefully.
This avoids recruiting multiple storage servers on the same process,
which can cause OOM error.
2019-08-12 17:37:52 -07:00
Xin Dong
b653ddb30d Final clean ups after rebasing master 2019-07-30 22:35:34 -07:00
Xin Dong
cda70700cc Address review comments. 50K correctness with no failures. 2019-07-30 22:24:30 -07:00
Xin Dong
5d20364423 Address review comments 2019-07-30 22:24:30 -07:00
Xin Dong
1922c39377 Resolve review comments. 100K run shows one suspecious ASSERT_WE_THINK failure which I think could be a race. 2019-07-30 22:24:30 -07:00
Xin Dong
f5d6e3a5b3 - Addressed review commends
- Added test for the storage server failure disable switch
2019-07-30 22:20:45 -07:00
Xin Dong
4ecfc9830f Added finer grained controls to DataDistribution in fdbcli. What's happening under the hood is:
- Use pre-existing 'healthZone' key and write a special value to it in order to disable DD for all storage server failures
- Use a new system key 'rebalanceDDIgnored' key to disable/enable DD for all rebalance reasons(MountainChopper and ValleyFiller)

Kicked off two 200K correctness and showed no related errors.
2019-07-30 22:17:21 -07:00
Evan Tschannen
a78a97f186
Merge pull request #1908 from etschannen/feature-better-dd
A few data distribution improvements
2019-07-30 17:34:50 -07:00
sramamoorthy
63941e0d96 disable DD with a in-memory flag and use in snapv2 2019-07-30 17:04:51 -07:00
Evan Tschannen
5dd9043fd3 addressed review comments 2019-07-30 17:04:41 -07:00
Evan Tschannen
481642fbd4 Merge branch 'master' into feature-better-dd 2019-07-30 16:56:27 -07:00
A.J. Beamon
41605735f5
Merge pull request #1916 from ajbeamon/merge-onto-new-servers
Add knob to control whether merges request new servers or not.
2019-07-30 15:04:37 -07:00
A.J. Beamon
14648e20f9
Merge pull request #1901 from ajbeamon/data-distribution-receives-bytes-input-rate
Send bytes input rate to data distribution
2019-07-30 15:01:36 -07:00
A.J. Beamon
bc536757df Add knob to control whether merges request new servers or not. Set the default to request new servers in \xff but not in main key space. 2019-07-29 15:47:34 -07:00
Evan Tschannen
6b5e683de5 The mountainChopper and valleyFiller only move larger than average shards, to avoid moving high bandwidth shards which are generally smaller. 2019-07-28 23:50:42 -07:00
Evan Tschannen
04dd293af0
Merge pull request #1874 from xumengpanda/mengxu/DD-code-read
DataDistribution:Add comments to help understand the code
2019-07-26 13:30:44 -07:00
A.J. Beamon
b91795d288 Send bytes input rate to DD. 2019-07-25 16:27:32 -07:00
Meng Xu
e582219ec5 Remove unnecessary condition in DDQueue
Resolve the review comment.
2019-07-22 17:00:37 -07:00
Meng Xu
b7478f5dd3 DD:Add comments to help understand code
Add comments to explain the functionalities of some code.
2019-07-22 11:23:16 -07:00
Meng Xu
612a51fe00 Apply Clang format to PRIORITY_TEAM_REDUNDANT 2019-07-19 18:32:22 -07:00