58 Commits

Author SHA1 Message Date
Andrew Noyes
1f541f02be Merge branch 'anoyes/merge-6.2-to-6.3' into anoyes/release-6.3-merge
Merge, leaving conflict markers for now
2020-11-24 16:55:34 +00:00
David Youngworth
489ba20641 Fix several merge issues 2020-11-16 14:46:36 -08:00
David Youngworth
d0391db862 Merge branch 'release-6.2' into release-6.3 2020-11-16 10:15:23 -08:00
sfc-gh-tclinkenbeard
ca8ea3b6ff Fix memory issues caused by cancelling data distribution tracker 2020-11-15 23:52:36 -08:00
Meng Xu
222da17558 Merge branch 'release-6.2' into mengxu/ha-code-read 2020-11-12 13:39:27 -08:00
Meng Xu
063700e4d6 Add comments and questions to HA and tLog code reading
The comments' correctness needs to be confirmed by reviewers.
2020-10-30 12:14:57 -07:00
Xin Dong
9ef29d0cea Changed getTeamID() to return a string instead of UID as suggested by reviews. 2020-10-26 16:44:52 -07:00
Xin Dong
9b5a02b552 Resolve review comments 2020-10-26 16:44:52 -07:00
Xin Dong
21ad448ad3 Fix macOS build. 2020-10-26 16:44:52 -07:00
Xin Dong
7ebb2e5c09 Piggyback on this PR to polish more TraceEvents by:
- Making it clear whether an event tracks machine team info or server team info
- Adding an ID to both machine team and server team for better trackability
- Attaching the distributor id to some trace events.
2020-10-26 16:44:09 -07:00
Jingyu Zhou
8f17a1a5d6 Merge branch 'release-6.2' into release-6.3 2020-10-16 15:25:39 -07:00
sfc-gh-tclinkenbeard
91a8367acb Avoid slow task in ~DataDistributionTracker 2020-10-01 11:44:55 -07:00
sfc-gh-tclinkenbeard
9a2ce4c981 Make IDataDistributionTeam const-correct 2020-07-21 11:05:34 -07:00
A.J. Beamon
b09dddc07e Merge branch 'release-6.2' into merge-release-6.2-into-release-6.3
# Conflicts:
#	cmake/ConfigureCompiler.cmake
#	documentation/sphinx/source/downloads.rst
#	fdbrpc/FlowTransport.actor.cpp
#	fdbrpc/fdbrpc.vcxproj
#	fdbserver/DataDistributionQueue.actor.cpp
#	fdbserver/Knobs.cpp
#	fdbserver/Knobs.h
#	fdbserver/LogSystemPeekCursor.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/Status.actor.cpp
#	fdbserver/storageserver.actor.cpp
#	flow/flow.vcxproj
2020-07-10 15:06:34 -07:00
Evan Tschannen
5e02fd490e fix: the check for whether a teamCollection was tracking a source server was unreliable, leading to scenarios where we would temporarily replicate a shard fewer than teamSize times 2020-06-29 10:02:27 -07:00
Chaoguang Lin
ef724bf939 Merge remote-tracking branch 'upstream/master' into add-data-distribution-metrics 2020-05-08 18:39:28 -07:00
chaoguang
e8b62e48f4 Rename DDMetrics to DDMetricsRef 2020-05-08 17:17:27 -07:00
Evan Tschannen
07cc0a8d74 code cleanup 2020-04-10 17:02:11 -07:00
tclinken
247ab84323 Merge branch 'master' of https://github.com/apple/foundationdb into add-data-distribution-metrics 2020-03-23 17:01:17 -07:00
Evan Tschannen
e08f0201f1 merge release 6.2 into master 2020-03-17 12:51:47 -07:00
A.J. Beamon
555db50cd1 Avoid calling into SABTF so frequently. Use a cheaper call that only checks that shards exist. 2020-03-12 11:22:03 -07:00
Evan Tschannen
303df197cf Merge branch 'release-6.2'
# Conflicts:
#	CMakeLists.txt
#	bindings/c/test/mako/mako.c
#	documentation/sphinx/source/release-notes.rst
#	fdbbackup/backup.actor.cpp
#	fdbclient/NativeAPI.actor.cpp
#	fdbclient/NativeAPI.actor.h
#	fdbserver/DataDistributionQueue.actor.cpp
#	fdbserver/Knobs.cpp
#	fdbserver/Knobs.h
#	fdbserver/LogRouter.actor.cpp
#	fdbserver/SkipList.cpp
#	fdbserver/fdbserver.actor.cpp
#	flow/CMakeLists.txt
#	flow/Knobs.cpp
#	flow/Knobs.h
#	flow/flow.vcxproj
#	flow/flow.vcxproj.filters
#	versions.target
2020-03-06 18:22:46 -08:00
Evan Tschannen
125bd13198 fix: in multi-region configurations, the data distribution queue could start too much work, expecting that the remote region would contribute to the read workload 2020-03-04 14:17:17 -08:00
Evan Tschannen
96258b9809 Merge branch 'release-6.2'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbcli/fdbcli.actor.cpp
#	fdbclient/ManagementAPI.actor.cpp
#	fdbrpc/FlowTransport.actor.cpp
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/DataDistribution.actor.h
#	fdbserver/DataDistributionQueue.actor.cpp
#	fdbserver/KeyValueStoreMemory.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/QuietDatabase.actor.cpp
#	fdbserver/SkipList.cpp
#	fdbserver/StorageMetrics.actor.h
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/fdbserver.actor.cpp
#	fdbserver/storageserver.actor.cpp
#	fdbserver/workloads/KVStoreTest.actor.cpp
#	flow/CMakeLists.txt
#	flow/Knobs.cpp
#	flow/Knobs.h
#	flow/genericactors.actor.cpp
#	flow/serialize.h
2020-02-21 19:09:16 -08:00
Evan Tschannen
08914a2acd Once available space ratio falls below 0.3 avoid moving data to teams with less free space than the median team 2020-02-21 15:14:32 -08:00
A.J. Beamon
e1fb568fd1 Merge branch 'release-6.2' into dd-use-available-space
# Conflicts:
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/DataDistribution.actor.h
#	fdbserver/DataDistributionQueue.actor.cpp
2020-02-20 16:12:42 -08:00
A.J. Beamon
e4b483796d Combine some logic that was doing similar computations for free space ratio. 2020-02-20 14:52:08 -08:00
A.J. Beamon
4c9c736253 Data distribution uses available space instead of free space when evaluating whether processes are low on space and penalizing them. 2020-02-20 11:21:03 -08:00
A.J. Beamon
3a1ba5a077 Rename variable for clarity 2020-02-20 10:59:52 -08:00
A.J. Beamon
c164acb88d Add new criteria to DD's GetTeamRequest that allow you to require shards be present on the team and that the team have a minimum free ratio. This avoids scenarios where the team chosen when processing the request is later rejected by the requestor, causing rebalancing movements to get stuck. 2020-02-20 09:32:00 -08:00
tclinken
c9363e7e28 Merge branch 'master' of https://github.com/apple/foundationdb into add-data-distribution-metrics 2020-01-22 21:02:21 -08:00
Evan Tschannen
3f9d9d8b84 Merge branch 'release-6.2'
# Conflicts:
#	CMakeLists.txt
#	cmake/FlowCommands.cmake
#	documentation/sphinx/source/release-notes.rst
#	fdbclient/StorageServerInterface.h
#	fdbserver/DataDistributionTracker.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/fdbserver.actor.cpp
#	flow/Knobs.h
#	flow/Platform.cpp
#	versions.target
2020-01-16 18:37:47 -08:00
tclinken
1d6ac716a1 Merge remote-tracking branch 'origin' into add-data-distribution-metrics 2020-01-15 13:20:04 -08:00
Evan Tschannen
ab7071932f Data distribution no longer attempts to pick teams which share members of the source unless the team matches exactly 2020-01-09 16:59:37 -08:00
Jon Fu
d2b6626d5c Merge branch 'master' of https://github.com/apple/foundationdb into mark-ss-failed 2019-10-21 13:47:06 -07:00
Evan Tschannen
688940b685 merge 6.2 into master 2019-10-21 11:43:46 -07:00
Evan Tschannen
86bcb84b45 Raised the data distribution priority of splitting shards above restoring fault tolerance to avoid hot write shards 2019-10-11 17:50:43 -07:00
Jon Fu
d146f1d636 Merge branch 'master' of https://github.com/apple/foundationdb into mark-ss-failed 2019-10-07 11:27:15 -07:00
Jon Fu
450a09e117 Code Review Changes 2019-09-24 15:48:50 -07:00
Meng Xu
3c4dc1003d DD:clang-format the PR 2019-09-11 11:16:29 -07:00
Meng Xu
0b785e5c1c DD:getTeam may fail to get a team when it can
Due to randomness, when unhealthy teams are the majority while healthy
teams still exist, the getTeam function may fail to find
any feasible (ok) team, which leads to a BestTeamStuck situation.

This commit increases the tries from 10 to 20.

A long-term solution may first find all feasible teams and choose a random
one from them. Since this can affect the statistics of which team is picked,
it is not included in this commit.

Non-functional change: This commit removes unneeded printf introduced by
fast restore PR 1404.
2019-09-07 20:08:58 -07:00
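Note on 0b785e5c1c: the trade-off described in that message can be illustrated with a small sketch. It assumes each try samples one team uniformly at random, with replacement, which the commit message does not state. The names `miss_probability` and `pick_feasible_team` are hypothetical and are not FoundationDB functions.

```python
import random


def miss_probability(healthy_fraction, tries):
    """Chance that `tries` independent uniform samples all miss a healthy team.

    Assumes sampling with replacement; this is an illustration, not the
    actual getTeam logic.
    """
    return (1.0 - healthy_fraction) ** tries


def pick_feasible_team(teams, is_ok, rng=random):
    """Sketch of the long-term idea: filter feasible teams first, then pick one."""
    feasible = [team for team in teams if is_ok(team)]
    return rng.choice(feasible) if feasible else None


if __name__ == "__main__":
    # With 10% healthy teams, doubling the tries lowers the miss chance
    # but does not remove it.
    for tries in (10, 20):
        print(tries, round(miss_probability(0.1, tries), 3))
```

Under this assumption, raising the number of tries only shrinks the chance of a miss, while filtering first never misses a feasible team that exists. Filtering changes which teams tend to be picked, which is the reason the commit gives for deferring it.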
Jon Fu
66bba51988 Implemented direct removal of failed storage server from system keyspace 2019-08-27 14:39:43 -07:00
Xin Dong
4ecfc9830f Added finer grained controls to DataDistribution in fdbcli. What's happening under the hood is:
- Use the pre-existing 'healthZone' key and write a special value to it in order to disable DD for all storage server failures
- Use a new system key 'rebalanceDDIgnored' to disable/enable DD for all rebalance reasons (MountainChopper and ValleyFiller)

Kicked off two 200K correctness runs, which showed no related errors.
2019-07-30 22:17:21 -07:00
A.J. Beamon
14648e20f9
Merge pull request #1901 from ajbeamon/data-distribution-receives-bytes-input-rate
Send bytes input rate to data distribution
2019-07-30 15:01:36 -07:00
A.J. Beamon
b91795d288 Send bytes input rate to DD. 2019-07-25 16:27:32 -07:00
Meng Xu
378db79441 Resolve conflict when merge with master 2019-07-22 10:56:20 -07:00
Meng Xu
f243e77afc Increase merge and split shard priority by 100
PRIORITY_TEAM_REDUNDANT should be in a different priority band from
PRIORITY_MERGE_SHARD and PRIORITY_SPLIT_SHARD, because
priority inversion happens within priorities in the same band.
2019-07-19 13:55:38 -07:00
Meng Xu
6df93173ca TC:Lower priority for removing redundant teams 2019-07-16 18:02:36 -07:00
Meng Xu
c7a996267c TeamRemover: Remove unused declaration
Also change state variable to variable.
2019-07-05 16:54:06 -07:00
Jon Fu
b473a8a830 changed on-the-wire format to use serialized flatbuffers, added cycletest to workload, and fixed small bug in trace 2019-06-11 15:45:06 -07:00