Andrew Noyes
1f541f02be
Merge branch 'anoyes/merge-6.2-to-6.3' into anoyes/release-6.3-merge
...
Merge, leaving conflict markers for now
2020-11-24 16:55:34 +00:00
David Youngworth
489ba20641
Fix several merge issues
2020-11-16 14:46:36 -08:00
David Youngworth
d0391db862
Merge branch 'release-6.2' into release-6.3
2020-11-16 10:15:23 -08:00
sfc-gh-tclinkenbeard
ca8ea3b6ff
Fix memory issues caused by cancelling data distribution tracker
2020-11-15 23:52:36 -08:00
Meng Xu
222da17558
Merge branch 'release-6.2' into mengxu/ha-code-read
2020-11-12 13:39:27 -08:00
Meng Xu
063700e4d6
Add comments and questions to HA and tLog code reading
...
The correctness of the comments needs to be confirmed by reviewers.
2020-10-30 12:14:57 -07:00
Xin Dong
9ef29d0cea
Changed getTeamID() to return a string instead of UID as suggested by reviews.
2020-10-26 16:44:52 -07:00
Xin Dong
9b5a02b552
Resolve review comments
2020-10-26 16:44:52 -07:00
Xin Dong
21ad448ad3
Fix macOS build.
2020-10-26 16:44:52 -07:00
Xin Dong
7ebb2e5c09
Piggyback on this PR to polish more TraceEvents by:
...
- Making it clear whether an event tracks machine team info or server team info
- Adding an ID to both machine teams and server teams for better traceability
- Attaching the distributor ID to some trace events
2020-10-26 16:44:09 -07:00
Jingyu Zhou
8f17a1a5d6
Merge branch 'release-6.2' into release-6.3
2020-10-16 15:25:39 -07:00
sfc-gh-tclinkenbeard
91a8367acb
Avoid slow task in ~DataDistributionTracker
2020-10-01 11:44:55 -07:00
sfc-gh-tclinkenbeard
9a2ce4c981
Make IDataDistributionTeam const-correct
2020-07-21 11:05:34 -07:00
A.J. Beamon
b09dddc07e
Merge branch 'release-6.2' into merge-release-6.2-into-release-6.3
...
# Conflicts:
# cmake/ConfigureCompiler.cmake
# documentation/sphinx/source/downloads.rst
# fdbrpc/FlowTransport.actor.cpp
# fdbrpc/fdbrpc.vcxproj
# fdbserver/DataDistributionQueue.actor.cpp
# fdbserver/Knobs.cpp
# fdbserver/Knobs.h
# fdbserver/LogSystemPeekCursor.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/Status.actor.cpp
# fdbserver/storageserver.actor.cpp
# flow/flow.vcxproj
2020-07-10 15:06:34 -07:00
Evan Tschannen
5e02fd490e
fix: the check for whether a teamCollection was tracking a source server was unreliable, leading to scenarios where we would temporarily replicate a shard fewer than teamSize times
2020-06-29 10:02:27 -07:00
Chaoguang Lin
ef724bf939
Merge remote-tracking branch 'upstream/master' into add-data-distribution-metrics
2020-05-08 18:39:28 -07:00
chaoguang
e8b62e48f4
Rename DDMetrics to DDMetricsRef
2020-05-08 17:17:27 -07:00
Evan Tschannen
07cc0a8d74
code cleanup
2020-04-10 17:02:11 -07:00
tclinken
247ab84323
Merge branch 'master' of https://github.com/apple/foundationdb into add-data-distribution-metrics
2020-03-23 17:01:17 -07:00
Evan Tschannen
e08f0201f1
merge release 6.2 into master
2020-03-17 12:51:47 -07:00
A.J. Beamon
555db50cd1
Avoid calling into SABTF so frequently. Use a cheaper call that only checks that shards exist.
2020-03-12 11:22:03 -07:00
Evan Tschannen
303df197cf
Merge branch 'release-6.2'
...
# Conflicts:
# CMakeLists.txt
# bindings/c/test/mako/mako.c
# documentation/sphinx/source/release-notes.rst
# fdbbackup/backup.actor.cpp
# fdbclient/NativeAPI.actor.cpp
# fdbclient/NativeAPI.actor.h
# fdbserver/DataDistributionQueue.actor.cpp
# fdbserver/Knobs.cpp
# fdbserver/Knobs.h
# fdbserver/LogRouter.actor.cpp
# fdbserver/SkipList.cpp
# fdbserver/fdbserver.actor.cpp
# flow/CMakeLists.txt
# flow/Knobs.cpp
# flow/Knobs.h
# flow/flow.vcxproj
# flow/flow.vcxproj.filters
# versions.target
2020-03-06 18:22:46 -08:00
Evan Tschannen
125bd13198
fix: in multi-region configurations, the data distribution queue could start too much work, expecting that the remote region would contribute to the read workload
2020-03-04 14:17:17 -08:00
Evan Tschannen
96258b9809
Merge branch 'release-6.2'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbcli/fdbcli.actor.cpp
# fdbclient/ManagementAPI.actor.cpp
# fdbrpc/FlowTransport.actor.cpp
# fdbserver/ClusterController.actor.cpp
# fdbserver/DataDistribution.actor.cpp
# fdbserver/DataDistribution.actor.h
# fdbserver/DataDistributionQueue.actor.cpp
# fdbserver/KeyValueStoreMemory.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/QuietDatabase.actor.cpp
# fdbserver/SkipList.cpp
# fdbserver/StorageMetrics.actor.h
# fdbserver/TLogServer.actor.cpp
# fdbserver/fdbserver.actor.cpp
# fdbserver/storageserver.actor.cpp
# fdbserver/workloads/KVStoreTest.actor.cpp
# flow/CMakeLists.txt
# flow/Knobs.cpp
# flow/Knobs.h
# flow/genericactors.actor.cpp
# flow/serialize.h
2020-02-21 19:09:16 -08:00
Evan Tschannen
08914a2acd
Once the available space ratio falls below 0.3, avoid moving data to teams with less free space than the median team
2020-02-21 15:14:32 -08:00
A.J. Beamon
e1fb568fd1
Merge branch 'release-6.2' into dd-use-available-space
...
# Conflicts:
# fdbserver/DataDistribution.actor.cpp
# fdbserver/DataDistribution.actor.h
# fdbserver/DataDistributionQueue.actor.cpp
2020-02-20 16:12:42 -08:00
A.J. Beamon
e4b483796d
Combine some logic that was doing similar computations for free space ratio.
2020-02-20 14:52:08 -08:00
A.J. Beamon
4c9c736253
Data distribution uses available space instead of free space when evaluating whether processes are low on space and penalizing them.
2020-02-20 11:21:03 -08:00
A.J. Beamon
3a1ba5a077
Rename variable for clarity
2020-02-20 10:59:52 -08:00
A.J. Beamon
c164acb88d
Add new criteria to DD's GetTeamRequest that allow you to require shards be present on the team and that the team have a minimum free ratio. This avoids scenarios where the team chosen when processing the request is later rejected by the requestor, causing rebalancing movements to get stuck.
2020-02-20 09:32:00 -08:00
tclinken
c9363e7e28
Merge branch 'master' of https://github.com/apple/foundationdb into add-data-distribution-metrics
2020-01-22 21:02:21 -08:00
Evan Tschannen
3f9d9d8b84
Merge branch 'release-6.2'
...
# Conflicts:
# CMakeLists.txt
# cmake/FlowCommands.cmake
# documentation/sphinx/source/release-notes.rst
# fdbclient/StorageServerInterface.h
# fdbserver/DataDistributionTracker.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/fdbserver.actor.cpp
# flow/Knobs.h
# flow/Platform.cpp
# versions.target
2020-01-16 18:37:47 -08:00
tclinken
1d6ac716a1
Merge remote-tracking branch 'origin' into add-data-distribution-metrics
2020-01-15 13:20:04 -08:00
Evan Tschannen
ab7071932f
Data distribution no longer attempts to pick teams which share members of the source unless the team matches exactly
2020-01-09 16:59:37 -08:00
Jon Fu
d2b6626d5c
Merge branch 'master' of https://github.com/apple/foundationdb into mark-ss-failed
2019-10-21 13:47:06 -07:00
Evan Tschannen
688940b685
merge 6.2 into master
2019-10-21 11:43:46 -07:00
Evan Tschannen
86bcb84b45
Raised the data distribution priority of splitting shards above restoring fault tolerance to avoid hot write shards
2019-10-11 17:50:43 -07:00
Jon Fu
d146f1d636
Merge branch 'master' of https://github.com/apple/foundationdb into mark-ss-failed
2019-10-07 11:27:15 -07:00
Jon Fu
450a09e117
Code Review Changes
2019-09-24 15:48:50 -07:00
Meng Xu
3c4dc1003d
DD:clang-format the PR
2019-09-11 11:16:29 -07:00
Meng Xu
0b785e5c1c
DD:getTeam may fail to get a team when it can
...
Due to randomness, when unhealthy teams are the majority while healthy teams
still exist, the getTeam function may be unlucky and fail to find
any feasible (ok) team, which leads to a BestTeamStuck situation.
This commit increases the tries from 10 to 20.
A long-term solution may first find all feasible teams and choose a random
one from them. Since this can affect the statistics of which team is picked,
it is not included in this commit.
Non-functional change: This commit removes an unneeded printf introduced by
fast restore PR 1404.
2019-09-07 20:08:58 -07:00
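The trade-off described in commit 0b785e5c1c (a bounded number of random tries versus choosing from all feasible teams) can be illustrated with a minimal sketch. This is not FoundationDB's getTeam code: the function names, the `is_ok` predicate, and the assumption that each try samples uniformly with replacement are all hypothetical, chosen only to show why more tries lowers the chance of missing a healthy team.

```python
import random


def pick_team_by_sampling(teams, is_ok, tries, rng):
    """Hypothetical bounded-retry picker: sample up to `tries` random teams
    (with replacement) and return the first feasible one, or None."""
    for _ in range(tries):
        candidate = rng.choice(teams)
        if is_ok(candidate):
            return candidate
    return None  # corresponds loosely to a "stuck" outcome


def pick_team_from_feasible(teams, is_ok, rng):
    """Hypothetical long-term alternative: filter feasible teams first,
    then choose one of them at random. Never misses if any team is ok."""
    feasible = [t for t in teams if is_ok(t)]
    return rng.choice(feasible) if feasible else None


def miss_probability(healthy_fraction, tries):
    """Chance that every one of `tries` independent uniform samples
    lands on an unhealthy team (assumes sampling with replacement)."""
    return (1.0 - healthy_fraction) ** tries
```

Under these assumptions, if only 10% of teams are healthy, the chance of missing all of them is about 0.35 with 10 tries and about 0.12 with 20 tries, while the filter-first variant always finds a team when one exists but changes which teams tend to be picked.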
Jon Fu
66bba51988
Implemented direct removal of failed storage server from system keyspace
2019-08-27 14:39:43 -07:00
Xin Dong
4ecfc9830f
Added finer-grained controls to DataDistribution in fdbcli. What's happening under the hood is:
...
- Use pre-existing 'healthZone' key and write a special value to it in order to disable DD for all storage server failures
- Use a new system key 'rebalanceDDIgnored' to disable/enable DD for all rebalance reasons (MountainChopper and ValleyFiller)
Kicked off two 200K correctness runs, which showed no related errors.
2019-07-30 22:17:21 -07:00
A.J. Beamon
14648e20f9
Merge pull request #1901 from ajbeamon/data-distribution-receives-bytes-input-rate
...
Send bytes input rate to data distribution
2019-07-30 15:01:36 -07:00
A.J. Beamon
b91795d288
Send bytes input rate to DD.
2019-07-25 16:27:32 -07:00
Meng Xu
378db79441
Resolve conflict when merge with master
2019-07-22 10:56:20 -07:00
Meng Xu
f243e77afc
Increase merge and split shard priority by 100
...
PRIORITY_TEAM_REDUNDANT should be in a different priority band from
PRIORITY_MERGE_SHARD and PRIORITY_SPLIT_SHARD, because
priority inversion happens within priorities in the same band.
2019-07-19 13:55:38 -07:00
Meng Xu
6df93173ca
TC:Lower priority for removing redundant teams
2019-07-16 18:02:36 -07:00
Meng Xu
c7a996267c
TeamRemover: Remove unused declaration
...
Also change state variable to variable.
2019-07-05 16:54:06 -07:00
Jon Fu
b473a8a830
changed on-the-wire format to use serialized flatbuffers, added cycletest to workload, and fixed small bug in trace
2019-06-11 15:45:06 -07:00