Meng Xu
1ef4cb432b
Merge branch 'master' into mengxu/fast-restore-robust-and-visibility-PR-v2
2020-03-01 20:08:07 -08:00
Meng Xu
2657d41bb2
FastRestore:Add debug msg when memory is over threshold
2020-02-27 18:32:11 -08:00
Alvin Moore
0f64505d0b
Merge branch 'release-6.2' of github.com:apple/foundationdb
...
Needed to pull in changes to build docker
2020-02-23 23:27:53 -08:00
Evan Tschannen
96258b9809
Merge branch 'release-6.2'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbcli/fdbcli.actor.cpp
# fdbclient/ManagementAPI.actor.cpp
# fdbrpc/FlowTransport.actor.cpp
# fdbserver/ClusterController.actor.cpp
# fdbserver/DataDistribution.actor.cpp
# fdbserver/DataDistribution.actor.h
# fdbserver/DataDistributionQueue.actor.cpp
# fdbserver/KeyValueStoreMemory.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/QuietDatabase.actor.cpp
# fdbserver/SkipList.cpp
# fdbserver/StorageMetrics.actor.h
# fdbserver/TLogServer.actor.cpp
# fdbserver/fdbserver.actor.cpp
# fdbserver/storageserver.actor.cpp
# fdbserver/workloads/KVStoreTest.actor.cpp
# flow/CMakeLists.txt
# flow/Knobs.cpp
# flow/Knobs.h
# flow/genericactors.actor.cpp
# flow/serialize.h
2020-02-21 19:09:16 -08:00
A.J. Beamon
4c696d5bf2
Merge branch 'release-6.2' into dd-better-rebalance-logging
...
# Conflicts:
# fdbserver/DataDistributionQueue.actor.cpp
2020-02-21 17:41:00 -08:00
A.J. Beamon
dfa5f76c01
Remove unused parameter. Don't put check for g_network presence in ASSERT_WE_THINK.
2020-02-21 16:28:03 -08:00
Evan Tschannen
08914a2acd
Once available space ratio falls below 0.3 avoid moving data to teams with less free space than the median team
2020-02-21 15:14:32 -08:00
A.J. Beamon
2e699fef55
Don't suppress actor cancellation because we've already initialized the trace event by adding details.
2020-02-21 11:28:59 -08:00
A.J. Beamon
6810a03283
Add more logging to valley filler and mountain chopper
2020-02-21 10:55:14 -08:00
Evan Tschannen
819c55556c
More aggressively attempt to find teams that do not have low disk space
2020-02-20 16:47:50 -08:00
A.J. Beamon
e1fb568fd1
Merge branch 'release-6.2' into dd-use-available-space
...
# Conflicts:
# fdbserver/DataDistribution.actor.cpp
# fdbserver/DataDistribution.actor.h
# fdbserver/DataDistributionQueue.actor.cpp
2020-02-20 16:12:42 -08:00
A.J. Beamon
e4b483796d
Combine some logic that was doing similar computations for free space ratio.
2020-02-20 14:52:08 -08:00
A.J. Beamon
4c9c736253
Data distribution uses available space instead of free space when evaluating whether processes are low on space and penalizing them.
2020-02-20 11:21:03 -08:00
A.J. Beamon
c164acb88d
Add new criteria to DD's GetTeamRequest that allow you to require shards be present on the team and that the team have a minimum free ratio. This avoids scenarios where the team chosen when processing the request is later rejected by the requestor, causing rebalancing movements to get stuck.
2020-02-20 09:32:00 -08:00
Evan Tschannen
3f9d9d8b84
Merge branch 'release-6.2'
...
# Conflicts:
# CMakeLists.txt
# cmake/FlowCommands.cmake
# documentation/sphinx/source/release-notes.rst
# fdbclient/StorageServerInterface.h
# fdbserver/DataDistributionTracker.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/fdbserver.actor.cpp
# flow/Knobs.h
# flow/Platform.cpp
# versions.target
2020-01-16 18:37:47 -08:00
Evan Tschannen
e65760eb46
Merge pull request #2536 from etschannen/feature-commit-latency
...
Improved commit latency in large clusters
2020-01-13 19:12:02 -08:00
Evan Tschannen
ab7071932f
Data distribution no longer attempts to pick teams which share members of the source unless the team matches exactly
2020-01-09 16:59:37 -08:00
Evan Tschannen
3157d8a375
fixed typo
2019-12-18 16:57:39 -08:00
Evan Tschannen
688940b685
merge 6.2 into master
2019-10-21 11:43:46 -07:00
Evan Tschannen
5667331729
added a buggify + minor code cleanup
2019-10-11 18:31:43 -07:00
Evan Tschannen
86bcb84b45
Raised the data distribution priority of splitting shards above restoring fault tolerance to avoid hot write shards
2019-10-11 17:50:43 -07:00
A.J. Beamon
22c3fa867c
Merge pull request #2074 from xumengpanda/mengxu/fix-correctness-bug
...
DD:fix:getTeam may fail to get a team when it can
2019-10-07 09:33:57 -07:00
Meng Xu
3c4dc1003d
DD:clang-format the PR
2019-09-11 11:16:29 -07:00
Meng Xu
a83afa5f64
DD:BgDDValleyFiller:Do not move shard to a team with no healthy space
2019-09-11 11:15:00 -07:00
Meng Xu
bd80a67d46
Merge branch 'master' into mengxu/storage-engine-switch-PR-v2
2019-09-03 14:11:33 -07:00
Meng Xu
a377261740
StorageEngineSwitch:Remove questions in comments
2019-08-22 11:49:39 -07:00
Evan Tschannen
ac68c8e4fd
added source servers to the warning message
2019-08-21 11:48:29 -07:00
Meng Xu
39680fa515
StorageEngineSwitch:Clean up unnecessary trace
...
And do not trigger storage recruitment unnecessarily.
2019-08-19 14:11:57 -07:00
Evan Tschannen
d30d4cb955
Added a duration to regular relocateShard trace events
2019-08-16 15:15:36 -07:00
Evan Tschannen
297b65236f
added additional trace events to warn when different parts of shard relocations take more than 10 minutes
2019-08-16 14:56:58 -07:00
Meng Xu
a588710376
StorageEngineSwitch:Graceful switch
...
When fdbcli changes storeType for storage engines,
we switch the store type of storage servers one by one gracefully.
This avoids recruiting multiple storage servers on the same process,
which can cause an OOM error.
2019-08-12 17:37:52 -07:00
Xin Dong
b653ddb30d
Final cleanups after rebasing master
2019-07-30 22:35:34 -07:00
Xin Dong
cda70700cc
Address review comments. 50K correctness with no failures.
2019-07-30 22:24:30 -07:00
Xin Dong
5d20364423
Address review comments
2019-07-30 22:24:30 -07:00
Xin Dong
1922c39377
Resolve review comments. 100K run shows one suspicious ASSERT_WE_THINK failure which I think could be a race.
2019-07-30 22:24:30 -07:00
Xin Dong
f5d6e3a5b3
- Addressed review comments
...
- Added test for the storage server failure disable switch
2019-07-30 22:20:45 -07:00
Xin Dong
4ecfc9830f
Added finer-grained controls to DataDistribution in fdbcli. What's happening under the hood is:
...
- Use the pre-existing 'healthZone' key and write a special value to it in order to disable DD for all storage server failures
- Use a new system key 'rebalanceDDIgnored' to disable/enable DD for all rebalance reasons (MountainChopper and ValleyFiller)
Kicked off two 200K correctness and showed no related errors.
2019-07-30 22:17:21 -07:00
Evan Tschannen
a78a97f186
Merge pull request #1908 from etschannen/feature-better-dd
...
A few data distribution improvements
2019-07-30 17:34:50 -07:00
sramamoorthy
63941e0d96
disable DD with an in-memory flag and use it in snapv2
2019-07-30 17:04:51 -07:00
Evan Tschannen
5dd9043fd3
addressed review comments
2019-07-30 17:04:41 -07:00
Evan Tschannen
481642fbd4
Merge branch 'master' into feature-better-dd
2019-07-30 16:56:27 -07:00
A.J. Beamon
41605735f5
Merge pull request #1916 from ajbeamon/merge-onto-new-servers
...
Add knob to control whether merges request new servers or not.
2019-07-30 15:04:37 -07:00
A.J. Beamon
14648e20f9
Merge pull request #1901 from ajbeamon/data-distribution-receives-bytes-input-rate
...
Send bytes input rate to data distribution
2019-07-30 15:01:36 -07:00
A.J. Beamon
bc536757df
Add knob to control whether merges request new servers or not. Set the default to request new servers in \xff but not in main key space.
2019-07-29 15:47:34 -07:00
Evan Tschannen
6b5e683de5
The mountainChopper and valleyFiller only move larger than average shards, to avoid moving high bandwidth shards which are generally smaller.
2019-07-28 23:50:42 -07:00
Evan Tschannen
04dd293af0
Merge pull request #1874 from xumengpanda/mengxu/DD-code-read
...
DataDistribution:Add comments to help understand the code
2019-07-26 13:30:44 -07:00
A.J. Beamon
b91795d288
Send bytes input rate to DD.
2019-07-25 16:27:32 -07:00
Meng Xu
e582219ec5
Remove unnecessary condition in DDQueue
...
Resolve the review comment.
2019-07-22 17:00:37 -07:00
Meng Xu
b7478f5dd3
DD:Add comments to help understand code
...
Add comments to explain the functionalities of some code.
2019-07-22 11:23:16 -07:00
Meng Xu
612a51fe00
Apply Clang format to PRIORITY_TEAM_REDUNDANT
2019-07-19 18:32:22 -07:00