104 Commits

Author SHA1 Message Date
Evan Tschannen
12f2b32770 added additional logging in data distribution 2020-03-13 15:19:33 -07:00
Evan Tschannen
e219c1671f Merge branch 'release-6.2' into feature-dd-region-queue
# Conflicts:
#	fdbserver/Knobs.h
2020-03-04 16:25:38 -08:00
Evan Tschannen
6d6f184e2f added a knob which reverts the new queue behavior 2020-03-04 16:23:49 -08:00
Evan Tschannen
b7834b2995
Merge pull request #2774 from etschannen/feature-dd-repopulate-priority
Make the DD priority of populating a region lower than machine failures
2020-03-04 16:15:18 -08:00
Evan Tschannen
125bd13198 fix: in multi-region configurations, the data distribution queue could start too much work, expecting that the remote region would contribute to the read workload 2020-03-04 14:17:17 -08:00
Evan Tschannen
6296465e07 Make the DD priority associated with populating a remote region lower than machine failures 2020-03-04 14:07:32 -08:00
Meng Xu
ad9b3fb4a8 DD:Add trace for detailed relocate shard info 2020-02-29 13:45:10 -08:00
A.J. Beamon
4c696d5bf2 Merge branch 'release-6.2' into dd-better-rebalance-logging
# Conflicts:
#	fdbserver/DataDistributionQueue.actor.cpp
2020-02-21 17:41:00 -08:00
A.J. Beamon
dfa5f76c01 Remove unused parameter. Don't put check for g_network presence in ASSERT_WE_THINK. 2020-02-21 16:28:03 -08:00
Evan Tschannen
08914a2acd Once available space ratio falls below 0.3 avoid moving data to teams with less free space than the median team 2020-02-21 15:14:32 -08:00
A.J. Beamon
2e699fef55 Don't suppress actor cancellation because we've already initialized the trace event by adding details. 2020-02-21 11:28:59 -08:00
A.J. Beamon
6810a03283 Add more logging to valley filler and mountain chopper 2020-02-21 10:55:14 -08:00
Evan Tschannen
819c55556c More aggressively attempt to find teams that do not have low disk space 2020-02-20 16:47:50 -08:00
A.J. Beamon
e1fb568fd1 Merge branch 'release-6.2' into dd-use-available-space
# Conflicts:
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/DataDistribution.actor.h
#	fdbserver/DataDistributionQueue.actor.cpp
2020-02-20 16:12:42 -08:00
A.J. Beamon
e4b483796d Combine some logic that was doing similar computations for free space ratio. 2020-02-20 14:52:08 -08:00
A.J. Beamon
4c9c736253 Data distribution uses available space instead of free space when evaluating whether processes are low on space and penalizing them. 2020-02-20 11:21:03 -08:00
A.J. Beamon
c164acb88d Add new criteria to DD's GetTeamRequest that allow you to require shards be present on the team and that the team have a minimum free ratio. This avoids scenarios where the team chosen when processing the request is later rejected by the requestor, causing rebalancing movements to get stuck. 2020-02-20 09:32:00 -08:00
Evan Tschannen
e65760eb46
Merge pull request #2536 from etschannen/feature-commit-latency
Improved commit latency in large clusters
2020-01-13 19:12:02 -08:00
Evan Tschannen
ab7071932f Data distribution no longer attempts to pick teams which share members of the source unless the team matches exactly 2020-01-09 16:59:37 -08:00
Evan Tschannen
3157d8a375 fixed typo 2019-12-18 16:57:39 -08:00
Evan Tschannen
5667331729 added a buggify + minor code cleanup 2019-10-11 18:31:43 -07:00
Evan Tschannen
86bcb84b45 Raised the data distribution priority of splitting shards above restoring fault tolerance to avoid hot write shards 2019-10-11 17:50:43 -07:00
Evan Tschannen
ac68c8e4fd added source servers to the warning message 2019-08-21 11:48:29 -07:00
Evan Tschannen
d30d4cb955 Added a duration to regular relocateShard trace events 2019-08-16 15:15:36 -07:00
Evan Tschannen
297b65236f added additional trace events to warn when different parts of shard relocations take more than 10 minutes 2019-08-16 14:56:58 -07:00
Xin Dong
b653ddb30d Final clean ups after rebasing master 2019-07-30 22:35:34 -07:00
Xin Dong
cda70700cc Address review comments. 50K correctness with no failures. 2019-07-30 22:24:30 -07:00
Xin Dong
5d20364423 Address review comments 2019-07-30 22:24:30 -07:00
Xin Dong
1922c39377 Resolve review comments. 100K run shows one suspicious ASSERT_WE_THINK failure which I think could be a race. 2019-07-30 22:24:30 -07:00
Xin Dong
f5d6e3a5b3 - Addressed review comments
- Added test for the storage server failure disable switch
2019-07-30 22:20:45 -07:00
Xin Dong
4ecfc9830f Added finer grained controls to DataDistribution in fdbcli. What's happening under the hood is:
- Use pre-existing 'healthZone' key and write a special value to it in order to disable DD for all storage server failures
- Use a new system key, 'rebalanceDDIgnored', to disable/enable DD for all rebalance reasons (MountainChopper and ValleyFiller)

Kicked off two 200K correctness and showed no related errors.
2019-07-30 22:17:21 -07:00
Evan Tschannen
a78a97f186
Merge pull request #1908 from etschannen/feature-better-dd
A few data distribution improvements
2019-07-30 17:34:50 -07:00
sramamoorthy
63941e0d96 disable DD with an in-memory flag and use in snapv2 2019-07-30 17:04:51 -07:00
Evan Tschannen
5dd9043fd3 addressed review comments 2019-07-30 17:04:41 -07:00
Evan Tschannen
481642fbd4 Merge branch 'master' into feature-better-dd 2019-07-30 16:56:27 -07:00
A.J. Beamon
41605735f5
Merge pull request #1916 from ajbeamon/merge-onto-new-servers
Add knob to control whether merges request new servers or not.
2019-07-30 15:04:37 -07:00
A.J. Beamon
14648e20f9
Merge pull request #1901 from ajbeamon/data-distribution-receives-bytes-input-rate
Send bytes input rate to data distribution
2019-07-30 15:01:36 -07:00
A.J. Beamon
bc536757df Add knob to control whether merges request new servers or not. Set the default to request new servers in \xff but not in main key space. 2019-07-29 15:47:34 -07:00
Evan Tschannen
6b5e683de5 The mountainChopper and valleyFiller only move larger than average shards, to avoid moving high bandwidth shards which are generally smaller. 2019-07-28 23:50:42 -07:00
Evan Tschannen
04dd293af0
Merge pull request #1874 from xumengpanda/mengxu/DD-code-read
DataDistribution:Add comments to help understand the code
2019-07-26 13:30:44 -07:00
A.J. Beamon
b91795d288 Send bytes input rate to DD. 2019-07-25 16:27:32 -07:00
Meng Xu
e582219ec5 Remove unnecessary condition in DDQueue
Resolve the review comment.
2019-07-22 17:00:37 -07:00
Meng Xu
b7478f5dd3 DD:Add comments to help understand code
Add comments to explain the functionalities of some code.
2019-07-22 11:23:16 -07:00
Meng Xu
612a51fe00 Apply Clang format to PRIORITY_TEAM_REDUNDANT 2019-07-19 18:32:22 -07:00
Meng Xu
ea76451f15 Count PRIORITY_TEAM_REDUNDANT as count PRIORITY_TEAM_UNHEALTHY 2019-07-19 18:30:01 -07:00
Alex Miller
7a500cd37f A giant translation of TaskFooPriority -> TaskPriority::Foo
This is so that APIs that take priorities don't take ints, which are
common and easy to accidentally pass the wrong thing.
2019-06-25 02:47:35 -07:00
A.J. Beamon
5f55f3f613 Replace g_random and g_nondeterministic_random with functions deterministicRandom() and nondeterministicRandom() that return thread_local random number generators. Delete g_debug_random and trace_random. Allow only deterministicRandom() to be seeded, and require it to be seeded from each thread on which it is used. 2019-05-10 14:01:52 -07:00
Evan Tschannen
2d5043c665 Merge branch 'release-6.1'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	versions.target
2019-04-30 18:27:04 -07:00
Evan Tschannen
e0f7ec96aa Data distribution needs to build new teams as old teams are removed to ensure data remains balanced across servers 2019-04-22 17:29:46 -07:00
mpilman
d01cbf3455 Addressed code review comments 2019-04-05 13:12:20 -07:00