95 Commits

Author SHA1 Message Date
A.J. Beamon
dfa5f76c01 Remove unused parameter. Don't put check for g_network presence in ASSERT_WE_THINK. 2020-02-21 16:28:03 -08:00
A.J. Beamon
2e699fef55 Don't suppress actor cancellation because we've already initialized the trace event by adding details. 2020-02-21 11:28:59 -08:00
A.J. Beamon
6810a03283 Add more logging to valley filler and mountain chopper 2020-02-21 10:55:14 -08:00
Evan Tschannen
819c55556c More aggressively attempt to find teams that do not have low disk space 2020-02-20 16:47:50 -08:00
A.J. Beamon
e1fb568fd1 Merge branch 'release-6.2' into dd-use-available-space
# Conflicts:
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/DataDistribution.actor.h
#	fdbserver/DataDistributionQueue.actor.cpp
2020-02-20 16:12:42 -08:00
A.J. Beamon
e4b483796d Combine some logic that was doing similar computations for free space ratio. 2020-02-20 14:52:08 -08:00
A.J. Beamon
4c9c736253 Data distribution uses available space instead of free space when evaluating whether processes are low on space and penalizing them. 2020-02-20 11:21:03 -08:00
A.J. Beamon
c164acb88d Add new criteria to DD's GetTeamRequest that allow you to require shards be present on the team and that the team have a minimum free ratio. This avoids scenarios where the team chosen when processing the request is later rejected by the requestor, causing rebalancing movements to get stuck. 2020-02-20 09:32:00 -08:00
Evan Tschannen
e65760eb46
Merge pull request #2536 from etschannen/feature-commit-latency
Improved commit latency in large clusters
2020-01-13 19:12:02 -08:00
Evan Tschannen
ab7071932f Data distribution no longer attempts to pick teams which share members of the source unless the team matches exactly 2020-01-09 16:59:37 -08:00
Evan Tschannen
3157d8a375 fixed typo 2019-12-18 16:57:39 -08:00
Evan Tschannen
5667331729 added a buggify + minor code cleanup 2019-10-11 18:31:43 -07:00
Evan Tschannen
86bcb84b45 Raised the data distribution priority of splitting shards above restoring fault tolerance to avoid hot write shards 2019-10-11 17:50:43 -07:00
Evan Tschannen
ac68c8e4fd added source servers to the warning message 2019-08-21 11:48:29 -07:00
Evan Tschannen
d30d4cb955 Added a duration to regular relocateShard trace events 2019-08-16 15:15:36 -07:00
Evan Tschannen
297b65236f added additional trace events to warn when different parts of shard relocations take more than 10 minutes 2019-08-16 14:56:58 -07:00
Xin Dong
b653ddb30d Final clean ups after rebasing master 2019-07-30 22:35:34 -07:00
Xin Dong
cda70700cc Address review comments. 50K correctness with no failures. 2019-07-30 22:24:30 -07:00
Xin Dong
5d20364423 Address review comments 2019-07-30 22:24:30 -07:00
Xin Dong
1922c39377 Resolve review comments. 100K run shows one suspicious ASSERT_WE_THINK failure which I think could be a race. 2019-07-30 22:24:30 -07:00
Xin Dong
f5d6e3a5b3 - Addressed review comments
- Added test for the storage server failure disable switch
2019-07-30 22:20:45 -07:00
Xin Dong
4ecfc9830f Added finer grained controls to DataDistribution in fdbcli. What's happening under the hood is:
- Use pre-existing 'healthZone' key and write a special value to it in order to disable DD for all storage server failures
- Use a new system key 'rebalanceDDIgnored' to disable/enable DD for all rebalance reasons (MountainChopper and ValleyFiller)

Kicked off two 200K correctness and showed no related errors.
2019-07-30 22:17:21 -07:00
Evan Tschannen
a78a97f186
Merge pull request #1908 from etschannen/feature-better-dd
A few data distribution improvements
2019-07-30 17:34:50 -07:00
sramamoorthy
63941e0d96 disable DD with an in-memory flag and use in snapv2 2019-07-30 17:04:51 -07:00
Evan Tschannen
5dd9043fd3 addressed review comments 2019-07-30 17:04:41 -07:00
Evan Tschannen
481642fbd4 Merge branch 'master' into feature-better-dd 2019-07-30 16:56:27 -07:00
A.J. Beamon
41605735f5
Merge pull request #1916 from ajbeamon/merge-onto-new-servers
Add knob to control whether merges request new servers or not.
2019-07-30 15:04:37 -07:00
A.J. Beamon
14648e20f9
Merge pull request #1901 from ajbeamon/data-distribution-receives-bytes-input-rate
Send bytes input rate to data distribution
2019-07-30 15:01:36 -07:00
A.J. Beamon
bc536757df Add knob to control whether merges request new servers or not. Set the default to request new servers in \xff but not in main key space. 2019-07-29 15:47:34 -07:00
Evan Tschannen
6b5e683de5 The mountainChopper and valleyFiller only move larger than average shards, to avoid moving high bandwidth shards which are generally smaller. 2019-07-28 23:50:42 -07:00
Evan Tschannen
04dd293af0
Merge pull request #1874 from xumengpanda/mengxu/DD-code-read
DataDistribution:Add comments to help understand the code
2019-07-26 13:30:44 -07:00
A.J. Beamon
b91795d288 Send bytes input rate to DD. 2019-07-25 16:27:32 -07:00
Meng Xu
e582219ec5 Remove unnecessary condition in DDQueue
Resolve the review comment.
2019-07-22 17:00:37 -07:00
Meng Xu
b7478f5dd3 DD:Add comments to help understand code
Add comments to explain the functionalities of some code.
2019-07-22 11:23:16 -07:00
Meng Xu
612a51fe00 Apply Clang format to PRIORITY_TEAM_REDUNDANT 2019-07-19 18:32:22 -07:00
Meng Xu
ea76451f15 Count PRIORITY_TEAM_REDUNDANT as count PRIORITY_TEAM_UNHEALTHY 2019-07-19 18:30:01 -07:00
Alex Miller
7a500cd37f A giant translation of TaskFooPriority -> TaskPriority::Foo
This is so that APIs that take priorities don't take ints, which are
common and easy to accidentally pass the wrong thing.
2019-06-25 02:47:35 -07:00
A.J. Beamon
5f55f3f613 Replace g_random and g_nondeterministic_random with functions deterministicRandom() and nondeterministicRandom() that return thread_local random number generators. Delete g_debug_random and trace_random. Allow only deterministicRandom() to be seeded, and require it to be seeded from each thread on which it is used. 2019-05-10 14:01:52 -07:00
Evan Tschannen
2d5043c665 Merge branch 'release-6.1'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	versions.target
2019-04-30 18:27:04 -07:00
Evan Tschannen
e0f7ec96aa Data distribution needs to build new teams as old teams are removed to ensure data remains balanced across servers 2019-04-22 17:29:46 -07:00
mpilman
d01cbf3455 Addressed code review comments 2019-04-05 13:12:20 -07:00
mpilman
1c16f87a4e Remove trace-calls to printable (in non-workloads) 2019-04-05 13:12:19 -07:00
anoyes
981426bac9 More ide fixes 2019-03-05 18:03:57 -08:00
Evan Tschannen
d008de576e
Merge pull request #1139 from xumengpanda/mengxu/machine-team-upgrade-PR
Add background actor to remove redundant teams
2019-02-22 14:22:07 -08:00
Meng Xu
9445ac0b0c Status: Use new data distributor worker to publish status
After we add a new data distributor role, we publish the data
related to data distributor and rate keeper through the new
role (and new worker).

So the status needs to contact the data distributor, instead of master,
to get the status information.
2019-02-21 18:05:50 -08:00
Meng Xu
7cca439e00 TeamRemover: Add status to show redundant team removing
Distinguish the removal of unhealthy team and redundant team.
Change status report to include redundant team removal report.
2019-02-21 14:16:46 -08:00
mpilman
27a3153719 Use ACTOR forward declarations in MoveKeys
Also MoveKeys.h -> MoveKeys.actor.h
2019-02-19 15:16:59 -08:00
mpilman
3a0f9839b9 Fix minor IDE build errors 2019-02-19 15:16:59 -08:00
Meng Xu
6d09ac483c Merge with master 2019-02-15 17:03:40 -08:00
Jingyu Zhou
bf6da81bf9 Remove recovery version from data distribution queue
This parameter is no longer used/needed.
2019-02-14 16:37:16 -08:00