A.J. Beamon
dfa5f76c01
Remove unused parameter. Don't put check for g_network presence in ASSERT_WE_THINK.
2020-02-21 16:28:03 -08:00
A.J. Beamon
2e699fef55
Don't suppress actor cancellation because we've already initialized the trace event by adding details.
2020-02-21 11:28:59 -08:00
A.J. Beamon
6810a03283
Add more logging to valley filler and mountain chopper
2020-02-21 10:55:14 -08:00
Evan Tschannen
819c55556c
More aggressively attempt to find teams that do not have low disk space
2020-02-20 16:47:50 -08:00
A.J. Beamon
e1fb568fd1
Merge branch 'release-6.2' into dd-use-available-space
# Conflicts:
# fdbserver/DataDistribution.actor.cpp
# fdbserver/DataDistribution.actor.h
# fdbserver/DataDistributionQueue.actor.cpp
2020-02-20 16:12:42 -08:00
A.J. Beamon
e4b483796d
Combine some logic that was doing similar computations for free space ratio.
2020-02-20 14:52:08 -08:00
A.J. Beamon
4c9c736253
Data distribution uses available space instead of free space when evaluating whether processes are low on space and penalizing them.
2020-02-20 11:21:03 -08:00
A.J. Beamon
c164acb88d
Add new criteria to DD's GetTeamRequest that allow you to require shards be present on the team and that the team have a minimum free ratio. This avoids scenarios where the team chosen when processing the request is later rejected by the requestor, causing rebalancing movements to get stuck.
2020-02-20 09:32:00 -08:00
Evan Tschannen
e65760eb46
Merge pull request #2536 from etschannen/feature-commit-latency
Improved commit latency in large clusters
2020-01-13 19:12:02 -08:00
Evan Tschannen
ab7071932f
Data distribution no longer attempts to pick teams which share members of the source unless the team matches exactly
2020-01-09 16:59:37 -08:00
Evan Tschannen
3157d8a375
fixed typo
2019-12-18 16:57:39 -08:00
Evan Tschannen
5667331729
added a buggify + minor code cleanup
2019-10-11 18:31:43 -07:00
Evan Tschannen
86bcb84b45
Raised the data distribution priority of splitting shards above restoring fault tolerance to avoid hot write shards
2019-10-11 17:50:43 -07:00
Evan Tschannen
ac68c8e4fd
added source servers to the warning message
2019-08-21 11:48:29 -07:00
Evan Tschannen
d30d4cb955
Added a duration to regular relocateShard trace events
2019-08-16 15:15:36 -07:00
Evan Tschannen
297b65236f
added additional trace events to warn when different parts of shard relocations take more than 10 minutes
2019-08-16 14:56:58 -07:00
Xin Dong
b653ddb30d
Final clean ups after rebasing master
2019-07-30 22:35:34 -07:00
Xin Dong
cda70700cc
Address review comments. 50K correctness with no failures.
2019-07-30 22:24:30 -07:00
Xin Dong
5d20364423
Address review comments
2019-07-30 22:24:30 -07:00
Xin Dong
1922c39377
Resolve review comments. 100K run shows one suspicious ASSERT_WE_THINK failure which I think could be a race.
2019-07-30 22:24:30 -07:00
Xin Dong
f5d6e3a5b3
- Addressed review comments
- Added test for the storage server failure disable switch
2019-07-30 22:20:45 -07:00
Xin Dong
4ecfc9830f
Added finer grained controls to DataDistribution in fdbcli. What's happening under the hood is:
- Use pre-existing 'healthZone' key and write a special value to it in order to disable DD for all storage server failures
- Use a new system key 'rebalanceDDIgnored' to disable/enable DD for all rebalance reasons (MountainChopper and ValleyFiller)
Kicked off two 200K correctness and showed no related errors.
2019-07-30 22:17:21 -07:00
Evan Tschannen
a78a97f186
Merge pull request #1908 from etschannen/feature-better-dd
A few data distribution improvements
2019-07-30 17:34:50 -07:00
sramamoorthy
63941e0d96
disable DD with an in-memory flag and use in snapv2
2019-07-30 17:04:51 -07:00
Evan Tschannen
5dd9043fd3
addressed review comments
2019-07-30 17:04:41 -07:00
Evan Tschannen
481642fbd4
Merge branch 'master' into feature-better-dd
2019-07-30 16:56:27 -07:00
A.J. Beamon
41605735f5
Merge pull request #1916 from ajbeamon/merge-onto-new-servers
Add knob to control whether merges request new servers or not.
2019-07-30 15:04:37 -07:00
A.J. Beamon
14648e20f9
Merge pull request #1901 from ajbeamon/data-distribution-receives-bytes-input-rate
Send bytes input rate to data distribution
2019-07-30 15:01:36 -07:00
A.J. Beamon
bc536757df
Add knob to control whether merges request new servers or not. Set the default to request new servers in \xff but not in main key space.
2019-07-29 15:47:34 -07:00
Evan Tschannen
6b5e683de5
The mountainChopper and valleyFiller only move larger than average shards, to avoid moving high bandwidth shards which are generally smaller.
2019-07-28 23:50:42 -07:00
Evan Tschannen
04dd293af0
Merge pull request #1874 from xumengpanda/mengxu/DD-code-read
DataDistribution: Add comments to help understand the code
2019-07-26 13:30:44 -07:00
A.J. Beamon
b91795d288
Send bytes input rate to DD.
2019-07-25 16:27:32 -07:00
Meng Xu
e582219ec5
Remove unnecessary condition in DDQueue
Resolve the review comment.
2019-07-22 17:00:37 -07:00
Meng Xu
b7478f5dd3
DD: Add comments to help understand code
Add comments to explain the functionalities of some code.
2019-07-22 11:23:16 -07:00
Meng Xu
612a51fe00
Apply Clang format to PRIORITY_TEAM_REDUNDANT
2019-07-19 18:32:22 -07:00
Meng Xu
ea76451f15
Count PRIORITY_TEAM_REDUNDANT the same way as PRIORITY_TEAM_UNHEALTHY
2019-07-19 18:30:01 -07:00
Alex Miller
7a500cd37f
A giant translation of TaskFooPriority -> TaskPriority::Foo
This is so that APIs that take priorities don't take ints, which are
common and make it easy to accidentally pass the wrong thing.
2019-06-25 02:47:35 -07:00
A.J. Beamon
5f55f3f613
Replace g_random and g_nondeterministic_random with functions deterministicRandom() and nondeterministicRandom() that return thread_local random number generators. Delete g_debug_random and trace_random. Allow only deterministicRandom() to be seeded, and require it to be seeded from each thread on which it is used.
2019-05-10 14:01:52 -07:00
Evan Tschannen
2d5043c665
Merge branch 'release-6.1'
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# versions.target
2019-04-30 18:27:04 -07:00
Evan Tschannen
e0f7ec96aa
Data distribution needs to build new teams as old teams are removed to ensure data remains balanced across servers
2019-04-22 17:29:46 -07:00
mpilman
d01cbf3455
Addressed code review comments
2019-04-05 13:12:20 -07:00
mpilman
1c16f87a4e
Remove trace-calls to printable (in non-workloads)
2019-04-05 13:12:19 -07:00
anoyes
981426bac9
More ide fixes
2019-03-05 18:03:57 -08:00
Evan Tschannen
d008de576e
Merge pull request #1139 from xumengpanda/mengxu/machine-team-upgrade-PR
Add background actor to remove redundant teams
2019-02-22 14:22:07 -08:00
Meng Xu
9445ac0b0c
Status: Use new data distributor worker to publish status
After we add a new data distributor role, we publish the data
related to data distributor and rate keeper through the new
role (and new worker).
So the status needs to contact the data distributor, instead of master,
to get the status information.
2019-02-21 18:05:50 -08:00
Meng Xu
7cca439e00
TeamRemover: Add status to show redundant team removing
Distinguish the removal of unhealthy teams from that of redundant teams.
Change the status report to include redundant team removal.
2019-02-21 14:16:46 -08:00
mpilman
27a3153719
Use ACTOR forward declarations in MoveKeys
Also MoveKeys.h -> MoveKeys.actor.h
2019-02-19 15:16:59 -08:00
mpilman
3a0f9839b9
Fix minor IDE build errors
2019-02-19 15:16:59 -08:00
Meng Xu
6d09ac483c
Merge with master
2019-02-15 17:03:40 -08:00
Jingyu Zhou
bf6da81bf9
Remove recovery version from data distribution queue
This parameter is no longer used/needed.
2019-02-14 16:37:16 -08:00