Evan Tschannen
0bdd25df23
ratekeeper does not control on remote storage servers
2018-06-18 17:23:55 -07:00
Evan Tschannen
1ccfb3a0f4
fix: log_anti_quorum was always 0 in simulation
...
removed durableStorageQuorum, because it is no longer a useful configuration parameter
2018-06-18 10:24:57 -07:00
Evan Tschannen
0913368651
added usable_regions to specify if we will replicate into a remote region
...
remote replication defaults to the primary replication
removed remote_logs, because they should be specified as an override in the regions object
2018-06-17 19:31:15 -07:00
Evan Tschannen
e28769b98e
fixed trace event name
2018-06-11 12:43:08 -07:00
Evan Tschannen
372ed67497
Merge branch 'master' into feature-remote-logs
...
# Conflicts:
# fdbserver/DataDistribution.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/TLogServer.actor.cpp
# fdbserver/TagPartitionedLogSystem.actor.cpp
2018-06-11 11:34:10 -07:00
Evan Tschannen
134b5d6f65
fix: only consider data distribution started when remote has recovered so quite database works correctly
2018-06-10 20:25:15 -07:00
Evan Tschannen
4903df5ce9
fix: give time to detect failed servers before building teams
2018-06-10 20:21:39 -07:00
Evan Tschannen
6e48d93d39
backed out the healthy team check because it was unnecessary
2018-06-10 12:43:32 -07:00
A.J. Beamon
e5488419cc
Attempt to normalize trace events:
...
* Detail names now all start with an uppercase character and contain no underscores. Ideally these should be head-first camel case, though that was harder to check.
* Type names have the same rules, except they allow one underscore (to support a usage pattern Context_Type). The first character after the underscore is also uppercase.
* Use seconds instead of milliseconds in details.
Added a check when events are logged in simulation that logs a message to stderr if the first two rules above aren't followed.
This probably doesn't address every instance of the above problems, but all of the events I was able to hit in simulation pass the check.
2018-06-08 11:11:08 -07:00
Evan Tschannen
e4d5817679
fix: we must server getTeam requests before readyToStart is set because we cannot complete relocateShard requests without getTeam responses from both team collections
2018-06-07 16:14:40 -07:00
Evan Tschannen
9f0c16f062
do not build teams which contain failed servers
2018-06-07 14:05:53 -07:00
Evan Tschannen
b423d73b42
fix: do not finish a shard relocation until all of the storage servers have made the current recovery version durable. This is to prevent dropping a needed storage server as a source for a shard after dropping a remote configuration
2018-06-07 12:29:25 -07:00
Evan Tschannen
be06938d9d
fix: dropping the remote replication will cause all remote storage servers to die. Make sure we are not restoring redundancy before doing this to prevent data loss in simulation.
2018-06-04 18:46:09 -07:00
Evan Tschannen
6cf9508aae
finished a comment
2018-06-03 19:38:51 -07:00
Evan Tschannen
b1935f1738
fix: do not allow a storage server to be removed within 5 million versions of it being added, because if a storage server is added and removed within the known committed version and recovery version, they storage server will need see either the add or remove when it peeks
2018-05-05 18:16:28 -07:00
Evan Tschannen
440e2ae609
fix: data distribution logic was incorrect for finding a complete source team in a failed DC
2018-05-01 23:08:31 -07:00
Evan Tschannen
10d25927cd
Merge branch 'master' into feature-remote-logs
...
# Conflicts:
# fdbserver/DataDistribution.actor.cpp
2018-04-30 22:15:39 -07:00
Evan Tschannen
9cdabfed0e
added useful trace events
2018-04-29 18:54:47 -07:00
Evan Tschannen
73597f190e
fix: new tlogs are initialized with exactly the tags which existed at the recovery version
2018-04-22 20:28:01 -07:00
Bruce Mitchener
9cdf25eda3
Fix some typos.
2018-04-20 00:49:22 +07:00
Evan Tschannen
a8662f8737
fix: remote recovered is does not need to wait for old logs to be removed
2018-04-16 10:14:39 -07:00
Evan Tschannen
e53f17a83a
fix: the newest log router needs to start where the last old one ends
2018-04-15 14:54:22 -07:00
Evan Tschannen
0496bee1ef
fix: suppress expected errors in data distribution
2018-04-15 11:30:22 -07:00
Evan Tschannen
7af892f50b
first working version of non-copying recovery working with fearless configurations
2018-04-08 21:24:05 -07:00
Evan Tschannen
579ba58930
pop old tags only looks are recovered tags, and checks if they are still being used
2018-03-30 19:08:01 -07:00
Evan Tschannen
82fb6424ec
fix: storage recruitment could get stuck in a spin loop
2018-03-15 11:00:44 -07:00
Evan Tschannen
3abf4d7fdf
Merge branch 'master' into feature-remote-logs
2018-03-09 14:50:04 -08:00
Evan Tschannen
91bb8faa45
Merge commit 'f773b9460d31d31b7d421860fc647936f31aa1fa'
...
# Conflicts:
# tests/fast/SidebandWithStatus.txt
# tests/rare/LargeApiCorrectnessStatus.txt
# tests/slow/DDBalanceAndRemoveStatus.txt
2018-03-09 14:47:03 -08:00
Evan Tschannen
cf6dd1437b
suppress spammy trace events
2018-03-09 10:16:34 -08:00
Evan Tschannen
5390af8be4
suppress spammy logs
2018-03-09 09:40:36 -08:00
Evan Tschannen
fa7eaea7cf
fix: shards affected by team failure did not properly handle separate teams for the remote and primary data centers
2018-03-08 10:50:05 -08:00
Evan Tschannen
470f5c01f3
changed remoteDcId to a vector of ids, to support future configurations where there are multiple remote databases
2018-02-26 17:09:09 -08:00
Evan Tschannen
37a6a81634
Merge commit '7f6fc3e039c911cd84b8540f7f799fc38a1c1822' into feature-remote-logs
...
# Conflicts:
# fdbserver/workloads/RestartRecovery.actor.cpp
2018-02-23 12:33:28 -08:00
Alec Grieser
e1162e9238
Merge remote-tracking branch 'upstream/release-5.1'
2018-02-22 11:16:12 -08:00
Alec Grieser
0bae9880f1
remove trailing whitespace from our copyright headers ; fixed formatting of python setup.py
2018-02-21 10:25:11 -08:00
Evan Tschannen
dc93759e15
suppressed trace events that are spammy
2018-02-16 16:01:19 -08:00
Evan Tschannen
d2b0c07558
storage servers continue to attempt to pop old tags after the log system updates
2018-02-13 18:34:13 -08:00
Evan Tschannen
1fedcba890
fix: do not use log router tags when configured without remote logs
...
fix: data distribution tracks undesired storage servers
re-enabled consistency check
2018-02-13 17:01:34 -08:00
Evan Tschannen
63a9f2aed6
fix: history tags were being incorrectly popped
...
fix: history tags were not cleared when a storage server was removed
2018-02-03 12:20:18 -08:00
Evan Tschannen
ebd94bb654
removed a separately configurable storage team size for the remote data center, because it did not make sense
...
fix: the master did not monitor for the failure of remote logs
stop merge attempts when a data center is failed
fixed a variety of other problems with data distribution when a data center is failed
2018-02-02 11:46:04 -08:00
Evan Tschannen
b48d8ce96d
getTeam will return an unhealthy exact match if all teams are unhealthy. Resubmit relocation requests once healthy teams are available
2018-01-30 17:00:51 -08:00
Evan Tschannen
29c5d4ad3d
upgrades from 5.X mostly supported, still some remaining correctness problems
2018-01-28 11:52:54 -08:00
Evan Tschannen
b5eba4f13a
fix: do not check for desired data centers if they have not been set
2018-01-20 10:28:59 -08:00
Evan Tschannen
2e46ee3dba
fix: getTeam works when there are no teams
2018-01-17 17:49:13 -08:00
Evan Tschannen
264dc44dfa
fixed many more bugs associated with running without remote logs
2018-01-17 17:03:17 -08:00
Evan Tschannen
21482a45e1
Merge branch 'master' into feature-remote-logs
...
# Conflicts:
# fdbserver/DBCoreState.h
# fdbserver/LogSystem.h
# fdbserver/LogSystemPeekCursor.actor.cpp
# fdbserver/TLogServer.actor.cpp
2018-01-14 13:40:24 -08:00
Evan Tschannen
3915d6825c
we need to check the server list at a higher priority, because if we do not notice a storage server interface change for a long period of time, we will mark it as failed
2018-01-12 12:51:07 -08:00
Evan Tschannen
9630deba3a
fixed a number of bugs related to running fearless without remote logs
2018-01-08 12:04:19 -08:00
Evan Tschannen
3d2103075d
data distribution tracks teams for each data center separately
2017-10-10 10:36:33 -07:00
Evan Tschannen
76e7988663
Merge branch 'master' into feature-remote-logs
...
# Conflicts:
# fdbserver/ClusterController.actor.cpp
# fdbserver/DataDistribution.actor.cpp
# fdbserver/OldTLogServer.actor.cpp
# fdbserver/TLogServer.actor.cpp
# fdbserver/WorkerInterface.h
# flow/Net2.actor.cpp
2017-09-11 15:15:56 -07:00