Evan Tschannen
96258b9809
Merge branch 'release-6.2'
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbcli/fdbcli.actor.cpp
# fdbclient/ManagementAPI.actor.cpp
# fdbrpc/FlowTransport.actor.cpp
# fdbserver/ClusterController.actor.cpp
# fdbserver/DataDistribution.actor.cpp
# fdbserver/DataDistribution.actor.h
# fdbserver/DataDistributionQueue.actor.cpp
# fdbserver/KeyValueStoreMemory.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/QuietDatabase.actor.cpp
# fdbserver/SkipList.cpp
# fdbserver/StorageMetrics.actor.h
# fdbserver/TLogServer.actor.cpp
# fdbserver/fdbserver.actor.cpp
# fdbserver/storageserver.actor.cpp
# fdbserver/workloads/KVStoreTest.actor.cpp
# flow/CMakeLists.txt
# flow/Knobs.cpp
# flow/Knobs.h
# flow/genericactors.actor.cpp
# flow/serialize.h
2020-02-21 19:09:16 -08:00
Evan Tschannen
8b768e66df
Merge pull request #2694 from dongxinEric/feature/2663/specialize-policy-for-zoneid-in-cc
Added a specialized algorithm for PolicyOne and PolicyAcross(,'zoneId…
2020-02-20 14:46:23 -08:00
Evan Tschannen
574e88ba8e
updateGoodRemoteRecruitmentTime was unnecessary because the only way findRemoteWorkers would return would be after a new server has joined, which already resets goodRemoteRecruitmentTime
2020-02-20 13:46:22 -08:00
Xin Dong
99095c9224
Again make Clang happy.
2020-02-20 09:50:22 -08:00
Xin Dong
298d6cb3d7
Address review comments.
2020-02-20 09:34:01 -08:00
Evan Tschannen
fbd45963d8
The cluster controller waits until no new workers register for 1.0 before starting a bad recruitment
2020-02-19 16:48:30 -08:00
Xin Dong
89fcbb2055
Make clang happy
2020-02-19 09:44:15 -08:00
Xin Dong
efc0d7f9d5
Added a specialized algorithm for PolicyOne and PolicyAcross(,'zoneId',PolicyOne()) to find a set of TLog servers which will be able to fulfill the policy later.
2020-02-19 09:25:57 -08:00
negoyal
85cc35e81e
Merge branch 'master' into HEAD
2020-02-05 14:59:55 -08:00
Evan Tschannen
844c8511c4
Merge pull request #2588 from jzhou77/backup-worker
Integrate new backup worker with existing backup command
2020-02-05 14:14:43 -08:00
Jingyu Zhou
52c6737411
Rename backupLoggingEnabled as backupWorkerEnabled
To highlight the 7.0 backup changes. By default, the
backup_worker_enabled flag is set for version 7.0.
2020-02-04 10:09:16 -08:00
Jingyu Zhou
0db03f1d3c
Use backup_logging_enabled flag
The default is to enable new backup workers. Users can disable this flag to
turn off the backup worker feature.
2020-02-03 20:03:22 -08:00
Evan Tschannen
4524831456
Merge pull request #2518 from vishesh/task/failmon-remove-server
FailureMonitoring: Server processes no longer need to talk to ClusterController
2020-02-03 17:22:50 -08:00
Jingyu Zhou
38aa1903fd
Add a DB configuration option for backup workers
Right now, the default is to keep the old backup behavior, i.e., do NOT use
backup workers. Specifically, if BackupType is not set (or is set to default),
the master will not recruit backup workers and will not add pseudo locality for
backup workers.
The StartFullBackupTaskFunc is updated to check if backup worker is enabled.
Only when it is not enabled, starting a backup will wait on all backup workers
to be started.
2020-01-31 19:29:09 -08:00
Jingyu Zhou
6ddf73e26a
Remove code introduced when resolving merge conflicts
2020-01-22 21:23:38 -08:00
Jingyu Zhou
c6c39ca99d
Update better master exists with backup workers
During recruitment, if there is no desired log router count, use tlog size
instead, because the number of backup workers has to be larger than 0.
2020-01-22 19:43:40 -08:00
Jingyu Zhou
56a2c37071
Recruit backup workers for single region
Enable log router tags for single region, which are popped by backup workers.
Need to add a noop for backup workers if there are no active backups.
2020-01-22 19:42:13 -08:00
Jingyu Zhou
19d6a889ff
Recruit backup workers for old epochs
If there are unfinished ranges in the old epochs, the new master will recruit
backup workers responsible for finishing these ranges. These workers remain in
the cluster until the next epoch, when they will remove themselves.
2020-01-22 19:38:45 -08:00
Jingyu Zhou
7da9f47f26
Enable pop from backup workers
This is still WIP, as some edge cases can trigger test failures, most likely
because backup workers do not pop mutations when an epoch ends.
2020-01-22 19:38:45 -08:00
Jingyu Zhou
ece3cadf8e
Recruit backup worker during master recovery
For now, recruit the same number of backup workers as TLogs. The backup worker does nothing yet.
2020-01-22 19:37:48 -08:00
Jingyu Zhou
de8d953865
Add backup role, class, and worker skeleton
2020-01-22 19:35:30 -08:00
Vishesh Yadav
daef5f011a
Merge remote-tracking branch 'apple/master' into task/failmon-remove-server
2020-01-21 13:20:15 -08:00
Evan Tschannen
3f9d9d8b84
Merge branch 'release-6.2'
# Conflicts:
# CMakeLists.txt
# cmake/FlowCommands.cmake
# documentation/sphinx/source/release-notes.rst
# fdbclient/StorageServerInterface.h
# fdbserver/DataDistributionTracker.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/fdbserver.actor.cpp
# flow/Knobs.h
# flow/Platform.cpp
# versions.target
2020-01-16 18:37:47 -08:00
Evan Tschannen
d55e56993d
fix: the cluster controller would not recruit more remote logs before the database became fully_recovered
2020-01-10 12:21:48 -08:00
Alvin Moore
7628d04fb9
Merge branch 'release-6.2' of github.com:apple/foundationdb into release_6.2_merge
# Conflicts:
# documentation/sphinx/source/release-notes.rst
2020-01-09 07:21:16 -08:00
mpilman
d3d6016c90
Merge remote-tracking branch 'negoyal/fdb_cache_subfeature2' into features/cache-initialization
2020-01-07 19:53:09 -08:00
Vishesh Yadav
6e6cfaff16
Cleanup old Failure Monitoring code
2020-01-07 15:53:32 -08:00
negoyal
29b77863f0
Cache warmup and Consistency check workload changes.
2020-01-07 13:06:58 -08:00
Evan Tschannen
3eae401886
fix: we were recruiting one too few oldLogRouters
code cleanup
2020-01-02 15:05:44 -08:00
Evan Tschannen
5e5e618da0
during recovery, only send the full serverDBInfo to processes that are part of the new generation
2019-12-09 13:17:49 -08:00
Evan Tschannen
bcce5968a4
recruit oldLogRouters on TLogs, do not recruit oldLogRouters on the cluster controller if possible
2019-12-09 13:12:13 -08:00
mpilman
821edcb207
Register caches through keyspace
This also removes the old mechanism that registers them
through the serverDBInfo.
Caches now self-recruit at startup.
2019-12-06 13:28:44 -08:00
negoyal
cf2563f1c7
Mix of various things, a lot of which will change.
2019-12-05 17:10:32 -08:00
Evan Tschannen
3c769fcf60
Merge branch 'release-6.2'
# Conflicts:
# CMakeLists.txt
# documentation/sphinx/source/release-notes.rst
# fdbserver/ClusterController.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# versions.target
2019-11-22 15:39:19 -08:00
Evan Tschannen
ebcb2f79ed
Merge branch 'master' of github.com:apple/foundationdb
2019-11-22 15:34:49 -08:00
A.J. Beamon
7c801513e2
Fix cases where latency band config could be discarded during recovery or process start.
2019-11-20 11:44:18 -08:00
Evan Tschannen
8d3ef89540
Merge branch 'release-6.2'
# Conflicts:
# CMakeLists.txt
# documentation/sphinx/source/release-notes.rst
# fdbclient/MutationList.h
# fdbserver/MasterProxyServer.actor.cpp
# versions.target
2019-11-14 15:49:56 -08:00
Evan Tschannen
ffc89d1182
fix: dd test recruitment should prefer the location of ratekeeper over other used processes
2019-11-13 12:58:55 -08:00
Balachandar Namasivayam
2e41497580
This commit tries to distribute RK and DD among other empty available processes.
2019-11-12 17:52:42 -08:00
Balachandar Namasivayam
f5282f2c7e
Fix bug where DD or RK could be halted and re-recruited in a loop for certain valid process class configurations. Specifically, recruitment of DD or RK takes into account that the master process is preferred over a proxy, resolver or CC.
But the check for a better DD only looks for a better machine class, ignoring that the new recruit could share a process with a proxy, resolver or CC. Also try to balance the distribution of the DD and RK roles if there are enough processes to do so.
2019-11-12 14:22:36 -08:00
negoyal
a4a0bf18f9
Merging with Master.
2019-11-12 13:01:29 -08:00
Evan Tschannen
688940b685
merge 6.2 into master
2019-10-21 11:43:46 -07:00
Evan Tschannen
43e99ef6a4
fix: better master exists must check if fitness is better for proxies or resolvers before looking at the count of either of them
2019-10-17 13:18:31 -07:00
Evan Tschannen
298b815109
one proxy or resolver with best fitness no longer prevents more proxies or resolvers from being recruited with good fitness
2019-10-14 18:32:17 -07:00
Evan Tschannen
5064d91b75
fix: the cluster controller would not change to a new set of satellite tlogs when they become available in a better satellite location
2019-10-14 18:31:23 -07:00
Evan Tschannen
35e816e9ad
added the ability to configure satellite_logs by satellite location; this overrides the region configuration if both are present
2019-10-14 18:30:15 -07:00
A.J. Beamon
31ce56eddf
Add cluster controller metrics
2019-10-03 15:29:11 -07:00
Evan Tschannen
b495cc697b
Merge branch 'release-6.2'
# Conflicts:
# CMakeLists.txt
# documentation/sphinx/source/release-notes.rst
# versions.target
2019-09-13 09:25:08 -07:00
Evan Tschannen
a62862c105
add yieldedFutures to prevent slow tasks
2019-09-11 16:26:48 -07:00
Evan Tschannen
945cff1e5b
the cluster controller caches the serialization of serverDBInfo, to avoid regenerating it many times
2019-09-10 14:27:22 -07:00