615 Commits

Author SHA1 Message Date
Evan Tschannen
96258b9809 Merge branch 'release-6.2'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbcli/fdbcli.actor.cpp
#	fdbclient/ManagementAPI.actor.cpp
#	fdbrpc/FlowTransport.actor.cpp
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/DataDistribution.actor.h
#	fdbserver/DataDistributionQueue.actor.cpp
#	fdbserver/KeyValueStoreMemory.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/QuietDatabase.actor.cpp
#	fdbserver/SkipList.cpp
#	fdbserver/StorageMetrics.actor.h
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/fdbserver.actor.cpp
#	fdbserver/storageserver.actor.cpp
#	fdbserver/workloads/KVStoreTest.actor.cpp
#	flow/CMakeLists.txt
#	flow/Knobs.cpp
#	flow/Knobs.h
#	flow/genericactors.actor.cpp
#	flow/serialize.h
2020-02-21 19:09:16 -08:00
Evan Tschannen
8b768e66df
Merge pull request #2694 from dongxinEric/feature/2663/specialize-policy-for-zoneid-in-cc
Added a specialized algorithm for PolicyOne and PolicyAcross(,'zoneId…
2020-02-20 14:46:23 -08:00
Evan Tschannen
574e88ba8e updateGoodRemoteRecruitmentTime was unnecessary because the only way findRemoteWorkers would return would be after a new server has joined which already resets goodRemoteRecruitmentTime 2020-02-20 13:46:22 -08:00
Xin Dong
99095c9224 Again make Clang happy. 2020-02-20 09:50:22 -08:00
Xin Dong
298d6cb3d7 Address review comments. 2020-02-20 09:34:01 -08:00
Evan Tschannen
fbd45963d8 The cluster controller waits until no new workers register for 1.0 seconds before starting a bad recruitment 2020-02-19 16:48:30 -08:00
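The quiet-period rule in fbd45963d8 can be sketched as a simple gate. This is a minimal sketch that assumes timestamps are given in seconds. `RecruitmentGate` and its methods are hypothetical names and are not the cluster controller's actual code.

```cpp
#include <optional>

// Hypothetical helper (not FoundationDB code). It records when the most
// recent worker registered and allows a "bad" (lower-fitness) recruitment
// only after a quiet period with no new registrations.
class RecruitmentGate {
public:
    explicit RecruitmentGate(double quietPeriodSeconds) : quietPeriod_(quietPeriodSeconds) {}

    // Call whenever a new worker registers with the cluster controller.
    void onWorkerRegistered(double now) { lastRegistration_ = now; }

    // True if no worker has registered within the last quietPeriod_ seconds.
    // Assumption: with no registrations recorded, recruitment may proceed.
    bool canStartBadRecruitment(double now) const {
        if (!lastRegistration_) return true;
        return now - *lastRegistration_ >= quietPeriod_;
    }

private:
    double quietPeriod_;
    std::optional<double> lastRegistration_;
};
```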
Xin Dong
89fcbb2055 Make clang happy 2020-02-19 09:44:15 -08:00
Xin Dong
efc0d7f9d5 Added a specialized algorithm for PolicyOne and PolicyAcross(,'zoneId',PolicyOne()) to find a set of TLog servers which will be able to fulfill the policy later. 2020-02-19 09:25:57 -08:00
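A simplified greedy way to satisfy a policy shaped like PolicyAcross(count, 'zoneId', PolicyOne()) is to take one worker from each distinct zone. `Worker` and `selectAcrossZones` are hypothetical. The algorithm in efc0d7f9d5 may choose servers differently, for example to keep enough candidates available so that the policy can still be fulfilled later.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical, simplified worker record. The real recruitment code works on
// worker interfaces and locality data.
struct Worker {
    std::string id;
    std::string zoneId;
};

// Greedy selection for a PolicyAcross(count, "zoneId", PolicyOne())-style policy.
// It takes the first worker seen in each distinct zone until `count` zones
// are covered, and returns std::nullopt when fewer than `count` zones exist.
std::optional<std::vector<Worker>> selectAcrossZones(const std::vector<Worker>& candidates,
                                                     std::size_t count) {
    std::vector<Worker> chosen;
    std::unordered_set<std::string> usedZones;
    for (const Worker& w : candidates) {
        if (chosen.size() == count) break;
        // insert().second is true only the first time a zone is seen.
        if (usedZones.insert(w.zoneId).second) {
            chosen.push_back(w);
        }
    }
    if (chosen.size() < count) return std::nullopt;
    return chosen;
}
```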
negoyal
85cc35e81e Merge branch 'master' into HEAD 2020-02-05 14:59:55 -08:00
Evan Tschannen
844c8511c4
Merge pull request #2588 from jzhou77/backup-worker
Integrate new backup worker with existing backup command
2020-02-05 14:14:43 -08:00
Jingyu Zhou
52c6737411 Rename backupLoggingEnabled to backupWorkerEnabled
To highlight the backup changes in 7.0. By default, the
backup_worker_enabled flag is set for version 7.0.
2020-02-04 10:09:16 -08:00
Jingyu Zhou
0db03f1d3c Use backup_logging_enabled flag
The default is to enable new backup workers. Users can disable this flag to
turn off the backup worker feature.
2020-02-03 20:03:22 -08:00
Evan Tschannen
4524831456
Merge pull request #2518 from vishesh/task/failmon-remove-server
FailureMonitoring: Server processes no longer need to talk to ClusterController
2020-02-03 17:22:50 -08:00
Jingyu Zhou
38aa1903fd Add a DB configuration option for backup workers
Right now, the default is to keep the old backup behavior, i.e., do NOT use
backup workers. Specifically, if BackupType is not set (or is set to default),
the master will not recruit backup workers and will not add pseudo locality for
backup workers.

The StartFullBackupTaskFunc is updated to check if backup worker is enabled.
Only when it is not enabled, starting a backup will wait on all backup workers
to be started.
2020-01-31 19:29:09 -08:00
Jingyu Zhou
6ddf73e26a Remove code introduced when resolving merge conflicts 2020-01-22 21:23:38 -08:00
Jingyu Zhou
c6c39ca99d Update better master exists with backup workers
During recruitment, if there is no desired log router count, use tlog size
instead, because the number of backup workers has to be larger than 0.
2020-01-22 19:43:40 -08:00
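The fallback in c6c39ca99d comes down to a single conditional. This minimal sketch assumes that an unset desired log router count is represented by a non-positive value. `backupWorkerCount` is a hypothetical helper.

```cpp
// Hypothetical helper (not FoundationDB code). It uses the desired log router
// count when one is configured and otherwise falls back to the TLog count,
// because the number of backup workers must be larger than 0.
int backupWorkerCount(int desiredLogRouterCount, int tlogCount) {
    return desiredLogRouterCount > 0 ? desiredLogRouterCount : tlogCount;
}
```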
Jingyu Zhou
56a2c37071 Recruit backup workers for single region
Enable log router tags for single region, which are popped by backup workers.
Need to add a noop for backup workers if there are no active backups.
2020-01-22 19:42:13 -08:00
Jingyu Zhou
19d6a889ff Recruit backup workers for old epochs
If there are unfinished ranges in the old epochs, the new master will recruit
backup workers responsible for finishing these ranges. These workers remain in
the cluster until the next epoch, when they will remove themselves.
2020-01-22 19:38:45 -08:00
Jingyu Zhou
7da9f47f26 Enable pop from backup workers
This is still WIP as some edge cases can trigger test failure, most likely due
to not popping mutations by backup workers when epoch ends.
2020-01-22 19:38:45 -08:00
Jingyu Zhou
ece3cadf8e Recruit backup worker during master recovery
Right now recruit the same number as TLogs. The backup worker does nothing.
2020-01-22 19:37:48 -08:00
Jingyu Zhou
de8d953865 Add backup role, class, and worker skeleton 2020-01-22 19:35:30 -08:00
Vishesh Yadav
daef5f011a Merge remote-tracking branch 'apple/master' into task/failmon-remove-server 2020-01-21 13:20:15 -08:00
Evan Tschannen
3f9d9d8b84 Merge branch 'release-6.2'
# Conflicts:
#	CMakeLists.txt
#	cmake/FlowCommands.cmake
#	documentation/sphinx/source/release-notes.rst
#	fdbclient/StorageServerInterface.h
#	fdbserver/DataDistributionTracker.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/fdbserver.actor.cpp
#	flow/Knobs.h
#	flow/Platform.cpp
#	versions.target
2020-01-16 18:37:47 -08:00
Evan Tschannen
d55e56993d fix: the cluster controller would not recruit more remote logs before the database became fully_recovered 2020-01-10 12:21:48 -08:00
Alvin Moore
7628d04fb9 Merge branch 'release-6.2' of github.com:apple/foundationdb into release_6.2_merge
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
2020-01-09 07:21:16 -08:00
mpilman
d3d6016c90 Merge remote-tracking branch 'negoyal/fdb_cache_subfeature2' into features/cache-initialization 2020-01-07 19:53:09 -08:00
Vishesh Yadav
6e6cfaff16 Cleanup old Failure Monitoring code 2020-01-07 15:53:32 -08:00
negoyal
29b77863f0 Cache warmup and Consistency check workload changes. 2020-01-07 13:06:58 -08:00
Evan Tschannen
3eae401886 fix: we were recruiting one too few oldLogRouters
code cleanup
2020-01-02 15:05:44 -08:00
Evan Tschannen
5e5e618da0 during recovery, only send the full serverDBInfo to processes that are part of the new generation 2019-12-09 13:17:49 -08:00
Evan Tschannen
bcce5968a4 recruit oldLogRouters on TLogs, do not recruit oldLogRouters on the cluster controller if possible 2019-12-09 13:12:13 -08:00
mpilman
821edcb207 Register caches through keyspace
This also removes the old mechanism that registers them
through the serverDBInfo.

Caches now self-recruit at startup.
2019-12-06 13:28:44 -08:00
negoyal
cf2563f1c7 Mix of various things, a lot of which will change. 2019-12-05 17:10:32 -08:00
Evan Tschannen
3c769fcf60 Merge branch 'release-6.2'
# Conflicts:
#	CMakeLists.txt
#	documentation/sphinx/source/release-notes.rst
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	versions.target
2019-11-22 15:39:19 -08:00
Evan Tschannen
ebcb2f79ed Merge branch 'master' of github.com:apple/foundationdb 2019-11-22 15:34:49 -08:00
A.J. Beamon
7c801513e2 Fix cases where latency band config could be discarded during recovery or process start. 2019-11-20 11:44:18 -08:00
Evan Tschannen
8d3ef89540 Merge branch 'release-6.2'
# Conflicts:
#	CMakeLists.txt
#	documentation/sphinx/source/release-notes.rst
#	fdbclient/MutationList.h
#	fdbserver/MasterProxyServer.actor.cpp
#	versions.target
2019-11-14 15:49:56 -08:00
Evan Tschannen
ffc89d1182 fix: dd test recruitment should prefer the location of ratekeeper over other used processes 2019-11-13 12:58:55 -08:00
Balachandar Namasivayam
2e41497580 This commit tries to distribute RK and DD among other empty available processes. 2019-11-12 17:52:42 -08:00
Balachandar Namasivayam
f5282f2c7e Fix bug where DD or RK could be halted and re-recruited in a loop for certain valid process class configurations. Specifically, recruitment of DD or RK takes into account that the master process is preferred over a proxy, resolver, or CC.
But the check for a better DD only looks for a better machine class, ignoring that the new recruit could share a process with a proxy, resolver, or CC. Also try to balance the distribution of the DD and RK roles if there are enough processes to do so.
2019-11-12 14:22:36 -08:00
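The balancing idea in 2e41497580 and f5282f2c7e can be shown with a simple rule. Among processes with equal fitness, prefer the one that already hosts the fewest other roles. This is a hypothetical sketch (`Candidate`, `pickProcess`) that assumes lower fitness values are better. It is not the recruitment code itself.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical candidate description (not FoundationDB code).
struct Candidate {
    std::string id;
    int fitness;    // assumption: lower is better
    int roleCount;  // roles (proxy, resolver, CC, ...) already on this process
};

// Picks the best-fitness candidate. Ties go to the process with fewer
// existing roles, so DD and RK spread onto otherwise empty processes.
// Returns std::nullopt for an empty candidate list.
std::optional<Candidate> pickProcess(const std::vector<Candidate>& cands) {
    std::optional<Candidate> best;
    for (const Candidate& c : cands) {
        if (!best || c.fitness < best->fitness ||
            (c.fitness == best->fitness && c.roleCount < best->roleCount)) {
            best = c;
        }
    }
    return best;
}
```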
negoyal
a4a0bf18f9 Merging with Master. 2019-11-12 13:01:29 -08:00
Evan Tschannen
688940b685 merge 6.2 into master 2019-10-21 11:43:46 -07:00
Evan Tschannen
43e99ef6a4 fix: better master exists must check if fitness is better for proxies or resolvers before looking at the count of either of them 2019-10-17 13:18:31 -07:00
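The ordering described in 43e99ef6a4 is a lexicographic comparison. Fitness decides first, and the count only breaks ties. This minimal sketch uses a hypothetical `RoleFitness` struct. It assumes that lower fitness values are better and that, at equal fitness, more processes is better.

```cpp
// Hypothetical summary of a proxy or resolver set (not FoundationDB's type).
struct RoleFitness {
    int fitness;  // assumption: lower is better
    int count;    // number of processes recruited for the role
};

// Returns true if `candidate` is strictly better than `current`.
// Fitness is compared before count, so a larger count can never
// compensate for worse fitness.
bool isBetter(const RoleFitness& candidate, const RoleFitness& current) {
    if (candidate.fitness != current.fitness)
        return candidate.fitness < current.fitness;
    return candidate.count > current.count;
}
```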
Evan Tschannen
298b815109 one proxy or resolver with best fitness no longer prevents more proxies or resolvers from being recruited with good fitness 2019-10-14 18:32:17 -07:00
Evan Tschannen
5064d91b75 fix: the cluster controller would not change to a new set of satellite tlogs when they become available in a better satellite location 2019-10-14 18:31:23 -07:00
Evan Tschannen
35e816e9ad added the ability to configure satellite_logs by satellite location; this overrides the region configuration if both are present 2019-10-14 18:30:15 -07:00
A.J. Beamon
31ce56eddf Add cluster controller metrics 2019-10-03 15:29:11 -07:00
Evan Tschannen
b495cc697b Merge branch 'release-6.2'
# Conflicts:
#	CMakeLists.txt
#	documentation/sphinx/source/release-notes.rst
#	versions.target
2019-09-13 09:25:08 -07:00
Evan Tschannen
a62862c105 add yieldedFutures to prevent slow tasks 2019-09-11 16:26:48 -07:00
Evan Tschannen
945cff1e5b the cluster controller caches the serialization of serverDBInfo, to avoid regenerating it many times 2019-09-10 14:27:22 -07:00
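Caching the serialized form, as in 945cff1e5b, means the data is serialized once per change rather than once per recipient. This is a minimal sketch. `SerializedInfoCache`, the `changeId` counter, and the string-producing serializer are assumptions that stand in for the real ServerDBInfo machinery.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <utility>

// Hypothetical cache (not FoundationDB code). It serializes only when
// `changeId` differs from the cached one and otherwise reuses the bytes.
class SerializedInfoCache {
public:
    explicit SerializedInfoCache(std::function<std::string()> serialize)
        : serialize_(std::move(serialize)) {}

    const std::string& get(std::uint64_t changeId) {
        if (!valid_ || changeId != cachedId_) {
            cached_ = serialize_();  // regenerate only after a change
            cachedId_ = changeId;
            valid_ = true;
        }
        return cached_;
    }

private:
    std::function<std::string()> serialize_;
    std::string cached_;
    std::uint64_t cachedId_ = 0;
    bool valid_ = false;
};
```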