foundationdb

mirror of https://github.com/apple/foundationdb.git synced 2025-06-02 19:25:52 +08:00

Author	SHA1	Message	Date
Evan Tschannen	6a38f81269	do not kill the master unless we have a dbInfo from the current cluster controller	2020-07-17 14:59:38 -07:00
sfc-gh-tclinkenbeard	1b55d75896	Remove TRIVIALLY_DESTRUCTIBLE macro	2020-07-04 19:28:10 -07:00
sfc-gh-tclinkenbeard	3f6222a04d	Mark DebugEntryRef trivially destructible	2020-06-24 14:08:22 -07:00
Markus Pilman	5f9b127e56	Emit traces regularly about role assignment We are currently emitting Role transition traces when a role starts and when it ends. While this is useful for debugging, it doesn't work well with tools that inject data and might potentially miss some trace lines. We do decorate each trace line with the roles assigned to that particular process, however, this is not sufficient for tools that can make use of the UID -> Role mapping	2020-05-08 16:27:57 -07:00
Evan Tschannen	b7f5f3be48	merge in master	2020-04-28 13:11:47 -07:00
Evan Tschannen	c87aa33941	Merge branch 'release-6.2' # Conflicts: # CMakeLists.txt # bindings/go/src/fdb/generated.go # documentation/sphinx/source/api-common.rst.inc # documentation/sphinx/source/api-ruby.rst # documentation/sphinx/source/release-notes.rst # fdbclient/FailureMonitorClient.actor.cpp # fdbclient/NativeAPI.actor.cpp # fdbclient/vexillographer/fdb.options # fdbrpc/FlowTransport.actor.cpp # fdbserver/OldTLogServer_6_0.actor.cpp # fdbserver/TLogServer.actor.cpp # fdbserver/fdbserver.actor.cpp # versions.target	2020-04-23 13:47:53 -07:00
Evan Tschannen	dfb0593ae6	increases priority of status requests	2020-04-22 14:24:59 -07:00
Evan Tschannen	33efb9ec97	code cleanup based on review comments	2020-04-17 15:05:01 -07:00
Evan Tschannen	1476057996	properly cache serialization of serverDBInfo	2020-04-11 19:30:05 -07:00
Evan Tschannen	ce4493f679	many bug fixes	2020-04-10 13:45:16 -07:00
Evan Tschannen	a51c92854a	Merge branch 'master' into feature-tree-broadcast # Conflicts: # fdbserver/WorkerInterface.actor.h # fdbserver/worker.actor.cpp	2020-04-06 21:09:44 -07:00
Evan Tschannen	2a1bd97120	fix compilation errors	2020-04-06 20:58:43 -07:00
Evan Tschannen	477d66b46d	implemented a tree broadcast for txn state message for proxies, and serverDBInfo for workers	2020-04-05 23:09:36 -07:00
Jingyu Zhou	fda6c08640	Include a total number of tags in partition log file names This is needed for BackupContainer to check partitioned mutation logs are continuous, i.e., restorable to a version.	2020-03-20 20:13:38 -07:00
Balachandar Namasivayam	58a9bfa78b	Merge pull request #2820 from dongxinEric/fix/1977/add-back-trace-event-flush-failure-report Fix/1977/add back trace event flush failure report	2020-03-18 16:11:44 -07:00
Xin Dong	5967ef5eab	Added back the changes that report trace log flush failures and fix the random crash	2020-03-12 14:34:19 -07:00
Meng Xu	e0d2eca7a8	checkForExtraDataStores:Add coordinators into stateful process list	2020-03-10 23:38:30 -07:00
Xin Dong	39610d15f8	Revert this change since it somehow introduced a random crash detected on circus	2020-03-04 16:14:38 -08:00
Xin Dong	f20619c9fb	Resolve review comments. Changed how issues got cleared	2020-02-25 15:39:51 -08:00
Xin Dong	090c89e90a	Addressed review comments. Fix the bug where issues on a worker may be wrongly cleared by subsequent GetDBinfo request.	2020-02-25 15:39:38 -08:00
Xin Dong	6325c40336	Apply suggestions from code review Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>	2020-02-25 15:39:09 -08:00
Xin Dong	f4f860bfa8	Changed issue reporting to be thread safe. Also changed the liveness ping to be thread safe.	2020-02-25 15:38:14 -08:00
Xin Dong	0b0414fb94	Addressed review comments. Change the issue reporting from 'ITraceLogWriter' to be a more generic way.	2020-02-25 15:37:53 -08:00
Xin Dong	034dfe5e42	Now the inability to flush trace logs will be reported to both 'stderr' and also the status json object. - Since the first flush failure, if the accumulated consecutive failure count exceeds the value defined in knobs, it will trigger the current worker process to report this issue via the 'GetServerDBInfo' interface of the cluster controller - A successful flush will reset the accumulated counter. Notice that the current solution does not take the time into consideration. The assumption is that flush failures tend to only happen in a clustered manner. The intermittent, but short, periods of flush failures are not considered as a problem since the memory pressure built by them should be negligible.	2020-02-25 15:37:32 -08:00
Evan Tschannen	96258b9809	Merge branch 'release-6.2' # Conflicts: # documentation/sphinx/source/release-notes.rst # fdbcli/fdbcli.actor.cpp # fdbclient/ManagementAPI.actor.cpp # fdbrpc/FlowTransport.actor.cpp # fdbserver/ClusterController.actor.cpp # fdbserver/DataDistribution.actor.cpp # fdbserver/DataDistribution.actor.h # fdbserver/DataDistributionQueue.actor.cpp # fdbserver/KeyValueStoreMemory.actor.cpp # fdbserver/MasterProxyServer.actor.cpp # fdbserver/QuietDatabase.actor.cpp # fdbserver/SkipList.cpp # fdbserver/StorageMetrics.actor.h # fdbserver/TLogServer.actor.cpp # fdbserver/fdbserver.actor.cpp # fdbserver/storageserver.actor.cpp # fdbserver/workloads/KVStoreTest.actor.cpp # flow/CMakeLists.txt # flow/Knobs.cpp # flow/Knobs.h # flow/genericactors.actor.cpp # flow/serialize.h	2020-02-21 19:09:16 -08:00
A.J. Beamon	1d9140d874	Removed TLogVersion logging. Added logging of SharedTLog ID for each TLog. Switched ID logged for TLogRejoining event to the TLog instead of the SharedTLog. Made some parameters to startRole passed by reference.	2020-02-14 12:33:43 -08:00
A.J. Beamon	56053c565b	Improve TLog "Role" event by adding the worker ID, the TLog version, and under what circumstances the TLog is being started (Restored, Recruited, or Recovered). The SharedTLog role was being started and stopped twice, so remove one instance of it.	2020-02-12 15:11:38 -08:00
Jingyu Zhou	1eaea91cb3	Address review comments	2020-01-22 19:42:13 -08:00
Jingyu Zhou	116608a0a7	Set backup workers w.r.t. the correct epoch For backup workers created for previous epoch, we need to associate them with the correct epoch so that later peekLogRouter can get the correct peek cursor. Otherwise, the workers can never peek the missing range of mutations.	2020-01-22 19:38:45 -08:00
Jingyu Zhou	19d6a889ff	Recruit backup workers for old epochs If there are unfinished ranges in the old epochs, the new master will recruit backup workers responsible for finishing these ranges. These workers remain in the cluster until the next epoch, when they remove themselves.	2020-01-22 19:38:45 -08:00
Jingyu Zhou	17002740bb	Add epoch and backup workers to DBCoreState This enables backup workers to know the end version of the epoch. Additionally, the master recovery only needs to deal with crashed backup workers by recruiting new workers to backup the unfinished version range.	2020-01-22 19:38:45 -08:00
Jingyu Zhou	7da9f47f26	Enable pop from backup workers This is still WIP as some edge cases can trigger test failure, most likely due to not popping mutations by backup workers when epoch ends.	2020-01-22 19:38:45 -08:00
Jingyu Zhou	443c4995a2	Add file identifier in interfaces for flatbuffer	2020-01-22 19:37:48 -08:00
Jingyu Zhou	ece3cadf8e	Recruit backup worker during master recovery Right now recruit the same number as TLogs. The backup worker does nothing.	2020-01-22 19:37:48 -08:00
Jingyu Zhou	de8d953865	Add backup role, class, and worker skeleton	2020-01-22 19:35:30 -08:00
Evan Tschannen	ebcb2f79ed	Merge branch 'master' of github.com:apple/foundationdb	2019-11-22 15:34:49 -08:00
Evan Tschannen	8d3ef89540	Merge branch 'release-6.2' # Conflicts: # CMakeLists.txt # documentation/sphinx/source/release-notes.rst # fdbclient/MutationList.h # fdbserver/MasterProxyServer.actor.cpp # versions.target	2019-11-14 15:49:56 -08:00
negoyal	a4a0bf18f9	Merging with Master.	2019-11-12 13:01:29 -08:00
Evan Tschannen	1e5677b55a	increase the priority of reboot and recruitment requests	2019-11-11 15:17:11 -08:00
Alex Miller	1eb3a70b96	Spill SharedTLog when there's more than one. When switching between spill_type or log_version, a new instance of a SharedTLog is created in the transaction log processes. If this is done in a saturated database, then doubling the amount of memory to hold mutations in memory can cause TLogs to be uncomfortably close to the 8GB OOM limit. Instead, we now thread which UID of a SharedTLog is active, and the other TLog spill out the majority of their mutations. This is a backport of #2213 (fef89aa1) to release-6.2	2019-10-17 01:24:50 -07:00
Alex Miller	b3fd4f62a7	Fix whitespace.	2019-10-07 18:08:27 -07:00
Alex Miller	1d8a7e5af7	Spill SharedTLog when there's more than one. When switching between spill_type or log_version, a new instance of a SharedTLog is created in the transaction log processes. If this is done in a saturated database, then doubling the amount of memory to hold mutations in memory can cause TLogs to be uncomfortably close to the 8GB OOM limit. Instead, we now thread which UID of a SharedTLog is active, and the other TLog spill out the majority of their mutations.	2019-10-07 18:08:27 -07:00
Alex Miller	60fb04ca68	Fork TLogServer into TLogServer_6_2 This prepares us for incoming modifications to the TLog that can't easily coexist with our current on-disk state.	2019-10-03 01:41:25 -07:00
Evan Tschannen	b495cc697b	Merge branch 'release-6.2' # Conflicts: # CMakeLists.txt # documentation/sphinx/source/release-notes.rst # versions.target	2019-09-13 09:25:08 -07:00
Evan Tschannen	945cff1e5b	the cluster controller caches the serialization of serverDBInfo, to avoid regenerating it many times	2019-09-10 14:27:22 -07:00
Meng Xu	d160810662	FastRestore:Resolve review comments	2019-09-04 16:48:43 -07:00
sramamoorthy	a65c9f92ed	get rid of all timeouts and other changes	2019-07-24 15:36:28 -07:00
sramamoorthy	8f1f0c0435	snap v2: worker and other helper related changes	2019-07-24 15:36:28 -07:00
Evan Tschannen	15e894c724	Merge in master	2019-07-05 15:49:24 -07:00
Alex Miller	ea6898144d	Merge remote-tracking branch 'upstream/master' into flowlock-api	2019-07-03 20:44:15 -07:00

89 Commits