foundationdb

mirror of https://github.com/apple/foundationdb.git synced 2025-06-02 11:15:50 +08:00

Author	SHA1	Message	Date
Ata E Husain Bohra	e1ca0ef9a2	Defer recoveredDiskFiles wait if Encryption data at-rest is enabled (#7414) * Defer recoveredDiskFiles wait if Encryption data at-rest is enabled Description In the current code, ClusterController startup waits for the 'recoveredDiskFiles' future to complete before triggering the 'clusterControllerCore' actor, which in turn starts the 'EncryptKeyProxy' (EKP) actor responsible for fetching/refreshing the encryption keys needed for ClusterRecovery as well as interactions with KMS. The patch addresses a circular dependency where StorageServer initialization depends on EKP, but CC doesn't recruit EKP until 'recoveredDiskFiles' completes, which includes SS initialization. Given that 'recoveredDiskFiles' is an optimization, the patch proposes deferring the 'recoveredDiskFiles' future completion until the new Master recruitment is done as part of ClusterRecovery (unblocking the EKP singleton). Testing Ran 500K correctness runs: 20220618-055310-ahusain-foundationdb-61c431d467557551 Recorded failures don't seem to be related to the change.	2022-06-21 18:18:57 -07:00
Bharadwaj V.R	8cf2be030f	Build a TenantCache for use by DD (#7207) * Add a DD tenant-cache-assembly actor * Add basic tenant list monitoring for tenant cache. * Update DD tenant cache refresh to be more efficient and unit-testable * Remove the DD prefix in the tenant cache class name (and associated impl and UT class names); there is nothing specific to DD in it; DD uses it; other modules may use it in the future * Disable DD tenant awareness by default	2022-06-21 16:29:30 -07:00
Lukas Joswiak	9ca8a3c683	Reenable status json for dynamic knobs, add unit test	2022-06-21 11:43:05 -07:00
Dan Lambright	c48d569024	fix a fault injection bug in txn store recovery (#7405) * fix a fault injection bug in txn store recovery * Update LogSystemDiskQueueAdapter.actor.cpp typo * recoverLoc can be overwritten, so on reset use the stored range start	2022-06-21 12:33:58 -04:00
Josh Slocum	34e6a8f942	Merge pull request #7399 from sfc-gh-jslocum/bg_tenant_improvements Bg tenant improvements	2022-06-17 11:19:41 -05:00
Markus Pilman	5aacaf891c	Merge pull request #7321 from sfc-gh-ajbeamon/multiple-tenant-creation Support creating multiple tenants in the same transaction	2022-06-17 10:10:09 -06:00
Xiaoxi Wang	6bb4e341f9	Merge pull request #7110 from sfc-gh-xwang/features/ppw-pause-state Adding paused/running wiggling status to status json and also the last running/paused timestamp	2022-06-16 14:27:18 -07:00
Xiaoxi Wang	a311cc28cc	solve some comments	2022-06-16 11:07:21 -07:00
Josh Slocum	b3597ef3a8	Added plumbing for tenant-aware purge granules	2022-06-16 13:04:34 -05:00
Andrew Noyes	83aceb216c	Use absl::GetStackTrace for slow task profiler (#7374) * Make SlowTask workload runnable in joshua * Remove SignalSafeUnwind, and use absl::GetStackTrace for slow task profiler	2022-06-15 14:53:52 -07:00
Sreenath Bodagala	2c85bb71c1	- Do not try to figure out the sequencer locality if knob ENABLE_VERSION_VECTOR_HA_OPTIMIZATION is disabled.	2022-06-15 16:08:31 +00:00
Ata E Husain Bohra	8808d93813	Fix bugs in EncryptKeyProxy actor (#7388) Description Major changes include: 1. GetEncryptByKeyIds cache elements can expire. 2. Update iterator after erasing an element during refresh encryption keys operation. Testing EncryptKeyProxyTest	2022-06-14 21:22:25 -07:00
Yao Xiao	7da26db342	[ShardedRocksDB] 4/N Support removeRange. (#7345)	2022-06-14 13:52:03 -07:00
Yi Wu	6246664006	Support encrypting TxnStateStore (#7253) Adding encryption support for TxnStateStore. It is done by supporting encryption for KeyValueStoreMemory. The encryption is currently done at the operation level when the operations are being written to the underlying log file. See inline comment for the encrypted data format. This PR depends on #7252. It is part of the effort to support TLog encryption #6942.	2022-06-14 13:26:32 -07:00
Trevor Clinkenbeard	6bed046148	Merge pull request #7352 from sfc-gh-xwang/feature/ddtxn [DD testability enhancement] Create IDDTxnProcessor and simple refactoring	2022-06-13 16:01:13 -07:00
Xiaoxi Wang	ef0f415e3d	add option; change to shared_ptr	2022-06-13 13:55:48 -07:00
Andrew Noyes	013b290ca5	Don't fail test if log cursor times out during network partition (#7330) * Don't fail test if log cursor times out during network partition Also, exercise the codepath for handling timed_out in simulation, by reverting this knob buggification behavior to that of 07976993e7. * clang-format	2022-06-13 13:28:22 -07:00
Trevor Clinkenbeard	942d687506	Clean up includes in actor header files (#7331) * Remove unnecessary actorcompiler.h includes (from non-actor files) * Make AsyncFileChaos a non-actor header file * Add unactorcompiler.h include to the end of actor header files * Add missing actorcompiler.h includes to actor header files	2022-06-13 13:26:51 -07:00
Ata E Husain Bohra	a5d91fe18a	KmsConnector implementation to support KMS driven CipherKey TTL (#7334) * KmsConnector implementation to support KMS driven CipherKey TTL Description KMS CipherKeys can be of two types: 1. Revocable CipherKeys: having a finite lifetime, after which the CipherKey shouldn't be used by FDB. 2. Non-revocable CipherKeys: ciphers are not revocable; however, FDB still wants to refresh them to support the KMS cipher rotation feature. The patch proposes the following changes to support the cipher-key types defined above: 1. Extend the KmsConnector response to include optional 'refreshAfter' & 'expireAfter' time intervals. The EncryptKeyProxy (EKP) cache defines the corresponding absolute refresh & expiry timestamps for a given cipherKey. In the event of a transient KMS connectivity outage, a caller of the EKP API for a non-revocable key should continue using the cached cipherKey until it expires. 2. Simplify KmsConnector API arena handling by using VectorRef to represent component structs and manage associated memory allocation/lifetime. Testing 1. EncryptKeyProxyTest 2. RESTKmsConnectorTest 3. SimKmsConnectorTest * diff-1: Set expireTS for baseCipherId indexed cache * diff-2: Fix Valgrind issues discovered running tests * diff-3: Address review comment	2022-06-13 13:25:01 -07:00
Xiaoxi Wang	1de6c09307	use struct instead of tuple	2022-06-13 11:27:50 -07:00
Xiaoxi Wang	c12a7a30ed	Update fdbserver/DataDistributionQueue.actor.cpp Co-authored-by: Trevor Clinkenbeard <trevor.clinkenbeard@snowflake.com>	2022-06-13 08:22:48 -07:00
Xiaoxi Wang	9604db3f10	Update fdbserver/DDTxnProcessor.h Co-authored-by: Trevor Clinkenbeard <trevor.clinkenbeard@snowflake.com>	2022-06-13 08:19:14 -07:00
Xiaoxi Wang	fb66561bc4	format code	2022-06-09 14:43:09 -07:00
Xiaoxi Wang	7ee6808ebd	solve compiler warning	2022-06-09 14:32:24 -07:00
Xiaoxi Wang	b99bd45730	format code	2022-06-09 12:36:20 -07:00
Xiaoxi Wang	e5aa5fef22	merge upstream/main	2022-06-09 12:17:27 -07:00
Xiaoxi Wang	6ab12ea971	add storeTuple and unit test; refactor getSourceServersForRange	2022-06-09 12:16:12 -07:00
Yao Xiao	0bb02f6415	[Sharded RocksDB] 3/N Implement functions for range clear. (#7310)	2022-06-09 10:50:39 -07:00
Jingyu Zhou	7acd184a38	Merge pull request #7339 from jzhou77/fix-status-memory Add rss_bytes to process memory and fix available_bytes calculation	2022-06-08 13:10:51 -07:00
Jingyu Zhou	b9ff6bc129	Address AJ's comments	2022-06-08 09:38:32 -07:00
Sreenath Bodagala	fe5f11358f	Merge pull request #7318 from sbodagala/main Introduce a knob that controls the placement of remote storage server commit versions in version vector	2022-06-08 12:18:15 -04:00
Markus Pilman	d141347500	Merge pull request #7282 from Doxense/fix-windows-tests Fix windows tests	2022-06-08 08:18:47 -06:00
Bharadwaj V.R	d4b983264b	Merge branch 'apple:main' into ddneat	2022-06-07 23:10:56 -07:00
Bharadwaj V.R	b40553556b	Merge pull request #7281 from sfc-gh-bvr/mcvf-nothrottle Remove last-limited check from DDMountainChopper and DDValleyFiller	2022-06-07 21:15:47 -07:00
Yi Wu	bbf8cb4b02	GetEncryptCipherKeys helper function and misc encryption changes (#7252) Adding GetEncryptCipherKeys and GetLatestCipherKeys helper actors, which encapsulate cipher key fetch logic: getting cipher keys from the local BlobCipherKeyCache and, on a cache miss, fetching them from EKP (encrypt key proxy). These helper actors also handle the case where EKP gets shut down in the middle: they listen on ServerDBInfo to wait for the new EKP to start and send the request there instead. The PR also has other misc changes: * EKP is by default started in simulation regardless of the ENABLE_ENCRYPTION knob, so that in restart tests, if ENABLE_ENCRYPTION is switched from on to off after restart, encrypted data can still be read. * API tweaks for BlobCipher * Adding an ENABLE_TLOG_ENCRYPTION knob which will be used in later PRs. The knob should normally be consistent with the ENABLE_ENCRYPTION knob, but could be used to disable TLog encryption alone. This PR is split out from #6942.	2022-06-07 21:00:13 -07:00
Jingyu Zhou	217ba24b6f	Add rss_bytes to process memory and fix available_bytes calculation Since memory is now limited by RSS size, add RSS size to status json for reporting. Also change how available_bytes is calculated from: (available + virtual memory) * process_limit / machine_limit to: (available memory) * process_limit / machine_limit	2022-06-07 16:44:14 -07:00
Andrew Noyes	1997e6057c	Fix a heap-use-after-free in a unit test (#7230) * Fix a heap-use-after-free in a unit test The data passed to IAsyncFile::write must remain valid until the future is ready. * Use holdWhile instead of a new state variable	2022-06-07 14:48:01 -07:00
Josh Slocum	a0bb585260	Merge pull request #7333 from sfc-gh-jslocum/blob_metadata_valgrind_fix fixes for blob metadata memory from valgrind	2022-06-07 15:24:11 -05:00
Andrew Noyes	1f8fc32f41	Save a memcpy in the tlog peek path (#7328)	2022-06-07 13:22:56 -07:00
Xiaoxi Wang	21e7e6d2ba	add DDTxnProcessor (incomplete)	2022-06-07 11:58:16 -07:00
Josh Slocum	ae865027d6	fixes for blob metadata memory from valgrind	2022-06-07 13:50:11 -05:00
Xiaoxi Wang	541f98e111	create DDTxnProcessor	2022-06-07 11:48:59 -07:00
Sreenath Bodagala	96a88e3847	Merge remote-tracking branch 'apple-upstream/main'	2022-06-07 18:38:35 +00:00
A.J. Beamon	4f308b34fc	Fix an off-by-one error in determining whether to include the entire range in the conflict ranges when a reverse range read returns early due to limit.	2022-06-07 08:52:10 -07:00
Yao Xiao	5f1a061e3a	Disable rocksdb metrics. (#7327)	2022-06-06 14:27:41 -07:00
Bharadwaj V.R	aa84f8925e	Merge branch 'apple:main' into mcvf-nothrottle	2022-06-06 13:18:11 -07:00
Dan Adkins	bd47f390bd	Add simulation test for three_data_hall configuration (#7305) * Add simulation test for 1 data hall + 1 machine failure case. * Disable BUGGIFY for DEGRADED_RESET_INTERVAL. A simulation test discovered a situation where machines attempting to connect to a dead coordinator (with a well-known endpoint) were getting themselves marked degraded. This flapping of the degraded state prevented recovery from completing, as it started over any time it noticed that tlogs on degraded hosts could be relocated to non-degraded ones. bin/fdbserver -r simulation -f tests/rare/CycleWithDeadHall.toml -b on -s 276841956	2022-06-06 13:14:49 -07:00
Bharadwaj V.R	990c789a5c	Increase quiet-database timeout when buggify is on; data-movements in simulation take longer than the timeout allows, and waiting for quiet-database does succeed when given some more time (#7290)	2022-06-06 13:13:11 -07:00
Josh Slocum	a3289f9cab	adding tenant prefix to bg ranges call	2022-06-06 14:09:10 -05:00
Bharadwaj V.R	7f079a6c29	Merge branch 'apple:main' into mcvf-nothrottle	2022-06-06 12:03:13 -07:00
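
Commit 8808d93813 fixes iterator handling after erasing an element while refreshing cached encryption keys. In standard C++, the usual idiom is to continue from the iterator that `erase` returns. Below is a minimal sketch of that idiom. It assumes a hypothetical `std::unordered_map` cache that stores an absolute expiry time for each entry. `CacheEntry` and `dropExpired` are illustrative names and do not come from the EncryptKeyProxy code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical cache entry; names are illustrative, not taken from EncryptKeyProxy.
struct CacheEntry {
    std::string cipherKey;
    double expireAt; // absolute time after which the entry must not be used
};

// Removes expired entries in place. erase() invalidates the erased iterator, so the
// loop continues from the iterator that erase() returns instead of incrementing it.
std::size_t dropExpired(std::unordered_map<uint64_t, CacheEntry>& cache, double now) {
    std::size_t removed = 0;
    for (auto it = cache.begin(); it != cache.end();) {
        if (it->second.expireAt <= now) {
            it = cache.erase(it); // safe: erase() returns the next valid iterator
            ++removed;
        } else {
            ++it;
        }
    }
    return removed;
}
```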
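
Commit a5d91fe18a describes how the EKP cache converts the optional relative 'refreshAfter' and 'expireAfter' intervals in a KMS response into absolute refresh and expiry timestamps. Below is a minimal C++ sketch of those semantics. It assumes a hypothetical `CipherKeyTtlCache` class and a caller-supplied `now` value in seconds, and neither comes from the FoundationDB source. A missing `expireAfter` models a non-revocable key: the cache still refreshes it periodically, but the key never expires.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical sketch: relative KMS intervals become absolute timestamps on insert.
class CipherKeyTtlCache {
public:
    void insert(uint64_t baseCipherId, std::string key, double now,
                std::optional<double> refreshAfter, std::optional<double> expireAfter) {
        const double never = std::numeric_limits<double>::infinity();
        entries_[baseCipherId] = Entry{std::move(key),
                                       refreshAfter ? now + *refreshAfter : never,
                                       expireAfter ? now + *expireAfter : never};
    }

    // True if the key is unknown or its refresh deadline has passed, i.e. a new
    // fetch from the KMS should be attempted.
    bool needsRefresh(uint64_t baseCipherId, double now) const {
        auto it = entries_.find(baseCipherId);
        return it == entries_.end() || now >= it->second.refreshAt;
    }

    // Returns the cached key while it has not expired, e.g. during a transient KMS
    // outage. Keys inserted without 'expireAfter' (non-revocable) never expire here.
    std::optional<std::string> getUsable(uint64_t baseCipherId, double now) const {
        auto it = entries_.find(baseCipherId);
        if (it == entries_.end() || now >= it->second.expireAt) {
            return std::nullopt;
        }
        return it->second.key;
    }

private:
    struct Entry {
        std::string key;
        double refreshAt; // absolute refresh timestamp
        double expireAt;  // absolute expiry timestamp
    };
    std::unordered_map<uint64_t, Entry> entries_;
};
```

A caller could use `needsRefresh` to decide when to fetch from the KMS and fall back to `getUsable` if that fetch fails. This matches the outage behavior the commit message describes for non-revocable keys.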
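
Commit bbf8cb4b02 describes helper actors that look up cipher keys in the local BlobCipherKeyCache and fetch only the misses from EKP. The sketch below shows that cache-aside pattern in plain C++ rather than Flow. `getCipherKeys` and `FetchFn` are hypothetical, the fetch callback stands in for an EKP request, and the sketch does not model the EKP-restart handling via ServerDBInfo.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a remote fetch (e.g. an EKP request): receives the
// missing IDs and returns the keys it found.
using FetchFn =
    std::function<std::unordered_map<uint64_t, std::string>(const std::vector<uint64_t>&)>;

// Serves IDs from the local cache, batches all misses into a single fetch, and
// stores the fetched keys in the cache for later lookups.
std::unordered_map<uint64_t, std::string> getCipherKeys(
    std::unordered_map<uint64_t, std::string>& localCache,
    const std::vector<uint64_t>& ids,
    const FetchFn& fetch) {
    std::unordered_map<uint64_t, std::string> result;
    std::vector<uint64_t> missing;
    for (uint64_t id : ids) {
        auto it = localCache.find(id);
        if (it != localCache.end()) {
            result.emplace(id, it->second); // cache hit
        } else {
            missing.push_back(id); // cache miss: fetch later in one batch
        }
    }
    if (!missing.empty()) {
        for (auto& [id, key] : fetch(missing)) {
            localCache[id] = key;
            result.emplace(id, key);
        }
    }
    return result;
}
```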
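
The available_bytes change in commit 217ba24b6f scales the available memory by the ratio of the process limit to the machine limit. Here is a worked version of the new formula as a hypothetical helper. The name `processAvailableBytes` is illustrative, and the real status code may use different types or rounding.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper for the revised formula:
//   available_bytes = available_memory * process_limit / machine_limit
// (the previous formula also added virtual memory to the available term).
// The ratio is computed in double so that multiplying two byte counts cannot
// overflow int64_t.
int64_t processAvailableBytes(int64_t machineAvailable, int64_t processLimit, int64_t machineLimit) {
    if (machineLimit <= 0) {
        return 0; // guard against division by zero
    }
    const double share = static_cast<double>(processLimit) / static_cast<double>(machineLimit);
    return static_cast<int64_t>(static_cast<double>(machineAvailable) * share);
}
```

For example, with 8 GiB available, a 4 GiB process limit, and a 16 GiB machine limit, the process is reported as having 2 GiB available.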
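
Commit 1997e6057c notes that data passed to IAsyncFile::write must remain valid until the future is ready, and fixes a unit test using holdWhile. The sketch below illustrates the same lifetime rule in standard C++ rather than Flow. To avoid threads, it models a pending write as a callable returned by the hypothetical function `deferredWriteHolding`. Capturing the `std::shared_ptr` inside that callable keeps the buffer alive until the write runs, which is similar in spirit to holdWhile. Capturing a raw pointer or a reference to a temporary buffer instead would reproduce the use-after-free.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>

// Hypothetical deferred write: the returned callable copies the buffer into 'sink'
// only when it is invoked later. Capturing 'buffer' by value (a shared_ptr copy)
// keeps the data alive even after the caller drops its own reference.
std::function<std::size_t()> deferredWriteHolding(std::shared_ptr<const std::string> buffer,
                                                  std::string& sink) {
    return [buffer, &sink] {
        sink.append(*buffer);
        return buffer->size();
    };
}
```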


9532 Commits