foundationdb

mirror of https://github.com/apple/foundationdb.git synced 2025-05-14 18:02:31 +08:00

Author	SHA1	Message	Date
A.J. Beamon	1519f24f77	Merge branch 'main' into feature-metacluster	2022-07-07 09:35:40 -07:00
A.J. Beamon	c4b0f6eaae	Add an internal C API to support connection to a cluster using a connection string (#7438 ) * Add an internal C API to support memory connection records * Track shared state in the client using a unique and immutable cluster ID from the cluster * Add missing code to store the clusterId in the database state object * Update some arguments to pass by const&	2022-07-07 10:12:49 +02:00
A.J. Beamon	e1a93988ef	Merge branch 'main' into feature-metacluster	2022-06-28 14:58:07 -07:00
Zhe Wu	3cb587edfb	Remove explicit degraded peer recovery since this may be false positive	2022-06-23 09:38:27 -07:00
A.J. Beamon	9f3819752f	Change the command to create a metacluster from using 'configure tenant_mode=management' to 'metacluster create <NAME>'. Distribute this name to all processes in a metacluster. Eliminate the tenant mode entirely from metacluster clusters, instead relying on a metacluster registration key.	2022-06-22 12:15:43 -07:00
Ata E Husain Bohra	e1ca0ef9a2	Defer recoveredDiskFiles wait if Encryption data at-rest is enabled (#7414 ) * Defer recoveredDiskFiles wait if Encryption data at-rest is enabled Description In the current code, ClusterController startup waits for the 'recoveredDiskFiles' future to complete before triggering the 'clusterControllerCore' actor, which in turn starts the 'EncryptKeyProxy' (EKP) actor responsible for fetching/refreshing encryption keys needed for ClusterRecovery as well as interactions with KMS. Patch addresses a circular dependency where StorageServer initialization depends on EKP, but CC doesn't recruit EKP until 'recoveredDiskFiles' completes, which includes SS initialization. Given 'recoveredDiskFiles' is an optimization, the patch proposes deferring the 'recoveredDiskFiles' future completion until new Master recruitment is done as part of ClusterRecovery (unblocking the EKP singleton). Testing Ran 500K correctness runs: 20220618-055310-ahusain-foundationdb-61c431d467557551 Recorded failures don't seem to be related to the change.	2022-06-21 18:18:57 -07:00
Yi Wu	bbf8cb4b02	GetEncryptCipherKeys helper function and misc encryption changes (#7252 ) Adding GetEncryptCipherKeys and GetLatestCipherKeys helper actors, which encapsulate cipher key fetch logic: getting cipher keys from the local BlobCipherKeyCache, and on a cache miss fetching them from EKP (encrypt key proxy). These helper actors also handle the case where EKP gets shut down in the middle: they listen on ServerDBInfo to wait for a new EKP to start and send a new request there instead. The PR also has other misc changes: * EKP is by default started in simulation regardless of the ENABLE_ENCRYPTION knob, so that in restart tests, if ENABLE_ENCRYPTION is switched from on to off after restart, encrypted data will still be readable. * API tweaks for BlobCipher * Adding an ENABLE_TLOG_ENCRYPTION knob which will be used in later PRs. The knob should normally be consistent with the ENABLE_ENCRYPTION knob, but could be used to disable TLog encryption alone. This PR is split out from #6942.	2022-06-07 21:00:13 -07:00
Lukas Joswiak	7972ef48d6	Refactor profiling special keys to use GlobalConfig

The special keys `\xff\xff/management/profiling/client_txn_sample_rate` and `\xff\xff/management/profiling/client_txn_size_limit` are deprecated in FDB 7.2. However, GlobalConfig was introduced in 7.0, and reading and writing these keys through the special key space was broken in 7.0+. This change modifies the profiling special keys to use GlobalConfig behind the scenes, fixing the broken special keys.

The following Python script was used to make sure both GlobalConfig and the profiling special key can be used to read/write/clear profiling data:

```
import fdb
import time

fdb.api_version(710)

@fdb.transactional
def set_sample_rate(tr):
    tr.options.set_special_key_space_enable_writes()
    # Alternative way to write the key
    #tr[b'\xff\xff/global_config/config/fdb_client_info/client_txn_sample_rate'] = fdb.tuple.pack((5.0,))
    tr[b'\xff\xff/management/profiling/client_txn_sample_rate'] = '5.0'

@fdb.transactional
def clear_sample_rate(tr):
    tr.options.set_special_key_space_enable_writes()
    # Alternative way to clear the key
    #tr.clear(b'\xff\xff/global_config/config/fdb_client_info/client_txn_sample_rate')
    tr[b'\xff\xff/management/profiling/client_txn_sample_rate'] = 'default'

@fdb.transactional
def get_sample_rate(tr):
    print(tr.get(b'\xff\xff/global_config/config/fdb_client_info/client_txn_sample_rate'))
    # Alternative way to read the key
    #print(tr.get(b'\xff\xff/management/profiling/client_txn_sample_rate'))

fdb.options.set_trace_enable()
fdb.options.set_trace_format('json')
db = fdb.open()

get_sample_rate(db) # None (or 'default')
set_sample_rate(db)
time.sleep(1) # Allow time for global config changes to propagate
get_sample_rate(db) # 5.0
clear_sample_rate(db)
time.sleep(1)
get_sample_rate(db) # None (or 'default')
```

It can be run with `PYTHONPATH=./bindings/python/ python profiling.py`, and reads the `fdb.cluster` file in the current directory.

```
$ PYTHONPATH=./bindings/python/ python sps.py
None
5.000000
None
```	2022-05-10 10:51:08 -07:00
Josh Slocum	db6d7396ca	Add delay between quickly re-recruiting the same singleton process, to avoid recruit thrashing when there are temporarily multiple cluster controllers (#7000 )	2022-04-28 15:45:09 -07:00
Renxuan Wang	c69a07a858	Check in the new Hostname logic. (#6926 ) * Revert #6655. 20220407-031010-renxuan-c101052c21da8346 compressed=True data_size=31004844 duration=4310801 ended=100000 fail_fast=10 max_runs=100000 pass=100000 priority=100 remaining=0 runtime=1:04:15 sanity=False started=100047 stopped=20220407-041425 submitted=20220407-031010 timeout=5400 username=renxuan * Revert #6271. 20220407-051532-renxuan-470f0fe6aac1c217 compressed=True data_size=30982370 duration=3491067 ended=100002 fail_fast=10 max_runs=100000 pass=100002 priority=100 remaining=0 runtime=0:59:57 sanity=False started=100141 stopped=20220407-061529 submitted=20220407-051532 timeout=5400 username=renxuan * Revert #6266. Remove resolving-related functionalities in connection string. Connection string will be used for storing purpose only, and non-mutable. 20220407-175119-renxuan-55d30ee1a4b42c2f compressed=True data_size=30970443 duration=5437659 ended=100000 fail_fast=10 max_runs=100000 pass=100000 priority=100 remaining=0 runtime=0:59:31 sanity=False started=100154 stopped=20220407-185050 submitted=20220407-175119 timeout=5400 username=renxuan * Add hostname to coordinator interfaces. * Turn on the new hostname logic. * Add the corresponding change in config txns. The most notable change is before calling basicLoadBalance(), we need to call tryInitializeRequestStream() to initialize request streams first. Passed correctness tests. * Return error when hostnames cannot be resolved in coordinators command. * Minor fixes.	2022-04-27 21:54:13 -07:00
Renxuan Wang	e40cc8722c	A few hostname improvements. (#6825 ) * Add tryResolveHostnames() in connection string. * Add missing hostname to related interfaces. * Do not pass RequestStream into GetReplyFromHostname() functions. Because we are using new RequestStream for each request anyways. Also, the passed in pointer could be nullptr, which results in seg faults. Add dynamic hostname resolve and reconnect intervals. * Address comments.	2022-04-20 13:42:46 -07:00
Vaidas Gasiunas	ca563466a6	Merge pull request #6401 from sfc-gh-mpilman/features/private-request-streams Features/private request streams	2022-04-11 18:29:06 +02:00
Ata E Husain Bohra	933e5bbd2e	EncryptKeyProxy server APIs for simulation runs. (#6727 ) * EncryptKeyProxy server APIs for simulation runs. Description diff-2: FlowSingleton util class Bug fixes diff-1: Expected errors returned to the caller Major changes proposed are: 1. EncryptKeyProxy server APIs: 1.1. Lookup Cipher details via BaseCipherId 1.2. Lookup latest Cipher details via encryption domainId. 2. EncryptKeyProxy implements caches indexed by: baseCipherId & encryptDomainId 3. Periodic task to refresh domainId indexed cache to support 'limiting cipher lifetime' abilities if supported by external KMS solutions. Testing EncryptKeyProxyTest workload to validate the newly added code.	2022-04-11 09:08:42 -07:00
Markus Pilman	16467262f0	Merge remote-tracking branch 'origin/main' into features/private-request-streams	2022-04-10 14:12:37 -06:00
Renxuan Wang	938e8ed996	Do not throw lookup_failed when resolving fails. Instead, return an empty Optional<NetworkAddress>. For resolveWithRetry(), still return NetworkAddress because it retries until succeed.	2022-04-08 14:21:49 -07:00
Renxuan Wang	0f894509d9	Simplify the isCoordinator check in registerWorker.	2022-04-08 14:21:49 -07:00
Markus Pilman	7631d299bf	Merge remote-tracking branch 'origin/main' into features/private-request-streams	2022-04-08 09:58:56 -06:00
Zhe Wu	e017faa6c4	grey failure detection account for the case where the connection between primary and satellite DC becomes bad.	2022-04-07 17:34:13 -07:00
Markus Pilman	bf956f5630	Merge remote-tracking branch 'origin/main' into features/private-request-streams	2022-04-07 13:29:27 -06:00
Renxuan Wang	2a59c5fd4e	Workers should monitor coordinators in submitCandidacy(). (#6655 ) * Workers should monitor coordinators in submitCandidacy(). * Change re-resolve delay to a knob.	2022-03-24 19:20:42 -07:00
Josh Slocum	f27475e2f4	Merge branch 'main' into blob_integration	2022-03-22 11:41:58 -05:00
sfc-gh-tclinkenbeard	a71099471b	Update copyright header dates	2022-03-21 13:36:23 -07:00
Josh Slocum	37e7c80f26	Merge branch 'main' into blob_integration	2022-03-17 18:45:42 -05:00
Josh Slocum	0f9e88572a	Cleaning up debugging and fixing race in blob manager recruitment	2022-03-17 14:57:43 -05:00
Markus Pilman	117ee637db	Merge remote-tracking branch 'origin/main' into features/private-request-streams	2022-03-15 17:17:47 +01:00
Markus Pilman	bed799220a	Addressed review comments, added test	2022-03-15 16:57:26 +01:00
Ata E Husain Bohra	944ec48415	Introduce a simulate EncryptKeyVaultProxy interface (#6576 ) Description Major changes proposed are: 1. Rename ServerKnob->ENABLE_ENCRYPT_KEY_PROXY to ServerKnob->ENABLE_ENCRYPTION. The approach simplifies enabling and controlling encryption code changes using a single knob (desirable) 2. Implement an EncryptKeyVaultProxy simulated interface to assist validating encryption workflows in simulation runs. The interface is leveraged to satisfy "encryption keys" lookups, which are otherwise satisfied by integrating the organization's preferred Encryption Key Management solution. Testing Unit test to validate the newly added code	2022-03-10 12:06:49 -08:00
Josh Slocum	4b254d259c	Ensuring BM split retry is idempotent	2022-03-10 11:54:57 -06:00
Josh Slocum	e71b3533f9	Merge branch 'main' into blob_integration	2022-03-09 08:59:56 -06:00
Markus Pilman	8fac0081a8	Merge remote-tracking branch 'origin/main' into features/private-request-streams	2022-03-09 11:00:00 +01:00
A.J. Beamon	5fa9d3e1b7	Add a tenant parameter to read and commit requests. Store a map of all tenants on commit proxy and storage servers. Add an option to require tenant mode.	2022-03-06 21:54:21 -08:00
Josh Slocum	623db663dc	don't reset watch config transaction	2022-02-25 08:48:52 -06:00
Renxuan Wang	06b1d06d38	Support hostname in coordinators commands.	2022-02-24 23:02:29 -08:00
A.J. Beamon	250a88e682	Enforce that trace event suppression calls happen first when using trace event call chaining. Fix various instances where we weren't following this requirement.	2022-02-24 12:25:52 -08:00
Markus Pilman	cf31e14904	Merge remote-tracking branch 'origin/main' into features/private-request-streams	2022-02-23 10:29:32 +01:00
Markus Pilman	102169ba33	Ran clang-format	2022-02-23 10:23:27 +01:00
Markus Pilman	dc973fb67e	Allow List and first test	2022-02-22 11:15:16 +01:00
Josh Slocum	38a75a8b89	Merge branch 'main' into blob_integration	2022-02-17 17:47:38 -06:00
Vaidas Gasiunas	092b5cee4b	MVC2.0: Rollback added code	2022-02-14 13:50:42 -08:00
Lukas Joswiak	d5a562e6b8	Fix dynamic knobs correctness issues	2022-02-09 13:43:32 -08:00
Ata E Husain Bohra	f3c3ab06f1	Add new FDB EncryptKeyProxy role diff-1: Address review comments Major changes include: 1. Add a new FDB role: EncryptKeyProxy. The role is responsible for exposing APIs to fetch encryption keys by interacting with an external Encryption KeyManager interface. 2. The process is an FDB singleton process following similar recruitment rules as other singleton processes in the system. 3. Code to recruit the worker process; given the encryption keys are needed during recovery (decode TLog records), for now the process is co-located in the same datacenter as ClusterController. 4. Skeleton process actor code; more functionality will be added in subsequent PRs. NOTE: The code is protected under a SERVER_KNOB with the default value as 'false' for now.	2022-01-25 23:12:49 -08:00
Ata E Husain Bohra	87ee4cf958	Add new FDB EncryptKeyProxy role Major changes include: 1. Add a new FDB role: EncryptKeyProxy. The role is responsible for exposing APIs to fetch encryption keys by interacting with an external Encryption KeyManager interface. 2. The process is an FDB singleton process following similar recruitment rules as other singleton processes in the system. 3. Code to recruit the worker process; given the encryption keys are needed during recovery (decode TLog records), for now the process is co-located in the same datacenter as ClusterController. 4. Skeleton process actor code; more functionality will be added in subsequent PRs. NOTE: The code is protected under a SERVER_KNOB with the default value as 'false' for now.	2022-01-25 17:38:27 -08:00
Ata E Husain Bohra	703364d146	Update cluster recovery documentation (#6255 ) Patch updates code documentation to reflect the recent code refactoring where ClusterController process drives recovery instead of sequencer/master process.	2022-01-18 13:54:00 -08:00
Ata E Husain Bohra	936bf5336a	Revert "Revert "Refactor: ClusterController driving cluster-recovery state machine" (#6191 ) * Revert "Revert "Refactor: ClusterController driving cluster-recovery state machine"" Major changes includes: 1. Re-revert Sequencer refactor commits listed below (in listed order): 1.a. This reverts commit bb17e194d9c9888e203421290959bd7f2c075d7f. 1.b. This reverts commit d174bb2e06bff01157d16c652073536c54d17f7f. 1.c. This reverts commit 30b05b469c87d9b526b427751c211fb5cf7ff9cd. 2. Update Status.actor to track ClusterController interface to track recovery status. 3. Introduce a ServerKnob to define "cluster recovery trace event" prefix; for now keeping it as "Master", however, it should allow smooth transition to "Cluster" prefix as it seems more appropriate.	2022-01-06 12:15:51 -08:00
Josh Slocum	bc69521a91	Several fixes with restarting BW/BM	2022-01-05 12:48:53 -06:00
Aaron Molitor	30b05b469c	Revert "Refactor: ClusterController driving cluster-recovery state machine" This reverts commit dfe9d184ff5dd66bdbbc5b984688ac3ebb15b901.	2021-12-24 11:25:51 -08:00
Aaron Molitor	d174bb2e06	Revert "Refactor: ClusterController driving cluster-recovery state machine" This reverts commit abd2959702b0027ab23b8d42d8082b79c3b197f3.	2021-12-24 11:25:51 -08:00
Aaron Molitor	bb17e194d9	Revert "Refactor: ClusterController driving cluster-recovery state machine" This reverts commit 1520390bc50614ae7583638c07c033739f40dbfb.	2021-12-24 11:25:51 -08:00
Ata E Husain Bohra	1520390bc5	Refactor: ClusterController driving cluster-recovery state machine diff-1: Address Jingyu's review comments diff-2: Introduce ClusterRecovery actor to separate out cluster recovery code At present, the cluster recovery process consists of the following steps: 1. ClusterController clusterWatchDatabase actor recruits master/sequencer process. 2. Sequencer process implements the cluster recovery state machine, responsible for recruiting all other processes as well as restoring the cluster state. Patch proposes a scheme where the cluster recovery state machine is implemented and driven by the ClusterController process instead of the Sequencer process. Advantages of the scheme could be: 1. Simplified design where ClusterController recruits the "sequencer" process like other worker processes, compared to the current scheme where the "sequencer" process gets special treatment. In the newer scheme the sequencer is responsible for maintaining/providing the "committed version" (as expected). 2. ClusterController is responsible for worker process recruitment; the sequencer, though orchestrating the recovery state machine, needs to reach out to the ClusterController for recruiting worker processes etc. NOTE: Patch has moved the recovery state machine code from 'sequencer' -> 'cluster-controller' process, however, necessary updates were done for both functionality as well as performance improvement reasons. Next Steps: Cluster recovery documentation will be updated in near future.	2021-12-22 14:06:27 -08:00
Ata E Husain Bohra	abd2959702	Refactor: ClusterController driving cluster-recovery state machine diff-1: Address Jingyu's review comments At present, the cluster recovery process consists of the following steps: 1. ClusterController clusterWatchDatabase actor recruits master/sequencer process. 2. Sequencer process implements the cluster recovery state machine, responsible for recruiting all other processes as well as restoring the cluster state. Patch proposes a scheme where the cluster recovery state machine is implemented and driven by the ClusterController process instead of the Sequencer process. Advantages of the scheme could be: 1. Simplified design where ClusterController recruits the "sequencer" process like other worker processes, compared to the current scheme where the "sequencer" process gets special treatment. In the newer scheme the sequencer is responsible for maintaining/providing the "committed version" (as expected). 2. ClusterController is responsible for worker process recruitment; the sequencer, though orchestrating the recovery state machine, needs to reach out to the ClusterController for recruiting worker processes etc. NOTE: Patch has moved the recovery state machine code from 'sequencer' -> 'cluster-controller' process, however, necessary updates were done for both functionality as well as performance improvement reasons. Next Steps: Cluster recovery documentation will be updated in near future.	2021-12-22 14:06:27 -08:00

618 Commits