577 Commits

Author SHA1 Message Date
Markus Pilman
117ee637db Merge remote-tracking branch 'origin/main' into features/private-request-streams 2022-03-15 17:17:47 +01:00
Markus Pilman
bed799220a Addressed review comments, added test 2022-03-15 16:57:26 +01:00
Ata E Husain Bohra
944ec48415
Introduce a simulated EncryptKeyVaultProxy interface (#6576)
Description

Major changes proposed are:
1. Rename ServerKnob->ENABLE_ENCRYPT_KEY_PROXY to
   ServerKnob->ENABLE_ENCRYPTION. This simplifies enabling and
   controlling the encryption code changes with a single knob (desirable).
2. Implement a simulated EncryptKeyVaultProxy interface to assist in
   validating encryption workflows in simulation runs. The interface
   is used to satisfy "encryption keys" lookups, which would otherwise
   be satisfied by integrating the organization's preferred Encryption
   Key Management solution.

Testing

Unit test to validate the newly added code
2022-03-10 12:06:49 -08:00
Markus Pilman
8fac0081a8 Merge remote-tracking branch 'origin/main' into features/private-request-streams 2022-03-09 11:00:00 +01:00
A.J. Beamon
5fa9d3e1b7 Add a tenant parameter to read and commit requests. Store a map of all tenants on commit proxy and storage servers. Add an option to require tenant mode. 2022-03-06 21:54:21 -08:00
Renxuan Wang
06b1d06d38 Support hostname in coordinators commands. 2022-02-24 23:02:29 -08:00
A.J. Beamon
250a88e682 Enforce that trace event suppression calls happen first when using trace event call chaining. Fix various instances where we weren't following this requirement. 2022-02-24 12:25:52 -08:00
Markus Pilman
cf31e14904 Merge remote-tracking branch 'origin/main' into features/private-request-streams 2022-02-23 10:29:32 +01:00
Markus Pilman
102169ba33 Ran clang-format 2022-02-23 10:23:27 +01:00
Markus Pilman
dc973fb67e Allow List and first test 2022-02-22 11:15:16 +01:00
Vaidas Gasiunas
092b5cee4b MVC2.0: Rollback added code 2022-02-14 13:50:42 -08:00
Lukas Joswiak
d5a562e6b8 Fix dynamic knobs correctness issues 2022-02-09 13:43:32 -08:00
Ata E Husain Bohra
f3c3ab06f1 Add new FDB EncryptKeyProxy role
diff-1: Address review comments

Major changes include:

1. Add a new FDB role: EncryptKeyProxy. The role is responsible for
   exposing APIs to fetch encryption keys by interacting with an
   external Encryption KeyManager interface.
2. The process is an FDB singleton process following similar recruitment
   rules as other singleton processes in the system.
3. Code to recruit the worker process; given that the encryption keys are
   needed during recovery (to decode TLog records), for now the process
   is co-located in the same datacenter as the ClusterController.
4. Skeleton process actor code; more functionality will be added in
   subsequent PRs.

NOTE: The code is protected under a SERVER_KNOB with the default
      value as 'false' for now.
2022-01-25 23:12:49 -08:00
Ata E Husain Bohra
87ee4cf958 Add new FDB EncryptKeyProxy role
Major changes include:

1. Add a new FDB role: EncryptKeyProxy. The role is responsible for
   exposing APIs to fetch encryption keys by interacting with an
   external Encryption KeyManager interface.
2. The process is an FDB singleton process following similar recruitment
   rules as other singleton processes in the system.
3. Code to recruit the worker process; given that the encryption keys are
   needed during recovery (to decode TLog records), for now the process
   is co-located in the same datacenter as the ClusterController.
4. Skeleton process actor code; more functionality will be added in
   subsequent PRs.

NOTE: The code is protected under a SERVER_KNOB with the default
      value as 'false' for now.
2022-01-25 17:38:27 -08:00
Ata E Husain Bohra
703364d146
Update cluster recovery documentation (#6255)
This patch updates the code documentation to reflect the recent
refactoring in which the ClusterController process drives recovery
instead of the sequencer/master process.
2022-01-18 13:54:00 -08:00
Ata E Husain Bohra
936bf5336a
Revert "Revert "Refactor: ClusterController driving cluster-recovery state machine"" (#6191)
* Revert "Revert "Refactor: ClusterController driving cluster-recovery state machine""

Major changes include:
1. Re-revert Sequencer refactor commits listed below (in listed order):
1.a. This reverts commit bb17e194d9c9888e203421290959bd7f2c075d7f.
1.b. This reverts commit d174bb2e06bff01157d16c652073536c54d17f7f.
1.c. This reverts commit 30b05b469c87d9b526b427751c211fb5cf7ff9cd.

2. Update Status.actor to use the ClusterController interface to track
   recovery status.
3. Introduce a ServerKnob to define the "cluster recovery trace event"
   prefix; for now it stays "Master", but the knob should allow a
   smooth transition to the "Cluster" prefix, which seems more appropriate.
2022-01-06 12:15:51 -08:00
Aaron Molitor
30b05b469c Revert "Refactor: ClusterController driving cluster-recovery state machine"
This reverts commit dfe9d184ff5dd66bdbbc5b984688ac3ebb15b901.
2021-12-24 11:25:51 -08:00
Aaron Molitor
d174bb2e06 Revert "Refactor: ClusterController driving cluster-recovery state machine"
This reverts commit abd2959702b0027ab23b8d42d8082b79c3b197f3.
2021-12-24 11:25:51 -08:00
Aaron Molitor
bb17e194d9 Revert "Refactor: ClusterController driving cluster-recovery state machine"
This reverts commit 1520390bc50614ae7583638c07c033739f40dbfb.
2021-12-24 11:25:51 -08:00
Ata E Husain Bohra
1520390bc5 Refactor: ClusterController driving cluster-recovery state machine
diff-1: Address Jingyu's review comments
diff-2: Introduce ClusterRecovery actor to separate out
        cluster recovery code

At present, the cluster recovery process consists of the following steps:
1. The ClusterController clusterWatchDatabase actor recruits the
   master/sequencer process.
2. The Sequencer process implements the cluster recovery state machine,
   which is responsible for recruiting all other processes as well as
   restoring the cluster state.

This patch proposes a scheme where the cluster recovery state machine
is implemented and driven by the ClusterController process instead
of the Sequencer process.

Advantages of the scheme could be:
1. Simplified design where the ClusterController recruits the "sequencer"
   process like other worker processes, compared to the current scheme
   where the "sequencer" process gets special treatment. In the newer
   scheme the sequencer is responsible for maintaining/providing the
   "committed version" (as expected).
2. The ClusterController is responsible for worker process recruitment;
   in the current scheme the sequencer, though it orchestrates the
   recovery state machine, still needs to reach out to the
   ClusterController to recruit worker processes etc.

NOTE:
The patch moves the recovery state machine code from the
'sequencer' to the 'cluster-controller' process; necessary
updates were also made for both functionality and performance
improvement reasons.

Next Steps:
Cluster recovery documentation will be updated in the near future.
2021-12-22 14:06:27 -08:00
Ata E Husain Bohra
abd2959702 Refactor: ClusterController driving cluster-recovery state machine
diff-1: Address Jingyu's review comments

At present, the cluster recovery process consists of the following steps:
1. The ClusterController clusterWatchDatabase actor recruits the
   master/sequencer process.
2. The Sequencer process implements the cluster recovery state machine,
   which is responsible for recruiting all other processes as well as
   restoring the cluster state.

This patch proposes a scheme where the cluster recovery state machine
is implemented and driven by the ClusterController process instead
of the Sequencer process.

Advantages of the scheme could be:
1. Simplified design where the ClusterController recruits the "sequencer"
   process like other worker processes, compared to the current scheme
   where the "sequencer" process gets special treatment. In the newer
   scheme the sequencer is responsible for maintaining/providing the
   "committed version" (as expected).
2. The ClusterController is responsible for worker process recruitment;
   in the current scheme the sequencer, though it orchestrates the
   recovery state machine, still needs to reach out to the
   ClusterController to recruit worker processes etc.

NOTE:
The patch moves the recovery state machine code from the
'sequencer' to the 'cluster-controller' process; necessary
updates were also made for both functionality and performance
improvement reasons.

Next Steps:
Cluster recovery documentation will be updated in the near future.
2021-12-22 14:06:27 -08:00
Ata E Husain Bohra
dfe9d184ff Refactor: ClusterController driving cluster-recovery state machine
At present, the cluster recovery process consists of the following steps:
1. The ClusterController clusterWatchDatabase actor recruits the
   master/sequencer process.
2. The Sequencer process implements the cluster recovery state machine,
   which is responsible for recruiting all other processes as well as
   restoring the cluster state.

This patch proposes a scheme where the cluster recovery state machine
is implemented and driven by the ClusterController process instead
of the Sequencer process.

Advantages of the scheme could be:
1. Simplified design where the ClusterController recruits the "sequencer"
   process like other worker processes, compared to the current scheme
   where the "sequencer" process gets special treatment. In the newer
   scheme the sequencer is responsible for maintaining/providing the
   "committed version" (as expected).
2. The ClusterController is responsible for worker process recruitment;
   in the current scheme the sequencer, though it orchestrates the
   recovery state machine, still needs to reach out to the
   ClusterController to recruit worker processes etc.

NOTE:
The patch moves the recovery state machine code from the
'sequencer' to the 'cluster-controller' process; necessary
updates were also made for both functionality and performance
improvement reasons.

Next Steps:
Cluster recovery documentation will be updated in the near future.
2021-12-22 14:06:27 -08:00
Andrew Noyes
def41697bf
Merge pull request #6083 from sfc-gh-tclinkenbeard/remove-temporaries
Avoid creating unnecessary temporary objects
2021-12-06 13:24:56 -08:00
Zhe Wu
dc88d3fa37 use CC_ENABLE_WORKER_HEALTH_MONITOR knob to guard remoteDCIsHealthy logic 2021-12-06 09:33:45 -08:00
sfc-gh-tclinkenbeard
d01a363e29 Avoid creating unnecessary temporary objects 2021-12-01 23:48:34 -08:00
Evan Tschannen
557186ed17
Merge pull request #5909 from sfc-gh-jfu/jfu-cc-request-dbinfo
Change dbinfo broadcast to be explicitly requested by the worker registration message
2021-11-16 15:01:42 -08:00
Evan Tschannen
964d0209ca
Merge pull request #5637 from sfc-gh-ljoswiak/features/data-loss-prevention
Data loss protection when joining new cluster
2021-11-15 15:26:32 -08:00
Vaidas Gasiunas
51b8ccf7d3 Merge remote-tracking branch 'apple/master' into notify-client-lib-changes 2021-11-10 18:40:34 +01:00
Lukas Joswiak
74cf64fe0f Sync cluster ID through ServerDBInfo 2021-11-09 12:29:48 -08:00
Lukas Joswiak
3e2c65bb11 Allow tlog to join another cluster but retain its data 2021-11-09 12:29:48 -08:00
Lukas Joswiak
30867750b5 Add protection against storage and tlog data deletion when joining a new cluster 2021-11-09 12:29:47 -08:00
Jon Fu
00f4bd8536 Check ccInterface against serverDbInfo's cc and make broadcast unconditional for first registration 2021-11-08 12:43:02 -05:00
Jon Fu
4e8625ccc0 retain old behaviour along with explicit request 2021-11-03 17:23:07 -04:00
Jon Fu
59f0a2c3e5 Change dbinfo broadcast to be explicitly requested by the worker registration message 2021-11-03 15:51:21 -04:00
Evan Tschannen
ee00135a6b skip good recruitment errors when doing simulation only validation 2021-11-01 13:24:15 -07:00
Evan Tschannen
78e36e7590 fix: simulation only validation could throw errors which would impact the behavior of the cluster controller 2021-11-01 13:24:15 -07:00
Evan Tschannen
ddf235713e strengthen assert 2021-10-28 16:40:30 -07:00
Evan Tschannen
4d8ee2ed33 fix: simple recruitment could succeed with less than the required replication factor 2021-10-28 16:38:04 -07:00
Vaidas Gasiunas
875824b186 MVC2.0: Notify clients about relevant changes of client libraries 2021-10-27 23:43:40 +02:00
Josh Slocum
0ff8ddc2b6 Merge branch 'master' into blob_full_clean 2021-10-25 13:38:48 -05:00
A.J. Beamon
e882eb33fc Abstract the cluster file into a cluster connection record that can be backed by something other than the filesystem. 2021-10-22 11:05:18 -07:00
Josh Slocum
773886515e Merge branch 'feature-range-feed' into blob_full_clean 2021-10-22 11:07:51 -05:00
Josh Slocum
912ef76f1c cleanup before merge 2021-10-18 17:11:14 -05:00
Suraj Gupta
5466bdb569 Gate more entry points to BM recruitment. 2021-10-18 15:04:22 -04:00
A.J. Beamon
507a09893c
Add ClientCount to ClusterControllerMetrics (#5748) 2021-10-17 20:47:11 -07:00
Josh Slocum
5f0ec0612a Merge branch 'feature-range-feed' into blob_full 2021-10-13 15:44:35 -05:00
Suraj Gupta
2ec8781224 Merge knobs into one. 2021-10-13 14:00:37 -04:00
Suraj Gupta
5a6a052c55 Add a knob to gate blob-related work. 2021-10-13 09:48:02 -04:00
Zhe Wu
645cfc85a0 fix remote health variables declaration order 2021-10-07 21:54:25 -07:00
Zhe Wu
6540b6eec5 Some improvements for grey failure failover 2021-10-07 20:42:55 -07:00