331 Commits

Author SHA1 Message Date
Chaoguang Lin
7d365bd1bb
Remote ikvs debugging (#6465)
* initial structure for remote IKVS server

* moved struct to .h file, added new files to CMakeList

* happy path implementation, connection error when testing

* saved minor local change

* changed tracing to debug

* fixed onClosed and getError being called before init is finished

* fix spawn process bug, now use absolute path

* added server knob to set ikvs process port number

* added server knob for remote/local kv store

* implement simulator remote process spawning

* fixed bug for simulator timeout

* commit all changes

* removed print lines in trace

* added FlowProcess implementation by Markus

* initial debug of FlowProcess, stuck at parent sending OpenKVStoreRequest to child

* temporary fix for process factory throwing segfault on create

* specify public address in command

* change remote kv store knob to false for jenkins build

* made port 0 open random unused port

* change remote store knob to true for benchmark

* set listening port to randomly opened port

* added print lines for jenkins run open kv store timeout debug

* removed most tracing and print lines

* removed tutorial changes

* update handleIOErrors error handling to handle remote-ikvs cases

* Push all debugging changes

* A version where worker bug exists

* A version where restarting tests fail

* Use both the name and the port to determine the child process

* Remove unnecessary update on local address

* Disable remote-kvs for DiskFailureCycle test

* A version where restarting stuck

* A version where most restarting tests green

* Reset connection with child process explicitly

* Remove change on unnecessary files

* Unify flags from _ to -

* fix merging unexpected changes

* fix trac.error to .errorUnsuppressed

* Add license header

* Remove unnecessary header in FlowProcess.actor.cpp

* Fix Windows build

* Fix Windows build, add missing ;

* Fix a stupid bug caused by code dropped by code merging

* Disable remote kvs by default

* Pass the conn_file path to the flow process, though not needed, but the buildNetwork is difficult to tune

* serialization change on readrange

* Update traces

* Refactor the RemoteIKVS interface

* Format files

* Update sim2 interface to not clog connections between parent and child processes in simulation

* Update comments; remove debugging symbols; Add error handling for remote_kvs_cancelled

* Add comments, format files

* Change method name from isBuggifyDisabled to isStableConnection; Decrease(0.1x) latency for stable connections

* Commit the IConnection interface change, forgot in previous commit

* Fix the issue that onClosed request is cancelled by ActorCollection

* Enable the remote kv store knob

* Remove FlowProcess.actor.cpp and move functions to RemoteIKeyValueStore.actor.cpp; Add remote kv store delay to avoid race; Bind the child process to die with parent process

* Fix the bug where one process starts storage server more than once

* Add a please_reboot_remote_kv_store error to restart the storage server worker if remote kvs died abnormally

* Remove unreachable code path and add comments

* Clang format the code

* Fix a simple wait error

* Clang format after merging the main branch

* Testing mixed mode in simulation if remote_kvs knob is enabled, setting the default to false

* Disable remote kvs for PhysicalShardMove which is for RocksDB

* Cleanup #include orders, remove debugging traces

* Revert the reorder in fdbserver.actor.cpp, which fails the gcc build

Co-authored-by: “Lincoln <“lincoln.xiao@snowflake.com”>
2022-03-31 17:08:59 -07:00
Trevor Clinkenbeard
bdff100ef9
Merge pull request #6582 from sfc-gh-tclinkenbeard/global-tag-throttling2
Various `TagThrottler` enhancements
2022-03-23 12:54:30 -07:00
Evan Tschannen
4a085fc844
Merge pull request #6602 from apple/blob_integration
Blob integration
2022-03-23 12:02:43 -07:00
sfc-gh-tclinkenbeard
0726832e80 Merge remote-tracking branch 'origin/main' into global-tag-throttling2 2022-03-23 11:05:02 -07:00
Jingyu Zhou
0c88be0393 Refactor resolution balancing into separate files 2022-03-23 09:57:31 -07:00
Josh Slocum
f27475e2f4 Merge branch 'main' into blob_integration 2022-03-22 11:41:58 -05:00
Xiaoge Su
3520d7f73c fixup! Resort the FDBSERVER_SRCS and add missing files 2022-03-17 16:53:10 -07:00
Xiaoge Su
99b030c2f6 Allow the TOML file assign knobs during test
In this patch, for a given test, it is possible to override the knob
values, e.g.

[[test]]

    [[test.knobs]]
    watch_timeout = 999

will set the client knob WATCH_TIMEOUT to 999 during the test. The
original value will be recovered after the test is over.
2022-03-17 16:53:10 -07:00
Josh Slocum
37e7c80f26 Merge branch 'main' into blob_integration 2022-03-17 18:45:42 -05:00
A.J. Beamon
05495908b8 Implement some tenant tests 2022-03-17 12:10:18 -07:00
sfc-gh-tclinkenbeard
faecd8a8f8 Merge remote-tracking branch 'origin/main' into global-tag-throttling2 2022-03-16 13:45:23 -07:00
He Liu
c3a68d661e
Physical Shard Move (#6264)
Physical Shard Move part I: Checkpoint creation, transfer and restore.
2022-03-15 13:03:23 -07:00
sfc-gh-tclinkenbeard
af69058596 Move RkTagThrottleCollection class into its own files 2022-03-11 12:09:15 -04:00
sfc-gh-tclinkenbeard
e61c26758c Move TransactionTagCounter implementation into .cpp file 2022-03-11 12:09:15 -04:00
sfc-gh-tclinkenbeard
91d1a172d8 Move TransactionTagCounter into separate file 2022-03-11 12:09:15 -04:00
Ata E Husain Bohra
944ec48415
Introduce a simulate EncryptKeyVaultProxy interface (#6576)
Description

Major changes proposed are:
1. Rename ServerKnob->ENABLE_ENCRYPT_KEY_PROXY to
   ServerKnob->ENABLE_ENCRYPTION. Approach simplifies enabling
   controlling encyrption code change using a single knob (desirable)
2. Implement EncyrptKeyVaultProxy simulated interface to assist
   validating encyrption workflows in simulation runs. The interface
   is leveraged to satisfy "encryption keys" lookup which otherwise
   gets satisfied by integrating organization preferred Encryption
   Key Management solution.

Testing

Unit test to validate the newly added code
2022-03-10 12:06:49 -08:00
Tao Lin
e2c7c30faf
GetMappedRange support serializable & check RYW & continuation (#6181) 2022-03-10 10:05:44 -08:00
Josh Slocum
e71b3533f9 Merge branch 'main' into blob_integration 2022-03-09 08:59:56 -06:00
Jon Fu
2e2c8bf88c Merge branch 'main' of github.com:apple/foundationdb into jfu-grv-cache 2022-02-22 12:43:55 -05:00
Trevor Clinkenbeard
82bbfa8aee
Merge pull request #6395 from sfc-gh-tclinkenbeard/global-tag-throttling
Create `TagThrottler` class
2022-02-18 13:17:28 -08:00
Josh Slocum
38a75a8b89 Merge branch 'main' into blob_integration 2022-02-17 17:47:38 -06:00
Jon Fu
d399daebed Merge branch 'main' of github.com:apple/foundationdb into jfu-grv-cache 2022-02-15 15:09:40 -05:00
sfc-gh-tclinkenbeard
00f12687c6 Add TagThrottler class 2022-02-14 16:03:37 -08:00
Vaidas Gasiunas
092b5cee4b MVC2.0: Rollback added code 2022-02-14 13:50:42 -08:00
sfc-gh-tclinkenbeard
d4b4479399 Rename RatekeeperData.actor.cpp to Ratekeeper.actor.cpp 2022-02-14 12:35:50 -08:00
sfc-gh-tclinkenbeard
9beae3fb64 Add RatekeeperData class with refactored implementation 2022-02-14 11:49:45 -08:00
Jon Fu
7492b755d8 Merge branch 'main' of github.com:apple/foundationdb into jfu-grv-cache 2022-02-14 14:06:49 -05:00
sfc-gh-tclinkenbeard
3f0e2ae62e Merge remote-tracking branch 'origin/main' into dd-refactor 2022-02-09 14:29:16 -08:00
Aaron Molitor
96dd86ebf8 update RocskDB and Boost
add Finduring, and include into fdbserver
add BOOST asio/uring settings to fdbserver compile
move portable rocks, liburing up to be configurable at build time.
2022-02-09 10:48:18 -06:00
Jon Fu
9c0a512cf5 Merge branch 'main' of github.com:apple/foundationdb into jfu-grv-cache 2022-02-07 14:51:12 -05:00
sfc-gh-tclinkenbeard
6ab4bc0a06 Move all TC*Info classes into TCInfo.h 2022-02-04 10:59:01 -08:00
sfc-gh-tclinkenbeard
0c8834ff66 Move TCServerInfo into its own files 2022-02-04 10:20:11 -08:00
sfc-gh-tclinkenbeard
68ec591cf9 Move DDTeamCollection into its own files 2022-02-04 00:39:42 -08:00
Ata E Husain Bohra
591ef57857
Upgrade AES 128 GCM -> AES 256, StreamCipher code refactor (#6314)
* Upgrade AES 128 GCM -> AES 256, StreamCipher code refactor

Major changes proposed are:
1. Refactor StreamCipher code to enable instantiation of
   multiple encryption keys. However, code still retains
   a globalEncryption key semantics used in Backup file
   encryption usecase.
2. Enhance StreamCipher to provide HMAC signature digest
   generation. Further, the class implements HMAC encryption
   key derivation function.
3. Upgrade StreamCipher to use AES 256 GCM mode from currently
   supported AES 128 GCM mode.
   Note: The code changes the encryption key size, however, the
         feature is NOT currently in use, hence, should be OK.
3. Add EncryptionOps validation and benchmark toml supported
   workload, it does the following:
   a. Allow user to configure encrypt-decrypt of a fixed size
      buffer or variable size buffer [100, 512K]
   b. Allow user to configure number of interactions of the runs,
      in each iteration: generate random data, derive an encryption
      key using HMAC SHA256 method, encrypt data and
      then decrypt data. It collects following metrics:
    i) time taken to derive encryption key.
    ii) time taken to encrypt the buffer.
    iii) time taken to decrypt the buffer.
    iv) total bytes encrypted and/or decrypted
   c. Along with stats it basic basic validations on the encrypted
      and decrypted buffer
   d. On completion for test, records the above mentioned metrics
      in trace files.
2022-01-31 19:52:44 -06:00
A.J. Beamon
6affc58e97 Rename high contention allocator implementation in fdbclient 2022-01-31 14:25:38 -08:00
A.J. Beamon
027fe80594 Added a generic high contention allocator implementation to fdbclient. This is an adapted version of the flow bindings HCA implementation. 2022-01-28 15:34:30 -08:00
Ata E Husain Bohra
87ee4cf958 Add new FDB EncryptKeyProxy role
Major changes includes:

1. Add a new FDB role responsible- EncyrptKeyProxy. The role is
   responsible to expose APIs to fetch encyrption keys interacting
   with external Encryption KeyManager interface.
2. The process is a FDB singleton process following similar recruitment
   rules as other singleton processes in the system.
3. Code to recruit the worker process; given the encryption keys are
   needed during recovery (decode TLog records), for now the process
   is co-located in same datacenter as ClusterController.
4. Skeleton process actor code; more functionality will be added in
   subsequent PRs.

NOTE: The code is protected under a SERVER_KNOB with the default
      value as 'false' for now.
2022-01-25 17:38:27 -08:00
Josh Slocum
14cc0a8b02 Got BlobGranuleCorrectnessWorkload passing a single test 2022-01-25 15:46:29 -06:00
Jon Fu
915e2f6c1c Merge branch 'main' of github.com:apple/foundationdb into jfu-grv-cache 2022-01-20 16:17:20 -05:00
Ata E Husain Bohra
936bf5336a
Revert "Revert "Refactor: ClusterController driving cluster-recovery state machine" (#6191)
* Revert "Revert "Refactor: ClusterController driving cluster-recovery state machine""

Major changes includes:
1. Re-revert Sequencer refactor commits listed below (in listed order):
1.a. This reverts commit bb17e194d9c9888e203421290959bd7f2c075d7f.
1.b. This reverts commit d174bb2e06bff01157d16c652073536c54d17f7f.
1.c. This reverts commit 30b05b469c87d9b526b427751c211fb5cf7ff9cd.

2. Update Status.actor to track ClusterController interface to track
   recovery status.
3. Introduce a ServerKnob to define "cluster recovery trace event"
   prefix; for now keeping it as "Master", however, it should allow
   smooth transition to "Cluster" prefix as it seems more appropriate.
2022-01-06 12:15:51 -08:00
Aaron Molitor
bb17e194d9 Revert "Refactor: ClusterController driving cluster-recovery state machine"
This reverts commit 1520390bc50614ae7583638c07c033739f40dbfb.
2021-12-24 11:25:51 -08:00
Ata E Husain Bohra
1520390bc5 Refactor: ClusterController driving cluster-recovery state machine
diff-1: Address Jingyu's review comments
 diff-2: Introduce ClusterRecovery actor to seperate out
         cluster recovery code

At present, cluster recovery process consists of following steps:
1. ClusterController clusterWatchDatabase actor recruits
   master/sequencer process.
2. Sequencer process implements the cluster recovery state machine,
   responsible to recruit all other processes as well restore the
   cluster state.

Patch proposes a scheme where the cluster recovery state machine
is implemented and driven by the ClusterController process instead
of the Sequencer process.

Advantages of the scheme could be:
1. Simplified design where ClusterController recruits "sequencer"
   process like other worker processes compared to current scheme
   where "sequencer" process gets special treatment. In newer scheme
   sequencer is responsible for maintaining/providing
   "committed version" (as expected).
2. ClusterController is responsible for worker processes recruitment,
   the sequencer though orchestrating the recovery state machine, it
   need to reachout to the ClusterController for recruiting worker
   processes etc.

NOTE:
Patch has moved the recovery state machine code from
'sequencer' -> 'cluster-controller' process, however, necessary
updates were done for both functionality as well as performance
improvement reasons.

Next Steps:
Cluster recovery documentation will be updated in near future.
2021-12-22 14:06:27 -08:00
Suraj Gupta
23de6fa39b Get it compiled 2021-12-10 16:43:58 -06:00
Sam Gwydir
31c0eef69c Add Minicycle Workload 2021-12-06 15:46:40 -08:00
Tao Lin
9b0a9c4503
Return error when getRangeAndFlatMap has more & Improve simulation tests (#6029) 2021-12-03 12:50:07 -08:00
Jon Fu
00ffd99938 Clone sideband workload to check consistency of cache 2021-12-02 14:17:59 -05:00
Andrew Noyes
b6fd402a3c Add option to use boost or libcoro
By default, use boost everywhere except windows and linux x86 (for
performance reasons)
2021-11-29 13:14:15 -08:00
Steve Atherton
6e43dde613 Fixed bad merge resolution. 2021-11-16 18:04:37 -08:00
Steve Atherton
035e0d6e52
Merge branch 'master' into bit-flipping-workload 2021-11-16 14:42:22 -08:00
Tao Lin
fdb3b72e35 Introduce GetRangeAndFlatMap to push computations down to FDB
Re-introduce #5609
2021-11-09 13:52:28 -08:00