81 Commits

Author SHA1 Message Date
Bharadwaj V.R
949f1f1c3e Switch to testing MIN_AVAILABLE_SPACE 2022-02-16 11:33:07 -08:00
Bharadwaj V.R
3fe6a952f1 Merge with upstream tcinfo refactor and move the server knob init to be adjacent to related knobs 2022-02-16 10:28:55 -08:00
Bharadwaj V.R
fe03e6f822 Introduce a new server knob and use it to test if storage servers are near the min bar for available space 2022-02-15 22:43:06 -08:00
Trevor Clinkenbeard
ef68e6fe0d
Merge pull request #6353 from sfc-gh-ljoswiak/fixes/dynamic-knobs
Fix dynamic knobs correctness issues
2022-02-10 22:13:02 -08:00
Zhe Wang
d684508540 Add RatekeeperLimitReasonDetails traceevent for RK 2022-02-10 13:59:47 -08:00
Lukas Joswiak
d5a562e6b8 Fix dynamic knobs correctness issues 2022-02-09 13:43:32 -08:00
Ata E Husain Bohra
87ee4cf958 Add new FDB EncryptKeyProxy role
Major changes include:

1. Add a new FDB role, EncryptKeyProxy. The role is responsible for
   exposing APIs to fetch encryption keys by interacting with an
   external Encryption KeyManager interface.
2. The process is an FDB singleton process following similar recruitment
   rules as other singleton processes in the system.
3. Code to recruit the worker process; given that the encryption keys are
   needed during recovery (to decode TLog records), for now the process
   is co-located in the same datacenter as the ClusterController.
4. Skeleton process actor code; more functionality will be added in
   subsequent PRs.

NOTE: The code is protected under a SERVER_KNOB with the default
      value as 'false' for now.
2022-01-25 17:38:27 -08:00
Neethu Haneesha Bingi
162bce7a58 Rocksdb write rate limiter. 2022-01-18 13:23:00 -08:00
Neethu Haneesha Bingi
ef4038fe8d Rocksdb read range iterator pool to reuse iterators. 2022-01-18 02:05:21 -08:00
Ata E Husain Bohra
936bf5336a
Revert "Revert "Refactor: ClusterController driving cluster-recovery state machine"" (#6191)
* Revert "Revert "Refactor: ClusterController driving cluster-recovery state machine""

Major changes include:
1. Re-revert Sequencer refactor commits listed below (in listed order):
1.a. This reverts commit bb17e194d9c9888e203421290959bd7f2c075d7f.
1.b. This reverts commit d174bb2e06bff01157d16c652073536c54d17f7f.
1.c. This reverts commit 30b05b469c87d9b526b427751c211fb5cf7ff9cd.

2. Update Status.actor to use the ClusterController interface to track
   recovery status.
3. Introduce a ServerKnob to define the "cluster recovery trace event"
   prefix; for now it stays "Master", but it should allow a smooth
   transition to the "Cluster" prefix, which seems more appropriate.
2022-01-06 12:15:51 -08:00
Neethu Haneesha Bingi
1f30368e71 KeyValueStoreRocksDB histograms to track latencies 2021-12-21 23:09:46 -08:00
Steve Atherton
bed25f9571 Delay prioritized eviction of updated pages until after commit completes. 2021-11-28 21:03:44 -08:00
Steve Atherton
508429f30d
Redwood chunked file growth and low priority IO starvation prevention (#5936)
* Redwood files now grow in large page chunks controlled by a knob to reduce truncate() calls for expansion. PriorityMultiLock now has a limit on consecutive same-priority lock releases. Increased Redwood max priority level to 3 for more separation at higher BTree levels.

* Simulation fix, don't mark certain IO timeout errors as injected unless the simulated process has been set to have an unreliable disk.

* Pager writes now truncate gradually upward, one chunk at a time, in response to writes, which wait on only the necessary truncate operations. Increased buggified chunk size because truncate can be very slow in simulation.

* In simulation, ioTimeoutError() and ioDegradedOrTimeoutError() will wait until at least the target timeout interval past the point when simulation is sped up.

* PriorityMultiLock::toString() prints more info and is now public.

* Added queued time to PriorityMultiLock.

* Bug fix to handle when speedUpSimulation changes later than the configured time.

* Refactored mutation application in leaf nodes to do fewer comparisons and do in place value updates if the new value is the same size as the old value.

* Renamed updatingInPlace to updatingDeltaTree for clarity. Inlined switchToLinearMerge() since it is only used in one place.

* Updated extendToCover to be clearer by passing in the old extension future as a parameter. Fixed initialization warning.
2021-11-12 13:47:07 -08:00
Daniel Smith
66520eb1c1 Utilize read types to do selective throttling 2021-11-10 11:51:04 -05:00
Tao Lin
fdb3b72e35 Introduce GetRangeAndFlatMap to push computations down to FDB
Re-introduce #5609
2021-11-09 13:52:28 -08:00
Tao Lin
586cc3b102
Revert "Introduce GetRangeAndFlatMap to push computations down to FDB" 2021-11-04 08:46:56 -07:00
Tao Lin
0853661d13 Introduce getRangeAndHop to push computations down to FDB 2021-11-03 13:21:16 -07:00
Xiaoxi Wang
1a2a838df3 add knob 2021-10-27 09:08:37 -07:00
Evan Tschannen
2208b04174
Merge pull request #5855 from sfc-gh-etschannen/blob_full_clean
Blob Granules V0
2021-10-26 09:57:35 -07:00
Lukas Joswiak
c96f560cbe Verify rollback of a single version in simulation, other small fixes 2021-10-25 12:03:22 -07:00
Josh Slocum
0ff8ddc2b6 Merge branch 'master' into blob_full_clean 2021-10-25 13:38:48 -05:00
Steve Atherton
d153519188
Merge pull request #5813 from sfc-gh-jslocum/ss_ebrake_streaming_fix
Fixes to ss e-brake, tlog streaming, and their interaction
2021-10-22 10:46:17 -07:00
Josh Slocum
773886515e Merge branch 'feature-range-feed' into blob_full_clean 2021-10-22 11:07:51 -05:00
Zhe Wu
0cf829ef91 Reduce restore error message 2021-10-20 14:02:48 -07:00
Josh Slocum
8dd7f8f447 Fixes to ss e-brake, tlog streaming, and their interaction 2021-10-20 10:48:29 -05:00
Josh Slocum
912ef76f1c cleanup before merge 2021-10-18 17:11:14 -05:00
A.J. Beamon
507a09893c
Add ClientCount to ClusterControllerMetrics (#5748) 2021-10-17 20:47:11 -07:00
Josh Slocum
5f0ec0612a Merge branch 'feature-range-feed' into blob_full 2021-10-13 15:44:35 -05:00
Zhe Wu
c07a07dbbe Take uptime into account when making failover decision 2021-10-07 11:19:34 -07:00
Zhe Wu
62197faa46 Add more comments to the code 2021-10-07 11:19:34 -07:00
Zhe Wu
c0fbe5471f Implement the core logic of grey failure triggered failover 2021-10-07 11:19:34 -07:00
Suraj Gupta
4d54669ccd Recruit the blob workers via blob manager.
In this PR, the blob manager now recruits blob workers
(via communication with the cluster controller). Blob workers
are onboarded as blob worker processes enter the cluster.
2021-10-04 11:07:08 -04:00
Suraj Gupta
5fa6c687d6 Add blob manager as a singleton. 2021-09-23 10:45:37 -04:00
Suraj Gupta
72edcd8d73 Address PR comments.
Revert knob name change, fix comparison between new and old
recruitments, and get rid of empty `if` block.
2021-09-22 16:56:34 -05:00
Suraj Gupta
0b6fecddbc Refactor logic for recruiting singletons.
This commit refactors the logic for recruiting singletons,
which is done by the ClusterController. This allows for far
easier additions of new singletons in the future, and also
cleans up the code.

Also, the logic for recruiting DD was changed to mirror
the logic for recruiting RK. Although the logic for RK
allows there to be many RKs existing at once, the moveKeysLock
mechanism used by DD still prevents multiple DDs existing at once.
2021-09-22 16:56:18 -05:00
Suraj Gupta
6533678f0d Address PR comments.
Revert knob name change, fix comparison between new and old
recruitments, and get rid of empty `if` block.
2021-09-20 14:26:42 -05:00
Suraj Gupta
fe098b3b11 Refactor logic for recruiting singletons.
This commit refactors the logic for recruiting singletons,
which is done by the ClusterController. This allows for far
easier additions of new singletons in the future, and also
cleans up the code.

Also, the logic for recruiting DD was changed to mirror
the logic for recruiting RK. Although the logic for RK
allows there to be many RKs existing at once, the moveKeysLock
mechanism used by DD still prevents multiple DDs existing at once.
2021-09-20 14:26:42 -05:00
Josh Slocum
c2d1d1704f Merge branch 'feature-range-feed' into blob_full 2021-09-10 11:21:52 -05:00
Josh Slocum
eb76343dfb Added blob granule reassignment and splitting 2021-09-08 14:09:14 -05:00
Steve Atherton
be440ab954
Merge pull request #5260 from FuhengZhao/RedwoodHistogram
Redwood local histograms
2021-08-26 12:05:44 -07:00
Josh Slocum
5259af787d Switched blob implementation to use backup container 2021-08-24 13:47:47 -05:00
Fuheng Zhao
b65a66fab7 log redwood histogram separately 2021-08-24 09:57:39 -07:00
Neethu Haneesha Bingi
02b3ed3ff1 Adding deadline option to rocksdb calls. 2021-08-19 14:11:28 -07:00
Neethu Haneesha Bingi
fbb393f998 Added readrange timeout check and rocksdb read deadline option. 2021-08-19 14:11:28 -07:00
Neethu Haneesha Bingi
24ac173c95 Adding ReadRangeAction timeout, returning error and using timer_monotonic changes. 2021-08-19 14:11:28 -07:00
Neethu Haneesha Bingi
01e85610ab Cancelling the timed-out reads with rocksdb storage. 2021-08-19 14:11:28 -07:00
Daniel Smith
45b40addb7
Merge pull request #5379 from neethuhaneesha/eagerReadsDisable
Disabling option for removing eagerReads for ClearRange mutations.
2021-08-19 12:39:30 -04:00
Neethu Haneesha Bingi
c45daf6f51 Disabling option for removing eagerReads for ClearRange mutations. 2021-08-13 01:26:50 -07:00
Xiaoxi Wang
a97570bd06 fix misspelling, trace log, and format problems 2021-08-11 18:26:00 -07:00
Josh Slocum
921a2cfca1 Merge branch 'feature-range-feed' into blob_full 2021-08-10 11:25:48 -05:00