144 Commits

Author SHA1 Message Date
Chaoguang Lin
7d365bd1bb
Remote ikvs debugging (#6465)
* initial structure for remote IKVS server

* moved struct to .h file, added new files to CMakeList

* happy path implementation, connection error when testing

* saved minor local change

* changed tracing to debug

* fixed onClosed and getError being called before init is finished

* fix spawn process bug, now use absolute path

* added server knob to set ikvs process port number

* added server knob for remote/local kv store

* implement simulator remote process spawning

* fixed bug for simulator timeout

* commit all changes

* removed print lines in trace

* added FlowProcess implementation by Markus

* initial debug of FlowProcess, stuck at parent sending OpenKVStoreRequest to child

* temporary fix for process factory throwing segfault on create

* specify public address in command

* change remote kv store knob to false for jenkins build

* made port 0 open random unused port

* change remote store knob to true for benchmark

* set listening port to randomly opened port

* added print lines for jenkins run open kv store timeout debug

* removed most tracing and print lines

* removed tutorial changes

* update handleIOErrors error handling to handle remote-ikvs cases

* Push all debugging changes

* A version where worker bug exists

* A version where restarting tests fail

* Use both the name and the port to determine the child process

* Remove unnecessary update on local address

* Disable remote-kvs for DiskFailureCycle test

* A version where restarting stuck

* A version where most restarting tests green

* Reset connection with child process explicitly

* Remove change on unnecessary files

* Unify flags from _ to -

* fix merging unexpected changes

* fix trace.error to .errorUnsuppressed

* Add license header

* Remove unnecessary header in FlowProcess.actor.cpp

* Fix Windows build

* Fix Windows build, add missing ;

* Fix a stupid bug caused by code dropped by code merging

* Disable remote kvs by default

* Pass the conn_file path to the flow process; it is not needed, but buildNetwork is difficult to tune

* serialization change on readrange

* Update traces

* Refactor the RemoteIKVS interface

* Format files

* Update sim2 interface to not clog connections between parent and child processes in simulation

* Update comments; remove debugging symbols; Add error handling for remote_kvs_cancelled

* Add comments, format files

* Change method name from isBuggifyDisabled to isStableConnection; Decrease (0.1x) latency for stable connections

* Commit the IConnection interface change, forgot in previous commit

* Fix the issue that onClosed request is cancelled by ActorCollection

* Enable the remote kv store knob

* Remove FlowProcess.actor.cpp and move functions to RemoteIKeyValueStore.actor.cpp; Add remote kv store delay to avoid race; Bind the child process to die with parent process

* Fix the bug where one process starts storage server more than once

* Add a please_reboot_remote_kv_store error to restart the storage server worker if remote kvs died abnormally

* Remove unreachable code path and add comments

* Clang format the code

* Fix a simple wait error

* Clang format after merging the main branch

* Testing mixed mode in simulation if remote_kvs knob is enabled, setting the default to false

* Disable remote kvs for PhysicalShardMove which is for RocksDB

* Cleanup #include orders, remove debugging traces

* Revert the reorder in fdbserver.actor.cpp, which fails the gcc build

Co-authored-by: Lincoln <lincoln.xiao@snowflake.com>
2022-03-31 17:08:59 -07:00
sfc-gh-tclinkenbeard
77786f4fc6 Merge remote-tracking branch 'origin/main' into change-data-hall 2022-03-27 12:44:05 -07:00
sfc-gh-tclinkenbeard
a71099471b Update copyright header dates 2022-03-21 13:36:23 -07:00
Steve Atherton
f03c0b8c3c Added ISimulated::restarted for detecting a restarted simulation test. 2022-03-04 17:19:46 -08:00
sfc-gh-tclinkenbeard
db8def68db Use std::unique_ptr for ISimulator::extraDB 2022-02-28 13:12:31 -08:00
Ata E Husain Bohra
87ee4cf958 Add new FDB EncryptKeyProxy role
Major changes include:

1. Add a new FDB role, EncryptKeyProxy. The role is
   responsible for exposing APIs to fetch encryption keys by interacting
   with the external Encryption KeyManager interface.
2. The process is an FDB singleton process following similar recruitment
   rules as other singleton processes in the system.
3. Code to recruit the worker process; given that the encryption keys are
   needed during recovery (to decode TLog records), for now the process
   is co-located in the same datacenter as the ClusterController.
4. Skeleton process actor code; more functionality will be added in
   subsequent PRs.

NOTE: The code is protected under a SERVER_KNOB with the default
      value as 'false' for now.
2022-01-25 17:38:27 -08:00
Evan Tschannen
37c9a1320c added --print_sim_time to print simulated time to stdout 2021-11-23 15:01:44 -08:00
negoyal
1e7338b6c3 Merge branch 'master' into bit-flipping-workload 2021-10-28 14:24:49 -07:00
Evan Tschannen
2208b04174
Merge pull request #5855 from sfc-gh-etschannen/blob_full_clean
Blob Granules V0
2021-10-26 09:57:35 -07:00
sfc-gh-tclinkenbeard
49a667c29b Improve const-correctness of INetwork 2021-10-25 14:42:31 -07:00
Josh Slocum
912ef76f1c cleanup before merge 2021-10-18 17:11:14 -05:00
Josh Slocum
5f0ec0612a Merge branch 'feature-range-feed' into blob_full 2021-10-13 15:44:35 -05:00
negoyal
f913dfed97 Merge branch 'master' into bit-flipping-workload 2021-10-11 16:34:57 -07:00
Suraj Gupta
4d54669ccd Recruit the blob workers via blob manager.
In this PR, the blob manager now recruits blob workers
(via communication with the cluster controller). Blob workers
are onboarded as blob worker processes enter the cluster.
2021-10-04 11:07:08 -04:00
Suraj Gupta
5fa6c687d6 Add blob manager as a singleton. 2021-09-23 10:45:37 -04:00
Xiaoxi Wang
1730d75f73 change configure test
add store type check
add test file
2021-09-21 18:11:04 -07:00
negoyal
3b34423248 Merge branch 'master' into bit-flipping-workload 2021-08-31 12:14:51 -07:00
sfc-gh-tclinkenbeard
3418c20867 Merge remote-tracking branch 'origin/master' into paxos-config-db 2021-08-16 10:49:47 -07:00
sfc-gh-tclinkenbeard
82546853c0 Rename UseConfigDB to ConfigDBType 2021-08-09 10:04:35 -07:00
sfc-gh-tclinkenbeard
cdbcb69d86 Add configuration database type to ISimulator 2021-08-09 10:04:35 -07:00
Lukas Joswiak
5dc9a97230 Merge branch 'master' into fixes/alp6 2021-08-01 20:42:52 -07:00
negoyal
9e7197faba Bunch of changes based on review comments and discussions. 2021-07-30 01:32:43 -07:00
negoyal
050c218502 New Disk Delay Logic and ChaosMetrics. 2021-07-28 16:03:37 -07:00
Lukas Joswiak
e9a1679467 Disable sampling everywhere except fdbserver 2021-07-27 09:53:23 -07:00
sfc-gh-tclinkenbeard
b9a22a61ef Fix many -Wreorder-ctor warnings 2021-07-23 17:33:18 -07:00
negoyal
1b8b22decc Wrapper class to avoid adding overhead to all async disk calls 2021-07-12 17:51:01 -07:00
negoyal
df39c5a44e Implement Disk Throttling Chaos workload. 2021-06-30 17:05:04 -07:00
Lukas Joswiak
153de33f57 Revert "Merge pull request #4802 from sfc-gh-ljoswiak/revert/actor-lineage"
This reverts commit 6499fa178e8f65a22105c2cd062a67209b562973, reversing
changes made to 15126319577f915f28aa6308bbf066dc7ec992a2.
2021-06-04 13:31:55 -07:00
Josh Slocum
d67184163b
Merge pull request #4556 from sfc-gh-jslocum/tss
Testing Storage Server
2021-06-01 09:11:10 -07:00
A.J. Beamon
69dbe04d42 Rename WeakFutureReference to UnsafeWeakFutureReference and add warning comment 2021-05-28 14:34:20 -07:00
A.J. Beamon
a756469670 Use a weak reference in the open files cache (abstracted from a similar cache in AsyncFileCached) to avoid a problem where removing an item from the cache could cause us to reentrantly remove it again. 2021-05-26 13:38:24 -07:00
Josh Slocum
ce82c9653e Testing Storage Server implementation 2021-05-25 20:28:50 +00:00
Lukas Joswiak
4ea760b2a9 Revert "Merge pull request #4136 from sfc-gh-mpilman/features/actor-lineage"
This reverts commit da41534618a2a1edbf6b0b760635175372a66294, reversing
changes made to e6300905d6f294c52ebd166f4714541b084f37b4.
2021-05-10 20:26:12 -07:00
Markus Pilman
9bcde529f8
Merge pull request #4 from sfc-gh-ljoswiak/features/current-actor
Sample running actor
2021-04-05 11:36:48 -06:00
Evan Tschannen
e774262046 fix: g_simulator.disableRemote did not contain the rest of the configuration 2021-03-30 21:11:26 -07:00
Lukas Joswiak
2dfd420882 Add sampling profiler thread 2021-03-24 14:52:42 -07:00
Evan Tschannen
6a372e3fc7 fixed a simulation bug where a process on an unreliable machine would be considered reliable by the simulator 2021-03-15 11:07:36 -07:00
FDB Formatster
df90cc89de apply clang-format to *.c, *.cpp, *.h, *.hpp files 2021-03-10 10:18:07 -08:00
Vishesh Yadav
2bb4f2e59f Merge branch 'release-6.3-pre-format' into master-format
This merges the release-6.3 branch right before it was fully formatted.
There were quite a few conflicts that are resolved here. CoroFlow had
a check for OOM errors introduced in 6.3, but it didn't seem applicable in
the new implementation, which seems to use boost.
2021-03-10 09:37:41 -08:00
Steve Atherton
f33ed86210
Merge pull request #4420 from apple/release-6.2
Merge 6.2 into 6.3
2021-03-03 15:57:36 -08:00
Markus Pilman
cc47332478 Added an actor to allow for async file renames 2021-03-02 16:38:51 -07:00
A.J. Beamon
aaf0a9aa7b Merge branch 'release-6.3' into merge-release-6.3-into-master
# Conflicts:
#	build/docker-compose.yaml
#	cmake/ConfigureCompiler.cmake
#	fdbclient/FileBackupAgent.actor.cpp
#	fdbrpc/AsyncFileCached.actor.h
#	fdbrpc/IAsyncFile.h
#	fdbrpc/IRateControl.h
#	fdbrpc/simulator.h
#	fdbserver/KeyValueStoreSQLite.actor.cpp
#	fdbserver/storageserver.actor.cpp
#	fdbservice/ServiceBase.cpp
2021-02-08 12:58:34 -08:00
A.J. Beamon
67e783acf8 Merge branch 'release-6.2' into merge-release-6.2-into-release-6.3
# Conflicts:
#	cmake/CompileBoost.cmake
#	cmake/FDBComponents.cmake
#	fdbrpc/AsyncFileCached.actor.h
#	fdbrpc/simulator.h
#	fdbserver/KeyValueStoreSQLite.actor.cpp
#	fdbserver/Knobs.cpp
#	fdbserver/Knobs.h
#	fdbserver/storageserver.actor.cpp
#	flow/Knobs.h
#	flow/network.h
2021-02-08 09:20:28 -08:00
Evan Tschannen
36e4f82115 more complete support for simulated disk failures 2021-01-27 14:29:43 -08:00
Evan Tschannen
3ee5831287 fault injection can now cause a disk to stop responding 2021-01-27 13:57:54 -08:00
Andrew Noyes
4ee97c0784 Use clang-tidy to automatically fix missing overrides
Use `clang-tidy -p . $file -checks='-*,modernize-use-override' -header-filter='.*' -fix`
to fix missing overrides, and then use git clang-format to reformat just
those changes. This went pretty well for most files.

Formatting the following files went off the rails, so I'm going to
follow up with a commit that's just clang-tidy and no clang-format.

- fdbclient/DatabaseBackupAgent.actor.cpp
- fdbclient/FileBackupAgent.actor.cpp
- fdbserver/OldTLogServer_4_6.actor.cpp
- fdbmonitor/SimpleIni.h
- fdbserver/workloads/ClientTransactionProfileCorrectness.actor.cpp
2021-01-26 02:04:12 +00:00
sfc-gh-tclinkenbeard
86c7c1e946 Fix IAsyncFileSystem method signatures 2020-12-28 01:57:42 -04:00
sfc-gh-tclinkenbeard
5bfa6cea98 Merge remote-tracking branch 'origin/master' into misc-changes 2020-12-26 20:47:00 -04:00
sfc-gh-tclinkenbeard
f3c0d26806 Make ISimulator::BackupAgentType an enum class 2020-12-08 09:09:30 -08:00
sfc-gh-tclinkenbeard
c914620c10 Fix signatures for ISimulator methods 2020-12-08 09:09:29 -08:00