* proof of concept
* use code-probe instead of test
* code probe working on gcc
* code probe implemented
* renamed TestProbe to CodeProbe
* fixed refactoring typo
* support filtered output
* print probes at end of simulation
* fix missed probes print
* fix deduplication
* Fix refactoring issues
* revert bad refactor
* make sure file paths are relative
* fix more wrong refactor changes
* Remove unnecessary actorcompiler.h includes (from non-actor files)
* Make AsyncFileChaos a non-actor header file
* Add unactorcompiler.h include to the end of actor header files
* Add missing actorcompiler.h includes to actor header files
* Add simulation test for 1 data hall + 1 machine failure case.
* Disable BUGGIFY for DEGRADED_RESET_INTERVAL.
A simulation test discovered a situation where machines attempting to connect
to a dead coordinator (with a well-known endpoint) were getting themselves
marked degraded. This flapping of the degraded state prevented recovery from
completing, as it started over any time it noticed that tlogs on degraded
hosts could be relocated to non-degraded ones.
bin/fdbserver -r simulation -f tests/rare/CycleWithDeadHall.toml -b on -s 276841956
* initial structure for remote IKVS server
* moved struct to .h file, added new files to CMakeList
* happy path implementation, connection error when testing
* saved minor local change
* changed tracing to debug
* fixed onClosed and getError being called before init is finished
* fix spawn process bug, now use absolute path
* added server knob to set ikvs process port number
* added server knob for remote/local kv store
* implement simulator remote process spawning
* fixed bug for simulator timeout
* commit all changes
* removed print lines in trace
* added FlowProcess implementation by Markus
* initial debug of FlowProcess, stuck at parent sending OpenKVStoreRequest to child
* temporary fix for process factory throwing segfault on create
* specify public address in command
* change remote kv store knob to false for jenkins build
* made port 0 open random unused port
* change remote store knob to true for benchmark
* set listening port to randomly opened port
* added print lines for jenkins run open kv store timeout debug
* removed most tracing and print lines
* removed tutorial changes
* update handleIOErrors error handling to handle remote-ikvs cases
* Push all debugging changes
* A version where worker bug exists
* A version where restarting tests fail
* Use both the name and the port to determine the child process
* Remove unnecessary update on local address
* Disable remote-kvs for DiskFailureCycle test
* A version where restarting stuck
* A version where most restarting tests green
* Reset connection with child process explicitly
* Remove change on unnecessary files
* Unify flags from _ to -
* fix merging unexpected changes
* fix trac.error to .errorUnsuppressed
* Add license header
* Remove unnecessary header in FlowProcess.actor.cpp
* Fix Windows build
* Fix Windows build, add missing ;
* Fix a stupid bug caused by code dropped by code merging
* Disable remote kvs by default
* Pass the conn_file path to the flow process, though not needed, but the buildNetwork is difficult to tune
* serialization change on readrange
* Update traces
* Refactor the RemoteIKVS interface
* Format files
* Update sim2 interface to not clog connections between parent and child processes in simulation
* Update comments; remove debugging symbols; Add error handling for remote_kvs_cancelled
* Add comments, format files
* Change method name from isBuggifyDisabled to isStableConnection; Decrease(0.1x) latency for stable connections
* Commit the IConnection interface change, forgot in previous commit
* Fix the issue that onClosed request is cancelled by ActorCollection
* Enable the remote kv store knob
* Remove FlowProcess.actor.cpp and move functions to RemoteIKeyValueStore.actor.cpp; Add remote kv store delay to avoid race; Bind the child process to die with parent process
* Fix the bug where one process starts storage server more than once
* Add a please_reboot_remote_kv_store error to restart the storage server worker if remote kvs died abnormally
* Remove unreachable code path and add comments
* Clang format the code
* Fix a simple wait error
* Clang format after merging the main branch
* Testing mixed mode in simulation if remote_kvs knob is enabled, setting the default to false
* Disable remote kvs for PhysicalShardMove which is for RocksDB
* Cleanup #include orders, remove debugging traces
* Revert the reorder in fdbserver.actor.cpp, which fails the gcc build
Co-authored-by: “Lincoln <“lincoln.xiao@snowflake.com”>
* Add contrib/debug_determinism
Add an instrumentation-based technique for debugging unseen mismatches. Also guard a few existing sources of nondeterminism that don't affect unseen with the DEBUG_DETERMINISM macro.
Also change the simulated run loop to not run as the only task inside the real run loop, since that was a source of nondeterminism.
Also fix nondeterminism from calling timer_int
* Add StorageMetadataType::currentTime
Basically a deterministic-in-simulation version of timer_int that we can
use instead of timer_int for StorageMetadataType::createdTime