345 Commits

Author SHA1 Message Date
Markus Pilman
1de37afd52
Make TEST macros C++ only (#7558)
* proof of concept

* use code-probe instead of test

* code probe working on gcc

* code probe implemented

* renamed TestProbe to CodeProbe

* fixed refactoring typo

* support filtered output

* print probes at end of simulation

* fix missed probes print

* fix deduplication

* Fix refactoring issues

* revert bad refactor

* make sure file paths are relative

* fix more wrong refactor changes
2022-07-19 13:15:51 -07:00
Xiaoxi Wang
0054b8a754 Merge branch 'main' of https://github.com/apple/foundationdb into bug/remotekv 2022-06-28 23:41:25 -07:00
Xiaoxi Wang
c6ff556f06 fix bug by exclude the spawned kv process info when counting dead process; add method 2022-06-28 23:39:22 -07:00
Markus Pilman
03d913a1de Flow compiling 2022-06-27 17:05:55 -06:00
Markus Pilman
38e100ebc5 flow bindings are compiling 2022-06-23 19:06:05 -06:00
Markus Pilman
de48e90276 fdbserver is now compiling 2022-06-23 18:45:26 -06:00
Markus Pilman
ffaf15c12a moved wellknownendpoints and fixed some includes 2022-06-23 17:03:53 -06:00
Markus Pilman
d35445a868 enforce include modularization in cmake 2022-06-23 14:37:35 -06:00
Trevor Clinkenbeard
942d687506
Clean up includes in actor header files (#7331)
* Remove unnecessary actorcompiler.h includes (from non-actor files)

* Make AsyncFileChaos a non-actor header file

* Add unactorcompiler.h include to the end of actor header files

* Add missing actorcompiler.h includes to actor header files
2022-06-13 13:26:51 -07:00
Dan Adkins
bd47f390bd
Add simulation test for three_data_hall configuration (#7305)
* Add simulation test for 1 data hall + 1 machine failure case.

* Disable BUGGIFY for DEGRADED_RESET_INTERVAL.

A simulation test discovered a situation where machines attempting to connect
to a dead coordinator (with a well-known endpoint) were getting themselves
marked degraded. This flapping of the degraded state prevented recovery from
completing, as it started over any time it noticed that tlogs on degraded
hosts could be relocated to non-degraded ones.

bin/fdbserver -r simulation -f tests/rare/CycleWithDeadHall.toml -b on -s 276841956
2022-06-06 13:14:49 -07:00
Renxuan Wang
cd2a575e02
Move the resolve of coordinator hostname from getCoordinatorProtocol() to getClusterProtocolImpl(). (#7245)
* Move the resolve of coordinator hostname from getCoordinatorProtocol() to getClusterProtocolImpl().

* Guard DNS Cache behind a knob.
2022-05-26 09:45:54 -07:00
Josh Slocum
a43c98519d
Switching char* to std::string for ProcessInfo to have it own memory (valgrind errors) (#7176) 2022-05-17 12:41:04 -07:00
sfc-gh-tclinkenbeard
475d66084d Remove ENCRYPTION_ENABLED macro 2022-05-02 22:26:31 -07:00
Ray Jenkins
dc9e782ccc
OpenTelemetry Tracing Perf Fixes (#6990) 2022-05-02 14:56:51 -05:00
Steve Atherton
1ab5c21967
Merge pull request #6979 from sfc-gh-jslocum/speedup_tail_latency
Don't do huge tail latencies for network requests when speed up simul…
2022-04-29 22:31:35 -07:00
Steve Atherton
338d2304d7
Bug fix: Killing a machine process would not wait for AsyncFileNonDurable close operations to finish, causing a later reopen of the same file in a new process to hang forever. Renamed AsyncFileNonDurable::deleteFile to closeFile for clarity. Renamed Machine deletingFiles to deletingOrClosingFiles for clarity. (#7007) 2022-04-29 14:01:18 -07:00
Josh Slocum
1e567c12e8 Don't do huge tail latencies for network requests when speed up simulation is set 2022-04-27 15:04:17 -05:00
Markus Pilman
64ac66c1d0 fix merge conflict 2022-04-10 14:16:21 -06:00
Markus Pilman
d8a0b57b6c clients have to listen on a port in simulation 2022-04-10 14:09:15 -06:00
Renxuan Wang
e548c0d604 Add DNS cache. 2022-04-04 15:08:17 -07:00
Renxuan Wang
ff934ca2ad Change MockDNS to DNSCache. 2022-04-04 15:08:17 -07:00
Chaoguang Lin
7d365bd1bb
Remote ikvs debugging (#6465)
* initial structure for remote IKVS server

* moved struct to .h file, added new files to CMakeList

* happy path implementation, connection error when testing

* saved minor local change

* changed tracing to debug

* fixed onClosed and getError being called before init is finished

* fix spawn process bug, now use absolute path

* added server knob to set ikvs process port number

* added server knob for remote/local kv store

* implement simulator remote process spawning

* fixed bug for simulator timeout

* commit all changes

* removed print lines in trace

* added FlowProcess implementation by Markus

* initial debug of FlowProcess, stuck at parent sending OpenKVStoreRequest to child

* temporary fix for process factory throwing segfault on create

* specify public address in command

* change remote kv store knob to false for jenkins build

* made port 0 open random unused port

* change remote store knob to true for benchmark

* set listening port to randomly opened port

* added print lines for jenkins run open kv store timeout debug

* removed most tracing and print lines

* removed tutorial changes

* update handleIOErrors error handling to handle remote-ikvs cases

* Push all debugging changes

* A version where worker bug exists

* A version where restarting tests fail

* Use both the name and the port to determine the child process

* Remove unnecessary update on local address

* Disable remote-kvs for DiskFailureCycle test

* A version where restarting stuck

* A version where most restarting tests green

* Reset connection with child process explicitly

* Remove change on unnecessary files

* Unify flags from _ to -

* fix merging unexpected changes

* fix trac.error to .errorUnsuppressed

* Add license header

* Remove unnecessary header in FlowProcess.actor.cpp

* Fix Windows build

* Fix Windows build, add missing ;

* Fix a stupid bug caused by code dropped by code merging

* Disable remote kvs by default

* Pass the conn_file path to the flow process, though not needed, but the buildNetwork is difficult to tune

* serialization change on readrange

* Update traces

* Refactor the RemoteIKVS interface

* Format files

* Update sim2 interface to not clog connections between parent and child processes in simulation

* Update comments; remove debugging symbols; Add error handling for remote_kvs_cancelled

* Add comments, format files

* Change method name from isBuggifyDisabled to isStableConnection; Decrease(0.1x) latency for stable connections

* Commit the IConnection interface change, forgot in previous commit

* Fix the issue that onClosed request is cancelled by ActorCollection

* Enable the remote kv store knob

* Remove FlowProcess.actor.cpp and move functions to RemoteIKeyValueStore.actor.cpp; Add remote kv store delay to avoid race; Bind the child process to die with parent process

* Fix the bug where one process starts storage server more than once

* Add a please_reboot_remote_kv_store error to restart the storage server worker if remote kvs died abnormally

* Remove unreachable code path and add comments

* Clang format the code

* Fix a simple wait error

* Clang format after merging the main branch

* Testing mixed mode in simulation if remote_kvs knob is enabled, setting the default to false

* Disable remote kvs for PhysicalShardMove which is for RocksDB

* Cleanup #include orders, remove debugging traces

* Revert the reorder in fdbserver.actor.cpp, which fails the gcc build

Co-authored-by: “Lincoln <“lincoln.xiao@snowflake.com”>
2022-03-31 17:08:59 -07:00
sfc-gh-tclinkenbeard
a71099471b Update copyright header dates 2022-03-21 13:36:23 -07:00
sfc-gh-tclinkenbeard
8dcac2f76d Fix typos 2022-03-13 10:02:11 -03:00
Jingyu Zhou
1a5bf25b5c Update code base to use fmt 8.1.1 2022-03-04 15:52:06 -08:00
Andrew Noyes
7a9217a392
Add contrib/debug_determinism (#6389)
* Add contrib/debug_determinism

Add an instrumentation-based technique for debugging unseen mismatches. Also guard a few existing sources of nondeterminism that don't affect unseen with the DEBUG_DETERMINISM macro.

Also change the simulated run loop to not run as the only task inside the real run loop, since that was a source of nondeterminism.

Also fix nondeterminism from calling timer_int

* Add StorageMetadataType::currentTime

Basically a deterministic-in-simulation version of timer_int that we can
use instead of timer_int for StorageMetadataType::createdTime
2022-02-25 12:54:31 -08:00
A.J. Beamon
250a88e682 Enforce that trace event suppression calls happen first when using trace event call chaining. Fix various instances where we weren't following this requirement. 2022-02-24 12:25:52 -08:00
Renxuan Wang
2ea4146e1f Add resolveTCPEndpointBlocking() to resolve hostnames where async resolving is impossible. 2022-01-28 12:20:41 -08:00
Renxuan Wang
28832a99d6 Address comment. 2022-01-18 14:34:18 -08:00
Renxuan Wang
b8bab06e16 Add the functions to set and get mock DNS.
These functions will be used in restarting tests, where mock DNS needs to be saved to and read from files.
2022-01-18 14:34:18 -08:00
sfc-gh-tclinkenbeard
ec64890ac1 Remove some usages of PRId64 by using fmt library 2021-11-30 23:35:36 -08:00
Evan Tschannen
37c9a1320c added --print_sim_time to print simulated time to stdout 2021-11-23 15:01:44 -08:00
Steve Atherton
035e0d6e52
Merge branch 'master' into bit-flipping-workload 2021-11-16 14:42:22 -08:00
Renxuan Wang
4630b0ccea Move DNS mock from SimExternalConnection to Sim2.
This is a revise PR of #5934. In simulation, we don't have direct access to SimExternalConnection.
2021-11-15 17:02:51 -08:00
negoyal
1e7338b6c3 Merge branch 'master' into bit-flipping-workload 2021-10-28 14:24:49 -07:00
sfc-gh-tclinkenbeard
49a667c29b Improve const-correctness of INetwork 2021-10-25 14:42:31 -07:00
negoyal
3b34423248 Merge branch 'master' into bit-flipping-workload 2021-08-31 12:14:51 -07:00
FDB Formatster
2c788c233d apply clang-format to *.c, *.cpp, *.h, *.hpp files 2021-08-27 17:07:47 -07:00
Josh Slocum
2c10118229 Force kill in killDatacenter didn't actually force kill always 2021-08-17 17:26:59 -05:00
Lukas Joswiak
5dc9a97230 Merge branch 'master' into fixes/alp6 2021-08-01 20:42:52 -07:00
negoyal
40b4f3b2f1 Merge branch 'master' into bit-flipping-workload 2021-07-28 18:06:07 -07:00
negoyal
050c218502 New Disk Delay Logic and ChaosMetrics. 2021-07-28 16:03:37 -07:00
sfc-gh-tclinkenbeard
c74047c665 Merge remote-tracking branch 'origin/master' into fix-more-clang-warnings 2021-07-28 11:51:02 -07:00
Evan Tschannen
81f93d794f Merge branch 'master' of https://github.com/apple/foundationdb into fix-ordered-delay 2021-07-27 13:56:35 -07:00
Evan Tschannen
256a18e43b Flow transport uses an ordered delay to avoid out of order reply promise stream messages 2021-07-27 12:01:32 -07:00
Lukas Joswiak
3eed4084e2 Merge branch 'master' into fixes/alp6 2021-07-27 11:26:53 -07:00
Lukas Joswiak
59d535149e Merge branch 'master' into fixes/alp6 2021-07-27 10:07:18 -07:00
Lukas Joswiak
e9a1679467 Disable sampling everywhere except fdbserver 2021-07-27 09:53:23 -07:00
Steve Atherton
507c1f11e3 Add .log() to bare TraceEvent() invocations without any .detail()s to avoid clang-tidy warning about immediate destruction of object without use. 2021-07-26 19:55:10 -07:00
sfc-gh-tclinkenbeard
e006e4fed4 Fix -Wreorder-ctor warnings in LogSystemPeekCursor.actor.cpp and several other files 2021-07-24 00:48:13 -07:00