foundationdb

mirror of https://github.com/apple/foundationdb.git synced 2025-05-14 09:58:50 +08:00

Author	SHA1	Message	Date
Josh Slocum	d37b2b0a76	Adding BlobFailureInjection workload (#9833) * Adding BlobFailureInjection workload * fixing formatting	2023-04-06 15:10:36 -05:00
Zhe Wu	d576d9a66a	Remote debug TraceEvent	2023-03-27 11:47:11 -07:00
Zhe Wu	40dc54223c	Add GC generation test, and make all simulation tests pass	2023-03-27 11:46:13 -07:00
Zhe Wu	b4e62b9b3e	Update log cursor timeout check	2023-03-21 22:03:17 -07:00
Jingyu Zhou	5c97fb2c20	Use a constant for connectionFailuresDisableDuration	2023-03-09 09:50:24 -08:00
Jingyu Zhou	e18ed14278	Refactor to address comments	2023-03-09 09:39:27 -08:00
Jingyu Zhou	493e81f31d	Limit connection failures to be within tests In particular, disable connection failures when initializing the database during the startup phase, i.e., before running with test specs.	2023-03-08 15:36:58 -08:00
Russell Sears	bcc05b1058	Improve support for prebuilt boost	2023-02-27 15:38:58 -06:00
Jingyu Zhou	9a257a60a4	Address review comments	2023-02-24 10:47:32 -08:00
Jingyu Zhou	0b2e02c402	Fix rare test failures Unclog after DB is recovered, otherwise another recovery may become stuck again.	2023-02-23 15:42:33 -08:00
Jingyu Zhou	65443b6541	Fix compiling errors	2023-02-23 15:02:44 -08:00
Jingyu Zhou	ecae81882c	Change to only clog once for a particular tlog If we repeat clogging, different tlogs may be excluded, which can cause the recovery to get stuck.	2023-02-23 14:31:39 -08:00
Jingyu Zhou	955826f2fe	Add ClogTlog workload	2023-02-23 14:31:12 -08:00
Junhyun Shim	d9c126a2d9	Introduce WipedString for Arena block holding AuthZ tokens (#9381) * Enable secure allocation mode in Arena This mode allows zeroing out blocks holding sensitive data after use * Introduce WipedString to all token-holding memory Also introduce an option flag "sensitive" * Make pointer equivalency a hard requirement for non-ASAN builds So that we can detect when Arena/malloc/memory-wipe behavior changes	2023-02-16 10:44:32 +01:00
Jingyu Zhou	622520bd2d	Return the source team if remote DC is dead Also refactor the code with findTeamFromServers().	2023-02-10 11:11:07 -08:00
Jingyu Zhou	6c4a9b5f23	Fix DD stuck when remote DC is dead When remote DC is down, the remote team collection of DD can be initializing, waiting for the remote to recover (all_tlog_recruited state). However, the getTeam request can already be served by the remote team collection. So, for a RelocateShard (data movement such as split, move), it will get a team for the remote DC. But the data movement can't make progress on the remote team because the remote DC hasn't recovered yet. Because the data movement is stuck, the primary cannot reach the "storage_recovered" state and stays in the accepting_commit state. The specific test failure: slow/ApiCorrectness.toml -s 339026305 -b on at commit: 0edd899d65. In this test, primary DC has 1 SS killed, remote DC has 2 TLog and 2 SS killed. So the remote is dead, and the remaining 2 SSes can't make progress because of the loss of 2 TLogs. The repairDeadDatacenter() can't reach the "storage_recovered" state due to DD's failure to move shards away from the killed SS in the primary. The fix is to exclude all remote in repairDeadDatacenter(), which tells DD to mark all SSes in the remote as unhealthy. Another fix is to return empty results for the getTeam request if the remote team collection is not ready. This allows the data movement to continue; essentially, the remote team is not changed for the data movement.	2023-02-10 11:11:07 -08:00
Junhyun Shim	be225acd2a	Merge remote-tracking branch 'origin/main' into authz-tenant-name-to-tenant-id	2023-02-06 23:13:43 +01:00
Xiaoxi Wang	7190fa0c08	Merge branch 'main' of https://github.com/apple/foundationdb into fix/main/testTimeout	2023-02-03 13:48:54 -08:00
Xiaoxi Wang	b757e8914a	fix BOOST_SYSTEM_NO_LIB redefinition in CI	2023-02-03 13:47:50 -08:00
Junhyun Shim	ce652fa284	Replace AuthZ's use of tenant names in token with tenant ID Also, to minimize audit log loss, handle token usage audit logging at each usage. This has a side-effect of making the token use log less bursty. This also subtly changes the dedup cache policy. Dedup time window used to be 5 seconds (default) since the start of batch-logging. Now it's 5 seconds from the first usage since the closing of the previous dedup window	2023-02-03 21:46:31 +01:00
Jingyu Zhou	e96adfa449	Fix excessive killing for HA configuration In the HA configuration, it's possible that 2 out of 3 machines in the remote DC were killed, leaving not enough machines for a successful recovery. So this PR switches to Reboot to avoid such excessive killing.	2023-02-01 15:16:10 -08:00
Chaoguang Lin	4c5cbe6cda	Merge branch 'main' of github.com:apple/foundationdb into fix-nightly-failure	2023-01-25 18:43:37 -08:00
Chaoguang Lin	fce9490c19	A Fix from Evan	2023-01-25 15:55:24 -08:00
Xiaoge Su	eb4e147ebf	Reformat source	2023-01-24 15:06:27 -08:00
Xiaoge Su	0a60142160	Extract ProcessInfo, MachineInfo, KillType out from ISimulator	2023-01-24 14:48:42 -08:00
Xiaoge Su	50de69c897	Extract IConnection and NetworkAddress out from network.h	2023-01-24 14:48:31 -08:00
Xiaoge Su	3f03a6b12d	Extract out IPAddress and IUDPSocket	2023-01-24 14:47:39 -08:00
sfc-gh-tclinkenbeard	986c792a9f	Drop UDP packets more frequently in simulation	2023-01-15 17:32:57 -08:00
Kevin Hoxha	407c371635	metrics: Add simulation testing and fix incorrect TraceEvent names - Added a background actor that listens on METRICS_EMISSION_UDP_PORT for incoming metrics (and verifies they are in the correct format) - TraceEvent details have certain requirements for naming. This commit makes a separate name for Counter/LatencySample and its underlying IMetric to avoid those issues	2022-12-08 10:07:11 -08:00
Hui Liu	891331caed	Merge pull request #8881 from sfc-gh-huliu/fixinit Init blobGranulesEnabled in ISimulator	2022-11-18 17:07:57 -08:00
Hui Liu	bee0377b4d	Init blobGranulesEnabled in ISimulator	2022-11-18 15:53:06 -08:00
Junhyun Shim	bfefbfee8c	Merge pull request #8705 from sfc-gh-jshim/authz-accept-base64-for-jwt-tenant-name Make token's 'tenants' field base64-encoded (cf. base64url)	2022-11-16 10:17:10 +01:00
Markus Pilman	503769ef05	Merge pull request #8496 from sfc-gh-mpilman/bugfixes/machines-attrition-debugging Enable machine attrition injection	2022-11-15 16:32:33 -07:00
Junhyun Shim	41ea1678d0	Merge remote-tracking branch 'origin/main' into authz-accept-base64-for-jwt-tenant-name	2022-11-15 22:57:49 +01:00
sfc-gh-tclinkenbeard	c03f60c618	Update rare code probe annotations	2022-11-15 13:21:25 -08:00
Markus Pilman	f105cb1809	Merge remote-tracking branch 'origin/main' into bugfixes/machines-attrition-debugging	2022-11-14 10:11:52 -07:00
Markus Pilman	40c1bbc49a	Fix gcc problem with typenames	2022-11-09 10:14:13 -07:00
Markus Pilman	6643ed0a26	fix print-sim-time	2022-11-08 12:19:39 -07:00
Junhyun Shim	112363ef14	Merge remote-tracking branch 'origin/main' into authz-accept-base64-for-jwt-tenant-name	2022-11-08 13:16:08 +01:00
Junhyun Shim	50f4021cf7	Make token's 'tenants' field base64-encoded (cf. base64url) - Remove redundant operation from TokenSign - Let the sign/verify API directly report errors instead of tracing at failing subroutine, which lacks context	2022-11-04 20:17:08 +01:00
Josh Slocum	cff99a64f6	Blob Granule Attrition fixes (#8682) * Assert was incorrect in change feed destroy race with moved() clearing map * fixing race between injected fault and granule revoke * Handling race in sim2 blob worker attrition check	2022-11-03 18:48:10 -05:00
Markus Pilman	f1fea14255	Merge remote-tracking branch 'origin/main' into bugfixes/machines-attrition-debugging	2022-11-01 13:51:35 -06:00
Lukas Joswiak	5ca2b89bdf	Fix simulation issue where process switch was ignored The simulator tracks only active processes. Rebooted or killed processes are removed from the list of processes, and only get added back when the process is rebooted and starts up again. This causes a problem for the `RebootProcessAndSwitch` kill type, which wants to simultaneously reboot all machines in a cluster and change their cluster file. If a machine is currently being rebooted, it will miss the reboot process and switch command. The fix is to add a check when a process is being started in simulation. If the process has had its cluster file changed and the cluster is in a state where all processes should have had their cluster files reverted to the original value, the simulator will now send a `RebootProcessAndSwitch` signal right when the process is started. This will cause an extra reboot, but should correctly switch the process back to its original, correct cluster file, allowing the cluster to fully recover all clusters. Note that the above issue should only affect simulation, due to how the simulator tracks processes and handles kill signals. This commit also adds a field to each process struct to determine whether the process is being run in a DR cluster in the simulation run. This is needed because simulation does not differentiate between processes in different clusters (other than by the IP), and some processes needed to switch clusters and some simply needed to be rebooted.	2022-10-27 13:56:13 -07:00
Lukas Joswiak	f43011e4b7	Notify processes joining the wrong cluster And have these processes enter a "zombie" state where they cancel all their actors and then wait forever, refusing to do any additional work until they are manually handled by the operator.	2022-10-27 13:56:13 -07:00
Lukas Joswiak	a72066be33	Add simulation support for changing the cluster file	2022-10-27 13:56:13 -07:00
Markus Pilman	3c943ac37a	fix merge bugs	2022-10-26 10:42:11 -06:00
Markus Pilman	e7b5b870a3	Merge remote-tracking branch 'origin/main' into bugfixes/machines-attrition-debugging	2022-10-24 15:24:36 -06:00
Markus Pilman	2310584a05	Merge remote-tracking branch 'sfc/bugfixes/machines-attrition-debugging' into bugfixes/machines-attrition-debugging	2022-10-24 15:01:03 -06:00
Markus Pilman	43cafb0bc2	Track disk corruptions and mark resulting failures as injected	2022-10-24 14:54:43 -06:00
Andrew Noyes	fb9333e863	Delete Sim2::PromiseTask Previously this was leaking and causing simulation OOM's. Also make it FastAllocated to match Net2::PromiseTask	2022-10-24 09:25:21 -07:00

416 Commits