27838 Commits

Author SHA1 Message Date
Jingyu Zhou
e0d6fa1a90
Add an option to Cycle workload to skip setup phase (#11990)
Useful for upgrade/downgrade tests.
2025-03-03 12:38:57 -08:00
dependabot[bot]
f6daa2c62e
Bump cryptography from 43.0.1 to 44.0.1 in /tests/TestRunner (#11989)
Bumps [cryptography](https://github.com/pyca/cryptography) from 43.0.1 to 44.0.1.
- [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pyca/cryptography/compare/43.0.1...44.0.1)

---
updated-dependencies:
- dependency-name: cryptography
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-03-03 11:38:21 -08:00
Michael Stack
1e1aa71dab
Build a sidecar container that refreshes s3 credentials (#11945)
* packaging/docker/Dockerfile
     Add fdb-aws-s3-credentials-fetcher-sidecar container.
     Runs perpetual script that writes blob-credentials.json to /var/fdb.

* packaging/docker/build-images.sh
     Build and publish new sidecar container

* packaging/docker/fdb-aws-s3-credentials-fetcher/README.md
* packaging/docker/fdb-aws-s3-credentials-fetcher/fdb-aws-s3-credentials-fetcher.go
* packaging/docker/fdb-aws-s3-credentials-fetcher/go.mod
* packaging/docker/fdb-aws-s3-credentials-fetcher/go.sum
     Script that fetches credentials via IRSA (IAM Roles for Service Accounts).

* packaging/docker/fdb-aws-s3-credentials-fetcher/fdb-aws-s3-credentials-fetcher.go
     Match the key generated by fdbserver internally.

* fdbclient/S3BlobStore.actor.cpp
     Add some logging around fail-to-find-credentials -- why.

* fdbclient/tests/aws_fixture.sh
 Use the fdb-aws-s3-credentials-fetcher script to fetch credentials, if available, in ctests.

* fdbclient/tests/s3client_test.sh
 TMPDIR might not be defined when we print usage.

Co-authored-by: Johannes Scheuermann <johscheuer@users.noreply.github.com>
2025-03-03 08:39:33 -08:00
Jingyu Zhou
add710d7f6
Enable TRACK_TLOG_RECOVERY as default (#11987)
Test RECORD_RECOVER_AT_IN_CSTATE and TRACK_TLOG_RECOVERY in buggify with random
on or off.
2025-03-02 19:15:12 -08:00
Zhe Wang
bcb4b3961a
AuditStorage Documentation (#11983)
* audit doc

* fix ci

* address comments

* address comments
2025-02-28 20:27:43 -08:00
Jingyu Zhou
0351af8b26
Merge pull request #11985 from spraza/7.4-release-version-changes
API and protocol version changes for 8.0 -> 7.4
2025-02-28 20:25:53 -08:00
Syed Paymaan Raza
74e79d4316 FDB cmake: update to latest production ready 7.3 and 7.1 patch releases 2025-02-28 13:31:40 -08:00
Syed Paymaan Raza
bcc3237321 Update future protocol versions for 7.4 protocol version binaries 2025-02-28 13:31:40 -08:00
Syed Paymaan Raza
6319330d8e Revert "Update main branch to 8.0 (#11968)"
This reverts commit 710f3f3083b845b0ae5f94b9a2e58eced826f463.
2025-02-28 13:31:40 -08:00
Vishesh Yadav
b66cf62cca rocksdb: fix crash due to uninitialized/stale ColumnFamilyHandle
`CreateColumnFamilyWithImport()` expects that the value inside
handle is `nullptr`. This patch fixes a codepath where we pass
a stale handle left by a destroyed column family.
2025-02-28 10:46:59 -08:00
neethuhaneesha
f786104c6c
Release notes for 7.3.62 and 7.3.63 (#11982) 2025-02-28 09:15:07 -08:00
Syed Paymaan Raza
7642ead228
Do not pick SS with a colocated LR in ExcludeIncludeStorageServersWorkload (#11980) 2025-02-27 15:23:32 -08:00
Vishesh Yadav
9f094417a2
Fix isOnMainThread in Simulation and Testing (#11978)
* Fix isOnMainThread in Simulation and Testing

isOnMainThread() is used to check if the currently running task
is on FDB's event loop. However, in simulation this behaviour
is broken and it always returns false.

In other modes such as UnitTest mode, `runTests()` is called before
`g_network->run()`, but without a wait() statement the event loop never
gets a chance to set itself as the main thread, so the tests never see the
current thread as the main thread. Therefore we add a yield inside
`runTests()` to yield control back to the caller and continue
with g_network->run(), which eventually schedules it back after
initialization.

* Update fdbserver/tester.actor.cpp

Co-authored-by: Syed Paymaan Raza <1238752+spraza@users.noreply.github.com>

---------

Co-authored-by: Syed Paymaan Raza <1238752+spraza@users.noreply.github.com>
2025-02-27 13:14:17 -08:00
Zhe Wang
8da2a54f4d
Add BulkloadJob Cancellation (#11976)
* add bulkload cancellation

* reduce frequency of job cancellation in tests

* fix bulkload assert failure

* nits

* fix busy loop in bulkload/dump workload

* fix workload

* but

* address comments and CI failures

* add task count trace event
2025-02-27 20:34:53 +00:00
Syed Paymaan Raza
710f3f3083
Update main branch to 8.0 (#11968) 2025-02-26 14:09:52 -08:00
Jingyu Zhou
e824466a4f
Merge pull request #11973 from sbodagala/version-vector-unicast-issue
An issue with populating available log servers during recovery with version vector
2025-02-26 14:03:00 -08:00
Zhe Wang
2116547ad3
Improve BulkDump Implementation (#11974)
* bulkdump code refactor

* fix bugs

* improve
2025-02-26 13:58:45 -08:00
Jingyu Zhou
5990b5c75c
Merge pull request #11972 from apple/disable-upgrade-tests
Add compile switch to disable restart simulation tests
2025-02-25 15:03:14 -08:00
Zhe Wang
5f9f5358a8
Improve BulkLoad TraceEvent (#11971)
* improve bulkload event

* fmt
2025-02-25 14:37:21 -08:00
Sreenath Bodagala
af7b34e431 - Correct an issue to do with populating the list of reporting log servers
during recovery with version vector - the list of reporting log servers
should include even those that have an empty unknown committed version list.
2025-02-25 20:32:22 +00:00
Dan Lambright
7c141b2156 Add compile switch to disable restart simulation tests 2025-02-25 15:11:27 -05:00
Zhe Wang
5cce92dcac
Simplify BulkLoad Job Metadata (#11959)
* address comments in the PR 11952

* code refactor and simplification

* avoid task outdated in DDBulkLoadJobExecute

* nit

* fix CI issue
2025-02-25 10:57:22 -08:00
Yao Xiao
67b9b5c9f3
Remove per thread histogram in storage engine and fix bugs in range scan. (#11967) 2025-02-25 10:52:46 -08:00
Zhe Wang
1c52697565
RandomMoveKey should choose SSes from different data halls (#11964)
* DDShardLost should be an error in simulation

* fix randomMoveKey workload

* revert DDShardLost severity change
2025-02-20 20:40:02 -08:00
Jingyu Zhou
ec714791df
Merge pull request #11960 from spraza/fix-nightly
Conditionally disable backup worker
2025-02-19 20:59:16 -08:00
Syed Paymaan Raza
5eda677f43 Conditionally disable backup worker 2025-02-19 17:31:41 -08:00
Syed Paymaan Raza
d9ea00ef5e
Fix rocksdb crash caused by passing uninitialized metadata to ExportColumnFamily (#11957) 2025-02-17 20:46:02 -08:00
Vishesh Yadav
7ad26cfb31
Add ability to ignore multiple tests (#11956)
* Add ability to ignore multiple tests

- Also ignores gRPC unit tests

* Update fdbserver/workloads/UnitTests.actor.cpp

Co-authored-by: Syed Paymaan Raza <1238752+spraza@users.noreply.github.com>

* ignore grpc from other toml files

---------

Co-authored-by: Syed Paymaan Raza <1238752+spraza@users.noreply.github.com>
2025-02-17 17:57:33 -08:00
Zhe Wang
94faec13d5
Enable BulkLoad Job to Give Up Unretrievable Task and Fix DDStuck Bug (#11952)
* enable bulkload job to give up unretriable task

* fix ddstuck bug
2025-02-17 17:27:32 -08:00
Jingyu Zhou
63c6539f1f
Merge pull request #11908 from jzhou77/fix
Handle cases when backup worker pulling may miss mutations
2025-02-17 16:02:57 -08:00
Jingyu Zhou
1690af5acb Address comments 2025-02-17 09:52:28 -08:00
Jingyu Zhou
4939cd84a9 Fix start version for pullAsyncData 2025-02-17 09:50:29 -08:00
Jingyu Zhou
4196864eb4 Delay updating pop version in noop mode until it's saved
Otherwise, the pop version can become larger than the actual saved version when
switching to the regular pulling mode. Because the pop version is larger,
mutations larger than saved version could be popped and no longer available.
2025-02-17 09:50:29 -08:00
Jingyu Zhou
048374ae9b It's fine to ignore mutations if noop mode popped them 2025-02-17 09:50:29 -08:00
Jingyu Zhou
a5d2a58272 Pause backup workers during quiet database
Because in NOOP mode, backup workers still write to the database and cause
non-empty storage queues.
2025-02-17 09:50:29 -08:00
Jingyu Zhou
1bd6f0aeab Save NOOP progress of backup workers
This is needed so that CC knows the lower bound of versions that can be included
in a backup.
2025-02-17 09:50:29 -08:00
Jingyu Zhou
d9f04dc247 Fix start version after backup worker exits noop mode 2025-02-17 09:50:29 -08:00
Jingyu Zhou
6a07744672 Handle cases when backup worker pulling may miss mutations
I.e., throw an error to trigger a recovery.
2025-02-17 09:50:29 -08:00
Jingyu Zhou
108199ebe5
Update 7.3.59 as the latest release (#11955)
* Update 7.3.59 as the latest release

* Update cmake and boost versions used for compiling
2025-02-16 21:59:20 -08:00
Zhe Wang
d141eea3e1
Allow BulkLoadEngine to Handle Non-Retriable Task (#11950)
* enable-bulkload-engine-accept-unretriable-task

* nit and fmt

* fix bug
2025-02-14 10:52:29 -08:00
Jingyu Zhou
c812b90df9
Merge pull request #11949 from spraza:disable-sharded-rocks-unit-test
Disable noSim/ShardedRocksDBCheckpointTest.toml
2025-02-13 20:16:45 -08:00
Zhe Wang
e070698ed0
DataMove Should Decide BulkLoading After Old DataMove Actor Has Been Cleared (#11947)
* fix bulkload bug

* fix CI
2025-02-13 15:35:55 -08:00
Vishesh Yadav
7f46fc11ff
Add gRPC file transfer service (#11892)
Add gRPC file transfer service

* grpc: Add file size check
* grpc: change test addresses
* Fix CI/CD failure
* Disable gRPC for build
* Fixes for new gRPC in new build image
* Move FileTransfer definitions to CPP file
2025-02-13 14:36:30 -08:00
Jingyu Zhou
6a9898de44
Merge pull request #11904 from flowguru/backup1
Refactor backup mutation serialization
2025-02-13 14:18:04 -08:00
Syed Paymaan Raza
95be769282 Disable noSim/ShardedRocksDBCheckpointTest.toml 2025-02-13 13:15:14 -08:00
Dan Lambright
fee87e03b2
Add compile time switch NO_MULTIREGION_TEST. (#11931)
* Add compile time switch NO_MULTIREGION_TEST. When set, simulation tests
will not create configurations with more than one region. Tests requiring
multiple regions are ignored.

* While the RUN_IGNORED_TESTS setting allows running tests that have been
marked as ignored, this should not apply to multiregion tests.  Multiregion
tests must be completely disabled if the NO_MULTIREGION setting is enabled.

---------

Co-authored-by: Dan Lambright <hlambright@apple.com>
2025-02-13 14:38:41 -05:00
Michael Stack
ff22876247
Add multiparting to s3client. (#11920)
* Add multiparting to s3client.
Fix boost::urls::parse_uri's dislike of credentialed blobstore urls.

* fdbclient/BulkLoading.cpp
 Add blobstore regex to extract credentials before feeding the boost
 parse_uri.

* fdbclient/include/fdbclient/S3BlobStore.h
* fdbclient/S3BlobStore.actor.cpp
 Add cleanup of failed multipart -- abortMultiPartUpload (s3 will do
 this in the background eventually but let's clean up after ourselves).
 Also add getObjectRangeMD5 so we can do multipart checksumming.

* fdbclient/S3Client.actor.cpp
 Change upload file and download file to do multipart always.
 Retry too.

* fdbclient/S3Client_cli.actor.cpp
 Add command line to trace rather than output.

* Address Zhe review

* More logging around part upload and download

* Undo assert that proved incorrect; restore the old length math
doing copy in readObject.

Cleanup around TraceEvents in HTTP.actor.

* Undo commented out cleanup -- for debugging

* formatting

---------

Co-authored-by: stack <stack@duboce.com>
2025-02-13 09:06:17 -08:00
Vivek Raj
764341b52a
Refactor initialize_logger_level and unit_tests_version_510 (#11879)
* Refactor initialize_logger_level and unit_tests_version_510

* clang-format fixed

* Apply clang-format

* Fix a compiling error

* Fix SIGSEGV

---------

Co-authored-by: Jingyu Zhou <jingyuzhou@gmail.com>
Co-authored-by: Syed Paymaan Raza <1238752+spraza@users.noreply.github.com>
2025-02-13 08:39:54 -08:00
Jingyu Zhou
e59d52c53e
Merge pull request #11946 from spraza/r142428623-fix
Disable attrition fault injection in snapshot workload
2025-02-12 21:01:03 -08:00
Yao Xiao
a42c9b3c80
update logs (#11944) 2025-02-12 14:49:53 -08:00