10 Commits

Author SHA1 Message Date
Michael Stack
ff22876247
Add multiparting to s3client. (#11920)
* Add multiparting to s3client.
Fix boost::urls::parse_uri's dislike of credentialed blobstore URLs.

* fdbclient/BulkLoading.cpp
 Add blobstore regex to extract credentials before feeding the boost
 parse_uri.
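
 A minimal sketch of the idea, not the regex in BulkLoading.cpp: pull the
 credentials section out of a blobstore URL so the remainder can be handed
 to a strict URI parser. The helper name splitBlobstoreCredentials, the
 SplitUrl struct, and the pattern are all illustrative assumptions; the
 sketch assumes the credentials contain neither '@' nor '/'.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical result type, for illustration only.
struct SplitUrl {
    std::string credentials; // text between "blobstore://" and '@'; may be empty
    std::string url;         // the same URL with the credentials and '@' removed
};

// Hypothetical helper: separates "key:secret[:token]" from the rest of the URL.
// Assumes the credentials section contains neither '@' nor '/'.
inline SplitUrl splitBlobstoreCredentials(const std::string& url) {
    static const std::regex re(R"(^(blobstore://)([^@/]*)@(.*)$)");
    std::smatch m;
    if (std::regex_match(url, m, re)) {
        // m[1] = scheme prefix, m[2] = credentials, m[3] = host, path, and query
        return { m[2].str(), m[1].str() + m[3].str() };
    }
    // No credentials section found; pass the URL through unchanged.
    return { "", url };
}
```

 The stripped URL can then go to the URI parser, and the credentials can be
 split on ':' separately.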

* fdbclient/include/fdbclient/S3BlobStore.h
* fdbclient/S3BlobStore.actor.cpp
 Add cleanup of failed multipart -- abortMultiPartUpload (s3 will do
 this in the background eventually but let's clean up after ourselves).
 Also add getObjectRangeMD5 so we can do multipart checksumming.

* fdbclient/S3Client.actor.cpp
 Change upload file and download file to do multipart always.
 Retry too.
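
 As a rough illustration of what "multipart always" involves, the sketch
 below splits a file of a given size into numbered byte ranges. The names
 Part and planParts are hypothetical and the part size is a caller-supplied
 assumption; this is not the code in S3Client.actor.cpp. S3 numbers parts
 starting at 1, and only the last part may be shorter than the rest.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one part of a multipart transfer.
struct Part {
    int number;     // S3 part numbers start at 1
    int64_t offset; // byte offset of this part within the file
    int64_t length; // number of bytes in this part
};

// Hypothetical planner: cut [0, fileSize) into consecutive ranges of at most
// partSize bytes. An empty file yields no parts.
inline std::vector<Part> planParts(int64_t fileSize, int64_t partSize) {
    assert(fileSize >= 0 && partSize > 0);
    std::vector<Part> parts;
    int number = 1;
    for (int64_t offset = 0; offset < fileSize; offset += partSize) {
        int64_t length = std::min(partSize, fileSize - offset);
        parts.push_back({ number++, offset, length });
    }
    return parts;
}
```

 Each planned range can then be uploaded or downloaded, and retried, on its
 own, which is what makes a per-part checksum helper such as
 getObjectRangeMD5 useful.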

* fdbclient/S3Client_cli.actor.cpp
 Add command line to trace rather than output.

* Address Zhe review

* More logging around part upload and download

* Undo assert that proved incorrect; restore the old length math
doing copy in readObject.

Cleanup around TraceEvents in HTTP.actor.

* Undo commented out cleanup -- for debugging

* formatting

---------

Co-authored-by: stack <stack@duboce.com>
2025-02-13 09:06:17 -08:00
michael stack
4d835c542c Have ctests use s3 if it is available.
Fix object integrity check; original approach doesn't work when
serverside encryption is enabled (aws:kms).

* contrib/SimpleOpt/include/SimpleOpt/SimpleOpt.h
 Address sanitizer was complaining about how SimpleOpt manipulates the
 array of options. While memcpy inside a buffer is 'odd', it seems fine.
 It's old code. Leaving it.

* fdbbackup/tests/s3_backup_test.sh
 Pass in weed_dir rather than rely on fixture global (the latter didn't
 work).

* fdbclient/ClientKnobs.cpp
* fdbclient/include/fdbclient/ClientKnobs.h
* fdbclient/include/fdbclient/S3BlobStore.h
 Add a knob to ask for object integrity check on download from s3.
 BLOBSTORE_ENABLE_OBJECT_INTEGRITY_CHECK replaces BLOBSTORE_ENABLE_ETAG_ON_GET
 which doesn't work when serverside encodes content (found in testing).

* fdbclient/S3BlobStore.actor.cpp
 Implement object integrity check on download. If
 enable_object_integrity_check is set, we use sha256 in place of md5
 as our hash. Removed a redundant 'verify' of md5 check.

* fdbclient/S3Client.actor.cpp
 Remove unhelpful comments.

* fdbclient/S3Client_cli.actor.cpp
 Add support for enable_object_integrity_check. This knob replaces
 enable_etag_on_get which didn't work when aws:kms serverside
 encryption was enabled.
 Exit with an error code on exception.

* fdbclient/include/fdbclient/S3Client.actor.h
 Move an include (address a review comment from previous commit).

* fdbclient/tests/aws_fixture.sh
 Add an AWS fixture of utilities that can be shared.

* fdbclient/tests/bulkload_test.sh
 Use imported log_test_result

* fdbclient/tests/s3client_test.sh
 Use s3 if available; otherwise, fall back to seaweedfs.

* fdbclient/tests/seaweedfs_fixture.sh
 WEED_DIR global doesn't work so have caller pass it in for each method
 instead.
2025-01-14 13:13:15 -08:00
stack
c08b39ea21 Formatting 2025-01-13 09:21:54 -08:00
stack
373d1937e4 Address review comments 2025-01-10 09:53:51 -08:00
michael stack
fd239fcb2d Clarifying documentation on blob backup URL and credentials file.
* documentation/sphinx/source/backups.rst
 Minor edit. Add more examples making it clearer how to do S3
 backup URLs in particular. Explain the 'trick' for omitting
 key, secret, and token from the URL and instead picking them up from
 the credentials file.

* fdbclient/S3Client_cli.actor.cpp
 Minor cleanup of usage.
2025-01-10 09:30:24 -08:00
michael stack
4c1e74105e Add checksum checking of downloads. Add cleanup of test data.
* fdbclient/ClientKnobs.cpp
* fdbclient/include/fdbclient/ClientKnobs.h
 Add knob BLOBSTORE_ENABLE_ETAG_ON_GET

* fdbclient/S3BlobStore.actor.cpp
 Optionally check etag (md5) volunteered by s3 against the
 content we have downloaded and fail if not equal (TODO:
 check the checksum after we've saved the content to the
 filesystem -- would require a good bit of refactoring).

* fdbclient/S3Client.actor.cpp
 Add deleteResource support.

* fdbclient/S3Client_cli.actor.cpp
 Add COMMAND support; currently either 'cp' or 'rm'.
 Set the knob blobstore_enable_etag_on_get to true by
 default for s3client.

* fdbclient/tests/s3client_test.sh
 Add clean up of resources written up to s3 at end of test.
 (Awkward in bash)
2025-01-06 13:50:19 -08:00
michael stack
ce7d4af37d Fix usage formatting issue 2024-12-06 11:06:36 -08:00
michael stack
cab2f0d3d0 Remove duplicate code. Move BackupTLSConfig.* from fdbbackup to
fdbclient so can be used in fdbclient. Remove the copies of
BackupTLSConfig we had in place named BlobTLSConfig.*.
Keep the old name though it is a little clunky.
2024-12-05 08:25:25 -08:00
michael stack
b36a715ab8 Format S3Client_cli.actor.cpp 2024-12-04 09:15:31 -08:00
michael stack
597d3451d3 Refactor. Replace S3Cp with S3Client (S3Cp is too limiting of a name). Break out a
.h file of "public" functions. Move the CLI processing and the TLSConfig
to standalone files; the former because of complaints of two main
functions when building simulation with combined S3Client and the latter
for clarity's sake -- one entity per file.

* fdbclient/S3Client.actor.h
 "Interface" of S3Client public methods.

* fdbclient/S3Client.actor.cpp
 Implementation of S3Client.actor.h

* fdbclient/BlobTLSConfig.cpp
* fdbclient/BlobTLSConfig.h
 Move out of S3Cp/S3Client to its own file.

* fdbclient/S3Client_cli.actor.cpp
 CLI for S3Client. Keep it separate because bundling the CLI with
 the S3Client.actor becomes problematic building simulation tests
 (linker complains of duplicated main, etc.)

* fdbclient/include/fdbclient/BlobTLSConfig.h
 BlobTLSConfig (copied from BackupTLSConfig in fdbbackup).

* fdbclient/include/fdbclient/S3Client.actor.h
 "Interface" for an S3 "Client" that runs on top of S3BlobStore.
 Lists copy file and copy directory functions.
2024-12-03 20:23:04 -08:00