Michael Stack e1138c30ee
Make bulkload file reads and writes async and memory parsimonious (#11997)
* fdbclient/S3Client.actor.cpp
 Capitalize trace field names (convention).
 Add duration as a field to traces.

* fdbserver/BulkLoadUtil.actor.cpp
 When the job-manifest is big, processing blocks for so long
 that getBulkLoadJobFileManifestEntryFromJobManifestFile
 fails.

* Make bulkload file reads and writes async and memory parsimonious.
In tests at scale, processing a large job-manifest.txt was blocking
and causing the bulk job to fail. This is part 1 of two patches.
The second will address the data copy this patch adds where we
made methods ACTORs (ACTOR doesn't allow passing by reference).

* fdbserver/BulkDumpUtil.actor.cpp
 Removed writeStringToFile and bulkDumpFileCopy in favor of new methods
 in BulkLoadUtil. Made the hosting functions ACTORs so they can wait on
 async calls.

* fdbserver/BulkLoadUtil.actor.cpp
 Added async read and write functions.

* fdbserver/DataDistribution.actor.cpp
 Making uploadBulkDumpJobManifestFile async is what lets big bulkloads
 work.

* fix memory corruption in writeBulkFileBytes and fix read options in getBulkLoadJobFileManifestEntryFromJobManifestFile

* If a read or write is < 1MB, do it in a single operation; otherwise split it into multiple reads/writes
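
 The size rule above can be sketched as below. This is a minimal,
 hypothetical illustration in plain C++, not the code in
 BulkLoadUtil.actor.cpp: the names planChunks and kSingleIoLimit are
 invented here, and reusing the 1MB limit as the chunk size is an
 assumption. It only computes the (offset, length) pairs that a caller
 would then read or write one at a time.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Assumed threshold: the commit message only says "1MB".
constexpr int64_t kSingleIoLimit = 1 << 20;

// Hypothetical helper: returns (offset, length) pairs covering [0, totalBytes).
// Payloads under chunkBytes get a single entry; larger ones are split into
// chunkBytes-sized pieces, with a shorter final piece if needed.
std::vector<std::pair<int64_t, int64_t>> planChunks(int64_t totalBytes,
                                                    int64_t chunkBytes = kSingleIoLimit) {
    assert(chunkBytes > 0); // a non-positive chunk size would never advance
    std::vector<std::pair<int64_t, int64_t>> chunks;
    if (totalBytes <= 0) {
        return chunks; // nothing to read or write
    }
    if (totalBytes < chunkBytes) {
        chunks.emplace_back(0, totalBytes); // small payload: one operation
        return chunks;
    }
    for (int64_t offset = 0; offset < totalBytes; offset += chunkBytes) {
        chunks.emplace_back(offset, std::min(chunkBytes, totalBytes - offset));
    }
    return chunks;
}
```

 A caller would loop over the returned pairs and issue one async read
 or write per pair, yielding between them so a large job-manifest does
 not block the process.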

* packaging/docker/fdb-aws-s3-credentials-fetcher/fdb-aws-s3-credentials-fetcher.go
 Just be blunt and write out the credentials. Trying to figure out when
 the blob credentials have expired is error-prone.

Co-authored-by: michael stack <stack@duboce.com>
Co-authored-by: Zhe Wang <zhe.wang@wustl.edu>
2025-03-06 10:43:04 -08:00