20826 Commits

Author SHA1 Message Date
Andrew Noyes
83aceb216c
Use absl::GetStackTrace for slow task profiler (#7374)
* Make SlowTask workload runnable in joshua

* Remove SignalSafeUnwind, and use absl::GetStackTrace for slow task profiler
2022-06-15 14:53:52 -07:00
Andrew Noyes
0fea3fb731
Save a bunch of copies in the trace thread (#7392)
Currently, a std::string is copied unnecessarily for every key and value
in a trace event.

This actually showed up in a jemalloc heap profile while I was
investigating something unrelated. I was surprised to see it since these
allocations should have a very short lifetime.
2022-06-15 12:29:15 -07:00
Johannes Scheuermann
c9b4ff3302
Add support for lumberjack in logging and update example to 7.1 (#7357) 2022-06-15 19:41:01 +01:00
Jingyu Zhou
a32127a0b9
Merge pull request #7391 from sbodagala/main
Do not always try to figure out the sequencer locality
2022-06-15 13:50:55 -04:00
Ata E Husain Bohra
9396b691b7
Generate GNU compatible build-id for mockkms golang binary (#7389)
* Generate GNU compatible build-id for mockkms golang binary

Description

 diff-1: Fix compilation issue

Generate GNU compatible build-id for mockkms golang binary
Leverage "cgo" to generate build-id

Testing

Debian package build, verified the GNU build-id
2022-06-15 10:43:46 -07:00
Sreenath Bodagala
2c85bb71c1 - Do not try to figure out the sequencer locality if knob
ENABLE_VERSION_VECTOR_HA_OPTIMIZATION is disabled.
2022-06-15 16:08:31 +00:00
Ata E Husain Bohra
8808d93813
Fix bugs in EncryptKeyProxy actor (#7388)
Description

Major changes include:
1. GetEncryptByKeyIds cache elements can expire.
2. Update iterator after erasing an element during refresh encryption keys
   operation.

Testing

EncryptKeyProxyTest
2022-06-14 21:22:25 -07:00
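The second change in #7388 above (updating the iterator after erasing an element) follows a common C++ pattern. The following is a minimal sketch of that pattern, not the EncryptKeyProxy code itself: `CachedKey`, `dropExpired`, and the `expireAt` field are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical cache entry; the real EncryptKeyProxy types are not shown
// in the commit message.
struct CachedKey {
    int64_t expireAt; // absolute expiry time, arbitrary units
};

// Removes every entry whose expiry time has passed and returns how many
// entries were removed.
std::size_t dropExpired(std::unordered_map<uint64_t, CachedKey>& cache, int64_t now) {
    std::size_t removed = 0;
    for (auto it = cache.begin(); it != cache.end();) {
        if (it->second.expireAt <= now) {
            // erase() invalidates `it` and returns the next valid iterator,
            // so the loop variable must be reassigned here.
            it = cache.erase(it);
            ++removed;
        } else {
            ++it;
        }
    }
    return removed;
}
```

Incrementing the iterator in the `for` header instead would advance an iterator that `erase()` has already invalidated.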
Yao Xiao
7da26db342
[ShardedRocksDB] 4/N Support removeRange. (#7345) 2022-06-14 13:52:03 -07:00
Yi Wu
6246664006
Support encrypting TxnStateStore (#7253)
Adding encryption support for TxnStateStore. It is done by supporting encryption for KeyValueStoreMemory. The encryption is currently done at the operation level, when the operations are being written to the underlying log file. See the inline comment for the encrypted data format.

This PR depends on #7252. It is part of the effort to support TLog encryption #6942.
2022-06-14 13:26:32 -07:00
Xiaoge Su
21ee76a44d fixup! Reformat source #2 2022-06-14 13:22:18 -07:00
Xiaoge Su
c2676df2f8 fixup! Reformat source 2022-06-14 13:22:18 -07:00
Xiaoge Su
9fb6e5bb05 fixup! Fix the clang error when using std::move
This patch is to fix the compile error

/root/src/fdbclient/S3BlobStore.actor.cpp:410:9: error: moving a local
object in a return statement prevents copy elision
[-Werror,-Wpessimizing-move]
 return std::move(resource);
        ^
/root/src/fdbclient/S3BlobStore.actor.cpp:410:9: note: remove std::move
call here
 return std::move(resource);
        ^~~~~~~~~~        ~
1 error generated.
2022-06-14 13:22:18 -07:00
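The clang error quoted in the commit above comes from wrapping a local object in `std::move` inside a `return` statement. A minimal sketch of the corrected shape follows; `Resource` and `makeResource` are hypothetical stand-ins, not the code in S3BlobStore.actor.cpp.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for the object returned in S3BlobStore.actor.cpp.
struct Resource {
    std::string path;
    explicit Resource(std::string p) : path(std::move(p)) {}
};

// Returning the local by name lets the compiler elide the copy (NRVO) or,
// where it cannot, move the object implicitly.
Resource makeResource(const std::string& bucket, const std::string& object) {
    Resource resource("/" + bucket + "/" + object);
    // Writing `return std::move(resource);` here would trigger
    // -Wpessimizing-move, because it prevents copy elision.
    return resource;
}
```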
Vishesh Yadav
fd6f6eb06a
Merge pull request #7364 from sfc-gh-ljoswiak/fixes/unnecessary-transaction-initialization
Remove unnecessary ReadYourWritesTransaction initialization
2022-06-14 11:02:31 -07:00
Xiaoge Su
00b805d8e0 fixup! Reformat source 2022-06-14 10:43:13 -07:00
Xiaoge Su
e493f1c3cd fixup! Add a retry mechanism in changeQuorumChecker and changeQuorum
This fixes an issue that occurs when recovery and a coordinator key change
happen together. The issue occurs when:

1. Recovery starts
2. The coordinator key change transaction starts
3. During the recovery, the coordinator key is read from the cluster file
   and stored in the storage server
4. The cluster controller receives `ChangeCoordiatorsRequest` and
   updates the cluster name with the new value.

At this stage, the values of the coordinator key in the storage server and
in the worker are inconsistent.

5. changeQuorumChecker is called, which verifies this consistency.
   Since the values differ, the call returns failure and the
   caller, which could be a TEST_CASE, fails.

This is a rare race. It was also observed that once the
recovery/coordinator key change process is done, the database is in a
proper state that allows changeQuorumChecker to behave properly. In this
case, a retry mechanism should be sufficient to fix the corresponding test
failures.
2022-06-14 10:43:13 -07:00
Junhyun Shim
ed91ab5d54
Work around flow trace's data race bug (#7237)
* Work around flow trace's data race bug

BaseTraceEvent::setNetworkThread() and flushTraceFile[()|Void()]
have a long-standing race condition on the traceEventThrottlerCache global
when flushTraceFileVoid() is not called from the network thread.

This race dates back to 2017 (commit hash 80e5fecfe2),
so before the race itself is fixed, work around the problem.

* Remove call to flushTraceFileVoid() from MkCertCli

* Apply clang format
2022-06-14 12:09:34 +02:00
Yao Xiao
ddbecb69ad
Ignore ShardedRocksDBTest. (#7381) 2022-06-13 23:13:11 -07:00
Renxuan Wang
839af5701e
Fix bug in resolveTCPEndpoint() when hostname resolving fails. (#7375)
* Close trace file when error happens in runNetwork().

* Improve the bestCount algorithm in getLeader().

In the current implementation, if the nominees are [0,1], the chosen leader will be 1. This is inconsistent with the other cases and with our expectation that, if 2 nominees have the same frequency, the one with the lower id will be the leader.

* Remove unnecessary new statement.

stream will never be a nullptr.

* Move self->dnsCache out of lambda capture.

Member variables are not captured by default, so `host` and `service` are not captured. This somehow compiles successfully, but throws std::bad_alloc or basic_string::_S_create exceptions when we call `host+":"+service` in dnsCache.remove().

* Revert unintended change.

* Address comments.
2022-06-13 20:24:30 -07:00
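The tie-breaking rule stated in #7375 above (equal frequency, lower id wins) can be sketched as follows. This is an illustration of the rule only, not the `getLeader()` implementation; `pickLeader` and its use of plain integer ids are assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Returns the most frequently nominated id; on a tie the lower id wins.
// Precondition: `nominees` is not empty.
uint64_t pickLeader(const std::vector<uint64_t>& nominees) {
    assert(!nominees.empty());
    std::map<uint64_t, int> counts; // iterated in ascending id order
    for (uint64_t id : nominees) {
        ++counts[id];
    }
    uint64_t best = counts.begin()->first;
    int bestCount = 0;
    for (const auto& entry : counts) {
        // Strict '>' means a later (higher) id never replaces an earlier
        // (lower) id that has the same count.
        if (entry.second > bestCount) {
            best = entry.first;
            bestCount = entry.second;
        }
    }
    return best;
}
```

With nominees [0,1], both ids have a count of 1, so this sketch returns 0.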
Hao Fu
9cee4c94e7
Safely remove fdb_transaction_get_range_and_flat_map (#7314) 2022-06-13 19:05:00 -07:00
Trevor Clinkenbeard
6bed046148
Merge pull request #7352 from sfc-gh-xwang/feature/ddtxn
[DD testability enhancement] Create IDDTxnProcessor and simple refactoring
2022-06-13 16:01:13 -07:00
Xiaoxi Wang
ef0f415e3d add option; change to shared_ptr 2022-06-13 13:55:48 -07:00
Xiaoge Su
5a2804e04b
fixup! Fix the XmlTraceLogFormatter (#7322)
* fixup! Fix the XmlTraceLogFormatter

The original escape routine uses a `loop`, but the code is not actually
an ACTOR, so the actorcompiler does not process it. As a result, the XML
fields are not escaped properly.

* fixup! Reformat source
2022-06-13 13:38:17 -07:00
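For reference, escaping XML attribute text with an ordinary C++ loop (no actor compiler involved) can look like the sketch below. `escapeXml` is a hypothetical function, and the set of characters it escapes is the standard five XML entities, which may differ from what XmlTraceLogFormatter handles.

```cpp
#include <cassert>
#include <string>

// Replaces the five characters that have predefined XML entities.
std::string escapeXml(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (char c : in) {
        switch (c) {
        case '&':
            out += "&amp;";
            break;
        case '<':
            out += "&lt;";
            break;
        case '>':
            out += "&gt;";
            break;
        case '"':
            out += "&quot;";
            break;
        case '\'':
            out += "&apos;";
            break;
        default:
            out += c;
        }
    }
    return out;
}
```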
Xiaoxi Wang
ea2edebbeb comment out store tuple 2022-06-13 13:36:19 -07:00
Zhanwei Wang
e632aef1c7
Make backup work with s3 compatible service (#6355)(#6382) (#7324)
1. Support virtual hosting endpoint.

2. An on-premise S3-compatible storage service may use an IP address instead of an S3-style domain name,
especially in development/test environments.

Instead of parsing the service and region from the domain name,

1). Hard code "s3" as the service name in the v4 signature
2). Add a new parameter that allows the region name to be passed in the URL

3. Fix the bucket creation issue on AWS by adding a request body.
2022-06-13 13:33:05 -07:00
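In AWS Signature Version 4, the service name and region appear in the credential scope, which has the form `<date>/<region>/<service>/aws4_request`. The sketch below shows a scope built with a fixed "s3" service name and a caller-supplied region, as item 2 of #7324 describes; `credentialScope` is a hypothetical helper, not the function used in the backup code.

```cpp
#include <cassert>
#include <string>

// Builds a Signature Version 4 credential scope with "s3" hard coded as
// the service name. `date` is expected in YYYYMMDD form.
std::string credentialScope(const std::string& date, const std::string& region) {
    return date + "/" + region + "/s3/aws4_request";
}
```

Because neither value is derived from the host name, the same scope can be produced when the endpoint is a bare IP address.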
Andrew Noyes
013b290ca5
Don't fail test if log cursor times out during network partition (#7330)
* Don't fail test if log cursor times out during network partition

Also, exercise the codepath for handling timed_out in simulation, by
reverting this knob buggification behavior to that of 07976993e7.

* clang-format
2022-06-13 13:28:22 -07:00
Trevor Clinkenbeard
942d687506
Clean up includes in actor header files (#7331)
* Remove unnecessary actorcompiler.h includes (from non-actor files)

* Make AsyncFileChaos a non-actor header file

* Add unactorcompiler.h include to the end of actor header files

* Add missing actorcompiler.h includes to actor header files
2022-06-13 13:26:51 -07:00
Ata E Husain Bohra
a5d91fe18a
KmsConnector implementation to support KMS driven CipherKey TTL (#7334)
* KmsConnector implementation to support KMS driven CipherKey TTL

Description

KMS CipherKeys can be of two types:
1. Revocable CipherKeys: having a finite lifetime, after which the CipherKey
shouldn't be used by the FDB.
2. Non-revocable CipherKeys: ciphers are not revocable, however, FDB would
still want to refresh ciphers to support KMS cipher rotation feature.

The patch proposes the following changes to support the cipher-key types defined
above:
1. Extend the KmsConnector response to include optional 'refreshAfter' & 'expireAfter'
time intervals. The EncryptKeyProxy (EKP) cache would define the corresponding absolute refresh &
expiry timestamps for a given cipherKey. In the event of a transient KMS connectivity outage,
a caller of the EKP API for a non-revocable key should continue using the cached cipherKey until
it expires.
2. Simplify KmsConnector API arena handling by using VectorRef to represent component
structs and manage associated memory allocation/lifetime.

Testing

1. EncryptKeyProxyTest
2. RESTKmsConnectorTest
3. SimKmsConnectorTest

* KmsConnector implementation to support KMS driven CipherKey TTL

Description

  diff-3: Address review comment
  diff-2: Fix Valgrind issues discovered running tests
  diff-1: Set expireTS for baseCipherId indexed cache
2022-06-13 13:25:01 -07:00
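The 'refreshAfter' and 'expireAfter' handling described in #7334 above can be sketched as a cache entry that stores absolute timestamps. Everything here is an assumption made for illustration: the names `CipherKeyEntry`, `makeEntry`, `needsRefresh`, and `isUsable`, the integer time unit, and the convention that a negative interval means "not provided by the KMS".

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical cache entry holding absolute timestamps.
struct CipherKeyEntry {
    int64_t refreshAt; // time after which the key should be re-fetched
    int64_t expireAt;  // time after which the key must not be used
};

// Sentinel for "no deadline".
constexpr int64_t kNever = std::numeric_limits<int64_t>::max();

// Converts the optional relative intervals into absolute timestamps.
// Assumption: a negative interval means the KMS did not supply one.
CipherKeyEntry makeEntry(int64_t now, int64_t refreshAfter, int64_t expireAfter) {
    CipherKeyEntry entry;
    entry.refreshAt = refreshAfter >= 0 ? now + refreshAfter : kNever;
    entry.expireAt = expireAfter >= 0 ? now + expireAfter : kNever;
    return entry;
}

bool needsRefresh(const CipherKeyEntry& entry, int64_t now) {
    return now >= entry.refreshAt;
}

bool isUsable(const CipherKeyEntry& entry, int64_t now) {
    return now < entry.expireAt;
}
```

Under these assumptions, a non-revocable key (no 'expireAfter') that is past its refresh time stays usable, which matches the behavior the description asks for during a transient KMS outage.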
Andrew Noyes
38db712e7a
Make ASAN arena aware (#7336) 2022-06-13 13:24:02 -07:00
Andrew Noyes
207e0bc105
Fix a few places we weren't doing exponential backoff (#7349)
* Fix a few places we weren't doing exponential backoff

We re-create the transaction every iteration of each of these retry
loops, so we need to manage exponential backoff here ourselves.

Closes #7301

* Remove former Backoff definition
2022-06-13 13:18:58 -07:00
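When a retry loop creates a fresh transaction on every iteration, the delay between attempts has to be tracked outside the transaction, as #7349 above notes. The following is a minimal sketch of such a helper; this `Backoff` class, its parameter names, and its default values are assumptions and do not reproduce the definition used in the repository.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper that yields exponentially growing delays, capped
// at a maximum. No jitter is applied in this sketch.
class Backoff {
public:
    explicit Backoff(double initial = 0.01, double maximum = 1.0, double growth = 2.0)
      : next_(initial), max_(maximum), growth_(growth) {}

    // Returns the delay to wait before the next retry, then grows the
    // delay that the following call will return.
    double nextDelay() {
        double current = next_;
        next_ = std::min(next_ * growth_, max_);
        return current;
    }

private:
    double next_;
    double max_;
    double growth_;
};
```

A retry loop would construct one `Backoff` before the loop and wait for `nextDelay()` seconds after each failed attempt, so the state survives even though the transaction object does not.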
Stitch-Zhang
2275127d8c
fix(fdbkubernetesmonitor): unclosed file descriptor (#7356)
Close the additional environment file descriptor as soon as the file has been read completely
2022-06-13 13:16:26 -07:00
Andrew Noyes
2a8d8a1932
Fix more heap overflows (#7360)
From calling strlen on a not necessarily null-terminated buffer.
2022-06-13 13:13:05 -07:00
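Calling `strlen` on a buffer that may lack a terminator reads past the end of the buffer. One bounded alternative, shown here as a sketch rather than as the fix applied in #7360, is to search for the terminator with `std::memchr`; `boundedLength` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Returns the length of the string stored in `buf` without reading past
// buf[capacity - 1]. If no terminator is found, returns `capacity`.
std::size_t boundedLength(const char* buf, std::size_t capacity) {
    const void* terminator = std::memchr(buf, '\0', capacity);
    if (terminator == nullptr) {
        return capacity;
    }
    return static_cast<std::size_t>(static_cast<const char*>(terminator) - buf);
}
```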
Xiaoxi Wang
1de6c09307 use struct instead of tuple 2022-06-13 11:27:50 -07:00
Ray Jenkins
c45abc7c32
Add TRACING_SPAN_ATTRIBUTES_ENABLED Knob, default false. (#7354)
* Add TRACING_SPAN_ATTRIBUTES_ENABLED Knob, default false.

To prevent accidental leakage of PII to external tracing collector services,
we've added a knob that prevents additional attributes from being added to spans unless
explicitly enabled by the user.

* Enable span attributes knob for unit tests.
2022-06-13 11:37:09 -05:00
Xiaoxi Wang
c12a7a30ed
Update fdbserver/DataDistributionQueue.actor.cpp
Co-authored-by: Trevor Clinkenbeard <trevor.clinkenbeard@snowflake.com>
2022-06-13 08:22:48 -07:00
Xiaoxi Wang
9604db3f10
Update fdbserver/DDTxnProcessor.h
Co-authored-by: Trevor Clinkenbeard <trevor.clinkenbeard@snowflake.com>
2022-06-13 08:19:14 -07:00
Lukas Joswiak
3b3ef49d40 Remove unnecessary transaction initialization
`ReadYourWritesTransaction` has memory allocated before being passed to
the main thread. This allows both threads to continue to access the
transaction object. Currently, the transaction gets allocated and
initialized on the foreign thread, and then re-initialized on the main
thread. This causes a bunch of extra, unnecessary work for each
`ReadYourWritesTransaction` where the temporary object gets destructed.

The fix is to only allocate memory for the `ReadYourWritesTransaction`
on the foreign thread, and then initialize it once on the main thread.
2022-06-10 16:53:19 -07:00
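The fix described in the commit above separates allocation from initialization. In standard C++ this can be expressed with raw storage and placement new, as in the sketch below; `Txn`, `allocateTxn`, `initTxn`, and `destroyTxn` are hypothetical names, and the threading itself is left out.

```cpp
#include <cassert>
#include <new>
#include <string>
#include <utility>

// Hypothetical stand-in for ReadYourWritesTransaction.
struct Txn {
    std::string db;
    explicit Txn(std::string name) : db(std::move(name)) {}
};

// Step 1 (foreign thread in the commit's description): reserve raw
// storage only. No constructor runs here.
void* allocateTxn() {
    return ::operator new(sizeof(Txn));
}

// Step 2 (main thread): construct the object exactly once in that storage.
Txn* initTxn(void* storage, const std::string& name) {
    return new (storage) Txn(name);
}

// Teardown: run the destructor, then release the raw storage.
void destroyTxn(Txn* txn) {
    txn->~Txn();
    ::operator delete(txn);
}
```

Both threads can hold the address returned by `allocateTxn()` from the start, while construction happens only once, which avoids building and destroying a temporary object.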
Jingyu Zhou
d0c5449d5c Add 7.1.8 and 7.1.9 release notes 2022-06-10 15:09:10 -07:00
Andrew Noyes
849b1cd29a
Update to the latest jemalloc release (#7362)
Also remove our patch, since the fix is already present in the new
release.
2022-06-10 14:46:21 -07:00
Steve Atherton
90bb3a7f8c
Merge pull request #7341 from sfc-gh-satherton/net2-react-perf-fix
Performance bug fix: reactor.react() is called too often.
2022-06-09 17:55:07 -07:00
Xiaoxi Wang
fb66561bc4 format code 2022-06-09 14:43:09 -07:00
Xiaoxi Wang
7ee6808ebd solve compiler warning 2022-06-09 14:32:24 -07:00
Xiaoxi Wang
b99bd45730 format code 2022-06-09 12:36:20 -07:00
Xiaoxi Wang
e5aa5fef22 merge upstream/main 2022-06-09 12:17:27 -07:00
Xiaoxi Wang
6ab12ea971 add storeTuple and unit test; refactor getSourceServersForRange 2022-06-09 12:16:12 -07:00
Yao Xiao
0bb02f6415
[Sharded RocksDB] 3/N Implement functions for range clear. (#7310) 2022-06-09 10:50:39 -07:00
Junhyun Shim
631a59a65e
Merge pull request #7299 from sfc-gh-mdvorsky/mdvorsky/remove_tester_api_wrapper
Remove TesterApiWrapper, replace its uses with fdb_api.hpp
2022-06-09 10:42:05 +02:00
Jingyu Zhou
7acd184a38
Merge pull request #7339 from jzhou77/fix-status-memory
Add rss_bytes to process memory and fix available_bytes calculation
2022-06-08 13:10:51 -07:00
Robert Barabas
8606923da2
Arm64 related build fixes (#7319)
* Add missing include

* Fix open call on arm64

* Bump up doctest to 2.4.8
2022-06-08 11:20:27 -07:00
Jingyu Zhou
b9ff6bc129 Address AJ's comments 2022-06-08 09:38:32 -07:00
Andrew Noyes
07f49392ac
Avoid using structured bindings in doctest assertions (#7335)
* Avoid using structured bindings in doctest assertions

clang doesn't allow this with the latest releases of doctest

This will unblock #7319

* Add constructor to MappedKV
2022-06-08 09:36:18 -07:00