5728 Commits

Author SHA1 Message Date
A.J. Beamon
4fd64630e8 Convert literal string ref instances to use _sr suffix 2022-09-19 11:35:58 -07:00
Andrew Noyes
2bdfc52f97
Fix heap use after free (#8189)
Previously, we had Ref types outliving the arenas that owned them,
specifically encryptDomains in the getResolution actor. Refactor to use
Standalones, which both fixes the memory error and makes this easier to
reason about.

Also fix a potential ODR violation.
2022-09-16 13:46:05 -07:00
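The lifetime problem described in 2bdfc52f97 can be illustrated with a minimal sketch. The types below (`ToyArena`, `ToyRef`, `ToyStandalone`) are toy stand-ins invented for this example and are not FoundationDB's `Arena`, `Ref`, or `Standalone` implementations; they only show why bundling a non-owning reference with a handle to its backing memory avoids a dangling pointer.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for an arena: shared ownership of a pool of allocations.
struct ToyArena {
    std::shared_ptr<std::vector<std::unique_ptr<std::string>>> blocks =
        std::make_shared<std::vector<std::unique_ptr<std::string>>>();

    // Stores a copy of `s` in the pool and returns a pointer into it.
    const std::string* store(std::string s) {
        blocks->push_back(std::make_unique<std::string>(std::move(s)));
        return blocks->back().get();
    }
};

// Toy non-owning view: valid only while some ToyArena handle keeps the pool alive.
struct ToyRef {
    const std::string* data = nullptr;
};

// Toy "standalone": the view travels together with a handle to its arena,
// so the memory cannot be released while the value is still in use.
struct ToyStandalone {
    ToyArena arena;
    ToyRef ref;
};

// Hypothetical helper: builds a value that is safe to return from a function.
ToyStandalone makeDomainName(const std::string& name) {
    ToyStandalone s;
    s.ref.data = s.arena.store(name);
    return s;
}
```

Returning a bare `ToyRef` from `makeDomainName` would leave the caller with a pointer into memory that is released when the local arena handle goes out of scope; returning the `ToyStandalone` keeps the pool alive for as long as any copy exists.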
Trevor Clinkenbeard
7b4598a53d
Merge pull request #8197 from sfc-gh-jslocum/bg_code_coverage_cleanup
cleaning tss and blob granule file code probes
2022-09-16 09:32:20 -07:00
sfc-gh-ngoyal
1bd97fe628
Recruit new singleton for consistency checker. (#5804)
* Recruit new singleton for consistency checker.

* Recruit the consistency checker only if enabled.

* Add a yield in monitorConsistencyChecker().

* Minor fixes.

* Consistency check workload enhancements.

* Minor fixes and clarifications.

* clang format

* Clang format.

* Minor fixes, cleanup, debug tracing.

* Misc.

* Move the consistency scan information from dbconfig to a key backed object.

* Move consistency scan config out of db config to a state object and feature rename.

* ConsistencyCheck workload refactor.

* devFormat

* Update fdbcli/ConsistencyScanCommand.actor.cpp

* Review Comments.

Co-authored-by: negoyal <neelam.goyal@gmail.com>
Co-authored-by: Ata E Husain Bohra <ata.husain@snowflake.com>
2022-09-16 09:03:06 -07:00
Josh Slocum
4ead9a697f cleaning tss and blob granule file code probes 2022-09-16 09:51:33 -05:00
Josh Slocum
17c855be7e
Merge pull request #8196 from sfc-gh-jslocum/cf_metrics_fix
not counting end of stream as an error for change feeds
2022-09-16 09:05:25 -05:00
Josh Slocum
977c03ff78 not counting end of stream as an error for change feeds 2022-09-15 17:37:52 -05:00
Jingyu Zhou
a4f9ef8aaa Make a new memory arena for Tuple::clear
To avoid the potential problem of invalidating contents previously returned from
pack() calls.
2022-09-15 12:49:11 -07:00
Hui Liu
b19f1b5e3b
Merge pull request #8109 from sfc-gh-huliu/bmr
Add blob manifest for full restore
2022-09-15 11:05:41 -07:00
Trevor Clinkenbeard
0ae5321b52
Merge pull request #8186 from sfc-gh-tclinkenbeard/make-gsimulator-ptr
Make `g_simulator` a pointer
2022-09-15 10:29:22 -07:00
Hui Liu
59be25848f bootstrap blob manager and blob worker from blob manifest 2022-09-15 09:50:12 -07:00
Trevor Clinkenbeard
6c211be57c
Merge pull request #8182 from sfc-gh-nwijetunga/nim/remove-code-probes
Remove FileBackupAgent Code Probes
2022-09-15 09:07:47 -07:00
sfc-gh-tclinkenbeard
82adc1e856 Make g_simulator a pointer 2022-09-15 09:00:33 -07:00
Lukas Joswiak
54a12483e4 Add build date and time to build_flags
Also fixes the Boost lib version.
2022-09-14 15:11:51 -07:00
Ata E Husain Bohra
b540a3d6b9
Disable zlib find_package, effectively disable gzip compression (#8179)
Description

find_package was used to find and link the `zlib` library needed to enable
the boost::gzip compression filter. However, this dynamically links the
zlib shared object into generated binaries (fdbserver, for instance).

For now, disable the ZLIB find code, which effectively disables GZIP compression
support.

Testing
2022-09-14 14:03:13 -07:00
Josh Slocum
d4ba6c266c
Merge pull request #8176 from sfc-gh-jslocum/ss_cf_burst_fix_main
Fixing Thundering Herd problem of change feed stream retries in SS
2022-09-14 16:01:20 -05:00
Andrew Noyes
0afa24bb3f
Fix undefined behavior when retries is too large (#8180)
fdbclient/PaxosConfigTransaction.actor.cpp:221:77: runtime error: shift exponent 32 is too large for 32-bit type 'int'

I confirmed that 1 << 30 is not UB
2022-09-14 11:46:15 -07:00
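The undefined behavior reported in 0afa24bb3f comes from shifting a 32-bit `int` by 32 or more bits. A minimal sketch of one way to guard against it is to clamp the exponent before shifting. The helper name `backoffMultiplier` and the limit of 30 are assumptions chosen for illustration (the commit message only confirms that `1 << 30` is well defined); this is not the code from `PaxosConfigTransaction.actor.cpp`.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper, not taken from the FoundationDB source.
// Returns 2^retries, with the exponent clamped to [0, 30] so that the shift
// is always well defined for a 32-bit signed int.
inline int backoffMultiplier(int retries) {
    constexpr int kMaxShift = 30; // 1 << 30 still fits in a signed 32-bit int
    const int exponent = std::min(std::max(retries, 0), kMaxShift);
    return 1 << exponent;
}
```

With this clamp, a retry count of 32 (the value from the sanitizer report) yields the same multiplier as a retry count of 30 instead of triggering undefined behavior.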
Ata E Husain Bohra
d2b82d2c46
Introduce "default encryption domain" (#8139)
* Introduce "default encryption domain"

Description

In the current FDB native data-at-rest encryption implementation,
an entity being encrypted (mutation, KV and/or file) is categorized
into one of the following encryption domains:
1. Tenant domain, where, Encryption domain == Tenant boundaries
2. FDB system keyspace - FDB metadata encryption domain
3. FDB Encryption Header domain - used to generate digest for
plaintext EncryptionHeader.

The scheme doesn't support encryption if an entity can't be categorized
into any of the above encryption domains; for instance, non-tenant
mutations are NOT supported.

This patch extends encryption support to mutations for which the corresponding
Tenant information can't be obtained (key length shorter than TenantPrefix)
and/or mutations that do not belong to any valid Tenant
(FDB management cluster data) by mapping such mutations to a
"default encryption domain".

TODO

The CommitProxy-driven TLog encryption implementation requires every transaction
mutation to contain 1 KV, not crossing Tenant boundaries. The only exception to
this rule is ClearRange mutations. For now, ClearRange mutations are mapped
to the 'default encryption domain'; a subsequent patch will propose appropriate
handling for ClearRange mutations.

Testing

devRunCorrectness - 100k
2022-09-14 10:58:32 -07:00
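The fallback rule described in d2b82d2c46 can be sketched as a small lookup function. Everything named here is an assumption made for illustration: the sentinel `kDefaultDomainId`, the 8-byte big-endian prefix layout, and the function `domainForKey` are hypothetical and are not taken from the patch. The sketch covers only the tenant and default domains and leaves out the system keyspace and encryption header domains.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>

// Illustrative assumptions, not values from the FoundationDB source.
constexpr int64_t kDefaultDomainId = -1;    // hypothetical "default encryption domain" id
constexpr std::size_t kTenantPrefixLen = 8; // assumed tenant prefix length in bytes

// Maps a key to an encryption domain id. Keys shorter than the tenant prefix,
// and keys whose prefix does not name a known tenant, fall back to the default domain.
inline int64_t domainForKey(const std::string& key, const std::set<int64_t>& knownTenants) {
    if (key.size() < kTenantPrefixLen) {
        return kDefaultDomainId; // tenant information can't be obtained
    }
    // Decode the assumed big-endian tenant id from the first 8 bytes.
    uint64_t raw = 0;
    for (std::size_t i = 0; i < kTenantPrefixLen; ++i) {
        raw = (raw << 8) | static_cast<unsigned char>(key[i]);
    }
    const int64_t tenantId = static_cast<int64_t>(raw);
    return knownTenants.count(tenantId) ? tenantId : kDefaultDomainId;
}
```

Under the same assumptions, a ClearRange mutation would simply be assigned `kDefaultDomainId` without consulting its keys, matching the interim handling described in the TODO.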
Nim Wijetunga
91fb7c72c8 remove code probes 2022-09-14 10:40:35 -07:00
Jingyu Zhou
e70a18e638
Merge pull request #8122 from xumengpanda/mengxu/io-timeout-main
Add STORAGE_SERVER_REBOOT_ON_IO_TIMEOUT knob to reboot SS on io_timeout
2022-09-14 10:11:46 -07:00
Josh Slocum
3e5e49b635 Operational improvements to limit thundering herd effect of many change feed queries being retried simultaneously 2022-09-14 09:57:21 -05:00
Josh Slocum
e5eabbf3df Additional observability for change feeds 2022-09-14 09:55:15 -05:00
Meng Xu
9e9efb69a0 Format code to repo style 2022-09-13 16:59:45 -07:00
Lukas Joswiak
8c50f98c00 Update type of coordinators hash
This fixes some serialization issues due to `BinaryReader` not being
able to deserialize values of type `size_t`.
2022-09-13 16:53:54 -07:00
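The commit above (8c50f98c00) does not say which type replaced `size_t`. As a sketch only, one common way to make a hash value serializable is to store it in a fixed-width integer and write it with an explicit byte order. The helpers `writeU64` and `readU64` are hypothetical and unrelated to FoundationDB's `BinaryWriter`/`BinaryReader`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: appends `value` to `out` as 8 little-endian bytes.
inline void writeU64(std::vector<uint8_t>& out, uint64_t value) {
    for (int i = 0; i < 8; ++i) {
        out.push_back(static_cast<uint8_t>(value >> (8 * i)));
    }
}

// Hypothetical helper: reads 8 little-endian bytes starting at `offset`.
// Throws std::out_of_range (via at()) if the buffer is too short.
inline uint64_t readU64(const std::vector<uint8_t>& in, std::size_t offset) {
    uint64_t value = 0;
    for (int i = 0; i < 8; ++i) {
        value |= static_cast<uint64_t>(in.at(offset + i)) << (8 * i);
    }
    return value;
}
```

Because `uint64_t` has the same width on every platform, the encoded form does not depend on how wide `size_t` happens to be on the machine that wrote it.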
Lukas Joswiak
424bb87f3e Apply feedback 2022-09-13 16:53:54 -07:00
Lukas Joswiak
09892df0b0 Remove unused knob 2022-09-13 16:53:54 -07:00
Lukas Joswiak
7ee6be9238 Simplify how ConfigBroadcastInterface is stored on worker 2022-09-13 16:53:54 -07:00
Lukas Joswiak
2fe3fc5379 Fix issue with pointer dereference after actor cancellation 2022-09-13 16:53:54 -07:00
Lukas Joswiak
b2d395a304 Delay cluster controller restart when pushing knob updates to workers
This gives the `ConfigBroadcaster` time to send the knob change to all
workers before applying the change to itself and restarting.
2022-09-13 16:53:54 -07:00
Lukas Joswiak
8d237ba493 Fix various correctness and timeout issues
Contains the following fixes:

* When handling the special case rollforward where nodes can be rolled
  forward even if a majority are at version 0, we don't want to reset
  the live version of the node being rolled forward. This is because a
  quorum of nodes at version 0 can continue handing out and incrementing
  their live version, and if they are rolled forward there is the
  potential for them to go back in time in regard to their live version.
  So in this one special case, they should maintain their existing live
  version.
* Fixes some unseed issues due to fields not being initialized properly.
* Temporarily disables a coordinator restart in the recovery path (in
  the coordinated state) due to it causing a timeout. This needs more
  investigation in the future.
2022-09-13 16:53:54 -07:00
Lukas Joswiak
249ff2b2fd Fix configuration database unit tests 2022-09-13 16:53:54 -07:00
Lukas Joswiak
1a33515934 Add --no-config-db option to fdbcli coordinators command
Specifying the `--no-config-db` option when changing coordinators
through fdbcli will prevent the command from hanging when the
configuration database is not active. Failing to specify this option
when the configuration database is not active will not affect the
correctness of the command, but it will hang instead of returning.
2022-09-13 16:53:54 -07:00
Lukas Joswiak
74ac617a34 Add support for changing coordinators to the configuration database
Configuration database data lives on the coordinators. When a change
coordinators command is issued, the data must be sent to the new
coordinators to keep the database consistent.
2022-09-13 16:53:54 -07:00
sfc-gh-tclinkenbeard
2bea5b88bf Add /Atomic/DoAppendIfFits unit test 2022-09-13 11:35:39 -07:00
A.J. Beamon
87ee0a2963
Merge pull request #8160 from sfc-gh-ajbeamon/remove-unnecessary-option
Remove unnecessary special key-space relaxed option in binding tenant management
2022-09-12 17:06:04 -07:00
A.J. Beamon
0c91336461 Remove unnecessary special key-space relaxed option in binding tenant management 2022-09-12 14:30:28 -07:00
Markus Pilman
59ce49913a
Merge pull request #8146 from sfc-gh-tclinkenbeard/improve-code-coverage
Increase the number of unit tests run in `RandomUnitTests.toml`
2022-09-12 15:10:47 -06:00
Trevor Clinkenbeard
1582e79cc2
Merge pull request #8153 from sfc-gh-tclinkenbeard/remove-hostname-resolution-codeprobe
Remove code probe for "Coordinator hostname resolving failure"
2022-09-12 14:01:52 -07:00
sfc-gh-tclinkenbeard
7a2554c73e Remove code probe for "Coordinator hostname resolving failure"
This is not intended to be hit in simulation, and we have visibility
into these events with the MonitorProxiesConnectFailed trace event.
2022-09-12 10:09:08 -07:00
Jingyu Zhou
9429ca992c
Merge pull request #8066 from sfc-gh-tclinkenbeard/fix-includes
Use full header file paths in includes
2022-09-12 09:43:24 -07:00
sfc-gh-tclinkenbeard
39c6989673 Remove some simulation unit tests.
These tests should not be run in simulation, because they either run too
long or break determinism.
2022-09-11 00:36:18 -07:00
Vaidas Gasiunas
81fff640bd
Testing with invalid cluster files, fixing update from changed cluster file (#8126)
* ApiTester: test with invalid cluster files

* More asserts in monitorProxies

* ApiTester: Test tampering the cluster file

* Fix update of connection string from the cluster file to use the new connection string only if it is valid

* ApiTester: add linker dependency on std++fs

* upgrade_test: no-cleanup-on-error option

* ApiTester: use atomic operations to change and access the transaction handle
2022-09-10 09:23:00 +02:00
Yi Wu
d831c87d14
Add encryption metrics (#8070)
Adding the following metrics:
* BlobCipherKeyCache hit/miss
* EKP: KMS requests latencies
* Each component that uses encryption now needs to pass a UsageType enum to the encryption helper methods (GetEncryptCipherKeys/GetLatestEncryptCipherKey/encrypt/decrypt); those methods log get-cipher-key latency samples and encryption/decryption CPU times accordingly.
2022-09-09 18:43:09 -07:00
Russell Sears
6546e3c458
Merge pull request #8138 from sears/header_macro_cleanup
Fix duplicate (and broken) header macro FDBCLIENT_EVENTTYPES[S]_ACTOR…
2022-09-09 16:31:16 -07:00
Jon Fu
e205147104
Merge pull request #7993 from sfc-gh-jfu/jfu-tenant-special-key-space
Explicit tenant support in special key space
2022-09-09 13:33:43 -07:00
Russell Sears
73f65c3192 Fix duplicate (and broken) header macro FDBCLIENT_EVENTTYPES[S]_ACTOR[_G]_H 2022-09-09 19:14:30 +00:00
Jon Fu
006fc9a0ef code formatting 2022-09-09 11:58:36 -07:00
Dennis Zhou
802893d02b FutureBool: fix bool -> fdb_bool_t
Honestly, this might not be an issue, but it's nice to be consistent
with the conversions across c++ -> c for bool type.
2022-09-09 11:29:52 -07:00
Dennis Zhou
28ac29476a blob: fix unblobbify calling blobbify 2022-09-09 11:29:52 -07:00
Jon Fu
75a096a5e5 Merge branch 'main' of github.com:apple/foundationdb into jfu-tenant-special-key-space 2022-09-09 10:12:19 -07:00