12308 Commits

Author SHA1 Message Date
Steve Atherton
216d0be2cf
Add processID, networkAddress, and locality to layer status JSON for Backup Agents. (#9736)
* Add processID, networkAddress, and locality to layer status JSON for Backup Agents.

* Backup/DR agent determines the network address to report in Layer Status only once, when the status updater loop begins, since doing so is a blocking call that connects to the cluster. Also includes lots of code cleanup.
2023-03-17 18:07:03 -07:00
A.J. Beamon
fe5d0928f3 Remove doEmptyCommit function 2023-03-17 12:58:41 -07:00
A.J. Beamon
dc2bd78aa7 The consistency check should retry if it couldn't find all the commit proxies when getting key server locations 2023-03-17 12:00:47 -07:00
Ata E Husain Bohra
c492f83bf4
EaR: Avoid appending tls to the URL (#9734)
Description

Patch proposes two changes:

1. Avoid appending tls as part of URI for secure connections
2. RefreshEKs recurring task can be skipped if there are no keys to be refreshed

Testing

EncryptionOps.toml
EncryptKeyProxyTest.toml
devRunCorrectness 
devRunCorrectnessFiltered 'Encrypt*'
2023-03-16 22:52:51 -07:00
He Liu
0f5e75b34b
Added newDataMoveId(). (#9647)
* Added newDataMoveId().

* Added `ENABLE_DD_PHYSICAL_SHARD_MOVE`

* fmt.

* Replace `teamId` with `shardId`.
2023-03-16 18:06:06 -07:00
A.J. Beamon
aeaedb147f
Merge pull request #9727 from sfc-gh-ajbeamon/fix-shared-remote-region-kills
Avoid killing too many machines if one region is being shared between the remote primary and a satellite
2023-03-16 17:46:12 -07:00
Josh Slocum
3c1ac344f1
buggify blob granule compression per-file (#9670) 2023-03-16 17:46:18 -05:00
A.J. Beamon
6818ce950c Fix check that excludes satellites from consideration to consider satelliteTLogReplicationFactor and satelliteTLogUsableDcs. Update trace event with more info about the updated policy. 2023-03-16 14:27:30 -07:00
Steve Atherton
5c795c3abe Rewrite corrupt block number calculations to be more clear. 2023-03-16 13:02:15 -07:00
Markus Pilman
df5b15e56c
Merge pull request #9634 from sfc-gh-mpilman/features/negative-simulation
Framework to write negative tests
2023-03-16 12:47:02 -07:00
A.J. Beamon
735327f1cf
Merge pull request #9718 from sfc-gh-ajbeamon/decrease-duration-of-automatic-idempotency-workload
Decrease number of transactions in automatic idempotency workload
2023-03-16 12:31:24 -07:00
A.J. Beamon
75b8148e91 If one region is being shared between the remote primary and a satellite, the simulator could kill too many machines 2023-03-16 11:24:12 -07:00
A.J. Beamon
f8255fe7a1
Merge pull request #9724 from sfc-gh-ajbeamon/fix-disk-corruption-check
Fix possible off-by-one in the simulation upper bound check for page corruption
2023-03-16 11:05:25 -07:00
Josh Slocum
c7c41bc9db
adding implementation and check for blob worker exclusion (#9700) 2023-03-16 12:09:43 -05:00
A.J. Beamon
99f75a9bb1 Fix possible off-by-one in the simulation upper bound check for page corruption 2023-03-16 09:31:11 -07:00
Jingyu Zhou
adda32db46
Merge pull request #9691 from sfc-gh-dadkins/sfc-gh-dadkins/commit-proxy-unavailable
Replace 10-second delay with explicit wait for cluster recovery in checkExtraDataStores
2023-03-16 09:21:59 -07:00
A.J. Beamon
4b8311d932 The automatic idempotency workload has a long runtime and can occasionally log too many events, etc. This decreases the number of transactions it runs significantly to avoid that issue. 2023-03-15 18:45:26 -07:00
A.J. Beamon
436a187171 Merge branch 'main' into fix-storage-quota-enables-tenant-aware-dd 2023-03-15 17:59:01 -07:00
A.J. Beamon
a6202253a4
When a storage server fails to register (e.g. due to worker_removed), we need to throw that error to terminate the SS. (#9712) 2023-03-15 17:46:21 -07:00
A.J. Beamon
3f9d51db4e The DD_TENANT_AWARENESS_ENABLED knob was indirectly disabling the feature by not initializing a dd tenant cache, but this could be bypassed by enabling storage quotas. This makes the knob more explicitly control the feature. 2023-03-15 15:56:24 -07:00
Josh Slocum
b4eb665f1d
fixing copy constructor error and adding test for it (#9711) 2023-03-15 15:33:16 -07:00
Ata E Husain Bohra
dbcab0b1bd
Revert "Refactor GetEncryptCipherKeys (#9600)" (#9708)
This reverts commit 2702665e353005ab9ace4cabb2191e2bb5748bea.
2023-03-15 12:10:08 -07:00
Markus Pilman
303b833d7b Adding data corruption test to verify consistency check 2023-03-15 11:22:25 -07:00
Markus Pilman
79447c6e06 First successful negative run 2023-03-15 11:22:25 -07:00
Markus Pilman
3894d5069e fix compiler error 2023-03-15 11:22:25 -07:00
Markus Pilman
7a108a2768 Add framework for writing negative simulation tests 2023-03-15 11:22:25 -07:00
Markus Pilman
aa09baadab
Merge pull request #9635 from sfc-gh-etschannen/fix-consistency-check
Fix: the consistency check did not properly report failed tests
2023-03-15 11:21:44 -07:00
Evan Tschannen
6c1d02a14f
Merge pull request #9703 from sfc-gh-jslocum/bg_file_logical_size
adding blob granule logical size
2023-03-15 09:59:57 -07:00
Evan Tschannen
2f96627d43 merge in main 2023-03-15 09:26:22 -07:00
Jingyu Zhou
bc380c9a5d
Merge pull request #9699 from sfc-gh-xwang/fix/main/tcTest
fix unit test failure because of implicit uint16_t conversion to int
2023-03-15 09:18:10 -07:00
Evan Tschannen
0a8435b742
Merge pull request #9702 from sfc-gh-jslocum/dbg_bg_ctest_timeout
fixing 2 bugs related to high delta file waitCommitted latency
2023-03-15 08:52:35 -07:00
Josh Slocum
a5b4212990 adding blob granule logical size 2023-03-15 08:54:49 -05:00
Josh Slocum
52c0dc56cc fixing 2 bugs related to high delta file waitCommitted latency 2023-03-15 08:39:42 -05:00
Josh Slocum
03818e94f3
add exclusion tracker utility and use it in DD (#9669) 2023-03-15 08:21:28 -05:00
Xiaoxi Wang
213263b5d2 fix unit test failure because of implicit uint16_t conversion to int 2023-03-14 22:23:20 -07:00
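The commit above does not show the failing test itself. As a generic illustration of the kind of surprise an implicit `uint16_t` to `int` conversion can cause, here is a minimal sketch. The function names `sumPromoted` and `sumWrapped` are hypothetical and are not taken from the FoundationDB source. It assumes a platform where `int` is wider than 16 bits.

```cpp
#include <cstdint>
#include <type_traits>

// On platforms where int is wider than 16 bits, both uint16_t operands are
// promoted to int before the addition, so the result type is int.
static_assert(std::is_same<decltype(std::uint16_t{} + std::uint16_t{}), int>::value,
              "uint16_t + uint16_t is expected to have type int here");

// Hypothetical helper: returns the promoted sum, which does not wrap at 65535.
inline int sumPromoted(std::uint16_t a, std::uint16_t b) {
    return a + b;
}

// Hypothetical helper: explicitly narrows the sum back to 16 bits (modulo 2^16).
inline std::uint16_t sumWrapped(std::uint16_t a, std::uint16_t b) {
    return static_cast<std::uint16_t>(a + b);
}
```

A comparison written against one of these results while expecting the other is a plausible way for a unit test to fail, which is why an explicit cast is often the safer choice.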
Evan Tschannen
c435e8336a no message 2023-03-14 16:40:50 -07:00
He Liu
a0a3f4bff3
Fetch byte sample file (#9657) 2023-03-14 16:24:08 -07:00
Dan Adkins
6c796fa0d1 Get read version after setting transaction options. 2023-03-14 15:55:10 -07:00
Dan Adkins
4757545396 Replace 10-second delay with explicit wait for cluster recovery in checkExtraDataStores.
CheckExtraDataStores reboots or kills storage servers with extra data stores.
Since this occurs during a consistency check, the expectation is that the database
is quiet and not in the midst of recovery. This was done with a 10-second delay,
but it's possible during simulation tests that it takes longer than 10 seconds
to recruit a new master, so this assumption is invalid and can cause a test failure
when the consistency checks proceed.

Instead of a delay, we run an empty transaction through the system and explicitly
wait for the cluster to return to a fully-recovered state.
2023-03-14 12:46:13 -07:00
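The idea in the commit message above (wait for an observed condition instead of sleeping for a fixed time) can be sketched in plain C++. This is not the Flow actor code from the change: `waitForCondition` and its parameters are hypothetical names chosen for illustration, and the predicate stands in for whatever "cluster is fully recovered" check the real code performs.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper, not part of FoundationDB: polls `isRecovered` until it
// returns true or `timeout` elapses. Returns true if the condition was observed.
inline bool waitForCondition(const std::function<bool()>& isRecovered,
                             std::chrono::milliseconds timeout,
                             std::chrono::milliseconds pollInterval) {
    assert(pollInterval.count() > 0); // a zero interval would busy-spin
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (true) {
        if (isRecovered()) {
            return true; // proceed as soon as the condition holds
        }
        if (std::chrono::steady_clock::now() >= deadline) {
            return false; // give up instead of assuming recovery finished
        }
        std::this_thread::sleep_for(pollInterval);
    }
}
```

Unlike a fixed delay, the caller learns whether the condition actually held, so a slow recovery produces either a longer wait or an explicit failure rather than a check that runs against a cluster still recovering.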
Yanqin Jin
37b0b0852c Merge remote-tracking branch 'origin/main' into deflake-test-1 2023-03-14 09:12:01 -07:00
Hui Liu
499a4cab93
Add correctness test for point-in-time restore (#9185) 2023-03-14 08:56:34 -07:00
A.J. Beamon
d39cda610a Merge branch 'main' into metacluster-improvements
# Conflicts:
#	fdbcli/TenantCommands.actor.cpp
2023-03-13 15:58:39 -07:00
A.J. Beamon
45056370b8 Merge branch 'main' into metacluster-improvements 2023-03-13 13:14:09 -07:00
A.J. Beamon
18cf523f49
Merge pull request #9660 from sfc-gh-ajbeamon/tenant-id-restore-safety
Disallow repopulating a management cluster from a data cluster with matching tenant ID prefix
2023-03-13 13:12:30 -07:00
Ata E Husain Bohra
ea796eb3ec
EaR: REST kms misc fixes (#9664)
* EaR: REST kms misc fixes

Description

Patch addresses the following issues:
1. Fix the "return connection" routine; this fixes a regression introduced by
an earlier fix.
2. Update RESTConnectionPool::connectionPoolMap to an "unordered_map"
for O(1) lookups
3. Improve logging
4. Make RESTUrl parsing handle extra '/' for 'resource'

Testing

Standalone fdbserver connecting to external KMS and database create
2023-03-13 13:11:05 -07:00
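Item 2 of the commit above mentions keying the connection pool by an `unordered_map` for O(1) lookups. The following is a toy sketch of that pattern only. `ToyConnection`, `ToyConnectionPool`, and the string key are assumptions made for illustration and do not reflect the actual `RESTConnectionPool` interface. Lookups are O(1) on average, not in the worst case.

```cpp
#include <queue>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a pooled connection.
struct ToyConnection {
    int id;
};

// Hypothetical pool keyed by an endpoint string such as "host:service".
class ToyConnectionPool {
public:
    // Hands a connection back to the pool for later reuse.
    void returnConnection(const std::string& key, ToyConnection conn) {
        pool_[key].push(conn);
    }

    // Takes the oldest pooled connection for `key`, if any.
    // Returns false when no connection is available.
    bool tryTake(const std::string& key, ToyConnection& out) {
        auto it = pool_.find(key); // average-case O(1) hash lookup
        if (it == pool_.end() || it->second.empty()) {
            return false;
        }
        out = it->second.front();
        it->second.pop();
        return true;
    }

private:
    std::unordered_map<std::string, std::queue<ToyConnection>> pool_;
};
```

An ordered `std::map` would also work but costs O(log n) per lookup; the hash map is preferable when iteration order over endpoints does not matter.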
Josh Slocum
4a0ceca75e swallowing errors in redwood dispose 2023-03-10 17:49:56 -06:00
A.J. Beamon
cbc330697c Disallow repopulating a management cluster from a data cluster with matching tenant ID prefix unless forced. Remember the largest used tenant ID on the data cluster and use it to update the management cluster tenant ID when force repopulating the same ID. 2023-03-10 15:36:37 -08:00
Yanqin Jin
86682668ca Merge remote-tracking branch 'origin/main' into deflake-test-1 2023-03-10 14:57:59 -08:00
Jingyu Zhou
b13e496986
Merge pull request #9645 from sfc-gh-huliu/fixasan
Fix asan error caused by StringRef parameter of updateRestoreState
2023-03-09 17:35:56 -08:00
Jingyu Zhou
b755e668bf
Merge pull request #9601 from jzhou77/fix-head
Allow log router to detect slow peeks and to switch DC for peeking
2023-03-09 15:34:24 -08:00