361 Commits

Author SHA1 Message Date
Lukas Joswiak
c357a90b30 Ensure change coordinators request came from same generation proxy
There was a rare but possible issue where a change coordinators request
could be in flight, but reading `\xff/coordinators` would return the old
set of coordinators, even though a previous attempt to write it returned
`commit_unknown_result`. This happened if there was an unrelated
recovery between setting `\xff/coordinators` and the updated
coordinators reaching each machine. The `commit_unknown_result` caused
by this recovery meant the change coordinators retry loop would read
`\xff/coordinators`, but since the request was ongoing it would read the
old set of coordinators. The fix is to have the cluster controller
ignore any change coordinators request from an old generation, meaning
when the retry loop reads the old coordiantors from `\xff/coordinators`,
it is guaranteed that the in progress change coordinators message will
be rejected when arriving at the cluster controller.
2022-07-19 17:44:48 -07:00
Markus Pilman
1de37afd52
Make TEST macros C++ only (#7558)
* proof of concept

* use code-probe instead of test

* code probe working on gcc

* code probe implemented

* renamed TestProbe to CodeProbe

* fixed refactoring typo

* support filtered output

* print probes at end of simulation

* fix missed probes print

* fix deduplication

* Fix refactoring issues

* revert bad refactor

* make sure file paths are relative

* fix more wrong refactor changes
2022-07-19 13:15:51 -07:00
Trevor Clinkenbeard
1eec6f993d
Merge pull request #7572 from sfc-gh-akejriwal/tenantquota
Introduce storage quotas per tenant
2022-07-18 20:00:22 -07:00
Ankita Kejriwal
882df774c0 Storage quota: use transaction options and minor fixes in the test 2022-07-15 16:34:04 -07:00
Ankita Kejriwal
6f130b4a69 Change quota type to unit64_t to avoid macOS compilation issue 2022-07-14 17:09:03 -07:00
Ankita Kejriwal
bb05321d24 Introduce storage quotas per tenant.
This change adds:
* ability to store the mapping from tenants to quota in the system keyspace,
* a setter and getter function
* a new workload to test this functionality

FDBCORE-2437
2022-07-14 16:35:12 -07:00
Josh Slocum
0b0ac16a4c Merge branch 'main' into granule_merging 2022-07-12 09:09:30 -05:00
Chaoguang Lin
29f98f3654
Avoid duplicate snapshot on one process if it serves as multiple roles (#7294)
* Fix comments

* Add simulation value for SERVER_KNOBS->SNAP_CREATE_MAX_TIMEOUT

* A work version with correctness clean

* Remove unnecessay comments; debugging symbols

* Only check secondary address for coordinators, same as before

* Change the trace to SevError and remove the ASSERT(false)

* Remove TLogSnapRequest handling on TlogServer, which is changed to use WorkerSnapRequest

* Add retry for network failures

* Add retry limit for network failures; still allow duplicate snapshots on processes are both tlog and storage to avoid race

* Add retry limit as a knob and make backoff exponentail

* Add getDatabaseConfiguration(Transaction* tr)

* revert back to send request for each role once

* update some comments
2022-06-29 11:23:07 -07:00
Markus Pilman
d35445a868 enforce include modularization in cmake 2022-06-23 14:37:35 -06:00
Xiaoge Su
00b805d8e0 fixup! Reformat source 2022-06-14 10:43:13 -07:00
Xiaoge Su
e493f1c3cd fixup! Add a retry mechanism in changeQuorumChecker and changeQuorum
This is to fix an issue when recovery and change coordinator key happens
together. The issue will occur when:

1. Recovery starts
2. Coordinator key change transaction started
3. During the recovery the coordinator key is read from cluster file and
   stored in the storage server
4. The cluster controller received `ChangeCoordiatorsRequest`, and
   updated the cluster name with the new value.

at this stage, the value related to coordinator key in storage server and
the worker is inconsistent.

5. changeQuorumChecker is called, which will verify such consistency.
   Since they are different, the call is returning failure and the
   caller, which could be a TEST_CASE, fails.

This is a rare race issue, and it is also noticed that when the
recovery/coordinator key change process is done, the database is in a
proper state which allows changeQuorumChecker behave properly. In this
case, a retry mechanism should be sufficiently fix corresponding test
failures.
2022-06-14 10:43:13 -07:00
Josh Slocum
d6920cde28 Implemented blob granule merging 2022-06-09 10:50:53 -05:00
Xiaoxi Wang
fd35fde481 Merge branch 'main' of https://github.com/apple/foundationdb into readaware 2022-05-23 15:09:03 -07:00
Renxuan Wang
df4e0deb4d
coordinatorsKey should not always store IP addresses. (#7204)
* coordinatorsKey should not storing IP addresses.

Currently, when we do a commit of coordinator change, we are always converting hostnames to IP addresses and store the converted results in coordinatorsKey (\xff/coordinators). This result in ForwardRequest also sending IP addresses, and receivers will update their cluster files with IPs, then we lose the dynamic IP feature.

* Remove the legacy coordinators() function.

* Update async_resolve().

ip::basic_resolver::async_resolve(const query & q, ResolveHandler && handler) is deprecated.

* Clean code format.

* Fix typo.

* Remove SpecifiedQuorumChange and NoQuorumChange.
2022-05-23 11:42:56 -07:00
Xiaoxi Wang
69985ba251 Merge branch 'main' of https://github.com/apple/foundationdb into readaware 2022-05-02 10:53:22 -07:00
Renxuan Wang
c69a07a858
Check in the new Hostname logic. (#6926)
* Revert #6655.

20220407-031010-renxuan-c101052c21da8346           compressed=True data_size=31004844 duration=4310801 ended=100000 fail_fast=10 max_runs=100000 pass=100000 priority=100 remaining=0 runtime=1:04:15 sanity=False started=100047 stopped=20220407-041425 submitted=20220407-031010 timeout=5400 username=renxuan

* Revert #6271.

20220407-051532-renxuan-470f0fe6aac1c217           compressed=True data_size=30982370 duration=3491067 ended=100002 fail_fast=10 max_runs=100000 pass=100002 priority=100 remaining=0 runtime=0:59:57 sanity=False started=100141 stopped=20220407-061529 submitted=20220407-051532 timeout=5400 username=renxuan

* Revert #6266.

Remove resolving-related functionalities in connection string. Connection string will be used for storing purpose only, and non-mutable.

20220407-175119-renxuan-55d30ee1a4b42c2f           compressed=True data_size=30970443 duration=5437659 ended=100000 fail_fast=10 max_runs=100000 pass=100000 priority=100 remaining=0 runtime=0:59:31 sanity=False started=100154 stopped=20220407-185050 submitted=20220407-175119 timeout=5400 username=renxuan

* Add hostname to coordinator interfaces.

* Turn on the new hostname logic.

* Add the corresponding change in config txns.

The most notable change is before calling basicLoadBalance(), we need to call tryInitializeRequestStream() to initialize request streams first.

Passed correctness tests.

* Return error when hostnames cannot be resolved in coordinators command.

* Minor fixes.
2022-04-27 21:54:13 -07:00
Xiaoxi Wang
0639810b66 merge upstream/main 2022-04-22 11:09:15 -07:00
Zhe Wang
6c9ff6ee5e
Add sharded rocksdb type (#6862)
* add-sharded-rocksdb-type

* address comments

Co-authored-by: Zhe Wang <zhewang@Zhes-MacBook-Pro.local>
2022-04-21 22:53:14 -04:00
Xiaoxi Wang
61a1f7683b fix dd command line read special key space error 2022-04-11 22:49:21 -07:00
He Liu
dd15489605 rename ssd-rocksdb-experimental as ssd-rocksdb-v1. 2022-03-29 10:53:38 -07:00
Josh Slocum
f27475e2f4 Merge branch 'main' into blob_integration 2022-03-22 11:41:58 -05:00
sfc-gh-tclinkenbeard
a71099471b Update copyright header dates 2022-03-21 13:36:23 -07:00
Josh Slocum
37e7c80f26 Merge branch 'main' into blob_integration 2022-03-17 18:45:42 -05:00
sfc-gh-tclinkenbeard
62f547ff6e Add line before actorcompiler.h include (to prevent IDE from reordering includes) 2022-03-16 14:15:48 -07:00
sfc-gh-tclinkenbeard
66d71e107d Move actorcompiler.h include to the end of includes 2022-03-16 00:09:16 -07:00
Josh Slocum
e71b3533f9 Merge branch 'main' into blob_integration 2022-03-09 08:59:56 -06:00
A.J. Beamon
6a2e54d955 Rename OPTIONAL to avoid windows macro conflict 2022-03-08 13:45:29 -08:00
A.J. Beamon
72a34945ce Add the ability to disable tenants. Server processes verify the ID of tenants being read or written. 2022-03-06 21:54:21 -08:00
A.J. Beamon
5fa9d3e1b7 Add a tenant parameter to read and commit requests. Store a map of all tenants on commit proxy and storage servers. Add an option to require tenant mode. 2022-03-06 21:54:21 -08:00
A.J. Beamon
36435af4f5 Add support for a limits hint in special keys. Fix a typo and resolve some comment formatting weirdness. 2022-03-06 21:52:39 -08:00
Jingyu Zhou
1a5bf25b5c Update code base to use fmt 8.1.1 2022-03-04 15:52:06 -08:00
Renxuan Wang
233c918ffb Replace printf() and fprintf() with fmt::print(). 2022-02-25 19:06:57 -08:00
Renxuan Wang
f7eb66441d Try eliminating warnings in macOS and Windows CI builds.
MacOS warnings are format warnings, e.g., `format specifies type 'long' but the argument has type 'Version' (aka 'long long')`.
Windows warnings are `ACTOR does not contain a wait() statement`.
2022-02-25 19:06:57 -08:00
Renxuan Wang
06b1d06d38 Support hostname in coordinators commands. 2022-02-24 23:02:29 -08:00
Renxuan Wang
70d723f70c Move the construction of ConnectionString in changeQuorumChecker() to coordinatorsCommitActor(). 2022-02-24 23:02:29 -08:00
A.J. Beamon
250a88e682 Enforce that trace event suppression calls happen first when using trace event call chaining. Fix various instances where we weren't following this requirement. 2022-02-24 12:25:52 -08:00
Xiaoxi Wang
59ac1dffb4 change storage_migration_mode to storage_migration_type 2022-02-22 09:36:30 -08:00
Josh Slocum
38a75a8b89 Merge branch 'main' into blob_integration 2022-02-17 17:47:38 -06:00
A.J. Beamon
8418cb2839 Mark various externally exposed system transactions as read/access system keys. Also adjust a few lock aware declarations to read lock aware. 2022-01-27 13:58:33 -08:00
Renxuan Wang
2227fc2943 Fix a bug in getDesiredCoordinators().
When no workers are chosen, we should return 0 coordinators.
2021-12-15 10:42:28 -08:00
Suraj Gupta
cb568bbd55 Add watch on config key. 2021-12-10 14:00:34 -06:00
negoyal
037b30fbde Clang format. 2021-12-03 10:01:49 -08:00
negoyal
2725183b26 Don't include an unreliable process in the protected list. 2021-12-03 09:44:52 -08:00
sfc-gh-tclinkenbeard
464d9488ef Merge remote-tracking branch 'origin/master' into fix-unused-warnings 2021-12-01 23:52:09 -08:00
sfc-gh-tclinkenbeard
90ced244eb Fix -Wunused-but-set-variable warnings 2021-12-01 18:15:53 -08:00
sfc-gh-tclinkenbeard
ec64890ac1 Remove some usages of PRId64 by using fmt library 2021-11-30 23:35:36 -08:00
Steve Atherton
c53f5aa110 Renamed redwood to redwood-1-experimental and file extension to .redwood-v1. 2021-11-16 02:15:22 -08:00
Evan Tschannen
6f7558b8ea Merge branch 'master' of https://github.com/apple/foundationdb into feature-range-feed
# Conflicts:
#	tests/CMakeLists.txt
2021-10-24 21:06:33 -07:00
A.J. Beamon
e882eb33fc Abstract the cluster file into a cluster connection record that can be backed by something other than the filesystem. 2021-10-22 11:05:18 -07:00
Evan Tschannen
3f7df58a77 fixed a number of issues 2021-10-19 13:56:52 -07:00