126 Commits

Author SHA1 Message Date
Xiaoge Su
751cf6a721 fixup! typo 2022-08-15 15:25:20 -07:00
Xiaoge Su
c4d1231914 Remove ItemWithExamples, replace by Samples
This is a fully cleanup of ItemWithExamples with a new Samples class.
2022-08-15 15:25:20 -07:00
Xiaoge Su
68dc99ea0f Refactor ClientStatusStats and ClientStats
MonitorLeader.actor.cpp:ClientStatusStats and
Status.actor.cpp:ClientStats shares the same structure, thus refactored.
2022-08-15 15:25:20 -07:00
Markus Pilman
1de37afd52
Make TEST macros C++ only (#7558)
* proof of concept

* use code-probe instead of test

* code probe working on gcc

* code probe implemented

* renamed TestProbe to CodeProbe

* fixed refactoring typo

* support filtered output

* print probes at end of simulation

* fix missed probes print

* fix deduplication

* Fix refactoring issues

* revert bad refactor

* make sure file paths are relative

* fix more wrong refactor changes
2022-07-19 13:15:51 -07:00
Vaidas Gasiunas
e28a8401fb
Update coordinator list from cluster file (#7382)
* Log failed connection attempts in monitorProxies

* Update coordinator list from the cluster file after failing to connect to all coordinators

* Wiggle and upgrade test with legacy version monitoring; updating tests to use 7.1.9

* Update coordinator list from the cluster file: addressing review comments

* Update coordinator list from the cluster file: addressing review comments

* Wait on future for all setAndPersistConnectionString calls
2022-06-23 09:22:09 +02:00
Renxuan Wang
839af5701e
Fix bug in resolveTCPEndpoint() when hostname resolving fails. (#7375)
* Close trace file when error happens in runNetwork().

* Improve the bestCount algorithm in getLeader().

In the current implementation, if the nominees are [0,1], the chosen leader will be 1, which is an exception to other cases and our expectation that if 2 nominees have the same frequency, the one with lower id will be the leader.

* Remove unnecessary new statement.

stream will never be a nullptr.

* Move self->dnsCache out of lambda capture.

Member variables are not capture by default, thus, `host` and `service` are not captured. This somehow successfully compile, but throws std::bad_alloc or basic_string::_S_create exceptions when we call `host+":"+service` in dnsCache.remove().

* Revert unintended change.

* Address comments.
2022-06-13 20:24:30 -07:00
Renxuan Wang
df4e0deb4d
coordinatorsKey should not always store IP addresses. (#7204)
* coordinatorsKey should not storing IP addresses.

Currently, when we do a commit of coordinator change, we are always converting hostnames to IP addresses and store the converted results in coordinatorsKey (\xff/coordinators). This result in ForwardRequest also sending IP addresses, and receivers will update their cluster files with IPs, then we lose the dynamic IP feature.

* Remove the legacy coordinators() function.

* Update async_resolve().

ip::basic_resolver::async_resolve(const query & q, ResolveHandler && handler) is deprecated.

* Clean code format.

* Fix typo.

* Remove SpecifiedQuorumChange and NoQuorumChange.
2022-05-23 11:42:56 -07:00
A.J. Beamon
993a79b42c
Restart protocol monitoring logic when we cycle the coordinator, even if its the same coordinator. Add some defensive code in case we fetch the protocol version for an absent peer. (#6397) 2022-05-19 10:00:23 +02:00
Renxuan Wang
30e124c09b Remove HostnameStatus and resolve trigger.
They are no longer needed since we have coordinators DNS cache; and they are introducing complex crashes.
2022-05-12 10:13:55 -07:00
Renxuan Wang
154de018ff
One place in PaxosConfigConsumer was missed out in #6926. (#7006)
* One place in PaxosConfigConsumer was missed out.

* Minor improvements.
2022-04-28 18:32:55 -07:00
Renxuan Wang
c69a07a858
Check in the new Hostname logic. (#6926)
* Revert #6655.

20220407-031010-renxuan-c101052c21da8346           compressed=True data_size=31004844 duration=4310801 ended=100000 fail_fast=10 max_runs=100000 pass=100000 priority=100 remaining=0 runtime=1:04:15 sanity=False started=100047 stopped=20220407-041425 submitted=20220407-031010 timeout=5400 username=renxuan

* Revert #6271.

20220407-051532-renxuan-470f0fe6aac1c217           compressed=True data_size=30982370 duration=3491067 ended=100002 fail_fast=10 max_runs=100000 pass=100002 priority=100 remaining=0 runtime=0:59:57 sanity=False started=100141 stopped=20220407-061529 submitted=20220407-051532 timeout=5400 username=renxuan

* Revert #6266.

Remove resolving-related functionalities in connection string. Connection string will be used for storing purpose only, and non-mutable.

20220407-175119-renxuan-55d30ee1a4b42c2f           compressed=True data_size=30970443 duration=5437659 ended=100000 fail_fast=10 max_runs=100000 pass=100000 priority=100 remaining=0 runtime=0:59:31 sanity=False started=100154 stopped=20220407-185050 submitted=20220407-175119 timeout=5400 username=renxuan

* Add hostname to coordinator interfaces.

* Turn on the new hostname logic.

* Add the corresponding change in config txns.

The most notable change is before calling basicLoadBalance(), we need to call tryInitializeRequestStream() to initialize request streams first.

Passed correctness tests.

* Return error when hostnames cannot be resolved in coordinators command.

* Minor fixes.
2022-04-27 21:54:13 -07:00
Renxuan Wang
e40cc8722c
A few hostname improvements. (#6825)
* Add tryResolveHostnames() in connection string.

* Add missing hostname to related interfaces.

* Do not pass RequestStream into *GetReplyFromHostname() functions.

Because we are using new RequestStream for each request anyways. Also, the passed in pointer could be nullptr, which results in seg faults.

* Add dynamic hostname resolve and reconnect intervals.

* Address comments.
2022-04-20 13:42:46 -07:00
Jingyu Zhou
17dc1a61f3 ClientDBInfo may be unintentionally not set
The ClientDBInfo's comparison is through an internal UID and shrinkProxyList()
can change proxies inside ClientDBInfo. Since the UID is not changed by that
function, subsequent set can be unintentionally skipped.

This was not a big issue before. However, VV introduces a change that the
client side compares the returned proxy ID with its known set of GRV proxies
and will retry GRV if the returned proxy ID is not in the set. Due the above
bug, GRV returned by a proxy is not within the client set, and results in
indefinite retrying GRVs.
2022-04-18 09:09:14 -07:00
Renxuan Wang
938e8ed996 Do not throw lookup_failed when resolving fails.
Instead, return an empty Optional<NetworkAddress>. For resolveWithRetry(), still return NetworkAddress because it retries until succeed.
2022-04-08 14:21:49 -07:00
Renxuan Wang
465ff712b6
Move Hostname to its own files. (#6759)
* Change DNS cache to use std::map.

Revert commit 90c259d84e95dd35e01149c0a86bd18e82e33930, because if we use unordered_map, toString() can be inconsistent.

* Move ClientKnob::COORDINATOR_HOSTNAME_RESOLVE_DELAY to FlowKnob::HOSTNAME_RESOLVE_DELAY.

* Move Hostname to its own files.

Also, add resolve-related variables and functions in Hostname.
2022-04-04 19:04:51 -07:00
Renxuan Wang
2a59c5fd4e
Workers should monitor coordinators in submitCandidacy(). (#6655)
* Workers should monitor coordinators in submitCandidacy().

* Change re-resolve delay to a knob.
2022-03-24 19:20:42 -07:00
sfc-gh-tclinkenbeard
a71099471b Update copyright header dates 2022-03-21 13:36:23 -07:00
Renxuan Wang
3e761aef3b Add resetConnectionString() function to ClusterConnectionString. 2022-02-24 23:02:29 -08:00
A.J. Beamon
250a88e682 Enforce that trace event suppression calls happen first when using trace event call chaining. Fix various instances where we weren't following this requirement. 2022-02-24 12:25:52 -08:00
Renxuan Wang
3c1394578b Address comments. 2022-02-22 16:29:59 -08:00
Renxuan Wang
622d89b552 Rebase on main.
Since we changed ClusterConnectionString's status flag from boolean to enum in #6422, we need to update this PR correspondingly.
2022-02-22 16:29:59 -08:00
Renxuan Wang
8eb7a10404 Address comments. 2022-02-22 16:29:59 -08:00
Renxuan Wang
481587a8c6 Turn on hostname logic. 2022-02-22 16:29:59 -08:00
Renxuan Wang
c29c522d3e Change ClusterConnectionString's status flag from boolean to enum.
So that when multiple threads try to resolve hostnames concurrently, we should have only one resolving and others wait on the result. Otherwise, they would add duplicated resolved results to coordinators.
2022-02-21 10:53:03 -08:00
Renxuan Wang
f9f3735f73 Add resolveHostnamesBlocking() in ConnectionString and IClusterConnectionRecord.
Also, combine IClusterConnectionRecord::getConnectionString() and IClusterConnectionRecord::getMutableConnectionString() to IClusterConnectionRecord::getConnectionString(), and rename setConnectionString() to setAndPersistConnectionString().
2022-01-28 12:20:41 -08:00
Renxuan Wang
8c49391c51 Address comments. 2022-01-20 19:18:15 -08:00
Renxuan Wang
02f72dd377 Address comments. 2022-01-20 19:18:15 -08:00
Renxuan Wang
a6c482ee91 Add the support of using hostname in ConnectionString.
This PR comes without simulation tests except some unit tests. The simulation tests will be in the PR that uses hostname in code logic.
2022-01-20 19:18:15 -08:00
Renxuan Wang
9f70e84a7b Remove trimFromHostname(). 2021-12-07 15:39:51 -08:00
Renxuan Wang
85dff214a4 Address comments. 2021-11-04 11:42:28 -07:00
Renxuan Wang
f15ceb5489 Add Hostname struct, and fromHostname in NetworkAddress struct. 2021-11-04 11:42:28 -07:00
Renxuan Wang
f503ba6a7c Move getConnectionString() to IClusterConnectionRecord.
ClusterConnectionString was moved to IClusterConnectionRecord in PR #5853.
2021-10-27 12:59:52 -07:00
A.J. Beamon
9358adcf49 Address some review comments. 2021-10-22 11:05:18 -07:00
A.J. Beamon
e882eb33fc Abstract the cluster file into a cluster connection record that can be backed by something other than the filesystem. 2021-10-22 11:05:18 -07:00
Vaidas Gasiunas
16171d8252 Refactoring well-known endpoint registration
- List all well-known endpoints of FDB in a single enum
- Identify well-known endpoints by plain IDs
2021-09-21 11:05:31 -06:00
Xiaoge Su
abf73047ca Enforce std:: specifier rather than using namespace 2021-09-16 19:40:28 -07:00
Renxuan Wang
96fcde45c2 Minor leader election code improvements.
1. Rename monitorLeaderRemotely* functions to monitorLeaderWithDelayedCandidacy*. "Remote" is not clearly describing what the functions are doing;
2. Rename monitorLeaderForProxies() to monitorLeaderAndGetClientInfo() to better describe the function;
3. Remove monitorLeaderRemotelyInternal() and monitorLeaderRemotely() in MonitorLeader.actor.cpp, to eliminate code duplication. They already exist in worker.actor.cpp;
4. Move the declaration of getLeader() from LeaderElection.actor.cpp to MonitorLeader.h;
5. Update a few comments.
2021-09-16 15:34:45 -07:00
FDB Formatster
69508b980f modify comments to make clang-format and coverage tool play nice 2021-08-27 17:18:00 -07:00
sfc-gh-tclinkenbeard
3c1cabf041 Coordinator lets client know if it cannot communicate with cluster controller 2021-07-13 17:26:28 -07:00
sfc-gh-tclinkenbeard
9379bab04e Move trim to anonymous namespace 2021-07-13 14:28:46 -07:00
sfc-gh-tclinkenbeard
371a38e6e5 Merge remote-tracking branch 'origin/master' into remove-extra-copies 2021-06-07 10:26:06 -07:00
Dan Lambright
64c10d3625 fix joshua failures, formatting 2021-05-27 08:08:07 -04:00
Dan Lambright
fcfb78162c misc cleanup for publishing 2021-05-27 08:08:07 -04:00
Dan Lambright
742c22cef2 Don't allow changing desriptor if knob is set 2021-05-27 08:08:07 -04:00
sfc-gh-tclinkenbeard
f28ac955c3 Remove unnecessary temporary objects while growing objects of type std::vector<std::pair<A, B>> 2021-05-10 16:32:50 -07:00
Renxuan Wang
652c5c4e84
Merge pull request #4668 from RenxuanW/worker
Improve logging on worker joining cluster
2021-04-30 10:24:02 -07:00
Andrew Noyes
904a39e473
Merge pull request #4667 from sfc-gh-ajbeamon/feature-mvc-monitor-protocol-version
Use fewer connections in the multi-version client
2021-04-28 14:13:17 -07:00
RenxuanW
0145eea684 Make MonitorLeaderForwarding and LeaderForwarding trackLatest events. 2021-04-27 15:17:20 -07:00
Andrew Noyes
8a00c6cdf8 Add -Wshift-sign-overflow
This catches the bug fixed in #4656 at compile time
2021-04-21 23:49:26 +00:00
A.J. Beamon
b2d6930103 The multi-version client monitors the cluster's protocol version and only activates the client library that can connect. 2021-04-15 11:45:14 -07:00