1637 Commits

Author SHA1 Message Date
Renxuan Wang
c69a07a858
Check in the new Hostname logic. (#6926)
* Revert #6655.

20220407-031010-renxuan-c101052c21da8346           compressed=True data_size=31004844 duration=4310801 ended=100000 fail_fast=10 max_runs=100000 pass=100000 priority=100 remaining=0 runtime=1:04:15 sanity=False started=100047 stopped=20220407-041425 submitted=20220407-031010 timeout=5400 username=renxuan

* Revert #6271.

20220407-051532-renxuan-470f0fe6aac1c217           compressed=True data_size=30982370 duration=3491067 ended=100002 fail_fast=10 max_runs=100000 pass=100002 priority=100 remaining=0 runtime=0:59:57 sanity=False started=100141 stopped=20220407-061529 submitted=20220407-051532 timeout=5400 username=renxuan

* Revert #6266.

Remove resolving-related functionalities in connection string. Connection string will be used for storing purpose only, and non-mutable.

20220407-175119-renxuan-55d30ee1a4b42c2f           compressed=True data_size=30970443 duration=5437659 ended=100000 fail_fast=10 max_runs=100000 pass=100000 priority=100 remaining=0 runtime=0:59:31 sanity=False started=100154 stopped=20220407-185050 submitted=20220407-175119 timeout=5400 username=renxuan

* Add hostname to coordinator interfaces.

* Turn on the new hostname logic.

* Add the corresponding change in config txns.

The most notable change is before calling basicLoadBalance(), we need to call tryInitializeRequestStream() to initialize request streams first.

Passed correctness tests.

* Return error when hostnames cannot be resolved in coordinators command.

* Minor fixes.
2022-04-27 21:54:13 -07:00
Ata E Husain Bohra
333aadb903
Interface to enable clients to send/receive REST requests/responses (#6866)
* Interface to enable clients to send/receive REST requests/responses

Description

Major changes:
1. Add RESTClient interface enabling client to send/receive REST HTTP
   requests. Support REST APIs are: get, head, put, post, delete, trace
2. Add RESTUtil file introducing below interfaces:
 2.1. RESTUrl - Extract URI information: host, service, request-parameters.
 2.2. RESTConnectionPool-
      Connection establishment, life-cycle management, connection-pool (TTL)
 2.3. RESTClientKnobs - supports REST Knob parameter management and updates

Testing

Unit test - fdbrpc/RESTClient, fdbrpc/RESTUtils
2022-04-27 12:17:52 -07:00
Renxuan Wang
e40cc8722c
A few hostname improvements. (#6825)
* Add tryResolveHostnames() in connection string.

* Add missing hostname to related interfaces.

* Do not pass RequestStream into *GetReplyFromHostname() functions.

Because we are using new RequestStream for each request anyways. Also, the passed in pointer could be nullptr, which results in seg faults.

* Add dynamic hostname resolve and reconnect intervals.

* Address comments.
2022-04-20 13:42:46 -07:00
Binglin Chang
408c0cf1c9
Fix compile errors on ubuntu 20.04 (#4931) 2022-04-20 10:00:46 -07:00
Junhyun Shim
edc659d339 Use camelCase & move error code to 6xxx 2022-04-13 21:11:52 +02:00
Junhyun Shim
b6a0c0f942 Merge remote-tracking branch 'upstream/main' into tenant-token-sign 2022-04-13 19:55:37 +02:00
Markus Pilman
099385928c Address review comments 2022-04-11 09:17:10 -06:00
Markus Pilman
64ac66c1d0 fix merge conflict 2022-04-10 14:16:21 -06:00
Markus Pilman
16467262f0 Merge remote-tracking branch 'origin/main' into features/private-request-streams 2022-04-10 14:12:37 -06:00
Markus Pilman
d8a0b57b6c clients have to listen on a port in simulation 2022-04-10 14:09:15 -06:00
Renxuan Wang
938e8ed996 Do not throw lookup_failed when resolving fails.
Instead, return an empty Optional<NetworkAddress>. For resolveWithRetry(), still return NetworkAddress because it retries until succeed.
2022-04-08 14:21:49 -07:00
Renxuan Wang
70c49b1b87 DNS cache entry should also be removed in tryGetReplyFromHostname(). 2022-04-08 14:21:49 -07:00
Renxuan Wang
f3a8ac21be Change resolve functions in hostname to return network address. 2022-04-08 14:21:49 -07:00
Renxuan Wang
3290c81f1d Fix tryGetReplyFromHostname(). 2022-04-08 14:21:49 -07:00
Josh Slocum
6276cebad9
Blob integration (#6808)
* Fixing leaked stream with explicit notify failed before destructor

* better logic to prevent races in change feed fetching

* Found new race that makes assert incorrect

* handle server overloaded in initial read from fdb

* Handling more blob error types in granule retry

* Fixing rollback metadata problem, added better debugging

* Fixing version race when fetching change feed metadata

* Better racing split request handling

* fixing assert

* Handle change feed popped check in the blob worker

* fix: do not use a RYW transaction for a versionstamp because of randomize API version (#6768)

* more merge conflict issues

* Change feed destroy fixes

* Fixing change feed destroy and move race

* Check error condition in BG file req

* Using relative endpoints for blob worker interface

* Fixing bug in previous fix

* More destroy and move race fixes

* Don't update empty version on destroy in case it gets rolled back. moved() and removing will take care of ensuring it is not read

* Bug fix (#6796)

* fix: do not use a RYW transaction for a versionstamp because of randomize API version

* fix: if the initialSnapshotVersion was pruned, granule history was incorrect

* added a way to compress null bytes in printable()

* Fixing durability issue with moving and destroying change feeds

* Adding fix for not fully deleting files for a granule that child granules need to re-snapshot

* More destroy and move races

* Fixing change feed destroy and pop races

* Renaming bg prune to purge, and adding a C api and unit test for it

* more cleanup

* review comments

* Observability for granule purging

* better handling for change feed not registered

* Fixed purging bugs (#6815)

* fix: do not use a RYW transaction for a versionstamp because of randomize API version

* fix: if the initialSnapshotVersion was pruned, granule history was incorrect

* added a way to compress null bytes in printable()

* fixed a few purging bugs

Co-authored-by: Evan Tschannen <evan.tschannen@snowflake.com>
2022-04-08 14:15:25 -07:00
Markus Pilman
7631d299bf Merge remote-tracking branch 'origin/main' into features/private-request-streams 2022-04-08 09:58:56 -06:00
Zhe Wu
ebb60f690b Fix valgrind error in HealthMonitor 2022-04-07 15:48:06 -07:00
Markus Pilman
bf956f5630 Merge remote-tracking branch 'origin/main' into features/private-request-streams 2022-04-07 13:29:27 -06:00
Markus Pilman
e2d7d4075d multiple bug fixes 2022-04-07 11:08:07 -06:00
Zhe Wu
5fd494a57b Allow worker health monitor to report recent destroyed peers who currently have roles in transaction systems 2022-04-06 13:33:50 -07:00
Renxuan Wang
267c4deaee
Add tryGetReplyFromHostname() and retryGetReplyFromHostname(). (#6761)
* Add hostname to coordination interfaces.

* Add tryGetReplyFromHostname() and retryGetReplyFromHostname().

* Change tryGetReplyFromHostname() to call hostname.resolve().

* Add throw for actor_cancelled.
2022-04-06 10:47:00 -07:00
Renxuan Wang
e548c0d604 Add DNS cache. 2022-04-04 15:08:17 -07:00
Renxuan Wang
ff934ca2ad Change MockDNS to DNSCache. 2022-04-04 15:08:17 -07:00
Renxuan Wang
ebe928e7e1 Throw lookup_failed() when hostname resolving fails. 2022-04-04 15:08:17 -07:00
Chaoguang Lin
7d365bd1bb
Remote ikvs debugging (#6465)
* initial structure for remote IKVS server

* moved struct to .h file, added new files to CMakeList

* happy path implementation, connection error when testing

* saved minor local change

* changed tracing to debug

* fixed onClosed and getError being called before init is finished

* fix spawn process bug, now use absolute path

* added server knob to set ikvs process port number

* added server knob for remote/local kv store

* implement simulator remote process spawning

* fixed bug for simulator timeout

* commit all changes

* removed print lines in trace

* added FlowProcess implementation by Markus

* initial debug of FlowProcess, stuck at parent sending OpenKVStoreRequest to child

* temporary fix for process factory throwing segfault on create

* specify public address in command

* change remote kv store knob to false for jenkins build

* made port 0 open random unused port

* change remote store knob to true for benchmark

* set listening port to randomly opened port

* added print lines for jenkins run open kv store timeout debug

* removed most tracing and print lines

* removed tutorial changes

* update handleIOErrors error handling to handle remote-ikvs cases

* Push all debugging changes

* A version where worker bug exists

* A version where restarting tests fail

* Use both the name and the port to determine the child process

* Remove unnecessary update on local address

* Disable remote-kvs for DiskFailureCycle test

* A version where restarting stuck

* A version where most restarting tests green

* Reset connection with child process explicitly

* Remove change on unnecessary files

* Unify flags from _ to -

* fix merging unexpected changes

* fix trac.error to .errorUnsuppressed

* Add license header

* Remove unnecessary header in FlowProcess.actor.cpp

* Fix Windows build

* Fix Windows build, add missing ;

* Fix a stupid bug caused by code dropped by code merging

* Disable remote kvs by default

* Pass the conn_file path to the flow process, though not needed, but the buildNetwork is difficult to tune

* serialization change on readrange

* Update traces

* Refactor the RemoteIKVS interface

* Format files

* Update sim2 interface to not clog connections between parent and child processes in simulation

* Update comments; remove debugging symbols; Add error handling for remote_kvs_cancelled

* Add comments, format files

* Change method name from isBuggifyDisabled to isStableConnection; Decrease(0.1x) latency for stable connections

* Commit the IConnection interface change, forgot in previous commit

* Fix the issue that onClosed request is cancelled by ActorCollection

* Enable the remote kv store knob

* Remove FlowProcess.actor.cpp and move functions to RemoteIKeyValueStore.actor.cpp; Add remote kv store delay to avoid race; Bind the child process to die with parent process

* Fix the bug where one process starts storage server more than once

* Add a please_reboot_remote_kv_store error to restart the storage server worker if remote kvs died abnormally

* Remove unreachable code path and add comments

* Clang format the code

* Fix a simple wait error

* Clang format after merging the main branch

* Testing mixed mode in simulation if remote_kvs knob is enabled, setting the default to false

* Disable remote kvs for PhysicalShardMove which is for RocksDB

* Cleanup #include orders, remove debugging traces

* Revert the reorder in fdbserver.actor.cpp, which fails the gcc build

Co-authored-by: “Lincoln <“lincoln.xiao@snowflake.com”>
2022-03-31 17:08:59 -07:00
Yi Wu
acfee48894 AsyncFileKAIO: add latency histograms 2022-03-30 10:45:04 -07:00
Markus Pilman
b595d4462f Throw error on unauthorized access 2022-03-29 14:58:43 -06:00
sfc-gh-tclinkenbeard
77786f4fc6 Merge remote-tracking branch 'origin/main' into change-data-hall 2022-03-27 12:44:05 -07:00
Steve Atherton
478ff1eb76
Merge pull request #6652 from sfc-gh-anoyes/anoyes/fast-alloc
Move most usage of FastAlloc to malloc
2022-03-26 15:46:05 -07:00
Junhyun Shim
410a422bd7 Use Standalone instead of embedded Arenas
- Repeat TokenSign test 100 times per run instead of 1
- Test for verify fail case
2022-03-25 14:03:37 +01:00
Junhyun Shim
9f3fa5ba9b Fix clang format 2022-03-24 19:12:34 +01:00
Junhyun Shim
99fe104f98 Sign and verify auth tokens for multi-tenant FDB 2022-03-24 19:04:00 +01:00
Andrew Noyes
8a35f03a18 Use allocateFast4kAligned instead of duplicating its logic 2022-03-22 11:01:33 -07:00
Josh Slocum
f27475e2f4 Merge branch 'main' into blob_integration 2022-03-22 11:41:58 -05:00
sfc-gh-tclinkenbeard
a71099471b Update copyright header dates 2022-03-21 13:36:23 -07:00
Markus Pilman
35f7843d84 Merge remote-tracking branch 'origin/main' into features/private-request-streams 2022-03-21 12:33:53 +01:00
Markus Pilman
1fbeca8038 fix memory issue 2022-03-19 11:03:32 +01:00
Josh Slocum
8205771e8f ReplyPromiseStreams need synchronous disconnect to avoid change feed races 2022-03-18 17:05:12 -05:00
Josh Slocum
37e7c80f26 Merge branch 'main' into blob_integration 2022-03-17 18:45:42 -05:00
Andrew Noyes
68c03a7e32
Jemalloc integration fixes (#6626)
* Set default for USE_JEMALLOC initially in ConfigureCompiler

Instead of trying to change the value later on. This fixes the valgrind
build, which was previously incorrectly getting jemalloc involved.

* Check aligned_alloc result for null

And OOM if so - don't assert

* Check that we can allocate magazines with no internal fragmentation

We may want to do this so that the jemalloc heap profiler has some
knowledge of FastAlloc

* Populate TestFile field for noSim tests in TestHarness

* Remove handling for nonexistent "ActualRun"
2022-03-17 15:17:27 -07:00
Evan Tschannen
7908ea54f3 fix: NetNotifiedQueueWithAcknowledgements could miss disconnects that happen during the delay(0) in deliver() 2022-03-17 09:57:46 -07:00
Markus Pilman
158da462f7 fix memory bug 2022-03-17 14:37:05 +01:00
Markus Pilman
118b53b7cf Merge remote-tracking branch 'origin/main' into features/private-request-streams 2022-03-17 12:06:44 +01:00
sfc-gh-tclinkenbeard
320c115c71 Apply clang-format to mis-formatted files 2022-03-16 14:25:32 -07:00
sfc-gh-tclinkenbeard
58de6e22cc Add BalanceOnRequests boolean parameter for ModelInterface 2022-03-16 14:25:32 -07:00
Trevor Clinkenbeard
765e018afb
Merge pull request #6580 from sfc-gh-tclinkenbeard/const-smoother
Mark `Smoother::smooth*` methods `const`
2022-03-16 13:43:25 -07:00
Steve Atherton
59762fc784
Merge pull request #6423 from sfc-gh-satherton/redwood-bug-fixes-and-memory-leak
Redwood memory usage and small memory leak fixes
2022-03-15 16:52:28 -07:00
Markus Pilman
117ee637db Merge remote-tracking branch 'origin/main' into features/private-request-streams 2022-03-15 17:17:47 +01:00
Markus Pilman
f346a7ea1c Fix Windows build 2022-03-15 17:15:26 +01:00
Markus Pilman
bed799220a Addressed review comments, added test 2022-03-15 16:57:26 +01:00