19934 Commits

Author SHA1 Message Date
Jingyu Zhou
f68fd28d73 Refactor duplicated code into IKnobCollection::setupKnobs() 2022-04-05 02:06:38 -07:00
Steve Atherton
6c8eca061a In redwood unit test, only reopen the btree prior to the destructive sanity check half the time. 2022-04-04 19:29:32 -07:00
Renxuan Wang
465ff712b6
Move Hostname to its own files. (#6759)
* Change DNS cache to use std::map.

Revert commit 90c259d84e95dd35e01149c0a86bd18e82e33930, because unordered_map iteration order is unspecified, so toString() output can be inconsistent.

* Move ClientKnob::COORDINATOR_HOSTNAME_RESOLVE_DELAY to FlowKnob::HOSTNAME_RESOLVE_DELAY.

* Move Hostname to its own files.

Also, add resolve-related variables and functions in Hostname.
2022-04-04 19:04:51 -07:00
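The switch back to std::map above rests on iteration order. std::unordered_map makes no ordering guarantee, so a toString() that walks the container may print the same entries in different orders, while std::map always iterates in key order. A minimal sketch of that idea, assuming a simple hostname-to-addresses cache; DNSCacheSketch, its add()/toString() methods and the output format are hypothetical and are not FoundationDB's actual DNSCache interface.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a hostname -> addresses cache (not FoundationDB's API).
struct DNSCacheSketch {
    // std::map iterates in sorted key order, so toString() depends only on the
    // cached contents, not on insertion order or hash-table layout.
    std::map<std::string, std::vector<std::string>> entries;

    void add(const std::string& host, const std::vector<std::string>& addrs) {
        entries[host] = addrs;
    }

    // Renders entries as "host=addr1,addr2;" in key order.
    std::string toString() const {
        std::string out;
        for (const auto& [host, addrs] : entries) {
            out += host + "=";
            for (std::size_t i = 0; i < addrs.size(); ++i) {
                if (i > 0) out += ",";
                out += addrs[i];
            }
            out += ";";
        }
        return out;
    }
};
```

Two caches filled with the same entries in different orders produce identical strings, which is the consistency property the commit wants from toString().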
Ray Jenkins
bb9b9d2471
OpenTelemetry API Tracing. (#6478)
* OTEL Span Implementation.

* Add trace logging, refactor constructors, unit tests.

* Unit tests for creating OTELSpans

* refactor flag names

* Additional comments.

* Formatting.

* Add back Arena.h include

* cleanup header includes

* Remove include cstddef.

* Remove memory include.

* Remove trailing commas on enums.

* Enum formatting.

* Changing SpanStatus enum from ERROR to ERR to see if it is clashing with Windows.h.

* Move OTELEvents to SmallVectorRef<KeyValueRef>.

* Clean up unused includes.

* Unit tests

* Const reference arguments for OTEL constructors and additional addAttribute
unit tests. Adding return of OTELSpan reference on addAttribute.

* Formatting.

* Begin messagepack encoding tests.

* Formatting.

* MessagePack encoding unit tests.

* Formatting.

* Remove swapBinary.

* remove ambiguous helper methods

* Formatting fixes

* Fix ambiguous calls in AddEvents unit tests.

* Include AddAttributes unit test.

* descope windows for UDP encoding test

* Move ifndef WIN32 around MPEncoding unit test.

* Fix AddEvents Attributes size assertion.

* Formatting.

* Enable AddLinks unit test.

* Full MP encoding testing.

* Fix for encoding longer strings with MessagePack and unit test.

* Remove unnecessary header includes and serialize_string_ref function.

* Fix typos

* Update flow/Tracing.actor.cpp

Co-authored-by: Lukas Joswiak <lukas.joswiak@snowflake.com>

* Update flow/Tracing.actor.cpp

Co-authored-by: Lukas Joswiak <lukas.joswiak@snowflake.com>

* Use ASSERT_WE_THINK and add logging.

We don't want people creating incredibly large traces, so we are only
supporting a subset of MessagePack collection and string sizes. Assert
and log when we hit these unsupported sizes.

* Remove TODOs no longer applicable.

* Refactor OTELEvent to OTELEventRef.

* Remove unnecessary public declaration in struct.

* fix OTELEventRef attribute size assertion

* Formatting

Co-authored-by: Lukas Joswiak <lukas.joswiak@snowflake.com>
2022-04-04 17:55:38 -07:00
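The OTEL commit above limits MessagePack encoding to a subset of string and collection sizes. As a rough illustration of what those sizes mean, the sketch below writes a MessagePack string using the fixstr (up to 31 bytes), str 8 and str 16 formats from the MessagePack specification and rejects anything longer. Which sizes the tracer actually supports is not stated in the log, so this cut-off is an assumption, and encodeMsgPackString is a hypothetical name; per the commit message, the real code asserts and logs on unsupported sizes rather than returning false.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical MessagePack string encoder supporting fixstr, str 8 and str 16.
// Returns false (and writes nothing) for strings that need str 32.
bool encodeMsgPackString(const std::string& s, std::vector<uint8_t>& out) {
    const std::size_t n = s.size();
    if (n <= 31) {
        // fixstr: 0b101xxxxx, length in the low 5 bits
        out.push_back(static_cast<uint8_t>(0xa0 | n));
    } else if (n <= 0xff) {
        // str 8: 0xd9 followed by a 1-byte length
        out.push_back(0xd9);
        out.push_back(static_cast<uint8_t>(n));
    } else if (n <= 0xffff) {
        // str 16: 0xda followed by a 2-byte big-endian length
        out.push_back(0xda);
        out.push_back(static_cast<uint8_t>(n >> 8));
        out.push_back(static_cast<uint8_t>(n & 0xff));
    } else {
        // str 32 (0xdb) exists in the spec but is treated as unsupported here.
        return false;
    }
    out.insert(out.end(), s.begin(), s.end());
    return true;
}
```

Capping sizes keeps each header small and bounded, which matches the commit's stated goal of discouraging very large traces.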
Renxuan Wang
5a336655f1 Use unordered_map in DNS cache. 2022-04-04 15:08:17 -07:00
Renxuan Wang
7da31857b7 Address comments. 2022-04-04 15:08:17 -07:00
Renxuan Wang
e548c0d604 Add DNS cache. 2022-04-04 15:08:17 -07:00
Renxuan Wang
ff934ca2ad Change MockDNS to DNSCache. 2022-04-04 15:08:17 -07:00
Renxuan Wang
ebe928e7e1 Throw lookup_failed() when hostname resolving fails. 2022-04-04 15:08:17 -07:00
Jingyu Zhou
5861ff2dc6
Merge pull request #6717 from sfc-gh-ajbeamon/thread-future-safety-check
Disallow anonymous standalone thread futures in safeThreadFutureToFuture
2022-04-04 13:39:37 -07:00
Chaoguang Lin
c8455237ea Fix a bug where a pointer was used after it had been cleaned up 2022-04-04 11:49:41 -07:00
Xiaoge Su
6b69c439f0 Allow global knob changes in TOML-file-based tests
Commit 99b030c2f63a3c9ad92ed56aa2b5709322a4cb06 made it possible to set
knob values in the TOML file for a single test, using the syntax

[[test]]
    [[test.knobs]]
    knob_key = knob_value

The knob key/value pairs are changed before the TEST_CASE starts and
reverted after the TEST_CASE completes.

With this patch, it is possible to *globally* update the knob value,
i.e.

[[knobs]]
enable_encryption = true

[[test]]
testTitle = 'EncryptKeyProxy'

    [[test.workload]]
    testName = 'EncryptKeyProxyTest'

This was tested manually by printing out the knob key/value pairs. It was
also tested with Ata's EncryptKeyProxy test code by enabling the
enable_encryption knob.
2022-04-04 11:17:32 -07:00
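The per-test behavior described above (knobs set before the TEST_CASE, reverted after) is a scoped override. A minimal sketch of that pattern, assuming a plain std::map as the knob store; KnobMap and the RAII guard ScopedKnobOverride are hypothetical, and FoundationDB's real knob classes and test harness are not shown. Under this sketch, global [[knobs]] values would simply be written into the store once before any test runs.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical knob store: knob name -> value as text.
using KnobMap = std::map<std::string, std::string>;

// Applies per-test overrides on construction and restores the previous state
// on destruction, so the overrides last exactly as long as the guard's scope.
class ScopedKnobOverride {
public:
    ScopedKnobOverride(KnobMap& knobs, const KnobMap& overrides) : knobs_(knobs) {
        for (const auto& [key, value] : overrides) {
            auto it = knobs_.find(key);
            if (it != knobs_.end()) {
                saved_[key] = it->second;  // remember the old value
            } else {
                added_.push_back(key);     // knob was not set before
            }
            knobs_[key] = value;
        }
    }
    ~ScopedKnobOverride() {
        for (const auto& [key, value] : saved_) knobs_[key] = value;
        for (const auto& key : added_) knobs_.erase(key);
    }
    ScopedKnobOverride(const ScopedKnobOverride&) = delete;
    ScopedKnobOverride& operator=(const ScopedKnobOverride&) = delete;

private:
    KnobMap& knobs_;
    KnobMap saved_;
    std::vector<std::string> added_;
};
```

Tying the revert to a destructor means the knobs are restored even if a test exits early by throwing.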
Josh Slocum
cb918b9cef Added basic blob granule consistency check 2022-04-04 11:38:42 -05:00
Josh Slocum
268caa5ac8 fixing shard size knobs outside of simulation 2022-04-04 11:38:18 -05:00
Dan Lambright
0d60764b25
Merge pull request #6742 from jzhou77/vv
Merge main into version vector
2022-04-04 11:46:57 -04:00
Steve Atherton
23a27d78db Changed DeltaTree2::Cursor to hold a Reference<DecodeCache> instead of a pointer to enable more flexible DecodeCache lifetimes. 2022-04-04 03:38:50 -07:00
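Holding a Reference<DecodeCache> rather than a raw pointer means each cursor shares ownership of the cache, so the cache stays alive as long as any cursor still uses it. A minimal sketch of that ownership pattern, with std::shared_ptr standing in for FoundationDB's Reference<>; DecodeCacheSketch and CursorSketch are hypothetical simplifications, not the DeltaTree2 types.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical cache contents; the real DecodeCache holds decoding state.
struct DecodeCacheSketch {
    std::string lowerBound;
};

class CursorSketch {
public:
    explicit CursorSketch(std::shared_ptr<DecodeCacheSketch> cache) : cache_(std::move(cache)) {}

    // Safe even after the creator drops its own handle: the cursor co-owns the cache.
    const std::string& lowerBound() const { return cache_->lowerBound; }

private:
    std::shared_ptr<DecodeCacheSketch> cache_;
};
```

With a raw pointer, the cursor in the usage below would dangle once the creating scope ended; shared ownership removes that lifetime coupling.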
Steve Atherton
1033f64da2 Removed BTree header parse from cursor creation, which recently became necessary, by storing the extracted root link in the PagerSnapshot. Refactored ArenaPage::userData and userDataDestructor to a generic ArbitraryObject class which can own arbitrary heap allocated objects or references and has a cleaner interface. Added ArbitraryObject to IPagerSnapshot. 2022-04-04 02:45:08 -07:00
Steve Atherton
f0c01d74ac Remove warnings that were just marking TODOs. 2022-04-03 23:50:19 -07:00
Steve Atherton
a5ff1e5f0e Code simplification and minor overhead reduction around snapshot expiration and oldest effective version checking. 2022-04-03 23:49:31 -07:00
Steve Atherton
d7838982f9 Code simplification and minor overhead reduction - don't check multi-page BTree nodes for remapped pages because they no longer can contain any. 2022-04-03 23:48:54 -07:00
sfc-gh-tclinkenbeard
70f378bacc Restrict write access to getUnhealthyRelocationCount 2022-04-03 23:47:54 -07:00
sfc-gh-tclinkenbeard
33fb6ab983 Prevent coordFaultTolerance from dropping below 0 2022-04-03 23:37:42 -07:00
sfc-gh-tclinkenbeard
4f61c86b69 Add MAX_COORDINATOR_SNAPSHOT_FAULT_TOLERANCE knob 2022-04-03 23:28:57 -07:00
sfc-gh-tclinkenbeard
253db642be Add MAX_SNAPSHOT_FAULT_TOLERANCE knob 2022-04-03 22:31:45 -07:00
Steve Atherton
dbccd47650 Added pageFormat to ArenaPage header. 2022-04-03 03:06:53 -07:00
Steve Atherton
da339ed813 Disable Redwood for upgrades from 7.0 or downgrades to 7.0. 2022-04-03 02:37:37 -07:00
Steve Atherton
3b76b9d9cc Apply clang-format. 2022-04-03 00:44:02 -07:00
Steve Atherton
47d1a7b373 Merge commit '38190ad7e787d759f88687e83af0ebabdbc600e8' into redwood-header-changes
# Conflicts:
#	flow/error_definitions.h
2022-04-03 00:39:53 -07:00
Steve Atherton
d0152d8442 Fix error description. 2022-04-03 00:37:37 -07:00
Steve Atherton
39fb0a44d7 Merge commit 'f09bdc840c00d712487500b9e752d87cedb1964a' into redwood-header-changes
# Conflicts:
#	fdbserver/VersionedBTree.actor.cpp
2022-04-03 00:37:01 -07:00
Jingyu Zhou
4200ffe53b Tune down BLOCKING_PEEK_TIMEOUT to 0.4s
Many simulation failures seem to be due to this knob being too high:
when a read range request comes in, the version is bumped by about 1M, which
causes the read version to be before the storage version.
2022-04-02 12:26:35 -07:00
Steve Atherton
38190ad7e7
Merge pull request #6737 from sfc-gh-satherton/fix-storage-timestamps
Change storage metadata and perpetual wiggle timestamps to double epoch seconds
2022-04-02 09:47:23 -07:00
Jingyu Zhou
64d4658034 Merge branch 'main' into vv
Fix Conflicts:
	flow/error_definitions.h
2022-04-01 21:49:24 -07:00
Steve Atherton
b179813989 Updated status schema and fixed spacing. 2022-04-01 17:21:35 -07:00
Steve Atherton
6eb1c2ae48
Merge pull request #6574 from sfc-gh-satherton/redwood-rare-bugs
Rare correctness bug fixes in Redwood
2022-04-01 16:40:22 -07:00
Josh Slocum
377e252fcf
Better split sizing in blob manager (#6725) 2022-04-01 16:09:46 -07:00
Jingyu Zhou
a1be7abdad Clear VV cache instead of throwing 2022-04-01 15:19:24 -07:00
Jingyu Zhou
7cd5ef711d Fix test failure to BlobGranule due to missing private mutations
The change feed metadata mutations use the \xff\x02/feed/ prefix, which was not
treated as a metadata mutation prefix, so these mutations were not sent to the
resolvers. This made private mutation generation impossible for change feeds when
the knob PROXY_USE_RESOLVER_PRIVATE_MUTATIONS is on. Fix by treating them as
metadata mutations.
2022-04-01 14:23:03 -07:00
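The fix above amounts to classifying keys under the \xff\x02/feed/ prefix as metadata so they reach the resolvers. A minimal prefix-check sketch; isChangeFeedMetadataKey and kChangeFeedPrefix are hypothetical names, and how FoundationDB classifies its other system keys is not shown.

```cpp
#include <string_view>

// Change feed metadata prefix from the commit message: 0xFF, 0x02, "/feed/".
constexpr std::string_view kChangeFeedPrefix = "\xff\x02/feed/";

// Hypothetical helper: true if the key lies under the change feed prefix.
bool isChangeFeedMetadataKey(std::string_view key) {
    // substr(0, n) is safe for short keys: it just returns the whole key.
    return key.substr(0, kChangeFeedPrefix.size()) == kChangeFeedPrefix;
}
```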
Jon Fu
c46ad3ce75 Only clear state map if it isn't the initial protocol version change from a null value 2022-04-01 13:51:57 -04:00
Bharadwaj V.R
d248b73df5
Merge pull request #6648 from sfc-gh-bvr/ssupdateb4registration
Update StorageServers before registering SSI
2022-03-31 20:11:19 -07:00
Bharadwaj V.R
f749aac223
Merge branch 'apple:main' into ssupdateb4registration 2022-03-31 18:59:44 -07:00
Chaoguang Lin
7d365bd1bb
Remote ikvs debugging (#6465)
* initial structure for remote IKVS server

* moved struct to .h file, added new files to CMakeList

* happy path implementation, connection error when testing

* saved minor local change

* changed tracing to debug

* fixed onClosed and getError being called before init is finished

* fix spawn process bug, now use absolute path

* added server knob to set ikvs process port number

* added server knob for remote/local kv store

* implement simulator remote process spawning

* fixed bug for simulator timeout

* commit all changes

* removed print lines in trace

* added FlowProcess implementation by Markus

* initial debug of FlowProcess, stuck at parent sending OpenKVStoreRequest to child

* temporary fix for process factory throwing segfault on create

* specify public address in command

* change remote kv store knob to false for jenkins build

* made port 0 open random unused port

* change remote store knob to true for benchmark

* set listening port to randomly opened port

* added print lines for jenkins run open kv store timeout debug

* removed most tracing and print lines

* removed tutorial changes

* update handleIOErrors error handling to handle remote-ikvs cases

* Push all debugging changes

* A version where worker bug exists

* A version where restarting tests fail

* Use both the name and the port to determine the child process

* Remove unnecessary update on local address

* Disable remote-kvs for DiskFailureCycle test

* A version where restarting tests get stuck

* A version where most restarting tests are green

* Reset connection with child process explicitly

* Remove change on unnecessary files

* Unify flags from _ to -

* fix merging unexpected changes

* Fix trace.error to .errorUnsuppressed

* Add license header

* Remove unnecessary header in FlowProcess.actor.cpp

* Fix Windows build

* Fix Windows build, add missing ;

* Fix a bug caused by code dropped during code merging

* Disable remote kvs by default

* Pass the conn_file path to the flow process; it is not needed, but buildNetwork is difficult to tune

* serialization change on readrange

* Update traces

* Refactor the RemoteIKVS interface

* Format files

* Update sim2 interface to not clog connections between parent and child processes in simulation

* Update comments; remove debugging symbols; Add error handling for remote_kvs_cancelled

* Add comments, format files

* Change method name from isBuggifyDisabled to isStableConnection; decrease latency (0.1x) for stable connections

* Commit the IConnection interface change, forgot in previous commit

* Fix the issue that onClosed request is cancelled by ActorCollection

* Enable the remote kv store knob

* Remove FlowProcess.actor.cpp and move functions to RemoteIKeyValueStore.actor.cpp; Add remote kv store delay to avoid race; Bind the child process to die with the parent process

* Fix the bug where one process starts storage server more than once

* Add a please_reboot_remote_kv_store error to restart the storage server worker if remote kvs died abnormally

* Remove unreachable code path and add comments

* Clang format the code

* Fix a simple wait error

* Clang format after merging the main branch

* Testing mixed mode in simulation if remote_kvs knob is enabled, setting the default to false

* Disable remote kvs for PhysicalShardMove which is for RocksDB

* Cleanup #include orders, remove debugging traces

* Revert the reorder in fdbserver.actor.cpp, which fails the gcc build

Co-authored-by: Lincoln <lincoln.xiao@snowflake.com>
2022-03-31 17:08:59 -07:00
Bharadwaj V.R
8ff3b7d8a2
Merge branch 'apple:main' into ssupdateb4registration 2022-03-31 16:12:06 -07:00
Xiaoxi Wang
c7d2f5fee2
Merge pull request #6739 from sfc-gh-jslocum/ddq_assert
fix destination limiting and cancelling logic in move_to_removed_serv…
2022-03-31 14:22:09 -07:00
Tao Lin
001909be08
Fixes for when getMappedRange cannot parse as tuple (#6665) 2022-03-31 14:06:45 -07:00
He Liu
966caadb3e
Merge pull request #6706 from kakaiu/Fix-block-cache-recreation-issue
Fix RocksDB Block Cache Recreation Problem
2022-03-31 13:50:15 -07:00
A.J. Beamon
dee1293da2
Merge pull request #6731 from sfc-gh-ajbeamon/fix-tenant-memory-leak
Use a TenantState object in the MVC implementation to help manage tenant lifetime
2022-03-31 13:35:20 -07:00
A.J. Beamon
b6a0eeda11
Merge pull request #6740 from sfc-gh-ajbeamon/docs-tenant-open-note
Add a note that opening a tenant does not check whether that tenant exists in the cluster
2022-03-31 13:35:04 -07:00
Josh Slocum
9e06881673 fix destination limiting and cancelling logic in move_to_removed_server case 2022-03-31 14:05:15 -05:00
A.J. Beamon
5469b57a2b Add a note that opening a tenant does not check whether that tenant exists in the cluster 2022-03-31 11:39:50 -07:00