447 Commits

Author SHA1 Message Date
A.J. Beamon
ff23d5994e
Merge pull request #7729 from sfc-gh-ajbeamon/feature-metacluster
Metacluster
2022-08-04 15:29:44 -07:00
A.J. Beamon
fbe1a4a69a Use multiple databases in the metacluster managemen test. Fix a test bug as well as some issues with setting up multiple extra databases. 2022-08-03 19:10:34 -07:00
He Liu
fa418fd784
Change SHARD_ENCODE_LOCATION_METADATA to a server knob. (#7770)
Co-authored-by: He Liu <heliu@apple.com>
2022-08-03 13:51:40 -07:00
A.J. Beamon
e8e4f3ad3a Merge branch 'main' into feature-metacluster
# Conflicts:
#	fdbclient/include/fdbclient/Tenant.h
2022-07-28 16:53:29 -07:00
He Liu
35a4cb91d5
Disable ShardedRocksDB in simulation when shard_encode_location_metadata is disabled (#7726)
* Disabled tests for ShardedRocks.

Cleaned up ShardedRocks TraceEvent.

Added assertion in ShardManager::validate().

* Added test trace.

* Make sure TraceEvent contains `ShardedRocks`.

* Exclude ShardedRocksDB when SHARD_ENCODE_LOCATION_METADATA is disabled.

Co-authored-by: He Liu <heliu@apple.com>
2022-07-28 13:54:29 -07:00
Junhyun Shim
c6342a6e5b
Merge branch 'main' into features/authz 2022-07-27 20:51:32 +02:00
A.J. Beamon
7c6b3fb0b8 Merge branch 'main' into feature-metacluster 2022-07-27 08:55:10 -07:00
He Liu
edbc373815
Get ShardedRocks ready for simulation test. (#7679)
* Disabled tests for ShardedRocks.

Cleaned up ShardedRocks TraceEvent.

Added assertion in ShardManager::validate().

* Added test trace.

* Make sure TraceEvent contains `ShardedRocks`.

Co-authored-by: He Liu <heliu@apple.com>
2022-07-26 21:49:33 -07:00
Junhyun Shim
e2a3fedfc7
Merge branch 'main' into features/authz 2022-07-27 00:08:57 +02:00
A.J. Beamon
978ca7fb6f Fix some merge related issues 2022-07-20 12:56:00 -07:00
A.J. Beamon
279296c29f Merge branch 'tenant-metadata-change' into feature-metacluster
# Conflicts:
#	fdbclient/SystemData.cpp
#	fdbclient/Tenant.cpp
#	fdbclient/include/fdbclient/SystemData.h
#	fdbclient/include/fdbclient/Tenant.h
#	fdbclient/include/fdbclient/TenantManagement.actor.h
#	fdbserver/TenantCache.actor.cpp
#	fdbserver/storageserver.actor.cpp
#	fdbserver/workloads/TenantManagementWorkload.actor.cpp
2022-07-20 09:18:27 -07:00
Josh Slocum
fd9201f60b Merge branch 'main' into cf_metadata_rewrite 2022-07-20 07:55:00 -05:00
Markus Pilman
1de37afd52
Make TEST macros C++ only (#7558)
* proof of concept

* use code-probe instead of test

* code probe working on gcc

* code probe implemented

* renamed TestProbe to CodeProbe

* fixed refactoring typo

* support filtered output

* print probes at end of simulation

* fix missed probes print

* fix deduplication

* Fix refactoring issues

* revert bad refactor

* make sure file paths are relative

* fix more wrong refactor changes
2022-07-19 13:15:51 -07:00
Josh Slocum
b9317bebca Reducing ss injection rates 2022-07-19 14:10:47 -05:00
Josh Slocum
78d4d85f3b Adding non-tss delay injection to SS as well 2022-07-19 09:59:14 -05:00
Josh Slocum
0d9bb9f4a5 Added targeted storage server restarts at critical metadata points 2022-07-19 08:33:43 -05:00
A.J. Beamon
860d3843cc Merge remote-tracking branch 'origin/feature-tenant-groups' into feature-metacluster 2022-07-16 19:33:26 -07:00
Yi Wu
7d7ce0909f
Restart tests carry forward encryption knobs value (#7497)
Previously to get around the issue that EKP is not present when restart test switching encryption from on to off and read encrypted data, EKP was made to start in simulation regardless of encryption knob. This PR revert that change, and instead force restart test not to change encryption knob, by passing previous encryption knob through restartInfo.ini file. Also since we don't allow downgrading an encrypted cluster to previous version, disable encryption in downgrade tests.

Also adding an assert to allow reading encrypted mutations only if encryption knob is on. We may reconsider allowing switching encryption on/off for existing cluster, but for now we don't allow it.
2022-07-14 14:45:17 -07:00
A.J. Beamon
cb499fa5db Merge branch 'main' into feature-metacluster 2022-07-13 15:28:34 -07:00
Junhyun Shim
ac5436a090 Catch up on changes to main 2022-07-12 16:48:09 +02:00
Markus Pilman
2edbcf2c65
Merge pull request #44 from apple/main
Merge main
2022-07-12 07:51:22 -06:00
Junhyun Shim
545a9a8043 Fix bugs and add token timeout-in-cache test 2022-07-11 16:58:04 +02:00
He Liu
bc5bfaffda
Shard based move (#6981)
* Shard based move.

* Clean up.

* Clear results on retry in getInitialDataDistribution.

* Remove assertion on SHARD_ENCODE_LOCATION_METADATA for compatibility.

* Resolved comments.

Co-authored-by: He Liu <heliu@apple.com>
2022-07-07 20:49:16 -07:00
A.J. Beamon
e1a93988ef Merge branch 'main' into feature-metacluster 2022-06-28 14:58:07 -07:00
Markus Pilman
3aaae9c521 Merge remote-tracking branch 'origin/main' into features/authz 2022-06-27 11:07:14 -06:00
Markus Pilman
a47ed89018 Linux fixes and addressed review comments 2022-06-23 20:52:13 -06:00
Markus Pilman
38e100ebc5 flow bindings are compiling 2022-06-23 19:06:05 -06:00
Markus Pilman
ffaf15c12a moved wellknownendpoints and fixed some includes 2022-06-23 17:03:53 -06:00
A.J. Beamon
9f3819752f Change the command to create a metacluster from using 'configure tenant_mode=management' to 'metacluster create <NAME>'. Distribute this name to all processes in a metacluster. Eliminate the tenant mode entirely from metacluster clusters, instead relying on a metacluster registration key. 2022-06-22 12:15:43 -07:00
Markus Pilman
13d8b13722 Migrated Authz code to use JWT 2022-06-13 18:20:27 -06:00
Markus Pilman
799fe32346 Merge remote-tracking branch 'origin/main' into features/authz 2022-06-13 18:02:11 -06:00
A.J. Beamon
986dd67278 Add some basic support for running multiple extra clusters in simulation. Use this to simulate a metacluster in some tests. 2022-06-10 10:08:18 -07:00
Josh Slocum
ad54be5d0b Enabling tenant tests for blob granules and fixing bug 2022-05-25 17:16:56 -05:00
A.J. Beamon
7ba2be6f30 Update simulator to sometimes create tenants it doesn't use. Fix bug in mapped range where it would fail if any tenants existed and transactions were run not using the tenants. 2022-05-05 10:03:30 -07:00
Steve Atherton
338d2304d7
Bug fix: Killing a machine process would not wait for AsyncFileNonDurable close operations to finish, causing a later reopen of the same file in a new process to hang forever. Renamed AsyncFileNonDurable::deleteFile to closeFile for clarity. Renamed Machine deletingFiles to deletingOrClosingFiles for clarity. (#7007) 2022-04-29 14:01:18 -07:00
Renxuan Wang
c69a07a858
Check in the new Hostname logic. (#6926)
* Revert #6655.

20220407-031010-renxuan-c101052c21da8346           compressed=True data_size=31004844 duration=4310801 ended=100000 fail_fast=10 max_runs=100000 pass=100000 priority=100 remaining=0 runtime=1:04:15 sanity=False started=100047 stopped=20220407-041425 submitted=20220407-031010 timeout=5400 username=renxuan

* Revert #6271.

20220407-051532-renxuan-470f0fe6aac1c217           compressed=True data_size=30982370 duration=3491067 ended=100002 fail_fast=10 max_runs=100000 pass=100002 priority=100 remaining=0 runtime=0:59:57 sanity=False started=100141 stopped=20220407-061529 submitted=20220407-051532 timeout=5400 username=renxuan

* Revert #6266.

Remove resolving-related functionalities in connection string. Connection string will be used for storing purpose only, and non-mutable.

20220407-175119-renxuan-55d30ee1a4b42c2f           compressed=True data_size=30970443 duration=5437659 ended=100000 fail_fast=10 max_runs=100000 pass=100000 priority=100 remaining=0 runtime=0:59:31 sanity=False started=100154 stopped=20220407-185050 submitted=20220407-175119 timeout=5400 username=renxuan

* Add hostname to coordinator interfaces.

* Turn on the new hostname logic.

* Add the corresponding change in config txns.

The most notable change is before calling basicLoadBalance(), we need to call tryInitializeRequestStream() to initialize request streams first.

Passed correctness tests.

* Return error when hostnames cannot be resolved in coordinators command.

* Minor fixes.
2022-04-27 21:54:13 -07:00
Renxuan Wang
ec95fb7e74 Disable hostname in simulations.
Hostnames are causing simulation failures. Turn on again in #6926.
2022-04-25 11:05:25 -07:00
Markus Pilman
21336fbf12 Merge remote-tracking branch 'origin/main' into features/authz 2022-04-25 08:45:23 -06:00
Markus Pilman
ccf97eb187 Make transport work 2022-04-22 17:06:05 -06:00
Bharadwaj V.R
ed08cfbf52
Merge branch 'apple:main' into block-down 2022-04-22 06:19:38 -07:00
Zhe Wang
6c9ff6ee5e
Add sharded rocksdb type (#6862)
* add-sharded-rocksdb-type

* address comments

Co-authored-by: Zhe Wang <zhewang@Zhes-MacBook-Pro.local>
2022-04-21 22:53:14 -04:00
Markus Pilman
436d15ea98 Implemented token management and testing
Code is still untested at this point, but testing code was written

Implemented token management and testing

Code is still untested at this point, but testing code was written
2022-04-16 11:29:13 -06:00
Bharadwaj V.R
e4f22e5394
Merge branch 'apple:main' into block-down 2022-04-12 10:50:02 -07:00
Markus Pilman
bf956f5630 Merge remote-tracking branch 'origin/main' into features/private-request-streams 2022-04-07 13:29:27 -06:00
Markus Pilman
e2d7d4075d multiple bug fixes 2022-04-07 11:08:07 -06:00
Bharadwaj V.R
58a6442d1f Add tracing and fix the version write and update logic 2022-04-06 21:06:01 -07:00
Chaoguang Lin
7d365bd1bb
Remote ikvs debugging (#6465)
* initial structure for remote IKVS server

* moved struct to .h file, added new files to CMakeList

* happy path implementation, connection error when testing

* saved minor local change

* changed tracing to debug

* fixed onClosed and getError being called before init is finished

* fix spawn process bug, now use absolute path

* added server knob to set ikvs process port number

* added server knob for remote/local kv store

* implement simulator remote process spawning

* fixed bug for simulator timeout

* commit all changes

* removed print lines in trace

* added FlowProcess implementation by Markus

* initial debug of FlowProcess, stuck at parent sending OpenKVStoreRequest to child

* temporary fix for process factory throwing segfault on create

* specify public address in command

* change remote kv store knob to false for jenkins build

* made port 0 open random unused port

* change remote store knob to true for benchmark

* set listening port to randomly opened port

* added print lines for jenkins run open kv store timeout debug

* removed most tracing and print lines

* removed tutorial changes

* update handleIOErrors error handling to handle remote-ikvs cases

* Push all debugging changes

* A version where worker bug exists

* A version where restarting tests fail

* Use both the name and the port to determine the child process

* Remove unnecessary update on local address

* Disable remote-kvs for DiskFailureCycle test

* A version where restarting stuck

* A version where most restarting tests green

* Reset connection with child process explicitly

* Remove change on unnecessary files

* Unify flags from _ to -

* fix merging unexpected changes

* fix trac.error to .errorUnsuppressed

* Add license header

* Remove unnecessary header in FlowProcess.actor.cpp

* Fix Windows build

* Fix Windows build, add missing ;

* Fix a stupid bug caused by code dropped by code merging

* Disable remote kvs by default

* Pass the conn_file path to the flow process, though not needed, but the buildNetwork is difficult to tune

* serialization change on readrange

* Update traces

* Refactor the RemoteIKVS interface

* Format files

* Update sim2 interface to not clog connections between parent and child processes in simulation

* Update comments; remove debugging symbols; Add error handling for remote_kvs_cancelled

* Add comments, format files

* Change method name from isBuggifyDisabled to isStableConnection; Decrease(0.1x) latency for stable connections

* Commit the IConnection interface change, forgot in previous commit

* Fix the issue that onClosed request is cancelled by ActorCollection

* Enable the remote kv store knob

* Remove FlowProcess.actor.cpp and move functions to RemoteIKeyValueStore.actor.cpp; Add remote kv store delay to avoid race; Bind the child process to die with parent process

* Fix the bug where one process starts storage server more than once

* Add a please_reboot_remote_kv_store error to restart the storage server worker if remote kvs died abnormally

* Remove unreachable code path and add comments

* Clang format the code

* Fix a simple wait error

* Clang format after merging the main branch

* Testing mixed mode in simulation if remote_kvs knob is enabled, setting the default to false

* Disable remote kvs for PhysicalShardMove which is for RocksDB

* Cleanup #include orders, remove debugging traces

* Revert the reorder in fdbserver.actor.cpp, which fails the gcc build

Co-authored-by: “Lincoln <“lincoln.xiao@snowflake.com”>
2022-03-31 17:08:59 -07:00
He Liu
dd15489605 rename ssd-rocksdb-experimental as ssd-rocksdb-v1. 2022-03-29 10:53:38 -07:00
sfc-gh-tclinkenbeard
77786f4fc6 Merge remote-tracking branch 'origin/main' into change-data-hall 2022-03-27 12:44:05 -07:00
Josh Slocum
f27475e2f4 Merge branch 'main' into blob_integration 2022-03-22 11:41:58 -05:00