212 Commits

Author SHA1 Message Date
Jingyu Zhou
41f0cf2bb5 Add decode function for backup progress 2020-01-22 19:38:45 -08:00
Jingyu Zhou
7da9f47f26 Enable pop from backup workers
This is still WIP as some edge cases can trigger test failure, most likely due
to not popping mutations by backup workers when epoch ends.
2020-01-22 19:38:45 -08:00
Meng Xu
bfbf2164c4 FastRestore:Applier buffer data for multiple batches 2020-01-17 17:01:01 -08:00
mpilman
d3d6016c90 Merge remote-tracking branch 'negoyal/fdb_cache_subfeature2' into features/cache-initialization 2020-01-07 19:53:09 -08:00
negoyal
29b77863f0 Cache warmup and Consistency check workload changes. 2020-01-07 13:06:58 -08:00
chaoguang
10719200c3 A hack way to call API through getRange("\xff\xff/conflicting_keys\<start_key>", "\xff\xff/conflicting_keys\<end_key>"). 2020-01-06 11:22:11 -08:00
mpilman
821edcb207 Register caches through keyspace
This also removes the old mechanism that registers them
through the serverDBInfo.

Caches do now self-recruit at startup
2019-12-06 13:28:44 -08:00
negoyal
cf2563f1c7 Mix of various things, a lot of which will change. 2019-12-05 17:10:32 -08:00
negoyal
d46c7ded59 Merge remote-tracking branch 'origin/master' into storage-cache-subfeature1 2019-11-14 17:52:22 -08:00
negoyal
a4a0bf18f9 Merging with Master. 2019-11-12 13:01:29 -08:00
Meng Xu
630c29d160 FastRestore:resolve review comments
1) wait on whenAtLeast;
2) Put BigEndian64 into the function call and the decoder to prevent
future people from making the same mistake.
2019-11-11 17:00:16 -08:00
Meng Xu
eb67886b75 FastRestore:Move comment to func definition
Resolve review comments.
2019-11-11 15:10:27 -08:00
Meng Xu
58aa6711e4 FastRestore:ApplyToDB:BugFix:Serialize integer as bigEndian to ensure lexico order 2019-11-03 17:26:07 -08:00
Andrew Noyes
b7b5d2ead3 Remove several nonsensical const uses
These seem to be all the ones that clang's -Wignored-qualifiers
complains about
2019-10-26 14:30:34 -07:00
Jon Fu
f4237ebfff Merge branch 'master' of https://github.com/apple/foundationdb into mark-ss-failed 2019-10-16 11:32:16 -07:00
Meng Xu
71509a5157 FastRestore:Applier:applyToDB:Clang format 2019-10-10 17:36:38 -07:00
Meng Xu
84b5a5525f FastRestore:Add restoreApplierKeys 2019-10-10 17:18:34 -07:00
Jon Fu
471e283128 Merge branch 'master' of https://github.com/apple/foundationdb into mark-ss-failed 2019-09-18 11:49:07 -07:00
Evan Tschannen
8fbd90e2f6
Merge pull request #1985 from xumengpanda/mengxu/storage-engine-switch-PR-v2
Graceful storage engine migration
2019-09-09 13:51:53 -07:00
Meng Xu
c2355f721e Merge branch 'master' into mengxu/performant-restore-PR 2019-09-04 17:11:42 -07:00
Meng Xu
d160810662 FastRestore:Resolve review comments 2019-09-04 16:48:43 -07:00
Jon Fu
c908c6c1db added command to fdbcli and changes to SystemData and ManagementAPI 2019-08-27 14:39:43 -07:00
Meng Xu
e6284684f0 StorageEngineSwitch:Always remove wrong storeType SS
In the old logic of switching storage engines, it marks a storage server
with wrong store type as undesired even though this can lead to no healthy team.

In the first version of the new storage engine switch, we mimic the same logic
of the old version.
2019-08-13 14:59:46 -07:00
Meng Xu
a588710376 StorageEngineSwitch:Graceful switch
When fdbcli change storeType for storage engines,
we switch the store type of storage servers one by one gracefully.
This avoids recruiting multiple storage servers on the same process,
which can cause OOM error.
2019-08-12 17:37:52 -07:00
Jingyu Zhou
4a63de16e9
Merge pull request #1945 from xumengpanda/mengxu/tLog-code-read-v2
Add comments to DiskQueue and tLog
2019-08-08 13:24:32 -07:00
Meng Xu
7ff46e6772 Merge branch 'master' into mengxu/performant-restore-PR 2019-08-07 20:31:56 -07:00
Evan Tschannen
ba54508c47 code cleanup 2019-08-06 16:30:30 -07:00
Meng Xu
3b54363780 FastRestore:Apply Clang-format 2019-08-01 18:09:12 -07:00
Meng Xu
c9c50ceff8 Comments:Add comments to DiskQueue
No functional change.
2019-08-01 15:20:01 -07:00
Meng Xu
7ccaeddf05 Merge branch 'master' into mengxu/performant-restore-PR 2019-08-01 13:23:17 -07:00
Xin Dong
1922c39377 Resolve review comments. 100K run shows one suspecious ASSERT_WE_THINK failure which I think could be a race. 2019-07-30 22:24:30 -07:00
Xin Dong
ae11efcb0a Made following changes:
- Make sure the disabled data distribution won't be accidentally enabled by the 'maintenance' command
- Make sure the status json reflects the status of DD accordingly
- Make sure the CLI can play with the new DD states correctly, i.e. print out warns when necessary
2019-07-30 22:20:45 -07:00
Xin Dong
4ecfc9830f Added finer grained controls to DataDistribution in fdbcli. What's happening under the hood is:
- Use pre-existing 'healthZone' key and write a special value to it in order to disable DD for all storage server failures
- Use a new system key 'rebalanceDDIgnored' key to disable/enable DD for all rebalance reasons(MountainChopper and ValleyFiller)

Kicked off two 200K correctness and showed no related errors.
2019-07-30 22:17:21 -07:00
Meng Xu
b0c31f28af FastRestore:Fix bug that blocks restore
1) Should recruit only configured number of roles;
2) Should never register a restore master interface as a restore worker (loader or applier) interface.
2019-07-25 17:55:37 -07:00
Meng Xu
45083edf74 Merge branch 'master' into mengxu/performant-restore-PR
Fix conflicts as well.
2019-07-25 10:46:11 -07:00
Evan Tschannen
94c66f8d58
Merge pull request #1738 from bnamasivayam/consistency-check-disable
Disable/Re-enable consistency check through a database key.
2019-07-18 10:56:02 -07:00
Balachandar Namasivayam
7489f83a7f Disable/Re-enable consistency check through a database key.
fdbcli has a new command 'consistencycheck' to disable/re-enable consistency check.
cluster_healthy metric in status becomes false if consistencycheck is disabled.
2019-06-20 21:38:45 -07:00
mpilman
8576665a90 Revert "Revert "Make protocol version a type""
This reverts commit 455bf3b3ec9d5a347b68bf4fa89bf042f5ac312e.
2019-06-18 14:49:04 -07:00
Alex Miller
455bf3b3ec Revert "Make protocol version a type" 2019-06-18 10:59:17 -07:00
mpilman
da53a92bec Make protocol version a type
This fixes #1214

The basic idea is that ProtocolVersion is now its own type. This
alone is an improvement as it makes many things more typesafe. For
each version, we can now add breaking features (for example Fearless).
After that, there's no need to test against actual (confusing) version
numbers. Instead a developer can simply test
`protocolVersion->hasFearless()` and this will return true iff the
protocolVersion is newer than the newest version that didn't support
fearless.
2019-06-16 09:59:15 -07:00
Meng Xu
022b555b69 FastRestore:Fix bug in finish restore
RestoreMaster may not receive all acks. for the last command, i.e., finishRestore,
because RestoreLoaders and RestoreAppliers exit immediately after sending the ack.
If the ack is lost, it will not be resent.

This commit also removes some unneeded code.
This commit passes 50k random tests without errors.
2019-06-05 20:07:18 -07:00
Meng Xu
477fd152c0 FastRestore:Refactor code
1) Use the runRYWTransaction for simple DB access
2) Replace some printf with TraceEvent
3) Remove printf not used in debugging
4) Avoid wait inside the condition in loop-choose-when for
   the core routine of restore worker, loader and applier.
5) Rename Restore.actor.cpp to RestoreWorker.actor.cpp since
   the file only has functionalities related to restore worker.

Passed correctness test
2019-06-04 11:22:47 -07:00
Meng Xu
67f5c8b493 FastRestore:Remove performance status
Remove the non-functional code to reduce the code review size.
2019-05-30 20:24:40 -07:00
sramamoorthy
4083af0b01 Avoid using trackLatest for TLog pop test cases 2019-05-28 22:07:46 -07:00
sramamoorthy
61e93a9304 Address review comments and minor fixes 2019-05-28 22:07:46 -07:00
sramamoorthy
898bed66c1 Allow only whitelisted binary path for exec op 2019-05-28 22:07:46 -07:00
sramamoorthy
69edefe68b Snapshot based backup and resotre implementation 2019-05-28 22:07:46 -07:00
Evan Tschannen
b451c2cd56
Merge pull request #1497 from alexmiller-apple/fastrecovery
Add an \xff keyrange that is backed by the txnStateStore.
2019-05-23 10:52:35 -07:00
Meng Xu
f235bb7e0d FastRestore:Use readVersion to trigger watch
Use readVersion to trigger watch on the restoreRequestTriggerKey and
restoreRequestDoneKey.
2019-05-22 13:20:59 -07:00
Evan Tschannen
f4fbaac6b0 Merge branch 'release-6.1'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	versions.target
2019-05-19 10:27:59 -07:00