303 Commits

Author SHA1 Message Date
negoyal
a4a0bf18f9 Merging with Master. 2019-11-12 13:01:29 -08:00
Meng Xu
630c29d160 FastRestore:resolve review comments
1) wait on whenAtLeast;
2) Put BigEndian64 into the function call and the decoder to prevent
future people from making the same mistake.
2019-11-11 17:00:16 -08:00
Meng Xu
eb67886b75 FastRestore:Move comment to func definition
Resolve review comments.
2019-11-11 15:10:27 -08:00
Meng Xu
58aa6711e4 FastRestore:ApplyToDB:BugFix:Serialize integer as bigEndian to ensure lexico order 2019-11-03 17:26:07 -08:00
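Note on the fix above: a minimal sketch of why big-endian serialization preserves numeric order under bytewise (lexicographic) key comparison. The helper names `encodeBigEndian64`, `encodeLittleEndian64`, and `decodeBigEndian64` are hypothetical and written for illustration only; they are not the functions touched by this commit.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helpers for illustration; these are not FoundationDB functions.

// Encode v as 8 bytes, most significant byte first (big-endian).
std::string encodeBigEndian64(uint64_t v) {
    std::string out(8, '\0');
    for (int i = 7; i >= 0; --i) {
        out[i] = static_cast<char>(v & 0xff);
        v >>= 8;
    }
    return out;
}

// Encode v as 8 bytes, least significant byte first (little-endian).
std::string encodeLittleEndian64(uint64_t v) {
    std::string out(8, '\0');
    for (int i = 0; i < 8; ++i) {
        out[i] = static_cast<char>(v & 0xff);
        v >>= 8;
    }
    return out;
}

// Decode an 8-byte big-endian string back into an integer.
uint64_t decodeBigEndian64(const std::string& s) {
    assert(s.size() == 8);
    uint64_t v = 0;
    for (unsigned char c : s) v = (v << 8) | c;
    return v;
}
```

With these helpers, the big-endian encoding of 255 compares less than that of 256, matching numeric order, while the little-endian encodings compare the other way around. This is presumably the kind of ordering mistake the commit guards against when integers are embedded in keys.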
Andrew Noyes
b7b5d2ead3 Remove several nonsensical const uses
These seem to be all the ones that clang's -Wignored-qualifiers
complains about
2019-10-26 14:30:34 -07:00
Jon Fu
f4237ebfff Merge branch 'master' of https://github.com/apple/foundationdb into mark-ss-failed 2019-10-16 11:32:16 -07:00
Meng Xu
71509a5157 FastRestore:Applier:applyToDB:Clang format 2019-10-10 17:36:38 -07:00
Meng Xu
84b5a5525f FastRestore:Add restoreApplierKeys 2019-10-10 17:18:34 -07:00
Jon Fu
471e283128 Merge branch 'master' of https://github.com/apple/foundationdb into mark-ss-failed 2019-09-18 11:49:07 -07:00
Evan Tschannen
8fbd90e2f6
Merge pull request #1985 from xumengpanda/mengxu/storage-engine-switch-PR-v2
Graceful storage engine migration
2019-09-09 13:51:53 -07:00
Meng Xu
c2355f721e Merge branch 'master' into mengxu/performant-restore-PR 2019-09-04 17:11:42 -07:00
Meng Xu
d160810662 FastRestore:Resolve review comments 2019-09-04 16:48:43 -07:00
Jon Fu
c908c6c1db added command to fdbcli and changes to SystemData and ManagementAPI 2019-08-27 14:39:43 -07:00
Meng Xu
e6284684f0 StorageEngineSwitch:Always remove wrong storeType SS
In the old logic of switching storage engines, it marks a storage server
with wrong store type as undesired even though this can lead to no healthy team.

In the first version of the new storage engine switch, we mimic the same logic
of the old version.
2019-08-13 14:59:46 -07:00
Meng Xu
a588710376 StorageEngineSwitch:Graceful switch
When fdbcli changes the storeType for storage engines,
we switch the store type of storage servers one by one gracefully.
This avoids recruiting multiple storage servers on the same process,
which can cause OOM error.
2019-08-12 17:37:52 -07:00
Jingyu Zhou
4a63de16e9
Merge pull request #1945 from xumengpanda/mengxu/tLog-code-read-v2
Add comments to DiskQueue and tLog
2019-08-08 13:24:32 -07:00
Meng Xu
7ff46e6772 Merge branch 'master' into mengxu/performant-restore-PR 2019-08-07 20:31:56 -07:00
Evan Tschannen
ba54508c47 code cleanup 2019-08-06 16:30:30 -07:00
Meng Xu
3b54363780 FastRestore:Apply Clang-format 2019-08-01 18:09:12 -07:00
Meng Xu
c9c50ceff8 Comments:Add comments to DiskQueue
No functional change.
2019-08-01 15:20:01 -07:00
Meng Xu
7ccaeddf05 Merge branch 'master' into mengxu/performant-restore-PR 2019-08-01 13:23:17 -07:00
Xin Dong
1922c39377 Resolve review comments. 100K run shows one suspicious ASSERT_WE_THINK failure which I think could be a race. 2019-07-30 22:24:30 -07:00
Xin Dong
ae11efcb0a Made following changes:
- Make sure the disabled data distribution won't be accidentally enabled by the 'maintenance' command
- Make sure the status json reflects the status of DD accordingly
- Make sure the CLI can play with the new DD states correctly, i.e. print out warnings when necessary
2019-07-30 22:20:45 -07:00
Xin Dong
4ecfc9830f Added finer grained controls to DataDistribution in fdbcli. What's happening under the hood is:
- Use pre-existing 'healthZone' key and write a special value to it in order to disable DD for all storage server failures
- Use a new system key 'rebalanceDDIgnored' to disable/enable DD for all rebalance reasons (MountainChopper and ValleyFiller)

Kicked off two 200K correctness and showed no related errors.
2019-07-30 22:17:21 -07:00
Meng Xu
b0c31f28af FastRestore:Fix bug that blocks restore
1) Should recruit only configured number of roles;
2) Should never register a restore master interface as a restore worker (loader or applier) interface.
2019-07-25 17:55:37 -07:00
Meng Xu
45083edf74 Merge branch 'master' into mengxu/performant-restore-PR
Fix conflicts as well.
2019-07-25 10:46:11 -07:00
Evan Tschannen
94c66f8d58
Merge pull request #1738 from bnamasivayam/consistency-check-disable
Disable/Re-enable consistency check through a database key.
2019-07-18 10:56:02 -07:00
Balachandar Namasivayam
7489f83a7f Disable/Re-enable consistency check through a database key.
fdbcli has a new command 'consistencycheck' to disable/re-enable consistency check.
cluster_healthy metric in status becomes false if consistencycheck is disabled.
2019-06-20 21:38:45 -07:00
mpilman
8576665a90 Revert "Revert "Make protocol version a type""
This reverts commit 455bf3b3ec9d5a347b68bf4fa89bf042f5ac312e.
2019-06-18 14:49:04 -07:00
Alex Miller
455bf3b3ec Revert "Make protocol version a type" 2019-06-18 10:59:17 -07:00
mpilman
da53a92bec Make protocol version a type
This fixes #1214

The basic idea is that ProtocolVersion is now its own type. This
alone is an improvement as it makes many things more typesafe. For
each version, we can now add breaking features (for example Fearless).
After that, there's no need to test against actual (confusing) version
numbers. Instead a developer can simply test
`protocolVersion->hasFearless()` and this will return true iff the
protocolVersion is newer than the newest version that didn't support
fearless.
2019-06-16 09:59:15 -07:00
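Note on the commit above: a minimal sketch of the idea of a typed protocol version with a feature predicate. The class layout and the constant `kLastVersionWithoutFearless` (set to an invented value of 100) are assumptions made for illustration; the actual FoundationDB definition and version numbers are not shown in this log.

```cpp
#include <cstdint>

// Hypothetical sketch; the threshold constant below is invented for illustration.
class ProtocolVersion {
public:
    constexpr explicit ProtocolVersion(uint64_t version) : version_(version) {}

    constexpr uint64_t version() const { return version_; }

    // True iff this version is newer than the newest version
    // that did not support the feature.
    constexpr bool hasFearless() const {
        return version_ > kLastVersionWithoutFearless;
    }

private:
    static constexpr uint64_t kLastVersionWithoutFearless = 100;
    uint64_t version_;
};
```

Because the constructor is `explicit`, a raw integer cannot be passed where a `ProtocolVersion` is expected, and callers ask `hasFearless()` rather than comparing against a version number directly.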
Meng Xu
022b555b69 FastRestore:Fix bug in finish restore
RestoreMaster may not receive all acks for the last command, i.e., finishRestore,
because RestoreLoaders and RestoreAppliers exit immediately after sending the ack.
If the ack is lost, it will not be resent.

This commit also removes some unneeded code.
This commit passes 50k random tests without errors.
2019-06-05 20:07:18 -07:00
Meng Xu
477fd152c0 FastRestore:Refactor code
1) Use the runRYWTransaction for simple DB access
2) Replace some printf with TraceEvent
3) Remove printf not used in debugging
4) Avoid wait inside the condition in loop-choose-when for
   the core routine of restore worker, loader and applier.
5) Rename Restore.actor.cpp to RestoreWorker.actor.cpp since
   the file only has functionalities related to restore worker.

Passed correctness test
2019-06-04 11:22:47 -07:00
Meng Xu
67f5c8b493 FastRestore:Remove performance status
Remove the non-functional code to reduce the code review size.
2019-05-30 20:24:40 -07:00
sramamoorthy
4083af0b01 Avoid using trackLatest for TLog pop test cases 2019-05-28 22:07:46 -07:00
sramamoorthy
61e93a9304 Address review comments and minor fixes 2019-05-28 22:07:46 -07:00
sramamoorthy
898bed66c1 Allow only whitelisted binary path for exec op 2019-05-28 22:07:46 -07:00
sramamoorthy
69edefe68b Snapshot based backup and restore implementation 2019-05-28 22:07:46 -07:00
Evan Tschannen
b451c2cd56
Merge pull request #1497 from alexmiller-apple/fastrecovery
Add an \xff keyrange that is backed by the txnStateStore.
2019-05-23 10:52:35 -07:00
Meng Xu
f235bb7e0d FastRestore:Use readVersion to trigger watch
Use readVersion to trigger watch on the restoreRequestTriggerKey and
restoreRequestDoneKey.
2019-05-22 13:20:59 -07:00
Evan Tschannen
f4fbaac6b0 Merge branch 'release-6.1'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	versions.target
2019-05-19 10:27:59 -07:00
Evan Tschannen
f3897238f8 added the ability to add a read conflict range on the metadata version key without the READ_SYSTEM_KEYS option 2019-05-15 10:13:38 -07:00
Meng Xu
a08a6776f5 FastRestore: Refactor to smaller components
The current code uses one restore interface to handle the work
for all restore roles, i.e., master, loader and applier.
This makes it harder to review or maintain or scale.

This commit splits the restore into multiple roles by mimicking FDB
transaction system:
1) It uses a RestoreWorker as the process to host restore roles;
   This commit assumes one restore role per RestoreWorker; but
   it should be easy to extend to support multiple roles per RestoreWorker;
2) It creates 3 restore roles:
   RestoreMaster: Coordinate the restore process and send commands to the other two roles;
   RestoreLoader: Parse backup files to mutations and send mutations to appliers;
   RestoreApplier: Sort received mutations and apply them to DB in order.

Compilable version. To be tested in correctness.
2019-05-10 14:20:06 -07:00
Alex Miller
797d431934 Add an \xff keyrange that is backed by the txnStateStore. 2019-04-25 17:04:20 -07:00
Meng Xu
529ce66b6c Merge branch 'apple/master' into mengxu/performant-restore-PR 2019-04-18 18:02:45 -07:00
Meng Xu
092a890da5 FastRestore: Fix MacOS compilation
The bug shown in MacOS compilation may also cause logic error
in the implementation, even in Linux.
2019-04-09 22:37:24 -07:00
mpilman
1c16f87a4e Remove trace-calls to printable (in non-workloads) 2019-04-05 13:12:19 -07:00
Meng Xu
c4a8a80d6f Merge branch 'apple/master' into mengxu/performant-restore-PR 2019-04-04 22:51:00 -07:00
Meng Xu
eb1e880fef FastRestore: Rename RestoreCommandInterface
Rename it to RestoreInterface.
The new name is more general because we will have different types of
RequestStreams for each type of commands.
2019-04-04 13:52:24 -07:00
Evan Tschannen
781cf9b5a0 added the ability to make a zoneId for maintenance in fdbcli 2019-04-01 17:55:13 -07:00