98 Commits

Author SHA1 Message Date
chaoguang
b76baa9cc2 Update conflicting keys related constants 2020-04-03 16:08:57 -07:00
Andrew Noyes
289487559d Revert "Revert "Merge pull request #2257 from zjuLcg/report-conflicting-key""
This reverts commit 804fe1b22e0a5b7f0d8fe87fe86881bfe0928546.
2020-03-24 18:11:15 -07:00
Balachandar Namasivayam
804fe1b22e Revert "Merge pull request #2257 from zjuLcg/report-conflicting-key"
This reverts commit 648dc4a933e0f606de07e2b855c62a9ed828cd3a, reversing
changes made to 487d131b38d12f8e766985c77b49e60234acac9e.
2020-03-19 21:34:28 -07:00
chaoguang
0094293d50 add const vars 2020-03-11 23:11:49 -07:00
chaoguang
d1c56d3b57 add constant KeyRefs in SystemData 2020-03-11 12:25:50 -07:00
chaoguang
7a76e9556d Merge remote-tracking branch 'upstream/master' into report-conflicting-key 2020-03-04 11:24:39 -08:00
Meng Xu
a12a161fb3 Merge branch 'master' into mengxu/fast-restore-pipeline-PR 2020-02-18 14:49:52 -08:00
Jingyu Zhou
5a602f58e8 Start backup with a wait on all backup workers running
This wait is to make sure that backup workers are already saving mutations so
that no mutations are missed. The idea is that the CLI sets a "backupStartedKey"
in the database and waits for allWorkerStarted() key of the backup to be set.

Backup workers monitor changes to the "backupStartedKey" and start logging
mutations. Additionally, the backup worker for Tag(-2,0) checks whether all other
workers have started (i.e., their saved progress version is larger than the
backup's start version), and then sets the allWorkerStarted() key for the backup.
2020-01-31 19:29:09 -08:00
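The start-up handshake described in this commit can be modelled with a toy sketch. This is not the FoundationDB implementation: the dictionary stands in for the database, and the function names `cli_start_backup` and `coordinator_check` are hypothetical. Only the key names "backupStartedKey" and "allWorkerStarted" come from the commit message.

```python
# Toy model of the backup start handshake; a plain dict stands in for the database.
# All function names here are hypothetical, chosen for illustration only.

def cli_start_backup(db, start_version):
    """The CLI records the backup's start version under backupStartedKey."""
    db["backupStartedKey"] = start_version


def coordinator_check(db, saved_versions):
    """Role played by the worker for Tag(-2,0): set allWorkerStarted once every
    worker's saved progress version is larger than the backup's start version."""
    start_version = db.get("backupStartedKey")
    if start_version is None:
        return False  # no backup has been requested yet
    if all(v > start_version for v in saved_versions.values()):
        db["allWorkerStarted"] = True
    return db.get("allWorkerStarted", False)


db = {}
before_start = coordinator_check(db, {"w1": 120, "w2": 90})
cli_start_backup(db, 100)
one_worker_behind = coordinator_check(db, {"w1": 120, "w2": 90})
all_caught_up = coordinator_check(db, {"w1": 120, "w2": 101})
```

In this model the CLI would poll or watch "allWorkerStarted" and report the backup as started only once it is set.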
Meng Xu
16f9ec45bd Merge branch 'master' into mengxu/fast-restore-pipeline-PR 2020-01-23 20:15:21 -08:00
Jingyu Zhou
c08a192c75 Add a backup start key
If the backup key is not set, do not recruit backup workers for old epochs.
2020-01-22 19:42:13 -08:00
Jingyu Zhou
19d6a889ff Recruit backup workers for old epochs
If there are unfinished ranges in the old epochs, the new master will recruit
backup workers responsible for finishing these ranges. These workers remain in
the cluster until the next epoch, when they remove themselves.
2020-01-22 19:38:45 -08:00
Jingyu Zhou
41f0cf2bb5 Add decode function for backup progress 2020-01-22 19:38:45 -08:00
Jingyu Zhou
7da9f47f26 Enable pop from backup workers
This is still WIP as some edge cases can trigger test failure, most likely due
to not popping mutations by backup workers when epoch ends.
2020-01-22 19:38:45 -08:00
Meng Xu
bfbf2164c4 FastRestore:Applier buffer data for multiple batches 2020-01-17 17:01:01 -08:00
chaoguang
10719200c3 A hacky way to call the API through getRange("\xff\xff/conflicting_keys\<start_key>", "\xff\xff/conflicting_keys\<end_key>"). 2020-01-06 11:22:11 -08:00
negoyal
d46c7ded59 Merge remote-tracking branch 'origin/master' into storage-cache-subfeature1 2019-11-14 17:52:22 -08:00
negoyal
a4a0bf18f9 Merging with Master. 2019-11-12 13:01:29 -08:00
Meng Xu
58aa6711e4 FastRestore:ApplyToDB:BugFix:Serialize integer as bigEndian to ensure lexico order 2019-11-03 17:26:07 -08:00
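Why big-endian encoding matters for key order can be shown with a small sketch. It assumes 64-bit unsigned integers and uses Python's `struct` module. The helper names are hypothetical and unrelated to the FastRestore code.

```python
import struct


def encode_be(value: int) -> bytes:
    """Encode an unsigned 64-bit integer as big-endian bytes."""
    return struct.pack(">Q", value)


def encode_le(value: int) -> bytes:
    """Encode an unsigned 64-bit integer as little-endian bytes."""
    return struct.pack("<Q", value)


# Integers listed in ascending numeric order.
versions = [1, 255, 256, 65536]

be_keys = [encode_be(v) for v in versions]
le_keys = [encode_le(v) for v in versions]

# Byte strings compare lexicographically, as keys do in an ordered key-value store.
be_order_preserved = sorted(be_keys) == be_keys
le_order_preserved = sorted(le_keys) == le_keys
```

With big-endian encoding the most significant byte comes first, so byte-wise comparison agrees with numeric comparison. With little-endian encoding, 256 sorts before 1.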
Andrew Noyes
b7b5d2ead3 Remove several nonsensical const uses
These seem to be all the ones that clang's -Wignored-qualifiers
complains about
2019-10-26 14:30:34 -07:00
Andrew Noyes
de8921b660 Move RestoreWorkerInterface to fdbclient 2019-10-25 10:42:22 -07:00
Andrew Noyes
d4de608bb6 Fix OPEN_FOR_IDE build 2019-10-25 10:42:22 -07:00
Jon Fu
f4237ebfff Merge branch 'master' of https://github.com/apple/foundationdb into mark-ss-failed 2019-10-16 11:32:16 -07:00
Meng Xu
84b5a5525f FastRestore:Add restoreApplierKeys 2019-10-10 17:18:34 -07:00
Jon Fu
471e283128 Merge branch 'master' of https://github.com/apple/foundationdb into mark-ss-failed 2019-09-18 11:49:07 -07:00
Meng Xu
d160810662 FastRestore:Resolve review comments 2019-09-04 16:48:43 -07:00
Jon Fu
c908c6c1db added command to fdbcli and changes to SystemData and ManagementAPI 2019-08-27 14:39:43 -07:00
Meng Xu
7ff46e6772 Merge branch 'master' into mengxu/performant-restore-PR 2019-08-07 20:31:56 -07:00
Evan Tschannen
ba54508c47 code cleanup 2019-08-06 16:30:30 -07:00
Meng Xu
9cc832cfd6 FastRestore:Fix Mac and Windows compilation error 2019-08-02 14:33:08 -07:00
Meng Xu
3b54363780 FastRestore:Apply Clang-format 2019-08-01 18:09:12 -07:00
Meng Xu
7ccaeddf05 Merge branch 'master' into mengxu/performant-restore-PR 2019-08-01 13:23:17 -07:00
Xin Dong
1922c39377 Resolve review comments. 100K run shows one suspicious ASSERT_WE_THINK failure which I think could be a race. 2019-07-30 22:24:30 -07:00
Xin Dong
ae11efcb0a Made following changes:
- Make sure the disabled data distribution won't be accidentally enabled by the 'maintenance' command
- Make sure the status json reflects the status of DD accordingly
- Make sure the CLI can play with the new DD states correctly, i.e. print out warnings when necessary
2019-07-30 22:20:45 -07:00
Xin Dong
4ecfc9830f Added finer grained controls to DataDistribution in fdbcli. What's happening under the hood is:
- Use pre-existing 'healthZone' key and write a special value to it in order to disable DD for all storage server failures
- Use a new system key 'rebalanceDDIgnored' to disable/enable DD for all rebalance reasons (MountainChopper and ValleyFiller)

Kicked off two 200K correctness and showed no related errors.
2019-07-30 22:17:21 -07:00
Meng Xu
b0c31f28af FastRestore:Fix bug that blocks restore
1) Should recruit only configured number of roles;
2) Should never register a restore master interface as a restore worker (loader or applier) interface.
2019-07-25 17:55:37 -07:00
Meng Xu
45083edf74 Merge branch 'master' into mengxu/performant-restore-PR
Fix conflicts as well.
2019-07-25 10:46:11 -07:00
Balachandar Namasivayam
7489f83a7f Disable/Re-enable consistency check through a database key.
fdbcli has a new command 'consistencycheck' to disable/re-enable consistency check.
cluster_healthy metric in status becomes false if consistencycheck is disabled.
2019-06-20 21:38:45 -07:00
Meng Xu
022b555b69 FastRestore:Fix bug in finish restore
RestoreMaster may not receive all acks for the last command, i.e., finishRestore,
because RestoreLoaders and RestoreAppliers exit immediately after sending the ack.
If the ack is lost, it will not be resent.

This commit also removes some unneeded code.
This commit passes 50k random tests without errors.
2019-06-05 20:07:18 -07:00
Meng Xu
477fd152c0 FastRestore:Refactor code
1) Use the runRYWTransaction for simple DB access
2) Replace some printf with TraceEvent
3) Remove printf not used in debugging
4) Avoid wait inside the condition in loop-choose-when for
   the core routine of restore worker, loader and applier.
5) Rename Restore.actor.cpp to RestoreWorker.actor.cpp since
   the file only has functionalities related to restore worker.

Passed correctness test
2019-06-04 11:22:47 -07:00
sramamoorthy
4083af0b01 Avoid using trackLatest for TLog pop test cases 2019-05-28 22:07:46 -07:00
sramamoorthy
69edefe68b Snapshot based backup and restore implementation 2019-05-28 22:07:46 -07:00
Evan Tschannen
b451c2cd56
Merge pull request #1497 from alexmiller-apple/fastrecovery
Add an \xff keyrange that is backed by the txnStateStore.
2019-05-23 10:52:35 -07:00
Meng Xu
fac63a83c4 FastRestore:Use NotifiedVersion to deduplicate requests
Add a NotifiedVersion into an applier data which represents
the smallest version the applier is at.

When a loader sends mutation vector to appliers, it sends
the request that contains prevVersion and commitVersion.

This commit also puts actors into an actorCollector for
loop-choose-when situations.
2019-05-22 22:09:54 -07:00
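The deduplication idea in this commit can be sketched as a simplified synchronous model. It is not the actual FastRestore code. The `Applier` class and its `handle` method are hypothetical, and the integer field `version` stands in for the NotifiedVersion. In the real actor code, a request that arrives early would presumably wait rather than be rejected.

```python
class Applier:
    """Toy applier that tracks the version it has applied up to."""

    def __init__(self):
        self.version = 0   # stands in for the NotifiedVersion
        self.applied = []  # mutations applied so far, in order

    def handle(self, prev_version, commit_version, mutations):
        """Apply a batch only if it directly follows the current version."""
        if commit_version <= self.version:
            return "duplicate"     # already applied; just acknowledge again
        if prev_version != self.version:
            return "out_of_order"  # a real actor would wait for prev_version
        self.applied.extend(mutations)
        self.version = commit_version
        return "applied"


applier = Applier()
first = applier.handle(0, 10, ["m1"])
retry = applier.handle(0, 10, ["m1"])    # resent request is ignored
early = applier.handle(20, 30, ["m3"])   # gap: version 20 not reached yet
second = applier.handle(10, 20, ["m2"])
```

Because each request names both its predecessor (prevVersion) and its own commitVersion, a resent request is recognised by version alone, without tracking request identifiers.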
Meng Xu
f235bb7e0d FastRestore:Use readVersion to trigger watch
Use readVersion to trigger watch on the restoreRequestTriggerKey and
restoreRequestDoneKey.
2019-05-22 13:20:59 -07:00
Evan Tschannen
f3897238f8 added the ability to add a read conflict range on the metadata version key without the READ_SYSTEM_KEYS option 2019-05-15 10:13:38 -07:00
Meng Xu
a08a6776f5 FastRestore: Refactor to smaller components
The current code uses one restore interface to handle the work
for all restore roles, i.e., master, loader and applier.
This makes it harder to review or maintain or scale.

This commit splits the restore into multiple roles by mimicking the FDB
transaction system:
1) It uses a RestoreWorker as the process to host restore roles;
   This commit assumes one restore role per RestoreWorker; but
   it should be easy to extend to support multiple roles per RestoreWorker;
2) It creates 3 restore roles:
   RestoreMaster: Coordinate the restore process and send commands to the other two roles;
   RestoreLoader: Parse backup files to mutations and send mutations to appliers;
   RestoreApplier: Sort received mutations and apply them to DB in order.

Compilable version. To be tested in correctness.
2019-05-10 14:20:06 -07:00
Meng Xu
25c75f4222 FastRestore: Add new empty files for restore roles
Add .h and .cpp files for RestoreLoader and RestoreApplier roles.
We will split the code for each restore role into a separate file.

This commit also fixes the bug in including RestoreCommon.actor.h, and
removes the unused code.
2019-05-06 16:59:41 -07:00
Alex Miller
797d431934 Add an \xff keyrange that is backed by the txnStateStore. 2019-04-25 17:04:20 -07:00
Meng Xu
c4a8a80d6f Merge branch 'apple/master' into mengxu/performant-restore-PR 2019-04-04 22:51:00 -07:00
Meng Xu
eb1e880fef FastRestore: Rename RestoreCommandInterface
Rename it to RestoreInterface.
The new name is more general because we will have a different type of
RequestStream for each type of command.
2019-04-04 13:52:24 -07:00