3230 Commits

Author SHA1 Message Date
Meng Xu
4ac92d223b Cleanup batch buffer for each restore request 2020-01-21 14:49:36 -08:00
Meng Xu
e933716109 FastRestore:Enable multiple batch pipelining 2020-01-17 17:01:09 -08:00
Meng Xu
1a130b0df3 FastRestore:Fix race condition on handleApplyToDBRequest 2020-01-17 17:01:09 -08:00
Meng Xu
8d3f3aa926 FastRestore:Pipeline multiple version batches 2020-01-17 17:01:06 -08:00
Meng Xu
441f3e2814 FastRestore:Master buffer data and progress for each batch 2020-01-17 17:01:06 -08:00
Meng Xu
d69bd2f661 FastRestore:Loader buffer data for multiple batches 2020-01-17 17:01:06 -08:00
Meng Xu
bfbf2164c4 FastRestore:Applier buffer data for multiple batches 2020-01-17 17:01:01 -08:00
Meng Xu
35bc92b9a4 FastRestore:Refactor code to enable pipeline on Applier 2020-01-14 13:23:33 -08:00
Meng Xu
f436ea806e FastRestore:Resolve review comment
1) Sort logfiles by endVersion

2) Exit program early when restore will not succeed

3) Do not increase nextVersion unnecessarily when
calculating version batches.

4) Change assert condition that ensures progress in
calculating version batches.
2020-01-13 14:08:27 -08:00
Meng Xu
dba85d28fc FastRestore:Cosmetic revision 2020-01-08 10:53:53 -08:00
Meng Xu
83a572ae22 FastRestore:buildVersionBatches:remove unused variable 2020-01-07 18:24:23 -08:00
Meng Xu
a2b26906e8 FastRestore:Filter out empty files before distributing workload
and clean up unused code
2020-01-07 17:01:53 -08:00
Meng Xu
c29e380076 FastRestore:Remove prevVersion from LoadingParam 2020-01-07 14:59:17 -08:00
Meng Xu
9df02512ab FastRestore:Apply clang-format 2020-01-07 11:50:32 -08:00
Meng Xu
67e913c3d5 Change LoadingParam struct and endVersion definition
1) Remove endVersion field because it has been included in RestoreAsset;

2) Ensure endVersion in VersionBatch and RestoreAsset is always exclusive;

3) Revise ASSERT in loader and applier in situations when the dummy commit version
is endVersion, to avoid false positive ASSERT failures.
2020-01-07 11:48:03 -08:00
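Note on the exclusive endVersion above: a minimal sketch of what a half-open version range looks like, assuming the range is [beginVersion, endVersion). The class and method names below are hypothetical and are not taken from the FastRestore code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VersionRange:
    """Hypothetical stand-in for a version batch or a restore asset."""
    begin_version: int  # inclusive
    end_version: int    # exclusive (assumption: half-open range)

    def contains(self, version: int) -> bool:
        # A version belongs to the range only if it is strictly below end_version.
        return self.begin_version <= version < self.end_version


# Two adjacent ranges share the boundary value 100 without overlapping.
first = VersionRange(0, 100)
second = VersionRange(100, 200)

owners = [r for r in (first, second) if r.contains(100)]
```

With an exclusive end, the boundary version is owned by exactly one of two adjacent ranges, so nothing at the boundary is counted twice or dropped.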
Meng Xu
c3f8f3b445 FastRestore:Build VersionBatch less than threshold size 2020-01-07 11:46:56 -08:00
Jingyu Zhou
45e24fc6a1
Merge pull request #2493 from xumengpanda/mengxu/fast-restore-restoreAsset-PR
Performant restore [13/XX]: Introduce RestoreAsset to uniquely identify the backup block to restore
2019-12-23 16:31:15 -08:00
Meng Xu
c10035ba54 FastRestore:Use isInVersionRange based on code review 2019-12-23 15:01:27 -08:00
Meng Xu
8d6f511816 FastRestore:Resolve review comment
Filter out range mutations that do not overlap with the restore range.
Small changes on format.
2019-12-22 20:09:10 -08:00
Meng Xu
61b29de3ce FastRestore:Self code review
Clean up commented code;
Add sanity check.
2019-12-20 22:24:34 -08:00
Meng Xu
ddcf3fdd80 FastRestore:Apply clang format 2019-12-20 22:00:36 -08:00
Meng Xu
2cd1f0780a FastRestore:Split asset to subasset for async parsing files 2019-12-20 21:44:40 -08:00
Meng Xu
d888e3100b FastRestore:Applier:Add invariant 2019-12-20 19:34:28 -08:00
Meng Xu
e98b2a0d1c FastRestore:Introduce RestoreAsset 2019-12-20 18:00:10 -08:00
Jingyu Zhou
53d196070b
Merge pull request #2485 from xumengpanda/mengxu/change-StringRefReaderMX-PR-v2
Performant restore [12/XX add-on]: Rename StringRefReaderMX to BackupStringRefReader
2019-12-19 15:44:19 -08:00
Jingyu Zhou
db953cc275
Merge pull request #2481 from alexmiller-apple/dq-pop-fixes
Fix an issue where a peek starting from version 1 could crash a TLog
2019-12-19 13:18:28 -08:00
Meng Xu
ffc8f76710 FastRestore:Rename StringRefReaderMX to BackupStringRefReader 2019-12-19 11:49:37 -08:00
Alex Miller
f58507c830 Rename poppedLocationForVersion -> versionForPoppedLocation 2019-12-19 10:24:31 -08:00
Alex Miller
b5d82a74c3
Update fdbserver/TLogServer.actor.cpp
Co-Authored-By: Jingyu Zhou <jingyuzhou@gmail.com>
2019-12-19 10:20:52 -08:00
Alex Miller
b98107ccab
Update fdbserver/OldTLogServer_6_2.actor.cpp 2019-12-18 11:15:18 -08:00
Alex Miller
d8cbd495af Fix another pop + spill/dq-pop interleaving issue
This fixes an issue introduced in the previous patch, where pop would
immediately set `poppedLocationNeedsUpdate`, but setting the popped
version was now delayed.  This means that we could:

1. Run the spill loop and persist all popped versions
2. Receive a pop, and set the poppedLocationNeedsUpdate flag
3. Run the dq-pop loop, and clear the poppedLocationNeedsUpdate flag

and now when we update the persistentPopped version again, we won't have
the flag set for dq-pop to know that it needs to scan the spilled data
again for the minLocation.

We could more carefully update the flag, but instead, I've just
converted it into a version that's kept in sync purely in the dq-pop
loop, to remove shared state between pop and the dq-pop loop.
2019-12-17 23:15:48 -08:00
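Note on the interleaving above: a toy model of the three-step sequence, comparing a shared boolean flag with a version owned by the dq-pop loop. This is a sketch under simplifying assumptions and is not the TLog code; every class, method, and field name is hypothetical.

```python
class FlagBased:
    """pop() sets a shared flag; dq_pop() clears it (the racy design)."""

    def __init__(self):
        self.popped = 0            # latest popped version, in memory only
        self.persisted_popped = 0  # popped version written by the spill loop
        self.needs_update = False  # shared between pop() and dq_pop()
        self.scans = []            # persisted versions that dq_pop() scanned with

    def pop(self, version):
        self.popped = max(self.popped, version)
        self.needs_update = True

    def spill(self):
        self.persisted_popped = self.popped

    def dq_pop(self):
        if self.needs_update:
            self.needs_update = False
            self.scans.append(self.persisted_popped)


class VersionBased(FlagBased):
    """dq_pop() remembers the persisted version it last scanned with."""

    def __init__(self):
        super().__init__()
        self.scanned_for = 0  # touched only by dq_pop()

    def pop(self, version):
        self.popped = max(self.popped, version)  # no shared flag

    def dq_pop(self):
        if self.scanned_for != self.persisted_popped:
            self.scanned_for = self.persisted_popped
            self.scans.append(self.persisted_popped)


def run(tlog):
    tlog.pop(10)
    tlog.spill()
    tlog.dq_pop()   # both designs scan with persisted version 10
    tlog.pop(20)    # step 2: a pop arrives
    tlog.dq_pop()   # step 3: runs before version 20 is persisted
    tlog.spill()    # version 20 is persisted afterwards
    tlog.dq_pop()   # should rescan with version 20
    return tlog.scans


flag_scans = run(FlagBased())
version_scans = run(VersionBased())
```

In this model the flag-based design scans twice with version 10 and never with version 20, while the version-based design rescans once version 20 has been persisted.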
Alex Miller
b36062a509 DiskQueue should only pop based off of persisted popped tag versions
This commit is to fix a bug where popping a tag between
updatePersistentData and popDiskQueue can cause the TLog to recover to
an incorrect understanding of what data it has available.

The following series of events need to happen to trigger this bug:

    Tag 1:1 is popped to version 10
    updatePersistentData is run...
      updatePersistentPopped runs and persistentData stores 1:1 as popped to 10
      A mutation is spilled for 1:1 at version 11 at location 1000
      A mutation is spilled for 1:1 at version 21 at location 5000
    updatePersistentData finishes and commits the btree changes
    Tag 1:1 is popped to version 20
    popDiskQueue runs
      The btree is read for spilled mutations with version >=20
      The minimum location required for the disk queue is found to be location 5000
      The disk queue is popped to location 5000

    The TLog crashes

    The worker restarts, and reloads the TLog files from disk
    restorePersistentPopped restores tag 1:1 as having been popped to version 10
    Parallel peeks are received for tag 1:1 starting at version 0
      The first peek is less than the popped version, so we respond with no data, and an end version of 10
      The second peek starts at version 10, which is greater than the popped version
      The btree is read for spilled mutations, and we find that there is a mutation at version 11 at location 1000
      Location 1000 is read in the DiskQueue

The resulting page read at Location 1000 was popped pre-crash, and thus
might either (a) be corrupt or (b) have an incorrect sequence number.

The fix to this is to force popDiskQueue/updatePoppedLocation to use the
popped version that was persisted to disk, and not the most recently
popped version for the given tag.

This bug doesn't manifest in simulation, because we don't have any code
that peeks at a lower version than what has been popped.
2019-12-17 23:02:37 -08:00
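Note on the event series above: the same numbers worked through in a small sketch. It assumes the disk queue is popped to the smallest location still needed by spilled mutations at or above the popped version. The function and variable names are hypothetical, not the TLog implementation.

```python
# Spilled mutations for tag 1:1, as in the commit message: version -> location.
SPILLED = {11: 1000, 21: 5000}


def min_required_location(spilled, popped_version):
    """Smallest disk queue location still needed for versions >= popped_version."""
    locations = [loc for version, loc in spilled.items() if version >= popped_version]
    return min(locations) if locations else None


in_memory_popped = 20   # most recent pop, not yet persisted
persisted_popped = 10   # what restorePersistentPopped would see after a crash

# Before the fix: pop the disk queue based on the in-memory popped version.
buggy_pop_to = min_required_location(SPILLED, in_memory_popped)

# After the fix: pop the disk queue based on the persisted popped version.
fixed_pop_to = min_required_location(SPILLED, persisted_popped)

# After recovery the tag is popped to 10 again, so a peek from version 10
# still needs the mutation at version 11.
needed_after_recovery = min_required_location(SPILLED, persisted_popped)
```

Here the buggy choice pops the queue to location 5000 although location 1000 is still needed after recovery. Using the persisted version keeps the pop at location 1000.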
Jingyu Zhou
ded2a301e0
Merge pull request #2443 from xumengpanda/mengxu/fast-restore-fix-valgrind-PR
Performant restore [12/XX]: Code clean up
2019-12-13 14:35:20 -08:00
Meng Xu
97030d9168 FastRestore:Revise and test SevFRMutationInfo
Enabled SevFRMutationInfo for the valgrind test, found no errors, and disabled it again.
Revised the debug trace message a bit.
2019-12-13 13:51:21 -08:00
Meng Xu
650be617f1 FastRestore:Add tests to CMakefile 2019-12-12 10:32:13 -08:00
Meng Xu
b5d7890ce0 FastRestore:Resolve review comments 2019-12-12 07:45:30 -08:00
Alvin Moore
0373b1af91 Added missing braces 2019-12-12 07:36:19 -08:00
Alvin Moore
3bf971ba8b Merge branch 'release-6.2' of github.com:apple/foundationdb into release_6.2_merge
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbserver/storageserver.actor.cpp
2019-12-12 07:13:12 -08:00
Meng Xu
9670d64fbd FastRestore:Remove commented code 2019-12-11 16:48:40 -08:00
Meng Xu
1371db4cdc FastRestore:Self code review and cleanup
1. Review memory use cases and improve:
Ensure state variables are initialized and
change unnecessary state variables to plain variables.

2. Remove debug code that is no longer useful;

3. Mute verbose debug.
2019-12-11 16:37:33 -08:00
Meng Xu
9a6dabe47e Merge branch 'mengxu/fastrestore-code-cleanup-PR' into mengxu/fast-restore-fix-valgrind-PR 2019-12-10 20:05:35 -08:00
Meng Xu
feb2a8c70c FastRestore:Change RestoreSendMutationVectorVersionedRequest name
Change RestoreSendMutationVectorVersionedRequest to
RestoreSendVersionedMutationsRequest for better naming
2019-12-10 17:23:40 -08:00
Meng Xu
20a19978f9 FastRestore:LoadingParam cleanup 2019-12-10 17:20:44 -08:00
Andrew Noyes
56f1ff7ff6 Test client-side buggify in simulation 2019-12-09 12:55:23 -08:00
Meng Xu
e8dfc1c187 Replace pop_front(size) with new empty standalone obj 2019-12-06 23:16:49 -08:00
Meng Xu
4a66366a05 Use MutationsVec instead of VectorRef 2019-12-06 22:00:40 -08:00
Andrew Noyes
9188344d7b Update fdbserver/SkipList.cpp
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2019-12-05 15:44:43 -08:00
Andrew Noyes
46b675a719 Update fdbserver/SkipList.cpp
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2019-12-05 15:44:43 -08:00
Andrew Noyes
4263a17188 Change bitMask return type to wordType 2019-12-05 15:44:43 -08:00
Andrew Noyes
604351680b Corresponding fix for lowBits 2019-12-05 15:44:43 -08:00