218 Commits

Author SHA1 Message Date
Andrew Noyes
6aa0ada7b1 Replace scalar root types with proper messages 2019-08-28 14:40:50 -07:00
Evan Tschannen
9382a58390 fix: after a forced recovery it is possible to not have logs from all generations, so only wait at most a second for getting a popped txs version 2019-08-06 16:32:28 -07:00
Evan Tschannen
4c9a392f05 the master checks the popped version of the txsTag before recovering the txnStateStore, to avoid restoring data that is later found to be popped 2019-08-05 17:01:48 -07:00
Evan Tschannen
653d9be6e2 we cannot pop old generations because it breaks forced recoveries 2019-07-31 18:27:36 -07:00
Evan Tschannen
1ea3ce8f9c txs pops also go to the old generations of tlogs to reduce the chance we have to restart txnStateStore recovery 2019-07-31 18:06:39 -07:00
Evan Tschannen
ff171e293e fix: always make sure to add txsTags to localTags for remote logs 2019-07-31 16:04:35 -07:00
Evan Tschannen
b5cb7919b6 fix: canDiscardPopped was not reset when necessary in all cases 2019-07-30 13:44:44 -07:00
Evan Tschannen
9e3ec2cb33 fix: when resetting the peekCursor, we cannot discard the popped data if the adapter has already processed data 2019-07-30 13:25:25 -07:00
Evan Tschannen
45f7b41b48 fix: multi-cursor could discard popped commits after already returning data 2019-07-29 21:36:42 -07:00
Evan Tschannen
5bb322b483 implement popped on bufferedCursor 2019-07-29 21:19:47 -07:00
Evan Tschannen
9a0db74230 fix: forced recovery did not copy txsTags properly 2019-07-28 19:31:53 -07:00
Evan Tschannen
28df2c35bb
Merge pull request #1855 from alexmiller-apple/sharded-txs-safe-upgrade
Make sharded txsTag upgradeable and downgradeable
2019-07-26 13:29:39 -07:00
sramamoorthy
9afd162e2f remove snap v1 related code 2019-07-25 17:29:31 -07:00
sramamoorthy
31c010b393 few minor fixes 2019-07-24 15:36:28 -07:00
sramamoorthy
33c2801944 adjust versions to handle KCV > recoveryVersion 2019-07-24 15:36:28 -07:00
Alex Miller
55258709a0 Remove an ASSERT from testing and now inaccurate comment. 2019-07-17 01:30:01 -07:00
Alex Miller
e9684a1f63 Fix issues configuring from sharded txs tag to not
Which is an intermingling of what should be two commits:

1. Rely on TLogVersion instead of txsTags==0

2. Copy and index sharded txsTags between KCV and RV as txsTag when
configuring log_version 4->3.
2019-07-17 01:25:09 -07:00
Alex Miller
95487861be Make sharded txsTag gated on TLogVersion::V4.
To allow a potential 6.2 -> 6.1 rollback.
2019-07-16 19:09:53 -07:00
Alex Miller
9396eedd11 Const some random functions that are trivially const.
For code hygiene reasons only.
2019-07-16 19:09:09 -07:00
Evan Tschannen
b2d8110c13
Update fdbserver/TagPartitionedLogSystem.actor.cpp
Co-Authored-By: Alex Miller <35046903+alexmiller-apple@users.noreply.github.com>
2019-07-12 18:16:44 -07:00
Evan Tschannen
a380dda5e8 fixed a typo 2019-07-10 18:41:12 -07:00
Evan Tschannen
d8948c8be1 Merge branch 'master' into feature-fast-txs-recovery
# Conflicts:
#	fdbserver/TagPartitionedLogSystem.actor.cpp
2019-07-10 13:59:52 -07:00
Evan Tschannen
49121172ea
Merge pull request #1795 from alexmiller-apple/peek-from-satellites
Log Routers will prefer to peek from satellite logs.
2019-07-09 17:38:57 -07:00
Alex Miller
44f11702a8 Log Routers will prefer to peek from satellite logs.
Formerly, they would prefer to peek from the primary's logs.  Testing of
a failed region rejoining the cluster revealed that this becomes quite a
strain on the primary logs when extremely large volumes of peek requests
are coming from the Log Routers.  It happens that we have satellites
that contain the same mutations with Log Router tags, that have no other
peeking load, so we can prefer to use the satellite to peek rather than
the primary to distribute load across TLogs better.

Unfortunately, this revealed a latent bug in how tagged mutations in the
KnownCommittedVersion->RecoveryVersion gap were copied across
generations when the number of log router tags was decreased.
Satellite TLogs would be assigned log router tags using the
team-building based logic in getPushLocations(), whereas TLogs would
internally re-index tags according to tag.id%logRouterTags.  This
mismatch would mean that we could have:

    Log0 -2:0 ----- -2:0  Log 0

    Log1 -2:1 \
               >--- -2:1,-2:0 (-2:2 mod 2 becomes -2:0)  Log 1
    Log2 -2:2 /

And now we have data that's tagged as -2:0 on a TLog that's not the
preferred location for -2:0, and therefore a BestLocationOnly cursor
would miss the mutations.

This was never noticed before, as we never
used a satellite as a preferred location to peek from.  Merge cursors
always peek from all locations, and thus a peek for -2:0 that needed
data from the satellites would have gone to both TLogs and merged the
results.

We now take this mod-based re-indexing into account when assigning which
TLogs need to recover which tags from the previous generation, to make
sure that tag.id%logRouterTags always results in the assigned TLog being
the preferred location.

Unfortunately, previously existing clusters may have
satellites with log router tags indexed incorrectly, so this transition
needs to be gated on a `log_version` transition.  Old LogSets will have
an old LogVersion, and we won't prefer the satellite for peeking.
LogSets post-6.2 (opt-in) or post-6.3 (default) will be indexed correctly,
and therefore we can safely offload peeking onto the satellites.
2019-07-08 22:25:01 -07:00
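The mismatch described in this commit can be illustrated with a small model. This is a sketch under stated assumptions, not the fix that landed. Tags are reduced to integer ids, and a tag's preferred location is modeled as `id % numTLogs`. The names `reindexedTagId`, `preferredTLog`, `recoveringTLog` and `assignOldTags` are hypothetical. The sketch shows one idea: the TLog chosen to recover an old tag should be derived from the id that tag will be re-indexed to (`tag.id % logRouterTags`), not from its old id.

```cpp
#include <cassert>
#include <vector>

// Illustrative model only: tags are reduced to their integer id, and the
// "preferred location" of a tag is modeled as id % numTLogs. Both are
// simplifying assumptions, not FoundationDB's getPushLocations() logic.

// How a TLog re-indexes an old-generation log router tag (-2:id) when the
// new generation has fewer log router tags.
int reindexedTagId(int oldTagId, int logRouterTags) {
    assert(logRouterTags > 0);
    return oldTagId % logRouterTags;
}

// Hypothetical preferred-location rule for a (new) log router tag id.
int preferredTLog(int tagId, int numTLogs) {
    assert(numTLogs > 0);
    return tagId % numTLogs;
}

// Recover each old tag on the TLog that will be the preferred location
// for the tag it is re-indexed to.
int recoveringTLog(int oldTagId, int logRouterTags, int numTLogs) {
    return preferredTLog(reindexedTagId(oldTagId, logRouterTags), numTLogs);
}

// For every old tag id in [0, oldLogRouterTags), the TLog that should
// recover it in the new generation.
std::vector<int> assignOldTags(int oldLogRouterTags, int logRouterTags, int numTLogs) {
    std::vector<int> assignment;
    assignment.reserve(oldLogRouterTags);
    for (int id = 0; id < oldLogRouterTags; ++id) {
        assignment.push_back(recoveringTLog(id, logRouterTags, numTLogs));
    }
    return assignment;
}
```

Consider the scenario from the diagram: three old tags, two new tags and two TLogs. Here `assignOldTags(3, 2, 2)` yields `{0, 1, 0}`, so -2:2 is recovered on Log 0, where a BestLocationOnly cursor for -2:0 would look. It is not placed on Log 1 alongside -2:1.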
Evan Tschannen
15e894c724 Merge in master 2019-07-05 15:49:24 -07:00
Alex Miller
ea6898144d Merge remote-tracking branch 'upstream/master' into flowlock-api 2019-07-03 20:44:15 -07:00
Evan Tschannen
79a90d33a7 fix: the push location for txs tags needs to be based on what the tag will become after changing the number of txs tags 2019-07-03 16:06:54 -07:00
Evan Tschannen
4e45a58750 fix: forced recovery did not copy the number of txsTags properly 2019-06-28 20:51:16 -07:00
Evan Tschannen
2c40c818cf fix: txsTags was not copied into oldLogData 2019-06-28 17:51:16 -07:00
Evan Tschannen
7f4586ad49 the number of txsTags needs to be tracked separately from the number of transaction logs because of forced recoveries 2019-06-28 12:33:24 -07:00
Evan Tschannen
2113d6d01e fix: peek all possible txsTags which could have been used by old log sets 2019-06-27 23:39:19 -07:00
Evan Tschannen
52efcfd136 fix: properly create the right number for txsTags when changing between different numbers of logs 2019-06-27 15:15:05 -07:00
sramamoorthy
0a94f96dee sev40 if knownCommittedVersion > recoveryVersion 2019-06-25 16:17:45 -07:00
Alex Miller
7a500cd37f A giant translation of TaskFooPriority -> TaskPriority::Foo
This is so that APIs that take priorities don't take ints, which are
common and easy to accidentally pass the wrong thing.
2019-06-25 02:47:35 -07:00
Evan Tschannen
e0be631414 shard the txs tag so that more transaction logs are involved in its recovery 2019-06-19 18:15:09 -07:00
sramamoorthy
17ecba8313 trace cleanup and other indentation changes 2019-05-28 22:07:46 -07:00
sramamoorthy
4bc4c615da exec op to all tlog, restore change in test &other
- exec operation to go to all the TLogs
- minor bug fix in tlog
- restore implementation for the simulator
- restore snap UID to be stored in restartInfo.ini
- test cases added
- indentation and trace file fixes
2019-05-28 22:07:46 -07:00
sramamoorthy
69edefe68b Snapshot-based backup and restore implementation 2019-05-28 22:07:46 -07:00
A.J. Beamon
603721e125 Merge branch 'master' into thread-safe-random-number-generation
# Conflicts:
#	fdbclient/ManagementAPI.actor.cpp
#	fdbrpc/AsyncFileCached.actor.h
#	fdbrpc/genericactors.actor.cpp
#	fdbrpc/sim2.actor.cpp
#	fdbserver/DiskQueue.actor.cpp
#	fdbserver/workloads/BulkSetup.actor.h
#	flow/ActorCollection.actor.cpp
#	flow/Net2.actor.cpp
#	flow/Trace.cpp
#	flow/flow.cpp
2019-05-23 08:35:47 -07:00
Jingyu Zhou
b8e7fc1b84 Refactor: add std:: qualifier and use emplace_back 2019-05-17 09:38:50 -10:00
A.J. Beamon
5f55f3f613 Replace g_random and g_nondeterministic_random with functions deterministicRandom() and nondeterministicRandom() that return thread_local random number generators. Delete g_debug_random and trace_random. Allow only deterministicRandom() to be seeded, and require it to be seeded from each thread on which it is used. 2019-05-10 14:01:52 -07:00
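The following rough sketch shows the thread_local pattern this commit describes, written against the standard library only. The names `seedDeterministicRandomSketch`, `deterministicRandomSketch` and `nondeterministicRandomSketch` are hypothetical. `std::mt19937_64` stands in for whatever generator FoundationDB actually uses. The sketch only illustrates the shape of the design:
- each thread gets its own generator;
- only the deterministic generator can be seeded;
- an assertion fires if the deterministic generator is used on a thread that never seeded it.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <random>

// Hypothetical sketch of the pattern in the commit message; these names are
// illustrative and are not FoundationDB's IRandom API.

// One deterministic generator per thread, created only when seeded.
thread_local std::unique_ptr<std::mt19937_64> tlsDeterministic;

// Seeding is allowed only for the deterministic generator, and must be done
// on each thread that uses it.
void seedDeterministicRandomSketch(uint64_t seed) {
    tlsDeterministic = std::make_unique<std::mt19937_64>(seed);
}

bool deterministicRandomIsSeeded() { return tlsDeterministic != nullptr; }

std::mt19937_64& deterministicRandomSketch() {
    // Using the generator on a thread that never seeded it is a bug.
    assert(tlsDeterministic && "seed the deterministic generator on this thread first");
    return *tlsDeterministic;
}

// The nondeterministic generator seeds itself from std::random_device and
// exposes no seeding function.
std::mt19937_64& nondeterministicRandomSketch() {
    thread_local std::mt19937_64 gen{std::random_device{}()};
    return gen;
}
```

Re-seeding with the same value on the same thread reproduces the same sequence. This reproducibility is what a deterministic simulation relies on.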
Jingyu Zhou
8b5449e608 Fix review comments for PR #1473 2019-04-29 16:45:42 -07:00
Jingyu Zhou
5462f560e7 Add pseudo locality for log routers and tlogs
This changes the logic of pop operations from log routers (LG):
- LG pops tagLocalityLogRouterMapped from TLogs;
- TLog converts tagLocalityLogRouterMapped back to tagLocalityLogRouter before
  popping.

Later, when we add more pseudo localities, the same pattern can be used.
2019-04-23 21:35:56 -07:00
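The pop-path mapping in this commit can be sketched as a pair of conversion functions. The names `SketchTag`, `toPopTag` and `fromPopTag` are hypothetical. The log router locality is shown as -2 to match the `-2:N` tags used elsewhere in this log. The numeric value of the mapped pseudo locality is an assumption chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative stand-ins for tagLocalityLogRouter and
// tagLocalityLogRouterMapped; the mapped value is a hypothetical choice.
constexpr int8_t kLocalityLogRouter = -2;
constexpr int8_t kLocalityLogRouterMapped = -100;  // hypothetical value

struct SketchTag {
    int8_t locality;
    uint16_t id;
};

// Log router side: pop requests carry the pseudo locality instead of the
// real log router locality.
SketchTag toPopTag(SketchTag tag) {
    if (tag.locality == kLocalityLogRouter) {
        return SketchTag{kLocalityLogRouterMapped, tag.id};
    }
    return tag;
}

// TLog side: convert the pseudo locality back before popping.
SketchTag fromPopTag(SketchTag tag) {
    if (tag.locality == kLocalityLogRouterMapped) {
        return SketchTag{kLocalityLogRouter, tag.id};
    }
    return tag;
}
```

The round trip preserves the tag id. Tags with other localities pass through unchanged, which is what lets further pseudo localities be added with the same pattern.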
Jingyu Zhou
7cb61c766b Fix tLogLocalities for current LogSet
In toCoreState(), the serialization of current LogSet is different from old
TLog sets. The locality data should be generated, not copied over.

Found by:
-r simulation --crash -f tests/fast/KillRegionCycle.txt -s 254666356 -b on
2019-04-21 10:41:07 -07:00
Jingyu Zhou
9e8ffd2ff7 Refactor OldLogData ctor 2019-04-21 10:41:07 -07:00
Jingyu Zhou
97986a28b7 Replace push_back with emplace_back for efficiency
And better code readability.
2019-04-21 10:41:07 -07:00
Jingyu Zhou
010f825aff Remove pseudoLocalities from LogSet, TLogSet, and CoreTLogSet 2019-04-21 10:41:07 -07:00
Jingyu Zhou
7befce6bf1 More pseudoLocalities and refactors. 2019-04-21 10:41:07 -07:00
Jingyu Zhou
966ec30fcc Add pseudoLocalities for special tag consumers 2019-04-21 10:41:07 -07:00
Jingyu Zhou
82ec80c42f Refactor TLogSet ctor 2019-04-21 10:41:07 -07:00