5178 Commits

Author SHA1 Message Date
Jingyu Zhou
bfd3328448 Fix a race between submit and abort backup
Aborting a backup immediately after submitting it may cause a rare race
condition, which results in a BackupCorrectnessLeftoverVersionKey error.

Specifically, in StartFullBackupTaskFunc, the 1st Txn sets the destUid in the
source database and the 2nd Txn writes to the dest DB.

An abort can come after the 1st Txn succeeds and clear the config range, so
that the 2nd Txn above fails. Because the 2nd Txn didn't write destUid, the
3rd Txn of abort can't read the correct source DB for latestVersionKey, which
contains the destUid value.

The fix is to make the 1st Txn of abort wait until destUid becomes valid.
2020-10-18 23:11:15 -07:00
Andrew Noyes
70c1ac2131 Use TraceEvent::error 2020-10-16 16:55:09 -07:00
Xin Dong
8d0aa02a63 Do not periodically print detailed DD teams info 2020-10-16 16:11:14 -07:00
Andrew Noyes
9dec4bc46a Add ErrorCode to StorageServerTrackerCancelled trace event 2020-10-16 15:40:14 -07:00
Jingyu Zhou
8f17a1a5d6 Merge branch 'release-6.2' into release-6.3 2020-10-16 15:25:39 -07:00
Andrew Noyes
30488df5ea Fix build 2020-10-16 14:26:40 -07:00
Andrew Noyes
81193a9226 Move TraceEvent upward 2020-10-16 14:23:27 -07:00
Andrew Noyes
1e0e800751 Fix build 2020-10-16 12:10:07 -07:00
Andrew Noyes
2b87627d1b Check for cancellation after errorOut.sendError(e) 2020-10-16 12:10:07 -07:00
Xin Dong
92e31dd338 Address review comments 2020-10-15 15:25:00 -07:00
Andrew Noyes
15dbfc0bc4
Merge pull request #3908 from sfc-gh-clin/fix-issue-3905
Fix issue #3905
2020-10-15 14:46:32 -07:00
Chaoguang Lin
3109aa7221 (Previous commit did not catch the change) Increase the probability to generate keys after \xff\xff to test special key framework code 2020-10-15 14:11:04 -07:00
Chaoguang Lin
c145dd5824 Increase the probability to generate keys after \xff\xff to test special key framework code 2020-10-15 14:09:27 -07:00
Xin Dong
1d43729cc9 Added a way to print detailed information about team collection for debugging. 2020-10-15 10:01:56 -07:00
A.J. Beamon
b644969788 Add error checking for a getReadVersion call in a test. 2020-10-15 09:17:16 -07:00
Steve Atherton
0b46af2925 Added a simulation-only check for pager remap cleanup writing the same page twice in a cleanup cycle, which should never happen, but a bug leading to this was fixed recently. Adjusted some buggify logic to widen edge case coverage around remap cleanup parameters. 2020-10-14 21:48:03 -07:00
Chaoguang Lin
bf00369576 getRange only enters special key space codepath when both begin key and end key are in (\xff\xff, \xff\xff\xff) 2020-10-14 16:57:38 -07:00
Daniel Smith
a99b68a7e3 Add knob for tuning number of RocksDB read threads 2020-10-14 22:07:46 +00:00
Steve Atherton
dc35f2b4f5 Bug fix: In page remap cleanup, if a page update's next update was at exactly the oldest retained version then the earlier update would still choose to copy the updated page over top of the original (but it shouldn't) which would race with the later update's copy if the same remap cleanup cycle pops both updates from the queue. 2020-10-13 02:26:46 -07:00
Meng Xu
89469921bb
Merge pull request #3891 from etschannen/feature-reset-proxy-connections
Reset a proxy's network connection with the master or resolvers if it is too far behind
2020-10-12 11:21:24 -07:00
Evan Tschannen
1378ecba4d If a proxy is sufficiently far behind, reset network connections to attempt to fix the problem 2020-10-11 23:06:26 -07:00
A.J. Beamon
3b66a1f2d4 Fix a couple places where we were creating vectors with default elements rather than reserving space. 2020-10-09 10:51:06 -07:00
Daniel Smith
2671157f8f Merge branch 'rocksdb-data-estimate' into rocksdb-unsafe-fsync 2020-10-09 16:56:54 +00:00
Daniel Smith
6e287eb0d1 Merge remote-tracking branch 'upstream/release-6.3' into rocksdb-unsafe-fsync 2020-10-09 16:53:05 +00:00
Daniel Smith
a9301f78da Merge remote-tracking branch 'upstream/release-6.3' into rocksdb-data-estimate 2020-10-08 23:01:04 +00:00
Russell Sears
7543b1efb3
Merge branch 'release-6.3' into rocksdb-lz4 2020-10-06 16:59:34 -07:00
Daniel Smith
4c89e38a29 Fix static linking of lz4 2020-10-06 18:22:03 +00:00
Evan Tschannen
efe50b68e6 fix compile error 2020-10-05 14:16:52 -07:00
Evan Tschannen
7ba06a4434 fix: min and max compute estimate logging on the proxy was always zero
added comments and fixed formatting
2020-10-05 12:35:10 -07:00
Evan Tschannen
5807b1ec3d changed the recent requests to be the per second amount; increased precision of cpu estimate 2020-10-04 19:31:40 -07:00
Evan Tschannen
f546034366 do not prevent computePerOperation from being updated for small computeDurations. Added logging for the compute per operation. Protect against erroneously large compute estimates 2020-10-04 19:19:05 -07:00
Evan Tschannen
da26b0411c increased the proxy commit memory limit 2020-10-04 19:16:51 -07:00
Evan Tschannen
52a6496a54 fix compiler errors 2020-10-04 16:50:54 -07:00
Evan Tschannen
614c8bc895 Get read version requests must be load balanced on the number of requests because ratekeeper gives out an equal budget to each proxy 2020-10-04 16:20:24 -07:00
sfc-gh-tclinkenbeard
91a8367acb Avoid slow task in ~DataDistributionTracker 2020-10-01 11:44:55 -07:00
Evan Tschannen
b1180f8eb4 fixed naming and comments 2020-09-30 20:35:09 -07:00
Evan Tschannen
b1570c740f extraTlogEligileZones should consider the database available both during a failover and also if the cluster cannot recruit tlogs in the remote region 2020-09-30 18:10:04 -07:00
Evan Tschannen
8c729ca8e6 only add additional fault tolerance for availability if automatic failover is enabled 2020-09-30 18:04:23 -07:00
Evan Tschannen
9f61039858 more fixes 2020-09-30 16:52:58 -07:00
Evan Tschannen
d7454ac7da fixed compile error 2020-09-30 16:49:36 -07:00
Evan Tschannen
2a279f64af Merge branch 'release-6.3' into feature-fix-fault-tolerance 2020-09-30 16:42:18 -07:00
Evan Tschannen
fe5c30e778 fault tolerance was not being properly increased when usable regions was 2 and satellites are configured. 2020-09-30 16:41:00 -07:00
Meng Xu
3aa92286aa FastRestore:Fix segmentation fault 2020-09-29 22:28:52 -07:00
Trevor Clinkenbeard
c613fc6dee
Merge pull request #3761 from sfc-gh-tclinkenbeard/document-watchbytes-overhead
Add comments for WATCH_OVERHEAD_BYTES
2020-09-26 20:39:27 -07:00
Steve Atherton
58e043c7a5 Enable run loop profiler for test and multitest roles. 2020-09-24 14:14:55 -07:00
Xin Dong
de5b0abb92
Merge pull request #3806 from xumengpanda/mengxu/fix-typo-PR
Fast Restore: Fix a typo in FastRestoreApplerPhaseApplyTxnStart event name
2020-09-23 17:11:59 -07:00
Meng Xu
5214becaa8 FR:Fix typo for event FastRestoreApplerPhaseApplyTxnDone 2020-09-23 16:43:35 -07:00
Xin Dong
feb3bda79e
Merge pull request #3797 from xumengpanda/mengxu/fr-write-traffic-control-PR
Fast Restore: Add write rate control
2020-09-23 15:50:08 -07:00
Meng Xu
262307d557 FR:Change applierRemainMB map to unordered_map 2020-09-23 15:39:01 -07:00
Meng Xu
aa683c0d26 FRApplier:Fix applyingDataBytes accounting at exception
When an exception is thrown after txnSize is calculated but before
it is added to applyingDataBytes, the error handling block incorrectly
decreases applyingDataBytes.
2020-09-23 15:19:02 -07:00