Jingyu Zhou
bfd3328448
Fix a race between submit and abort backup
...
Aborting a backup immediately after submitting it may cause a rare race
condition, which results in a BackupCorrectnessLeftoverVersionKey error.
Specifically, in the StartFullBackupTaskFunc:
1st Txn sets the destUid at the source database and the 2nd Txn writes the dest
DB.
An abort can come after the 1st Txn succeeds, and clears the config range so
that the 2nd Txn above would fail. Because the 2nd Txn didn't write destUid, the
3rd Txn of abort can't read the correct source DB for latestVersionKey, which
contains the destUid value.
The fix is to let the 1st Txn of abort wait until destUid becomes valid.
2020-10-18 23:11:15 -07:00
Andrew Noyes
70c1ac2131
Use TraceEvent::error
2020-10-16 16:55:09 -07:00
Xin Dong
8d0aa02a63
Do not periodically print detailed DD teams info
2020-10-16 16:11:14 -07:00
Andrew Noyes
9dec4bc46a
Add ErrorCode to StorageServerTrackerCancelled trace event
2020-10-16 15:40:14 -07:00
Jingyu Zhou
8f17a1a5d6
Merge branch 'release-6.2' into release-6.3
2020-10-16 15:25:39 -07:00
Andrew Noyes
30488df5ea
Fix build
2020-10-16 14:26:40 -07:00
Andrew Noyes
81193a9226
Move TraceEvent upward
2020-10-16 14:23:27 -07:00
Andrew Noyes
1e0e800751
Fix build
2020-10-16 12:10:07 -07:00
Andrew Noyes
2b87627d1b
Check for cancellation after errorOut.sendError(e)
2020-10-16 12:10:07 -07:00
Xin Dong
92e31dd338
Address review comments
2020-10-15 15:25:00 -07:00
Andrew Noyes
15dbfc0bc4
Merge pull request #3908 from sfc-gh-clin/fix-issue-3905
...
Fix issue #3905
2020-10-15 14:46:32 -07:00
Chaoguang Lin
3109aa7221
(Previous commit did not catch the change) Increase the probability to generate keys after \xff\xff to test special key framework code
2020-10-15 14:11:04 -07:00
Chaoguang Lin
c145dd5824
Increase the probability to generate keys after \xff\xff to test special key framework code
2020-10-15 14:09:27 -07:00
Xin Dong
1d43729cc9
Added a way to print detailed information about team collection for debugging.
2020-10-15 10:01:56 -07:00
A.J. Beamon
b644969788
Add error checking for a getReadVersion call in a test.
2020-10-15 09:17:16 -07:00
Steve Atherton
0b46af2925
Added a simulation-only check for pager remap cleanup writing the same page twice in a cleanup cycle, which should never happen, but a bug leading to this was fixed recently. Adjusted some buggify logic to widen edge case coverage around remap cleanup parameters.
2020-10-14 21:48:03 -07:00
Chaoguang Lin
bf00369576
getRange only enters special key space codepath when both begin key and end key are in (\xff\xff, \xff\xff\xff)
2020-10-14 16:57:38 -07:00
Daniel Smith
a99b68a7e3
Add knob for tuning number of RocksDB read threads
2020-10-14 22:07:46 +00:00
Steve Atherton
dc35f2b4f5
Bug fix: In page remap cleanup, if a page update's next update was at exactly the oldest retained version, the earlier update would still (incorrectly) choose to copy the updated page over the original. That copy would race with the later update's copy if the same remap cleanup cycle popped both updates from the queue.
2020-10-13 02:26:46 -07:00
Meng Xu
89469921bb
Merge pull request #3891 from etschannen/feature-reset-proxy-connections
...
Reset a proxy's network connection with the master or resolvers if it is too far behind
2020-10-12 11:21:24 -07:00
Evan Tschannen
1378ecba4d
If a proxy is sufficiently far behind, reset network connections to attempt to fix the problem
2020-10-11 23:06:26 -07:00
A.J. Beamon
3b66a1f2d4
Fix a couple places where we were creating vectors with default elements rather than reserving space.
2020-10-09 10:51:06 -07:00
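The distinction behind the vector fix above can be illustrated with a minimal standalone sketch. It is not the code touched by the commit; the function names are hypothetical. Constructing a std::vector with a size creates that many default elements, whereas reserve() only allocates capacity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example of the buggy pattern: the size constructor creates
// n default (zero) elements, so push_back appends after them and the
// result ends up with 2 * n entries.
std::vector<int> squaresWithDefaults(std::size_t n) {
    std::vector<int> out(n);
    for (std::size_t i = 0; i < n; ++i) out.push_back(static_cast<int>(i * i));
    return out;
}

// Intended pattern: reserve() allocates capacity but leaves size at 0,
// so the result holds exactly the n pushed elements.
std::vector<int> squaresWithReserve(std::size_t n) {
    std::vector<int> out;
    out.reserve(n);
    for (std::size_t i = 0; i < n; ++i) out.push_back(static_cast<int>(i * i));
    return out;
}

// Small usage check comparing the two patterns.
void checkVectorSizes() {
    assert(squaresWithDefaults(3).size() == 6); // 3 defaults + 3 pushed
    assert(squaresWithReserve(3).size() == 3);  // only the pushed elements
}
```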
Daniel Smith
2671157f8f
Merge branch 'rocksdb-data-estimate' into rocksdb-unsafe-fsync
2020-10-09 16:56:54 +00:00
Daniel Smith
6e287eb0d1
Merge remote-tracking branch 'upstream/release-6.3' into rocksdb-unsafe-fsync
2020-10-09 16:53:05 +00:00
Daniel Smith
a9301f78da
Merge remote-tracking branch 'upstream/release-6.3' into rocksdb-data-estimate
2020-10-08 23:01:04 +00:00
Russell Sears
7543b1efb3
Merge branch 'release-6.3' into rocksdb-lz4
2020-10-06 16:59:34 -07:00
Daniel Smith
4c89e38a29
Fix static linking of lz4
2020-10-06 18:22:03 +00:00
Evan Tschannen
efe50b68e6
fix compile error
2020-10-05 14:16:52 -07:00
Evan Tschannen
7ba06a4434
fix: min and max compute estimate logging on the proxy was always zero
...
added comments and fixed formatting
2020-10-05 12:35:10 -07:00
Evan Tschannen
5807b1ec3d
changed the recent requests to be the per second amount; increased precision of cpu estimate
2020-10-04 19:31:40 -07:00
Evan Tschannen
f546034366
do not prevent computePerOperation from being updated for small computeDurations. Added logging for the compute per operation. Protect against erroneously large compute estimates
2020-10-04 19:19:05 -07:00
Evan Tschannen
da26b0411c
increased the proxy commit memory limit
2020-10-04 19:16:51 -07:00
Evan Tschannen
52a6496a54
fix compiler errors
2020-10-04 16:50:54 -07:00
Evan Tschannen
614c8bc895
Get read version requests must be load balanced on the number of requests because ratekeeper gives out an equal budget to each proxy
2020-10-04 16:20:24 -07:00
sfc-gh-tclinkenbeard
91a8367acb
Avoid slow task in ~DataDistributionTracker
2020-10-01 11:44:55 -07:00
Evan Tschannen
b1180f8eb4
fixed naming and comments
2020-09-30 20:35:09 -07:00
Evan Tschannen
b1570c740f
extraTlogEligileZones should consider the database available both during a failover and also if the cluster cannot recruit tlogs in the remote region
2020-09-30 18:10:04 -07:00
Evan Tschannen
8c729ca8e6
only add additional fault tolerance for availability if automatic failover is enabled
2020-09-30 18:04:23 -07:00
Evan Tschannen
9f61039858
more fixes
2020-09-30 16:52:58 -07:00
Evan Tschannen
d7454ac7da
fixed compile error
2020-09-30 16:49:36 -07:00
Evan Tschannen
2a279f64af
Merge branch 'release-6.3' into feature-fix-fault-tolerance
2020-09-30 16:42:18 -07:00
Evan Tschannen
fe5c30e778
fault tolerance was not being properly increased when usable regions was 2 and satellites were configured.
2020-09-30 16:41:00 -07:00
Meng Xu
3aa92286aa
FastRestore:Fix segmentation fault
2020-09-29 22:28:52 -07:00
Trevor Clinkenbeard
c613fc6dee
Merge pull request #3761 from sfc-gh-tclinkenbeard/document-watchbytes-overhead
...
Add comments for WATCH_OVERHEAD_BYTES
2020-09-26 20:39:27 -07:00
Steve Atherton
58e043c7a5
Enable run loop profiler for test and multitest roles.
2020-09-24 14:14:55 -07:00
Xin Dong
de5b0abb92
Merge pull request #3806 from xumengpanda/mengxu/fix-typo-PR
...
Fast Restore: Fix a typo in FastRestoreApplerPhaseApplyTxnStart event name
2020-09-23 17:11:59 -07:00
Meng Xu
5214becaa8
FR:Fix typo for event FastRestoreApplerPhaseApplyTxnDone
2020-09-23 16:43:35 -07:00
Xin Dong
feb3bda79e
Merge pull request #3797 from xumengpanda/mengxu/fr-write-traffic-control-PR
...
Fast Restore: Add write rate control
2020-09-23 15:50:08 -07:00
Meng Xu
262307d557
FR:Change applierRemainMB map to unordered_map
2020-09-23 15:39:01 -07:00
Meng Xu
aa683c0d26
FRApplier:Fix applyingDataBytes accounting at exception
...
When an exception is thrown after txnSize is calculated but before
it is added to applyingDataBytes, the error handling block incorrectly
decreases applyingDataBytes.
2020-09-23 15:19:02 -07:00
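The accounting problem described in the commit above can be sketched in plain C++. This is an assumption-laden illustration, not the actual Flow actor code: ApplierState, applyTxn, the `accounted` flag, and the two failure switches are hypothetical, and the real fix may be structured differently. The idea shown is that the error handler subtracts txnSize only if it was already added.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for the applier's in-flight byte counter.
struct ApplierState {
    int64_t applyingDataBytes = 0;
};

// Applies one transaction of txnSize bytes. failBefore / failAfter simulate
// an exception thrown before or after txnSize is added to the counter.
// Returns true on success, false if an exception was handled.
bool applyTxn(ApplierState& s, int64_t txnSize, bool failBefore, bool failAfter) {
    bool accounted = false; // set only once txnSize has been added
    try {
        if (failBefore) throw std::runtime_error("failed before accounting");
        s.applyingDataBytes += txnSize;
        accounted = true;
        if (failAfter) throw std::runtime_error("failed after accounting");
        s.applyingDataBytes -= txnSize; // transaction done, bytes no longer in flight
        return true;
    } catch (const std::exception&) {
        // Undo the accounting only if it actually happened.
        if (accounted) s.applyingDataBytes -= txnSize;
        return false;
    }
}

// Usage check: the counter returns to zero on every path.
void checkAccounting() {
    ApplierState s;
    assert(!applyTxn(s, 100, true, false) && s.applyingDataBytes == 0);
    assert(!applyTxn(s, 100, false, true) && s.applyingDataBytes == 0);
    assert(applyTxn(s, 100, false, false) && s.applyingDataBytes == 0);
}
```

Without the `accounted` guard, the first failure path would leave the counter at -100, which is the kind of incorrect decrease the commit message describes.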