156 Commits

Author SHA1 Message Date
Evan Tschannen
33c9b1374a more compile fixes 2020-07-09 22:57:43 -07:00
Evan Tschannen
f6163d0a79 fix compile errors 2020-07-09 22:53:02 -07:00
Evan Tschannen
717242a0ee reset WAN network connections every 5 minutes is responses take more than 500ms 2020-07-09 22:50:47 -07:00
sfc-gh-ngoyal
693d9e8b89
Merge branch 'master' into fdb_cache_wo_allocator 2020-06-09 15:09:58 -07:00
Alex Miller
ccaac162e2 Resolve performance concerns of nearly-no-op debugMutation being frequently called
This introduces unhygenic macro variants that inline a `ENABLED &&`
before the TraceEvent.  This way, they get entirely compiled out unless
enabled.

Then rewrite all debugMutation uses via sed.
2020-05-13 18:44:15 -07:00
Alex Miller
122762cce1 Add debugMessagesAndTags, and track mutations in more places.
Like:
* Leaving the proxy
* Entering the TLog
* Leaving the TLog
* Being read on a cursor

All of this brought to you by TagsAndMessage!

This also slides in a minor optimization as to how mutations are serialized per target log.
2020-03-27 03:31:04 -07:00
negoyal
acaf91ac47 Merge branch 'master' into fdb_cache_subfeature2 2020-03-26 13:33:08 -07:00
negoyal
8abac91033 Fixed a bug in cache server while peeking at a version lower than popped version and added some logging. 2020-03-26 12:39:07 -07:00
Meng Xu
bd345f85db ConsistencyCheck:Fix failue due to address inconsistency between process and worker
With TLS, a worker (or process) can have a TLS address and non-TLS address.
When a process is created in simulation, the primary address is TLS by default.
The non-TLS one is the TLS address port plus one.

In a connection between two workers, if their primary addresses do not enable
or disable TLS together, one worker will swap its primary address and secondary address
so that the TLS config of the two endpoints can match.

The swap can make the primary address no longer the TLS one that was created
when the process is created. And the swap only happens for worker instead of
process struct in simulation.

This swap can cause worker->address != process->address.
In checkForExtraDataStores actor, we use worker->address to check if a process
is killable and use the process->address to kill the process. The inconsistency
can cause simulation to kill a protected process that is not killable and leads
to simulation failure.
2020-03-10 21:07:16 -07:00
Evan Tschannen
303df197cf Merge branch 'release-6.2'
# Conflicts:
#	CMakeLists.txt
#	bindings/c/test/mako/mako.c
#	documentation/sphinx/source/release-notes.rst
#	fdbbackup/backup.actor.cpp
#	fdbclient/NativeAPI.actor.cpp
#	fdbclient/NativeAPI.actor.h
#	fdbserver/DataDistributionQueue.actor.cpp
#	fdbserver/Knobs.cpp
#	fdbserver/Knobs.h
#	fdbserver/LogRouter.actor.cpp
#	fdbserver/SkipList.cpp
#	fdbserver/fdbserver.actor.cpp
#	flow/CMakeLists.txt
#	flow/Knobs.cpp
#	flow/Knobs.h
#	flow/flow.vcxproj
#	flow/flow.vcxproj.filters
#	versions.target
2020-03-06 18:22:46 -08:00
Evan Tschannen
1076abdee5 fixed crash when interf was not created 2020-03-05 19:09:08 -08:00
Evan Tschannen
1128666840 added additional logging on the log router 2020-03-05 18:17:06 -08:00
Evan Tschannen
96258b9809 Merge branch 'release-6.2'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbcli/fdbcli.actor.cpp
#	fdbclient/ManagementAPI.actor.cpp
#	fdbrpc/FlowTransport.actor.cpp
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/DataDistribution.actor.h
#	fdbserver/DataDistributionQueue.actor.cpp
#	fdbserver/KeyValueStoreMemory.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/QuietDatabase.actor.cpp
#	fdbserver/SkipList.cpp
#	fdbserver/StorageMetrics.actor.h
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/fdbserver.actor.cpp
#	fdbserver/storageserver.actor.cpp
#	fdbserver/workloads/KVStoreTest.actor.cpp
#	flow/CMakeLists.txt
#	flow/Knobs.cpp
#	flow/Knobs.h
#	flow/genericactors.actor.cpp
#	flow/serialize.h
2020-02-21 19:09:16 -08:00
Evan Tschannen
cf4efca852 fix: buffered cursor should always make sure all of the sub-cursors are completely exhausted before calculating minVersion. It is not legal to advance a cursor version past an epochEnd (+100 million versions) without also returning the epochEnd mutation, or the storage servers might not be able to rollback far enough because the end of the previous epoch will be made durable 2020-02-19 15:24:32 -08:00
Alex Miller
7798456201 Make TLogs have consistent parallel peek behavior.
TLogServer and LogRouter had some leftover code from me trying to be
more "correct" about parallel peek semantics, but those changes weren't
reflected in the OldTLog* files.  I've reverted the changes, as
realistically, they are more likely to waste CPU than improve TLog behavior.
2020-01-21 18:23:16 -08:00
Alex Miller
858e4e5900 Move the check to a better location.
This way, we avoid some ID randomness, and also avoid the potential for
resetting the randomID and sequence without clearing out the future
vector.
2020-01-21 17:08:42 -08:00
Alex Miller
1cb311fcb8 Add an ASSERT_WE_THINK that peek cursors don't get timed_out()
This should prevent us from regressing and having multi-region
recoveries hang for 10min again.
2020-01-21 17:07:37 -08:00
Alex Miller
0662f8dba0 When switching parallel->single->parallel, reset sequence and peekId
This fixes an issue where one could hang for 10min for the second
parallel peek to time out, if one happened to catch the edge of a
onlySpilled transition wrong.
2020-01-21 17:07:37 -08:00
Evan Tschannen
afc9713005 Merge branch 'release-6.2'
# Conflicts:
#	CMakeLists.txt
#	documentation/sphinx/source/release-notes.rst
#	fdbclient/FDBTypes.h
#	fdbserver/LogSystem.h
#	fdbserver/LogSystemPeekCursor.actor.cpp
#	fdbserver/OldTLogServer_6_0.actor.cpp
#	fdbserver/TLogServer.actor.cpp
#	versions.target
2019-11-06 13:45:37 -08:00
Evan Tschannen
dbc5a2393c combineMessages still did not serialize tags correctly 2019-11-05 18:44:30 -08:00
Evan Tschannen
1c873591be fixed a compiler error 2019-11-05 18:32:15 -08:00
Evan Tschannen
86560fe727 fix: tempTags was not used correctly 2019-11-05 18:22:25 -08:00
Evan Tschannen
a8ca47beff optimized memory allocations by using VectorRef<Tag> instead of std::vector<Tag> 2019-11-05 18:07:30 -08:00
Evan Tschannen
daac8a2c22 Knobified a few variables 2019-11-04 20:21:38 -08:00
Evan Tschannen
457896b80d remote logs use bufferedCursor when peeking from log routers to improve performance
bufferedCursor performance has been improved
2019-11-04 19:47:45 -08:00
Evan Tschannen
3325980c03 Merge branch 'release-6.2'
# Conflicts:
#	CMakeLists.txt
#	documentation/sphinx/source/release-notes.rst
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/OldTLogServer_6_0.actor.cpp
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/WorkerInterface.actor.h
#	fdbserver/worker.actor.cpp
#	versions.target
2019-10-24 17:38:15 -07:00
Evan Tschannen
a7492aab0a fix: poppedVersion can update during a yield, so all work must be done immediately after getMore returns 2019-10-23 23:06:02 -07:00
Alex Miller
1e5b8c74e3 Continuing a parallel peek after a timeout would hang.
This is to guard against the case where

1. Peeks with sequence numbers 0-39 are submitted
2. A 15min pause happens, in which timeout removes the peek tracker data
3. Peeks with sequence numbers 40-59 are submitted, with the same peekId

The second round of peeks wouldn't have the data left that it's allowed
to start running peek 40 immediately, and thus would hang for 10min
until it gets cleaned up.

Also, guard against overflowing the sequence number.
2019-10-22 19:24:05 -07:00
Alex Miller
c008e7f8b3 When switching parallel->single->parallel, reset sequence and peekId
This fixes an issue where one could hang for 10min for the second
parallel peek to time out, if one happened to catch the edge of a
onlySpilled transition wrong.
2019-10-22 19:10:58 -07:00
Evan Tschannen
b495cc697b Merge branch 'release-6.2'
# Conflicts:
#	CMakeLists.txt
#	documentation/sphinx/source/release-notes.rst
#	versions.target
2019-09-13 09:25:08 -07:00
Alex Miller
324289039a When reloading one cursor in a merge cursor, top off the other cursors as well. 2019-09-12 16:22:28 -07:00
Jingyu Zhou
2723922f5f Replace -1 as VERSION_HEADER constant for serialization 2019-09-05 12:45:39 -07:00
Jingyu Zhou
f9357c5ad8 Fix side effect of ArenaReader
ServerPeekCursor::nextMessage() should only consume the message header, because
the reader() directly inherits the current position. The previous commit
changes the positon to the begining of the next message, which breaks storage
server code.
2019-09-05 11:07:07 -07:00
Jingyu Zhou
cd3f1e33d4 Refactor deserialization of TagsAndMessages
Consolidate deserialization of TagsAndMessages in the structure itself and
change both TLog and ServerPeekCursor to use it.
2019-09-04 14:55:05 -07:00
Evan Tschannen
b0480edd15 fix: messageVersion could be larger than poppedVersion, and we will discard messages that are needed 2019-08-06 16:31:05 -07:00
Evan Tschannen
7ac7eb82f2 fix: buffered cursor would start multiple bufferedGetMore actors
advance all of the cursors to the poppedVersion
2019-07-30 14:42:05 -07:00
Evan Tschannen
b5cb7919b6 fix: canDiscardPopped was not reset when necessary in all cases 2019-07-30 13:44:44 -07:00
Evan Tschannen
5d79e4141f fix: buffered cursor messageVersion should be set to the version we will be at after exhausting everything in messages 2019-07-30 12:38:44 -07:00
Evan Tschannen
6977e7d2e8 do not return recovered version as popped for txsTags because it could cause recovery to start over
optimized how buffered peek cursor discards popped data
2019-07-30 12:21:48 -07:00
Evan Tschannen
7a932479dd throw away state if we ever read popped data from the disk queue adapter 2019-07-30 10:14:39 -07:00
Evan Tschannen
45f7b41b48 fix: multi-cursor could discard popped commits after already returning data 2019-07-29 21:36:42 -07:00
Evan Tschannen
5bb322b483 implement popped on bufferedCursor 2019-07-29 21:19:47 -07:00
Evan Tschannen
d8948c8be1 Merge branch 'master' into feature-fast-txs-recovery
# Conflicts:
#	fdbserver/TagPartitionedLogSystem.actor.cpp
2019-07-10 13:59:52 -07:00
Evan Tschannen
b27a909f3a fix: onDisconnectOrFailure can spuriously trigger 2019-07-09 16:38:59 -07:00
Evan Tschannen
15e894c724 Merge in master 2019-07-05 15:49:24 -07:00
Evan Tschannen
cfce1e1705 fix: buffered peek cursor would advance very slowly through large ranges of empty versions 2019-06-28 15:54:08 -07:00
Alex Miller
bf883d7055 Merge remote-tracking branch 'upstream/master' into flowlock-api 2019-06-25 14:26:50 -07:00
Alex Miller
7a500cd37f A giant translation of TaskFooPriority -> TaskPriority::Foo
This is so that APIs that take priorities don't take ints, which are
common and easy to accidentally pass the wrong thing.
2019-06-25 02:47:35 -07:00
Evan Tschannen
1c005d5878
Merge pull request #1584 from alexmiller-apple/spilled-only-peek
Save TLog resources by letting peek request only spilled data.
2019-06-20 18:22:31 -07:00
Evan Tschannen
e0be631414 shard the txs tag so that more transaction logs are involved in its recovery 2019-06-19 18:15:09 -07:00