134 Commits

Author SHA1 Message Date
Xiaoxi Wang
1c4bce17aa revert code refactor 2021-07-30 19:08:22 -07:00
Xiaoxi Wang
12d4f5c261 disable streaming peek for localities < 0 2021-07-28 14:11:25 -07:00
Xiaoxi Wang
c6b0de1264 problem: OOM 2021-07-26 09:36:53 -07:00
Xiaoxi Wang
cd32478b52 memory error(Simple config) 2021-07-22 15:45:59 -07:00
Xiaoxi Wang
5046ee3b07 add stream peek to logRouter 2021-07-20 17:42:00 +00:00
Xiaoxi Wang
f3667ce91a more debug logs; let tryEstablishStream wait until the connection is good 2021-07-19 18:43:51 +00:00
Xiaoxi Wang
227570357a trace log and reset changes; byteAcknownledge overflow 2021-07-15 21:30:14 +00:00
Xiaoxi Wang
066d534194 trivial changes 2021-07-14 16:19:23 +00:00
Xiaoxi Wang
6d1c12899d catch exceptions 2021-07-09 22:46:16 +00:00
Xiaoxi Wang
5a43a8c367 add returnIfBlocked in stream request 2021-07-08 19:32:58 +00:00
Xiaoxi Wang
15347773d9 fix double destruction memory bug 2021-07-07 22:55:49 +00:00
Xiaoxi Wang
b6d5c8a091 implement tLogPeekStream 2021-07-06 23:14:58 +00:00
Xiaoxi Wang
b50fda6b4b add simple streaming peek functions 2021-07-01 23:17:28 +00:00
Sreenath Bodagala
6275adc5a0 Address build failure
LogSystemPeekCursor.actor.cpp:
Check if "interf" is set before referencing it.
2021-05-13 21:38:07 +00:00
Sreenath Bodagala
b0554b4554 Capture how fast an SS is catching up to its tLog-SS lag
Changes:
LogSystem.h, LogSystemPeekCursor.actor.cpp:
Add APIs to find the ID of the tLog from which an SS has fetched the latest
set of versions.

storageserver.actor.cpp:
Capture the number of latest set of versions fetched, the time (in seconds)
in which those versions were fetched, and the tLog from which they were
fetched. Add this information to a TraceLogEvent.

Capture how many versions an SS has fetched in the
2021-05-11 20:03:21 +00:00
FDB Formatster
df90cc89de apply clang-format to *.c, *.cpp, *.h, *.hpp files 2021-03-10 10:18:07 -08:00
sfc-gh-tclinkenbeard
5020e3faa1 Make ILogSystem::IPeekCursor const-correct 2020-12-08 09:09:31 -08:00
David Youngworth
d64cf8b9e3 Merge branch 6.3 into master 2020-11-17 11:22:45 -08:00
David Youngworth
d0391db862 Merge branch 'release-6.2' into release-6.3 2020-11-16 10:15:23 -08:00
Markus Pilman
bdd3dbfa7d remove duplicates 2020-11-10 14:01:07 -07:00
sfc-gh-tclinkenbeard
4669f837fa Add uses of makeReference 2020-11-07 22:10:18 -08:00
Vishesh Yadav
7b28de8a41 Add IDs to ConnectionReset TraceEvents 2020-11-04 14:06:49 -08:00
Vishesh Yadav
22b16302c3 Make ConnectionReset logs easier to query #3977
All TraceLogs that are related to ConnectionReset should be prefixed with
ConnectionReset. This should make it easy to query and aggregate by address and
role.
2020-11-02 15:10:51 -08:00
Evan Tschannen
12edadd059 Merge branch 'release-6.3'
# Conflicts:
#	CMakeLists.txt
#	fdbclient/Knobs.cpp
#	fdbclient/MasterProxyInterface.h
#	fdbrpc/simulator.h
#	fdbserver/MasterProxyServer.actor.cpp
#	tests/fast/CycleAndLock.txt
#	tests/fast/TxnStateStoreCycleTest.txt
#	tests/fast/VersionStamp.txt
#	tests/slow/ParallelRestoreOldBackupApiCorrectnessAtomicRestore.txt
#	tests/slow/ParallelRestoreOldBackupCorrectnessCycle.txt
#	versions.target
2020-08-31 19:33:34 -07:00
Evan Tschannen
29eec30183 Merge branch 'release-6.2' into release-6.3
# Conflicts:
#	CMakeLists.txt
#	build/Dockerfile
#	build/Dockerfile.devel
#	documentation/sphinx/source/downloads.rst
#	fdbserver/Knobs.cpp
#	fdbserver/LogSystem.h
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/TagPartitionedLogSystem.actor.cpp
#	fdbserver/WaitFailure.actor.cpp
#	fdbserver/fdbserver.vcxproj
#	fdbserver/fdbserver.vcxproj.filters
#	packaging/msi/FDBInstaller.wxs
2020-08-31 01:10:29 -07:00
Evan Tschannen
507c67c930 Added additional information to trace events 2020-08-26 11:42:23 -07:00
Meng Xu
ef8c1060a2 Merge branch 'master' into mengxu/tmp-merge-6.3 2020-07-13 10:15:56 -07:00
A.J. Beamon
b09dddc07e Merge branch 'release-6.2' into merge-release-6.2-into-release-6.3
# Conflicts:
#	cmake/ConfigureCompiler.cmake
#	documentation/sphinx/source/downloads.rst
#	fdbrpc/FlowTransport.actor.cpp
#	fdbrpc/fdbrpc.vcxproj
#	fdbserver/DataDistributionQueue.actor.cpp
#	fdbserver/Knobs.cpp
#	fdbserver/Knobs.h
#	fdbserver/LogSystemPeekCursor.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/Status.actor.cpp
#	fdbserver/storageserver.actor.cpp
#	flow/flow.vcxproj
2020-07-10 15:06:34 -07:00
Evan Tschannen
33c9b1374a more compile fixes 2020-07-09 22:57:43 -07:00
Evan Tschannen
f6163d0a79 fix compile errors 2020-07-09 22:53:02 -07:00
Evan Tschannen
717242a0ee reset WAN network connections every 5 minutes if responses take more than 500ms 2020-07-09 22:50:47 -07:00
sfc-gh-ngoyal
693d9e8b89 Merge branch 'master' into fdb_cache_wo_allocator 2020-06-09 15:09:58 -07:00
Alex Miller
ccaac162e2 Resolve performance concerns of nearly-no-op debugMutation being frequently called
This introduces unhygienic macro variants that inline an `ENABLED &&`
before the TraceEvent.  This way, they get entirely compiled out unless
enabled.

Then rewrite all debugMutation uses via sed.
2020-05-13 18:44:15 -07:00
Alex Miller
122762cce1 Add debugMessagesAndTags, and track mutations in more places.
Like:
* Leaving the proxy
* Entering the TLog
* Leaving the TLog
* Being read on a cursor

All of this brought to you by TagsAndMessage!

This also slides in a minor optimization as to how mutations are serialized per target log.
2020-03-27 03:31:04 -07:00
negoyal
acaf91ac47 Merge branch 'master' into fdb_cache_subfeature2 2020-03-26 13:33:08 -07:00
negoyal
8abac91033 Fixed a bug in cache server while peeking at a version lower than popped version and added some logging. 2020-03-26 12:39:07 -07:00
Meng Xu
bd345f85db ConsistencyCheck: Fix failure due to address inconsistency between process and worker
With TLS, a worker (or process) can have a TLS address and non-TLS address.
When a process is created in simulation, the primary address is TLS by default.
The non-TLS one is the TLS address port plus one.

In a connection between two workers, if their primary addresses do not enable
or disable TLS together, one worker will swap its primary address and secondary address
so that the TLS config of the two endpoints can match.

The swap can make the primary address no longer the TLS one that was created
when the process is created. And the swap only happens for worker instead of
process struct in simulation.

This swap can cause worker->address != process->address.
In checkForExtraDataStores actor, we use worker->address to check if a process
is killable and use the process->address to kill the process. The inconsistency
can cause simulation to kill a protected process that is not killable and lead
to simulation failure.
2020-03-10 21:07:16 -07:00
Evan Tschannen
303df197cf Merge branch 'release-6.2'
# Conflicts:
#	CMakeLists.txt
#	bindings/c/test/mako/mako.c
#	documentation/sphinx/source/release-notes.rst
#	fdbbackup/backup.actor.cpp
#	fdbclient/NativeAPI.actor.cpp
#	fdbclient/NativeAPI.actor.h
#	fdbserver/DataDistributionQueue.actor.cpp
#	fdbserver/Knobs.cpp
#	fdbserver/Knobs.h
#	fdbserver/LogRouter.actor.cpp
#	fdbserver/SkipList.cpp
#	fdbserver/fdbserver.actor.cpp
#	flow/CMakeLists.txt
#	flow/Knobs.cpp
#	flow/Knobs.h
#	flow/flow.vcxproj
#	flow/flow.vcxproj.filters
#	versions.target
2020-03-06 18:22:46 -08:00
Evan Tschannen
1076abdee5 fixed crash when interf was not created 2020-03-05 19:09:08 -08:00
Evan Tschannen
1128666840 added additional logging on the log router 2020-03-05 18:17:06 -08:00
Evan Tschannen
96258b9809 Merge branch 'release-6.2'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbcli/fdbcli.actor.cpp
#	fdbclient/ManagementAPI.actor.cpp
#	fdbrpc/FlowTransport.actor.cpp
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/DataDistribution.actor.h
#	fdbserver/DataDistributionQueue.actor.cpp
#	fdbserver/KeyValueStoreMemory.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/QuietDatabase.actor.cpp
#	fdbserver/SkipList.cpp
#	fdbserver/StorageMetrics.actor.h
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/fdbserver.actor.cpp
#	fdbserver/storageserver.actor.cpp
#	fdbserver/workloads/KVStoreTest.actor.cpp
#	flow/CMakeLists.txt
#	flow/Knobs.cpp
#	flow/Knobs.h
#	flow/genericactors.actor.cpp
#	flow/serialize.h
2020-02-21 19:09:16 -08:00
Evan Tschannen
cf4efca852 fix: buffered cursor should always make sure all of the sub-cursors are completely exhausted before calculating minVersion. It is not legal to advance a cursor version past an epochEnd (+100 million versions) without also returning the epochEnd mutation, or the storage servers might not be able to rollback far enough because the end of the previous epoch will be made durable 2020-02-19 15:24:32 -08:00
Alex Miller
7798456201 Make TLogs have consistent parallel peek behavior.
TLogServer and LogRouter had some leftover code from me trying to be
more "correct" about parallel peek semantics, but those changes weren't
reflected in the OldTLog* files.  I've reverted the changes, as
realistically, they are more likely to waste CPU than improve TLog behavior.
2020-01-21 18:23:16 -08:00
Alex Miller
858e4e5900 Move the check to a better location.
This way, we avoid some ID randomness, and also avoid the potential for
resetting the randomID and sequence without clearing out the future
vector.
2020-01-21 17:08:42 -08:00
Alex Miller
1cb311fcb8 Add an ASSERT_WE_THINK that peek cursors don't get timed_out()
This should prevent us from regressing and having multi-region
recoveries hang for 10min again.
2020-01-21 17:07:37 -08:00
Alex Miller
0662f8dba0 When switching parallel->single->parallel, reset sequence and peekId
This fixes an issue where one could hang for 10min for the second
parallel peek to time out, if one happened to catch the edge of a
onlySpilled transition wrong.
2020-01-21 17:07:37 -08:00
Evan Tschannen
afc9713005 Merge branch 'release-6.2'
# Conflicts:
#	CMakeLists.txt
#	documentation/sphinx/source/release-notes.rst
#	fdbclient/FDBTypes.h
#	fdbserver/LogSystem.h
#	fdbserver/LogSystemPeekCursor.actor.cpp
#	fdbserver/OldTLogServer_6_0.actor.cpp
#	fdbserver/TLogServer.actor.cpp
#	versions.target
2019-11-06 13:45:37 -08:00
Evan Tschannen
dbc5a2393c combineMessages still did not serialize tags correctly 2019-11-05 18:44:30 -08:00
Evan Tschannen
1c873591be fixed a compiler error 2019-11-05 18:32:15 -08:00
Evan Tschannen
86560fe727 fix: tempTags was not used correctly 2019-11-05 18:22:25 -08:00