136 Commits

Author SHA1 Message Date
Evan Tschannen
29eec30183 Merge branch 'release-6.2' into release-6.3
# Conflicts:
#	CMakeLists.txt
#	build/Dockerfile
#	build/Dockerfile.devel
#	documentation/sphinx/source/downloads.rst
#	fdbserver/Knobs.cpp
#	fdbserver/LogSystem.h
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/TagPartitionedLogSystem.actor.cpp
#	fdbserver/WaitFailure.actor.cpp
#	fdbserver/fdbserver.vcxproj
#	fdbserver/fdbserver.vcxproj.filters
#	packaging/msi/FDBInstaller.wxs
2020-08-31 01:10:29 -07:00
Evan Tschannen
331a49a62a do not allow a proxy to reset a connection with the logs immediately upon starting up 2020-08-30 18:50:19 -07:00
Evan Tschannen
fd1a4304fa fix: made ConnectionResetInfo reference counted 2020-08-26 10:53:17 -07:00
Evan Tschannen
8ede143941 Track tlog push latencies and reset connections if they are above 500ms 2020-08-18 08:43:14 -07:00
A.J. Beamon
b09dddc07e Merge branch 'release-6.2' into merge-release-6.2-into-release-6.3
# Conflicts:
#	cmake/ConfigureCompiler.cmake
#	documentation/sphinx/source/downloads.rst
#	fdbrpc/FlowTransport.actor.cpp
#	fdbrpc/fdbrpc.vcxproj
#	fdbserver/DataDistributionQueue.actor.cpp
#	fdbserver/Knobs.cpp
#	fdbserver/Knobs.h
#	fdbserver/LogSystemPeekCursor.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/Status.actor.cpp
#	fdbserver/storageserver.actor.cpp
#	flow/flow.vcxproj
2020-07-10 15:06:34 -07:00
Evan Tschannen
717242a0ee reset WAN network connections every 5 minutes is responses take more than 500ms 2020-07-09 22:50:47 -07:00
Jingyu Zhou
90b40e1d75 Merge branch 'mengxu/new-backup-format-PR-delta' of github.com:xumengpanda/foundationdb into backup-worker-bak
Resolve Conflicts:
	fdbclient/BackupAgent.actor.h
	fdbserver/BackupWorker.actor.cpp
	fdbserver/RestoreMaster.actor.cpp
	fdbserver/masterserver.actor.cpp
2020-03-23 13:35:33 -07:00
Meng Xu
3f31ebf659 New backup:Revise event name and explain code 2020-03-23 10:55:44 -07:00
Jingyu Zhou
818072f3cb Set oldest backup epoch if not recruiting backup workers
Since tlog is not kept until backup worker has pulled mutations from it, the
old tlogs can only be displaced after oldest backup epoch equals current epoch.
So if master is not recruiting backup workers, it should set the oldest backup
epoch as the current epoch.
2020-03-20 20:16:43 -07:00
Jingyu Zhou
12ed8ad536 Fix backup worker start version when logset start version is lower
The start version of tlog set can be smaller than the last epoch's end version.
In this case, set backup worker's start version as last epoch's end version to
avoid overlapping of version ranges among backup workers.
2020-03-20 20:15:08 -07:00
Jingyu Zhou
89d8f13038 Fix backup worker start version when logset start version is lower
The start version of tlog set can be smaller than the last epoch's end version.
In this case, set backup worker's start version as last epoch's end version to
avoid overlapping of version ranges among backup workers.
2020-03-18 16:41:35 -07:00
Evan Tschannen
303df197cf Merge branch 'release-6.2'
# Conflicts:
#	CMakeLists.txt
#	bindings/c/test/mako/mako.c
#	documentation/sphinx/source/release-notes.rst
#	fdbbackup/backup.actor.cpp
#	fdbclient/NativeAPI.actor.cpp
#	fdbclient/NativeAPI.actor.h
#	fdbserver/DataDistributionQueue.actor.cpp
#	fdbserver/Knobs.cpp
#	fdbserver/Knobs.h
#	fdbserver/LogRouter.actor.cpp
#	fdbserver/SkipList.cpp
#	fdbserver/fdbserver.actor.cpp
#	flow/CMakeLists.txt
#	flow/Knobs.cpp
#	flow/Knobs.h
#	flow/flow.vcxproj
#	flow/flow.vcxproj.filters
#	versions.target
2020-03-06 18:22:46 -08:00
Evan Tschannen
1128666840 added additional logging on the log router 2020-03-05 18:17:06 -08:00
Jingyu Zhou
1eaea91cb3 Address review comments 2020-01-22 19:42:13 -08:00
Jingyu Zhou
4ed75e37f3 BackupProgress uses old epoch's begin version if no progress found
Get rid of the complex logic of choosing the largest saved version from
previous epoch for the oldest epoch. Instead, use the begin version now
available from log system.
2020-01-22 19:38:46 -08:00
Jingyu Zhou
64052f6349 Check and fill backup gaps for old epochs and tags
Sometimes the backup worker has not updated progress to the system space and a
master recovery happens. As a result, next epoch doesn't know the progress of
previous ones. This change is to check for such missing gaps and fill them with
the whole range [startVersion, endVersion).

The code is refactored into BackupProgress.actor.* to consolidate backup
progress processing for the master server.
2020-01-22 19:38:46 -08:00
Jingyu Zhou
0c08161d8e Remove old backup workers when done
For backup workers working on old epochs, once their work is done, they will
notify the master. Then the master removes them from the log system and
acknowledge back to the backup workers so that they can gracefully shut down.

The popping of a backup worker is stalled if there are workers from older
epochs still working. Otherwise, workers from old epochs will lost data.

However, allowing newer epoch to start backup can cause holes in version ranges.
The restore process must verify the backup progress to make sure there are no
holes, otherwise it has to wait.
2020-01-22 19:38:45 -08:00
Jingyu Zhou
73824faf65 Track pseudo tags popping for individual IDs
For each log router ID, we track the popped version of each pseudo tag so that
the popping only applied to the minimum of these versions.

Also add more tracing for popping and epochs.
2020-01-22 19:38:45 -08:00
Jingyu Zhou
11964733b7 WIP: should be divided into smaller commits. 2020-01-22 19:38:45 -08:00
Jingyu Zhou
a797958af6 Update peekLogRouter for backup workers to peek 2020-01-22 19:37:48 -08:00
Jingyu Zhou
a4d6ebe79e Recruit backup worker in newEpoch 2020-01-22 19:37:48 -08:00
Evan Tschannen
afc9713005 Merge branch 'release-6.2'
# Conflicts:
#	CMakeLists.txt
#	documentation/sphinx/source/release-notes.rst
#	fdbclient/FDBTypes.h
#	fdbserver/LogSystem.h
#	fdbserver/LogSystemPeekCursor.actor.cpp
#	fdbserver/OldTLogServer_6_0.actor.cpp
#	fdbserver/TLogServer.actor.cpp
#	versions.target
2019-11-06 13:45:37 -08:00
Evan Tschannen
1c873591be fixed a compiler error 2019-11-05 18:32:15 -08:00
Evan Tschannen
a8ca47beff optimized memory allocations by using VectorRef<Tag> instead of std::vector<Tag> 2019-11-05 18:07:30 -08:00
Evan Tschannen
457896b80d remote logs use bufferedCursor when peeking from log routers to improve performance
bufferedCursor performance has been improved
2019-11-04 19:47:45 -08:00
Jingyu Zhou
cd3f1e33d4 Refactor deserialization of TagsAndMessages
Consolidate deserialization of TagsAndMessages in the structure itself and
change both TLog and ServerPeekCursor to use it.
2019-09-04 14:55:05 -07:00
Jingyu Zhou
4a63de16e9
Merge pull request #1945 from xumengpanda/mengxu/tLog-code-read-v2
Add comments to DiskQueue and tLog
2019-08-08 13:24:32 -07:00
Evan Tschannen
4c9a392f05 the master checks the popped version of the txsTag before recovering the txnStateStore, to avoid restoring data that is later found to be popped 2019-08-05 17:01:48 -07:00
Meng Xu
c9c50ceff8 Comments:Add comments to DiskQueue
No functional change.
2019-08-01 15:20:01 -07:00
Evan Tschannen
653d9be6e2 we cannot pop old generations because it breaks forced recoveries 2019-07-31 18:27:36 -07:00
Evan Tschannen
1ea3ce8f9c txs pops also go to the old generations of tlogs to reduce the chance we have to restart txnStateStore recovery 2019-07-31 18:06:39 -07:00
Evan Tschannen
7ac7eb82f2 fix: buffered cursor would start multiple bufferedGetMore actors
advance all of the cursors to the poppedVersion
2019-07-30 14:42:05 -07:00
Evan Tschannen
9e3ec2cb33 fix: when resetting the peekCursor, we cannot discard the popped data if the adapter has already processed data 2019-07-30 13:25:25 -07:00
Evan Tschannen
1d326e3dc8 removed debugging message 2019-07-30 12:42:50 -07:00
Evan Tschannen
5d79e4141f fix: buffered cursor messageVersion should be set to the version we will be at after exhausting everything in messages 2019-07-30 12:38:44 -07:00
Evan Tschannen
45f7b41b48 fix: multi-cursor could discard popped commits after already returning data 2019-07-29 21:36:42 -07:00
Evan Tschannen
5bb322b483 implement popped on bufferedCursor 2019-07-29 21:19:47 -07:00
Evan Tschannen
28df2c35bb
Merge pull request #1855 from alexmiller-apple/sharded-txs-safe-upgrade
Make sharded txsTag upgradeable and downgradeable
2019-07-26 13:29:39 -07:00
sramamoorthy
9afd162e2f remove snap v1 related code 2019-07-25 17:29:31 -07:00
Alex Miller
95487861be Make sharded txsTag gated on TLogVersion::V4.
To allow a potential 6.2 -> 6.1 rollback.
2019-07-16 19:09:53 -07:00
Alex Miller
9396eedd11 Const some random functions that are trivially const.
For code hygiene reasons only.
2019-07-16 19:09:09 -07:00
Evan Tschannen
15e894c724 Merge in master 2019-07-05 15:49:24 -07:00
Alex Miller
bf883d7055 Merge remote-tracking branch 'upstream/master' into flowlock-api 2019-06-25 14:26:50 -07:00
Alex Miller
7a500cd37f A giant translation of TaskFooPriority -> TaskPriority::Foo
This is so that APIs that take priorities don't take ints, which are
common and easy to accidentally pass the wrong thing.
2019-06-25 02:47:35 -07:00
Evan Tschannen
1c005d5878
Merge pull request #1584 from alexmiller-apple/spilled-only-peek
Save TLog resources by letting peek request only spilled data.
2019-06-20 18:22:31 -07:00
Alex Miller
26343f557a Update getMore() contract.
MultiCursor already did this.
2019-06-20 17:48:24 -07:00
Evan Tschannen
e0be631414 shard the txs tag so that more transaction logs are involved in its recovery 2019-06-19 18:15:09 -07:00
Alex Miller
51fd42a4d2 Merge remote-tracking branch 'upstream/master' into spilled-only-peek 2019-06-18 17:33:52 -07:00
mpilman
8576665a90 Revert "Revert "Make protocol version a type""
This reverts commit 455bf3b3ec9d5a347b68bf4fa89bf042f5ac312e.
2019-06-18 14:49:04 -07:00
Alex Miller
455bf3b3ec Revert "Make protocol version a type" 2019-06-18 10:59:17 -07:00