Evan Tschannen
29eec30183
Merge branch 'release-6.2' into release-6.3
# Conflicts:
# CMakeLists.txt
# build/Dockerfile
# build/Dockerfile.devel
# documentation/sphinx/source/downloads.rst
# fdbserver/Knobs.cpp
# fdbserver/LogSystem.h
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/TagPartitionedLogSystem.actor.cpp
# fdbserver/WaitFailure.actor.cpp
# fdbserver/fdbserver.vcxproj
# fdbserver/fdbserver.vcxproj.filters
# packaging/msi/FDBInstaller.wxs
2020-08-31 01:10:29 -07:00
Evan Tschannen
331a49a62a
do not allow a proxy to reset a connection with the logs immediately upon starting up
2020-08-30 18:50:19 -07:00
Evan Tschannen
fd1a4304fa
fix: made ConnectionResetInfo reference counted
2020-08-26 10:53:17 -07:00
Evan Tschannen
8ede143941
Track tlog push latencies and reset connections if they are above 500ms
2020-08-18 08:43:14 -07:00
A.J. Beamon
b09dddc07e
Merge branch 'release-6.2' into merge-release-6.2-into-release-6.3
# Conflicts:
# cmake/ConfigureCompiler.cmake
# documentation/sphinx/source/downloads.rst
# fdbrpc/FlowTransport.actor.cpp
# fdbrpc/fdbrpc.vcxproj
# fdbserver/DataDistributionQueue.actor.cpp
# fdbserver/Knobs.cpp
# fdbserver/Knobs.h
# fdbserver/LogSystemPeekCursor.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/Status.actor.cpp
# fdbserver/storageserver.actor.cpp
# flow/flow.vcxproj
2020-07-10 15:06:34 -07:00
Evan Tschannen
717242a0ee
reset WAN network connections every 5 minutes if responses take more than 500ms
2020-07-09 22:50:47 -07:00
Jingyu Zhou
90b40e1d75
Merge branch 'mengxu/new-backup-format-PR-delta' of github.com:xumengpanda/foundationdb into backup-worker-bak
Resolve Conflicts:
fdbclient/BackupAgent.actor.h
fdbserver/BackupWorker.actor.cpp
fdbserver/RestoreMaster.actor.cpp
fdbserver/masterserver.actor.cpp
2020-03-23 13:35:33 -07:00
Meng Xu
3f31ebf659
New backup: Revise event name and explain code
2020-03-23 10:55:44 -07:00
Jingyu Zhou
818072f3cb
Set oldest backup epoch if not recruiting backup workers
Since a tlog is kept until the backup worker has pulled mutations from it, the
old tlogs can only be displaced after the oldest backup epoch equals the current epoch.
So if the master is not recruiting backup workers, it should set the oldest backup
epoch to the current epoch.
2020-03-20 20:16:43 -07:00
Jingyu Zhou
12ed8ad536
Fix backup worker start version when logset start version is lower
The start version of a tlog set can be smaller than the last epoch's end version.
In this case, set the backup worker's start version to the last epoch's end version to
avoid overlapping version ranges among backup workers.
2020-03-20 20:15:08 -07:00
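A minimal C++ sketch of the start-version rule described in the commit message above. It assumes versions are plain 64-bit integers. The helper name `backupWorkerStartVersion` is hypothetical and is not the actual FoundationDB code.

```cpp
#include <algorithm>
#include <cstdint>

using Version = int64_t;  // assumption: a version is modeled as a 64-bit integer

// Hypothetical helper: a backup worker for a tlog set starts at the later of the
// log set's start version and the previous epoch's end version. This keeps the
// version ranges covered by workers of consecutive epochs from overlapping.
Version backupWorkerStartVersion(Version logSetStartVersion, Version lastEpochEndVersion) {
    return std::max(logSetStartVersion, lastEpochEndVersion);
}
```

When the log set starts at 100 but the last epoch ended at 150, the worker starts at 150. Versions 100 to 149 were already covered by the previous epoch's workers.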
Jingyu Zhou
89d8f13038
Fix backup worker start version when logset start version is lower
The start version of a tlog set can be smaller than the last epoch's end version.
In this case, set the backup worker's start version to the last epoch's end version to
avoid overlapping version ranges among backup workers.
2020-03-18 16:41:35 -07:00
Evan Tschannen
303df197cf
Merge branch 'release-6.2'
# Conflicts:
# CMakeLists.txt
# bindings/c/test/mako/mako.c
# documentation/sphinx/source/release-notes.rst
# fdbbackup/backup.actor.cpp
# fdbclient/NativeAPI.actor.cpp
# fdbclient/NativeAPI.actor.h
# fdbserver/DataDistributionQueue.actor.cpp
# fdbserver/Knobs.cpp
# fdbserver/Knobs.h
# fdbserver/LogRouter.actor.cpp
# fdbserver/SkipList.cpp
# fdbserver/fdbserver.actor.cpp
# flow/CMakeLists.txt
# flow/Knobs.cpp
# flow/Knobs.h
# flow/flow.vcxproj
# flow/flow.vcxproj.filters
# versions.target
2020-03-06 18:22:46 -08:00
Evan Tschannen
1128666840
added additional logging on the log router
2020-03-05 18:17:06 -08:00
Jingyu Zhou
1eaea91cb3
Address review comments
2020-01-22 19:42:13 -08:00
Jingyu Zhou
4ed75e37f3
BackupProgress uses old epoch's begin version if no progress found
Get rid of the complex logic of choosing the largest saved version from the
previous epoch for the oldest epoch. Instead, use the begin version that is now
available from the log system.
2020-01-22 19:38:46 -08:00
Jingyu Zhou
64052f6349
Check and fill backup gaps for old epochs and tags
Sometimes the backup worker has not yet saved its progress to the system space when a
master recovery happens. As a result, the next epoch doesn't know the progress of
previous ones. This change checks for such missing gaps and fills them with
the whole range [startVersion, endVersion).
The code is refactored into BackupProgress.actor.* to consolidate backup
progress processing for the master server.
2020-01-22 19:38:46 -08:00
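A minimal C++ sketch of the gap check described in the commit message above. It covers only the case the message states: a tag of an old epoch with no saved progress is assigned the whole range [startVersion, endVersion). Several things here are simplifications for illustration and are not FoundationDB types:

- Tags are modeled as plain integers.
- The progress map is a simplified stand-in.
- `findMissingTags` and `VersionRange` are hypothetical names.

```cpp
#include <cstdint>
#include <map>
#include <set>

using Version = int64_t;
using Tag = int;  // assumption: a tag is simplified to an integer ID

struct VersionRange {
    Version begin;  // inclusive
    Version end;    // exclusive
};

// Hypothetical helper: for one old epoch covering [startVersion, endVersion),
// return the whole range for every tag that has no saved backup progress,
// so that a backup worker can be recruited to fill the gap.
std::map<Tag, VersionRange> findMissingTags(const std::set<Tag>& epochTags,
                                            const std::map<Tag, Version>& savedProgress,
                                            Version startVersion, Version endVersion) {
    std::map<Tag, VersionRange> gaps;
    for (Tag tag : epochTags) {
        if (savedProgress.find(tag) == savedProgress.end()) {
            gaps[tag] = VersionRange{ startVersion, endVersion };
        }
    }
    return gaps;
}
```

How tags with partial progress are resumed is not described in the commit message and is left out of the sketch.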
Jingyu Zhou
0c08161d8e
Remove old backup workers when done
For backup workers working on old epochs, once their work is done, they notify
the master. The master then removes them from the log system and acknowledges
back to the backup workers so that they can shut down gracefully.
The popping of a backup worker is stalled if there are workers from older
epochs still working. Otherwise, workers from old epochs would lose data.
However, allowing a newer epoch to start backup can cause holes in version ranges.
The restore process must verify the backup progress to make sure there are no
holes; otherwise it has to wait.
2020-01-22 19:38:45 -08:00
Jingyu Zhou
73824faf65
Track pseudo tags popping for individual IDs
For each log router ID, we track the popped version of each pseudo tag so that
popping is only applied up to the minimum of these versions.
Also add more tracing for popping and epochs.
2020-01-22 19:38:45 -08:00
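A minimal C++ sketch of the pop-tracking idea in the commit message above, with one tracker instance per pseudo tag. Each ID reports the version it has popped to, and the real tag is only popped up to the minimum across all IDs. Several details are assumptions for illustration and are not taken from FoundationDB:

- The string IDs stand in for UIDs.
- The expected number of IDs is fixed up front.
- The "return 0 until every ID has reported" rule is a simplification.
- The class name `PseudoTagPopTracker` is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

using Version = int64_t;

// Hypothetical tracker for a single pseudo tag. Popping the underlying tag only
// up to the minimum reported version prevents a fast consumer from discarding
// data that a slower consumer still needs.
class PseudoTagPopTracker {
public:
    explicit PseudoTagPopTracker(std::size_t expectedIds) : expectedIds_(expectedIds) {}

    // Record a pop from one ID (pops never move backwards) and return the
    // version that is now safe to pop for the underlying tag.
    Version pop(const std::string& id, Version upTo) {
        Version& v = popped_[id];  // value-initialized to 0 on first use
        v = std::max(v, upTo);
        return safeVersion();
    }

    // Minimum popped version across all IDs, or 0 while some ID has not reported.
    Version safeVersion() const {
        if (popped_.empty() || popped_.size() < expectedIds_) return 0;
        Version minVersion = popped_.begin()->second;
        for (const auto& entry : popped_) minVersion = std::min(minVersion, entry.second);
        return minVersion;
    }

private:
    std::size_t expectedIds_;
    std::map<std::string, Version> popped_;
};
```

With two IDs, a pop of "a" to 100 followed by a pop of "b" to 80 makes 80 the safe pop version. This holds even though "a" is further ahead.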
Jingyu Zhou
11964733b7
WIP: should be divided into smaller commits.
2020-01-22 19:38:45 -08:00
Jingyu Zhou
a797958af6
Update peekLogRouter for backup workers to peek
2020-01-22 19:37:48 -08:00
Jingyu Zhou
a4d6ebe79e
Recruit backup worker in newEpoch
2020-01-22 19:37:48 -08:00
Evan Tschannen
afc9713005
Merge branch 'release-6.2'
# Conflicts:
# CMakeLists.txt
# documentation/sphinx/source/release-notes.rst
# fdbclient/FDBTypes.h
# fdbserver/LogSystem.h
# fdbserver/LogSystemPeekCursor.actor.cpp
# fdbserver/OldTLogServer_6_0.actor.cpp
# fdbserver/TLogServer.actor.cpp
# versions.target
2019-11-06 13:45:37 -08:00
Evan Tschannen
1c873591be
fixed a compiler error
2019-11-05 18:32:15 -08:00
Evan Tschannen
a8ca47beff
optimized memory allocations by using VectorRef<Tag> instead of std::vector<Tag>
2019-11-05 18:07:30 -08:00
Evan Tschannen
457896b80d
remote logs use bufferedCursor when peeking from log routers to improve performance
bufferedCursor performance has been improved
2019-11-04 19:47:45 -08:00
Jingyu Zhou
cd3f1e33d4
Refactor deserialization of TagsAndMessages
Consolidate deserialization of TagsAndMessages in the structure itself and
change both TLog and ServerPeekCursor to use it.
2019-09-04 14:55:05 -07:00
Jingyu Zhou
4a63de16e9
Merge pull request #1945 from xumengpanda/mengxu/tLog-code-read-v2
Add comments to DiskQueue and tLog
2019-08-08 13:24:32 -07:00
Evan Tschannen
4c9a392f05
the master checks the popped version of the txsTag before recovering the txnStateStore, to avoid restoring data that is later found to be popped
2019-08-05 17:01:48 -07:00
Meng Xu
c9c50ceff8
Comments: Add comments to DiskQueue
No functional change.
2019-08-01 15:20:01 -07:00
Evan Tschannen
653d9be6e2
we cannot pop old generations because it breaks forced recoveries
2019-07-31 18:27:36 -07:00
Evan Tschannen
1ea3ce8f9c
txs pops also go to the old generations of tlogs to reduce the chance we have to restart txnStateStore recovery
2019-07-31 18:06:39 -07:00
Evan Tschannen
7ac7eb82f2
fix: buffered cursor would start multiple bufferedGetMore actors
advance all of the cursors to the poppedVersion
2019-07-30 14:42:05 -07:00
Evan Tschannen
9e3ec2cb33
fix: when resetting the peekCursor, we cannot discard the popped data if the adapter has already processed data
2019-07-30 13:25:25 -07:00
Evan Tschannen
1d326e3dc8
removed debugging message
2019-07-30 12:42:50 -07:00
Evan Tschannen
5d79e4141f
fix: buffered cursor messageVersion should be set to the version we will be at after exhausting everything in messages
2019-07-30 12:38:44 -07:00
Evan Tschannen
45f7b41b48
fix: multi-cursor could discard popped commits after already returning data
2019-07-29 21:36:42 -07:00
Evan Tschannen
5bb322b483
implement popped on bufferedCursor
2019-07-29 21:19:47 -07:00
Evan Tschannen
28df2c35bb
Merge pull request #1855 from alexmiller-apple/sharded-txs-safe-upgrade
Make sharded txsTag upgradeable and downgradeable
2019-07-26 13:29:39 -07:00
sramamoorthy
9afd162e2f
remove snap v1 related code
2019-07-25 17:29:31 -07:00
Alex Miller
95487861be
Make sharded txsTag gated on TLogVersion::V4.
To allow a potential 6.2 -> 6.1 rollback.
2019-07-16 19:09:53 -07:00
Alex Miller
9396eedd11
Const some random functions that are trivially const.
For code hygiene reasons only.
2019-07-16 19:09:09 -07:00
Evan Tschannen
15e894c724
Merge in master
2019-07-05 15:49:24 -07:00
Alex Miller
bf883d7055
Merge remote-tracking branch 'upstream/master' into flowlock-api
2019-06-25 14:26:50 -07:00
Alex Miller
7a500cd37f
A giant translation of TaskFooPriority -> TaskPriority::Foo
This is so that APIs that take priorities don't take ints, which are
so common that it is easy to accidentally pass the wrong thing.
2019-06-25 02:47:35 -07:00
Evan Tschannen
1c005d5878
Merge pull request #1584 from alexmiller-apple/spilled-only-peek
Save TLog resources by letting peek request only spilled data.
2019-06-20 18:22:31 -07:00
Alex Miller
26343f557a
Update getMore() contract.
MultiCursor already did this.
2019-06-20 17:48:24 -07:00
Evan Tschannen
e0be631414
shard the txs tag so that more transaction logs are involved in its recovery
2019-06-19 18:15:09 -07:00
Alex Miller
51fd42a4d2
Merge remote-tracking branch 'upstream/master' into spilled-only-peek
2019-06-18 17:33:52 -07:00
mpilman
8576665a90
Revert "Revert "Make protocol version a type""
This reverts commit 455bf3b3ec9d5a347b68bf4fa89bf042f5ac312e.
2019-06-18 14:49:04 -07:00
Alex Miller
455bf3b3ec
Revert "Make protocol version a type"
2019-06-18 10:59:17 -07:00