commit 254d7e235c
parent 79a820fd11
Author: 晓楚
Date: 2021-06-28 11:14:55 +08:00
5 changed files with 16 additions and 16 deletions

View File

@@ -15,7 +15,7 @@
There is no way to take a snapshot, i.e., no way to record KV ranges for the complete key space at a given version. For
a keyspace a-z, it's not possible to record the KV range (a-z, v0) if keyspace a-z is not small enough. Instead, we can record
KV ranges {(a-b, v0), (c-d, v1), (e-f, v2) ... (y-z, v10)}. With the mutation log recorded all along, we can still use
-the simple backup-restore scheme described above on sub keyspaces seperately. Assuming we did record mutation log from
+the simple backup-restore scheme described above on sub keyspaces separately. Assuming we did record mutation log from
v0 to vn, that allows us to restore:
* Keyspace a-b to any version between v0 and vn
@@ -23,7 +23,7 @@
* Keyspace y-z to any version between v10 and vn
But we are not interested in restoring sub keyspaces; we want to restore a-z. Well, we can restore a-z to any
-version between v10 and vn by restoring individual sub spaces seperately.
+version between v10 and vn by restoring individual sub spaces separately.
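To make the restore window concrete: the whole keyspace only becomes restorable from the newest snapshot version onward, since every sub-range must be rolled forward from its own snapshot version using the mutation log. A minimal C++ sketch of this computation, with all names illustrative rather than taken from the FDB codebase:

```cpp
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

// One recorded KV range snapshot: a key range captured at some version.
struct RangeSnapshot {
    std::string begin, end;  // e.g. keyspace "a" to "b"
    int64_t version;         // version at which this range was read
};

// With mutation logs covering [v0, logEnd], the whole keyspace is restorable
// to any version in [max(snapshot versions), logEnd]: each sub-range can be
// rolled forward from its snapshot version to the target version.
int64_t earliestFullRestoreVersion(const std::vector<RangeSnapshot>& snaps) {
    int64_t v = 0;
    for (const auto& s : snaps) v = std::max(v, s.version);
    return v;
}

int main() {
    std::vector<RangeSnapshot> snaps = {
        {"a", "b", 0}, {"c", "d", 1}, {"e", "f", 2}, {"y", "z", 10}};
    int64_t logEnd = 100;  // vn: the mutation log was recorded up to here
    std::cout << "restorable to any version in ["
              << earliestFullRestoreVersion(snaps) << ", " << logEnd << "]\n";
}
```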
#### Key Value Ranges

View File

@@ -249,7 +249,7 @@ This protocol was implemented after another abandoned protocol: the `startedBack
**Pause and Resume Backups**
The command line for pause or resume backups remains the same, but the implementation for the new backup system is different from the old one. This is because in the old backup system, both mutation logs and range logs are handled by `TaskBucket`, an asynchronous task scheduling framework that stores states in the FDB database. Thus, the old backup system simply pauses or resumes the `TaskBucket`. In the new backup system, mutation logs are generated by backup workers, thus the pause or resume command needs to tell all backup workers to pause or resume pulling mutations from TLogs. Specifically,
-1. The operator issues a pause or resume request that upates both the `TaskBucket` and `\xff\x02/backupPaused` key.
+1. The operator issues a pause or resume request that updates both the `TaskBucket` and `\xff\x02/backupPaused` key.
2. Each backup worker monitors the `\xff\x02/backupPaused` key and notices the change. Then the backup worker pauses or resumes pulling from TLogs.
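A toy model of step 2 in plain C++, without FDB's actor framework (`BackupWorker` and `onBackupPausedChanged` are invented for illustration): when the watched `\xff\x02/backupPaused` value changes, the worker toggles whether it pulls mutations from TLogs.

```cpp
#include <iostream>
#include <string>

// Toy model of a backup worker reacting to the \xff\x02/backupPaused key.
class BackupWorker {
    bool paused = false;

public:
    // Called when a watch on \xff\x02/backupPaused fires with the new value.
    void onBackupPausedChanged(const std::string& value) {
        bool shouldPause = (value == "1");
        if (shouldPause == paused) return;  // no state change, nothing to do
        paused = shouldPause;
        if (paused)
            std::cout << "worker: stop pulling mutations from TLogs\n";
        else
            std::cout << "worker: resume pulling mutations from TLogs\n";
    }
};

int main() {
    BackupWorker w;
    w.onBackupPausedChanged("1");  // operator paused backups
    w.onBackupPausedChanged("0");  // operator resumed backups
}
```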
**Backup Container Changes**

View File

@@ -39,7 +39,7 @@ Once coordinators think there is no CC in a cluster, they will start leader elec
Although only one CC can succeed in recovery, which is guaranteed by the Paxos algorithm, there are scenarios in which multiple CCs can exist for a transient period.
-Scenario 1: A majority of coordinators reboot at the same time while the current running CC is still alive. When those coordinators reboot, they will likely choose a different process as CC. The new CC will start to recruit a new master and kick off the recovery. The old CC will know the existance of the new CC when it sends heart-beats to coordinators periodically (in sub-seconds). The old CC will kill itself once it is told by a majority of coordinators about the existance of the new CC. Old roles (say the master) will commit suicide as well after the old CC dies. This prevents the cluster from having two sets of transaction systems. In summary, the cluster may have both the old and new CC alive for sub-seconds before the old CC confirms the existance of the new CC.
+Scenario 1: A majority of coordinators reboot at the same time while the current running CC is still alive. When those coordinators reboot, they will likely choose a different process as CC. The new CC will start to recruit a new master and kick off the recovery. The old CC will know the existence of the new CC when it sends heart-beats to coordinators periodically (in sub-seconds). The old CC will kill itself once it is told by a majority of coordinators about the existence of the new CC. Old roles (say the master) will commit suicide as well after the old CC dies. This prevents the cluster from having two sets of transaction systems. In summary, the cluster may have both the old and new CC alive for sub-seconds before the old CC confirms the existence of the new CC.
Scenario 2: A network partition makes the current running CC unable to connect to a majority of coordinators. Before the CC detects this, the coordinators can elect a new CC and recovery will happen. Typically, the old CC quickly realizes it cannot connect to a majority of coordinators and kills itself. In the rare situation when the old CC does not die within a short time period *and* the network partition is resolved before the old CC dies, the new CC can recruit a new master, which leads to two masters in the cluster. Only one master can succeed in the recovery because only one master can lock the cstate (see Phase 2: LOCKING_CSTATE).
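The rule that resolves Scenario 1 is a simple majority check. A hedged sketch of the decision the old CC makes from coordinator heartbeat replies, with all names invented for illustration:

```cpp
#include <iostream>
#include <vector>

// Each heartbeat reply says whether that coordinator knows of a newer CC.
struct CoordinatorReply {
    bool knowsNewerCC;
};

// The old CC kills itself only once a majority of coordinators report a
// newer CC, so a single confused coordinator cannot take the cluster down.
bool shouldCommitSuicide(const std::vector<CoordinatorReply>& replies,
                         int totalCoordinators) {
    int votes = 0;
    for (const auto& r : replies)
        if (r.knowsNewerCC) ++votes;
    return votes > totalCoordinators / 2;
}

int main() {
    // 3 of 5 coordinators report a newer CC: the old CC must die.
    std::vector<CoordinatorReply> replies = {{true}, {true}, {true}, {false}};
    std::cout << (shouldCommitSuicide(replies, 5) ? "die\n" : "stay alive\n");
}
```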
@@ -151,7 +151,7 @@ Not every FDB role participates in the recovery phases 1-3. This phase tells the
Storage servers (SSes) are not involved in recovery phases 1-3. To notify SSes about the recovery, the master commits a recovery transaction, the first transaction in the new generation, which contains the txnStateStore information. Once a storage server receives the recovery transaction, it compares its latest data version with the recovery version, and rolls back to the recovery version if its data version is newer. Note that storage servers may have newer data than the recovery version because they pre-fetch mutations from tLogs before the mutations are durable, to reduce the latency to read newly written data.
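A minimal sketch of the storage server's rollback check, with illustrative names rather than actual FDB code:

```cpp
#include <cstdint>
#include <iostream>

// The one version a storage server consults here (name illustrative).
struct StorageState {
    int64_t dataVersion;  // newest version applied, possibly pre-fetched
};

// On receiving the recovery transaction, a storage server rolls back any
// pre-fetched mutations that are newer than the recovery version.
void applyRecovery(StorageState& ss, int64_t recoveryVersion) {
    if (ss.dataVersion > recoveryVersion) {
        std::cout << "rollback " << ss.dataVersion << " -> "
                  << recoveryVersion << "\n";
        ss.dataVersion = recoveryVersion;  // discard speculative mutations
    }
}

int main() {
    StorageState ss{120};
    applyRecovery(ss, 100);  // pre-fetched past the recovery version
}
```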
-Commit proxies haven't recovered the transaction system state and cannot accept transactions yet. The master recovers the proxies' states by sending the txnStateStore to commit proxies through the commit proxies' (`txnState`) interfaces in the `sendIntialCommitToResolvers()` function. Once commit proxies have recovered their states, they can start processing transactions. The recovery transaction that was waiting on commit proxies will be processed.
+Commit proxies haven't recovered the transaction system state and cannot accept transactions yet. The master recovers the proxies' states by sending the txnStateStore to commit proxies through the commit proxies' (`txnState`) interfaces in the `sendInitialCommitToResolvers()` function. Once commit proxies have recovered their states, they can start processing transactions. The recovery transaction that was waiting on commit proxies will be processed.
The resolvers haven't learned the recovery version either. The master needs to send the lastEpochEnd version (i.e., the last commit of the previous generation) to resolvers via the resolvers' (`resolve`) interface.

View File

@@ -37,7 +37,7 @@ To make commits available to storage servers efficiently, a transaction log
maintains a copy of the commit in memory, and maintains one queue per tag that
sequentially indexes the location of each mutation with that tag in each
commit. This way, responding to a peek from a storage server only
-requires sequentailly walking through the queue, and copying each mutation
+requires sequentially walking through the queue, and copying each mutation
referenced into the response buffer.
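A self-contained sketch of this per-tag indexing and the peek walk, with invented types standing in for the real TLog structures:

```cpp
#include <cstddef>
#include <iostream>
#include <map>
#include <string>
#include <vector>

using Tag = int;

// Where one tagged mutation lives inside the in-memory copy of a commit.
struct Location {
    size_t commitIndex;  // which commit
    size_t offset, len;  // byte range of the mutation within that commit
};

struct TLogIndex {
    std::vector<std::string> commits;                 // in-memory commit bytes
    std::map<Tag, std::vector<Location>> tagQueues;   // one queue per tag

    // Index one mutation of a freshly received commit under its tag.
    void indexMutation(Tag tag, size_t commitIndex, size_t offset, size_t len) {
        tagQueues[tag].push_back({commitIndex, offset, len});
    }

    // Peek: walk the tag's queue sequentially, copying each referenced
    // mutation into the response buffer.
    std::string peek(Tag tag) const {
        std::string response;
        auto it = tagQueues.find(tag);
        if (it == tagQueues.end()) return response;
        for (const Location& loc : it->second)
            response += commits[loc.commitIndex].substr(loc.offset, loc.len);
        return response;
    }
};

int main() {
    TLogIndex log;
    log.commits.push_back("set(a,1);set(b,2);");
    log.indexMutation(/*tag=*/7, 0, 0, 9);  // set(a,1);
    log.indexMutation(/*tag=*/7, 0, 9, 9);  // set(b,2);
    std::cout << log.peek(7) << "\n";
}
```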
Transaction logs internally handle commits by performing two operations
@@ -366,9 +366,9 @@ from the returned data. Therefore, the length of the result will not be the
same as `end-start`, intentionally. For this reason, the API is `(start, end)`
and not `(start, length)`.
-Spilled data, when using spill-by-value, was resistent to bitrot via data being
+Spilled data, when using spill-by-value, was resistant to bitrot via data being
checksummed internally within SQLite's B-tree. Now that reads can be done
-directly, the responsibility for verifing data integrity falls upon the
+directly, the responsibility for verifying data integrity falls upon the
DiskQueue. `CheckHashes::YES` will cause the DiskQueue to use the checksum in
each DiskQueue page to verify data integrity. If an externally maintained
checksum exists to verify the returned data, then `CheckHashes::NO` can be
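A hedged sketch of the two read modes; the checksum function here is a simple stand-in, not the DiskQueue's actual page hash:

```cpp
#include <cstdint>
#include <iostream>
#include <string>

enum class CheckHashes { NO, YES };

// Toy stand-in for a per-page checksum (the real DiskQueue stores a
// stronger hash in each page header).
uint32_t toyChecksum(const std::string& payload) {
    uint32_t h = 2166136261u;  // FNV-1a
    for (unsigned char c : payload) { h ^= c; h *= 16777619u; }
    return h;
}

struct DiskQueuePage {
    uint32_t checksum;
    std::string payload;
};

// CheckHashes::YES verifies the stored checksum; CheckHashes::NO skips that
// work when the caller verifies integrity with its own checksum.
bool readPage(const DiskQueuePage& page, CheckHashes check, std::string& out) {
    if (check == CheckHashes::YES && toyChecksum(page.payload) != page.checksum)
        return false;  // corrupt page detected
    out = page.payload;
    return true;
}

int main() {
    DiskQueuePage page{0, "spilled commit bytes"};
    page.checksum = toyChecksum(page.payload);
    std::string data;
    std::cout << (readPage(page, CheckHashes::YES, data) ? "ok\n" : "corrupt\n");
}
```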
@@ -467,7 +467,7 @@ recovery.
In spill-by-value, the DiskQueue only ever contained commits that were also
held in memory, and thus recovery would need to read up to 1.5GB of data. With
-spill-by-reference, the DiskQueue could theoretically contain terrabytes of
+spill-by-reference, the DiskQueue could theoretically contain terabytes of
data. To keep recovery times bounded, FDB must still read only the
commits that need to be loaded back into memory.
@@ -560,7 +560,7 @@ minor impacts on recovery times:
1. A larger disk queue file means more data to zero out in the case of recovery.
-This should be negligable when fallocate `ZERO_RANGE` is available, because then it's only a metadata operation.
+This should be negligible when fallocate `ZERO_RANGE` is available, because then it's only a metadata operation.
2. A larger file means more bisection iterations to find the first page; a sketch of this search follows below.
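Assuming each DiskQueue page header carries a monotonically increasing sequence number (in the real code each probe would read a page from disk), the bisection in item 2 is a standard binary search. A minimal sketch:

```cpp
#include <cstdint>
#include <iostream>
#include <vector>

// Binary search for the first page recovery still needs, instead of
// scanning a possibly huge file from the beginning.
size_t findFirstPage(const std::vector<int64_t>& pageSeq, int64_t firstNeeded) {
    size_t lo = 0, hi = pageSeq.size();
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;   // one disk read per iteration
        if (pageSeq[mid] < firstNeeded) lo = mid + 1;
        else hi = mid;
    }
    return lo;  // index of first page with sequence >= firstNeeded
}

int main() {
    std::vector<int64_t> pageSeq = {10, 20, 30, 40, 50};  // one entry per page
    std::cout << "start recovery at page " << findFirstPage(pageSeq, 35) << "\n";
}
```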

View File

@@ -16448,7 +16448,7 @@ static int os2Delete(
}
/*
-** Check the existance and status of a file.
+** Check the existence and status of a file.
*/
static int os2Access(
sqlite3_vfs *pVfs, /* Not used on os2 */
@@ -18731,7 +18731,7 @@ static int nolockClose(sqlite3_file *id) {
/******************************************************************************
************************* Begin dot-file Locking ******************************
**
-** The dotfile locking implementation uses the existance of separate lock
+** The dotfile locking implementation uses the existence of separate lock
** files in order to control access to the database. This works on just
** about every filesystem imaginable. But there are serious downsides:
**
@@ -18746,7 +18746,7 @@ static int nolockClose(sqlite3_file *id) {
**
** Dotfile locking works by creating a file in the same directory as the
** database and with the same name but with a ".lock" extension added.
-** The existance of a lock file implies an EXCLUSIVE lock. All other lock
+** The existence of a lock file implies an EXCLUSIVE lock. All other lock
** types (SHARED, RESERVED, PENDING) are mapped into EXCLUSIVE.
*/
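A minimal POSIX sketch of the dotfile scheme described in that comment, not SQLite's actual implementation: `open` with `O_CREAT | O_EXCL` makes the existence check and the lock-file creation a single atomic step.

```cpp
#include <fcntl.h>   // open, O_CREAT, O_EXCL, O_WRONLY
#include <unistd.h>  // close, unlink
#include <iostream>
#include <string>

// Acquire the lock by creating "<db>.lock" with O_EXCL: creation fails if
// the file already exists, so the file's existence is the EXCLUSIVE lock.
bool acquireDotfileLock(const std::string& dbPath) {
    std::string lockPath = dbPath + ".lock";
    int fd = ::open(lockPath.c_str(), O_CREAT | O_EXCL | O_WRONLY, 0600);
    if (fd < 0) return false;  // someone else holds the lock
    ::close(fd);
    return true;
}

void releaseDotfileLock(const std::string& dbPath) {
    ::unlink((dbPath + ".lock").c_str());  // removing the file releases it
}

int main() {
    const std::string db = "test.db";
    std::cout << "first acquire: " << acquireDotfileLock(db) << "\n";   // 1
    std::cout << "second acquire: " << acquireDotfileLock(db) << "\n";  // 0
    releaseDotfileLock(db);
}
```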
@@ -22040,7 +22040,7 @@ static int unixDelete(
}
/*
-** Test the existance of or access permissions of file zPath. The
+** Test the existence of or access permissions of file zPath. The
** test performed depends on the value of flags:
**
** SQLITE_ACCESS_EXISTS: Return 1 if the file exists
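For the `SQLITE_ACCESS_EXISTS` case, the probe reduces to a single `access(2)` call; a minimal sketch of that one branch, not the actual VFS code:

```cpp
#include <unistd.h>  // access, F_OK
#include <iostream>

// Existence probe mirroring the SQLITE_ACCESS_EXISTS flag described above:
// return 1 if the file exists, 0 otherwise.
int fileExists(const char* zPath) {
    return ::access(zPath, F_OK) == 0 ? 1 : 0;
}

int main() {
    std::cout << fileExists("/etc/hosts") << "\n";
}
```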
@@ -26132,7 +26132,7 @@ static int winDelete(
}
/*
-** Check the existance and status of a file.
+** Check the existence and status of a file.
*/
static int winAccess(
sqlite3_vfs *pVfs, /* Not used on win32 */
@@ -26681,7 +26681,7 @@ SQLITE_API int sqlite3_os_end(void){
/*
** A bitmap is an instance of the following structure.
**
-** This bitmap records the existance of zero or more bits
+** This bitmap records the existence of zero or more bits
** with values between 1 and iSize, inclusive.
**
** There are three possible representations of the bitmap.
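A minimal sketch of the recorded-set semantics, using only a dense one-bit-per-value representation (the structure above switches among several representations that this excerpt does not show):

```cpp
#include <cstdint>
#include <iostream>
#include <vector>

// Records the existence of bits with values between 1 and iSize, inclusive.
class Bitmap {
    uint32_t iSize;
    std::vector<uint8_t> bits;  // dense: one bit per possible value

public:
    explicit Bitmap(uint32_t size) : iSize(size), bits((size + 8) / 8, 0) {}

    void set(uint32_t i) {  // record bit i, 1 <= i <= iSize
        if (i >= 1 && i <= iSize) bits[i >> 3] |= (1u << (i & 7));
    }

    bool test(uint32_t i) const {  // was bit i recorded?
        return i >= 1 && i <= iSize && (bits[i >> 3] & (1u << (i & 7)));
    }
};

int main() {
    Bitmap b(100);
    b.set(42);
    std::cout << b.test(42) << " " << b.test(43) << "\n";  // 1 0
}
```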