mirror of https://github.com/apple/foundationdb.git (synced 2025-04-20 02:10:47 +08:00)

fix typo

parent 79a820fd11
commit 254d7e235c
@@ -15,7 +15,7 @@
 There is no way to take snapshot. There is no way to record KV Ranges for the complete key space at a given version. For
 a keyspace a-z, its not possible to record KV range (a-z, v0), if keyspace a-z is not small enough. Instead, we can record
 KV ranges {(a-b, v0), (c-d, v1), (e-f, v2) ... (y-z, v10)}. With mutation log recorded all along, we can still use
-the simple backup-restore scheme described above on sub keyspaces seperately. Assuming we did record mutation log from
+the simple backup-restore scheme described above on sub keyspaces separately. Assuming we did record mutation log from
 v0 to vn, that allows us to restore

 * Keyspace a-b to any version between v0 and vn
@@ -23,7 +23,7 @@
 * Keyspace y-z to any version between v10 and vn

 But, we are not interested in restoring sub keyspaces, we want to restore a-z. Well, we can restore a-z, to any
-version between v10 and vn by restoring individual sub spaces seperately.
+version between v10 and vn by restoring individual sub spaces separately.

 #### Key Value Ranges

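
Concretely, the window of versions to which the whole keyspace a-z can be restored is bounded below by the newest sub-range snapshot (v10 above) and above by the end of the mutation log. A minimal sketch of that version arithmetic, using hypothetical names rather than the backup code's own types:

```cpp
// Illustrative only: hypothetical types, not FoundationDB's backup code.
#include <algorithm>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

using Version = int64_t;

struct RangeSnapshot {
    std::string begin, end;  // sub keyspace, e.g. "a".."b"
    Version snapshotVersion; // version at which this range was recorded
};

// The whole keyspace can be restored to any version v with
// max(snapshotVersion) <= v <= logEndVersion: each sub range is restored
// from its snapshot and the mutation log (snapshotVersion, v] is replayed.
std::pair<Version, Version> restorableWindow(const std::vector<RangeSnapshot>& snaps,
                                             Version logBegin, Version logEnd) {
    Version minTarget = logBegin;
    for (const auto& s : snaps)
        minTarget = std::max(minTarget, s.snapshotVersion);
    return { minTarget, logEnd }; // empty window if minTarget > logEnd
}
```
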
@@ -249,7 +249,7 @@ This protocol was implemented after another abandoned protocol: the `startedBack
 **Pause and Resume Backups**
 The command line for pause or resume backups remains the same, but the implementation for the new backup system is different from the old one. This is because in the old backup system, both mutation logs and range logs are handled by `TaskBucket`, an asynchronous task scheduling framework that stores states in the FDB database. Thus, the old backup system simply pauses or resumes the `TaskBucket`. In the new backup system, mutation logs are generated by backup workers, thus the pause or resume command needs to tell all backup workers to pause or resume pulling mutations from TLogs. Specifically,

-1. The operator issues a pause or resume request that upates both the `TaskBucket` and `\xff\x02/backupPaused` key.
+1. The operator issues a pause or resume request that updates both the `TaskBucket` and `\xff\x02/backupPaused` key.
 2. Each backup worker monitors the `\xff\x02/backupPaused` key and notices the change. Then the backup worker pauses or resumes pulling from TLogs.

 **Backup Container Changes**
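
The fan-out in steps 1-2 is a single key flip observed by every worker. A minimal sketch of a worker's pull loop under that scheme, with `readPausedFlag` and `pullBatchFromTLogs` as assumed stand-ins rather than the backup worker's real interfaces:

```cpp
// Illustrative sketch of the pause/resume fan-out; readPausedFlag and
// pullBatchFromTLogs are hypothetical stand-ins, not FDB's actual interfaces.
#include <atomic>
#include <chrono>
#include <functional>
#include <thread>

void backupWorkerLoop(std::function<bool()> readPausedFlag,      // reads \xff\x02/backupPaused
                      std::function<void()> pullBatchFromTLogs,  // pulls one batch of mutations
                      std::atomic<bool>& stop) {
    while (!stop.load()) {
        if (readPausedFlag()) {
            // Paused: stop pulling, but keep checking so a resume is noticed quickly.
            std::this_thread::sleep_for(std::chrono::milliseconds(100));
            continue;
        }
        pullBatchFromTLogs();
    }
}
```
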
@@ -39,7 +39,7 @@ Once coordinators think there is no CC in a cluster, they will start leader elec

 Although only one CC can succeed in recovery, which is guaranteed by Paxos algorithm, there exist scenarios when multiple CCs can exist in a transient time period.

-Scenario 1: A majority of coordinators reboot at the same time and the current running CC is still alive. When those coordinators reboot, they may likely choose a different process as CC. The new CC will start to recruit a new master and kicks off the recovery. The old CC will know the existance of the new CC when it sends heart-beat to coordinators periodically (in sub-seconds). The old CC will kill itself, once it was told by a majority of coordinators about the existance of the new CC. Old roles (say master) will commit suicide as well after the old CC dies. This prevents the cluster to have two sets of transaction systems. In summary, the cluster may have both the old CC and new CC alive in sub-seconds before the old CC confirms the existance of the new CC.
+Scenario 1: A majority of coordinators reboot at the same time and the current running CC is still alive. When those coordinators reboot, they may likely choose a different process as CC. The new CC will start to recruit a new master and kicks off the recovery. The old CC will know the existence of the new CC when it sends heart-beat to coordinators periodically (in sub-seconds). The old CC will kill itself, once it was told by a majority of coordinators about the existence of the new CC. Old roles (say master) will commit suicide as well after the old CC dies. This prevents the cluster to have two sets of transaction systems. In summary, the cluster may have both the old CC and new CC alive in sub-seconds before the old CC confirms the existence of the new CC.

 Scenario 2: Network partition makes the current running CC unable to connect to a majority of coordinators. Before the CC detects it, the coordinators can elect a new CC and recovery will happen. Typically, the old CC can quickly realize it cannot connect to a majority of coordinators and kill itself. In the rare situation when the old CC does not die within a short time period *and* the network partition is resolved before the old CC dies, the new CC can recruit a new master, which leads to two masters in the cluster. Only one master can succeed the recovery because only one master can lock the cstate (see Phase 2: LOCKING_CSTATE).

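
The self-termination rule in Scenario 1 is majority-based: the old CC exits once more than half of the coordinators report a different CC. A sketch of that check, with hypothetical names:

```cpp
// Sketch of the "kill myself when a majority of coordinators report a newer CC"
// rule from Scenario 1; CoordinatorReply and selfId are hypothetical names.
#include <cstddef>
#include <cstdint>
#include <vector>

struct CoordinatorReply {
    uint64_t knownClusterControllerId; // the CC this coordinator currently knows about
};

bool shouldOldCCDie(const std::vector<CoordinatorReply>& heartbeatReplies,
                    uint64_t selfId, std::size_t totalCoordinators) {
    std::size_t reportingOtherCC = 0;
    for (const auto& r : heartbeatReplies)
        if (r.knownClusterControllerId != selfId)
            ++reportingOtherCC;
    // A strict majority of all coordinators must have switched to another CC.
    return reportingOtherCC > totalCoordinators / 2;
}
```
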
@@ -151,7 +151,7 @@ Not every FDB role participates in the recovery phases 1-3. This phase tells the
 Storage servers (SSes) are not involved in the recovery phase 1 - 3. To notify SSes about the recovery, the master commits a recovery transaction, the first transaction in the new generation, which contains the txnStateStore information. Once storage servers receive the recovery transaction, it will compare its latest data version and the recovery version, and rollback to the recovery version if its data version is newer. Note that storage servers may have newer data than the recovery version because they pre-fetch mutations from tLogs before the mutations are durable to reduce the latency to read newly written data.


-Commit proxies haven’t recovered the transaction system state and cannot accept transactions yet. The master recovers proxies’ states by sending the txnStateStore to commit proxies through commit proxies’ (`txnState`) interfaces in `sendIntialCommitToResolvers()` function. Once commit proxies have recovered their states, they can start processing transactions. The recovery transaction that was waiting on commit proxies will be processed.
+Commit proxies haven’t recovered the transaction system state and cannot accept transactions yet. The master recovers proxies’ states by sending the txnStateStore to commit proxies through commit proxies’ (`txnState`) interfaces in `sendInitialCommitToResolvers()` function. Once commit proxies have recovered their states, they can start processing transactions. The recovery transaction that was waiting on commit proxies will be processed.


 The resolvers haven’t known the recovery version either. The master needs to send the lastEpochEnd version (i.e., last commit of the previous generation) to resolvers via resolvers’ (`resolve`) interface.
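
The rollback decision on a storage server reduces to a version comparison against the recovery version. A sketch, with hypothetical names:

```cpp
// Sketch of the rollback decision a storage server makes when it sees the
// recovery transaction; StorageState and versionAfterRecovery are hypothetical names.
#include <cstdint>

using Version = int64_t;

struct StorageState {
    Version latestDataVersion; // may run ahead, since mutations are pre-fetched from tLogs
};

// Returns the version the storage server should be at after seeing the
// recovery transaction committed at recoveryVersion.
Version versionAfterRecovery(const StorageState& ss, Version recoveryVersion) {
    // Pre-fetched mutations past the recovery version were never durable on
    // tLogs and must be rolled back; otherwise keep the current version.
    return ss.latestDataVersion > recoveryVersion ? recoveryVersion : ss.latestDataVersion;
}
```
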
@@ -37,7 +37,7 @@ To make commits available to storage servers efficiently, a transaction log
 maintains a copy of the commit in-memory, and maintains one queue per tag that
 indexes the location of each mutation in each commit with the specific tag,
 sequentially. This way, responding to a peek from a storage server only
-requires sequentailly walking through the queue, and copying each mutation
+requires sequentially walking through the queue, and copying each mutation
 referenced into the response buffer.

 Transaction logs internally handle commits via performing two operations
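
The per-tag queue described above is an index into the in-memory commit copies; a peek walks one queue in order. A simplified sketch with assumed types (not the TLog's actual data structures):

```cpp
// Minimal sketch of the per-tag index described above; Tag, MutationRef and
// the peek signature are simplified stand-ins, not the TLog's actual types.
#include <cstdint>
#include <deque>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Version = int64_t;
using Tag = int;

struct MutationRef {
    Version commitVersion;
    std::string bytes; // serialized mutation; in reality a reference into the commit copy
};

struct TLogIndex {
    std::map<Tag, std::deque<MutationRef>> perTagQueue;

    // Called once per commit: append each mutation to the queues of its tags, in order.
    void indexCommit(Version v, const std::vector<std::pair<Tag, std::string>>& mutations) {
        for (const auto& [tag, bytes] : mutations)
            perTagQueue[tag].push_back({ v, bytes });
    }

    // Peek: walk the tag's queue sequentially, copying mutations at or after `begin`.
    std::vector<MutationRef> peek(Tag tag, Version begin) const {
        std::vector<MutationRef> reply;
        auto it = perTagQueue.find(tag);
        if (it == perTagQueue.end()) return reply;
        for (const auto& m : it->second)
            if (m.commitVersion >= begin)
                reply.push_back(m);
        return reply;
    }
};
```
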
@@ -366,9 +366,9 @@ from the returned data. Therefore, the length of the result will not be the
 same as `end-start`, intentionally. For this reason, the API is `(start, end)`
 and not `(start, length)`.

-Spilled data, when using spill-by-value, was resistent to bitrot via data being
+Spilled data, when using spill-by-value, was resistant to bitrot via data being
 checksummed interally within SQLite's B-tree. Now that reads can be done
-directly, the responsibility for verifing data integrity falls upon the
+directly, the responsibility for verifying data integrity falls upon the
 DiskQueue. `CheckHashes::YES` will cause the DiskQueue to use the checksum in
 each DiskQueue page to verify data integrity. If an externally maintained
 checksums exists to verify the returned data, then `CheckHashes::NO` can be
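
With `CheckHashes::YES`, each page read directly from the DiskQueue is re-checksummed before its contents are trusted. A sketch of that check, with a hypothetical page layout and FNV-1a standing in for the real checksum:

```cpp
// Sketch of per-page checksum verification; the page layout and the use of
// FNV-1a are illustrative stand-ins for the DiskQueue's actual format.
#include <cstddef>
#include <cstdint>
#include <vector>

struct DiskQueuePage {
    uint32_t storedChecksum;      // checksum written when the page was persisted
    std::vector<uint8_t> payload; // page contents
};

static uint32_t fnv1a(const uint8_t* data, std::size_t len) {
    uint32_t h = 2166136261u;
    for (std::size_t i = 0; i < len; ++i) {
        h ^= data[i];
        h *= 16777619u;
    }
    return h;
}

enum class CheckHashes { NO, YES };

// Returns false if CheckHashes::YES and the recomputed checksum does not match.
bool verifyPage(const DiskQueuePage& page, CheckHashes check) {
    if (check == CheckHashes::NO) return true; // caller verifies via an external checksum
    return fnv1a(page.payload.data(), page.payload.size()) == page.storedChecksum;
}
```
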
@@ -467,7 +467,7 @@ recovery.

 In spill-by-value, the DiskQueue only ever contained commits that were also
 held in memory, and thus recovery would need to read up to 1.5GB of data. With
-spill-by-reference, the DiskQueue could theoretically contain terrabytes of
+spill-by-reference, the DiskQueue could theoretically contain terabytes of
 data. To keep recovery times boundedly low, FDB must still only read the
 commits that need to be loaded back into memory.

@@ -560,7 +560,7 @@ minor impacts on recovery times:

 1. Larger disk queue file means more file to zero out in the case of recovery.

-This should be negligable when fallocate `ZERO_RANGE` is available, because then it's only a metadata operation.
+This should be negligible when fallocate `ZERO_RANGE` is available, because then it's only a metadata operation.

 2. A larger file means more bisection iterations to find the first page.

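
The bisection in item 2 is a binary search over the DiskQueue's pages for the first one recovery must re-read, so a larger file only adds logarithmically many probes. A sketch, with `readPageVersion` as an assumed accessor rather than the DiskQueue's real API:

```cpp
// Sketch of the bisection mentioned in item 2: binary-search the DiskQueue's
// pages for the first one at or past the version recovery needs to re-read.
#include <cstdint>
#include <functional>

using Version = int64_t;

// Pages are ordered by version; return the index of the first page whose
// version is >= firstNeededVersion (pageCount if none). O(log pageCount) reads.
int64_t findFirstPage(int64_t pageCount,
                      Version firstNeededVersion,
                      const std::function<Version(int64_t)>& readPageVersion) {
    int64_t lo = 0, hi = pageCount; // half-open interval [lo, hi)
    while (lo < hi) {
        int64_t mid = lo + (hi - lo) / 2;
        if (readPageVersion(mid) < firstNeededVersion)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}
```
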
@@ -16448,7 +16448,7 @@ static int os2Delete(
 }

 /*
-** Check the existance and status of a file.
+** Check the existence and status of a file.
 */
 static int os2Access(
 sqlite3_vfs *pVfs, /* Not used on os2 */
@@ -18731,7 +18731,7 @@ static int nolockClose(sqlite3_file *id) {
 /******************************************************************************
 ************************* Begin dot-file Locking ******************************
 **
-** The dotfile locking implementation uses the existance of separate lock
+** The dotfile locking implementation uses the existence of separate lock
 ** files in order to control access to the database. This works on just
 ** about every filesystem imaginable. But there are serious downsides:
 **
@@ -18746,7 +18746,7 @@ static int nolockClose(sqlite3_file *id) {
 **
 ** Dotfile locking works by creating a file in the same directory as the
 ** database and with the same name but with a ".lock" extension added.
-** The existance of a lock file implies an EXCLUSIVE lock. All other lock
+** The existence of a lock file implies an EXCLUSIVE lock. All other lock
 ** types (SHARED, RESERVED, PENDING) are mapped into EXCLUSIVE.
 */

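
The scheme in the comment can be illustrated with an atomic create-if-absent: the lock file's existence is the EXCLUSIVE lock. A sketch using POSIX `open` with `O_CREAT|O_EXCL` (an illustration, not SQLite's dotfile code):

```cpp
// Sketch of dot-file locking as described in the comment above: creating
// "<db>.lock" with O_CREAT|O_EXCL atomically claims an EXCLUSIVE lock.
#include <fcntl.h>
#include <string>
#include <unistd.h>

bool acquireDotfileLock(const std::string& dbPath) {
    std::string lockPath = dbPath + ".lock";
    int fd = open(lockPath.c_str(), O_CREAT | O_EXCL | O_WRONLY, 0600);
    if (fd < 0) return false; // lock file already exists: someone else holds the lock
    close(fd);                // only the file's existence matters, not its contents
    return true;
}

void releaseDotfileLock(const std::string& dbPath) {
    unlink((dbPath + ".lock").c_str()); // removing the file releases the lock
}
```
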
@@ -22040,7 +22040,7 @@ static int unixDelete(
 }

 /*
-** Test the existance of or access permissions of file zPath. The
+** Test the existence of or access permissions of file zPath. The
 ** test performed depends on the value of flags:
 **
 ** SQLITE_ACCESS_EXISTS: Return 1 if the file exists
@@ -26132,7 +26132,7 @@ static int winDelete(
 }

 /*
-** Check the existance and status of a file.
+** Check the existence and status of a file.
 */
 static int winAccess(
 sqlite3_vfs *pVfs, /* Not used on win32 */
@@ -26681,7 +26681,7 @@ SQLITE_API int sqlite3_os_end(void){
 /*
 ** A bitmap is an instance of the following structure.
 **
-** This bitmap records the existance of zero or more bits
+** This bitmap records the existence of zero or more bits
 ** with values between 1 and iSize, inclusive.
 **
 ** There are three possible representations of the bitmap.