135 Commits

Author SHA1 Message Date
Syed Paymaan Raza
1f8b72c066
Gray failure machine readable status (#11758) 2024-11-12 15:50:08 -08:00
neethuhaneesha
adf0e8fa18
Rocksdb metrics in status json (#11321) 2024-04-18 22:00:58 -07:00
Sreenath Bodagala
bd99c12aa4 - Address simulation test failures and update status related documentation 2023-11-08 21:18:57 +00:00
William Dowling
0f752473be
Merge branch 'main' into radixtree-production 2023-09-25 09:52:20 +02:00
Zhe Wu
cb1e792169 Fix status json schemas perpetual_storage_wiggle_engine 2023-09-05 11:10:53 -07:00
Zhe Wu
314d1b66a5 Fix StatusWorkload after adding perpetual_storage_wiggle_engine 2023-08-29 13:58:23 -07:00
William Dowling
3ea1ba1648 Remove beta status from RadixTree storage engine 2023-07-05 17:54:54 +02:00
Josh Slocum
2916a11a86
New ConsistencyScan (#10265)
* Remove duplicate getRange() for DB handles and update existing GetRange to accept DB handles.

* Initial progress checkpoint on new ConsistencyScan role.

* Updated TODOs, finished most if not all state updates.

* placeholder

* Add more TODOs, documentation and comment improvements.

* Checkpoint round state to avoid advancing progress if commit fails.

* Bug fix, check is supposed to be for overlap, not lack of overlap.

* Added more TODOs, plus faked read results/exceptions and faked DB size retrieval to prove that the consistencyScanCore logic works.

* Update JSON schemas and command help.

* Add comment about lifetime stats reset.

* More TODO comments and some renames for clarity, some bug fixes.

* properly stopping consistency scan in simulation so that it doesn't run forever and cause quiet database to fail

* removing trailing comma from consistency_scan json schema

* Making CC inconsistency not an error if it's intentional tss corruption

* consistency scan actually reads storage locations

* added check that consistency scan actually completes a round in simulation, fixed bug and added debugging around consistency scan getting stuck

* made consistency scan properly fetch database size

* refactoring data check to be used in both consistency scan and consistency check

* checking that consistency scan always completes at least one round and doesn't get stuck

* cleanup

* fixing ide build

* consistencyscan fdbcli command wasn't actually changing db state

* consistencyscan fdbcli command always said enabled even when it wasn't

---------

Co-authored-by: Steve Atherton <steve.atherton@snowflake.com>
2023-05-18 15:02:41 -05:00
Xiaoxi Wang
3605d8c74c populate storage metadata for tss 2023-05-01 18:08:08 -07:00
A.J. Beamon
b258159d3a Change enum capitalization. Improve error reporting if we cannot read metacluster registration when fetching metacluster metrics. Improve timeliness of metacluster metrics updates. 2023-05-01 11:21:42 -07:00
Steve Atherton
50d567b5a5 Refactored some parts of database configuration to support log_engine=<name> and storage_engine=<name> and generate these when converting a DatabaseConfig JSON object to a configure command. Refactored fileconfigure and simulation setup to use the same JSON -> configure function as the same code was copy/pasted to both places but only one has been kept up to date with new features. Renamed Redwood to ssd-redwood-1 canonically but the experimental name is still supported for backward compatibility. 2023-03-04 20:52:31 -08:00
Lukas Joswiak
2b5c0ebe7b Add version epoch to status json
Adds a new `version_epoch` object to `status json`, which includes the
status of the feature, and the current epoch if it is enabled. If the
version epoch is disabled, the `epoch` field will not be present.

```
{
    "client" : {
        ...
    },
    "cluster" : {
        ...
        "version_epoch" : {
            "enabled" : "true",
            "epoch" : "100000"
        },
        ...
    }
}
```
2023-01-30 13:21:19 -08:00
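As a hedged illustration of how a client might consume the new object, the Python sketch below reads `cluster.version_epoch` from an already-parsed `status json` document and treats `epoch` as optional, because the commit states that field is omitted when the feature is disabled. The helper names `read_version_epoch` and `_as_bool` are hypothetical. Since the JSON example in the commit quotes `"true"` and `"100000"` as strings, the sketch accepts either quoted strings or native JSON booleans/numbers; the exact JSON types are an assumption here.

```python
import json
from typing import Optional, Tuple


def _as_bool(value) -> bool:
    # Accept a JSON boolean or its string form "true"/"false" (assumption).
    if isinstance(value, bool):
        return value
    return str(value).lower() == "true"


def read_version_epoch(status: dict) -> Tuple[bool, Optional[int]]:
    """Return (enabled, epoch) from a parsed `status json` document.

    Hypothetical helper: epoch is None when the field is absent, which the
    commit message says happens when version epoch is disabled.
    """
    ve = status.get("cluster", {}).get("version_epoch")
    if ve is None:
        # Servers that predate this commit do not report the object at all.
        return False, None
    enabled = _as_bool(ve.get("enabled", False))
    epoch = ve.get("epoch")
    return enabled, (int(epoch) if epoch is not None else None)


doc = json.loads('{"cluster": {"version_epoch": {"enabled": "true", "epoch": "100000"}}}')
print(read_version_epoch(doc))  # (True, 100000)
```

Returning `None` rather than a sentinel such as `0` keeps "disabled" distinguishable from a real epoch value.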
Andrew Noyes
91a2010a34 Add .cluster.idempotency_ids to status json 2022-12-14 07:37:44 -08:00
Jon Fu
50d616bb69 add metacluster status details to schemas 2022-11-17 12:59:59 -08:00
Zhe Wu
550e1e86e8 Also add fetch_consistency_scan_info_timeout to mr-status-json-schemas.rst.inc 2022-10-17 15:35:32 -07:00
Ata E Husain Bohra
28e608e717
Encryption data at-rest db-config (#7929)
* Encryption data at-rest db-config

Description

 diff-1: Handle 'force' updates to encryption_at_rest db-config

Major changes proposed:
1. Introduce the 'encryption_data_at_rest_mode' option for 'configure new'
to enable Encryption data at-rest. The feature is disabled
by default.
2. The configuration is meant to be set at the time of database
creation; additional checks to prevent updating the config will be
added in a subsequent PR.
3. Extend the DatabaseConfiguration validity check to require "tenant_mode"
to be set to `required` if Encryption data at-rest is selected, since the
EncryptionDomain matches Tenant boundaries.

Testing

devCorrectness - 100K
2022-09-02 14:11:38 -07:00
Evan Tschannen
a9d3c9f9b3
Added throttling when a blob worker falls behind (#7751)
* throttle the cluster when blob workers fall behind

* do not throttle on blob workers if they are not enabled

* remove an unnecessary actor

* fixed a compile error

* fetch blob worker metrics at the same interval as the rate is updated, avoid fetching the complete blob worker list too frequently

* fixed another compilation bug

* added a 5-second delay before bw throttling to prevent false positives caused by the 100e6 version jump during recovery. Lowered the throttling thresholds to react much more quickly to bw lag.

* fixed a number of problems

* changed the minBlobVersionRequest to look at storage server versions since this will be a lot more efficient

* fix: do not let desired go backwards

* fix: track the version of notAtLatest changefeeds for throttling

* ratekeeper now throttles blob workers by estimating the transactions-per-second throughput of the blob workers

* added metrics for blob worker change feeds

* added a knob to disable bw throttling

* fixed the transaction options in blob manager
2022-08-12 13:15:56 -07:00
Xiaoge Su
0326d53965 Split proxy_memory_limit_exceeded to commit/grv specific exceptions
Currently, GRV reports the proxy_memory_limit_exceeded error, whose
error message claims that the commit proxy is failing. This split should
remove that confusion.
2022-08-12 00:45:57 -07:00
Jingyu Zhou
16519a9e5f Update status json doc with fetch_storage_wiggler_stats_timeout error 2022-07-24 15:24:21 -07:00
Jingyu Zhou
217ba24b6f Add rss_bytes to process memory and fix available_bytes calculation
Since memory is now limited by RSS size, add RSS size to status json for
reporting. Also change how available_bytes is calculated from:
  (available + virtual memory) * process_limit / machine_limit
to:
  (available memory) * process_limit / machine_limit
2022-06-07 16:44:14 -07:00
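A small worked sketch of the available_bytes change, using made-up numbers. The parameter names (`available`, `virtual`, `process_limit`, `machine_limit`) are illustrative stand-ins for the terms in the commit message, not FoundationDB identifiers, and the use of integer floor division is an assumption.

```python
def available_bytes_old(available: int, virtual: int,
                        process_limit: int, machine_limit: int) -> int:
    # Former formula: (available + virtual memory) * process_limit / machine_limit
    return (available + virtual) * process_limit // machine_limit


def available_bytes_new(available: int,
                        process_limit: int, machine_limit: int) -> int:
    # Current formula: (available memory) * process_limit / machine_limit
    return available * process_limit // machine_limit


GiB = 1024 ** 3
# Hypothetical machine: 16 GiB available, a 4 GiB virtual-memory term,
# an 8 GiB process limit and a 32 GiB machine limit.
print(available_bytes_old(16 * GiB, 4 * GiB, 8 * GiB, 32 * GiB) // GiB)  # 5
print(available_bytes_new(16 * GiB, 8 * GiB, 32 * GiB) // GiB)  # 4
```

Dropping the virtual-memory term means the reported value no longer grows with memory that is not resident, which matches limiting memory by RSS.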
Sagar Vemuri
ebf11d5d48 Update schemas with the tenants information 2022-05-16 11:09:10 -07:00
Zhe Wang
6c9ff6ee5e
Add sharded rocksdb type (#6862)
* add-sharded-rocksdb-type

* address comments

Co-authored-by: Zhe Wang <zhewang@Zhes-MacBook-Pro.local>
2022-04-21 22:53:14 -04:00
Xiaoxi Wang
5d5dae9a0c update release note 2022-04-20 21:27:22 -07:00
He Liu
dd15489605 rename ssd-rocksdb-experimental as ssd-rocksdb-v1. 2022-03-29 10:53:38 -07:00
A.J. Beamon
72a34945ce Add the ability to disable tenants. Server processes verify the ID of tenants being read or written. 2022-03-06 21:54:21 -08:00
A.J. Beamon
ea273291c7 Add new tenant mode configuration field to status documentation 2022-03-06 21:54:21 -08:00
Steve Atherton
c53f5aa110 Renamed redwood to redwood-1-experimental and file extension to .redwood-v1. 2021-11-16 02:15:22 -08:00
Neethu Haneesha Bingi
3ea7209013 Simulation changes to randomly wiggle with locality filter and review comments. 2021-09-30 10:00:33 -07:00
Neethu Haneesha Bingi
3e79299898 Locality filter support to perpetual storage wiggler feature. 2021-09-30 10:00:33 -07:00
Hari Bhaskaran
ee1056cacd Remove incorrect comment
As described in this comment (https://forums.foundationdb.org/t/questions-on-status-json/2843/3?u=harikb), remove the comment that could make readers think this field is about RAM. No new comment is necessary, since the key is already "disk".
2021-09-14 09:39:23 -07:00
Josh Slocum
9992a7b33f Added StorageMigrationType and cli commands 2021-09-14 09:55:41 -05:00
A.J. Beamon
67acb48208 Updated the status documentation to match the schema used to validate status. 2021-08-19 11:16:40 -07:00
Neethu Haneesha Bingi
66f2518405 Make exclude work with any locality data match. 2021-06-23 18:03:27 -07:00
Xiaoxi Wang
454f9e9c89 update json schemas 2021-06-17 20:20:39 +00:00
Daniel Smith
ac92d84fce Update documentation 2021-06-08 14:36:26 -04:00
Josh Slocum
ce82c9653e Testing Storage Server implementation 2021-05-25 20:28:50 +00:00
Sreenath Bodagala
2fa80e7912 Address review comments 2021-05-19 22:04:43 +00:00
Sreenath Bodagala
622f43474a Expose "bounce impact" and Storage Server "version catch-up rate" metrics
Changes:

Schemas.cpp: Extend the JSON schema to report the new metrics that have
been added.

mr-status-json-schemas.rst.inc: Update the schema to reflect the changes
made to the JSON schema.

release-notes-700.rst: Add a note about the new metrics in "Status"
section.
2021-05-19 19:54:49 +00:00
Sreenath Bodagala
99f6032239 Report bounce impact info as part of cluster JSON object. 2021-05-13 16:47:05 +00:00
Sreenath Bodagala
160293bd54 Report bounce impact in fdbcli status
Changes:

Schemas.cpp: Extend the JSON schema to report whether the cluster is
bounceable and, if not, the reason why it is not bounceable.

Status.actor.cpp: Extend recoveryStateStatusFetcher() to populate the
bounce related field(s).

mr-status-json-schemas.rst.inc: Update the schema to reflect the change
made in Schemas.cpp.

release-notes-700.rst: Add a note about the new status fields in "Status"
section.
2021-05-13 14:28:06 +00:00
Sreenath Bodagala
336a9bff66 Provide "time since last full recovery" in fdbcli status
Changes:

Schemas.cpp: Extend the JSON schema to include a new field that reports
the number of seconds since last full recovery.

Status.actor.cpp: Extend recoveryStateStatusFetcher() to populate the
new field that has been added to Schemas.cpp.

mr-status-json-schemas.rst.inc: Update the schema to reflect the change
made in Schemas.cpp.
2021-05-05 19:43:44 +00:00
Sreenath Bodagala
a9532c7e79 Expose CommitBatchingWindowSize metric to fdbcli status
Changes:

mr-status-json-schemas.rst.inc: Update schema to reflect the change made
to Schemas.cpp (to include statistics about CommitBatchingWindowSize).

release-notes-700.rst: Add a note about the new metric in the Status section.
2021-05-03 22:25:04 +00:00
Markus Pilman
340f012e1a
Merge pull request #4695 from sfc-gh-etschannen/fix-rewrite-bme
rewrote tlog recruitment logic so that it is deterministic
2021-04-27 10:19:25 -06:00
Evan Tschannen
f1559a2203 use the stateless process class instead of master or resolution in simulation because it is the recommended process class, and the others are not deterministic when recruited in a constrained process situation 2021-04-26 09:49:26 -07:00
Dan Lambright
715c98572c bit more documentation 2021-04-21 10:48:35 -04:00
Dan Lambright
cabf192f57 Respond to review comments 3/23 2021-04-06 13:05:09 -04:00
Dan Lambright
48a475366c Log latency metrics for batch GRV requests 2021-04-06 13:05:09 -04:00
Hao Fu
fb9632297e Add txnRejectedForQueuedTooLong in ProxyStats 2021-02-12 13:04:58 -08:00
A.J. Beamon
aaf0a9aa7b Merge branch 'release-6.3' into merge-release-6.3-into-master
# Conflicts:
#	build/docker-compose.yaml
#	cmake/ConfigureCompiler.cmake
#	fdbclient/FileBackupAgent.actor.cpp
#	fdbrpc/AsyncFileCached.actor.h
#	fdbrpc/IAsyncFile.h
#	fdbrpc/IRateControl.h
#	fdbrpc/simulator.h
#	fdbserver/KeyValueStoreSQLite.actor.cpp
#	fdbserver/storageserver.actor.cpp
#	fdbservice/ServiceBase.cpp
2021-02-08 12:58:34 -08:00
A.J. Beamon
67e783acf8 Merge branch 'release-6.2' into merge-release-6.2-into-release-6.3
# Conflicts:
#	cmake/CompileBoost.cmake
#	cmake/FDBComponents.cmake
#	fdbrpc/AsyncFileCached.actor.h
#	fdbrpc/simulator.h
#	fdbserver/KeyValueStoreSQLite.actor.cpp
#	fdbserver/Knobs.cpp
#	fdbserver/Knobs.h
#	fdbserver/storageserver.actor.cpp
#	flow/Knobs.h
#	flow/network.h
2021-02-08 09:20:28 -08:00