foundationdb

mirror of https://github.com/apple/foundationdb.git synced 2025-06-03 03:41:53 +08:00

Author	SHA1	Message	Date
Alex Miller	4a7e0319c7	Refactor away pushlock. Pushing was already a serialized, sequential operation. Instead make it explicit that there are two waits as part of a push: 1. The setup work to reserve a spot in the file. 2. The work of writing and sync'ing the data. And we return a Future<Future<Void>> to force these to be done sequentially.	2019-05-10 20:30:52 -10:00
Alex Miller	ea12a54946	Rename DISK_QUEUE_MAX_TRUNCATE_EXTENTS -> ..._BYTES So as to not make filesystem assumptions. This knob did technically appear in (only the) 6.1.5 release, but this feature was broken in 6.1.5, and thus impossible to use anyway.	2019-05-10 18:26:22 -10:00
Alex Miller	c95d09f9fd	Convert truncate(0) to truncate(4KB) on Windows. Blindly, in case Windows doesn't like 0 length truncates too.	2019-05-10 14:55:11 -10:00
Alex Miller	c502ed3d15	Fix a variety of problems stemming from a wait() being added to push(). And that this code was previously insufficiently tested.	2019-05-10 14:55:11 -10:00
Alex Miller	510b0b2fcd	Fix DiskQueue not replaceFile'ing frequently enough for the final time.	2019-05-08 23:08:25 -10:00
Alex Miller	c6c33a4daa	Make replaceFile more likely to be tested.	2019-05-08 21:23:42 -10:00
Alex Miller	0d0f54d1e6	Fix IAsyncFileSystem::open() flags to stop a crash. OPEN_ATOMIC_WRITE_AND_CREATE was missing a required OPEN_CREATE. I'm honestly baffled how this was missed in testing.	2019-05-08 21:22:40 -10:00
Alex Miller	b50926c792	replaceFile is truncate(0) on windows	2019-05-08 21:22:14 -10:00
Alex Miller	e4ba2f5788	Add an ending TraceEvent.	2019-05-08 12:35:12 -10:00
Alex Miller	c093017c2f	Add a TraceEvent and release note.	2019-05-08 12:34:25 -10:00
Alex Miller	0685e6c1c7	Avoid large truncates in the DiskQueue. And instead create a new file while incrementally truncating the old one down. This avoids queueing up a massive number of filesystem metadata operations in one call, thus flooding the disk with requests and stalling out all other filesystem operations. This sets the knobs so that a truncate of >10GB causes us to create a new file rather than trying to truncate the old one.	2019-05-08 12:33:31 -10:00
Alex Miller	36dfbf4fb3	Only truncate DiskQueues down to TLOG_HARD_LIMIT*2. DiskQueue shrinking was implemented for spill-by-reference, as now a DiskQueue could grow "unboundedly" large. Without a minimum file size, write burst workloads would cause the DiskQueue to shrink down to 100MB, and then grow back to its usual ~4GB size in a cycle. File growth means filesystem metadata mutations, which we'd prefer to avoid if possible since they're more unpredictable in terms of latency. In a healthy cluster, the TLog never spills, so the disk of a single DiskQueue file should stay less than 2*TLOG_SPILL_THRESHOLD. In the worst case of spill-by-value, the DiskQueue could grow to 2*TLOG_HARD_LIMIT. Therefore, having this limit will cause DiskQueue shrinking to never behave sub-optimally for spill-by-value, and will cause the DiskQueue files to return to the optimal size with spill-by-reference.	2019-05-08 12:33:31 -10:00
Alex Miller	a269a784cc	Convert push() into an actor.	2019-05-08 12:33:31 -10:00
Evan Tschannen	68c773987c	Merge pull request #1544 from etschannen/release-6.1 The team tracker does not provide data movement priority information for non-failure related data movement	2019-05-08 11:39:17 -07:00
Balachandar Namasivayam	d45e7bf0b1	Addressed review comments	2019-05-07 17:19:59 -07:00
Evan Tschannen	d9a4553270	fix: The team tracker does not provide data movement priority information for non-failure related data movement	2019-05-07 17:06:54 -07:00
Balachandar Namasivayam	5d824f5fbc	Address review comments	2019-05-07 17:06:52 -07:00
Balachandar Namasivayam	a0cc3d98a1	Add a workload to trigger repeated recoveries.	2019-05-06 18:16:44 -07:00
Evan Tschannen	93eb2a9395	Merge pull request #1527 from alexmiller-apple/tstlog-6.1 Spill-by-reference knob + TLog6.0 Spilled Peek deprioritization	2019-05-03 17:19:45 -07:00
Alex Miller	c918b21137	Deprioritize spilled peeks in spill-by-value, and improve its logic. This deprioritizes before calling peekMessagesFromMemory, which should improve the memory usage of the TLog, and makes sure to keep txsTag peeks at a high priority to help recoveries stay fast.	2019-05-03 15:27:11 -07:00
Alex Miller	4052f3826a	Add a knob to limit the number of commits indexed per key. Theoretically, we could spill 20MB of 22B mutations for one key, which would generate a very long value being stored in SQLite, and very inefficiently read back. This stops that from being a problem, at the cost of some extra write calls.	2019-05-03 15:27:10 -07:00
Evan Tschannen	12088119d2	Merge pull request #1517 from alexmiller-apple/tstlog-6.1 Add a knob to limit amount of data read from sqlite for one PeekRequest.	2019-05-03 11:01:11 -07:00
Alex Miller	f4e48c3851	Add a knob to limit amount of data read from sqlite for one PeekRequest. This prevents peeking from degrading over time if there are a very large number of SpilledData entries for one particular tag.	2019-05-02 17:26:45 -07:00
Evan Tschannen	c91ac03ec6	LogRouterStats did not need to be a separate struct	2019-05-02 17:24:39 -07:00
Evan Tschannen	8590b710bf	added additional logging on the logs and log routers	2019-05-02 17:24:39 -07:00
Evan Tschannen	cacd82758e	Reduced data distribution speeds	2019-04-26 13:54:49 -07:00
Evan Tschannen	9ff8aca1da	Increased the SQLITE_CHUNK_SIZE to 100MB (left at 4MB for simulation)	2019-04-26 13:53:56 -07:00
Evan Tschannen	1f37f82b87	invalid knob overrides do not prevent fdbserver from starting	2019-04-25 17:08:13 -07:00
Evan Tschannen	6c77864731	separate GetStorageServerRejoinInfoRequest from GetKeyServerLocationsRequest, to avoid yielding for the rejoin requests	2019-04-25 17:07:35 -07:00
A.J. Beamon	253d2400ef	Merge branch 'release-6.1' into speed-up-and-parameterize-spring-cleaning # Conflicts: # documentation/sphinx/source/release-notes.rst	2019-04-23 14:38:52 -07:00
A.J. Beamon	ea7abff9df	Clean up from review	2019-04-23 14:16:52 -07:00
A.J. Beamon	4ad0496b39	Increase the frequency that lazy deletes are run. Add more parameters for better control over the spring cleaning process.	2019-04-23 14:01:51 -07:00
Stephen Atherton	df0548503d	Merge branch 'release-6.1' of https://github.com/apple/foundationdb into sqlite-grow-bigger	2019-04-23 13:43:58 -07:00
Stephen Atherton	83db547306	Implemented the chunk size and db size hint fileControl options in our SQLite VFS implementation. KeyValueStoreSQLite now sets file chunk size based on a new knob, SQLITE_CHUNK_SIZE_PAGES.	2019-04-23 04:50:58 -07:00
Evan Tschannen	e0f7ec96aa	Data distribution needs to build new teams as old teams are removed to ensure data remains balanced across servers	2019-04-22 17:29:46 -07:00
A.J. Beamon	43533b3d72	Don't validate the shard size estimate unless enough keys are sampled with a less than 100% probability.	2019-04-17 11:01:23 -07:00
Balachandar Namasivayam	04e9aa6afd	For small clusters that are growing quickly, it could happen that the rateLimit is set to a low value and it would take very long to read the entire database. Fix this by setting the rateLimit to the maximum allowed value if reading the entire database is taking a long time.	2019-04-10 17:13:37 -07:00
Evan Tschannen	d126730b4d	fixed a spurious test error where process_behind was treated as an error	2019-04-08 17:09:54 -07:00
A.J. Beamon	538b431656	Apply suggestions from code review	2019-04-08 14:55:58 -07:00
A.J. Beamon	a7288e1325	Throw process_behind instead of future_version when all storage nodes on a team are behind. process_behind gets the same backoff behavior as not_committed. Add proxy_memory_limit_exceeded to the retryable predicate.	2019-04-08 14:21:24 -07:00
Evan Tschannen	05869a8383	do not log a degraded reset message if the previous reset was more than a week ago	2019-04-07 23:00:58 -07:00
Evan Tschannen	390ab9cfed	A process will mark itself as degraded if it continually disconnects from a different process which the failure monitor thinks is healthy	2019-04-04 14:11:12 -07:00
Evan Tschannen	30133a30e0	Merge pull request #1403 from etschannen/release-6.1 Ported a bug fix to the 6.0 log system, and updated documentation	2019-04-02 17:56:18 -07:00
Evan Tschannen	31ed73d9f5	Ported the bug fix https://github.com/apple/foundationdb/pull/1379 to OldTLogServer_6_0	2019-04-02 15:27:37 -07:00
Evan Tschannen	1d4a6ab551	cleaned up status to keep the healthyZone read separated from relicaFutures	2019-04-02 14:46:56 -07:00
Evan Tschannen	a38c396283	made all maintenance transactions lock aware	2019-04-02 14:27:48 -07:00
Evan Tschannen	628fec8c8b	updated status with information about ongoing maintenance; clear the maintenance zone if a different storage server is detected failed	2019-04-02 14:15:51 -07:00
Evan Tschannen	781cf9b5a0	added the ability to make a zoneId for maintenance in fdbcli	2019-04-01 17:55:13 -07:00
Evan Tschannen	f5de52de91	fix: cancel the previous log system recruitment before calling newEpoch, to avoid multiple actors attempting to modify oldLogSystem at the same time	2019-04-01 16:38:25 -07:00
Evan Tschannen	8ebf771392	cleanup cluster controller trace events	2019-03-30 14:17:18 -07:00

1719 Commits