When a teamTracker is cancelled, e.g., by the redundant teamRemover or
badTeamRemover, we should decrease optimalTeamCount if the team is
considered an optimal team, i.e., every member's machine fitness is no
worse than unset and the team is healthy.
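A minimal sketch of the intended bookkeeping; the enum, Team type, and
counter below are illustrative stand-ins, not the actual DDTeamCollection
state:

    #include <algorithm>
    #include <vector>

    // Illustrative stand-ins for the real fitness values and team state.
    enum class MachineFitness { Best, Good, Unset, Bad };

    struct Team {
        std::vector<MachineFitness> memberFitness;
        bool healthy = false;

        // "Optimal" means every member's machine fitness is no worse than
        // unset and the team is currently healthy.
        bool isOptimal() const {
            return healthy &&
                   std::all_of(memberFitness.begin(), memberFitness.end(),
                               [](MachineFitness f) { return f <= MachineFitness::Unset; });
        }
    };

    // Called when a teamTracker is cancelled (e.g., by teamRemover or
    // badTeamRemover): keep the optimal-team counter consistent.
    void onTeamTrackerCancelled(const Team& team, int& optimalTeamCount) {
        if (team.isOptimal())
            --optimalTeamCount;
    }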
Because serverTeamRemover takes time to remove teams,
getTeamCollectionValid() needs to wait for a while before concluding that
the number of server teams is larger than the desired number.
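Roughly the shape of that check, as a plain C++ sketch rather than the
actual Flow actor; the grace period and polling interval are assumptions:

    #include <chrono>
    #include <functional>
    #include <thread>

    // Keep re-checking the team count for a grace period before declaring
    // the collection invalid, so an in-flight serverTeamRemover has time to
    // finish its removals.
    bool getTeamCollectionValid(const std::function<int()>& currentServerTeams,
                                int desiredServerTeams,
                                std::chrono::seconds gracePeriod = std::chrono::seconds(60)) {
        auto deadline = std::chrono::steady_clock::now() + gracePeriod;
        while (currentServerTeams() > desiredServerTeams) {
            if (std::chrono::steady_clock::now() >= deadline)
                return false;  // still too many teams after waiting
            std::this_thread::sleep_for(std::chrono::seconds(1));
        }
        return true;
    }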
There are cases where traceTeamCollectionInfo was called twice within the
same execution block, i.e., with no wait between the two
traceTeamCollectionInfo calls. Because simulation uses the same time for
all instructions in the same execution block, having more than one
traceTeamCollectionInfo at the same time messes up the trackLatest
semantics. When the simulator always chooses one of them, the simulation
test reports a false-positive error. Changing this function to an actor
and adding a small delay inside it solves the problem.
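A sketch of the fix in Flow actor style; the real traceTeamCollectionInfo
takes more state, and the 0.01 s delay value is illustrative:

    ACTOR Future<Void> traceTeamCollectionInfo(DDTeamCollection* self) {
        // The small delay pushes this call onto its own simulated timestamp,
        // so two calls issued back-to-back in one execution block no longer
        // collide and trackLatest keeps the truly latest event.
        wait(delay(0.01));
        TraceEvent("TeamCollectionInfo").trackLatest("TeamCollectionInfo");
        return Void();
    }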
For example, with 3 servers and replica factor 3 we can have only 1 team,
but the desired team number is 3 * 5 = 15. Instead of sanity-checking the
absolute team number per server, we check the difference between
minServerTeamOnServer and maxServerTeamOnServer.
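A small sketch of that relative check; the function name and the allowed
spread are hypothetical:

    #include <algorithm>
    #include <cassert>
    #include <vector>

    // Instead of asserting an absolute team count per server (which a tiny
    // cluster such as 3 servers at replica factor 3 can never reach), bound
    // the spread between the least- and most-loaded servers.
    // Assumes teamsPerServer is non-empty.
    void checkServerTeamBalance(const std::vector<int>& teamsPerServer,
                                int maxAllowedSpread) {
        auto [minIt, maxIt] =
            std::minmax_element(teamsPerServer.begin(), teamsPerServer.end());
        int minServerTeamOnServer = *minIt;
        int maxServerTeamOnServer = *maxIt;
        assert(maxServerTeamOnServer - minServerTeamOnServer <= maxAllowedSpread);
    }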
Add a check to the simulation test that makes sure the number of server
teams per server is no less than the desired_teams_per_server defined in
the knobs and no larger than the max_teams_per_server. Add a similar
check for the number of machine teams per machine as well.
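The assertion the test is meant to make, sketched in plain C++; knob names
follow the description above, and the real test reads them from the knobs:

    #include <cassert>
    #include <vector>

    void checkTeamsPerServerBounds(const std::vector<int>& serverTeamsPerServer,
                                   int desired_teams_per_server,
                                   int max_teams_per_server) {
        for (int teams : serverTeamsPerServer) {
            assert(teams >= desired_teams_per_server);
            assert(teams <= max_teams_per_server);
        }
        // The same style of check applies to machine teams per machine.
    }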
fix: we could incorrectly make data durable if eraseMessagesFromMemory was in progress while running updatePersistentData
The quiet database check now ensures that tlogs have no more than 30
seconds of versions unpopped from the disk queue.
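Roughly the condition being checked, assuming versions advance at a fixed
rate (versionsPerSecond); names here are illustrative:

    #include <cstdint>

    // "30 seconds of versions" translates to a version-distance bound
    // between the newest version still in the tlog's disk queue and the
    // version that has already been popped.
    bool tlogDiskQueueQuiet(int64_t newestVersionInDiskQueue,
                            int64_t poppedVersion,
                            int64_t versionsPerSecond) {
        int64_t unpoppedVersions = newestVersionInDiskQueue - poppedVersion;
        return unpoppedVersions <= 30 * versionsPerSecond;
    }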
The previous commit merged master, which in turn merges pull request
#1062 from jzhou77/PR that adds a new DataDistribution role. The merge
caused conflicts and errors in simulation tests. This commit resolves the
code conflicts and tries to fix the new errors introduced by
incorporating the new DataDistribution role.
When moving keys to a team, if one of the servers in the target team
dies, the move can become stuck. This is because the DDTeamCollection
waits for all data movement involving the failed server to complete.
However, because the movement has not finished yet, checking the database
tells us there are no keys associated with this server and that it is
safe to go ahead. In reality, only the in-memory structure knows there is
pending movement, i.e., the unfinished move causes some keys to still be
attributed to the failed server, so the server cannot be removed yet. Fix
by adding a check against the in-memory structure in
waitForAllDataRemoved().
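A sketch of the combined check; the two callbacks are hypothetical
stand-ins for the database query and the in-memory shard lookup:

    #include <functional>

    // Before declaring a failed server fully drained, consult both the
    // database and the in-memory shard map, since an unfinished move is
    // only visible in memory.
    bool allDataRemoved(const std::function<bool()>& databaseShowsNoKeysForServer,
                        const std::function<int()>& inMemoryShardsAssignedToServer) {
        // The database alone can report "no keys" while a move is still in flight.
        if (!databaseShowsNoKeysForServer())
            return false;
        // The in-memory structure still attributes keys to the server until
        // the pending movement finishes, so the server must not be removed yet.
        return inMemoryShardsAssignedToServer() == 0;
    }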
Use const& to optimize a few function parameters.
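For example (an illustrative function, not a specific call site):

    #include <cstddef>
    #include <string>
    #include <vector>

    // Passing a large object by const& avoids copying it on every call
    // while still guaranteeing the callee does not modify it.
    std::size_t totalIdLength(const std::vector<std::string>& serverIds) {
        std::size_t total = 0;
        for (const std::string& id : serverIds)
            total += id.size();
        return total;
    }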
The quiet database check can fail to send out requests and report a
timeout. This seems to be caused by reusing a request that uses the same
ReplyPromise. Another bug is that the Proxy can wait unnecessarily for a
database change even though it already knows the distributor.
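A minimal standard-C++ illustration of the first bug's mechanism: a
promise can deliver only one reply, so a reused request carrying the same
ReplyPromise never receives a second one and the caller times out:

    #include <future>
    #include <iostream>

    int main() {
        std::promise<int> reply;            // plays the role of the request's ReplyPromise
        std::future<int> f = reply.get_future();

        reply.set_value(1);                 // first reply is delivered
        std::cout << f.get() << "\n";

        try {
            reply.set_value(2);             // a second reply on the same promise fails
        } catch (const std::future_error& e) {
            std::cout << "second reply rejected: " << e.what() << "\n";
        }
    }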
After the controller starts one data distributor, it will wait for that
one and ignore any rejoins received later.
Add remoteRecovered() to data distribution for remote team collection.
Use getRateInfo's endpoint as the ID for the DataDistributorInterface.
For now, added a "rejoined" flag for ClusterControllerData and Proxy.
TODO: move DataDistributorInterface into ServerDBInfo.
Let the cluster controller start a new data distributor role by sending a
message to a chosen worker.
Change MasterInterface usage in DataDistribution to masterId
Add DataDistributor rejoin handling.
This allows the data distributor to tell a new cluster controller of its
existence so that the controller doesn't spawn another one, i.e., there
should be only ONE data distributor in the cluster.
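A sketch of the intended bookkeeping on the cluster controller side; the
types and member names are stand-ins, not the actual ClusterControllerData
fields:

    #include <optional>
    #include <string>

    struct DistributorId { std::string id; };

    struct ClusterControllerState {
        std::optional<DistributorId> distributor;  // the one DD we know about

        // Called when an existing data distributor reports itself to a new CC.
        void onDistributorRejoin(const DistributorId& dd) {
            if (!distributor)
                distributor = dd;  // remember it instead of spawning a new one
            // A rejoin while one is already known is ignored.
        }

        bool needToRecruitDistributor() const { return !distributor.has_value(); }
    };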
If the DataDistributor (DD) doesn't join for a while, the
ClusterController (CC) recruits a worker as DD. The CC also monitors the
DD and recruits a new one if it fails. The Proxy also monitors the DD; if
the DD fails, the Proxy asks the CC for the new DD.
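The monitoring logic, sketched as a plain loop; the callbacks, polling
interval, and grace period are assumptions, and the real code is a Flow
actor rather than a blocking loop:

    #include <chrono>
    #include <functional>
    #include <thread>

    // Wait a grace period for an existing DD to rejoin, recruit one if none
    // shows up, then keep watching and re-recruit whenever the current DD
    // fails. (Runs forever, like the real monitor.)
    void monitorDataDistributor(const std::function<bool()>& distributorKnown,
                                const std::function<bool()>& distributorFailed,
                                const std::function<void()>& recruitDistributor,
                                std::chrono::seconds rejoinGracePeriod) {
        std::this_thread::sleep_for(rejoinGracePeriod);
        if (!distributorKnown())
            recruitDistributor();
        while (true) {
            if (distributorFailed())
                recruitDistributor();
            std::this_thread::sleep_for(std::chrono::seconds(1));
        }
    }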
Add a GetRecoveryInfo RPC to the master server, which the data
distributor calls to obtain the recovery transaction version from the
master.
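A hedged sketch of the RPC's shape; field and function names are
illustrative, and in Flow the request would carry a ReplyPromise on a
RequestStream of the MasterInterface:

    #include <cstdint>

    struct GetRecoveryInfoReply {
        int64_t recoveryTransactionVersion = 0;  // version the data distributor needs
    };

    struct GetRecoveryInfoRequest {};

    // Master-side handler: answer with the version established during recovery.
    GetRecoveryInfoReply handleGetRecoveryInfo(const GetRecoveryInfoRequest&,
                                               int64_t recoveryTransactionVersion) {
        return GetRecoveryInfoReply{recoveryTransactionVersion};
    }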