628 Commits

Author SHA1 Message Date
Evan Tschannen
b61a911685 removed an ASSERT that was for debugging purposes, and increased the max commit latency, because it can be spuriously triggered by dummy transactions that take 5+ seconds each 2021-04-21 14:30:06 -07:00
Evan Tschannen
e18c9961b4 rewrote tlog recruitment logic so that it is deterministic, to prevent better master exists from triggering spuriously 2021-04-21 00:22:33 -07:00
Lukas Joswiak
c81e1e9519 Add sampling profiler frequency to global config 2021-04-19 22:46:57 -07:00
RenxuanW
4bf7218e8f
Merge pull request #4635 from RenxuanW/priority_logging
Log a warning when remote dc is disabled (priority < 0)
2021-04-15 17:00:41 -07:00
Lukas Joswiak
7de23918c0 Add comments, fix erase bug, make optimizations 2021-04-14 10:56:33 -07:00
Lukas Joswiak
c38ddf5eb7 Add comments 2021-04-14 10:56:33 -07:00
Lukas Joswiak
7ba7257cd2 Store global config data on heap 2021-04-14 10:56:33 -07:00
Lukas Joswiak
1c60653c2a Add fix to conditionally set global config history 2021-04-14 10:56:33 -07:00
Lukas Joswiak
6de28dd916 clang-format 2021-04-14 10:56:33 -07:00
Lukas Joswiak
1260385965 Use object to wrap global configuration history 2021-04-14 10:56:32 -07:00
Lukas Joswiak
fb9a929780 Fix issue with freed memory being accessed 2021-04-14 10:56:32 -07:00
Lukas Joswiak
c3f68831af Move existing ClientDBInfo variables to global configuration 2021-04-14 10:56:32 -07:00
Lukas Joswiak
7bb0b3d899 Use commit version for global configuration updates
FIXME: There is a memory issue where the underlying data for values set
in the `data` field of GlobalConfig will be freed shortly after being
set.
2021-04-14 10:56:32 -07:00
Lukas Joswiak
f1415412f1 Add global configuration framework implementation 2021-04-14 10:56:32 -07:00
Evan Tschannen
bd6db9ca7c
Update fdbserver/ClusterController.actor.cpp
Co-authored-by: Markus Pilman <markus.pilman@snowflake.com>
2021-04-13 15:13:45 -07:00
RenxuanW
7be8dab045 Change DcPriorityNegative to CCDcPriorityNegative 2021-04-08 16:00:37 -07:00
RenxuanW
738e7402f7 Log a warning when remote dc is disabled (priority < 0) 2021-04-08 15:36:52 -07:00
RenxuanW
f3d5fa4750 Revert "Log a warning when remote dc's priority doesn't match the original primary."
This reverts commit 1d701e8bcfcd01b31949f92e095fd405b4826cfd.
2021-04-08 15:19:43 -07:00
RenxuanW
1d701e8bcf Log a warning when remote dc's priority doesn't match the original primary. 2021-04-08 14:38:37 -07:00
Evan Tschannen
a90c26f1d0 The master, proxies, and resolver all need to have the same machine class fitness function besides best fit to ensure recruitment is deterministic
if the first GRV proxy or resolver is forced to share a process, it should prefer to share with the commit proxy so that the commit proxy has more potential options it can share with
2021-04-08 14:29:12 -07:00
Evan Tschannen
5695a1816f fix: requiredFitness was being set to one higher than the actual requirement 2021-04-07 21:31:14 -07:00
Evan Tschannen
1b1f73ea16 added comments 2021-04-07 20:40:42 -07:00
Evan Tschannen
4d8dd0b0a0 fix: desired must be greater than or equal to required 2021-04-07 20:32:45 -07:00
Evan Tschannen
14213b0151 code cleanup 2021-04-07 20:06:30 -07:00
Evan Tschannen
15e8b43961 rewrote getWorkersForTLogs to do a much better job of avoiding degraded processes and processes in the same DC as the cluster controller 2021-04-07 19:57:24 -07:00
Evan Tschannen
c27d82cecd tlog recruitment used a degraded LogClass process over a non-degraded TransactionClass process
tlog recruitment would not use TransactionClass processes if it fulfilled the required amount with LogClass processes
Better master exists did not account for how many times a process had been used when comparing recruitments
Better master exists did not account for the fact that tlogs prefer to be in a different dc than the cluster controller
RoleFitness comparison did not properly order count before degraded or bestFit
betterCount was returning worstFit when worstIsDegraded did not match
backupWorker recruitment did not attempt to avoid sharing processes with other roles
If any of the commit_proxy, grv_proxy, or resolver are forced to share a process, allow the recruitment for all of them to share to an equal degree; this change allows BetterMasterExists to be refactored as a tuple comparison
2021-04-07 16:04:08 -07:00
Markus Pilman
50342b5082 fix a second low-latency bug 2021-03-29 13:31:26 -06:00
Markus Pilman
8555723b98 removing testing case 2021-03-26 15:46:54 -06:00
Markus Pilman
43bed1d9dd Fix bug where betterMasterExist and recruitment disagree 2021-03-26 15:06:59 -06:00
Evan Tschannen
10b6b5d710 If the current configuration does not have a satellite fallback policy we do not care if the old configuration is in fallback mode 2021-03-23 13:02:31 -07:00
A.J. Beamon
99f3bb6d7d
Merge pull request #4509 from sfc-gh-etschannen/feature-bme-count
Do not trigger BetterMasterExists if it lowers the number of processes
2021-03-22 13:43:24 -07:00
Zhe Wu
15f3699e22 Add targeting DC ids in the tlog recruitment event trace. 2021-03-19 14:10:38 -07:00
Meng Xu
0cedef123b
Merge pull request #4518 from halfprice/zhewu/log-tlog-recruitment-failure-reason
Logging more detailed information during Tlog recruitment
2021-03-19 11:36:05 -07:00
Zhe Wu
58d9f47782 log fitness for excluded workers as well 2021-03-19 11:04:53 -07:00
Zhe Wu
4c00361f1c Add comment for 'getWorkersForTlogs' method, and addressed TraceEvent formatting comments. 2021-03-18 21:33:43 -07:00
Zhe Wu
9419387295 Update logging field. 2021-03-18 14:53:43 -07:00
Evan Tschannen
2ff63f544e
Update fdbserver/ClusterController.actor.cpp
Co-authored-by: Lukas Joswiak <lukas.joswiak@snowflake.com>
2021-03-18 13:45:51 -07:00
Zhe Wu
451b14af09 Log detailed information when a worker is considered as unavailable by the cluster controller for TLog recruitment. 2021-03-18 12:18:03 -07:00
Zhe Wu
6468c5aed6 Fix string join 2021-03-17 23:46:11 -07:00
Zhe Wu
1205650a69 Log the dcid during TLog recruitment, so that we can tell in which DC the recruitment is happening 2021-03-17 23:22:42 -07:00
Evan Tschannen
9aeb69ca1c added a comment 2021-03-16 14:19:23 -07:00
Evan Tschannen
d0f134c20e added a comment 2021-03-16 13:17:56 -07:00
Evan Tschannen
2a272e525f fix compile error 2021-03-16 12:21:21 -07:00
Evan Tschannen
10fd094920 Better master exists should not trigger if it will lower the total number of processes being recruited 2021-03-16 12:14:19 -07:00
FDB Formatster
df90cc89de apply clang-format to *.c, *.cpp, *.h, *.hpp files 2021-03-10 10:18:07 -08:00
Evan Tschannen
346a4e3ecd Merge branch 'release-6.3'
# Conflicts:
#	fdbcli/fdbcli.actor.cpp
#	fdbrpc/LoadBalance.actor.h
#	fdbrpc/MultiInterface.h
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/masterserver.actor.cpp
2021-03-01 18:52:06 -08:00
Meng Xu
33eb1de00e Add some comment to log system
and resolve review comment by deleting my questions.
2021-02-19 21:44:13 -08:00
Meng Xu
9122be4d81 Add comments to HA code and loadBalance code 2021-02-10 13:51:36 -08:00
Richard Chen
c77d9e4abe merge conflicts 2020-12-02 21:53:19 +00:00
Markus Pilman
bdd3dbfa7d remove duplicates 2020-11-10 14:01:07 -07:00