64 Commits

Author SHA1 Message Date
A.J. Beamon
fa6e45a852 Separate AsioReactor sleep and react into two different functions. Track slow tasks and time spent in react, track time spent in launch. Don't track react time at priority 0. 2019-08-28 14:35:48 -07:00
Evan Tschannen
da8163fd5a allow one requests every second to skip there all_alteratives_failed delay, because if a client has a timeout longer than the delay we will never invalidate the key servers cache 2019-08-09 13:03:40 -07:00
mpilman
370ba8b841 Remove --object-serializer flag from executables 2019-08-06 09:25:40 -07:00
Evan Tschannen
98c3b24036
Merge pull request #1869 from alexmiller-apple/sharded-txs-performance
Raise the priority of TLogRejoin above TLogPeekReply
2019-07-26 13:30:13 -07:00
Alex Miller
df7f0cffa1 Raise the priority of TLogRejoin above the default work priority.
With sharded txs tags, the master now receives data from transaction
logs at an order of magnitude higher rate.  This is the intentional
desires result of sharding the txs tag.  With a sufficient number of
TLogs, the master will saturate its CPU time handling the peek
responses.

Performance tests revealed some unstable oddities in how long a recovery
would take, which was eventually root caused to a priority inversion
between TLogRejoin requests and TLog peek replies.

Once peek replies saturate the CPU, the master would proceed to ignore
further TLogRejoin messages.  TLogRejoin is what marks a TLog as
available to the failure monitor, which is also what decides between a
ServerPeekCursor and a MergePeekCursor for a SetPeekCursor.  Ignoring
TLogRejoins meant that the sharded txs locality tags for those servers
would be merge peeked over all TLogs.  This is much less efficient than
just peeking one copy of data from the one preferred server.

Depending on the race between TLogPeek replies saturating the CPU and
TLogRejoin requests being submitted, a variable number of tags would be
affected, and thus the performance test would have some variance in its
results.
2019-07-19 16:55:04 -07:00
Alex Miller
c3a8ae4752
Merge pull request #1791 from fzhjon/fetch-keys-requests-priority
Introduce priority to fetchKeys requests from data distribution
2019-07-19 14:54:51 -07:00
mpilman
6a4a129cf5 fixed silly boost visitor bug 2019-07-16 15:10:55 -07:00
mpilman
b18666d942 statically link libstdc++ on Linux and remove std::variant
this will hopefully fix #1610
2019-07-16 14:53:16 -07:00
Jon Fu
f707d186fe added new priority for fetchkeys requests and adjusted ddmetrics workload to run parallel with mako 2019-07-11 09:56:58 -07:00
A.J. Beamon
69d7c4f79c Merge branch 'master' into track-run-loop-busyness
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	flow/Net2.actor.cpp
#	flow/network.h
2019-07-09 18:39:23 -07:00
Alex Miller
888f4f92e0 Fix errors and TaskPriority more priorities. 2019-07-03 21:03:58 -07:00
Alex Miller
ea6898144d Merge remote-tracking branch 'upstream/master' into flowlock-api 2019-07-03 20:44:15 -07:00
Jingyu Zhou
5ea2e69016 Remove a fdbprc header from flow library
Flow should be an independent library.
2019-07-03 19:56:38 -07:00
Evan Tschannen
3fb0999e10 revert storage server priority changes 2019-07-02 16:54:47 -07:00
Alex Miller
8e1ab6e7db Merge remote-tracking branch 'upstream/master' into flowlock-api 2019-06-28 17:32:54 -07:00
Evan Tschannen
4cef1d3937 Experimental change of storage write priority 2019-06-28 16:54:22 -07:00
A.J. Beamon
7f23814841 Track run loop busyness and report it in status. 2019-06-26 14:03:02 -07:00
Alex Miller
bf883d7055 Merge remote-tracking branch 'upstream/master' into flowlock-api 2019-06-25 14:26:50 -07:00
Evan Tschannen
0fe6edc254
Merge pull request #1678 from mpilman/features/external-workload
Features/external workload
2019-06-25 13:53:19 -07:00
Alex Miller
7a500cd37f A giant translation of TaskFooPriority -> TaskPriority::Foo
This is so that APIs that take priorities don't take ints, which are
common and easy to accidentally pass the wrong thing.
2019-06-25 02:47:35 -07:00
mpilman
2eff2b7e21 First simple test is working (but very buggy) 2019-06-19 13:03:41 -07:00
mpilman
68ce9a5e75 ProtocolVersion type - second try 2019-06-18 17:55:27 -07:00
Vishesh Yadav
a8e408e268 run clang-format on changes 2019-06-10 14:10:24 -07:00
Vishesh Yadav
6b4d30c3ae failmon: Identify client vs server when starting failure monitoring client 2019-06-09 00:43:12 -07:00
mpilman
f5fa3a65b4 some more fixes 2019-05-13 14:15:23 -07:00
mpilman
44db3450ec Several flatbuffers bug fixes 2019-05-13 14:15:23 -07:00
mpilman
69fa3d3903 fixed compilation issues after rebase 2019-05-13 14:15:23 -07:00
mpilman
9eeb48c43d Allow to turn on object serializer
This commit includes functionality to turn on
the object serializer for network communication.
This is done the following way:

- On incoming connections, a process will detect
  whether the client supports the object serializer
  and will only serialize responses with it, if it does
- On outgoing connections, the command line flag is used
  to determine whether the object serializer should be used
  to send data.

This way, a cluster can run in mixed mode. To upgrade one
can upgrade one process at a time and set the flag one process
at a time.

This is how this is tested on the simulator:
- The command line flag can take three options: on, off,
  and random.
- For off, the object serializer will never we used.
- For on, the object serializer will be always used.
- For random, the simulator will flip a coin for each
  process it starts up.
2019-05-13 14:15:22 -07:00
mpilman
fe81454ec2 basic functionality for object serializer
This commit includes:
- The flatbuffers implementation
- A draft on how it should be used for network messages
- A serializer that can be used independently

What is missing:
- All root objects will need a file identifier
- Many special classes can not be serialized yet as the
  corresponding traits are not yet implemented
- Object serialization can not yet be turned on (this will
  need a network option)
2019-05-13 14:15:22 -07:00
Evan Tschannen
2d5043c665 Merge branch 'release-6.1'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	versions.target
2019-04-30 18:27:04 -07:00
Evan Tschannen
4fa1c008f9 Highly prioritize storageServerRejoin messages on the proxy, so that storage servers can rejoin the cluster even when a proxy is CPU saturated 2019-04-23 20:56:01 -07:00
Alex Miller
c25fac9421 Lower the priority of spilled peeks to below that of spilling.
This prevents peeking from starving the TLog of CPU time to spill, and
thus impacting write throughput at prolonged saturation.
2019-04-22 18:39:21 -07:00
mpilman
c008e16c81 Defer formatting in traces to make them cheaper
This is the first part of making `TraceEvent` cheaper. The main idea is
to defer calls to any code that formats string. These are the main
changes:

- TraceEvent::detail now takes a c-string instead of std::string for
  literals. This prevents unnecessary allocations if the trace is not
  going to be printed in the first place (for example for SevDebug).
  Before that `detail` expected a `std::string` as key, which mean that
  any string literal would be copied on each call.
- Templates Traceable and SpecialTraceMetricType. These templates can be
  specialized for any type that needs to be printed. The actual
  formatting will be deferred to after the `enabled` check. This
  provides two benefits: (1) if a TraceEvent is disabled, we don't pay
  for the formatting and (2) TraceEvent can trace types that it doesn't
  know about.
- TraceEvent::enabled will be set in the constructor if the Severity is
  passed. This will make sure that `TraceEvent::init` is not called.
- `TraceEvent::detail` will be inlined. So for disabled TraceEvent
  calls, a call to detail will only introduce a if-branch which is much
  cheaper than a function call.
2019-04-05 13:12:19 -07:00
Evan Tschannen
aa368c08a2 changed NetworkAddress hash function to use more bytes from the IP address 2019-03-28 17:47:13 -07:00
Evan Tschannen
80ecb12190 change the IPv6 hash function to be more efficient 2019-03-28 14:07:46 -07:00
Evan Tschannen
34b9d5e722
Merge pull request #1364 from etschannen/feature-fast-serialize
A few performance optimizations
2019-03-27 20:57:25 -07:00
Evan Tschannen
6997075917 changed back to isV6addr instead of isV4addr for compatibility 2019-03-27 19:55:36 -07:00
Evan Tschannen
e5a80f2c94 optimized IPaddress 2019-03-27 18:21:13 -07:00
A.J. Beamon
71e2fdafb8 Changes to ratekeeper camel case 2019-03-27 08:24:25 -07:00
Evan Tschannen
1fc6937802 changed NetworkAddressList to at most two addresses for performance 2019-03-23 17:54:46 -07:00
Alex Miller
7f5bc2981f Checksum DiskQueue pages on read, but at a lower priority.
If a server has its data spilled, then it's behind the 5s window.
Feeding it data is less important than committing, so we can hide the
extra CPU usage from checksumming the read amplified disk queue pages.
2019-03-15 21:01:19 -07:00
Jingyu Zhou
3c86643822 Separate Ratekeeper from data distribution.
Add a new role for ratekeeper.

Remove StorageServerChanges from data distribution.
Ratekeeper monitors storage servers, which borrows the idea from
DataDistribution.
2019-03-07 13:16:20 -08:00
Vishesh Yadav
96ee95b9ad fix: macOS build #963
Use the boost representation of IPv6 address internally
and make sure it uses std::array.
2019-03-05 14:03:14 -08:00
Vishesh Yadav
592e224155 net: add/use formatIpPort to format IP:PORT pairs #963 2019-03-04 14:12:45 -08:00
Vishesh Yadav
82b2da4b78 net: Add IPAddress::parse() util #963 2019-03-04 14:12:45 -08:00
Vishesh Yadav
25daabdc02 net: TraceEvent and toIPVectorString for new IPAddress structure #963 2019-03-04 14:12:45 -08:00
Vishesh Yadav
57832e625d net: Support IPv6 #963
- NetworkAddress now contains IPAddress object which can be either
IPv4 or IPv6 address. 128bits are used even for IPv4 addresses,
however only 32bits are used when using/serializing IPv4 address.

- ConnectPacket is updated to store IPv6 address. Backward compatible
with old format since the first 32bits of IP address field is used
for serialization of IPv4.

- Mainly updates rest of the code to use IPAddress structure instead
of plain uint32_t.

- IPv6 address/pair ports should be represented as `[ip]:port` as per
convention. This applies to both cluster files and command line
arguments.
2019-03-04 14:12:41 -08:00
Evan Tschannen
e9ddd94e27 The failure monitor is given a list of all IP addresses associated with a process
The connect packet includes the correct remote address
Did a lot of code cleanup
Simulation test mixed TLS and non-TLS listeners on the same process
2019-01-31 18:20:14 -08:00
Vishesh Yadav
42dffd4dff Take a vector of network addresses from CLI to start FDB server
Extends the CLI interface to take multiple public and listen addresses.
We however do not do anything with those extra addresses and just
consider the first one for now.
2018-12-13 13:36:52 -08:00
Vishesh Yadav
e8e01b2406 Remove unused localAddress parameter from newNet2 and Net2 classes 2018-12-13 13:36:52 -08:00