190 Commits

Author SHA1 Message Date
Xin Dong
288e95c7e1 Reallocate the issues set after each get. Changed an issues name to be accurate 2020-02-25 15:39:09 -08:00
Xin Dong
1c346fcfb0 Added the new issues into Status Schema. Remove the issue reporting in lastError since:
- If the issue string contains the error number, status schema needs to be super verbose to include all possible issue strings
- If the issue string does not contain the error number, the generic issue string can be pretty useless.

Thus now specific issues are being reported before calling lastError
2020-02-25 15:38:14 -08:00
Xin Dong
f4f860bfa8 Changed issue reporting to be thread safe. Also changed the liveness ping to be thread safe. 2020-02-25 15:38:14 -08:00
Xin Dong
a6580dc15f Added the ability to ping a trace log writer thread and the monitoring in worker.actor.cpp. The current solution is simple a loose check. We can change this to be accurate check by using 'pthread_kill(writer_thread, 0)' 2020-02-25 15:37:53 -08:00
Xin Dong
0b0414fb94 Addressded review comments. Change the issue reporting from 'ITraceLogWriter' to be a more generic way. 2020-02-25 15:37:53 -08:00
Xin Dong
034dfe5e42 Now the inability to flush trace logs will be reported to both 'stderr' and also the status json object.
- Since the first flush failure, if the accumulated consecutive failure count exceeds the value defined in knobs, it will trigger the current worker process to report this issue via the 'GetServerDBInfo' interface of the cluster controler
    - A successful flush will reset the accumulated counter.
    Notice that the current solution does not take the time into consideration. The assumption is that flush failures tend to only happen in a clustered manner. The intermittent, but short, periods of flush failures are not considered as a problem since the memory pressure built by them should be negligible.
2020-02-25 15:37:32 -08:00
A.J. Beamon
1c6aef76b5 When one of the sqlite reader or writer thread pools fail, fail the other with the same error. 2020-02-24 12:39:04 -08:00
Alvin Moore
0f64505d0b Merge branch 'release-6.2' of github.com:apple/foundationdb
Needed to pull in changes to build docker
2020-02-23 23:27:53 -08:00
A.J. Beamon
dfa5f76c01 Remove unused parameter. Don't put check for g_network presence in ASSERT_WE_THINK. 2020-02-21 16:28:03 -08:00
A.J. Beamon
2431d4d788 Always compute the time for a trace event when it is being logged rather than when it is being created. Usually these are the same, but if they aren't, doing the opposite can lead to out of order trace events. 2020-02-21 13:57:04 -08:00
A.J. Beamon
6810a03283 Add more logging to valley filler and mountain chopper 2020-02-21 10:55:14 -08:00
Paul J. Davis
32e285a761 Add network option for the trace clock source
This option allows clients to select the clock source for trace events
similar to the `--traceclock` command line parameter for `fdbserver`.
Using the `realtime` clock sources makes loading event data into
OpenTracing systems like Jaeger more useful.
2020-02-15 11:30:43 -06:00
Evan Tschannen
a5f544818c
Merge pull request #2420 from ajbeamon/trace-clock-source-fix
Revert change to make g_trace_clock thread_local, ...
2020-01-10 12:36:38 -08:00
Evan Tschannen
16b5af067c changed trace event name 2020-01-03 16:03:29 -08:00
Evan Tschannen
deb032745a fix: do not set logged until then end of the function 2020-01-03 12:45:23 -08:00
Evan Tschannen
1867d30017 added asserts to protect against future actions on a trace event that has been logged 2020-01-03 12:31:06 -08:00
Evan Tschannen
7152469cc3 log the base trace event before the endpoint messages 2020-01-03 12:15:38 -08:00
A.J. Beamon
9866d1ce27 Revert change to make g_trace_clock thread_local, instead checking we are on the correct thread when getting the time. 2019-12-06 10:15:49 -08:00
A.J. Beamon
b5d2234a13 Merge branch 'release-6.1' into merge-release-6.1-into-master
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbbackup/backup.actor.cpp
#	fdbserver/MoveKeys.actor.cpp
#	flow/FastAlloc.h
#	versions.target
2019-07-30 16:23:42 -07:00
A.J. Beamon
e98cee016d Fix unsafe usage of now() function from multiple threads in trace logging. 2019-07-22 22:31:38 -07:00
A.J. Beamon
d981de18e4 Restrict huge arena sampling to the network thread. Revert removal of thread_local definitions. 2019-07-17 16:23:17 -07:00
A.J. Beamon
d5051b08dd Make trace event field lengths (and total event sizes) default knobified and configurable. Add a transaction option to control the field length of transaction debug logging. Make the program start command line field less likely to be truncated. 2019-07-12 16:12:35 -07:00
Alex Miller
7a500cd37f A giant translation of TaskFooPriority -> TaskPriority::Foo
This is so that APIs that take priorities don't take ints, which are
common and easy to accidentally pass the wrong thing.
2019-06-25 02:47:35 -07:00
A.J. Beamon
f417e60264 Merge branch 'merge-release-6.1-into-master' into thread-safe-random-number-generation
# Conflicts:
#	fdbserver/QuietDatabase.actor.cpp
2019-05-23 09:52:00 -07:00
A.J. Beamon
d29c7e4c9b Merge branch 'release-6.1' into merge-release-6.1-into-master
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbserver/QuietDatabase.actor.cpp
#	versions.target
2019-05-23 09:28:45 -07:00
A.J. Beamon
603721e125 Merge branch 'master' into thread-safe-random-number-generation
# Conflicts:
#	fdbclient/ManagementAPI.actor.cpp
#	fdbrpc/AsyncFileCached.actor.h
#	fdbrpc/genericactors.actor.cpp
#	fdbrpc/sim2.actor.cpp
#	fdbserver/DiskQueue.actor.cpp
#	fdbserver/workloads/BulkSetup.actor.h
#	flow/ActorCollection.actor.cpp
#	flow/Net2.actor.cpp
#	flow/Trace.cpp
#	flow/flow.cpp
2019-05-23 08:35:47 -07:00
A.J. Beamon
d3c1f9afbd Suppress 'TraceEventFieldNotFound' event. Don't suppress simultaneous FDBLibTLSVerifyFailure events (this requires including flow.h and removing the copied version of Reference/ReferenceCounted in FDBLibTLS). 2019-05-20 15:12:28 -07:00
A.J. Beamon
ab4618c3f9 Don't access TraceEvent::eventCounts outside the network thread. Make net2liveness an atomic counter. 2019-05-10 12:35:11 -10:00
A.J. Beamon
5f55f3f613 Replace g_random and g_nondeterministic_random with functions deterministicRandom() and nondeterministicRandom() that return thread_local random number generators. Delete g_debug_random and trace_random. Allow only deterministicRandom() to be seeded, and require it to be seeded from each thread on which it is used. 2019-05-10 14:01:52 -07:00
Evan Tschannen
22499666d0 Merge branch 'release-6.1'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbserver/LogRouter.actor.cpp
#	flow/Trace.cpp
#	versions.target
2019-05-08 18:19:35 -07:00
Evan Tschannen
aefd68e1e7 The memory tracking trace events can crash if the memory that is being allocated is coming from a trace event itself. Specifically TDMetrics within trace events uses fast allocated memory. Trace events are supposed to be short lived, so this commits adds a global count of outstanding trace events, and disables memory tracking when a trace event exists. 2019-05-06 17:41:32 -07:00
Andrew Noyes
6c7fd1593f Avoid potential index-out-of-bounds 2019-04-15 15:20:09 -07:00
Andrew Noyes
dced9232d0 ASSERT_WE_THINK trailing byte is '\0' 2019-04-15 09:43:26 -07:00
mpilman
d01cbf3455 Addressed code review comments 2019-04-05 13:12:20 -07:00
mpilman
8120a33bda Fix >-direction when initializing TraceEvent 2019-04-05 13:12:19 -07:00
mpilman
c008e16c81 Defer formatting in traces to make them cheaper
This is the first part of making `TraceEvent` cheaper. The main idea is
to defer calls to any code that formats string. These are the main
changes:

- TraceEvent::detail now takes a c-string instead of std::string for
  literals. This prevents unnecessary allocations if the trace is not
  going to be printed in the first place (for example for SevDebug).
  Before that `detail` expected a `std::string` as key, which mean that
  any string literal would be copied on each call.
- Templates Traceable and SpecialTraceMetricType. These templates can be
  specialized for any type that needs to be printed. The actual
  formatting will be deferred to after the `enabled` check. This
  provides two benefits: (1) if a TraceEvent is disabled, we don't pay
  for the formatting and (2) TraceEvent can trace types that it doesn't
  know about.
- TraceEvent::enabled will be set in the constructor if the Severity is
  passed. This will make sure that `TraceEvent::init` is not called.
- `TraceEvent::detail` will be inlined. So for disabled TraceEvent
  calls, a call to detail will only introduce a if-branch which is much
  cheaper than a function call.
2019-04-05 13:12:19 -07:00
Jingyu Zhou
38c6681349 Fix some signed and unsigned mismatch warnings. 2019-03-26 14:54:11 -07:00
Vishesh Yadav
c37291a366 fix: Use '_' instead of ':' in IPv6 tracefile names
':' is not acceptable on Windows. Reason to choose '_' instead of '.'
is to differentiate b/w IPv4 and IPv6 easily, and also '..1' in
filename looks weirder than '__1', which would happen with shortened
IPv6 addresses. And non-shortened IPv6 addreses are just lots of 0s.
2019-03-20 14:00:33 -07:00
Vishesh Yadav
592e224155 net: add/use formatIpPort to format IP:PORT pairs #963 2019-03-04 14:12:45 -08:00
Vishesh Yadav
25daabdc02 net: TraceEvent and toIPVectorString for new IPAddress structure #963 2019-03-04 14:12:45 -08:00
Vishesh Yadav
57832e625d net: Support IPv6 #963
- NetworkAddress now contains IPAddress object which can be either
IPv4 or IPv6 address. 128bits are used even for IPv4 addresses,
however only 32bits are used when using/serializing IPv4 address.

- ConnectPacket is updated to store IPv6 address. Backward compatible
with old format since the first 32bits of IP address field is used
for serialization of IPv4.

- Mainly updates rest of the code to use IPAddress structure instead
of plain uint32_t.

- IPv6 address/pair ports should be represented as `[ip]:port` as per
convention. This applies to both cluster files and command line
arguments.
2019-03-04 14:12:41 -08:00
A.J. Beamon
655c9d82c7 Various cleanup from review 2019-03-01 14:06:47 -08:00
A.J. Beamon
eb629d87a5 Add information about batch ratekeeper to status. Make it possible to track latencies in the ReadWrite workload for concurrently run instances separately. 2019-02-28 09:53:16 -08:00
Andrew Noyes
e055fdab03 Better diagnostics for unrecognized trace format 2019-01-29 13:08:47 -08:00
Andrew Noyes
3f399dd9be Indent within anonymous namespace 2019-01-03 09:21:09 -08:00
Andrew Noyes
80dd1efaf6 Assign to g_traceLog.formatter 2019-01-02 16:42:00 -08:00
anoyes
b8df5acc15 Add --trace_format flag to fdbserver 2018-12-20 15:02:01 -08:00
anoyes
33a4aac249 Fix build 2018-12-20 11:55:59 -08:00
Markus Pilman
cdc7e83993
Update flow/Trace.cpp
Co-Authored-By: alexmiller-apple <35046903+alexmiller-apple@users.noreply.github.com>
2018-12-14 11:30:40 -08:00
Markus Pilman
40890e9dbe Implemented a json log formatter 2018-12-13 21:46:02 -08:00