134 Commits

Author SHA1 Message Date
Daniel Smith
179dea5a1b Name the RocksDB background threads 2021-03-01 20:35:55 +00:00
Steve Atherton
f4c9b88908 Merge branch 'release-6.2' into release-6.3
# Conflicts:
#	cmake/CompileBoost.cmake
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/fdbserver.actor.cpp
#	flow/Error.cpp
2021-02-15 02:05:03 -08:00
Andrew Noyes
4e184fe236 Fix memory errors 2021-02-11 02:58:21 +00:00
Andrew Noyes
dc2bac5670 Resolve conflicts 2020-11-24 19:09:42 +00:00
Andrew Noyes
1f541f02be Merge branch 'anoyes/merge-6.2-to-6.3' into anoyes/release-6.3-merge
Merge, leaving conflict markers for now
2020-11-24 16:55:34 +00:00
David Youngworth
b1256b5dcd Address review comments, simplify DateTime 2020-11-18 16:55:47 -08:00
David Youngworth
5ade54f767 Fix windows build, DateTime to use UTC 2020-11-18 16:55:12 -08:00
David Youngworth
490fe61032 Fix bug in rolled Trace code 2020-11-18 16:55:06 -08:00
David Youngworth
50e515c29a Add DateTime to trace, initial commit 2020-11-18 16:54:49 -08:00
David Youngworth
d0391db862 Merge branch 'release-6.2' into release-6.3 2020-11-16 10:15:23 -08:00
sfc-gh-tclinkenbeard
392f18a2de Fix retrieveTraceLogIssues function name 2020-11-04 22:39:56 -08:00
Russell Sears
32c87bbb33 Lightweight, power of two spaced histogram implementation + automatic reporting 2020-11-02 11:13:16 -08:00
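The message above does not describe the histogram's layout. Below is a minimal sketch that assumes 32-bit unsigned samples and buckets that double in width. Bucket i counts values in [2^i, 2^(i+1)), and bucket 0 also holds the value 0. `PowerOfTwoHistogram` and its methods are hypothetical names, not the code added by this commit, and `report()` only stands in for the automatic reporting.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstdio>

// Hypothetical sketch: histogram with power-of-two spaced bucket boundaries.
// Bucket i counts samples in [2^i, 2^(i+1)); bucket 0 also absorbs 0.
class PowerOfTwoHistogram {
public:
    static constexpr int kBuckets = 32;  // enough for any uint32_t sample

    // floor(log2(v)) for v >= 1, and 0 for v == 0.
    static int bucketIndex(uint32_t v) {
        int i = 0;
        while (v > 1) {
            v >>= 1;
            ++i;
        }
        return i;
    }

    void sample(uint32_t v) { ++counts_[bucketIndex(v)]; }

    uint64_t count(int bucket) const {
        assert(bucket >= 0 && bucket < kBuckets);
        return counts_[bucket];
    }

    // Prints each non-empty bucket as [lower, upper): count.
    void report() const {
        for (int i = 0; i < kBuckets; ++i) {
            if (counts_[i] == 0) continue;
            unsigned long long lower = (i == 0) ? 0ULL : (1ULL << i);
            unsigned long long upper = 1ULL << (i + 1);
            std::printf("[%llu, %llu): %llu\n", lower, upper,
                        (unsigned long long)counts_[i]);
        }
    }

private:
    std::array<uint64_t, kBuckets> counts_{};
};
```

Computing the bucket takes only a shift loop (or a count-leading-zeros instruction), with no floating-point log. This is the usual reason for choosing power-of-two spacing in a lightweight histogram.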
Daniel Smith
2243ee0033
s/NULL/nullptr/
Co-authored-by: A.J. Beamon <ajbeamon@users.noreply.github.com>
2020-06-11 12:06:16 -04:00
Daniel Smith
8088bf7212 Move flush call onto main thread to make TraceBatch not thread hostile 2020-06-11 15:53:29 +00:00
A.J. Beamon
47477382b1 Fix invalid trace event detail named "Type" 2020-05-07 11:22:32 -07:00
Evan Tschannen
a3598a7616
Merge pull request #2738 from ajbeamon/fix-assertion-failure-on-io-error
Fix assertion failure in SQLite thread pools on io_error
2020-04-14 16:48:22 -07:00
A.J. Beamon
c851ee4031
Merge pull request #2897 from tclinken/fix-trace-batch-loggroup-and-role
Annotate trace batch events before dumping
2020-04-13 11:22:51 -07:00
tclinken
8ef5a04896 Guard all of annotateEvent with mutex 2020-04-10 13:03:15 -07:00
tclinken
01285f3374 Delay annotation of trace batch events created before trace file is opened 2020-04-09 14:09:00 -07:00
tclinken
10fee8fafc Annotate trace batch events before dumping 2020-04-02 19:34:02 -07:00
Xin Dong
6820167d77
Merge branch 'master' into feature/1689/allow-custome-trace-log-file-identifier 2020-03-31 16:50:46 -07:00
Xin Dong
2805111a32 When provided with a custom identifier, use that string instead of the port/PID as the last part of the baseName. 2020-03-31 11:02:02 -07:00
Xin Dong
03e2102a21 Fix macOS build failure. 2020-03-26 11:41:36 -07:00
Xin Dong
a0177a9335 Allow the user to provide a custom trace log file identifier that will be used as the prefix of all trace log files created on the client side. 2020-03-26 11:25:05 -07:00
tclinken
baf0fe956c Take trace mutex in setLogGroup 2020-03-26 09:55:03 -07:00
tclinken
7d5ed53215 Allow trace log group to be set after database is created 2020-03-25 13:40:43 -07:00
Balachandar Namasivayam
58a9bfa78b
Merge pull request #2820 from dongxinEric/fix/1977/add-back-trace-event-flush-failure-report
Fix/1977/add back trace event flush failure report
2020-03-18 16:11:44 -07:00
Evan Tschannen
e08f0201f1 merge release 6.2 into master 2020-03-17 12:51:47 -07:00
Xin Dong
31a9f0a26c Fix the segfault 2020-03-17 11:03:46 -07:00
A.J. Beamon
f1523bd472 Setting the network thread more than once is a no-op 2020-03-16 15:37:06 -07:00
A.J. Beamon
7769218303 Move an increment after an ASSERT. 2020-03-16 14:11:07 -07:00
A.J. Beamon
d8cfabe73b Extend the allocation tracing disabling flag to cover more parts of trace logging as a precaution. Make it possible to disable via knob. 2020-03-16 13:59:31 -07:00
Xin Dong
89861c661e Fix the random crash. Use a thread safe 'ThreadReturnPromise' instead of the ThreadFuture. 2020-03-16 13:36:55 -07:00
Xin Dong
5967ef5eab Added back the changes that report trace log flush failures and fix the random crash 2020-03-12 14:34:19 -07:00
A.J. Beamon
2466749648 Don't disallow allocation tracking when a trace event is open because we now have state trace events. Instead, only block allocation tracking while we are in the middle of allocation tracking already to prevent recursion. 2020-03-12 11:17:49 -07:00
Evan Tschannen
303df197cf Merge branch 'release-6.2'
# Conflicts:
#	CMakeLists.txt
#	bindings/c/test/mako/mako.c
#	documentation/sphinx/source/release-notes.rst
#	fdbbackup/backup.actor.cpp
#	fdbclient/NativeAPI.actor.cpp
#	fdbclient/NativeAPI.actor.h
#	fdbserver/DataDistributionQueue.actor.cpp
#	fdbserver/Knobs.cpp
#	fdbserver/Knobs.h
#	fdbserver/LogRouter.actor.cpp
#	fdbserver/SkipList.cpp
#	fdbserver/fdbserver.actor.cpp
#	flow/CMakeLists.txt
#	flow/Knobs.cpp
#	flow/Knobs.h
#	flow/flow.vcxproj
#	flow/flow.vcxproj.filters
#	versions.target
2020-03-06 18:22:46 -08:00
Evan Tschannen
1128666840 added additional logging on the log router 2020-03-05 18:17:06 -08:00
Xin Dong
39610d15f8 Revert this change since it somehow introduced a random crash detected on circus 2020-03-04 16:14:38 -08:00
Xin Dong
16575ae94d Address review comments 2020-02-27 11:54:15 -08:00
Xin Dong
4ac7b36e44 Added back the mutex holder that was removed accidentally 2020-02-27 10:19:17 -08:00
Xin Dong
7b51ab6b63 Rebased with master 2020-02-25 15:43:33 -08:00
Xin Dong
f20619c9fb Resolve review comments. Changed how issues got cleared 2020-02-25 15:39:51 -08:00
Xin Dong
3f24ae93f2 Remove the unused variable 2020-02-25 15:39:38 -08:00
Xin Dong
090c89e90a Addressed review comments. Fix the bug where issues on a worker may be wrongly cleared by subsequent GetDBinfo request. 2020-02-25 15:39:38 -08:00
Xin Dong
288e95c7e1 Reallocate the issues set after each get. Changed an issue's name to be accurate 2020-02-25 15:39:09 -08:00
Xin Dong
1c346fcfb0 Added the new issues into Status Schema. Remove the issue reporting in lastError since:
- If the issue string contains the error number, status schema needs to be super verbose to include all possible issue strings
- If the issue string does not contain the error number, the generic issue string can be pretty useless.

Thus, specific issues are now reported before calling lastError.
2020-02-25 15:38:14 -08:00
Xin Dong
f4f860bfa8 Changed issue reporting to be thread safe. Also changed the liveness ping to be thread safe. 2020-02-25 15:38:14 -08:00
Xin Dong
a6580dc15f Added the ability to ping a trace log writer thread and the monitoring in worker.actor.cpp. The current solution is simply a loose check. We can change this to an accurate check by using 'pthread_kill(writer_thread, 0)' 2020-02-25 15:37:53 -08:00
Xin Dong
0b0414fb94 Addressed review comments. Changed the issue reporting from 'ITraceLogWriter' to a more generic mechanism. 2020-02-25 15:37:53 -08:00
Xin Dong
034dfe5e42 Now the inability to flush trace logs will be reported to both 'stderr' and also the status json object.
- Since the first flush failure, if the accumulated consecutive failure count exceeds the value defined in knobs, the current worker process will report this issue via the 'GetServerDBInfo' interface of the cluster controller.
    - A successful flush will reset the accumulated counter.
    Note that the current solution does not take time into consideration. The assumption is that flush failures tend to happen in clusters. Intermittent, short periods of flush failures are not considered a problem, since the memory pressure they build should be negligible.
2020-02-25 15:37:32 -08:00
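The flush-failure rule in 034dfe5e42 can be sketched as a counter of consecutive failures that resets on any success. This is a minimal illustration that assumes a single caller. `FlushFailureTracker` is a hypothetical name. `threshold` stands in for the knob value, whose name the message does not give. Reporting to the cluster controller is left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of the consecutive-failure rule described above.
// `threshold` stands in for the knob value; the real knob name is not given.
class FlushFailureTracker {
public:
    explicit FlushFailureTracker(uint32_t threshold) : threshold_(threshold) {}

    // Records the outcome of one flush attempt.
    void recordFlush(bool succeeded) {
        if (succeeded) {
            consecutiveFailures_ = 0;  // any successful flush resets the count
        } else {
            ++consecutiveFailures_;
        }
    }

    // True once the run of consecutive failures exceeds the threshold.
    // Elapsed time is ignored, matching the assumption in the commit message.
    bool shouldReport() const { return consecutiveFailures_ > threshold_; }

    uint32_t consecutiveFailures() const { return consecutiveFailures_; }

private:
    uint32_t threshold_;
    uint32_t consecutiveFailures_ = 0;
};
```

A real trace log writer thread would also need synchronization around the counter, as the later thread-safety commits in this list suggest. The sketch omits that.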