Daniel Smith
179dea5a1b
Name the RocksDB background threads
2021-03-01 20:35:55 +00:00
Steve Atherton
f4c9b88908
Merge branch 'release-6.2' into release-6.3
...
# Conflicts:
# cmake/CompileBoost.cmake
# fdbserver/DataDistribution.actor.cpp
# fdbserver/fdbserver.actor.cpp
# flow/Error.cpp
2021-02-15 02:05:03 -08:00
Andrew Noyes
4e184fe236
Fix memory errors
2021-02-11 02:58:21 +00:00
Andrew Noyes
dc2bac5670
Resolve conflicts
2020-11-24 19:09:42 +00:00
Andrew Noyes
1f541f02be
Merge branch 'anoyes/merge-6.2-to-6.3' into anoyes/release-6.3-merge
...
Merge, leaving conflict markers for now
2020-11-24 16:55:34 +00:00
David Youngworth
b1256b5dcd
Address review comments, simplify DateTime
2020-11-18 16:55:47 -08:00
David Youngworth
5ade54f767
Fix windows build, DateTime to use UTC
2020-11-18 16:55:12 -08:00
David Youngworth
490fe61032
Fix bug in rolled Trace code
2020-11-18 16:55:06 -08:00
David Youngworth
50e515c29a
Add DateTime to trace, initial commit
2020-11-18 16:54:49 -08:00
David Youngworth
d0391db862
Merge branch 'release-6.2' into release-6.3
2020-11-16 10:15:23 -08:00
sfc-gh-tclinkenbeard
392f18a2de
Fix retrieveTraceLogIssues function name
2020-11-04 22:39:56 -08:00
Russell Sears
32c87bbb33
Lightweight, power of two spaced histogram implementation + automatic reporting
2020-11-02 11:13:16 -08:00
Daniel Smith
2243ee0033
s/NULL/nullptr/
...
Co-authored-by: A.J. Beamon <ajbeamon@users.noreply.github.com>
2020-06-11 12:06:16 -04:00
Daniel Smith
8088bf7212
Move flush call onto main thread to make TraceBatch not thread hostile
2020-06-11 15:53:29 +00:00
A.J. Beamon
47477382b1
Fix invalid trace event detail named "Type"
2020-05-07 11:22:32 -07:00
Evan Tschannen
a3598a7616
Merge pull request #2738 from ajbeamon/fix-assertion-failure-on-io-error
...
Fix assertion failure in SQLite thread pools on io_error
2020-04-14 16:48:22 -07:00
A.J. Beamon
c851ee4031
Merge pull request #2897 from tclinken/fix-trace-batch-loggroup-and-role
...
Annotate trace batch events before dumping
2020-04-13 11:22:51 -07:00
tclinken
8ef5a04896
Guard all of annotateEvent with mutex
2020-04-10 13:03:15 -07:00
tclinken
01285f3374
Delay annotation of trace batch events created before trace file is opened
2020-04-09 14:09:00 -07:00
tclinken
10fee8fafc
Annotate trace batch events before dumping
2020-04-02 19:34:02 -07:00
Xin Dong
6820167d77
Merge branch 'master' into feature/1689/allow-custome-trace-log-file-identifier
2020-03-31 16:50:46 -07:00
Xin Dong
2805111a32
When provided with a custom identifier, use that string instead of the port/PID as the last part of the baseName.
2020-03-31 11:02:02 -07:00
Xin Dong
03e2102a21
Fix macOS build failure.
2020-03-26 11:41:36 -07:00
Xin Dong
a0177a9335
Allow the user to provide a custom trace log file identifier that will be used as the prefix of all trace log files created at the client side.
2020-03-26 11:25:05 -07:00
tclinken
baf0fe956c
Take trace mutex in setLogGroup
2020-03-26 09:55:03 -07:00
tclinken
7d5ed53215
Allow trace log group to be set after database is created
2020-03-25 13:40:43 -07:00
Balachandar Namasivayam
58a9bfa78b
Merge pull request #2820 from dongxinEric/fix/1977/add-back-trace-event-flush-failure-report
...
Fix/1977/add back trace event flush failure report
2020-03-18 16:11:44 -07:00
Evan Tschannen
e08f0201f1
merge release 6.2 into master
2020-03-17 12:51:47 -07:00
Xin Dong
31a9f0a26c
Fix the segfault
2020-03-17 11:03:46 -07:00
A.J. Beamon
f1523bd472
Setting the network thread more than once is a no-op
2020-03-16 15:37:06 -07:00
A.J. Beamon
7769218303
Move an increment after an ASSERT.
2020-03-16 14:11:07 -07:00
A.J. Beamon
d8cfabe73b
Extend the allocation tracing disabling flag to cover more parts of trace logging as a precaution. Make it possible to disable via knob.
2020-03-16 13:59:31 -07:00
Xin Dong
89861c661e
Fix the random crash. Use a thread safe 'ThreadReturnPromise' instead of the ThreadFuture.
2020-03-16 13:36:55 -07:00
Xin Dong
5967ef5eab
Added back the changes that report trace log flush failures and fix the random crash
2020-03-12 14:34:19 -07:00
A.J. Beamon
2466749648
Don't disallow allocation tracking when a trace event is open because we now have state trace events. Instead, only block allocation tracking while we are in the middle of allocation tracking already to prevent recursion.
2020-03-12 11:17:49 -07:00
Evan Tschannen
303df197cf
Merge branch 'release-6.2'
...
# Conflicts:
# CMakeLists.txt
# bindings/c/test/mako/mako.c
# documentation/sphinx/source/release-notes.rst
# fdbbackup/backup.actor.cpp
# fdbclient/NativeAPI.actor.cpp
# fdbclient/NativeAPI.actor.h
# fdbserver/DataDistributionQueue.actor.cpp
# fdbserver/Knobs.cpp
# fdbserver/Knobs.h
# fdbserver/LogRouter.actor.cpp
# fdbserver/SkipList.cpp
# fdbserver/fdbserver.actor.cpp
# flow/CMakeLists.txt
# flow/Knobs.cpp
# flow/Knobs.h
# flow/flow.vcxproj
# flow/flow.vcxproj.filters
# versions.target
2020-03-06 18:22:46 -08:00
Evan Tschannen
1128666840
added additional logging on the log router
2020-03-05 18:17:06 -08:00
Xin Dong
39610d15f8
Revert this change since it somehow introduced a random crash detected on circus
2020-03-04 16:14:38 -08:00
Xin Dong
16575ae94d
Address review comments
2020-02-27 11:54:15 -08:00
Xin Dong
4ac7b36e44
Added back the mutex holder that was removed accidentally
2020-02-27 10:19:17 -08:00
Xin Dong
7b51ab6b63
Rebased with master
2020-02-25 15:43:33 -08:00
Xin Dong
f20619c9fb
Resolve review comments. Changed how issues got cleared
2020-02-25 15:39:51 -08:00
Xin Dong
3f24ae93f2
Remove the unused variable
2020-02-25 15:39:38 -08:00
Xin Dong
090c89e90a
Addressed review comments. Fix the bug where issues on a worker may be wrongly cleared by subsequent GetDBinfo request.
2020-02-25 15:39:38 -08:00
Xin Dong
288e95c7e1
Reallocate the issues set after each get. Changed an issue's name to be accurate
2020-02-25 15:39:09 -08:00
Xin Dong
1c346fcfb0
Added the new issues into Status Schema. Remove the issue reporting in lastError since:
...
- If the issue string contains the error number, status schema needs to be super verbose to include all possible issue strings
- If the issue string does not contain the error number, the generic issue string can be pretty useless.
Thus now specific issues are being reported before calling lastError
2020-02-25 15:38:14 -08:00
Xin Dong
f4f860bfa8
Changed issue reporting to be thread safe. Also changed the liveness ping to be thread safe.
2020-02-25 15:38:14 -08:00
Xin Dong
a6580dc15f
Added the ability to ping a trace log writer thread and the monitoring in worker.actor.cpp. The current solution is simply a loose check. We can change this to an accurate check by using 'pthread_kill(writer_thread, 0)'
2020-02-25 15:37:53 -08:00
Xin Dong
0b0414fb94
Addressed review comments. Changed the issue reporting from 'ITraceLogWriter' to a more generic approach.
2020-02-25 15:37:53 -08:00
Xin Dong
034dfe5e42
Now the inability to flush trace logs will be reported to both 'stderr' and also the status json object.
...
- Since the first flush failure, if the accumulated consecutive failure count exceeds the value defined in knobs, it will trigger the current worker process to report this issue via the 'GetServerDBInfo' interface of the cluster controller
- A successful flush will reset the accumulated counter.
Notice that the current solution does not take the time into consideration. The assumption is that flush failures tend to only happen in a clustered manner. The intermittent, but short, periods of flush failures are not considered as a problem since the memory pressure built by them should be negligible.
2020-02-25 15:37:32 -08:00