foundationdb/documentation/sphinx/source/request-tracing.rst

110 lines
5.4 KiB
ReStructuredText

.. _request-tracing:
###############
Request Tracing
###############
The request tracing framework adds the ability to monitor transactions as they
move through FoundationDB. Tracing provides a detailed view into where
transactions spend time with data exported in near real-time, enabling fast
performance debugging. The FoundationDB tracing framework is based off the
`OpenTracing <https://opentracing.io/>`_ specification.
*Disambiguation:* :ref:`Trace files <administration-managing-trace-files>` are
local log files containing debug and error output from a local ``fdbserver``
binary. Request tracing produces similarly named *traces* which record the
amount of time a transaction spent in a part of the system. This document uses
the term tracing (or trace) to refer to these request traces, not local debug
information, unless otherwise specified.
*Note*: Full request tracing capability requires at least ``TLogVersion::V6``.
==============
Recording data
==============
The request tracing framework produces no data by default. To enable collection
of traces, specify the collection type using the ``--tracer`` command line
option for ``fdbserver`` and the ``DISTRIBUTED_CLIENT_TRACER`` :ref:`network
option <network-options-using-environment-variables>` for clients. Both client
and server must have the same trace value set to perform correctly.
========================= ===============
**Option** **Description**
------------------------- ---------------
none No tracing data is collected.
file, logfile, log_file Write tracing data to FDB trace files, specified with ``--logdir``.
network_lossy Send tracing data as UDP packets. Data is sent to ``localhost:8889``, but the default port can be changed by setting the ``TRACING_UDP_LISTENER_PORT`` knob. This option is useful if you have a log aggregation program to collect trace data.
========================= ===============
-----------
Data format
-----------
Spans are the building blocks of traces. A span represents an operation in the
life of a transaction, including the start and end timestamp and an operation.
A collection of spans make up a trace, representing a single transaction. The
tracing framework outputs individual spans, which can be reconstructed into
traces through their parent relationships.
Trace data sent as UDP packets when using the ``network_lossy`` option is
serialized using `MessagePack <https://msgpack.org>`_. To save on the amount of
data sent, spans are serialized as an array of length 8 (if the span has one or
more parents), or length 7 (if the span has no parents).
The fields of a span are specified below. The index at which the field appears
in the serialized msgpack array is also specified, for those using the UDP
collection format.
================== ========= ======== ===============
**Field** **Index** **Type** **Description**
------------------ --------- -------- ---------------
Source IP:port 0 string The IP and port of the machine where the span originated.
Trace ID 1 uint64 The 64-bit identifier of the trace. All spans in a trace share the same trace ID.
Span ID 2 uint64 The 64-bit identifier of the span. All spans have a unique identifier.
Start timestamp 3 double The timestamp when the operation represented by the span began.
Duration 4 double The duration in seconds of the operation represented by the span.
Operation name 5 string The name of the operation the span represents.
Tags 6 map User defined tags, added manually to specify additional information.
Parent span IDs 7 vector (Optional) A list of span IDs representing parents of this span.
================== ========= ======== ===============
^^^^^^^^^^^^^^^^^^^^^
Multiple parent spans
^^^^^^^^^^^^^^^^^^^^^
Unlike traditional distributed tracing frameworks, FoundationDB spans can have
multiple parents. Because many FDB transactions are batched into a single
transaction, to continue tracing the request, the batched transaction must
treat all its component transactions as parents.
---------------
Control options
---------------
In addition to the command line parameter described above, tracing can be set
at a database and transaction level.
Tracing can be controlled by setting the
``distributed_transaction_trace_sample_rate`` database option. Set this option
to a value between 0 and 1,000,000, representing a fraction of traces to
record. For example, to completely disable distributed tracing, set the value
to 0. To trace half of all transactions, set the value to 500,000. Note that
the value must be an integer. The initial distributed trace sample fraction is
set by the knob ``TRACING_SAMPLE_RATE``.
Tracing can be enabled or disabled for individual transactions. The special key
space exposes an API to set a custom trace ID for a transaction, or to disable
tracing for the transaction. See the special key space :ref:`tracing module
documentation <special-key-space-tracing-module>` to learn more.
^^^^^^^^^^^^^^
Trace sampling
^^^^^^^^^^^^^^
By default, all traces are recorded. If tracing is producing too much data,
adjust the trace sample rate with the ``TRACING_SAMPLE_RATE`` knob. Set the
knob to 0.0 to record no traces, to 1.0 to record all traces, or somewhere in
the middle. Traces are sampled as a unit. All individual spans in the trace
will be included in the sample.