LTTNG-CONCEPTS(7) | LTTng Manual | LTTNG-CONCEPTS(7) |
NAME
lttng-concepts - LTTng concepts
DESCRIPTION
This manual page documents the concepts of LTTng.
Many other LTTng manual pages refer to this one so that you can understand what the various LTTng objects are and how they relate to each other.
The concepts of LTTng 2.13.11 are:
INSTRUMENTATION POINT, EVENT RULE, AND EVENT
An instrumentation point is a point, within a piece of software, which, when executed, creates an LTTng event.
LTTng offers various types of instrumentation; see the “Instrumentation point types” section below to learn about them.
An event rule is a set of conditions to match a set of events.
When LTTng creates an event E, an event rule ER is said to match E when E satisfies all the conditions of ER. This concept is similar to a regular expression which matches a set of strings.
When an event rule matches an event, LTTng emits the event, therefore attempting to execute one or more actions.
Important
The event creation and emission processes are documentation concepts to help understand the journey from an instrumentation point to the execution of actions.
The actual creation of an event can be costly because LTTng needs to evaluate the arguments of the instrumentation point.
In practice, LTTng implements various optimizations for the Linux kernel and user space tracing domains (see the “TRACING DOMAIN” section below) to avoid actually creating an event when the tracer knows, thanks to properties which are independent from the event payload and current context, that it would never emit such an event. Those properties are:
In other words: if, for a given instrumentation point IP, the LTTng tracer knows that it would never emit an event, executing IP represents a simple boolean variable check and, for a Linux kernel recording event rule, a few process attribute checks.
As of LTTng 2.13.11, there are two places where you can find an event rule:
Recording event rule
See the “RECORDING EVENT RULE AND EVENT RECORD” section below.
Create or enable a recording event rule with the lttng-enable-event(1) command.
List the recording event rules of a specific recording session and/or channel with the lttng-list(1) and lttng-status(1) commands.
“Event rule matches” trigger condition (since LTTng 2.13)
See lttng-add-trigger(1) and lttng-event-rule(7).
For LTTng to emit an event E, E must satisfy all the basic conditions of an event rule ER, that is:
See the “Instrumentation point types” section below.
A recording event rule has additional, implicit conditions to satisfy. See the “RECORDING EVENT RULE AND EVENT RECORD” section below to learn more.
Instrumentation point types
As of LTTng 2.13.11, the available instrumentation point types are, depending on the tracing domain (see the “TRACING DOMAIN” section below):
Linux kernel
LTTng tracepoint
List the available Linux kernel tracepoints with lttng list --kernel. See lttng-list(1) to learn more.
Linux kernel system call
List the available Linux kernel system call instrumentation points with lttng list --kernel --syscall. See lttng-list(1) to learn more.
Linux kprobe
When you create such an instrumentation point, you set its memory address or symbol name.
Linux user space probe
When you create such an instrumentation point, you set:
With the ELF method
With the USDT method
“USDT” stands for SystemTap User-level Statically Defined Tracing, a DTrace-style marker.
As of LTTng 2.13.11, LTTng only supports USDT probes which are NOT reference-counted.
Linux kretprobe
When you create such an instrumentation point, you set the memory address or symbol name of its function.
User space
LTTng tracepoint
List the available user space tracepoints with lttng list --userspace. See lttng-list(1) to learn more.
java.util.logging, Apache log4j, and Python
Java or Python logging statement
List the available Java and Python loggers with lttng list --jul, lttng list --log4j, and lttng list --python. See lttng-list(1) to learn more.
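The listing commands named in this section can be gathered into one helper. This is a minimal sketch: the function name list_instrumentation_points is hypothetical, while every lttng list option comes from the text above. Listing Linux kernel instrumentation points typically requires a root session daemon or membership in the tracing group.

```shell
# List the instrumentation points LTTng knows about, one tracing domain
# at a time. The function name is illustrative only.
list_instrumentation_points() {
    lttng list --kernel            # Linux kernel tracepoints
    lttng list --kernel --syscall  # Linux kernel system calls
    lttng list --userspace         # user space tracepoints
    lttng list --jul               # java.util.logging loggers
    lttng list --log4j             # Apache log4j loggers
    lttng list --python            # Python loggers
}
```

Call list_instrumentation_points from a shell where a session daemon is reachable.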
TRIGGER
A trigger associates a condition to one or more actions.
When the condition of a trigger is satisfied, LTTng attempts to execute its actions.
As of LTTng 2.13.11, the available trigger conditions and actions are:
Conditions
As of LTTng 2.13.11, this is the only available condition when you add a trigger with the lttng-add-trigger(1) command. The other ones are available through the liblttng-ctl C API.
Actions
A trigger belongs to a session daemon (see lttng-sessiond(8)), not to a specific recording session. For a given session daemon, each Unix user has its own, private triggers. Note, however, that the root Unix user may, for the root session daemon:
For a given session daemon and Unix user, a trigger has a unique name.
Add a trigger to a session daemon with the lttng-add-trigger(1) command.
List the triggers of your Unix user (or of all users if your Unix user is root) with the lttng-list-triggers(1) command.
Remove a trigger with the lttng-remove-trigger(1) command.
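The add, list, and remove steps above can be sketched as a single helper. The function name demo_trigger and its arguments are hypothetical. The event-rule-matches condition type, the user:tracepoint type, and the notify action are written as lttng-add-trigger(1) and lttng-event-rule(7) are understood to describe them, so check those pages before relying on the exact spelling.

```shell
# $1: trigger name (the --name before --condition names the trigger)
# $2: event name pattern (the --name after --condition belongs to the
#     event rule of the condition)
demo_trigger() {
    lttng add-trigger --name="$1" \
        --condition=event-rule-matches --type=user:tracepoint --name="$2" \
        --action=notify
    lttng list-triggers
    lttng remove-trigger "$1"
}
```

For example, demo_trigger my-trigger 'my_app:*' adds a trigger for all the user space tracepoints of a hypothetical my_app provider, lists the triggers of your Unix user, then removes the new one.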
RECORDING SESSION
A recording session (named “tracing session” prior to LTTng 2.13) is a stateful dialogue between you and a session daemon (see lttng-sessiond(8)) for everything related to event recording.
Everything that you do when you control LTTng tracers to record events happens within a recording session. In particular, a recording session:
An active recording session is an implicit recording event rule condition (see the “RECORDING EVENT RULE AND EVENT RECORD” section below).
See the “Recording session modes” section below to learn more.
Those attributes and objects are completely isolated between different recording sessions.
A recording session is like an ATM session: the operations you do on the banking system through the ATM don’t alter the data of other users of the same system. In the case of the ATM, a session lasts as long as your bank card is inside. In the case of LTTng, a recording session lasts from the lttng-create(1) command to the lttng-destroy(1) command.
A recording session belongs to a session daemon (see lttng-sessiond(8)). For a given session daemon, each Unix user has its own, private recording sessions. Note, however, that the root Unix user may operate on or destroy another user’s recording session.
Create a recording session with the lttng-create(1) command.
List the recording sessions of the connected session daemon with the lttng-list(1) command.
Start and stop a recording session with the lttng-start(1) and lttng-stop(1) commands.
Save and load a recording session with the lttng-save(1) and lttng-load(1) commands.
Archive the current trace chunk of a recording session (that is, rotate the session) with the lttng-rotate(1) command.
Destroy a recording session with the lttng-destroy(1) command.
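The life cycle above, from lttng-create(1) to lttng-destroy(1), can be sketched as one helper. The function name record_once, the session name, and the sched_switch tracepoint are illustrative choices. Recording Linux kernel events assumes a root session daemon.

```shell
# $1: recording session name
# $2: Linux kernel tracepoint name to record
# $3: how long to record, in seconds
record_once() {
    lttng create "$1"                 # also becomes the current session
    lttng enable-event --kernel "$2"  # recording event rule, default channel
    lttng start "$1"
    sleep "$3"
    lttng stop "$1"
    lttng destroy "$1"                # the session ends here
}
```

For example, record_once my-session sched_switch 5 records scheduler switches for five seconds.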
Current recording session
When you run the lttng-create(1) command, LTTng creates the $LTTNG_HOME/.lttngrc file if it doesn’t exist ($LTTNG_HOME defaults to $HOME).
$LTTNG_HOME/.lttngrc contains the name of the current recording session.
When you create a new recording session with the create command, LTTng updates the current recording session.
The following lttng(1) commands select the current recording session if you don’t specify one:
Set the current recording session manually with the lttng-set-session(1) command, without having to edit the .lttngrc file.
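As a small sketch of the rules above, the following helpers compute where the .lttngrc file lives and switch the current recording session. Both function names are hypothetical. The path logic only encodes the statement that $LTTNG_HOME defaults to $HOME.

```shell
# Print the path of the file which holds the current recording session name.
lttngrc_path() {
    printf '%s/.lttngrc\n' "${LTTNG_HOME:-$HOME}"
}

# Make $1 the current recording session without editing the file by hand.
use_session() {
    lttng set-session "$1"
}
```

After use_session my-session, commands which accept an optional session name operate on my-session by default.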
Recording session modes
LTTng offers four recording session modes:
Local mode
Network streaming mode
Snapshot mode
LTTng forces every channel you create in such a recording session (see the “CHANNEL AND RING BUFFER” section below) to be snapshot-ready.
LTTng takes a snapshot of such a recording session when:
Live mode
An LTTng live reader (for example, babeltrace2(1)) can connect to the same relay daemon to receive trace data while the recording session is active.
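The four modes map to lttng-create(1) options. The sketch below assumes the --output, --set-url, --snapshot, and --live options as lttng-create(1) is understood to document them. The function name create_session and its mode keywords are hypothetical.

```shell
# $1: mode (local, network, snapshot, or live)
# $2: recording session name
# $3: output directory (local) or relay daemon URL (network)
create_session() {
    case "$1" in
        local)    lttng create "$2" --output="$3" ;;
        network)  lttng create "$2" --set-url="$3" ;;
        snapshot) lttng create "$2" --snapshot ;;
        live)     lttng create "$2" --live ;;
        *)        echo "unknown mode: $1" >&2; return 1 ;;
    esac
}
```

For example, create_session snapshot my-session creates a snapshot mode recording session.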
Recording session rotation
A recording session rotation is the action of archiving the current trace chunk of the recording session to the file system.
Once LTTng archives a trace chunk, it does NOT manage it anymore: you can read it, modify it, move it, or remove it.
An archived trace chunk is a collection of metadata and data stream files which form a self-contained LTTng trace. See the “Trace chunk archive naming” section below to learn how LTTng names a trace chunk archive directory.
The current trace chunk of a given recording session includes:
Trace chunk archive naming
A trace chunk archive is a subdirectory of the archives subdirectory within the output directory of a recording session (see the --output option of the lttng-create(1) command and of lttng-relayd(8)).
A trace chunk archive contains, through tracing domain and possibly UID/PID subdirectories, metadata and data stream files.
A trace chunk archive is, at the same time:
In other words, an LTTng trace reader can read either the whole recording session output directory (all the trace chunk archives) or a single trace chunk archive.
When LTTng performs a recording session rotation, it names the resulting trace chunk archive as such, relative to the output directory of the recording session:
archives/BEGIN-END-ID
BEGIN
Example: 20171119T152407-0500
END
Example: 20180118T152407+0930
ID
Trace chunk archive name example:
archives/20171119T152407-0500-20171119T153422-0500-3
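To make the archives/BEGIN-END-ID pattern concrete, here is a minimal sketch which assembles such a name from its three parts. The function name chunk_archive_name is hypothetical, and LTTng builds these names itself, so the helper is only useful to predict or parse them.

```shell
# $1: BEGIN timestamp, $2: END timestamp, $3: ID
chunk_archive_name() {
    printf 'archives/%s-%s-%s\n' "$1" "$2" "$3"
}
```

For example, chunk_archive_name 20171119T152407-0500 20171119T153407-0500 3 prints archives/20171119T152407-0500-20171119T153407-0500-3. On systems where date supports %z, date +%Y%m%dT%H%M%S%z produces a timestamp of the same shape.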
TRACING DOMAIN
A tracing domain identifies a type of LTTng tracer.
A tracing domain has its own properties and features.
There are currently five available tracing domains:
Tracing domain | “Event rule matches” trigger condition option | Option for other CLI commands |
Linux kernel | --type option starts with kernel: | --kernel |
User space | --type option starts with user: | --userspace |
java.util.logging (JUL) | --type option starts with jul: | --jul |
Apache log4j | --type option starts with log4j: | --log4j |
Python | --type option starts with python: | --python |
You must specify a tracing domain to target a type of LTTng tracer when using some lttng(1) commands to avoid ambiguity. For example, because the Linux kernel and user space tracing domains support named tracepoints as instrumentation points (see the “INSTRUMENTATION POINT, EVENT RULE, AND EVENT” section above), you need to specify a tracing domain when you create an event rule because both tracing domains could have tracepoints sharing the same name.
You can create channels (see the “CHANNEL AND RING BUFFER” section below) in the Linux kernel and user space tracing domains. The other tracing domains have a single, default channel.
CHANNEL AND RING BUFFER
A channel is an object which is responsible for a set of ring buffers.
Each ring buffer is divided into multiple sub-buffers. When a recording event rule (see the “RECORDING EVENT RULE AND EVENT RECORD” section below) matches an event, LTTng can record it to one or more sub-buffers of one or more channels.
When you create a channel with the lttng-enable-channel(1) command, you set its final attributes, that is:
See the “Buffering scheme” section below.
See the “Event record loss mode” section below.
See the “Sub-buffer size and count” section below.
See the “Maximum trace file size and count” section below.
See the “Timers” section below.
See the --output option of the lttng-enable-channel(1) command.
See the --blocking-timeout option of the lttng-enable-channel(1) command.
Note that the lttng-enable-event(1) command can automatically create a default channel with sane defaults when no channel exists for the provided tracing domain.
A channel is always associated to a tracing domain (see the “TRACING DOMAIN” section below). The java.util.logging (JUL), log4j, and Python tracing domains each have a default channel which you can’t configure.
A channel owns recording event rules.
List the channels of a given recording session with the lttng-list(1) and lttng-status(1) commands.
Disable an enabled channel with the lttng-disable-channel(1) command.
Buffering scheme
A channel has at least one ring buffer per CPU. LTTng always records an event to the ring buffer dedicated to the CPU which emits it.
The buffering scheme of a user space channel determines what has its own set of per-CPU ring buffers:
Per-user buffering (--buffers-uid option of the lttng-enable-channel(1) command)
If your Unix user is root
Otherwise
Per-process buffering (--buffers-pid option of the lttng-enable-channel(1) command)
If your Unix user is root
Otherwise
The per-process buffering scheme tends to consume more memory than the per-user option because systems generally have more instrumented processes than Unix users running instrumented processes. However, the per-process buffering scheme ensures that one process having a high event throughput won’t fill all the shared sub-buffers of the same Unix user, only its own.
The buffering scheme of a Linux kernel channel is always to allocate a single set of ring buffers for the whole system. This scheme is similar to the per-user option, but with a single, global user “running” the kernel.
Event record loss mode
When LTTng emits an event, LTTng can record it to a specific, available sub-buffer within the ring buffers of specific channels. When there’s no space left in a sub-buffer, the tracer marks it as consumable and another, available sub-buffer starts receiving the following event records. An LTTng consumer daemon eventually consumes the marked sub-buffer, which returns to the available state.
In an ideal world, sub-buffers are consumed faster than they are filled. In the real world, however, all sub-buffers can be full at some point, leaving no space to record the following events.
By default, LTTng-modules and LTTng-UST are non-blocking tracers: when there’s no available sub-buffer to record an event, it’s acceptable to lose event records when the alternative would be to cause substantial delays in the execution of the instrumented application. LTTng prioritizes performance over integrity; it aims at perturbing the instrumented application as little as possible in order to make the detection of subtle race conditions and rare interrupt cascades possible.
Since LTTng 2.10, the LTTng user space tracer, LTTng-UST, supports a blocking mode. See the --blocking-timeout option of the lttng-enable-channel(1) command to learn how to use the blocking mode.
When it comes to losing event records because there’s no available sub-buffer, or because the blocking timeout of the channel is reached, the event record loss mode of the channel determines what to do. The available event record loss modes are:
Discard mode
This is the only available mode when you specify a blocking timeout.
With this mode, LTTng increments a count of lost event records when an event record is lost and saves this count to the trace. A trace reader can use the saved discarded event record count of the trace to decide whether or not to perform some analysis even if trace data is known to be missing.
Overwrite mode
This mode is sometimes called flight recorder mode because it’s similar to a flight recorder <https://en.wikipedia.org/wiki/Flight_recorder>: always keep a fixed amount of the latest data. It’s also similar to the roll mode of an oscilloscope.
Since LTTng 2.8, with this mode, LTTng writes to a given sub-buffer its sequence number within its data stream. With a local, network streaming, or live recording session (see the “Recording session modes” section above), a trace reader can use such sequence numbers to report lost packets. A trace reader can use the saved discarded sub-buffer (packet) count of the trace to decide whether or not to perform some analysis even if trace data is known to be missing.
With this mode, LTTng doesn’t write to the trace the exact number of lost event records in the lost sub-buffers.
Which mode you should choose depends on your context: do you want to prioritize the newest event records in the ring buffer (overwrite mode) or the oldest ones (discard mode)?
Beware that, in overwrite mode, the tracer abandons a whole sub-buffer as soon as there’s no space left for a new event record, whereas in discard mode, the tracer only discards the event record that doesn’t fit.
Set the event record loss mode of a channel with the --discard and --overwrite options of the lttng-enable-channel(1) command.
There are a few ways to decrease your probability of losing event records. The “Sub-buffer size and count” section below shows how to fine-tune the sub-buffer size and count of a channel to virtually stop losing event records, though at the cost of greater memory usage.
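The two loss modes can be sketched as follows, using only the --discard, --overwrite, and --blocking-timeout options named in this section. Both function names are hypothetical. The blocking timeout value is assumed to be a period in microseconds, as lttng-enable-channel(1) is understood to describe it.

```shell
# Flight recorder style Linux kernel channel: keep the newest event records.
# $1: channel name
overwrite_channel() {
    lttng enable-channel --kernel --overwrite "$1"
}

# User space channel which keeps the oldest event records and lets the
# tracer wait for a free sub-buffer before discarding.
# $1: channel name, $2: blocking timeout (assumed to be in microseconds)
blocking_discard_channel() {
    lttng enable-channel --userspace --discard --blocking-timeout="$2" "$1"
}
```

For example, overwrite_channel my-channel creates the channel in overwrite mode in the current recording session.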
Sub-buffer size and count
A channel has one or more ring buffers for each CPU of the target system.
See the “Buffering scheme” section above to learn how many ring buffers of a given channel are dedicated to each CPU depending on its buffering scheme.
Set the size of each sub-buffer the ring buffers of a channel contain with the --subbuf-size option of the lttng-enable-channel(1) command.
Set the number of sub-buffers each ring buffer of a channel contains with the --num-subbuf option of the lttng-enable-channel(1) command.
Note that LTTng switching the current sub-buffer of a ring buffer (marking a full one as consumable and switching to an available one for LTTng to record the next events) introduces noticeable CPU overhead. Knowing this, the following list presents a few practical situations along with how to configure the sub-buffer size and count for them:
High event throughput
Having larger sub-buffers also ensures a lower sub-buffer switching frequency (see the “Timers” section below).
The sub-buffer count is only meaningful if you create the channel in overwrite mode (see the “Event record loss mode” section above): in this case, if LTTng overwrites a sub-buffer, then the other sub-buffers are left unaltered.
Low event throughput
Because LTTng emits events less frequently, the sub-buffer switching frequency should remain low and therefore the overhead of the tracer shouldn’t be a problem.
Low memory system
Even if the system is limited in memory, you want to keep the sub-buffers as large as possible to avoid a high sub-buffer switching frequency.
Note that LTTng uses CTF <https://diamon.org/ctf/> as its trace format, which means event record data is very compact. For example, the average LTTng kernel event record weighs about 32 bytes. Therefore, a sub-buffer size of 1 MiB is considered large.
The previous scenarios highlight the major trade-off between a few large sub-buffers and more, smaller sub-buffers: sub-buffer switching frequency vs. how many event records are lost in overwrite mode. Assuming a constant event throughput and using the overwrite mode, the two following configurations have the same ring buffer total size:
Two sub-buffers of 4 MiB each
Eight sub-buffers of 1 MiB each
In discard mode, the sub-buffer count parameter is pointless: use two sub-buffers and set their size according to your requirements.
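The size comparison above is a simple product: the total size of one ring buffer is its sub-buffer count times its sub-buffer size. The sketch below checks that arithmetic and shows the matching --num-subbuf and --subbuf-size options. Both function names are hypothetical, and the M (MiB) suffix is assumed from lttng-enable-channel(1).

```shell
# Total size of one ring buffer, in MiB.
# $1: sub-buffer count, $2: sub-buffer size in MiB
ring_buffer_mib() {
    echo $(( $1 * $2 ))
}

# Create a user space channel with the given sub-buffer layout.
# $1: channel name, $2: sub-buffer count, $3: sub-buffer size in MiB
sized_channel() {
    lttng enable-channel --userspace --num-subbuf="$2" --subbuf-size="${3}M" "$1"
}
```

Both ring_buffer_mib 2 4 and ring_buffer_mib 8 1 print 8. The memory a channel really uses is this value multiplied by the number of ring buffers it owns, which depends on the CPU count and the buffering scheme.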
Maximum trace file size and count
By default, trace files can grow as large as needed.
Set the maximum size of each trace file that LTTng writes of a given channel with the --tracefile-size option of the lttng-enable-channel(1) command.
When the size of a trace file reaches the fixed maximum size of the channel, LTTng creates another file to contain the next event records. LTTng appends a file count to each trace file name in this case.
If you set the trace file size attribute when you create a channel, the maximum number of trace files that LTTng creates is unlimited by default. To limit them, use the --tracefile-count option of lttng-enable-channel(1). When the number of trace files reaches the fixed maximum count of the channel, LTTng overwrites the oldest trace file. This mechanism is called trace file rotation.
Important
Even if you don’t limit the trace file count, always assume that LTTng manages all the trace files of the recording session.
In other words, there’s no safe way to know if LTTng still holds a given trace file open with the trace file rotation feature.
The only way to obtain an unmanaged, self-contained LTTng trace before you destroy the recording session is with the recording session rotation feature (see the “Recording session rotation” section above), which is available since LTTng 2.11.
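Trace file rotation can be sketched with the two options named above. The helper names are hypothetical. The disk bound is an approximation which assumes that each data stream keeps at most the configured number of files of the configured size, and it ignores metadata.

```shell
# Approximate upper bound, in bytes, of the trace files of one data stream.
# $1: maximum trace file size in bytes, $2: maximum trace file count
max_stream_bytes() {
    echo $(( $1 * $2 ))
}

# Create a Linux kernel channel which rotates its trace files.
# $1: channel name, $2: maximum file size in bytes, $3: maximum file count
rotating_file_channel() {
    lttng enable-channel --kernel --tracefile-size="$2" --tracefile-count="$3" "$1"
}
```

For example, max_stream_bytes 1048576 8 prints 8388608, that is, 8 MiB per data stream.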
Timers
Each channel can have up to three optional timers:
Switch timer
A switch timer is useful to ensure that LTTng consumes and commits trace data to trace files or to a distant relay daemon (lttng-relayd(8)) periodically in case of a low event throughput.
Such a timer is also convenient when you use large sub-buffers (see the “Sub-buffer size and count” section above) to cope with a sporadic high event throughput, even if the throughput is otherwise low.
Set the period of the switch timer of a channel, or disable the timer altogether, with the --switch-timer option of the lttng-enable-channel(1) command.
Read timer
By default, the LTTng tracers use an asynchronous message mechanism to signal a full sub-buffer so that a consumer daemon can consume it.
When such messages must be avoided, for example in real-time applications, use this timer instead.
Set the period of the read timer of a channel, or disable the timer altogether, with the --read-timer option of the lttng-enable-channel(1) command.
Monitor timer
If you disable the monitor timer of a channel C:
See the “TRIGGER” section above to learn more about triggers.
Set the period of the monitor timer of a channel, or disable the timer altogether, with the --monitor-timer option of the lttng-enable-channel(1) command.
RECORDING EVENT RULE AND EVENT RECORD
A recording event rule is a specific type of event rule (see the “INSTRUMENTATION POINT, EVENT RULE, AND EVENT” section above) of which the action is to serialize and record the matched event as an event record.
Set the explicit conditions of a recording event rule when you create it with the lttng-enable-event(1) command. A recording event rule also has the following implicit conditions:
A recording event rule is enabled on creation.
A channel is enabled on creation.
See the “CHANNEL AND RING BUFFER” section above.
A recording session is inactive (stopped) on creation.
See the “RECORDING SESSION” section above.
All processes are allowed to record events on recording session creation.
Use the lttng-track(1) and lttng-untrack(1) commands to select which processes are allowed to record events based on specific process attributes.
You always attach a recording event rule to a channel, which belongs to a recording session, when you create it.
When a recording event rule ER matches an event E, LTTng attempts to serialize and record E to one of the available sub-buffers of the channel to which ER is attached.
When multiple matching recording event rules are attached to the same channel, LTTng attempts to serialize and record the matched event once. In the following example, the second recording event rule is redundant when both are enabled:
$ lttng enable-event --userspace hello:world
$ lttng enable-event --userspace hello:world --loglevel=INFO
List the recording event rules of a specific recording session and/or channel with the lttng-list(1) and lttng-status(1) commands.
Disable a recording event rule with the lttng-disable-event(1) command.
As of LTTng 2.13.11, you cannot remove a recording event rule: it exists as long as its recording session exists.
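Since a recording event rule can be disabled but not removed, a typical sequence attaches it to a channel, then disables it once it's no longer needed. The following is a minimal sketch: the function name demo_recording_rule is hypothetical, and the --channel option is assumed from lttng-enable-event(1) and lttng-disable-event(1).

```shell
# $1: channel name, $2: user space tracepoint name
demo_recording_rule() {
    lttng enable-channel --userspace "$1"
    lttng enable-event --userspace --channel="$1" "$2"   # enabled on creation
    lttng disable-event --userspace --channel="$1" "$2"  # still exists, disabled
}
```

For example, demo_recording_rule my-channel hello:world leaves a disabled recording event rule for hello:world attached to my-channel in the current recording session.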
RESOURCES
COPYRIGHT
This program is part of the LTTng-tools project.
LTTng-tools is distributed under the GNU General Public License version 2 <http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html>. See the LICENSE <https://github.com/lttng/lttng-tools/blob/master/LICENSE> file for details.
THANKS
Special thanks to Michel Dagenais and the DORSAL laboratory <http://www.dorsal.polymtl.ca/> at École Polytechnique de Montréal for the LTTng journey.
Also thanks to the Ericsson teams working on tracing which helped us greatly with detailed bug reports and unusual test cases.
SEE ALSO
14 June 2021 | LTTng 2.13.11 |