stapprobes
STAPPROBES(5) STAPPROBES(5)
NAME
stapprobes - systemtap probe points
DESCRIPTION
The following sections enumerate the variety of probe points supported
by the systemtap translator, and additional aliases defined by stan-
dard tapset scripts.
The general probe point syntax is a dotted-symbol sequence. This
allows a breakdown of the event namespace into parts, somewhat like
the Domain Name System does on the Internet. Each component identi-
fier may be parametrized by a string or number literal, with a syntax
like a function call. A component may include a "*" character, to
expand to a set of matching probe points. Probe aliases likewise
expand to other probe points. Each and every resulting probe point is
normally resolved to some low-level system instrumentation facility
(e.g., a kprobe address, marker, or a timer configuration), otherwise
the elaboration phase will fail.
However, a probe point may be followed by a "?" character, to indicate
that it is optional, and that no error should result if it fails to
resolve. Optionalness passes down through all levels of alias/wild-
card expansion. Alternately, a probe point may be followed by a "!"
character, to indicate that it is both optional and sufficient.
(Think vaguely of the Prolog cut operator.) If it does resolve, then
no further probe points in the same comma-separated list will be
resolved. Therefore, the "!" sufficiency mark only makes sense in a
list of probe point alternatives.
Additionally, a probe point may be followed by an "if (expr)"
statement, in order to enable/disable the probe point on-the-fly. With
the "if" statement, if "expr" is false when the probe point is hit,
the whole probe body, including any alias body, is skipped. The
condition is stacked up through all levels of alias/wildcard
expansion, so the final condition becomes the logical-and of the
conditions of all expanded aliases/wildcards.
These are all syntactically valid probe points:
kernel.function("foo").return
syscall(22)
user.inode("/bin/vi").statement(0x2222)
end
syscall.*
kernel.function("no_such_function") ?
module("awol").function("no_such_function") !
signal.*? if (switch)
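As a rough sketch of how the "?", "!" and "if" suffixes might be combined in one script: the function names vfs_read and sys_read and the global variables enabled and reads are illustrative assumptions, not taken from this page, and should be replaced with names that exist on the target kernel.

```systemtap
global enabled = 1   # hypothetical switch consulted by the "if" condition
global reads         # hypothetical counter

# Optional: no error results if this function cannot be resolved.
probe kernel.function("no_such_function") ?
{
  printf("unexpected hit\n")
}

# Sufficient: if the first alternative resolves, the second is not resolved.
# The second is marked optional so that it may be absent as well.
probe kernel.function("vfs_read") !, kernel.function("sys_read") ?
{
  reads++
}

# Conditional: the body is skipped whenever "enabled" is false.
probe timer.s(1) if (enabled)
{
  printf("reads so far: %d\n", reads)
}

probe timer.s(5)  { enabled = 0 }   # turn the report off after 5 seconds
probe timer.s(10) { exit() }        # end the session after 10 seconds
```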
Probes may be broadly classified into "synchronous" and "asyn-
chronous". A "synchronous" event is deemed to occur when any proces-
sor executes an instruction matched by the specification. This gives
these probes a reference point (instruction address) from which more
contextual data may be available. Other families of probe points re-
fer to "asynchronous" events such as timers/counters rolling over,
where there is no fixed reference point that is related. Each probe
point specification may match multiple locations (for example, using
wildcards or aliases), and all of them are then probed. A probe
declaration may also contain several comma-separated specifications,
all of which are probed.
BEGIN/END/ERROR
The probe points begin and end are defined by the translator to refer
to the time of session startup and shutdown. All "begin" probe han-
dlers are run, in some sequence, during the startup of the session.
All global variables will have been initialized prior to this point.
All "end" probes are run, in some sequence, during the normal shutdown
of a session, such as in the aftermath of an exit() function call, or
an interruption from the user. In the case of an error-triggered
shutdown, "end" probes are not run. There are no target variables
available in either context.
If the order of execution among "begin" or "end" probes is signifi-
cant, then an optional sequence number may be provided:
begin(N)
end(N)
The number N may be positive or negative. The probe handlers are run
in increasing order, and the order between handlers with the same se-
quence number is unspecified. When "begin" or "end" are given without
a sequence, they are effectively sequence zero.
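A minimal sketch of the ordering rule described above. The sequence numbers and messages are arbitrary choices for illustration.

```systemtap
# begin handlers run in increasing sequence order: -1, then 0, then 1.
probe begin(1)  { printf("begin(1): last of the begin handlers\n"); exit() }
probe begin(-1) { printf("begin(-1): first of the begin handlers\n") }
probe begin     { printf("begin: sequence zero, runs in between\n") }

# end handlers follow the same rule at shutdown: 0, then 10.
probe end(10)   { printf("end(10): runs after the plain end handler\n") }
probe end       { printf("end: sequence zero\n") }
```

Because handlers that share a sequence number run in an unspecified order, distinct numbers are the safer choice whenever the order matters.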
The error probe point is similar to the end probe, except that each
such probe handler runs when the session ends after errors have
occurred. In such cases, "end" probes are skipped, but each "error"
probe is still attempted. This kind of probe can be used to clean up
or emit a "final gasp". It may also be numerically parametrized to
set a sequence.
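The following sketch illustrates the difference between "end" and "error" probes. It assumes the tapset function error() is available and that the default error limit is in effect, so that a single error ends the session; the messages are made up for illustration.

```systemtap
probe begin
{
  error("simulated failure")   # assumed to put the session into an error shutdown
}

# Skipped, because the session is ending after an error.
probe end      { printf("normal shutdown\n") }

# Attempted instead of the end probe; the sequence number is optional.
probe error    { printf("final gasp: cleaning up after an error\n") }
probe error(1) { printf("final gasp: second cleanup step\n") }
```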
NEVER
The probe point never is specially defined by the translator to mean
"never". Its probe handler is never run, though its statements are
analyzed for symbol / type correctness as usual. This probe point may
be useful in conjunction with optional probes.
TIMERS
Intervals defined by the standard kernel "jiffies" timer may be used
to trigger probe handlers asynchronously. Two probe point variants
are supported by the translator:
timer.jiffies(N)
timer.jiffies(N).randomize(M)
The probe handler is run every N jiffies (a kernel-defined unit of
time, typically between 1 and 60 ms). If the "randomize" component is
given, a uniformly distributed random value in the range [-M..+M] is
added to N every time the handler is run. N is restricted to a rea-
sonable range (1 to around a million), and M is restricted to be
smaller than N. There are no target variables provided in either con-
text. It is possible for such probes to be run concurrently on a mul-
ti-processor computer.
Alternatively, intervals may be specified in units of time. There are
two probe point variants similar to the jiffies timer:
timer.ms(N)
timer.ms(N).randomize(M)
Here, N and M are specified in milliseconds, but the full options for
units are seconds (s/sec), milliseconds (ms/msec), microseconds
(us/usec), nanoseconds (ns/nsec), and hertz (hz). Randomization is
not supported for hertz timers.
The actual resolution of the timers depends on the target kernel. For
kernels prior to 2.6.17, timers are limited to jiffies resolution, so
intervals are rounded up to the nearest jiffies interval. After
2.6.17, the implementation uses hrtimers for tighter precision, though
the actual resolution will be arch-dependent. In either case, if the
"randomize" component is given, then the random value will be added to
the interval before any rounding occurs.
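For illustration, a short script that uses several of the timer variants described above. The intervals and the global variable ticks are arbitrary choices, not recommendations.

```systemtap
global ticks   # hypothetical counter, incremented by the jiffies timer

# Every 100 jiffies, plus a random offset in the range [-10..+10].
probe timer.jiffies(100).randomize(10) { ticks++ }

# Every 250 milliseconds.
probe timer.ms(250) { printf("jiffies-timer ticks so far: %d\n", ticks) }

# Twice per second; randomize() is not supported for hertz timers.
probe timer.hz(2) { printf("hz timer fired\n") }

# End the session after three seconds.
probe timer.s(3) { exit() }
```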
Profiling timers are also available to provide probes that execute on
all CPUs at the rate of the system tick. This probe takes no parame-
ters.
timer.profile
Full context information of the interrupted process is available, mak-
ing this probe suitable for a time-based sampling profiler.
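A minimal sketch of such a sampling profiler, assuming the tapset function execname() is available. The array name hits and the five-second run time are illustrative choices.

```systemtap
global hits   # samples per process name

# Runs on every CPU at each system tick; count the interrupted process.
probe timer.profile { hits[execname()]++ }

# Stop sampling after five seconds.
probe timer.s(5) { exit() }

# Report at shutdown, most frequently sampled names first.
probe end
{
  foreach (name in hits-)
    printf("%-16s %d\n", name, hits[name])
}
```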
DWARF
This family of probe points uses symbolic debugging information for
the target kernel/module/program, as may be found in unstripped exe-
cutables, or the separate debuginfo packages. They allow placement of
probes logically into the execution path of the target program, by
specifying a set of points in the source or object code. When a
matching statement executes on any processor, the probe handler is run
in that context.
Points in a kernel are identified by module, source file, line
number, function name, or some combination of these.
Here is a list of probe point families currently supported. The
.function variant places a probe near the beginning of the named
function, so that parameters are available as context variables. The
.return variant places a probe at the moment of return from the named
function, so the return value is available as the "$return" context
variable. The .inline modifier for .function filters the results to
include only instances of inlined functions. The .call modifier
selects the opposite subset. Inline functions do not have an
identifiable return point, so .return is not supported on .inline
probes. The .statement variant places a probe at the exact spot,
exposing those local variables that are visible there.
kernel.function(PATTERN)
kernel.function(PATTERN).call
kernel.function(PATTERN).return
kernel.function(PATTERN).inline
module(MPATTERN).function(PATTERN)
module(MPATTERN).function(PATTERN).call
module(MPATTERN).function(PATTERN).return
module(MPATTERN).function(PATTERN).inline
kernel.statement(PATTERN)
kernel.statement(ADDRESS).absolute
module(MPATTERN).statement(PATTERN)
In the above list, MPATTERN stands for a string literal that aims to
identify the loaded kernel module of interest. It may include "*",
"[]", and "?" wildcards. PATTERN stands for a string literal that
aims to identify a point in the program. It is made up of three
parts:
· The first part is the name of a function, as would appear in the
nm program’s output. This part may use the "*" and "?" wildcard-
ing operators to match multiple names.
· The second part is optional and begins with the "@" character. It
is followed by the path to the source file containing the func-
tion, which may include a wildcard pattern, such as mm/slab*. In
most cases, the path should be relative to the top of the linux
source directory, although an absolute path may be necessary for
some kernels. If a relative pathname doesn’t work, try absolute.
· Finally, the third part is optional if the file name part was giv-
en, and identifies the line number in the source file, preceded by
a ":".
As an alternative, PATTERN may be a numeric constant, indicating an
(module-relative or kernel-_stext-relative) address. In guru mode on-
ly, absolute kernel addresses may be specified with the ".absolute"
suffix.
Some of the source-level variables, such as function parameters,
locals, and globals visible in the compilation unit, may be visible to
probe handlers. Handlers may refer to these variables by prefixing
their names with "$" within the scripts. In addition, a special syntax
allows limited traversal of structures, pointers, and arrays.
$var refers to an in-scope variable "var". If it’s an integer-like
type, it will be cast to a 64-bit int for systemtap script use.
String-like pointers (char *) may be copied to systemtap string
values using the kernel_string or user_string functions.
$var->field
traversal to a structure’s field. The indirection operator may
be repeated to follow more levels of pointers.
$var[N]
indexes into an array. The index is given with a literal num-
ber.
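As a hedged sketch of the context-variable syntax: the function vfs_read, its parameters $file and $count, and the field f_flags are assumptions about the target kernel and its debuginfo, and they may be named differently or be unavailable on a given system.

```systemtap
# Entry probe: parameters are available as $variables.
probe kernel.function("vfs_read").call
{
  # $count is an integer-like parameter; $file->f_flags follows a pointer
  # to a structure field.
  printf("%s: vfs_read count=%d flags=0x%x\n",
         execname(), $count, $file->f_flags)
}

# Return probe: the return value is available as $return.
probe kernel.function("vfs_read").return
{
  printf("%s: vfs_read returned %d\n", execname(), $return)
}

# Keep the example short-lived.
probe timer.s(2) { exit() }
```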
USER-SPACE
Early prototype support for user-space probing is available in the
form of a non-symbolic probe point:
process(PID).statement(ADDRESS).absolute
This is analogous to kernel.statement(ADDRESS).absolute in that both use
raw (unverified) virtual addresses and provide no $variables. The
target PID parameter must identify a running process, and ADDRESS
should identify a valid instruction address. All threads of that pro-
cess will be probed.
PROCFS
These probe points allow procfs "files" in /proc/systemtap/MODNAME to
be created, read and written (MODNAME is the name of the systemtap
module). The proc filesystem is a pseudo-filesystem which is used as
an interface to kernel data structures. There are four probe point
variants supported by the translator:
procfs("PATH").read
procfs("PATH").write
procfs.read
procfs.write
PATH is the file name (relative to /proc/systemtap/MODNAME) to be cre-
ated. If no PATH is specified (as in the last two variants above),
PATH defaults to "command".
When a user reads /proc/systemtap/MODNAME/PATH, the corresponding
procfs read probe is triggered. The string data to be read should be
assigned to a variable named $value, like this:
procfs("PATH").read { $value = "100\n" }
When a user writes into /proc/systemtap/MODNAME/PATH, the correspond-
ing procfs write probe is triggered. The data the user wrote is
available in the string variable named $value, like this:
procfs("PATH").write { printf("user wrote: %s", $value) }
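Putting the two variants together, a script might expose a single read/write setting as sketched below. The file name "limit" and the global variable of the same name are hypothetical.

```systemtap
global limit = "100\n"   # hypothetical setting exposed through procfs

# Reading /proc/systemtap/MODNAME/limit returns the current setting.
probe procfs("limit").read
{
  $value = limit
}

# Writing to /proc/systemtap/MODNAME/limit replaces the setting.
probe procfs("limit").write
{
  limit = $value
  printf("limit set to: %s", limit)
}
```

The session keeps running until it is interrupted by the user, so the file stays available for as long as the script is loaded.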
MARKERS
This family of probe points hooks up to static probing markers insert-
ed into the kernel or modules. These markers are special macro calls
inserted by kernel developers to make probing faster and more reliable
than with DWARF-based probes. Further, DWARF debugging information is
not required to probe markers.
Marker probe points begin with kernel. The next part names the marker
itself: mark("name"). The marker name string, which may contain the
usual wildcard characters, is matched against the names given to the
marker macros when the kernel and/or module was compiled. Optionally,
you can specify format("format"). Specifying the marker format string
allows differentiation between two markers with the same name but
different marker format strings.
The handler associated with a marker-based probe may read the optional
parameters specified at the macro call site. These are named $arg1
through $argNN, where NN is the number of parameters supplied by the
macro. Number and string parameters are passed in a type-safe manner.
The marker format string associated with a marker is available in
$format.
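A small sketch that counts marker hits by format string. It assumes the target kernel was built with static markers; the array name hits and the five-second run time are illustrative.

```systemtap
global hits   # hit count per marker format string

# Wildcard match on every kernel marker; $format identifies each one.
probe kernel.mark("*") { hits[$format]++ }

# Stop after five seconds.
probe timer.s(5) { exit() }

# Report at shutdown, busiest markers first.
probe end
{
  foreach (fmt in hits-)
    printf("%d\t%s\n", hits[fmt], fmt)
}
```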
PERFORMANCE MONITORING HARDWARE
The perfmon family of probe points is used to access the performance
monitoring hardware available in modern processors. This family of
probe points needs perfmon2 support in the kernel to access the
performance monitoring hardware.
Performance monitoring hardware probe points begin with perfmon. The
next part names the event being counted: counter("event"). The event
names are processor-implementation specific, with the exception of the
generic cycles and instructions events, which are available on all
processors. This sets up a counter on the processor to count the
number of events occurring on the processor. For more details on the
performance monitoring events available on a specific processor, use
the perfmon2 command:
pfmon -l
$counter
is a handle used in the body of the probe for operations in-
volving the counter associated with the probe.
read_counter
is a function that is passed the handle for the perfmon probe
and returns the current count for the event.
EXAMPLES
Here are some example probe points, defining the associated events.
begin, end, end
refers to the startup and normal shutdown of the session. In
this case, the handler would run once during startup and twice
during shutdown.
timer.jiffies(1000).randomize(200)
refers to a periodic interrupt, every 1000 +/- 200 jiffies.
kernel.function("*init*"), kernel.function("*exit*")
refers to all kernel functions with "init" or "exit" in the
name.
kernel.function("*@kernel/sched.c:240")
refers to any functions within the "kernel/sched.c" file that
span line 240.
kernel.mark("getuid")
refers to an STAP_MARK(getuid, ...) macro call in the kernel.
module("usb*").function("*sync*").return
refers to the moment of return from all functions with "sync"
in the name in any of the USB drivers.
kernel.statement(0xc0044852)
refers to the first byte of the statement whose compiled in-
structions include the given address in the kernel.
kernel.statement("*@kernel/sched.c:2917")
refers to the statement at line 2917 within "kernel/sched.c".
syscall.*.return
refers to the group of probe aliases with any name in the second
position.
SEE ALSO
stap(1), stapprobes.iosched(5), stapprobes.netdev(5),
stapprobes.nfs(5), stapprobes.nfsd(5), stapprobes.pagefault(5),
stapprobes.process(5), stapprobes.rpc(5), stapprobes.scsi(5),
stapprobes.signal(5), stapprobes.socket(5), stapprobes.tcp(5),
stapprobes.udp(5), proc(5)
Red Hat 2009-04-20 STAPPROBES(5)