stapprobes

STAPPROBES(5)							STAPPROBES(5)



NAME
       stapprobes - systemtap probe points



DESCRIPTION
       The following sections enumerate the variety of probe points supported
       by the systemtap translator, and additional aliases defined  by	stan-
       dard tapset scripts.

       The  general  probe  point  syntax  is a dotted-symbol sequence.	 This
       allows a breakdown of the event namespace into  parts,  somewhat	 like
       the  Domain  Name System does on the Internet.  Each component identi-
       fier may be parametrized by a string or number literal, with a  syntax
       like  a	function  call.	  A component may include a "*" character, to
       expand to a set of matching  probe  points.   Probe  aliases  likewise
       expand to other probe points.  Each and every resulting probe point is
       normally resolved to some low-level  system  instrumentation  facility
       (e.g.,  a kprobe address, marker, or a timer configuration), otherwise
       the elaboration phase will fail.

       However, a probe point may be followed by a "?" character, to indicate
       that it is optional, and that no error should result if it fails to
       resolve.  Optionality passes down through all levels of alias/wildcard
       expansion.  Alternatively, a probe point may be followed by a "!"
       character, to indicate that it is both optional and sufficient.
       (Think vaguely of the Prolog cut operator.)  If it does resolve, then
       no further probe points in the same comma-separated list will be
       resolved.  Therefore, the "!" sufficiency mark only makes sense in a
       list of probe point alternatives.
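
       As a minimal sketch of the sufficiency mark, the following probe
       tries one alternative first and falls back to a second.  The function
       names "new_name" and "old_name" are hypothetical placeholders, not
       real kernel symbols.

```systemtap
# If "new_name" resolves, "old_name" is never looked up.
# If it does not, the second (mandatory) alternative must resolve,
# otherwise elaboration fails.
probe kernel.function("new_name") !,
      kernel.function("old_name")
{
    printf("hit %s\n", probefunc())
}
```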

       Additionally, a probe point may be followed by an "if (expr)"
       statement, in order to enable or disable the probe point on the fly.
       If "expr" is false when the probe point is hit, the whole probe body,
       including the bodies of any aliases, is skipped.  The condition is
       stacked up through all levels of alias/wildcard expansion, so the
       final condition is the logical AND of the conditions of all expanded
       aliases and wildcards.
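
       A conditional probe might look like the sketch below.  The global
       "enabled" and the chosen intervals are illustrative assumptions.

```systemtap
global enabled = 0

# Hypothetical toggle: switch the conditional probe on after five seconds.
probe timer.s(5) { enabled = 1 }

# The handler body is skipped for as long as "enabled" is zero.
probe timer.ms(500) if (enabled)
{
    printf("tick\n")
}

# End the session after ten seconds.
probe timer.s(10) { exit() }
```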

       These are all syntactically valid probe points:

	      kernel.function("foo").return
	      syscall(22)
	      user.inode("/bin/vi").statement(0x2222)
	      end
	      syscall.*
	      kernel.function("no_such_function") ?
	      module("awol").function("no_such_function") !
	      signal.*? if (switch)

       Probes  may  be	broadly	 classified  into  "synchronous"  and  "asyn-
       chronous".   A "synchronous" event is deemed to occur when any proces-
       sor executes an instruction matched by the specification.  This	gives
       these  probes  a reference point (instruction address) from which more
       contextual data may be available.  Other families of probe points  re-
       fer  to	"asynchronous"	events	such as timers/counters rolling over,
       where there is no fixed reference point that is related.  Each probe
       point specification may match multiple locations (for example, using
       wildcards or aliases), and all of them are then probed.  A probe
       declaration may also contain several comma-separated specifications,
       all of which are probed.


   BEGIN/END/ERROR
       The probe points begin and end are defined by the translator to	refer
       to  the	time of session startup and shutdown.  All "begin" probe han-
       dlers are run, in some sequence, during the startup  of	the  session.
       All  global  variables will have been initialized prior to this point.
       All "end" probes are run, in some sequence, during the normal shutdown
       of a session, such as in the aftermath of an exit () function call, or
       an interruption from the user.  In  the	case  of  an  error-triggered
       shutdown,  "end"	 probes	 are  not run.	There are no target variables
       available in either context.

       If the order of execution among "begin" or "end"	 probes	 is  signifi-
       cant, then an optional sequence number may be provided:

	      begin(N)
	      end(N)

       The  number N may be positive or negative.  The probe handlers are run
       in increasing order, and the order between handlers with the same  se-
       quence number is unspecified.  When "begin" or "end" are given without
       a sequence, they are effectively sequence zero.
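
       The ordering rule can be sketched with three "begin" handlers.  Given
       the rule above, they should run in the order first, second, third,
       regardless of their order in the script.

```systemtap
probe begin(1)  { printf("third\n"); exit() }   # sequence 1 runs last
probe begin     { printf("second\n") }          # no number: sequence 0
probe begin(-1) { printf("first\n") }           # negative numbers run earlier
```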

       The error probe point is similar to the end probe, except that each
       such probe handler is run when the session ends after errors have
       occurred.  In such cases, "end" probes are skipped, but each "error"
       probe is still attempted.  This kind of probe can be used to clean up
       or emit a "final gasp".  It may also be numerically parametrized to
       set a sequence.
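
       The difference between "end" and "error" can be sketched as follows.
       This assumes the tapset error() function is used to force an
       error-triggered shutdown; the messages are illustrative.

```systemtap
probe begin
{
    # error() raises a script error, which ends the session abnormally.
    error("simulated failure")
}

probe end   { printf("normal shutdown\n") }   # skipped after an error
probe error { printf("final gasp\n") }        # still attempted after an error
```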


   NEVER
       The  probe  point never is specially defined by the translator to mean
       "never".	 Its probe handler is never run, though	 its  statements  are
       analyzed for symbol / type correctness as usual.	 This probe point may
       be useful in conjunction with optional probes.


   TIMERS
       Intervals defined by the standard kernel "jiffies" timer may  be	 used
       to  trigger  probe  handlers asynchronously.  Two probe point variants
       are supported by the translator:

	      timer.jiffies(N)
	      timer.jiffies(N).randomize(M)

       The probe handler is run every N jiffies	 (a  kernel-defined  unit  of
       time, typically between 1 and 60 ms).  If the "randomize" component is
       given, a linearly distributed random value in the  range	 [-M..+M]  is
       added  to  N every time the handler is run.  N is restricted to a rea-
       sonable range (1 to around a million),  and  M  is  restricted  to  be
       smaller than N.	There are no target variables provided in either con-
       text.  It is possible for such probes to be run concurrently on a mul-
       ti-processor computer.

       Alternatively, intervals may be specified in units of time.  There are
       two probe point variants similar to the jiffies timer:

	      timer.ms(N)
	      timer.ms(N).randomize(M)

       Here, N and M are specified in milliseconds, but the full options  for
       units   are  seconds  (s/sec),  milliseconds  (ms/msec),	 microseconds
       (us/usec), nanoseconds (ns/nsec), and hertz  (hz).   Randomization  is
       not supported for hertz timers.
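
       A short sketch combining several timer variants follows.  The
       intervals are arbitrary illustrative values.

```systemtap
# Every 1000 jiffies, plus or minus up to 200.
probe timer.jiffies(1000).randomize(200) { printf("jiffies timer\n") }

# Every 250 milliseconds.
probe timer.ms(250) { printf("millisecond timer\n") }

# Ten times per second; randomize() is not accepted here.
probe timer.hz(10) { printf("hertz timer\n") }

# Stop the session after five seconds.
probe timer.s(5) { exit() }
```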

       The actual resolution of the timers depends on the target kernel.  For
       kernels prior to 2.6.17, timers are limited to jiffies resolution,  so
       intervals  are  rounded	up  to	the  nearest jiffies interval.	After
       2.6.17, the implementation uses hrtimers for tighter precision, though
       the  actual resolution will be arch-dependent.  In either case, if the
       "randomize" component is given, then the random value will be added to
       the interval before any rounding occurs.

       Profiling  timers are also available to provide probes that execute on
       all CPUs at the rate of the system tick.	 This probe takes no  parame-
       ters.

	      timer.profile

       Full context information of the interrupted process is available, mak-
       ing this probe suitable for a time-based sampling profiler.
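
       A minimal sampling profiler along these lines might be sketched as
       below.  The "samples" array, the five-second run time, and the
       ten-entry limit are assumptions made for illustration.

```systemtap
global samples

# On each profiling tick, credit the process that was interrupted.
probe timer.profile
{
    samples[execname()]++
}

# After five seconds, print the ten most frequently sampled names.
probe timer.s(5)
{
    foreach (name in samples- limit 10)
        printf("%-16s %d\n", name, samples[name])
    exit()
}
```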


   DWARF
       This family of probe points uses symbolic  debugging  information  for
       the  target  kernel/module/program, as may be found in unstripped exe-
       cutables, or the separate debuginfo packages.  They allow placement of
       probes  logically  into	the  execution path of the target program, by
       specifying a set of points in the  source  or  object  code.   When  a
       matching statement executes on any processor, the probe handler is run
       in that context.

       Points in a kernel are identified by module, source file, line
       number, function name, or some combination of these.

       Here is a list of probe point families currently supported.  The
       .function variant places a probe near the beginning of the named
       function, so that parameters are available as context variables.  The
       .return variant places a probe at the moment of return from the named
       function, so the return value is available as the "$return" context
       variable.  The .inline modifier for .function filters the results to
       include only instances of inlined functions.  The .call modifier
       selects the opposite subset.  Inline functions do not have an
       identifiable return point, so .return is not supported on .inline
       probes.  The .statement variant places a probe at the exact spot,
       exposing those local variables that are visible there.

	      kernel.function(PATTERN)
	      kernel.function(PATTERN).call
	      kernel.function(PATTERN).return
	      kernel.function(PATTERN).inline
	      module(MPATTERN).function(PATTERN)
	      module(MPATTERN).function(PATTERN).call
	      module(MPATTERN).function(PATTERN).return
	      module(MPATTERN).function(PATTERN).inline
	      kernel.statement(PATTERN)
	      kernel.statement(ADDRESS).absolute
	      module(MPATTERN).statement(PATTERN)

       In  the	above list, MPATTERN stands for a string literal that aims to
       identify the loaded kernel module of interest.  It  may	include	 "*",
       "[]",  and  "?"	wildcards.   PATTERN stands for a string literal that
       aims to identify a point in the program.	  It  is  made	up  of	three
       parts:

       ·   The	first  part is the name of a function, as would appear in the
	   nm program’s output.	 This part may use the "*" and "?"  wildcard-
	   ing operators to match multiple names.

       ·   The second part is optional and begins with the "@" character.  It
           is followed by the path to the source file containing the
           function, which may include a wildcard pattern, such as mm/slab*.
           In most cases, the path should be relative to the top of the Linux
           source directory, although an absolute path may be necessary for
           some kernels.  If a relative pathname doesn't work, try an
           absolute one.

       ·   Finally, the third part is optional if the file name part was giv-
	   en, and identifies the line number in the source file, preceded by
	   a ":".

       As an alternative, PATTERN may be a numeric constant, indicating a
       (module-relative or kernel-_stext-relative) address.  In guru mode
       only, absolute kernel addresses may be specified with the ".absolute"
       suffix.
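
       The three-part PATTERN form can be sketched as follows.  The function
       name prefix, source paths, and line number are assumptions that
       depend on the kernel version and its debuginfo, and may not resolve
       on a given system.

```systemtap
global calls

# Function name only, with a wildcard.
probe kernel.function("sys_*").call { calls["name only"]++ }

# Any function defined in a source file matching mm/slab*.
probe kernel.function("*@mm/slab*") { calls["name@file"]++ }

# The statement at one particular source line.
probe kernel.statement("*@kernel/sched.c:2917") { calls["name@file:line"]++ }

probe timer.s(5) { exit() }

probe end
{
    foreach (kind in calls)
        printf("%s: %d hits\n", kind, calls[kind])
}
```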

       Some of the source-level variables, such as function parameters,
       locals, and globals visible in the compilation unit, may be visible to
       probe handlers.  Handlers may refer to these variables by prefixing
       their names with "$" within the scripts.  In addition, a special
       syntax allows limited traversal of structures, pointers, and arrays.

       $var   refers to an in-scope variable "var".  If it’s an	 integer-like
	      type, it will be cast to a 64-bit int for systemtap script use.
	      String-like pointers (char *) may be copied to systemtap string
	      values using the kernel_string or user_string functions.

       $var->field
	      traversal to a structure’s field.	 The indirection operator may
	      be repeated to follow more levels of pointers.

       $var[N]
	      indexes into an array.  The index is given with a literal	 num-
	      ber.
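
       The sketch below shows context variables in use.  It assumes a kernel
       in which sys_open has a "filename" parameter and vfs_read has "file"
       and "count" parameters; these names vary between kernel versions.

```systemtap
# $filename is a user-space char *, so copy it with user_string().
probe kernel.function("sys_open")
{
    printf("%s opens %s\n", execname(), user_string($filename))
}

# $count is integer-like; $file->f_flags follows a pointer to a field.
probe kernel.function("vfs_read")
{
    printf("read of %d bytes, flags 0x%x\n", $count, $file->f_flags)
}

# The return value is available only in the .return variant.
probe kernel.function("vfs_read").return
{
    printf("vfs_read returned %d\n", $return)
}
```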


   USER-SPACE
       Early prototype support for user-space probing is available in the
       form of a non-symbolic probe point:

              process(PID).statement(ADDRESS).absolute

       This is analogous to kernel.statement(ADDRESS).absolute in that both
       use raw (unverified) virtual addresses and provide no $variables.  The
       target PID parameter must identify a running process, and ADDRESS
       should identify a valid instruction address.  All threads of that
       process will be probed.
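
       A sketch of such a probe follows.  The PID 1234 and the address
       0x08048f20 are placeholders only; real values must come from a
       running process and its disassembly.

```systemtap
# Placeholder PID and address: replace both before use.
probe process(1234).statement(0x08048f20).absolute
{
    printf("pid %d tid %d reached the probed address\n", pid(), tid())
}
```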


   PROCFS
       These probe points allow procfs "files" in /proc/systemtap/MODNAME to
       be created, read and written (MODNAME is the name of the systemtap
       module).  The proc filesystem is a pseudo-filesystem which is used as
       an interface to kernel data structures.  There are four probe point
       variants supported by the translator:

	      procfs("PATH").read
	      procfs("PATH").write
	      procfs.read
	      procfs.write

       PATH is the file name (relative to /proc/systemtap/MODNAME) to be cre-
       ated.   If  no  PATH is specified (as in the last two variants above),
       PATH defaults to "command".

       When a  user  reads  /proc/systemtap/MODNAME/PATH,  the	corresponding
       procfs  read probe is triggered.	 The string data to be read should be
       assigned to a variable named $value, like this:

	      procfs("PATH").read { $value = "100\n" }

       When a user writes into /proc/systemtap/MODNAME/PATH, the  correspond-
       ing  procfs  write  probe  is  triggered.   The data the user wrote is
       available in the string variable named $value, like this:

	      procfs("PATH").write { printf("user wrote: %s", $value) }
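
       Read and write probes on the same file can be combined to expose a
       script variable, as in this sketch.  The file name "threshold" and
       the global of the same name are illustrative assumptions.

```systemtap
global threshold = "100\n"

# Reading /proc/systemtap/MODNAME/threshold returns the stored string.
probe procfs("threshold").read
{
    $value = threshold
}

# Writing to the same file replaces the stored string.
probe procfs("threshold").write
{
    threshold = $value
}
```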


   MARKERS
       This family of probe points hooks up to static probing markers insert-
       ed  into the kernel or modules.	These markers are special macro calls
       inserted by kernel developers to make probing faster and more reliable
       than with DWARF-based probes.  Further, DWARF debugging information is
       not required to probe markers.

       Marker probe points begin with kernel.  The next part names the marker
       itself: mark("name").  The marker name string, which may contain the
       usual wildcard characters, is matched against the names given to the
       marker macros when the kernel and/or module was compiled.  Optionally,
       you can specify format("format").  Specifying the marker format
       string allows differentiation between two markers with the same name
       but different marker format strings.

       The handler associated with a marker-based probe may read the optional
       parameters  specified  at  the macro call site.	These are named $arg1
       through $argNN, where NN is the number of parameters supplied  by  the
       macro.  Number and string parameters are passed in a type-safe manner.

       The marker format string associated with	 a  marker  is	available  in
       $format.
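
       A marker probe might be sketched as below.  The marker name "getuid"
       and the format string "%d" are assumptions; the markers actually
       available depend on how the kernel was built.

```systemtap
# Report every kernel marker that fires, with its format string.
probe kernel.mark("*")
{
    printf("%s: %s\n", pp(), $format)
}

# Hypothetical marker with one numeric argument; "?" makes it optional
# so that a kernel without this marker does not cause an error.
probe kernel.mark("getuid").format("%d") ?
{
    printf("getuid marker, arg1=%d\n", $arg1)
}
```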


   PERFORMANCE MONITORING HARDWARE
       The perfmon family of probe points is used to access the performance
       monitoring hardware available in modern processors.  This family of
       probe points needs perfmon2 support in the kernel to access the
       performance monitoring hardware.

       Performance monitor hardware points begin with perfmon.  The next
       part names the event being counted: counter("event").  The event
       names are processor implementation specific, with the exception of the
       generic cycles and instructions events, which are available on all
       processors.  This sets up a counter on the processor to count the
       number of events occurring on the processor.  For more details on the
       performance monitoring events available on a specific processor, use
       the perfmon2 command:

	      pfmon -l

       $counter
	      is  a  handle  used in the body of the probe for operations in-
	      volving the counter associated with the probe.

       read_counter
	      is a function that is passed the handle for the  perfmon	probe
	      and returns the current count for the event.


EXAMPLES
       Here are some example probe points, defining the associated events.

       begin, end, end
	      refers  to  the startup and normal shutdown of the session.  In
	      this case, the handler would run once during startup and	twice
	      during shutdown.

       timer.jiffies(1000).randomize(200)
	      refers to a periodic interrupt, every 1000 +/- 200 jiffies.

       kernel.function("*init*"), kernel.function("*exit*")
	      refers  to  all  kernel  functions with "init" or "exit" in the
	      name.

       kernel.function("*@kernel/sched.c:240")
	      refers to any functions within the "kernel/sched.c"  file	 that
	      span line 240.

       kernel.mark("getuid")
	      refers to an STAP_MARK(getuid, ...) macro call in the kernel.

       module("usb*").function("*sync*").return
	      refers  to  the moment of return from all functions with "sync"
	      in the name in any of the USB drivers.

       kernel.statement(0xc0044852)
	      refers to the first byte of the statement	 whose	compiled  in-
	      structions include the given address in the kernel.

       kernel.statement("*@kernel/sched.c:2917")
              refers to the statement at line 2917 within the
              "kernel/sched.c" file.

       syscall.*.return
              refers to the group of probe aliases with any name in the
              second position.


SEE ALSO
       stap(1), stapprobes.iosched(5), stapprobes.netdev(5),
       stapprobes.nfs(5), stapprobes.nfsd(5), stapprobes.pagefault(5),
       stapprobes.process(5), stapprobes.rpc(5), stapprobes.scsi(5),
       stapprobes.signal(5), stapprobes.socket(5), stapprobes.tcp(5),
       stapprobes.udp(5), proc(5)



Red Hat				  2009-04-20			STAPPROBES(5)