PCRECALLOUT(3)						       PCRECALLOUT(3)



NAME
       PCRE - Perl-compatible regular expressions

PCRE CALLOUTS

       int (*pcre_callout)(pcre_callout_block *);

       PCRE  provides  a  feature  called "callout", which is a means of tem-
       porarily passing control to the caller of PCRE in the middle  of	 pat-
       tern  matching.	The  caller  of PCRE provides an external function by
       putting its entry  point	 in  the  global  variable  pcre_callout.  By
       default,	 this variable contains NULL, which disables all calling out.

       Within a regular expression, (?C) indicates the points  at  which  the
       external	 function  is  to  be called. Different callout points can be
       identified by putting a number less than 256 after the letter  C.  The
       default	value  is  zero.   For	example, this pattern has two callout
       points:

	 (?C1)abc(?C2)def

       If the PCRE_AUTO_CALLOUT option bit  is	set  when  pcre_compile()  is
       called,	PCRE  automatically  inserts  callouts,	 all with number 255,
       before each item in the pattern. For example, if PCRE_AUTO_CALLOUT  is
       used with the pattern

	 A(\d{2}|--)

       it is processed as if it were

	 (?C255)A(?C255)((?C255)\d{2}(?C255)|(?C255)-(?C255)-(?C255))(?C255)

       Notice  that  there is a callout before and after each parenthesis and
       alternation bar. Automatic callouts  can	 be  used  for	tracking  the
       progress	 of pattern matching. The pcretest command has an option that
       sets automatic callouts; when it is used, the output indicates how the
       pattern	is matched. This is useful information when you are trying to
       optimize the performance of a particular pattern.

MISSING CALLOUTS

       You should be aware that, because of optimizations  in  the  way	 PCRE
       matches	patterns,  callouts  sometimes do not happen. For example, if
       the pattern is

	 ab(?C4)cd

       PCRE knows that any matching string must contain the  letter  "d".  If
       the  subject  string  is	 "abyz",  the lack of "d" means that matching
       doesn’t ever start, and the callout is never  reached.  However,	 with
       "abyd", though the result is still no match, the callout is obeyed.

THE CALLOUT INTERFACE

       During matching, when PCRE reaches a callout point, the external
       function defined by pcre_callout is called (if it is set). This
       applies to both the pcre_exec() and the pcre_dfa_exec() matching
       functions. The only argument to the callout function is a pointer to
       a pcre_callout_block structure, which contains the following fields:

	 int	      version;
	 int	      callout_number;
	 int	     *offset_vector;
	 const char  *subject;
	 int	      subject_length;
	 int	      start_match;
	 int	      current_position;
	 int	      capture_top;
	 int	      capture_last;
	 void	     *callout_data;
	 int	      pattern_position;
	 int	      next_item_length;

       The  version  field is an integer containing the version number of the
       block format. The initial version was 0; the current version is 1. The
       version	number	will  change again in future if additional fields are
       added, but the intention is  never  to  remove  any  of	the  existing
       fields.

       The  callout_number  field contains the number of the callout, as com-
       piled into the pattern (that is, the number after ?C for manual	call-
       outs, and 255 for automatically generated callouts).

       The offset_vector field is a pointer to the vector of offsets that was
       passed  by  the	caller	to  pcre_exec()	 or   pcre_dfa_exec().	 When
       pcre_exec() is used, the contents can be inspected in order to extract
       substrings that have been matched so far,  in  the  same	 way  as  for
       extracting substrings after a match has completed. For pcre_dfa_exec()
       this field is not useful.

       The subject and subject_length fields contain copies of the values
       that were passed to pcre_exec() or pcre_dfa_exec().

       The  start_match field normally contains the offset within the subject
       at which the current match attempt started.  However,  if  the  escape
       sequence \K has been encountered, this value is changed to reflect the
       modified starting point. If the pattern is not anchored,	 the  callout
       function	 may  be called several times from the same point in the pat-
       tern for different starting points in the subject.

       The current_position field contains the offset within the  subject  of
       the current match pointer.

       When  the pcre_exec() function is used, the capture_top field contains
       one more than the number of the highest numbered captured substring so
       far.  If no substrings have been captured, the value of capture_top is
       one. This is always the case when pcre_dfa_exec() is used, because  it
       does not support captured substrings.

       The  capture_last  field contains the number of the most recently cap-
       tured substring. If no substrings have been captured, its value is -1.
       This is always the case when pcre_dfa_exec() is used.
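
       As an illustration of how offset_vector and capture_last fit
       together under pcre_exec(), the sketch below locates the most
       recently captured substring. The helper name last_capture and its
       plain-argument signature are assumptions made for this example; a
       real callout function would read the same values from the fields of
       the block it is given.

```c
#include <stddef.h>

/* Hypothetical helper (not part of PCRE): locate the most recently
   captured substring from values that a callout receives. Returns the
   substring length and stores its start in *start, or returns -1 if
   no usable capture exists. */
int last_capture(const char *subject, const int *offset_vector,
                 int capture_last, const char **start)
{
    int begin, end;

    if (capture_last < 0)            /* -1 means nothing captured so far */
        return -1;

    /* Group n occupies slots 2n and 2n+1, as after a completed match. */
    begin = offset_vector[2 * capture_last];
    end   = offset_vector[2 * capture_last + 1];
    if (begin < 0 || end < begin)    /* group is currently unset */
        return -1;

    *start = subject + begin;
    return end - begin;
}
```

       The result is a pointer and a length, not a terminated string,
       because it refers directly into the subject. This helper is of no
       use with pcre_dfa_exec(), where capture_last is always -1.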

       The callout_data field contains a value that is passed to pcre_exec()
       or pcre_dfa_exec() specifically so that it can be passed back in
       callouts. It is passed in the callout_data field of the pcre_extra
       data structure. If no such data was passed, the value of callout_data
       in a pcre_callout_block is NULL. There is a description of the
       pcre_extra structure in the pcreapi documentation.
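
       One possible use of callout_data is to carry per-match state, so
       that the callout function needs no global variables of its own. The
       sketch below counts how often each callout number is reached. To
       keep it compilable without the PCRE headers, it declares a stand-in
       type, demo_callout_block, that mirrors the field list above; that
       type, demo_counters, and counting_callout are all hypothetical
       names. Real code would include pcre.h and use pcre_callout_block.

```c
#include <stddef.h>

/* Stand-in for pcre_callout_block (an assumption for this sketch);
   the fields follow the list given in this document. */
typedef struct {
    int          version;
    int          callout_number;
    int         *offset_vector;
    const char  *subject;
    int          subject_length;
    int          start_match;
    int          current_position;
    int          capture_top;
    int          capture_last;
    void        *callout_data;
    int          pattern_position;
    int          next_item_length;
} demo_callout_block;

/* Hypothetical per-match state, reached through callout_data. */
typedef struct {
    unsigned long hits[256];     /* one counter per callout number */
} demo_counters;

/* Count the callout and let matching proceed as normal. */
int counting_callout(demo_callout_block *cb)
{
    demo_counters *c = (demo_counters *)cb->callout_data;

    if (c != NULL && cb->callout_number >= 0 && cb->callout_number < 256)
        c->hits[cb->callout_number]++;
    return 0;                    /* zero: matching continues */
}
```

       The NULL check matters because callout_data is NULL whenever the
       caller passed no data. With the real library, the pointer to the
       counters would be supplied through the pcre_extra structure as
       described in the pcreapi documentation.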

       The pattern_position field is present from version 1 of the
       pcre_callout_block structure. It contains the offset to the next item
       to be matched in the pattern string.

       The next_item_length field is present from version 1 of the
       pcre_callout_block structure. It contains the length of the next item
       to be matched in the pattern string. When the callout immediately
       precedes an alternation bar, a closing parenthesis, or the end of the
       pattern, the length is zero. When the callout precedes an opening
       parenthesis, the length is that of the entire subpattern.

       The  pattern_position and next_item_length fields are intended to help
       in distinguishing between different automatic callouts, which all have
       the same callout number. However, they are set for all callouts.
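
       To show how these two fields might be used to label automatic
       callouts, the following sketch copies the text of the next pattern
       item into a buffer. The helper name copy_next_item is an assumption
       for this example. Note that the callout block does not carry the
       pattern string itself, so the caller has to keep it available, for
       instance through callout_data.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of PCRE): copy the pattern item that
   follows a callout into buf, truncating if necessary. Returns the
   number of characters copied; buf is always terminated. */
size_t copy_next_item(const char *pattern, int pattern_position,
                      int next_item_length, char *buf, size_t bufsize)
{
    size_t patlen = strlen(pattern);
    size_t pos, len;

    if (bufsize == 0)
        return 0;
    buf[0] = '\0';

    /* A zero length means the callout precedes "|", ")" or the end. */
    if (pattern_position < 0 || next_item_length <= 0)
        return 0;

    pos = (size_t)pattern_position;
    if (pos >= patlen)
        return 0;

    len = (size_t)next_item_length;
    if (len > patlen - pos)          /* stay inside the pattern */
        len = patlen - pos;
    if (len > bufsize - 1)           /* stay inside the buffer */
        len = bufsize - 1;

    memcpy(buf, pattern + pos, len);
    buf[len] = '\0';
    return len;
}
```

       For the pattern A(\d{2}|--), a position of 1 with a length of 10
       selects the whole parenthesized subpattern, which is what the
       description above leads one to expect for a callout that precedes
       the opening parenthesis. A real callout function should check that
       the version field is at least 1 before reading either field.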

RETURN VALUES

       The external callout function returns an integer to PCRE. If the value
       is zero, matching proceeds as normal. If the  value  is	greater	 than
       zero,  matching	fails  at the current point, but the testing of other
       matching possibilities goes ahead, just as if  a	 lookahead  assertion
       had  failed.  If	 the value is less than zero, the match is abandoned,
       and pcre_exec() (or pcre_dfa_exec()) returns the negative value.

       Negative	 values	 should	 normally  be  chosen	from   the   set   of
       PCRE_ERROR_xxx  values.	In  particular,	 PCRE_ERROR_NOMATCH  forces a
       standard "no match" failure.  The error number  PCRE_ERROR_CALLOUT  is
       reserved	 for  use by callout functions; it will never be used by PCRE
       itself.
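
       The three kinds of return value can be seen together in the sketch
       below, which rejects over-long partial matches and abandons the
       match once a callout budget is used up. As a way of compiling
       without the PCRE headers, it uses a stand-in block type and a
       stand-in error constant; demo_callout_block, demo_limits,
       DEMO_ERROR_CALLOUT, and limiting_callout are all hypothetical names.
       Real code would include pcre.h and use pcre_callout_block and
       PCRE_ERROR_CALLOUT.

```c
#include <stddef.h>

/* Stand-in for PCRE_ERROR_CALLOUT; -9 is the value in pcre.h, but
   real code should use the macro itself. */
#define DEMO_ERROR_CALLOUT (-9)

/* Stand-in for pcre_callout_block (an assumption for this sketch);
   the fields follow the list given in this document. */
typedef struct {
    int          version;
    int          callout_number;
    int         *offset_vector;
    const char  *subject;
    int          subject_length;
    int          start_match;
    int          current_position;
    int          capture_top;
    int          capture_last;
    void        *callout_data;
    int          pattern_position;
    int          next_item_length;
} demo_callout_block;

/* Hypothetical limits, reached through callout_data. */
typedef struct {
    int max_length;   /* longest partial match to accept */
    int budget;       /* callouts left before giving up */
} demo_limits;

int limiting_callout(demo_callout_block *cb)
{
    demo_limits *lim = (demo_limits *)cb->callout_data;

    if (lim == NULL)
        return 0;                    /* zero: proceed as normal */
    if (lim->budget <= 0)
        return DEMO_ERROR_CALLOUT;   /* negative: abandon the match */
    lim->budget--;
    if (cb->current_position - cb->start_match > lim->max_length)
        return 1;                    /* positive: fail here, try other paths */
    return 0;
}
```

       Returning a positive value only rules out the current path, so PCRE
       may still find a match by backtracking. The negative value is what
       the caller of pcre_exec() or pcre_dfa_exec() sees as the result.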

AUTHOR

       Philip Hazel
       University Computing Service
       Cambridge CB2 3QH, England.

REVISION

       Last updated: 29 May 2007
       Copyright (c) 1997-2007 University of Cambridge.


