PCRECOMPAT(3)							PCRECOMPAT(3)



NAME
       PCRE - Perl-compatible regular expressions

DIFFERENCES BETWEEN PCRE AND PERL

       This document describes the differences in the ways that PCRE and Perl
       handle regular expressions. The differences described here are  mainly
       with  respect  to Perl 5.8, though PCRE versions 7.0 and later contain
       some features that are expected to be in the forthcoming Perl 5.10.

       1. PCRE has only a subset of Perl’s UTF-8 and Unicode support. Details
       of  what it does have are given in the section on UTF-8 support in the
       main pcre page.

       2. PCRE does not allow repeat  quantifiers  on  lookahead  assertions.
       Perl  permits  them,  but  they	do not mean what you might think. For
       example, (?!a){3} does not assert that the next three  characters  are
       not  "a".  It  just  asserts  that the next character is not "a" three
       times.
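
       The point of item 2 can be modelled in plain C without any regular
       expression engine. The sketch below is only an illustration: the
       function names are hypothetical, nothing here is PCRE or Perl code,
       and pos is assumed to lie within the zero-terminated subject. The
       first function models what Perl does with (?!a){3}. The second
       models what a reader might expect, which could instead be written
       as, for example, (?=[^a]{3}).

```c
#include <assert.h>
#include <stddef.h>

/* Zero-width test modelling (?!a): true if the character at s[pos] is
 * not 'a'. It inspects the subject but does not advance the position. */
static int not_a_here(const char *s, size_t pos)
{
    return s[pos] != 'a';
}

/* Model of Perl's (?!a){3}: the same test, three times, at one position. */
int lookahead_repeated(const char *s, size_t pos)
{
    int i;

    assert(s != NULL);
    for (i = 0; i < 3; i++) {
        if (!not_a_here(s, pos))
            return 0;
    }
    return 1;
}

/* Model of the expected meaning: three characters follow pos and none
 * of them is 'a'. The loop stops at the terminating zero. */
int next_three_not_a(const char *s, size_t pos)
{
    size_t i;

    assert(s != NULL);
    for (i = 0; i < 3; i++) {
        if (s[pos + i] == '\0' || s[pos + i] == 'a')
            return 0;
    }
    return 1;
}
```

       For the subject "xax" at position 0, lookahead_repeated() succeeds
       because only the first character is ever examined, whereas
       next_three_not_a() fails on the second character.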

       3. Capturing subpatterns that occur inside negative lookahead
       assertions are counted, but their entries in the offsets vector are
       never set. Perl sets its numerical variables from any such
       subpatterns that matched before the assertion failed to match
       (and so succeeded as a negative assertion), but only if the
       negative lookahead assertion contains just one branch.

       4.  Though binary zero characters are supported in the subject string,
       they are not allowed in a pattern string because it  is	passed	as  a
       normal  C  string,  terminated  by zero. The escape sequence \0 can be
       used in the pattern to represent a binary zero.
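
       The restriction in item 4 follows from how C strings work and can
       be seen without calling PCRE at all. The sketch below is plain C.
       The helper pattern_bytes_seen() and the two example arrays are
       hypothetical names used only for illustration. It shows that a
       literal binary zero cuts a pattern short at the terminator,
       whereas the two-character escape \0 (written "\\0" in C source)
       survives intact for the pattern compiler to interpret.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: the number of pattern bytes visible to any
 * function that takes a zero-terminated C string, as a pattern
 * compiler does.
 */
size_t pattern_bytes_seen(const char *pattern)
{
    assert(pattern != NULL);
    return strlen(pattern);
}

/* A literal binary zero. The array holds 'a', 0, 'b', 0, but as a
 * C string it ends after 'a'. */
const char pattern_with_nul[] = "a\0b";

/* The escape sequence instead. The array holds 'a', '\', '0', 'b', 0,
 * so all four pattern characters are visible. */
const char pattern_with_escape[] = "a\\0b";
```

       Here pattern_bytes_seen(pattern_with_nul) is 1 although the array
       occupies 4 bytes, while pattern_bytes_seen(pattern_with_escape)
       is 4.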

       5. The following Perl escape sequences are not supported: \l, \u,  \L,
       \U,  and	 \N.  In fact these are implemented by Perl’s general string-
       handling and are not part of its pattern matching engine.  If  any  of
       these are encountered by PCRE, an error is generated.

       6. The Perl escape sequences \p, \P, and \X are supported only if PCRE
       is built with Unicode character property support. The properties	 that
       can be tested with \p and \P are limited to the general category prop-
       erties such as Lu and Nd, script names such as Greek or Han,  and  the
       derived properties Any and L&.

       7.  PCRE does support the \Q...\E escape for quoting substrings. Char-
       acters in between are treated as literals. This is slightly  different
       from  Perl  in  that  $	and @ are also handled as literals inside the
       quotes. In Perl, they cause variable interpolation (but of course PCRE
       does not have variables). Note the following examples:

           Pattern            PCRE matches      Perl matches

           \Qabc$xyz\E        abc$xyz           abc followed by the
                                                  contents of $xyz
           \Qabc\$xyz\E       abc\$xyz          abc\$xyz
           \Qabc\E\$\Qxyz\E   abc$xyz           abc$xyz

       The  \Q...\E  sequence is recognized both inside and outside character
       classes.
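
       One way to picture the PCRE treatment of \Q...\E in item 7 is as
       a rewrite in which every non-alphanumeric character between the
       quotes is escaped with a backslash. The sketch below does that
       rewrite in plain C. The function expand_quotes() is a hypothetical
       helper, not part of the PCRE API, and it assumes an ASCII pattern.
       It is not how PCRE itself processes the sequence.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of PCRE: copy a pattern to out,
 * replacing each \Q...\E section by its contents with every
 * non-alphanumeric character backslash-escaped.
 * Returns 0 on success, or -1 if out is too small.
 */
int expand_quotes(const char *in, char *out, size_t outsize)
{
    size_t n = 0;
    int quoting = 0;

    assert(in != NULL && out != NULL);
    while (*in != '\0') {
        if (!quoting && in[0] == '\\' && in[1] == 'Q') {
            quoting = 1;              /* start of quoted section */
            in += 2;
        } else if (quoting && in[0] == '\\' && in[1] == 'E') {
            quoting = 0;              /* end of quoted section */
            in += 2;
        } else if (quoting) {
            /* Inside the quotes: $ and @ and \ are plain literals. */
            if (!isalnum((unsigned char)*in)) {
                if (n + 1 >= outsize) return -1;
                out[n++] = '\\';
            }
            if (n + 1 >= outsize) return -1;
            out[n++] = *in++;
        } else {
            /* Outside the quotes: copy unchanged, keeping a backslash
             * together with the character it escapes. */
            if (in[0] == '\\' && in[1] != '\0') {
                if (n + 2 >= outsize) return -1;
                out[n++] = *in++;
            }
            if (n + 1 >= outsize) return -1;
            out[n++] = *in++;
        }
    }
    if (n >= outsize) return -1;
    out[n] = '\0';
    return 0;
}
```

       Applied to the first and third patterns in the table, the helper
       yields abc\$xyz, which matches the literal text abc$xyz. Applied
       to the second, it yields abc\\\$xyz, which matches abc\$xyz.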

       8. PCRE is a C library with no Perl interpreter, so it does not
       support the (?{code}) and (??{code}) constructions. However, there
       is support for recursive patterns. This is not available in Perl
       5.8, but will be in Perl 5.10. Also, the PCRE "callout" feature
       allows an external function to be called during pattern matching.
       See the pcrecallout documentation for details.

       9.  Subpatterns	that  are  called recursively or as "subroutines" are
       always treated as atomic groups in PCRE.	 This  is  like	 Python,  but
       unlike Perl.

       10. There are some differences that are concerned with the settings of
       captured strings when part of a	pattern	 is  repeated.	For  example,
       matching	 "aba"	against	 the  pattern  /^(a(b)?)+$/ in Perl leaves $2
       unset, but in PCRE it is set to "b".

       11. PCRE does support Perl 5.10’s backtracking verbs (*ACCEPT),
       (*FAIL), (*F), (*COMMIT), (*PRUNE), (*SKIP), and (*THEN), but only
       in the forms without an argument. PCRE does not support (*MARK). If
       (*ACCEPT) is within capturing parentheses, PCRE does not set that
       capture group; this differs from Perl.

       12. PCRE provides some  extensions  to  the  Perl  regular  expression
       facilities.   Perl 5.10 will include new features that are not in ear-
       lier versions, some of which (such as named parentheses) have been  in
       PCRE for some time. This list is with respect to Perl 5.10:

       (a)  Although  lookbehind  assertions must match fixed length strings,
       each alternative branch of a lookbehind assertion can match a  differ-
       ent  length of string. Perl requires them all to have the same length.

       (b) If PCRE_DOLLAR_ENDONLY is set and PCRE_MULTILINE is not set, the $
       meta-character matches only at the very end of the string.

       (c) If PCRE_EXTRA is set, a backslash followed by a letter with no
       special meaning is treated as an error. Otherwise, like Perl, the
       backslash is quietly ignored. (Perl can be made to issue a warning.)

       (d)  If PCRE_UNGREEDY is set, the greediness of the repetition quanti-
       fiers is inverted, that is, by default they are	not  greedy,  but  if
       followed by a question mark they are.

       (e)  PCRE_ANCHORED  can be used at matching time to force a pattern to
       be tried only at the first matching position in the subject string.

       (f) The PCRE_NOTBOL, PCRE_NOTEOL, and PCRE_NOTEMPTY options for
       pcre_exec(), and the PCRE_NO_AUTO_CAPTURE option for pcre_compile(),
       have no Perl equivalents.

       (g)  The \R escape sequence can be restricted to match only CR, LF, or
       CRLF by the PCRE_BSR_ANYCRLF option.

       (h) The callout facility is PCRE-specific.

       (i) The partial matching facility is PCRE-specific.

       (j) Patterns compiled by PCRE can be saved and re-used at a later
       time, even on a different host that has the opposite endianness.

       (k)  The	 alternative matching function (pcre_dfa_exec()) matches in a
       different way and is not Perl-compatible.

       (l) PCRE recognizes some special sequences, such as (*CR), at the
       start of a pattern. These set overall options that cannot be changed
       within the pattern.

AUTHOR

       Philip Hazel
       University Computing Service
       Cambridge CB2 3QH, England.

REVISION

       Last updated: 11 September 2007
       Copyright (c) 1997-2007 University of Cambridge.



								PCRECOMPAT(3)