pcrebuild

TriggerTek Logo
abcdefghijklmnopqrstuvwxyz_
PCREBUILD(3)							 PCREBUILD(3)



NAME
       PCRE - Perl-compatible regular expressions

PCRE BUILD-TIME OPTIONS

       This  document  describes  the  optional	 features of PCRE that can be
       selected when the library is compiled. It assumes use of the configure
       script, where the optional features are selected or deselected by pro-
       viding options to configure before running the make command.  However,
       the  same  options can be selected in both Unix-like and non-Unix-like
       environments using the GUI facility of CMakeSetup  if  you  are	using
       CMake instead of configure to build PCRE.

       The  complete  list of options for configure (which includes the stan-
       dard ones such as the selection of the installation directory) can  be
       obtained by running

	 ./configure --help

       The  following  sections	 include  descriptions of options whose names
       begin with --enable or --disable. These settings	 specify  changes  to
       the  defaults  for the configure command. Because of the way that con-
       figure works, --enable and --disable always come in pairs, so the com-
       plementary  option  always  exists  as  well,  but as it specifies the
       default, it is not described.

C++ SUPPORT

       By default, the configure script will search for a  C++	compiler  and
       C++  header  files.  If it finds them, it automatically builds the C++
       wrapper library for PCRE. You can disable this by adding

	 --disable-cpp

       to the configure command.

UTF-8 SUPPORT

       To build PCRE with support for UTF-8 character strings, add

	 --enable-utf8

       to the configure command. Of itself, this does  not  make  PCRE	treat
       strings as UTF-8. As well as compiling PCRE with this option, you also
       have have to set the PCRE_UTF8 option when you call the pcre_compile()
       function.

UNICODE CHARACTER PROPERTY SUPPORT

       UTF-8 support allows PCRE to process character values greater than 255
       in the strings that it handles. On its own, however, it does not	 pro-
       vide  any  facilities for accessing the properties of such characters.
       If you want to be able to use the pattern  escapes  \P,	\p,  and  \X,
       which refer to Unicode character properties, you must add

	 --enable-unicode-properties

       to the configure command. This implies UTF-8 support, even if you have
       not explicitly requested it.

       Including Unicode property support adds around 30K of  tables  to  the
       PCRE  library.  Only the general category properties such as Lu and Nd
       are supported. Details are given in the pcrepattern documentation.

CODE VALUE OF NEWLINE

       By default, PCRE interprets character 10 (linefeed, LF) as  indicating
       the  end	 of a line. This is the normal newline character on Unix-like
       systems. You can compile PCRE to use character  13  (carriage  return,
       CR) instead, by adding

	 --enable-newline-is-cr

       to  the	configure  command.  There  is	also a --enable-newline-is-lf
       option, which explicitly specifies linefeed as the newline  character.

       Alternatively,  you  can specify that line endings are to be indicated
       by the two character sequence CRLF. If you want this, add

	 --enable-newline-is-crlf

       to the configure command. There is a fourth option, specified by

	 --enable-newline-is-anycrlf

       which causes PCRE to recognize any of the three sequences CR,  LF,  or
       CRLF  as	 indicating a line ending. Finally, a fifth option, specified
       by

	 --enable-newline-is-any

       causes PCRE to recognize any Unicode newline sequence.

       Whatever line ending convention is selected when PCRE is built can  be
       overridden  when the library functions are called. At build time it is
       conventional to use the standard for your operating system.

WHAT \R MATCHES

       By default, the sequence \R in a pattern matches any  Unicode  newline
       sequence,  whatever  has been selected as the line ending sequence. If
       you specify

	 --enable-bsr-anycrlf

       the default is changed so that \R matches only CR, LF, or CRLF.	What-
       ever is selected when PCRE is built can be overridden when the library
       functions are called.

BUILDING SHARED AND STATIC LIBRARIES

       The PCRE building process uses libtool to build both shared and static
       Unix libraries by default. You can suppress one of these by adding one
       of

	 --disable-shared
	 --disable-static

       to the configure command, as required.

POSIX MALLOC USAGE

       When PCRE is called through the POSIX  interface	 (see  the  pcreposix
       documentation), additional working storage is required for holding the
       pointers to capturing substrings, because PCRE requires three integers
       per  substring,	whereas the POSIX interface provides only two. If the
       number of expected substrings is	 small,	 the  wrapper  function	 uses
       space  on  the  stack,  because this is faster than using malloc() for
       each call. The default threshold above which the stack  is  no  longer
       used is 10; it can be changed by adding a setting such as

	 --with-posix-malloc-threshold=20

       to the configure command.

HANDLING VERY LARGE PATTERNS

       Within  a  compiled  pattern, offset values are used to point from one
       part to another (for example, from an opening parenthesis to an alter-
       nation  metacharacter). By default, two-byte values are used for these
       offsets, leading to a maximum size for a compiled  pattern  of  around
       64K.  This is sufficient to handle all but the most gigantic patterns.
       Nevertheless, some people do want to process enormous patterns, so  it
       is  possible to compile PCRE to use three-byte or four-byte offsets by
       adding a setting such as

	 --with-link-size=3

       to the configure command. The value given must be 2, 3,	or  4.	Using
       longer offsets slows down the operation of PCRE because it has to load
       additional bytes when handling them.

AVOIDING EXCESSIVE STACK USAGE

       When matching with the pcre_exec()  function,  PCRE  implements	back-
       tracking	 by  making  recursive	calls  to an internal function called
       match(). In environments where the size of the stack is limited,	 this
       can  severely  limit  PCRE’s operation. (The Unix environment does not
       usually suffer from this problem, but it may sometimes be necessary to
       increase	 the  maximum  stack  size.   There  is	 a  discussion in the
       pcrestack documentation.) An alternative approach  to  recursion	 that
       uses memory from the heap to remember data, instead of using recursive
       function calls, has been implemented to work round the problem of lim-
       ited  stack  size.  If  you want to build a version of PCRE that works
       this way, add

	 --disable-stack-for-recursion

       to the configure command. With this configuration, PCRE will  use  the
       pcre_stack_malloc and pcre_stack_free variables to call memory manage-
       ment functions. By default these point to malloc() and free(), but you
       can replace the pointers so that your own functions are used.

       Separate	 functions  are	 provided  rather  than using pcre_malloc and
       pcre_free because the usage  is	very  predictable:  the	 block	sizes
       requested  are  always  the  same,  and the blocks are always freed in
       reverse order. A calling program might be able to implement  optimized
       functions  that	perform	 better	 than  malloc() and free(). PCRE runs
       noticeably more slowly when built in this  way.	This  option  affects
       only  the  pcre_exec()  function;  it  is  not  relevant	 for  the the
       pcre_dfa_exec() function.

LIMITING PCRE RESOURCE USAGE

       Internally, PCRE has a function called match(), which it calls repeat-
       edly   (sometimes  recursively)	when  matching	a  pattern  with  the
       pcre_exec() function. By controlling the maximum number of times	 this
       function may be called during a single matching operation, a limit can
       be placed on the resources used by a single call to  pcre_exec().  The
       limit can be changed at run time, as described in the pcreapi documen-
       tation. The default is 10 million, but this can be changed by adding a
       setting such as

	 --with-match-limit=500000

       to   the	 configure  command.  This  setting  has  no  effect  on  the
       pcre_dfa_exec() matching function.

       In some environments it is desirable to limit the depth	of  recursive
       calls  of  match()  more	 strictly  than the total number of calls, in
       order to restrict the maximum amount of stack (or heap, if  --disable-
       stack-for-recursion  is	specified)  that is used. A second limit con-
       trols this; it defaults to the value that  is  set  for	--with-match-
       limit, which imposes no additional constraints. However, you can set a
       lower limit by adding, for example,

	 --with-match-limit-recursion=10000

       to the configure command. This value can also  be  overridden  at  run
       time.

CREATING CHARACTER TABLES AT BUILD TIME

       PCRE uses fixed tables for processing characters whose code values are
       less than 256. By default, PCRE is built with a set of tables that are
       distributed  in	the file pcre_chartables.c.dist. These tables are for
       ASCII codes only. If you add

	 --enable-rebuild-chartables

       to the configure command, the distributed tables are no	longer	used.
       Instead,	 a  program called dftables is compiled and run. This outputs
       the source for new set of tables, created in  the  default  locale  of
       your  C	runtime system. (This method of replacing the tables does not
       work if you are cross compiling, because dftables is run on the	local
       host.  If  you need to create alternative tables when cross compiling,
       you will have to do so "by hand".)

USING EBCDIC CODE

       PCRE assumes by default that it will run in an environment  where  the
       character  code	is  ASCII (or Unicode, which is a superset of ASCII).
       This is the case for most computer operating systems. PCRE  can,	 how-
       ever, be compiled to run in an EBCDIC environment by adding

	 --enable-ebcdic

       to  the	configure  command.  This  setting  implies --enable-rebuild-
       chartables. You should only use it if you know  that  you  are  in  an
       EBCDIC environment (for example, an IBM mainframe operating system).

SEE ALSO

       pcreapi(3), pcre_config(3).

AUTHOR

       Philip Hazel
       University Computing Service
       Cambridge CB2 3QH, England.

REVISION

       Last updated: 21 September 2007
       Copyright (c) 1997-2007 University of Cambridge.



								 PCREBUILD(3)