Skip site navigation (1)Skip section navigation (2)

FreeBSD Manual Pages

  
 
  

home | help
PCREBUILD(3)		   Library Functions Manual		  PCREBUILD(3)

NAME
       PCRE - Perl-compatible regular expressions

BUILDING PCRE
       PCRE  is	 distributed with a configure script that can be used to build
       the library in Unix-like	environments using the applications  known  as
       Autotools.   Also in the	distribution are files to support building us-
       ing CMake instead of configure. The text	file README  contains  general
       information  about  building  with Autotools (some of which is repeated
       below), and also	has some comments about	building on various  operating
       systems.	 There	is  a lot more information about building PCRE without
       using Autotools (including information about using CMake	 and  building
       "by  hand")  in	the  text file called NON-AUTOTOOLS-BUILD.  You	should
       consult this file as well as the	README file if you are building	 in  a
       non-Unix-like environment.

PCRE BUILD-TIME	OPTIONS
       The  rest of this document describes the	optional features of PCRE that
       can be selected when the	library	is compiled. It	 assumes  use  of  the
       configure  script,  where  the  optional	features are selected or dese-
       lected by providing options to configure	before running the  make  com-
       mand.  However,	the same options can be	selected in both Unix-like and
       non-Unix-like environments using	the GUI	facility of cmake-gui  if  you
       are using CMake instead of configure to build PCRE.

       If  you	are not	using Autotools	or CMake, option selection can be done
       by editing the config.h file, or	by passing parameter settings  to  the
       compiler, as described in NON-AUTOTOOLS-BUILD.

       The complete list of options for	configure (which includes the standard
       ones such as the	selection of the installation directory)  can  be  ob-
       tained by running

	 ./configure --help

       The  following sections include descriptions of options whose names be-
       gin with	--enable or --disable. These settings specify changes  to  the
       defaults	 for  the configure command. Because of	the way	that configure
       works, --enable and --disable always come in pairs, so  the  complemen-
       tary  option always exists as well, but as it specifies the default, it
       is not described.

BUILDING 8-BIT,	16-BIT AND 32-BIT LIBRARIES
       By default, a library called libpcre  is	 built,	 containing  functions
       that  take  string  arguments  contained	in vectors of bytes, either as
       single-byte characters, or interpreted as UTF-8 strings.	You  can  also
       build  a	 separate library, called libpcre16, in	which strings are con-
       tained in vectors of 16-bit data	units and interpreted either  as  sin-
       gle-unit	characters or UTF-16 strings, by adding

	 --enable-pcre16

       to  the	configure command. You can also	build yet another separate li-
       brary, called libpcre32,	in which strings are contained in  vectors  of
       32-bit  data  units and interpreted either as single-unit characters or
       UTF-32 strings, by adding

	 --enable-pcre32

       to the configure	command. If you	do not want the	8-bit library, add

	 --disable-pcre8

       as well.	At least one of	the three libraries must be built.  Note  that
       the  C++	 and  POSIX  wrappers are for the 8-bit	library	only, and that
       pcregrep	is an 8-bit program. None of these are	built  if  you	select
       only the	16-bit or 32-bit libraries.

BUILDING SHARED	AND STATIC LIBRARIES
       The  Autotools  PCRE building process uses libtool to build both	shared
       and static libraries by default.	You  can  suppress  one	 of  these  by
       adding one of

	 --disable-shared
	 --disable-static

       to the configure	command, as required.

C++ SUPPORT
       By  default,  if	the 8-bit library is being built, the configure	script
       will search for a C++ compiler and C++ header files. If it finds	 them,
       it  automatically  builds  the C++ wrapper library (which supports only
       8-bit strings). You can disable this by adding

	 --disable-cpp

       to the configure	command.

UTF-8, UTF-16 AND UTF-32 SUPPORT
       To build	PCRE with support for UTF Unicode character strings, add

	 --enable-utf

       to the configure	command. This setting applies to all three  libraries,
       adding  support	for  UTF-8 to the 8-bit	library, support for UTF-16 to
       the 16-bit library, and support for UTF-32 to the  to  the  32-bit  li-
       brary.  There  are  no  separate	options	for enabling UTF-8, UTF-16 and
       UTF-32 independently because that would allow ridiculous	settings  such
       as  requesting UTF-16 support while building only the 8-bit library. It
       is not possible to build	one library with UTF support and another with-
       out  in the same	configuration. (For backwards compatibility, --enable-
       utf8 is a synonym of --enable-utf.)

       Of itself, this setting does not	make  PCRE  treat  strings  as	UTF-8,
       UTF-16  or UTF-32. As well as compiling PCRE with this option, you also
       have have to set	the PCRE_UTF8, PCRE_UTF16 or PCRE_UTF32	option (as ap-
       propriate) when you call	one of the pattern compiling functions.

       If  you	set --enable-utf when compiling	in an EBCDIC environment, PCRE
       expects its input to be either ASCII or UTF-8 (depending	 on  the  run-
       time option). It	is not possible	to support both	EBCDIC and UTF-8 codes
       in the same version of  the  library.  Consequently,  --enable-utf  and
       --enable-ebcdic are mutually exclusive.

UNICODE	CHARACTER PROPERTY SUPPORT
       UTF  support allows the libraries to process character codepoints up to
       0x10ffff	in the strings that they handle. On its	own, however, it  does
       not provide any facilities for accessing	the properties of such charac-
       ters. If	you want to be able to use the pattern escapes \P, \p, and \X,
       which refer to Unicode character	properties, you	must add

	 --enable-unicode-properties

       to  the	configure  command. This implies UTF support, even if you have
       not explicitly requested	it.

       Including Unicode property support adds around 30K  of  tables  to  the
       PCRE  library.  Only  the general category properties such as Lu	and Nd
       are supported. Details are given	in the pcrepattern documentation.

JUST-IN-TIME COMPILER SUPPORT
       Just-in-time compiler support is	included in the	build by specifying

	 --enable-jit

       This support is available only for certain hardware  architectures.  If
       this  option is set for an unsupported architecture, a compile time er-
       ror occurs.  See	the pcrejit documentation for a	discussion of JIT  us-
       age.  When  JIT support is enabled, pcregrep automatically makes	use of
       it, unless you add

	 --disable-pcregrep-jit

       to the "configure" command.

CODE VALUE OF NEWLINE
       By default, PCRE	interprets the linefeed	(LF) character	as  indicating
       the  end	 of  a line. This is the normal	newline	character on Unix-like
       systems.	You can	compile	PCRE to	use carriage return (CR)  instead,  by
       adding

	 --enable-newline-is-cr

       to  the	configure  command. There is also a --enable-newline-is-lf op-
       tion, which explicitly specifies	linefeed as the	newline	character.

       Alternatively, you can specify that line	endings	are to be indicated by
       the two character sequence CRLF.	If you want this, add

	 --enable-newline-is-crlf

       to the configure	command. There is a fourth option, specified by

	 --enable-newline-is-anycrlf

       which  causes  PCRE  to recognize any of	the three sequences CR,	LF, or
       CRLF as indicating a line ending. Finally, a fifth option, specified by

	 --enable-newline-is-any

       causes PCRE to recognize	any Unicode newline sequence.

       Whatever	line ending convention is selected when	PCRE is	built  can  be
       overridden  when	 the library functions are called. At build time it is
       conventional to use the standard	for your operating system.

WHAT \R	MATCHES
       By default, the sequence	\R in a	pattern	matches	 any  Unicode  newline
       sequence,  whatever  has	 been selected as the line ending sequence. If
       you specify

	 --enable-bsr-anycrlf

       the default is changed so that \R matches only CR, LF, or  CRLF.	 What-
       ever  is	selected when PCRE is built can	be overridden when the library
       functions are called.

POSIX MALLOC USAGE
       When the	8-bit library is called	through	the POSIX interface  (see  the
       pcreposix  documentation),  additional  working storage is required for
       holding the pointers to capturing  substrings,  because	PCRE  requires
       three integers per substring, whereas the POSIX interface provides only
       two. If the number of expected substrings is small, the	wrapper	 func-
       tion  uses  space  on the stack,	because	this is	faster than using mal-
       loc() for each call. The	default	threshold above	which the stack	is  no
       longer used is 10; it can be changed by adding a	setting	such as

	 --with-posix-malloc-threshold=20

       to the configure	command.

HANDLING VERY LARGE PATTERNS
       Within  a  compiled  pattern,  offset values are	used to	point from one
       part to another (for example, from an opening parenthesis to an	alter-
       nation  metacharacter).	By default, in the 8-bit and 16-bit libraries,
       two-byte	values are used	for these offsets, leading to a	 maximum  size
       for  a compiled pattern of around 64K. This is sufficient to handle all
       but the most gigantic patterns.	Nevertheless, some people do  want  to
       process	truly  enormous	patterns, so it	is possible to compile PCRE to
       use three-byte or four-byte offsets by adding a setting such as

	 --with-link-size=3

       to the configure	command. The value given must be 2, 3, or 4.  For  the
       16-bit  library,	 a  value of 3 is rounded up to	4. In these libraries,
       using longer offsets slows down the operation of	PCRE because it	has to
       load  additional	 data  when  handling them. For	the 32-bit library the
       value is	always 4 and cannot be overridden; the value  of  --with-link-
       size is ignored.

AVOIDING EXCESSIVE STACK USAGE
       When matching with the pcre_exec() function, PCRE implements backtrack-
       ing by making recursive calls to	an internal function  called  match().
       In  environments	 where	the size of the	stack is limited, this can se-
       verely limit PCRE's operation. (The Unix	environment does  not  usually
       suffer from this	problem, but it	may sometimes be necessary to increase
       the maximum stack size.	There is a discussion in the  pcrestack	 docu-
       mentation.)  An alternative approach to recursion that uses memory from
       the heap	to remember data, instead of using recursive  function	calls,
       has  been  implemented to work round the	problem	of limited stack size.
       If you want to build a version of PCRE that works this way, add

	 --disable-stack-for-recursion

       to the configure	command. With this configuration, PCRE	will  use  the
       pcre_stack_malloc  and pcre_stack_free variables	to call	memory manage-
       ment functions. By default these	point to malloc() and free(), but  you
       can replace the pointers	so that	your own functions are used instead.

       Separate	 functions  are	 provided  rather  than	 using pcre_malloc and
       pcre_free because the usage is very predictable:	the  block  sizes  re-
       quested are always the same, and	the blocks are always freed in reverse
       order. A	calling	program	might be able to implement optimized functions
       that perform better than	malloc() and free(). PCRE runs noticeably more
       slowly when built in this way. This option affects only the pcre_exec()
       function; it is not relevant for	pcre_dfa_exec().

LIMITING PCRE RESOURCE USAGE
       Internally,  PCRE has a function	called match(),	which it calls repeat-
       edly  (sometimes	 recursively)  when  matching  a  pattern   with   the
       pcre_exec()  function.  By controlling the maximum number of times this
       function	may be called during a single matching operation, a limit  can
       be  placed  on  the resources used by a single call to pcre_exec(). The
       limit can be changed at run time, as described in the pcreapi  documen-
       tation.	The default is 10 million, but this can	be changed by adding a
       setting such as

	 --with-match-limit=500000

       to  the	configure  command.  This  setting  has	 no  effect   on   the
       pcre_dfa_exec() matching	function.

       In  some	 environments  it is desirable to limit	the depth of recursive
       calls of	match()	more strictly than the total number of calls, in order
       to  restrict  the maximum amount	of stack (or heap, if --disable-stack-
       for-recursion is	specified) that	is used. A second limit	controls this;
       it  defaults to the value that is set for --with-match-limit, which im-
       poses no	additional constraints.	However, you can set a lower limit  by
       adding, for example,

	 --with-match-limit-recursion=10000

       to  the	configure  command.  This  value can also be overridden	at run
       time.

CREATING CHARACTER TABLES AT BUILD TIME
       PCRE uses fixed tables for processing characters	whose code values  are
       less  than 256. By default, PCRE	is built with a	set of tables that are
       distributed in the file pcre_chartables.c.dist. These  tables  are  for
       ASCII codes only. If you	add

	 --enable-rebuild-chartables

       to  the	configure  command, the	distributed tables are no longer used.
       Instead,	a program called dftables is compiled and  run.	 This  outputs
       the source for new set of tables, created in the	default	locale of your
       C run-time system. (This	method of replacing the	tables does  not  work
       if  you are cross compiling, because dftables is	run on the local host.
       If you need to create alternative tables	when cross compiling, you will
       have to do so "by hand".)

USING EBCDIC CODE
       PCRE  assumes  by  default that it will run in an environment where the
       character code is ASCII (or Unicode, which is  a	 superset  of  ASCII).
       This  is	 the  case for most computer operating systems.	PCRE can, how-
       ever, be	compiled to run	in an EBCDIC environment by adding

	 --enable-ebcdic

       to the configure	command. This setting implies --enable-rebuild-charta-
       bles.  You should only use it if	you know that you are in an EBCDIC en-
       vironment (for example, an IBM mainframe	operating system).  The	 --en-
       able-ebcdic option is incompatible with --enable-utf.

       The EBCDIC character that corresponds to	an ASCII LF is assumed to have
       the value 0x15 by default. However, in some EBCDIC  environments,  0x25
       is used.	In such	an environment you should use

	 --enable-ebcdic-nl25

       as well as, or instead of, --enable-ebcdic. The EBCDIC character	for CR
       has the same value as in	ASCII, namely, 0x0d.  Whichever	 of  0x15  and
       0x25 is not chosen as LF	is made	to correspond to the Unicode NEL char-
       acter (which, in	Unicode, is 0x85).

       The options that	select newline behaviour, such as --enable-newline-is-
       cr, and equivalent run-time options, refer to these character values in
       an EBCDIC environment.

PCREGREP OPTIONS FOR COMPRESSED	FILE SUPPORT
       By default, pcregrep reads all files as plain text. You can build it so
       that it recognizes files	whose names end	in .gz or .bz2,	and reads them
       with libz or libbz2, respectively, by adding one	or both	of

	 --enable-pcregrep-libz
	 --enable-pcregrep-libbz2

       to the configure	command. These options naturally require that the rel-
       evant  libraries	 are installed on your system. Configuration will fail
       if they are not.

PCREGREP BUFFER	SIZE
       pcregrep	uses an	internal buffer	to hold	a "window" on the file	it  is
       scanning, in order to be	able to	output "before"	and "after" lines when
       it finds	a match. The size of the buffer	is controlled by  a  parameter
       whose default value is 20K. The buffer itself is	three times this size,
       but because of the way it is used for holding "before" lines, the long-
       est  line  that	is guaranteed to be processable	is the parameter size.
       You can change the default parameter value by adding, for example,

	 --with-pcregrep-bufsize=50K

       to the configure	command. The caller of pcregrep	can, however, override
       this value by specifying	a run-time option.

PCRETEST OPTION	FOR LIBREADLINE	SUPPORT
       If you add

	 --enable-pcretest-libreadline

       to  the	configure command, pcretest is linked with the libreadline li-
       brary, and when its input is from a terminal, it	 reads	it  using  the
       readline() function. This provides line-editing and history facilities.
       Note that libreadline is	GPL-licensed, so if you	distribute a binary of
       pcretest	linked in this way, there may be licensing issues.

       Setting	this  option  causes  the -lreadline option to be added	to the
       pcretest	build. In many operating environments with  a  sytem-installed
       libreadline this	is sufficient. However,	in some	environments (e.g.  if
       an unmodified distribution version of readline is in use),  some	 extra
       configuration  may  be necessary. The INSTALL file for libreadline says
       this:

	 "Readline uses	the termcap functions, but does	not link with the
	 termcap or curses library itself, allowing applications which link
	 with readline the to choose an	appropriate library."

       If your environment has not been	set up so that an appropriate  library
       is automatically	included, you may need to add something	like

	 LIBS="-ncurses"

       immediately before the configure	command.

DEBUGGING WITH VALGRIND	SUPPORT
       By adding the

	 --enable-valgrind

       option  to to the configure command, PCRE will use valgrind annotations
       to mark certain memory regions as unaddressable.	This allows it to  de-
       tect  invalid  memory accesses, and is mostly useful for	debugging PCRE
       itself.

CODE COVERAGE REPORTING
       If your C compiler is gcc, you can build	a version  of  PCRE  that  can
       generate	a code coverage	report for its test suite. To enable this, you
       must install lcov version 1.6 or	above. Then specify

	 --enable-coverage

       to the configure	command	and build PCRE in the usual way.

       Note that using ccache (a caching C compiler) is	incompatible with code
       coverage	 reporting. If you have	configured ccache to run automatically
       on your system, you must	set the	environment variable

	 CCACHE_DISABLE=1

       before running make to build PCRE, so that ccache is not	used.

       When --enable-coverage is used,	the  following	addition  targets  are
       added to	the Makefile:

	 make coverage

       This  creates  a	 fresh	coverage report	for the	PCRE test suite. It is
       equivalent to running "make coverage-reset", "make  coverage-baseline",
       "make check", and then "make coverage-report".

	 make coverage-reset

       This zeroes the coverage	counters, but does nothing else.

	 make coverage-baseline

       This captures baseline coverage information.

	 make coverage-report

       This creates the	coverage report.

	 make coverage-clean-report

       This  removes the generated coverage report without cleaning the	cover-
       age data	itself.

	 make coverage-clean-data

       This removes the	captured coverage data without removing	 the  coverage
       files created at	compile	time (*.gcno).

	 make coverage-clean

       This  cleans all	coverage data including	the generated coverage report.
       For more	information about code coverage, see the gcov and  lcov	 docu-
       mentation.

SEE ALSO
       pcreapi(3), pcre16, pcre32, pcre_config(3).

AUTHOR
       Philip Hazel
       University Computing Service
       Cambridge CB2 3QH, England.

REVISION
       Last updated: 12	May 2013
       Copyright (c) 1997-2013 University of Cambridge.

PCRE 8.33			  12 May 2013			  PCREBUILD(3)

NAME | BUILDING PCRE | PCRE BUILD-TIME OPTIONS | BUILDING 8-BIT, 16-BIT AND 32-BIT LIBRARIES | BUILDING SHARED AND STATIC LIBRARIES | C++ SUPPORT | UTF-8, UTF-16 AND UTF-32 SUPPORT | UNICODE CHARACTER PROPERTY SUPPORT | JUST-IN-TIME COMPILER SUPPORT | CODE VALUE OF NEWLINE | WHAT \R MATCHES | POSIX MALLOC USAGE | HANDLING VERY LARGE PATTERNS | AVOIDING EXCESSIVE STACK USAGE | LIMITING PCRE RESOURCE USAGE | CREATING CHARACTER TABLES AT BUILD TIME | USING EBCDIC CODE | PCREGREP OPTIONS FOR COMPRESSED FILE SUPPORT | PCREGREP BUFFER SIZE | PCRETEST OPTION FOR LIBREADLINE SUPPORT | DEBUGGING WITH VALGRIND SUPPORT | CODE COVERAGE REPORTING | SEE ALSO | AUTHOR | REVISION

Want to link to this manual page? Use this URL:
<https://www.freebsd.org/cgi/man.cgi?query=pcrebuild&sektion=3&manpath=FreeBSD+13.0-RELEASE+and+Ports>

home | help