Skip site navigation (1)Skip section navigation (2)

FreeBSD Manual Pages


home | help
PCRECOMPAT(3)		   Library Functions Manual		 PCRECOMPAT(3)

       PCRE - Perl-compatible regular expressions

       This  document describes	the differences	in the ways that PCRE and Perl
       handle regular expressions. The differences described here are with re-
       spect to	Perl versions 5.10 and above.

       1. PCRE has only	a subset of Perl's Unicode support. Details of what it
       does have are given in the pcreunicode page.

       2. PCRE allows repeat quantifiers only on parenthesized assertions, but
       they  do	 not mean what you might think.	For example, (?!a){3} does not
       assert that the next three characters are not "a". It just asserts that
       the next	character is not "a" three times (in principle:	PCRE optimizes
       this to run the assertion just once). Perl allows repeat	quantifiers on
       other assertions	such as	\b, but	these do not seem to have any use.

       3.  Capturing  subpatterns  that	occur inside negative lookahead	asser-
       tions are counted, but their entries in the offsets  vector  are	 never
       set.  Perl sometimes (but not always) sets its numerical	variables from
       inside negative assertions.

       4. Though binary	zero characters	are supported in the  subject  string,
       they are	not allowed in a pattern string	because	it is passed as	a nor-
       mal C string, terminated	by zero. The escape sequence \0	can be used in
       the pattern to represent	a binary zero.

       5.  The	following Perl escape sequences	are not	supported: \l, \u, \L,
       \U, and \N when followed	by a character name or Unicode value.  (\N  on
       its own,	matching a non-newline character, is supported.) In fact these
       are implemented by Perl's general string-handling and are not  part  of
       its  pattern  matching engine. If any of	these are encountered by PCRE,
       an error	is generated by	default. However, if the  PCRE_JAVASCRIPT_COM-
       PAT  option  is set, \U and \u are interpreted as JavaScript interprets

       6. The Perl escape sequences \p,	\P, and	\X are supported only if  PCRE
       is  built  with Unicode character property support. The properties that
       can be tested with \p and \P are	limited	to the general category	 prop-
       erties  such  as	 Lu and	Nd, script names such as Greek or Han, and the
       derived properties Any and L&. PCRE does	 support  the  Cs  (surrogate)
       property,  which	 Perl  does  not; the Perl documentation says "Because
       Perl hides the need for the user	to understand the internal representa-
       tion  of	Unicode	characters, there is no	need to	implement the somewhat
       messy concept of	surrogates."

       7. PCRE does support the	\Q...\E	escape for quoting substrings. Charac-
       ters  in	 between  are  treated as literals. This is slightly different
       from Perl in that $ and @ are  also  handled  as	 literals  inside  the
       quotes.	In Perl, they cause variable interpolation (but	of course PCRE
       does not	have variables). Note the following examples:

	   Pattern	      PCRE matches	Perl matches

	   \Qabc$xyz\E	      abc$xyz		abc followed by	the
						  contents of $xyz
	   \Qabc\$xyz\E	      abc\$xyz		abc\$xyz
	   \Qabc\E\$\Qxyz\E   abc$xyz		abc$xyz

       The \Q...\E sequence is recognized both inside  and  outside  character

       8. Fairly obviously, PCRE does not support the (?{code})	and (??{code})
       constructions. However, there is	support	for recursive  patterns.  This
       is  not	available  in Perl 5.8,	but it is in Perl 5.10.	Also, the PCRE
       "callout" feature allows	an external function to	be called during  pat-
       tern matching. See the pcrecallout documentation	for details.

       9.  Subpatterns	that  are called as subroutines	(whether or not	recur-
       sively) are always treated as atomic  groups  in	 PCRE.	This  is  like
       Python,	but  unlike Perl.  Captured values that	are set	outside	a sub-
       routine call can	be reference from inside in PCRE,  but	not  in	 Perl.
       There is	a discussion that explains these differences in	more detail in
       the section on recursion	differences from Perl in the pcrepattern page.

       10. If any of the backtracking control verbs are	used in	 a  subpattern
       that  is	called as a subroutine (whether	or not recursively), their ef-
       fect is confined	to that	subpattern; it does not	 extend	 to  the  sur-
       rounding	 pattern.  This	is not always the case in Perl.	In particular,
       if (*THEN) is present in	a group	that is	called as  a  subroutine,  its
       action is limited to that group,	even if	the group does not contain any
       | characters. Note that such subpatterns	are processed as  anchored  at
       the point where they are	tested.

       11.  If a pattern contains more than one	backtracking control verb, the
       first one that is backtracked onto acts.	For example,  in  the  pattern
       A(*COMMIT)B(*PRUNE)C  a	failure	in B triggers (*COMMIT), but a failure
       in C triggers (*PRUNE). Perl's behaviour	is more	complex; in many cases
       it is the same as PCRE, but there are examples where it differs.

       12.  Most  backtracking	verbs in assertions have their normal actions.
       They are	not confined to	the assertion.

       13. There are some differences that are concerned with the settings  of
       captured	 strings  when	part  of  a  pattern is	repeated. For example,
       matching	"aba" against the pattern /^(a(b)?)+$/ in Perl leaves  $2  un-
       set, but	in PCRE	it is set to "b".

       14.  PCRE's handling of duplicate subpattern numbers and	duplicate sub-
       pattern names is	not as general as Perl's. This is a consequence	of the
       fact the	PCRE works internally just with	numbers, using an external ta-
       ble to translate	between	numbers	and names. In  particular,  a  pattern
       such  as	 (?|(?<a>A)|(?<b>B),  where the	two capturing parentheses have
       the same	number but different names, is not supported,  and  causes  an
       error  at compile time. If it were allowed, it would not	be possible to
       distinguish which parentheses matched, because both names map  to  cap-
       turing subpattern number	1. To avoid this confusing situation, an error
       is given	at compile time.

       15. Perl	recognizes comments in some places that	PCRE does not, for ex-
       ample, between the ( and	? at the start of a subpattern.	If the /x mod-
       ifier is	set, Perl allows white space between ( and ?  (though  current
       Perls  warn  that  this is deprecated) but PCRE never does, even	if the
       PCRE_EXTENDED option is set.

       16. Perl, when in warning mode, gives warnings  for  character  classes
       such  as	 [A-\d]	or [a-[:digit:]]. It then treats the hyphens as	liter-
       als. PCRE has no	warning	features, so it	gives an error in these	 cases
       because they are	almost certainly user mistakes.

       17.  In	PCRE,  the upper/lower case character properties Lu and	Ll are
       not affected when case-independent matching is specified. For  example,
       \p{Lu} always matches an	upper case letter. I think Perl	has changed in
       this respect; in	the release at the time	of writing (5.16), \p{Lu}  and
       \p{Ll} match all	letters, regardless of case, when case independence is

       18. PCRE	provides some extensions to the	Perl regular expression	facil-
       ities.	Perl  5.10  includes new features that are not in earlier ver-
       sions of	Perl, some of which (such as named parentheses)	have  been  in
       PCRE for	some time. This	list is	with respect to	Perl 5.10:

       (a)  Although  lookbehind  assertions  in  PCRE must match fixed	length
       strings,	each alternative branch	of a lookbehind	assertion can match  a
       different  length  of  string.  Perl requires them all to have the same

       (b) If PCRE_DOLLAR_ENDONLY is set and PCRE_MULTILINE is not set,	the  $
       meta-character matches only at the very end of the string.

       (c) If PCRE_EXTRA is set, a backslash followed by a letter with no spe-
       cial meaning is faulted.	Otherwise, like	Perl, the backslash is quietly
       ignored.	 (Perl can be made to issue a warning.)

       (d)  If	PCRE_UNGREEDY is set, the greediness of	the repetition quanti-
       fiers is	inverted, that is, by default they are not greedy, but if fol-
       lowed by	a question mark	they are.

       (e) PCRE_ANCHORED can be	used at	matching time to force a pattern to be
       tried only at the first matching	position in the	subject	string.

       and  PCRE_NO_AUTO_CAPTURE  options for pcre_exec() have no Perl equiva-

       (g) The \R escape sequence can be restricted to match only CR,  LF,  or
       CRLF by the PCRE_BSR_ANYCRLF option.

       (h) The callout facility	is PCRE-specific.

       (i) The partial matching	facility is PCRE-specific.

       (j) Patterns compiled by	PCRE can be saved and re-used at a later time,
       even on different hosts that have the other endianness.	However,  this
       does not	apply to optimized data	created	by the just-in-time compiler.

       (k)    The    alternative    matching	functions    (pcre_dfa_exec(),
       pcre16_dfa_exec() and pcre32_dfa_exec(),) match in a different way  and
       are not Perl-compatible.

       (l)  PCRE  recognizes some special sequences such as (*CR) at the start
       of a pattern that set overall options that cannot be changed within the

       Philip Hazel
       University Computing Service
       Cambridge CB2 3QH, England.

       Last updated: 10	November 2013
       Copyright (c) 1997-2013 University of Cambridge.

PCRE 8.34		       10 November 2013			 PCRECOMPAT(3)


Want to link to this manual page? Use this URL:

home | help