PPIx::Regexp(3)	      User Contributed Perl Documentation      PPIx::Regexp(3)

       PPIx::Regexp - Represent	a regular expression of	some sort

	use PPIx::Regexp;
	use PPIx::Regexp::Dumper;
	my $re = PPIx::Regexp->new( 'qr{foo}smx' );
	PPIx::Regexp::Dumper->new( $re )
	    ->print();

       "PPIx::Regexp" is a PPIx::Regexp::Node.

       "PPIx::Regexp" has no descendants.

       The purpose of the PPIx-Regexp package is to parse regular expressions
       in a manner similar to the way the PPI package parses Perl. This	class
       forms the root of the parse tree, playing a role similar to
       PPI::Document.

       This package shares with	PPI the	property of being round-trip safe.
       That is,

	my $expr = 's/ ( \d+ ) ( \D+ ) /$2$1/smxg';
	my $re = PPIx::Regexp->new( $expr );
	print $re->content() eq $expr ? "yes\n" : "no\n";

       should print 'yes' for any valid	regular	expression.
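
       The round-trip property can be checked over several expressions at
       once. The following is a minimal sketch; the sample expressions and the
       helper name "round_trips" are illustrative assumptions, not part of the
       package.

```perl
use strict;
use warnings;
use PPIx::Regexp;

# Hypothetical sample inputs. Single quotes keep Perl itself from
# interpolating '$2$1' before PPIx::Regexp sees the text.
my @samples = (
    'qr{foo}smx',
    'm/ ( \d+ ) /x',
    's/ ( \d+ ) ( \D+ ) /$2$1/smxg',
);

# Return true if parsing the text and serializing the parse tree
# gives back exactly the text we started with.
sub round_trips {
    my ( $expr ) = @_;
    my $re = PPIx::Regexp->new( $expr )
        or return 0;    # could not be instantiated at all
    return $re->content() eq $expr ? 1 : 0;
}

foreach my $expr ( @samples ) {
    printf "%-32s %s\n", $expr, round_trips( $expr ) ? 'yes' : 'no';
}
```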

       Navigation is similar to	that provided by PPI. That is to say, things
       like "children",	"find_first", "snext_sibling" and so on	all work
       pretty much the same way	as in PPI.
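
       As a sketch of PPI-style navigation, the following looks for capture
       groups by class name and then steps to a sibling. The sample expression
       is an assumption chosen for illustration.

```perl
use strict;
use warnings;
use PPIx::Regexp;

my $re = PPIx::Regexp->new( 's/(foo)(bar)/x/' )
    or die "Unable to parse the sample expression\n";

# As in PPI, find() returns an array reference, or a false value
# when nothing matches; fall back to an empty array in that case.
my $captures = $re->find( 'PPIx::Regexp::Structure::Capture' ) || [];
printf "%d capture group(s)\n", scalar @{ $captures };

# find_first() returns only the first matching element.
my $first = $re->find_first( 'PPIx::Regexp::Structure::Capture' );

# snext_sibling() skips insignificant elements such as white space.
my $next = $first ? $first->snext_sibling() : undef;

print 'first: ', $first->content(), "\n" if $first;
print 'next:  ', $next->content(),  "\n" if $next;
```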

       The class hierarchy is also similar to PPI. Except for some utility
       classes (the dumper, the	lexer, and the tokenizer) all classes are
       descended from PPIx::Regexp::Element, which provides basic navigation.
       Tokens are descended from PPIx::Regexp::Token, which provides content.
       All containers are descended from PPIx::Regexp::Node, which provides
       for children, and all structure elements	are descended from
       PPIx::Regexp::Structure,	which provides beginning and ending
       delimiters, and a type.
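
       The hierarchy described above can be inspected with "isa". This sketch
       reports, for a few pieces of one parse tree, which of the four base
       classes each piece descends from; the sample expression and the labels
       are illustrative assumptions.

```perl
use strict;
use warnings;
use PPIx::Regexp;

my $re = PPIx::Regexp->new( 's/(foo)/bar/smx' )
    or die "Unable to parse the sample expression\n";

# Pieces of the parse tree, labeled for the report below.
my @parts = (
    [ root     => $re ],
    [ regexp   => scalar $re->regular_expression() ],
    [ type     => scalar $re->type() ],
    [ modifier => scalar $re->modifier() ],
);

my @bases = map { "PPIx::Regexp::$_" } qw{ Element Token Node Structure };

foreach my $part ( @parts ) {
    my ( $label, $obj ) = @{ $part };
    my @isa = grep { $obj->isa( $_ ) } @bases;
    printf "%-8s %s\n", $label, ref( $obj );
    printf "         isa: %s\n", join ', ', @isa;
}
```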

       There are two features of PPI that this package does not	provide	-
       mutability and operator overloading. There are no plans for serious
       mutability, though something like PPI's "prune" functionality might be
       considered. Similarly there are no plans	for operator overloading,
       which appears to	the author to represent	a performance hit for little
       tangible	gain.

       The author will attempt to preserve the documented interface, but if
       the interface needs to change to	correct	some egregiously bad design or
       implementation decision,	then it	will change.  Any incompatible changes
       will go through a deprecation cycle.

       The goal	of this	package	is to parse well-formed	regular	expressions
       correctly. A secondary goal is not to blow up on	ill-formed regular
       expressions. The	correct	identification and characterization of ill-
       formed regular expressions is not a goal	of this	package.

       This policy attempts to track features in development releases as well
       as public releases. However, features added in a	development release
       and then	removed	before the next	production release will	not be
       tracked,	and any	functionality relating to such features	will be
       removed.	The issue here is the potential	re-use (with different
       semantics) of syntax that did not make it into the production release.

       This class provides the following public	methods. Methods not
       documented here are private, and	unsupported in the sense that the
       author reserves the right to change or remove them without notice.

	my $re = PPIx::Regexp->new('/foo/');

       This method instantiates	a "PPIx::Regexp" object	from a string, a
       PPI::Token::QuoteLike::Regexp, a	PPI::Token::Regexp::Match, or a
       PPI::Token::Regexp::Substitute.	Honestly, any PPI::Element will	do,
       but only	the three Regexp classes mentioned previously are likely to do
       anything	useful.

       Optionally you can pass one or more name/value pairs after the regular
       expression. The possible	options	are:

       default_modifiers array_reference
	   This	option specifies a reference to	an array of default modifiers
	   to apply to the regular expression being parsed. Each modifier is
	   specified as a string. Any actual modifiers found supersede the
	   defaults.

	   When	applying the defaults, '?' and '/' are completely ignored, and
	   '^' is ignored unless it occurs at the beginning of the modifier.
	   The first dash ('-')	causes subsequent modifiers to be negated.

	   So, for example, if you wish	to produce a "PPIx::Regexp" object
	   representing	the regular expression in

	    use re '/smx';
	    {
		no re '/x';
		m/ foo /;
	    }

	   you would (after some help from PPI in finding the relevant
	   statements),	do something like

	    my $re = PPIx::Regexp->new( 'm/ foo /',
		default_modifiers => [ '/smx', '-/x' ] );

       encoding name
	   This	option specifies the encoding of the regular expression. This
	   is passed to	the tokenizer, which will "decode" the regular
	   expression string before it tokenizes it. For example:

	    my $re = PPIx::Regexp->new(	'/foo/',
		encoding => 'iso-8859-1',
	    );

       trace number
	   If greater than zero, this option causes trace output from the
	   parse.  The author reserves the right to change or eliminate	this
	   without notice.

       Passing optional	input other than the above is not an error, but
       neither is it supported.
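
       The effect of "default_modifiers", including negation by a leading
       dash, can be observed through the "modifier_asserted" method documented
       later on this page. This is a sketch only; the choice of modifiers to
       probe is an assumption made for illustration.

```perl
use strict;
use warnings;
use PPIx::Regexp;

# Mimic "use re '/smx'" followed by "no re '/x'" in an inner scope.
my $re = PPIx::Regexp->new( 'm/ foo /',
    default_modifiers => [ '/smx', '-/x' ],
) or die "Unable to parse the sample expression\n";

# Record which modifiers are in effect, explicitly or by default.
my %asserted;
foreach my $mod ( qw{ s m x i } ) {
    $asserted{$mod} = $re->modifier_asserted( $mod ) ? 1 : 0;
    printf "/%s is %s\n", $mod,
        $asserted{$mod} ? 'asserted' : 'not asserted';
}
```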

       The "new_from_cache" static method wraps "new" in a caching mechanism.
       Only one object will be generated for a given PPI::Element, no matter
       how many times this method is called. Calls after the first for a given
       PPI::Element simply return the same "PPIx::Regexp" object.

       When the	"PPIx::Regexp" object is returned from cache, the values of
       the optional arguments are ignored.

       Calls to	this method with the regular expression	in a string rather
       than a PPI::Element will	not be cached.

       Caveat: This method is provided for code	like Perl::Critic which	might
       instantiate the same object multiple times. The cache will persist
       until "flush_cache" is called.

	$re->flush_cache();	       # Remove	$re from cache
	PPIx::Regexp->flush_cache();   # Empty the cache

       This method flushes the cache used by "new_from_cache". If called as a
       static method with no arguments,	the entire cache is emptied. Otherwise
       any objects specified are removed from the cache.
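
       The caching behavior can be sketched as follows. PPI is used to obtain
       a PPI::Element, since strings are not cached; the source text being
       parsed is a made-up example.

```perl
use strict;
use warnings;
use PPI;
use PPIx::Regexp;
use Scalar::Util qw{ refaddr };

# Hypothetical source text; PPI locates the match operator in it.
my $doc = PPI::Document->new( \'my $ok = $text =~ m/foo/;' )
    or die "PPI could not parse the sample source\n";
my $elem = $doc->find_first( 'PPI::Token::Regexp::Match' )
    or die "No match operator found\n";

# Two calls for the same PPI::Element should yield one object.
my $first  = PPIx::Regexp->new_from_cache( $elem );
my $second = PPIx::Regexp->new_from_cache( $elem );
my $cached = refaddr( $first ) == refaddr( $second );
print $cached ? "second call used the cache\n"
              : "second call built a new object\n";

# Empty the whole cache; the next call has to parse again.
PPIx::Regexp->flush_cache();
my $third = PPIx::Regexp->new_from_cache( $elem );
my $fresh = refaddr( $first ) != refaddr( $third );
print $fresh ? "after the flush a new object was built\n"
             : "after the flush the old object came back\n";
```

       Long-running programs that use "new_from_cache" would typically call
       "flush_cache" once they have finished with a given document.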

	foreach	my $name ( $re->capture_names()	) {
	    print "Capture name '$name'\n";
	}

       This convenience method returns the capture names found in the regular
       expression.

       This method is equivalent to

	$self->regular_expression()->capture_names()

       except that if "$self->regular_expression()" returns "undef" (meaning
       that something went terribly wrong with the parse) this method will
       simply return.
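
       A short sketch of "capture_names" on an expression with named captures.
       The expression is an illustrative assumption, and the names are sorted
       here so that the output does not depend on the order in which the
       method returns them.

```perl
use strict;
use warnings;
use PPIx::Regexp;

# Single quotes keep the backslashes intact for the parser.
my $re = PPIx::Regexp->new( 'm/(?<year>\d{4})-(?<month>\d\d)/' )
    or die "Unable to parse the sample expression\n";

my @names = sort $re->capture_names();
print "Capture name '$_'\n" for @names;

# An expression with only numbered captures yields an empty list.
my $plain = PPIx::Regexp->new( 'm/(foo)/' )
    or die "Unable to parse the second sample expression\n";
my @none = $plain->capture_names();
printf "Numbered-only expression has %d capture name(s)\n", scalar @none;
```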

	print join("\t", PPIx::Regexp->new('s/foo/bar/')->delimiters());
	# prints '//	  //'

       When called in list context, this method	returns	either one or two
       strings,	depending on whether the parsed	expression has a replacement
       string. In the case of non-bracketed substitutions, the start delimiter
       of the replacement string is considered to be the same as its finish
       delimiter, as illustrated by the	above example.

       When called in scalar context, you get the delimiters of	the regular
       expression; that is, element 0 of the array that is returned in list
       context.

       Optionally, you can pass	an index value and the corresponding
       delimiters will be returned; index 0 represents the regular
       expression's delimiters,	and index 1 represents the replacement
       string's	delimiters, which may be undef.	For example,

	print PPIx::Regexp->new('s{foo}<bar>')->delimiters(1);
	# prints '<>'

       If the object was not initialized with a	valid regexp of	some sort, the
       results of this method are undefined.
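
       The calling conventions of "delimiters" can be compared side by side.
       This sketch collects the list-context result for a few sample
       expressions, which are assumptions chosen to cover the bracketed,
       non-bracketed and no-replacement cases.

```perl
use strict;
use warnings;
use PPIx::Regexp;

my %delims;    # expression text => array reference of delimiter strings
foreach my $expr ( 's/foo/bar/', 's{foo}<bar>', 'qr{foo}' ) {
    my $re = PPIx::Regexp->new( $expr )
        or next;
    # List context: one string per delimited part of the expression.
    $delims{$expr} = [ $re->delimiters() ];
    printf "%-12s %s\n", $expr,
        join ' ', map { "'$_'" } @{ $delims{$expr} };
}

# An explicit index selects one part: 0 is the regular expression,
# 1 is the replacement string.
my $bracketed = PPIx::Regexp->new( 's{foo}<bar>' )
    or die "Unable to parse the sample expression\n";
my $replacement_delims = $bracketed->delimiters( 1 );
print "replacement delimiters: $replacement_delims\n";
```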

       This static method returns the error string from	the most recent
       attempt to instantiate a	"PPIx::Regexp".	It will	be "undef" if the most
       recent attempt succeeded.

	print "There were ", $re->failures(), "	parse failures\n";

       This method returns the number of parse failures. This is a count of
       the number of unknown tokens plus the number of unterminated structures
       plus the	number of unmatched right brackets of any sort.
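
       Because the package tries not to blow up on ill-formed input, a caller
       has to ask whether the parse was clean. The helper below is a sketch;
       its name and messages are hypothetical, and it assumes the static
       error-string method described above is spelled "errstr", as in the
       module's own documentation.

```perl
use strict;
use warnings;
use PPIx::Regexp;

# Summarize the outcome of parsing one expression.
sub parse_report {
    my ( $expr ) = @_;
    my $re = PPIx::Regexp->new( $expr );
    if ( ! defined $re ) {
        # Instantiation failed outright; ask the class why.
        my $why = PPIx::Regexp->errstr();
        return 'could not parse: '
            . ( defined $why ? $why : 'unknown error' );
    }
    my $count = $re->failures();
    return $count ? "$count parse failure(s)" : 'clean parse';
}

# The second sample has an unterminated group.
foreach my $expr ( 'm/(foo)/', 'm/(foo/' ) {
    printf "%-10s %s\n", $expr, parse_report( $expr );
}
```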

	print "Highest used capture number ",
	    $re->max_capture_number(), "\n";

       This convenience	method returns the highest capture number used by the
       regular expression. If there are	no captures, the return	will be	0.

       This method is equivalent to

	$self->regular_expression()->max_capture_number()

       except that if "$self->regular_expression()" returns "undef" (meaning
       that something went terribly wrong with the parse) this method will
       simply return.

	my $re = PPIx::Regexp->new( 's/(foo)/${1}bar/smx' );
	print $re->modifier()->content(), "\n";
	# prints 'smx'.

       This method retrieves the modifier of the object. This comes from the
       end of the initializing string or object and will be a
       PPIx::Regexp::Token::Modifier.

       Note that this object represents	the actual modifiers present on	the
       regexp, and does	not take into account any that may have	been applied
       by default (i.e.	via the	"default_modifiers" argument to	"new()"). For
       something that takes account of default modifiers, see
       modifier_asserted(), below.

       In the event of a parse failure,	there may not be a modifier present,
       in which	case nothing is	returned.

	my $re = PPIx::Regexp->new( '/ . /',
	    default_modifiers => [ 'smx' ] );
	print $re->modifier_asserted( 'x' ) ? "yes\n" :	"no\n";
	# prints 'yes'.

       This method returns true	if the given modifier is asserted for the
       regexp, whether explicitly or by	the modifiers passed in	the
       "default_modifiers" argument.

       Starting	with version 0.036_01, if the argument is a single-character
       modifier	followed by an asterisk	(intended as a wild card character),
       the return is the number	of times that modifier appears.	In this	case
       an exception will be thrown if you specify a multi-character modifier
       (e.g.  'ee*'), or if you	specify	one of the match semantics modifiers
       (e.g.  'a*').
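
       The difference between "modifier" (what is literally written on the
       expression) and "modifier_asserted" (what is in effect) can be sketched
       as follows. The expression and the default modifiers are illustrative
       assumptions.

```perl
use strict;
use warnings;
use PPIx::Regexp;

my $re = PPIx::Regexp->new( 'qr/ foo /i',
    default_modifiers => [ 'smx' ],
) or die "Unable to parse the sample expression\n";

# Only the modifiers actually present in the text.
my $explicit = $re->modifier()->content();
print "explicit modifiers: '$explicit'\n";

# Explicit and default modifiers taken together.
my %in_effect;
foreach my $mod ( qw{ i x g } ) {
    $in_effect{$mod} = $re->modifier_asserted( $mod ) ? 1 : 0;
    printf "/%s %s in effect\n", $mod, $in_effect{$mod} ? 'is' : 'is not';
}
```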

	my $re = PPIx::Regexp->new( 's/(foo)/${1}bar/smx' );
	print $re->regular_expression()->content(), "\n";
	# prints '/(foo)/'.

       This method returns that	portion	of the object which actually
       represents a regular expression.

	my $re = PPIx::Regexp->new( 's/(foo)/${1}bar/smx' );
	print $re->replacement()->content(), "\n";
	# prints '${1}bar/'.

       This method returns that	portion	of the object which represents the
       replacement string. This	will be	"undef"	unless the regular expression
       actually	has a replacement string. Delimiters will be included, but
       there will be no beginning delimiter unless the regular expression was
       bracketed.

	my $source = $re->source();

       This method returns the object or string	that was used to instantiate
       the object.

	my $re = PPIx::Regexp->new( 's/(foo)/${1}bar/smx' );
	print $re->type()->content(), "\n";
	# prints 's'.

       This method retrieves the type of the object. This comes	from the
       beginning of the	initializing string or object, and will	be a
       PPIx::Regexp::Token::Structure whose "content" is one of	's', 'm',
       'qr', or	''.
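
       The accessors above can be combined to take an expression apart in one
       pass. The helper "dissect" and its hash keys are hypothetical names
       used only for this sketch.

```perl
use strict;
use warnings;
use PPIx::Regexp;

# Return a hash reference describing the parts of an expression,
# or nothing if it could not be parsed.
sub dissect {
    my ( $expr ) = @_;
    my $re = PPIx::Regexp->new( $expr )
        or return;
    my $type        = $re->type();
    my $regexp      = $re->regular_expression();
    my $replacement = $re->replacement();    # undef for m// and qr//
    my $modifier    = $re->modifier();
    return {
        type        => $type        ? $type->content()        : undef,
        regexp      => $regexp      ? $regexp->content()      : undef,
        replacement => $replacement ? $replacement->content() : undef,
        modifiers   => $modifier    ? $modifier->content()    : undef,
        max_capture => scalar $re->max_capture_number(),
    };
}

my $parts = dissect( 's/(foo)/${1}bar/smx' )
    or die "Unable to parse the sample expression\n";
foreach my $key ( sort keys %{ $parts } ) {
    my $value = $parts->{$key};
    printf "%-12s %s\n", $key, defined $value ? $value : '(none)';
}
```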

       By the nature of	this module, it	is never going to get everything
       right.  Many of the known problem areas involve interpolations one way
       or another.

   Ambiguous Syntax
       Perl's regular expressions contain cases	where the syntax is ambiguous.
       A particularly egregious	example	is an interpolation followed by	square
       or curly	brackets, for example $foo[...]. There is nothing in the
       syntax to say whether the programmer wanted to interpolate an element
       of array	@foo, or whether he wanted to interpolate scalar $foo, and
       then follow that	interpolation by a character class.

       The perlop documentation	notes that in this case	what Perl does is to
       guess. That is, it employs various heuristics on	the code to try	to
       figure out what the programmer wanted. These heuristics are documented
       as being	undocumented (!) and subject to	change without notice.

       Given this situation, this module's chances of duplicating every	Perl
       version's interpretation	of every regular expression are	pretty much
       nil.  What it does now is to assume that	square brackets	containing
       only an integer or an interpolation represent a subscript; otherwise
       they represent a	character class. Similarly, curly brackets containing
       only a bareword or an interpolation are a subscript; otherwise they
       represent a quantifier.
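
       The bracket heuristic can be observed by asking how much text the
       parser assigns to the interpolation. This is a sketch; the sample
       expressions and the helper name are assumptions, and the results
       reflect this module's guess, which may differ from what a particular
       Perl does.

```perl
use strict;
use warnings;
use PPIx::Regexp;

# Return the text the parser treats as the first interpolation.
sub interpolated_text {
    my ( $expr ) = @_;
    my $re = PPIx::Regexp->new( $expr )
        or return;
    my $interp = $re->find_first( 'PPIx::Regexp::Token::Interpolation' )
        or return;
    return $interp->content();
}

# Single quotes keep Perl from interpolating $foo itself.
foreach my $expr (
    'm/$foo[1]/',      # integer in brackets: taken as a subscript
    'm/$foo[a-z]/',    # anything else: taken as a character class
    'm/$foo{bar}/',    # bareword in curlies: taken as a subscript
    'm/$foo{2,3}/',    # anything else: taken as a quantifier
) {
    my $text = interpolated_text( $expr );
    printf "%-14s interpolates %s\n", $expr,
        defined $text ? $text : '(nothing found)';
}
```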

   Changes in Syntax
       Sometimes the introduction of new syntax	changes	the way	a regular
       expression is parsed. For example, the "\v" character class was
       introduced in Perl 5.9.5. But it did not represent a syntax error prior
       to that version of Perl; it was simply parsed as "v". So

	$ perl -le 'print "v" =~ m/\v/ ? "yes" : "no"'

       prints "yes" under Perl 5.8.9, but "no" under 5.10.0. "PPIx::Regexp"
       generally assumes the more modern parse in cases	like this.

   Static Parsing
       It is well known that Perl cannot be statically parsed. That is, you
       cannot completely parse a piece of Perl code without executing that
       same code.

       Nevertheless, this class	is trying to statically	parse regular
       expressions. The	main problem with this is that there is	no way to know
       what is being interpolated into the regular expression by an
       interpolated variable. This is a	problem	because	the interpolated value
       can change the interpretation of	adjacent elements.

       This module deals with this by making assumptions about what is in an
       interpolated variable. These assumptions	will not be enumerated here,
       but in general the principle is to assume the interpolated value does
       not change the interpretation of	the regular expression.	For example,

	my $foo	= 'a-z]';
	my $re = qr{[$foo};

       is fine with the	Perl interpreter, but will confuse the dickens out of
       this module. Similarly and more usefully, something like

	my $mods = 'i';
	my $re = qr{(?$mods:foo)};

       or maybe

	my $mods = 'i';
	my $re = qr{(?$mods)$foo};

       probably	sets a modifier	of some	sort, and that is how this module
       interprets it. If the interpolation is not about	modifiers, this	module
       will get	it wrong. Another such semi-benign example is

	my $foo	= $] >=	5.010 ?	'?<foo>' : '';
	my $re = qr{($foo\w+)};

       which will parse, but this module will never realize that it might be
       looking at a named capture.
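
       The last case can be made concrete by parsing the expression text on
       its own. This sketch shows that the static parse sees an ordinary
       numbered capture and no capture names, whatever $foo might hold when
       the program actually runs.

```perl
use strict;
use warnings;
use PPIx::Regexp;

# Single quotes: the parser sees the literal text '$foo', just as it
# would when handed source code rather than a running program.
my $re = PPIx::Regexp->new( 'qr{($foo\w+)}' )
    or die "Unable to parse the sample expression\n";

my @names   = $re->capture_names();
my $highest = $re->max_capture_number();

printf "capture names seen:     %d\n", scalar @names;
printf "highest capture number: %d\n", $highest;
```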

   Non-Standard	Syntax
       There are modules out there that	alter the syntax of Perl. If the
       syntax of a regular expression is altered, this module has no way to
       understand that it has been altered, much less to adapt to the
       alteration. The following modules are known to cause problems:

       Acme::PerlML, which renders Perl	as XML.

       Data::PostfixDeref, which causes	Perl to	interpret suffixed empty
       brackets	as dereferencing the thing they	suffix.

       Filter::Trigraph, which recognizes ANSI C trigraphs, allowing Perl to
       be written in the ISO 646 character set.

       Perl6::Pugs. Enough said.

       Perl6::Rules, which back-ports some of the Perl 6 regular expression
       syntax to Perl 5.

       Regexp::Extended, which extends regular expressions in various ways,
       some of which seem to conflict with Perl	5.010.

       Regexp::Parser, which parses a bare regular expression (without
       enclosing "qr{}", "m//", or whatever) and uses a different navigation
       model.

       Support is by the author. Please	file bug reports at
       <>, or	in electronic mail to the author.

       Thomas R. Wyant,	III wyant at cpan dot org

       Copyright (C) 2009-2014 by Thomas R. Wyant, III

       This program is free software; you can redistribute it and/or modify it
       under the same terms as Perl 5.10.0. For	more details, see the full
       text of the licenses in the directory LICENSES.

       This program is distributed in the hope that it will be useful, but
       without any warranty; without even the implied warranty of
       merchantability or fitness for a	particular purpose.

perl v5.24.1			  2014-11-12		       PPIx::Regexp(3)

