FreeBSD Manual Pages

Rules(3)	      User Contributed Perl Documentation	      Rules(3)

NAME
       Perl6::Rules - Implements (most of) the Perl 6 regex syntax

SYNOPSIS
	   # Perl 5 code...

	   use Perl6::Rules;

	   grammar HTML	{
	       rule doc	 :iw { \Q[<HTML>]  <?head>  <?body>  \Q[</HTML>] }
               rule head :iw { \Q[<HEAD>]  <?head_tag>+  \Q[</HEAD>] }
	       # etc.
	   }

	   $text =~ s:globally:2nd/ <?HTML.doc>	/$0{doc}{head}/;

	   rule	subj  {	<noun> }
	   rule	obj   {	<noun> }
	   rule	noun  {	time | flies | arrow }
	   rule	verb  {	flies |	like | time }
	   rule	adj   {	time }
	   rule	art   {	an? }
	   rule	prep  {	like }

	   "time flies like an arrow" =~
	       m:words:exhaustive/^ [ <?adj>  <?subj> <?verb> <?art> <?obj>
				    | <?subj> <?verb> <?prep> <?art> <?noun>
				    | <?verb> <?obj>  <?prep> <?art> <?noun>
				    ]
				 /;

           print "Found interpretation:\n", $_->dump
	       for @$0;

	   $dna_seq =~ m:overlap{ A <[CT]> <[AG]><3,7> <before:	C> };

           print "Found sequence: $_ starting at ", $_->pos, "\n"
	       for @$0;

	   # etc.

DESCRIPTION
       This module implements a	close simulation of the	Perl 6 rule and
       grammar constructs, translating them back to Perl 5 regexes via a
       source filter.  (And hence suffers from all the usual limitations of a
       source filter, including the ability to translate complex code
       spectacularly wrongly.)

       See LIMITATIONS for a summary of	those features that are	not currently
       supported.

       When it is "use"'d, the module expects that any subsequent match
       ("m/.../") or substitution ("s/.../.../") in the	rest of	the source
       file will be in Perl 6 syntax. It then translates every such pattern
       back to the equivalent Perl 5 syntax (where possible).

       When one	of these translated matches/substitutions is executed, it
       generates a "match object", which is available as $0 (and so, if	you
       use Perl6::Rules, the program name is no	longer available as $0).  This
       match object can	be treated as a	boolean	(in which case it returns true
       if the match succeeded, and false if it did not), or as a string	(in
       which case it returns the complete substring that the match matched),
       or as an	array (in which	case it	contains all of	the numbered captures
       -- $1, $2, etc. -- from the successful match), or as a hash (in which
       case it contains	all of the internal variables created during the
       match).

   Atoms
       Except for the special characters:

	   #  $	 @  %  ^  &  *	+  ?  (	 )  {  }  [  ]	<  >  .	 |  \

       whitespace, and certain special character sequences (see	below),	any
       character in a rule matches itself.

       Special characters can be made to match themselves by backslashing
       them:

	   \#  \$  \@  \%  \^  \&  \*  \+  \?  \(  \)  \{  \}  \[  \]  \<  \>  \.  \|  \\

       or by using one of the Perl 6 quoting constructs (such as "\Q[...]" or
       "<'...'>", described under "Interpolated literal strings" below).

   Quantifiers
       Quantifiers control how often a particular atom matches.	Without	a
       quantifier an atom must match exactly once. The Perl 6 quantifiers are:

	   atom?	   Match the atom zero or one times
			   preferring to match once, if	possible

	   atom??	   Match the atom zero or one times
			   preferring to match zero times, if possible

	   atom*	   Match the atom zero or more times
			   preferring to match as many times as	possible

	   atom*?	   Match the atom zero or more times
			   preferring to match as few times as possible

	   atom+	   Match the atom one or more times
			   preferring to match as many times as	possible

	   atom+?	   Match the atom one or more times
			   preferring to match as few times as possible

	   atom<7>	   Match the atom exactly 7 times
			   (Any	positive integer can be	used)

	   atom<7,11>	   Match the atom between 7 and	11 times
			   preferring to match as many times as	possible.
			   (Any	positive integers can be used)

	   atom<7,11>?	   Match the atom between 7 and	11 times
			   preferring to match as few times as possible.
			   (Any	positive integers can be used)

	   atom<4,>	   Match the atom 4 or more times
			   preferring to match as many times as	possible.
                           (Any positive integer can be used)

	   atom<4,>?	   Match the atom 4 or more times
			   preferring to match as few times as possible.
                           (Any positive integer can be used)

       Note: Perl 6 also allows	the numbers in these ranges to be specified as
       interpolated variables, but due to limitations of the Perl 5 regex
       engine, the Perl6::Rules	module does not	currently support this
       feature.
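       For comparison, these quantifiers correspond roughly to Perl 5's own
       brace quantifiers. The sketch below is plain Perl 5. It does not load
       Perl6::Rules and is not the module's actual translation. It applies
       each form to a string of twelve "x" characters and reports how many
       characters each one matched.

```perl
use strict;
use warnings;

# Plain Perl 5 sketch; Perl6::Rules is not used here.
my $s = 'x' x 12;    # twelve "x" characters

my ($exact)  = $s =~ /^(x{7})/;       # atom<7>      exactly 7
my ($greedy) = $s =~ /^(x{7,11})/;    # atom<7,11>   7..11, as many as possible
my ($lazy)   = $s =~ /^(x{7,11}?)/;   # atom<7,11>?  7..11, as few as possible
my ($open)   = $s =~ /^(x{4,})/;      # atom<4,>     4 or more, as many as possible

print join(' ', map { length } $exact, $greedy, $lazy, $open), "\n";    # 7 11 7 12
```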

   Alternatives
       The "|" operator	separates two alternative subpatterns. The resulting
       pattern matches if either of the	alternatives matches:

	   $animal =~ m/ cat | dog | fish | bird /;

       Note: Perl 6 also provides an "&" operator, but this is not yet
       supported by Perl6::Rules.

   Special metasequences
       A dot (".") matches any character at all	(including a newline).

       There are numerous backslashed metasequences that match a particular
       single character, usually belonging to a	particular class of
       characters:

	   \d	Match a	single digit
	   \D	Match any single character except a digit
	   \e	Match a	single escape character
	   \E	Match any single character except an escape character
	   \f	Match a	single formfeed
	   \F	Match any single character except a formfeed
	   \h	Match a	single horizontal whitespace
	   \H	Match any single character except a horizontal whitespace
	   \n	Match a	single newline
	   \N	Match any single character except a newline
	   \r	Match a	single carriage	return
	   \R	Match any single character except a carriage return
	   \s	Match a	single whitespace character
	   \S	Match any single character except a whitespace
	   \t	Match a	single tab character
	   \T	Match any single character except a tab	character
	   \v	Match a	single vertical	whitespace
	   \V	Match any single character except a vertical whitespace
	   \w	Match a	single "word" character	(alpha,	digit, or underscore)
	   \W	Match any single character except a "word" character
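       The dot and "\N" have Perl 5 counterparts, with one difference: a Perl
       5 dot skips newlines unless the /s modifier is given. The plain Perl 5
       sketch below shows the correspondence. It is not output produced by
       Perl6::Rules. Some uppercase escapes in the table mean something else
       in Perl 5. For example, Perl 5's "\E" ends a "\Q" or "\U" sequence, and
       its "\R" matches a generic line break.

```perl
use strict;
use warnings;

my $text = "a\nb";

# Perl 5 '.' skips newlines by default; /s makes it behave like Perl 6's '.'
my $plain_dot  = $text =~ /a.b/  ? 1 : 0;    # 0
my $single_dot = $text =~ /a.b/s ? 1 : 0;    # 1

# Perl 6's \N ("anything but a newline") corresponds to [^\n] in Perl 5
my $not_nl = $text =~ /a[^\n]b/ ? 1 : 0;     # 0

# \h (horizontal whitespace) is available in Perl 5.10 and later
my $tabbed = "a\tb" =~ /a\hb/ ? 1 : 0;       # 1

print "$plain_dot $single_dot $not_nl $tabbed\n";    # 0 1 0 1
```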

   Specifying characters by name or code
       Any character can be specified by (Unicode) name, using the "\c"
       escape.	For example:

	   \c[LF]
	   \c[ESC]
	   \c[CARRIAGE RETURN]
	   \c[ARABIC LIGATURE TEH WITH MEEM WITH JEEM INITIAL FORM]
	   \c[HEBREW POINT HIRIQ]
	   \c[LOWER HALF INVERSE WHITE CIRCLE]

       Two or more such named characters can be specified in the same set of
       square brackets, separated by a semicolon:

	   \c[CR;LF]
           \c[ESC;LATIN CAPITAL LETTER Q]

       The "\C"	escape produces	the complement of the character:

	   \C[LF]		   Any character except	LINE FEED
	   \C[ESC]		   Any character except	ESCAPE
	   \C[CARRIAGE RETURN]	   Any character except	CARRIAGE RETURN

       The square brackets are always required for named characters.

       Characters and character	sequences can also be specified	by hexadecimal
       or octal	Unicode	code:

	   \x[A]	   LINE	FEED
	   \0[12]	   LINE	FEED
	   \x[1EA2]	   LATIN CAPITAL LETTER	A WITH HOOK ABOVE
	   \0[17242]	   LATIN CAPITAL LETTER	A WITH HOOK ABOVE
	   \x[1EA2;A]	   LATIN CAPITAL LETTER	A WITH HOOK ABOVE; LINE	FEED
	   \0[17242;12]	   LATIN CAPITAL LETTER	A WITH HOOK ABOVE; LINE	FEED

       Hexadecimal codes may also be complemented:

	   \X[A]	   Any character except	LINE FEED
	   \X[1EA2]	   Any character except	LATIN CAPITAL LETTER A WITH HOOK ABOVE

       For single coded	characters, the	square brackets	are not	required
       (except to avoid	ambiguity):

	   \xA		   LINE	FEED
	   \012		   LINE	FEED
	   \x1EA2	   LATIN CAPITAL LETTER	A WITH HOOK ABOVE
	   \017242	   LATIN CAPITAL LETTER	A WITH HOOK ABOVE
	   \XA		   Any character except	LINE FEED
	   \X1EA2	   Any character except	LATIN CAPITAL LETTER A WITH HOOK ABOVE
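       In Perl 5 string and regex literals, the same characters can be written
       as "\N{NAME}", "\x{...}", or (from Perl 5.14 on) "\o{...}". The plain
       Perl 5 sketch below assumes Perl 5.14 or later. It uses the LATIN
       CAPITAL LETTER A WITH HOOK ABOVE example from above, plus a negated
       character class for the complemented form.

```perl
use strict;
use warnings;
use charnames ':full';    # lets \N{NAME} work on Perls older than 5.16

# Three spellings of LATIN CAPITAL LETTER A WITH HOOK ABOVE (U+1EA2)
my $by_name  = "\N{LATIN CAPITAL LETTER A WITH HOOK ABOVE}";
my $by_hex   = "\x{1EA2}";
my $by_octal = "\o{17242}";    # \o{...} needs Perl 5.14 or later

# Complemented form, like \X[A]: any character except LINE FEED
my $not_lf = qr/[^\x{A}]/;

my $same        = ($by_name eq $by_hex && $by_hex eq $by_octal) ? 1 : 0;
my $lf_excluded = ("\n" =~ $not_lf) ? 0 : 1;
printf "U+%04X same=%d lf_excluded=%d\n", ord $by_name, $same, $lf_excluded;
```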

   Anchors and assertions
       Anchors and assertions do not match any characters in the string, but
       instead test whether a particular condition is true, and	cause the
       match to	fail if	it is not.

       Perl6::Rules supports the following Perl	6 rule assertions:

	    ^	Currently matching at the start	of the entire string
	   ^^	Currently matching at the start	of a line within the string
	    $	Currently matching at the end of the entire string
	   $$	Currently matching at the end of a line	within the string

       Note that neither "$" nor "$$" allows for an optional newline before
       the "end" in question. Use "\n?$" and "\n?$$" if you require those more
       forgiving semantics.
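       When translating by hand, Perl 6's "^" corresponds to Perl 5's "\A"
       and Perl 6's "$" to Perl 5's "\z". Perl 6's "^^" and "$$" behave
       roughly like "^" and "$" under Perl 5's /m modifier. This plain Perl 5
       sketch (not the module's output) shows the trailing-newline difference
       described in the note above.

```perl
use strict;
use warnings;

my $line = "abc\n";

my $p5_dollar = $line =~ /abc$/     ? 1 : 0;   # Perl 5 $ allows a trailing "\n"
my $p6_dollar = $line =~ /abc\z/    ? 1 : 0;   # \z: true end of string, like Perl 6 $
my $forgiving = $line =~ /abc\n?\z/ ? 1 : 0;   # the "\n?$" idiom from the note

# Start of every line (roughly Perl 6 ^^): Perl 5 ^ under /m
my @starts = "one\ntwo\nthree" =~ /^(\w)/mg;

print "$p5_dollar $p6_dollar $forgiving @starts\n";    # 1 0 1 o t t
```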

	   <before: subpat>    The current match position is immediately
			       before the specified subpattern
	   <!before: subpat>   The current match position is not immediately
			       before the specified subpattern

	   <after: subpat>     The current match position is immediately
			       after the specified subpattern
	   <!after: subpat>    The current match position is not immediately
			       after the specified subpattern

	   \b	The current match position is in the middle of a \w\W or \W\w
		sequence (i.e. <after:\w><before:\W> | <after:\W><before:\w> )
	   \B	The current match position is in the middle of a \w\w or \W\W
		sequence (i.e. <after:\w><before:\w> | <after:\W><before:\W> )

       Note: Due to limitations	in the Perl 5 regex engine, the	"<after:...>"
       assertion requires that the subpattern always match a substring of
       fixed length.
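       These four assertions correspond to Perl 5's lookahead and lookbehind
       constructs. In the plain Perl 5 sketch below, each line uses an
       arbitrary sample string and counts the occurrences of "cat" that
       satisfy one assertion. Perl 5 has traditionally required a lookbehind
       subpattern to have a fixed length. Later versions relax this only
       experimentally, and only up to a bounded length.

```perl
use strict;
use warnings;

my $s = "cat catalog bobcat";

my $before     = () = $s =~ /cat(?=alog)/g;    # <before: alog>   1
my $not_before = () = $s =~ /cat(?!\w)/g;      # <!before: \w>    2
my $after      = () = $s =~ /(?<=bob)cat/g;    # <after: bob>     1
my $not_after  = () = $s =~ /(?<!\w)cat/g;     # <!after: \w>     2

print "$before $not_before $after $not_after\n";    # 1 2 1 2
```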

   Grouping
       To group	a sequence of characters and have them treated as an atom, use
       square brackets:

	   $status =~ m/ [in]?valid /;

       Square brackets group, but do not capture.

   Capturing
       To group	a sequence of characters and have the matching substring
       captured	as well, use parentheses instead of square brackets. Each
       parenthesis captures into a successive "numeric"	variable:

           $name =~ m/ (Mr|Mrs|Ms|Dr|Prof|Rev) \s+ (.+) /;

           print "Title: $1\n";
           print "Name:  $2\n";
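       In plain Perl 5, the same two ideas are written "(?:...)" and "(...)".
       The /x modifier gives Perl 5 the whitespace indifference that Perl 6
       rules have by default. The sketch below uses arbitrary sample strings
       and is not the module's translation.

```perl
use strict;
use warnings;

# (?:in)? groups without capturing, like [in]? in a Perl 6 rule
my $valid = "invalid" =~ / ^ (?:in)? valid $ /x ? 1 : 0;

# Parentheses capture into $1, $2, ...; /x makes the spaces insignificant
my ($title, $rest) = "Dr Who" =~ / (Mr|Mrs|Ms|Dr|Prof|Rev) \s+ (.+) /x;

print "valid=$valid\nTitle: $title\nName:  $rest\n";
```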

   Whitespace indifference
       Whitespace is not significant in a rule and is usually ignored when
       matching a pattern. For example, this:

	   m/ <ident> =	\N+ /;

       matches exactly the same	set of strings as:

	   m/<ident>=\N+/;

       To match	actual whitespace in a string, use the appropriate backslash
       escape:

	   m/ <ident> \h* = \s*	\N+/;

       or predefined named rules:

	   m/ <ident> <ws> = <sp>+ \N+/;

   Making whitespace meaningful
       Just because whitespace is not significant in a rule doesn't mean it's
       not significant in the string that a rule is matching. For example:

	   $str	= "module_name = Perl6::Rules";

	   $str	=~ m/ <ident> =	\N+ /;

       will not	match, because there is	nothing	in the rule to match the
       whitespace in the string	between	"module_name" and "=".

       However,	you can	tell a rule to ignore whitespace in the	string,	by
       specifying the ":w" or ":words" modifier:

	   $str	= "module_name = Perl6::Rules";

	   $str	=~ m:words/ <ident> = \N+ /;

       This modifier causes each whitespace sequence in	the rule to be
       automagically replaced by a "\s*" or "\s+" subpattern. That is:

	   m:words/ next cmd  =	\h* <condition>/

       is the same as:

	   m/ \s* next \s+ cmd \s* = \h* <condition>/

       If the whitespace is between two	"word" atoms --	as it is between
       "next" and "cmd"	in the above example --	then a "\s+" (mandatory
       whitespace) is inserted.	If the whitespace is between a "word" and a
       "non-word" atom -- as it	is between "cmd" and "=" above -- then a "\s*"
       (optional whitespace) is	inserted. If the atom on either	side of	the
       whitespace would	itself match whitespace	-- as for "=" and "\h*", and
       "\h*" and "<condition>" -- then no extra	whitespace matching is
       inserted.

       The overall effect is that, under ":words", any whitespace in the rule
       matches any whitespace in the string, in	the most reasonable way
       possible.
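       To make the \s+ versus \s* rule concrete, here is a plain Perl 5
       sketch of a hypothetical helper, words_pattern(). It is not part of
       Perl6::Rules. It joins literal tokens the way ":words" is described as
       doing above. Because it handles only literal tokens, it leaves out the
       third case above (atoms that themselves match whitespace).

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Perl6::Rules: builds a Perl 5 pattern
# from literal tokens, inserting \s+ between two "word" tokens and \s*
# everywhere else (including the start). Subrules are not handled.
sub words_pattern {
    my @tokens = @_;
    my $pat = '\s*';
    for my $i (0 .. $#tokens) {
        if ($i > 0) {
            my $both_words = $tokens[$i - 1] =~ /\w\z/ && $tokens[$i] =~ /\A\w/;
            $pat .= $both_words ? '\s+' : '\s*';
        }
        $pat .= quotemeta $tokens[$i];
    }
    return qr/$pat/;
}

my $re = words_pattern('next', 'cmd', '=');    # \s*next\s+cmd\s*\=
my $spaced = "  next   cmd= x" =~ $re ? 1 : 0;  # 1: spacing is flexible
my $joined = "nextcmd = x"     =~ $re ? 1 : 0;  # 0: "next cmd" needs a space
print "$spaced $joined\n";
```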

   Comments
       Any unbackslashed "#" character in a pattern starts a comment which
       runs to the end of the current line.

	   m/ <ident>  # name of environment variable
	      \h*      # optional whitespace, but stay on the same line
	      =	       # indicates that	the variable is	being set
	      \s*      # optional whitespace, can be on	separate lines
	      \N+      # everything else up to the end-of-line is the value
	    /;

   Evaluated substitutions
       When performing a substitution it is possible to	interpolate code into
       the replacement string using the	Perl 6 "$(...)"	or "@(...)"
       interpolators:

	   s/ (<sentence>) /On a dit: $( traduisez($1) )/
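       Perl 5 achieves a similar effect with the /e modifier, which evaluates
       the replacement as code. In this plain Perl 5 sketch, traduisez() is a
       hypothetical stand-in backed by a tiny word list.

```perl
use strict;
use warnings;

# Hypothetical stand-in for traduisez(): translates words found in a
# tiny word list and leaves any other word unchanged.
my %french = (hello => 'bonjour', world => 'monde');
sub traduisez { my ($word) = @_; return $french{$word} // $word }

# /e evaluates the replacement as Perl code for each match (/g: every match)
(my $text = "hello world") =~ s/(\w+)/traduisez($1)/ge;
print "On a dit: $text\n";    # On a dit: bonjour monde
```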

       Note: Perl6::Rules currently only allows	substitutions to have a	single
       "$(...)"	or "@(...)" in the replacement string.

   Repeated matches and	substitutions
       To cause	a match	or substitution	to match or substitute as many times
       as possible, specify the	":g" or	":globally" modifier before the
       pattern:

	   $str	=~ s:g{foo}{bar};	   # s/foo/bar/	as many	times as possible
	   $str	=~ s:globally{foo}{bar};   # Ditto

       To cause	a match	or substitution	to match or substitute a particular
       number of times,	specify	the ":x(...)" modifier:

	   $str	=~ s:x(2){foo}{bar};	   # s/foo/bar/	only the first two times
					   # "foo" is found

	   $str	=~ s:x(7){foo}{bar};	   # s/foo/bar/	only the first seven times
					   # "foo" is found

       The repetition count can	be a variable:

	   for my $n (2..7) {
	       $str[$n]	=~ s:x($n){foo}{bar};  # s/foo/bar/ only the first $n times
					       # "foo" is found
	   }

       If the repetition count is a constant, the ":x(...)" modifier can also
       be written as a suffix:

	   $str	=~ s:2x{foo}{bar};     # s/foo/bar/ only the first two times
				       # "foo" is found

	   $str	=~ s:7x{foo}{bar};     # s/foo/bar/ only the first seven times
				       # "foo" is found
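       For comparison, here is a plain Perl 5 sketch of what
       "s:x(2){foo}{bar}" is described as doing above. A counter in an /e
       replacement changes only the first $n occurrences. The helper name
       replace_first_n() is hypothetical. The sketch does not model what
       Perl 6 does when there are fewer than $n occurrences.

```perl
use strict;
use warnings;

# Hypothetical helper: replace only the first $n occurrences of "foo".
sub replace_first_n {
    my ($str, $n) = @_;
    my $done = 0;
    # Each match is replaced by "bar" while the counter is below $n,
    # and by itself ("foo") afterwards.
    $str =~ s/foo/$done++ < $n ? 'bar' : 'foo'/ge;
    return $str;
}

my $two = replace_first_n('foo foo foo foo', 2);
print "$two\n";    # bar bar foo foo
```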

       If you only want the 2nd (or 7th, or $n-th, etc.) occurrence changed,
       you can use the ":nth(...)" modifier instead:

           $str =~ s:nth(2){foo}{bar};     # s/foo/bar/ only for the second occurrence
                                           # of "foo" in the string

           $str =~ s:nth(7){foo}{bar};     # s/foo/bar/ only for the seventh occurrence
                                           # of "foo" in the string

           $str =~ s:nth($ord){foo}{bar};  # s/foo/bar/ only for the $ord-th occurrence
                                           # of "foo" in the string

       If the ordinal number is	a constant, the	":nth(...)" modifier can also
       be written as a suffix:

           $str =~ s:2nd{foo}{bar};        # s/foo/bar/ only for the second occurrence
                                           # of "foo" in the string

           $str =~ s:7th{foo}{bar};        # s/foo/bar/ only for the seventh occurrence
                                           # of "foo" in the string

       You can also combine ":globally" with an ordinal modifier. For example,
       to replace every third occurrence of "foo" with "bar":

           $str =~ s:globally:3rd{foo}{bar};
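       The same counter trick gives plain Perl 5 sketches of ":nth(2)" and
       ":globally:3rd". Like the other sketches, this is an illustration and
       not what Perl6::Rules generates.

```perl
use strict;
use warnings;

# Like s:nth(2){foo}{bar}: replace only the 2nd occurrence
my $count = 0;
(my $second = 'foo foo foo foo') =~ s/foo/++$count == 2 ? 'bar' : 'foo'/ge;

# Like s:globally:3rd{foo}{bar}: replace every 3rd occurrence
$count = 0;
(my $every3 = 'foo ' x 7) =~ s/foo/++$count % 3 == 0 ? 'bar' : 'foo'/ge;

print "$second\n$every3\n";
```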

   Variations on global	matching
       Rules that match	":globally" do so by matching once, then restarting
       their search at the first character after the end of the	previous
       match. But there	are (at	least) two other alternative restart
       strategies for global matching, both of which Perl 6 (and Perl6::Rules)
       supports.

       Matching	":globally" will never find overlapping	matches. For example:

	   $dna	= "ACGTAGTCATGACGTACCA";

	   $dna	=~ m:globally{ A [ACGT]* T };

       will only match:

	   "ACGTAGTCATGACGT"

       after which it will try again on	the remainder of the string ("ACCA")
       and fail.

       But if you actually wanted overlapping matches from every possible
       start position:

	   "ACGTAGTCATGACGT"
	       "AGTCATGACGT"
		   "ATGACGT"
		      "ACGT"

       then you	need to	specify	":o" or	":overlap", instead of ":globally":

	   $dna	=~ m:overlap{ A	[ACGT]*	T };

       This works just like ":globally", except	that, instead of restarting
       the search from the first character after the end of the	previous
       match, it restarts the search from the first character after the	start
       of the previous match. Hence it will only ever find one match from any
       given starting position in the string, but it will find matches from
       every possible starting position, including those that overlap.

       Even that may not be enough. Rather than	one match at every starting
       position, you may require every possible	match at every starting
       position:

	   "ACGTAGTCATGACGT"
	   "ACGTAGTCAT"
	   "ACGTAGT"
	   "ACGT"
	       "AGTCATGACGT"
	       "AGTCAT"
	       "AGT"
		   "ATGACGT"
		   "AT"
		      "ACGT"

       To match	in this	way, use the ":e" or ":exhaustive" modifier:

	   $dna	=~ m:exhaustive{ A [ACGT]* T };

       Note that, when either ":overlap" or ":exhaustive" is specified, the
       match result returned in	$0 changes in structure. For a non-overlapping
       match $0	consists of:

	    $0	   # Complete substring	matched
	   @$0	   # Unnamed captures: ($0, $1,	$2, ...)
	   %$0	   # Named captures

       For an overlapping/exhaustive match, $0 consists	of:

	    $0	   # undef
	   @$0	   # The complete $0 of	each successive	overlapping match
	   %$0	   # Empty hash
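       Outside Perl6::Rules, both strategies can be approximated in plain
       Perl 5 (5.10 or later). In this sketch, a capture inside a zero-width
       lookahead yields one greedy match per start position. For the second
       strategy, a "(?{ ... })" block followed by "(*FAIL)" records each match
       and then forces the engine to backtrack to the next one. Run on the
       DNA string above, it should reproduce the two lists shown earlier.

```perl
use strict;
use warnings;

my $dna = "ACGTAGTCATGACGTACCA";

# Like :overlap - one greedy match per start position. The lookahead is
# zero-width, so //g advances one character between matches.
my @overlap = $dna =~ /(?=(A[ACGT]*T))/g;

# Like :exhaustive - every match at every start position. Record the most
# recently closed capture ($^N), then fail so the engine backtracks.
my @exhaustive;
$dna =~ /(A[ACGT]*T)(?{ push @exhaustive, $^N })(*FAIL)/;

print "overlap:    @overlap\n";
print "exhaustive: @exhaustive\n";
```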

   Ignoring case
       If you use the ":i" or ":ignorecase" modifier, the match	ignores	upper
       and lower case distinctions:

	   $str	=~ m:i/perl/;	   # Match "Perl" or "perl" or "pErL", etc.

       The ":i"	marker can also	be placed inside a rule, to turn off case
       sensitivity in only part	of the rule:

	   $title =~ m/The <sp>	[:i journal <sp> of <sp> the ] <sp> ACM	/;
	   #
	   #   match: The Journal Of The ACM
	   #	  or: The journal of the ACM
	   # but not: The journal of the acm
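       In Perl 5 the inline form is written "(?i:...)". Here is a plain Perl 5
       sketch of the journal-title example. Under the /x modifier, "[ ]"
       stands in for "<sp>" (a single space).

```perl
use strict;
use warnings;

# (?i:...) turns off case sensitivity for part of a Perl 5 pattern
my $re = qr/The [ ] (?i: journal [ ] of [ ] the ) [ ] ACM/x;

my %result;
for my $title ("The Journal Of The ACM", "The journal of the ACM",
               "The journal of the acm") {
    $result{$title} = $title =~ $re ? 1 : 0;
    print "$title: ", ($result{$title} ? "match" : "no match"), "\n";
}
```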

   Backtracking	control
       In Perl 6 a single colon	is ignored when	matching (or, in other words,
       it matches zero characters).

       However,	should the pattern subsequently	fail to	match and backtrack
       over the	single colon, it will not retry	the preceding atom. So if you
       write:

	   $str	=~ m:words/ \( <expr>  [ , <expr> ]* :	\) /

       and the match fails to find the closing parenthesis (and hence starts
       backtracking), it will not attempt to rematch "[ , <expr> ]*" with one
       fewer repetition, but will continue backtracking and ultimately fail.
       This is a useful optimization, since a match with one fewer comma'd
       expression still wouldn't have a parenthesis after it, so trying it
       would be a waste of time.

       Note: Due to the	opaque nature of backtracking in the Perl 5 regex
       engine, Perl6::Rules cannot efficiently implement the "higher level"
       backtracking control features: "::", ":::", "commit", and "cut".	So
       these constructs	are not	currently supported.

   Starting position
       Normally	a rule attempts	to match from the start	of a string. But you
       can tell the rule to match from the current "pos()" of the string by
       specifying the ":c" (or ":cont")	modifier:

	   $str	=~ m:c/	pattern	/  # start where the previous match on $str finished
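       The nearest Perl 5 idiom is a //gc match anchored with "\G". The /g
       modifier makes the match start at pos(), and /c leaves pos() unchanged
       when a match fails. The sketch below uses this idiom to split an
       arbitrary sample string into tokens. It illustrates the Perl 5
       mechanism only, not how Perl6::Rules implements ":c".

```perl
use strict;
use warnings;

my $str = "let x = 42;";
my @tokens;
pos($str) = 0;
while (1) {
    if    ($str =~ /\G\s+/gc)   { next }                     # skip whitespace
    elsif ($str =~ /\G(\w+)/gc) { push @tokens, $1; next }   # word or number
    elsif ($str =~ /\G(\S)/gc)  { push @tokens, $1; next }   # any other character
    last;                                                    # nothing left to match
}
print join('|', @tokens), "\n";    # let|x|=|42|;
```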

   Code	blocks
       You can place a Perl code block inside a	rule. It will be executed when
       the rule	reaches	that point in its matching. Code execution does	not
       usually affect the match; it is typically only used for side-effects:

	   m/ (\S+) { warn "string not blank..."; $text=$1; }
	       \s+  { warn "...but does	contain	whitespace" }
	    /

       Note that variables accessed within a code block	(or indeed anywhere
       else inside a Perl 6 rule) must be accessed in Perl 6 syntax. So, this:

	   m:g/	(\S+) {	$::found{$1}++ } /;

       is equivalent to	the Perl 5:

	      /	(\S+) (?{ $::found->{$^N}++ }) /g;

       and to increment	an entry in %::found we'd need the correct Perl	6
       syntax:

	   m:g/	(\S+) {	%::found{$1}++ } /;

       A code block can	be made	to cause a match to fail, if it	calls the
       "fail" function (which is automatically exported	from Perl6::Rules):

	   $count =~ / (\d+): {$1<256 or fail} /

       By the way, that	"no backtracking" colon	is critical there. If $count
       contained 1000, then $1 would be	"1000",	the code would execute "fail"
       and the rule would backtrack. The colon prevents	the "\d+" pattern from
       then rematching just "100" instead of the full "1000", which would
       erroneously allow the pattern to	match.

   Code	assertions
       Blocks of the form "{ sometest()	or fail	}" are so common that Perl 6
       rules (and hence	Perl6::Rules) provide a	shorthand. Any expression in a
       "<(...)>" is treated as a code assertion, which causes a	match to fail
       and backtrack if	it is not true at that point in	the match. For
       example,	you could rewrite:

	   $count =~ m/	(\d+): {$1<256 or fail}	/

       more simply as:

	   $count =~ m/	(\d+): <($1<256)> /;
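       Plain Perl 5 (5.10 or later) can express the same idea with two
       constructs. An atomic group "(?>...)" works like the ":" above and
       refuses to give back characters on backtracking. A conditional
       "(?(?{ ... })(*FAIL))" fails when its code returns true. The sketch
       anchors the pattern with "^", which is an assumption of the sketch,
       so that the engine cannot simply retry from a later start position.
       It compares the result with and without the atomic group.

```perl
use strict;
use warnings;

# $^N is the text of the most recently closed capture group
my $atomic = qr/^((?>\d+))(?(?{ $^N >= 256 })(*FAIL))/;   # like (\d+): <($1<256)>
my $loose  = qr/^(\d+)(?(?{ $^N >= 256 })(*FAIL))/;       # same, without the ":"

for my $count (qw(200 1000)) {
    my $with    = $count =~ $atomic ? "match '$1'" : "no match";
    my $without = $count =~ $loose  ? "match '$1'" : "no match";
    print "$count: atomic: $with; without: $without\n";
}
```

       Without the atomic group, "1000" fails the test and the engine
       backtracks to "100", which passes. That is exactly the error the ":"
       prevents.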

   Literal variable interpolation
       Variables that appear in a Perl 6 rule interpolate differently from
       variables that appear in	a Perl 5 regex.	Specifically, in Perl 5:

	   $dir	= "lost+found";
	   $str	=~ /$dir/;

       is the same as:

	   $str	=~ /lost+found/;

       which would match:

	   "lostfound"
	   "losttfound"
	   "lostttfound"
	   "losttttfound"
	   etc.

       In Perl 6, an interpolated scalar variable "eq" matches its contents
       against the string. So:

	   use Perl6::Rules;
	   $dir	= "lost+found";
	   $str	=~ m/$::dir/;

       would treat the contents	of $dir	as a literal sequence of characters to
       match, and hence	(only) match:

	   "lost+found"

       An interpolated array:

	   use Perl6::Rules;
	   @cmds = ('get','put','save','load','dump','quit');
	   $str	=~ m/ @::cmds /;

       matches if any of its elements "eq" matches the string at that point.
       So the above example is equivalent to:

	   $str	=~ /get|put|save|load|dump|quit/;
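       In plain Perl 5, literal behaviour has to be requested explicitly:
       "\Q...\E" for a scalar, and an alternation of quotemeta'd elements for
       an array. The sketch below uses the values from the examples above.

```perl
use strict;
use warnings;

my $dir  = "lost+found";
my @cmds = ('get', 'put', 'save', 'load', 'dump', 'quit');

# Like m/$::dir/ under Perl6::Rules: match the contents literally
my $literal = qr/\Q$dir\E/;

# Like m/ @::cmds /: any element, each matched literally
my $any_cmd = do { my $alt = join '|', map { quotemeta } @cmds; qr/(?:$alt)/ };

my $plus  = "lost+found" =~ $literal ? 1 : 0;    # 1
my $tt    = "losttfound" =~ $literal ? 1 : 0;    # 0: "+" is not a quantifier here
my ($cmd) = "please save now" =~ /($any_cmd)/;   # "save"
print "$plus $tt $cmd\n";
```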

       An interpolated hash matches a "/\w+/" sequence and then	requires that
       that sequence is	a valid	key of the hash. So:

	   use Perl6::Rules;

	   my %cmds = (	get=>'Shorty', put=>'down', quit=>'griping' );

	   $str	=~ m/ %::cmds /;

       is a shorthand for:

	   / (\w+) { fail unless exists	%::cmds{$1} } /

       Note that the actual values in the hash are ignored.
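       Here is a plain Perl 5 sketch of that shorthand, assuming Perl 5.10 or
       later. A conditional "(?(?{ ... })(*FAIL))" rejects any word that is
       not a key. Nothing stops "\w+" from backtracking (the shorthand above
       has no ":" either), so a word such as "getaway" is accepted through
       its prefix "get". Writing the capture as "((?>\w+))" would prevent
       that.

```perl
use strict;
use warnings;

my %cmds = (get => 'Shorty', put => 'down', quit => 'griping');

# Match \w+, then fail unless that text is a key of %cmds.
# $^N is the text of the most recently closed capture group.
my $key_re = qr/(\w+)(?(?{ !exists $cmds{$^N} })(*FAIL))/;

for my $s ("quit now", "xyz", "getaway") {
    print "$s: ", ($s =~ $key_re ? "key '$1'" : "no match"), "\n";
}
```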

       However,	if the hash being interpolated has a "keymatch"	trait:

	   use Perl6::Rules;

	   my %cmds is keymatch(rx/<alpha>+:/)
	       = ( get=>'Shorty', put=>'down', quit=>'griping' );

       then the	rule into which	it's interpolated uses that trait's value
       instead of "\w+"	as the required	subpattern. In which case:

	   $str	=~ m/ %::cmds /;

       would become a shorthand	for:

	   / (<alpha>+:) { fail	unless exists %::cmds{$1} } /

       instead.

       Furthermore, if the interpolated	hash also has a	"valuematch" trait:

	   use Perl6::Rules;

	   my %cmds is keymatch(rx/<alpha>+:/)
		    is valuematch(rx/\s+ <alpha>+:/)
	       = ( get=>'Shorty', put=>'down', quit=>'griping' );

       then, after the key has been successfully matched, the rule attempts to
       match the "valuematch" pattern, and requires that this secondary	match
       be equal	to the value for the previously	matched	key. That is, with a
       "valuematch" trait as well, this:

	   $str	=~ m/ %::cmds /;

       would become a shorthand	for:

	   / (<alpha>+:)     { fail unless exists %::cmds{$1} }
	     (\s+ <alpha>+:) { fail unless $2 eq %::cmds{$1}  }
	   /

       In other	words, when both traits	are specified, an interpolated hash
       has to match one	of its keys, followed by that key's value.
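       The combined key-then-value check can also be sketched in plain Perl 5
       (5.10 or later). Here possessive quantifiers ("++") play the role of
       the ":" in "<alpha>+:". Unlike the shorthand above, the value capture
       leaves out the separating whitespace so that it can be compared with
       "ne" directly. That is a simplification made for this sketch.

```perl
use strict;
use warnings;

my %cmds = (get => 'Shorty', put => 'down', quit => 'griping');

my $pair_re = qr/
    ([[:alpha:]]++)                          # candidate key
    (?(?{ !exists $cmds{$1} })(*FAIL))       # must be a key of %cmds
    \s+
    ([[:alpha:]]++)                          # candidate value
    (?(?{ $2 ne $cmds{$1} })(*FAIL))         # must equal that key's value
/x;

for my $s ("put down", "put up", "get Shorty") {
    print "$s: ", ($s =~ $pair_re ? "match" : "no match"), "\n";
}
```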

   Non-literal variable	interpolation
       Sometimes it would be more useful to interpolate	a variable not as a
       literal sequence	of characters to be matched, but rather	as a
       subpattern to be	matched	(i.e. the way Perl 5 does).

       To interpolate a	variable in that way in	a Perl 6 rule, place the
       variable	in angle brackets. That	is:

	   use Perl6::Rules;
	   $exclamation	= rx/Shee+sh/;
	   $str	=~ m/ <$::exclamation> /;

       would treat the contents	of $::exclamation as a subpattern (rather than
       as a literal sequence of	characters to match) and hence match:

	   "Sheesh"
	   "Sheeesh"
	   "Sheeeesh"
	   etc.

       but not:

	   "Shee+sh"

       An angle-bracketed interpolated array:

	   use Perl6::Rules;
	   @cmds = ( rx/<[gs]>et/, rx/put/, rx/save?/, rx/q[uit]?/ );
	   $str	=~ m/ <@::cmds>	/;

       treats each of its elements as a	subpattern, and	matches	if any of them
       matches at that point.  So the above example is equivalent to:

	   $str	=~ m/ <[gs]>et | put | save? | q[uit]?/;

       (i.e. with the metasequences left intact).
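       This is how Perl 5 behaves natively. An interpolated qr// object is
       treated as a subpattern, and joining several with "|" keeps their
       metacharacters intact. The plain Perl 5 sketch below uses the patterns
       above, with "q[uit]?" rewritten in Perl 5 syntax as "q(?:uit)?".

```perl
use strict;
use warnings;

my $exclamation = qr/Shee+sh/;
my @cmds = (qr/[gs]et/, qr/put/, qr/save?/, qr/q(?:uit)?/);
my $any  = join '|', @cmds;    # each qr// keeps its own metacharacters

for my $s ("Sheeeesh", "Shee+sh") {
    print "$s: ", ($s =~ /^$exclamation$/ ? "match" : "no match"), "\n";
}
print "cmd: $1\n" if "sav it" =~ /^($any)\b/;    # cmd: sav
```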

       An angle-bracketed interpolated hash first matches a "/\w+/" sequence
       and requires that that sequence is a valid key of the hash. It then
       treats the corresponding	hash value as a	subpattern and requires	that
       that subpattern match too. So:

	   use Perl6::Rules;

	   my %cmds =
	       ( get=>rx/\s+ <ident>/, put=>rx:i/\s+down/, quit=>rx/[\s+ griping]?/);

           $str =~ m/ <%::cmds> /;

       is a shorthand for:

	   $str	=~ m/ (\w+) { fail unless exists %::cmds{$1} }
		      <%::cmds{$1}>
		    /

       Once again, if the hash being interpolated has a	"keymatch" trait that
       trait's value is	used instead of	"\w+" to match the key.	 However, any
       "valuematch" trait on an	angle-bracketed	hash is	ignored.

       Note: due to limitations	of nesting pattern matches, Perl6::Rules
       requires	that any value in an angle-bracketed hash or array must	be a
       precompiled pattern (i.e. either	a Perl5-ish "qr/.../" or a Perl6-ish
       "rx/.../"), not a string.

   Predefined named rules
       Certain named rules are predefined by Perl 6 (and hence by the
       Perl6::Rules module). They are:

	   <ws>	       Match any sequence of whitespace
	   <ident>     Match an	identifier (alpha or underscore, followed by \w*)
	   <prior>     Match using the most recent successful rule
	   <self>      Match this entire pattern (recursively)
	   <sp>	       Match a single space char
	   <null>      Match zero characters (i.e. unconditionally)
	   <alpha>     Match a single alphabetic character
	   <space>     Match a single whitespace character
	   <digit>     Match a single digit
	   <alnum>     Match a single alphabetic or digit
	   <ascii>     Match a single ASCII character
	   <blank>     Match a single space or tab
	   <cntrl>     Match a single control character
	   <ctrl>      Match a single control character
           <graph>     Match a single visible (printable, non-space) character
	   <lower>     Match a single lower-case character
	   <print>     Match a single printable	character
	   <punct>     Match a single punctuation character
	   <upper>     Match a single upper-case character
	   <word>      Same as \w
	   <xdigit>    Match a single hexadecimal digit

       In addition, every long-	or short-form Unicode property name is a valid
       predefined subrule. For example:

	   <L> or <Letter>	       Match any letter
	   <Lu>	or <UppercaseLetter>   Match any upper-case letter

	   <Sm>	or <MathSymbol>	       Match any mathematical symbol

	   <BidiWS>		       Match any bidirectional whitespace

	   <Greek>		       Match any Greek character
	   <Mongolian>		       Match any Mongolian character
	   <Ogham>		       Match any Ogham character

	   <Any>		       Match any character

	   <InArrows>		       Match any character in the "Arrows" block
	   <InCurrencySymbols>	       Match any character in the "CurrencySymbols" block

	   etc.

       In addition, Perl6::Rules supports the Perl-specific "<Lr>" property,
       which matches any upper-, lower-, or title-case letter. It replaces the
       non-standard Perl5-specific "<L&>" property.

       Note that any such named	subrule	that matches exactly one character may
       also be used inside a character class.
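       Perl 5 writes the same Unicode properties as "\p{...}", with "\P{...}"
       for the complement. This includes "\p{L&}" and block names such as
       "\p{InArrows}". The plain Perl 5 sketch below counts matches in an
       arbitrary sample string. It uses "\x{...}" escapes to keep the source
       ASCII.

```perl
use strict;
use warnings;

# "\x{3A9}\x{3BC}\x{3B5}\x{3B3}\x{3B1}" is the Greek word omega-mu-epsilon-gamma-alpha
my $text = "\x{3A9}\x{3BC}\x{3B5}\x{3B3}\x{3B1} and ABC";

my $greek   = () = $text =~ /\p{Greek}/g;     # 5 Greek letters
my $upper   = () = $text =~ /\p{Lu}/g;        # 4: capital omega, A, B, C
my $letters = () = $text =~ /\p{L&}/g;        # 11 cased letters
my $arrow   = "\x{2192}" =~ /\p{InArrows}/ ? 1 : 0;    # RIGHTWARDS ARROW

print "$greek $upper $letters $arrow\n";    # 5 4 11 1
```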

   Code	interpolations
       Normally code blocks don't actually match against anything. To make
       them do so, put the code	block in angle-brackets. For example:

	   / (@::cmds)	<{ get_body_for_cmd($1)	}> /

       This first matches one of the elements of @cmds (as a literal
       substring).  It then calls the "get_body_for_cmd" subroutine, passing
       it that substring.  The value returned by that call is then used	as a
       subpattern, which must match at that point.

       Note: due to limitations of nesting pattern matches, Perl6::Rules
       requires that any "<{...}>" block return a precompiled pattern (i.e.
       either a Perl5-ish "qr/.../" or a Perl6-ish "rx/.../"), not a string.
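       The closest Perl 5 construct is the postponed subexpression
       "(??{ ... })". It runs code during the match and then matches the
       result as a subpattern at that point. In this plain Perl 5 sketch,
       get_body_for_cmd() and its %body table are hypothetical stand-ins.

```perl
use strict;
use warnings;

# Hypothetical stand-in for get_body_for_cmd(): returns a compiled
# pattern describing what may follow each command word.
my %body = (get => qr/\s+\w+/, quit => qr/(?:\s+\w+)?/);
sub get_body_for_cmd { my ($cmd) = @_; return $body{$cmd} }

# (??{ ... }) matches whatever pattern the code returns, like <{ ... }>
my $re = qr/^(get|quit)(??{ get_body_for_cmd($1) })$/;

for my $s ("get file", "get", "quit") {
    print "$s: ", ($s =~ $re ? "match" : "no match"), "\n";
}
```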

   Character classes
       A character class is an enumerated set of characters and/or properties.
       In Perl 6, character classes are	specified by square brackets inside
       angle brackets:

	   $str	=~ m/ <[A-Za-z_]> <[A-Za-z0-9_]>* /    # Match an ASCII	identifier

       A normal	character class	can also be indicated by a leading plus	sign,
       whilst a	complemented character class (i.e. "any	character except...")
       is indicated by a leading minus sign:

	   $str	=~ m/ <[aeiou]>	/      # Match a vowel
	   $str	=~ m/ <+[aeiou]> /     # Match a vowel
	   $str	=~ m/ <-[aeiou]> /     # Match a character that	isn't a	vowel

       Two or more square-bracketed sets (including their optional signs) can
       be placed in the	same angle brackets:

	   $str	=~ m/ <[aeiou][tlc]> /	   # Match a vowel or 't' or 'l' or 'c'
	   $str	=~ m/ <[aeiou]+[tlc]> /	   # Match a vowel or 't' or 'l' or 'c'
	   $str	=~ m/ <[a-x]-[aeiou]> /	   # Match a letter between 'a'	and 'x'
					   # but not a vowel

       Named properties, subrules and backslashed escapes that match a single
       character can also be placed in the character set:

	   $str	=~ m/ <<alpha>-[aeiou]>	/  # Match a non-vowel alphabetic
	   $str	=~ m/ <[\w]-<digit>> /	   # Match first letter	of an identifier
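       Perl 5's traditional bracketed classes have no set subtraction.
       Putting a negative lookahead in front of the positive class has the
       same effect. (Perl 5.18 and later also offer an experimental
       "(?[ ... ])" syntax with set operators.) A plain Perl 5 sketch:

```perl
use strict;
use warnings;

# Like <[a-x]-[aeiou]>: rule out the vowels, then match the positive class
my $consonant = qr/(?![aeiou])[a-x]/;
# Like <<alpha>-[aeiou]>: the same idea with a POSIX class
my $non_vowel_alpha = qr/(?![aeiou])[[:alpha:]]/;

my @hits = grep { $_ =~ /^$consonant$/ } qw(a b e x y);
print "@hits\n";    # b x
```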

   Interpolated	literal	strings
       Any single-quoted string	in angle brackets is treated as	a literal
       sequence	of characters to be matched at that point. Whitespace and
       other metacharacters within the string must match literally.

       For example:

	   $text =~ m/ .*? <'# # # # #'> /;    # Match to first	'# # # # #'

       Another way to get the same effect is to	use a "quotemeta" block:

           $text =~ m/ .*? \Q[# # # # #] /;    # Match to first '# # # # #'

       The subpattern inside the square	brackets following the "\Q" is treated
       as a literal string, to be "eq" matched.

   Backreferences
       Because variables are interpolated at match-time	in Perl	6 rules,
       backreferences to earlier captures are written as variables, not	as
       backslashed numbers. So,	to remove doubled words:

	   $text =~ s:words:globally{( <alpha>+) $1}{$1};
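       Perl 5 writes the same backreference as "\1" (or "\g1") inside the
       pattern. A plain Perl 5 sketch with an arbitrary sample sentence:

```perl
use strict;
use warnings;

# \1 refers back to whatever the first capture group matched
(my $text = "this is is a test test") =~ s/\b(\w+)\s+\1\b/$1/g;
print "$text\n";    # this is a test
```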

   Anonymous rule constructors
       Under Perl6::Rules, if you use "qr" to create an	anonymous rule you get
       the Perl	5 interpretation of the	pattern:

	   use Perl6::Rules;

	   my $pat = qr/[a-z+]:\0[123]/;

	   #  [a-z+]   Match one lower-case alpha or a '+',
	   #  :	       Match a literal colon,
	   #  \0       Match a null byte,
	   #  [123]    Match a '1', a '2', or a	'3'

       To get the Perl 6 interpretation, use the Perl 6	anonymous rule
       constructor ("rx") instead:

	   use Perl6::Rules;

	   my $pat = rx/[a-z+]:\0[123]/;

	   #  [	       Without capturing...
	   #	a-     Match 'a-',
	   #	z+     Match 'z' one or	more times
	   #  ]	       End of group
	   #  :	       Don't backtrack into previous group on failure
	   # \0[123]   Match an	'S' (specified via octal code)

       You can also use	the keyword "rule" there:

	   my $pat = rule {[a-z+]:\0[123]};

       Note: The "rx" keyword allows "{...}", "[...]", "<...>", or "/.../" as
       pattern delimiters. The "rule" keyword allows only "{...}".

       If either needs modifiers, they go before the opening delimiter, as for
       matches and substitutions:

           my $pat = rule :wi { my name is (.*) };
	   my $pat = rx:wi/ my name is (.*) /;
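
       To make the two readings concrete, here is a plain Perl 5 sketch that
       spells out both. The second regex is a hand-written approximation of
       the Perl 6 reading (an atomic group "(?>...)" plays the role of the
       ":" that forbids backtracking into the preceding group); it is not the
       module's actual translation:

```perl
use strict;
use warnings;

# Perl 5 reading (what qr// gives you under Perl6::Rules):
# one of [a-z+], a literal colon, a NUL byte, then one of 1, 2 or 3.
my $p5 = qr/[a-z+]:\0[123]/;

# Approximate Perl 5 spelling of the Perl 6 reading of rx/.../:
# literal 'a-', one or more 'z' (atomic, no backtracking), then 'S' (octal 123).
my $p6_in_p5 = qr/(?>a-z+)\x53/;

my $nul_input = '+:' . chr(0) . '1';
print "Perl 5 reading matches\n" if $nul_input =~ $p5;
print "Perl 6 reading matches\n" if 'a-zzzS'  =~ $p6_in_p5;
```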

   Named Rules
       The "rule" keyword can also be used to create new named rules, by
       adding the rule name immediately after the keyword:

           rule alpha_ident { <alpha> \w* }

	   # and later...

           @ids = grep m/<alpha_ident>/, @strings;

       In the Perl6::Rules implementation such a "rule" declaration actually
       creates a subroutine of the same name within the current Perl 5
       namespace.
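
       As a rough illustration of that mechanism, the following plain Perl 5
       sketch defines a subroutine that returns a compiled regex and calls it
       at match time through "(??{ ... })". The regex body and the way it is
       invoked are assumptions for illustration; the code Perl6::Rules
       actually generates may differ:

```perl
use strict;
use warnings;

# Hypothetical stand-in for what `rule alpha_ident { <alpha> \w* }`
# might become: a sub in the current package returning a Perl 5 regex.
sub alpha_ident { return qr/[[:alpha:]]\w*/ }

my @strings = ('foo42', '42', '_x', 'bar');

# (??{ ... }) runs its code at match time and uses the returned regex
# as a subpattern, which is one way to simulate a subrule call.
my @ids = grep { /(??{ alpha_ident() })/ } @strings;

print "@ids\n";    # foo42 _x bar
```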

       Note: Due to bugs in the current Perl 5 regex engine, captures that
       occur in named rules that are called as subrules from other rules may
       not work correctly under Perl6::Rules, and will frequently lead to
       segfaults and bus errors.

   Named captures to external variables
       Any set of capturing parentheses can be prefixed with the name of a
       variable followed by ":=". The variable is then used as the destination
       of the captured substring, instead of assigning it to the next numbered
       variable.

       For example, after:

           $input =~ m/ [ $::num   := (\d+)
                        | $::alpha := (<alpha>+)
                        | $::other := (.)
                        ]
                      /;

       then one of $::num, $::alpha, or $::other will have been assigned the
       captured substring from whichever subpattern actually matched.  But
       none of $1, $2, $3 will have been set (since the named capture
       overrides the normal numbered capture mechanism).
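
       For comparison, Perl 5.10 and later offer named groups, read back
       through "%+". The sketch below copies them into package variables by
       hand; the "classify" subroutine is hypothetical. One difference from
       Perl6::Rules: Perl 5 named groups are also numbered, so the group that
       matched is still visible through $1, $2, or $3.

```perl
use strict;
use warnings;

our ($num, $alpha, $other);

# %+ maps each named group to the text it captured; groups that did not
# take part in the match give undef.
sub classify {
    my ($input) = @_;
    ($::num, $::alpha, $::other) = (undef, undef, undef);
    return 0
        unless $input =~ / (?<num>\d+) | (?<alpha>[[:alpha:]]+) | (?<other>.) /x;
    $::num   = $+{num};
    $::alpha = $+{alpha};
    $::other = $+{other};
    return 1;
}

classify('abc');
print "alpha = $::alpha\n";    # alpha = abc
```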

       You can, however, explicitly assign to a numeric variable (for example,
       to reorder them in some fiendish way):

           $pair =~ m:words{ $1:=(\w+)  =\> $2:=(.*)
                           | $2:=(.*?)  \<= $1:=(\w+)
                           };
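
       Perl 5 has no way to assign to $1 or $2 from inside a pattern, but a
       similar reordering can be sketched with duplicate group names (Perl
       5.10+), because "$+{name}" returns whichever group of that name
       actually matched. The "parse_pair" helper is hypothetical:

```perl
use strict;
use warnings;

# The same names appear in both alternatives, in opposite order, so the
# caller always gets (key, value) no matter which layout matched.
sub parse_pair {
    my ($pair) = @_;
    return unless $pair =~ / ^ \s* (?: (?<key>\w+) \s* => \s* (?<val>.*?)
                                     | (?<val>.*?) \s* <= \s* (?<key>\w+) ) \s* $ /x;
    return ($+{key}, $+{val});
}

my ($key, $val) = parse_pair('Damian <= author');
print "$key => $val\n";    # author => Damian
```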

       Note: due to unreliable interactions between Perl 5 regexes and lexical
       variables in the current Perl 5 regex engine, under this version of
       Perl6::Rules only explicitly-qualified package variables and
       unqualified numeric variables may be used in rules.

       Repeated captures can be bound to arrays:

	   $list =~ m/ @::values:=[ (.*?) , ]* /;

       in which case each captured substring will be pushed onto @::values.

       Pairs of repeated captures can be bound to hashes:

           $opts =~ m:words/ %::options:=[ (<ident>) = (\N+) ]* /;

       in which case the first capture in each repetition becomes the key and
       the second capture becomes the value. If there are more than two
       captures, the value for that key becomes an array reference, and the
       second and subsequent captures are stored in that array.

       If a single repeated capture is bound to a hash, each captured
       substring becomes a key of the hash (and the corresponding values are
       "undef"):

           $opts =~ m:words/ %::options:=[ (<ident>) = \N+ ]* /;
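
       In plain Perl 5, a repeated capture group keeps only its last
       iteration, so the closest equivalent is a global match in list
       context. A minimal sketch; the sample strings are assumptions, and
       "\w+" stands in for "<ident>":

```perl
use strict;
use warnings;

# One capture per match: list-context //g returns every captured piece.
my $list   = 'red,green,blue,';
my @values = $list =~ /(.*?),/g;

# Two captures per match: the results alternate key, value, key, value...
my $opts    = "width = 80\nheight = 24\n";
my %options = $opts =~ /(\w+) \s* = \s* (\N+)/gx;

# One capture per match bound to a hash: keys only, undef values.
my %seen = map { ($_ => undef) } $opts =~ /(\w+) \s* =/gx;

print "@values\n";    # red green blue
```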

   Named captures to internal variables
       Perl 6 rules also have their own internal namespace, with their own
       internal variables. Those variables are marked by a secondary '?'
       sigil. For example:

           $input =~ m/ [ $?num   := (\d+)
                        | $?alpha := (<alpha>+)
                        | $?other := (.)
                        ]
                      /;

       After this match succeeds, one of the three internal variables will
       have been set. To access these variables, treat $0 as a hash reference:

              if (exists $0->{num})   { print "Got number: $0->{num}\n"   }
           elsif (exists $0->{alpha}) { print "Got alpha:  $0->{alpha}\n" }
           elsif (exists $0->{other}) { print "Got other:  $0->{other}\n" }

       Scalar internal variables are stored under a key that is the name of
       the variable stripped of its leading $?. Array and hash internal
       variables are stored under their full variable name. For example:

	   $list =~ m/ @?values:=[ (.*?) , ]* /;

	   for (@{ $0->{'@?values'} }) {
               print "Another value was: $_\n";
	   }

       Named subrules can also capture their result into an internal scalar
       variable of the same name. To do so, prefix the rule name inside the
       angle brackets with a question mark:

           $pair =~ m:words/ <?key> =\> <?value> /;

	   print "Key was: $0->{key}\n";
	   print "Val was: $0->{value}\n";

       Naturally enough, internal variables can also be accessed within the
       rule itself. For example:

           $pair =~ m:words/ <?key> =\> <?value> { $?first = substr($?key,0,1) } /;
           print "Key starts:  $0->{first}\n";
	   print "Key was:     $0->{key}\n";
	   print "Val was:     $0->{value}\n";

   Return values from matches
       In Perl 6, a match always returns a "match object", which is also
       available as (lexical) $0. This match object evaluates differently in
       different contexts:

       o   In a boolean context it evaluates true or false (i.e. did the match
	   succeed?)

	       m/<ident>/;
	       if ($0) {
		   print "Success!\n";
	       }

       o   In a string context it evaluates to the captured substring:

	       do {
		   $text =~ m:cont/,? (<ident>)/ and print $hash{$0};
	       } while $0;

       o   When used as an array reference, $0 provides a reference to an
           array containing the numbered captures:

               $text =~ m:words/ (<ident>) \: (\N+)/;

               print "Option was:   $0->[0]\n";    # $0->[0] same as "$0"
               print "Option name:  $0->[1]\n";    # $0->[1] same as  $1
               print "Option value: $0->[2]\n";    # $0->[2] same as  $2
						   # etc.

       o   When used as a hash reference, $0 provides a reference to a hash
           containing its internal named variables:

               $text =~ m:words/ <?ident> \: @?vals:=[\s* (\S+)]+ /;

	       print "Option name: ", $0->{ident}, "\n";
               print "Option vals: ", @{ $0->{'@?vals'} }, "\n";

       Since it is not feasible to intercept the return value of a Perl 5
       regex match, under Perl6::Rules, the return value is still the Perl 5
       return value. However, $0 is set to the polymorphic match object shown
       above.
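
       The following plain Perl 5 sketch shows one way such a polymorphic
       object can be built with the core "overload" pragma. The "MatchSketch"
       class and its fields are hypothetical and are not the module's real
       internals:

```perl
use strict;
use warnings;

package MatchSketch;

# One object that behaves as a boolean, a string, an array ref and a hash ref.
use overload
    'bool'   => sub { ${ $_[0] }->{success} },
    '""'     => sub { ${ $_[0] }->{array}[0] },
    '@{}'    => sub { ${ $_[0] }->{array} },
    '%{}'    => sub { ${ $_[0] }->{hash} },
    fallback => 1;

# The object is a blessed scalar ref, so the overloaded @{} and %{}
# never get in the way of reaching the internal data.
sub new {
    my ($class, %args) = @_;
    my $data = { %args };
    return bless \$data, $class;
}

package main;

my $text = 'name: Damian';
my $m    = MatchSketch->new(success => 0, array => [''], hash => {});

if ($text =~ /(?<key>\w+) \s* : \s* (\N+)/x) {
    $m = MatchSketch->new(
        success => 1,
        array   => [ $&, $1, $2 ],    # whole match, then numbered captures
        hash    => { %+ },            # named captures
    );
}

print "Whole: $m\n" if $m;
print "Name:  $m->[1]\n";
print "Key:   $m->{key}\n";
```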

       Note that within a regex, $0 acts like an internal variable, so you can
       capture or assign to it to control the overall substring that is
       returned.  For example:

	   use Perl6::Rules;

           $quoted_str =~ m{ (<["'`]>) ([\\?.]*?) $1 };
           #
           # default behaviour: "$0" includes delimiters

           $quoted_str =~ m{ (<["'`]>) $0:=([\\.|<!$1>.]*) $1 };
	   #
	   # "$0" now excludes delimiters because it was
           # explicitly bound only to contents of quoted string
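
       For comparison, a plain Perl 5 sketch of the same idea: the whole match
       ("$&") includes the delimiters, while a separate capture holds only the
       contents. The sample string is an assumption:

```perl
use strict;
use warnings;

my $quoted_str = q{say "hello \"world\"" now};
my ($with_delims, $contents);

# (["'`]) captures the opening delimiter; each later character is either
# a backslash escape (\\.) or any character that is not that delimiter.
if ($quoted_str =~ / (["'`]) ( (?: \\. | (?!\1) . )* ) \1 /x) {
    $with_delims = $&;    # like the default "$0": delimiters included
    $contents    = $2;    # like "$0" after the $0:= binding: contents only
}

print "$with_delims\n$contents\n";
```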

   Grammars
       Named rules can be placed in a particular namespace, called a
       "grammar".  For example:

	   grammar Identity {
               rule name :words { Name \: (\N+) }
               rule age  :words { Age  \: (\d+) }
               rule addr :words { Addr \: (\N+) }
               rule desc :words { <name> <age> <addr> }

	       # etc.
	   }

       Then, to access these named rules, call them as if they were (Perl 6)
       methods:

	   $id =~ m/ <Identity.desc> /;

       Note: Perl6::Rules uses a regular package for each grammar you specify,
       adding each rule as a subroutine of that package. Be careful not to
       clobber your existing packages and classes when defining new grammars.
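
       As a rough sketch of that package-per-grammar approach in plain Perl 5
       (hand-written, not the module's generated code), each rule becomes a
       subroutine returning a compiled regex, and "desc" combines the others.
       "\s*" is an assumed approximation of ":words":

```perl
use strict;
use warnings;

package Identity;

sub name { qr/Name \s* : \s* (\N+)/x }
sub age  { qr/Age  \s* : \s* (\d+)/x }
sub addr { qr/Addr \s* : \s* (\N+)/x }

# Like <name> <age> <addr>: embedding the qr// objects keeps their
# capture groups, numbered left to right.
sub desc {
    my ($nm, $ag, $ad) = (name(), age(), addr());
    return qr/$nm \s* $ag \s* $ad/x;
}

package main;

my $id     = "Name: Ada\nAge: 36\nAddr: London";
my @fields = $id =~ Identity::desc();
print "@fields\n";    # Ada 36 London
```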

       Like classes, grammars can inherit:

	   grammar Letter {
	       rule text     { <greet> <body> <close> }

               rule greet :w { [Hi|Hey|Yo] $?to:=(\S+?) , $$ }

               rule body     { <line>+ }

               rule close :w { Later dude, $?from:=(.+) }

               # etc.
           }

           grammar FormalLetter is Letter {

               rule greet :w { Dear $?to:=(\S+?) , $$ }

               rule close :w { Yours sincerely, $?from:=(.+) }

	   }

       This syntax is fully supported by Perl6::Rules.
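
       Because each grammar is an ordinary package, the inheritance above can
       be sketched in plain Perl 5 with "@ISA" and class-method calls. This is
       a hand-written approximation, not the module's generated code. Since
       "text" calls "greet" and "close" as methods, "FormalLetter" only needs
       to override those two:

```perl
use strict;
use warnings;

package Letter;

sub greet { qr/(?:Hi|Hey|Yo) \s+ (\S+?) \s* ,/x }
sub close { qr/Later \s+ dude, \s* (.+)/x }

# Method calls let a subclass swap in its own greet/close.
sub text {
    my ($class) = @_;
    my ($greet, $close) = ($class->greet, $class->close);
    return qr/$greet .*? $close/xs;
}

package FormalLetter;
our @ISA = ('Letter');

sub greet { qr/Dear \s+ (\S+?) \s* ,/x }
sub close { qr/Yours \s+ sincerely, \s* (.+)/x }

package main;

my $letter = "Dear Damian,\nThanks for the module.\nYours sincerely, Larry";
my ($to, $from) = $letter =~ FormalLetter->text;
print "To: $to, From: $from\n";    # To: Damian, From: Larry
```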

       Note: Due to bugs in the Perl 5 regex engine, captures that occur in
       rules or subrules called in from other grammatical namespaces may not
       work correctly under Perl6::Rules, and will frequently lead to
       segfaults and bus errors.

DEBUGGING
       If the module is loaded with the "-translate" flag:

           use Perl6::Rules -translate;

       it translates any subsequent Perl 6 rules back to Perl 5 syntax, prints
       the translated source file, and exits before attempting to compile it.

       If the module is loaded with the "-debug" flag:

           use Perl6::Rules -debug;

       it adds a considerable number of debugging statements into each
       translated rule, producing extensive tracking of the construction and
       matching of each rule.

       The match object ($0) also provides a "dump" method that shows the
       various values that were retrieved from the match.

LIMITATIONS
       This module implements most, but not all, of the proposed Perl 6
       semantics.  Generally speaking, a Perl 6 feature has been omitted only
       where there is no way (or no efficient way) to implement it within the
       constraints of the Perl 5 regex engine.

       o   Only one "$(...)" or "@(...)" is allowed in the replacement text of
           a substitution. And the closing paren must be the last closing paren
           of the string. That is:

	       s/ <?ident> <?rnum> /marker for $(lookup($?ident).' '.from_roman($?rnum)) here/

	   is fine, but:

               s/ <?ident> <?rnum> /marker for $(lookup $?ident) $(from_roman $?rnum) here/

	   is not.

       o   The ":first" (i.e. match once only between resets) modifier is not
           implemented.

       o   The ":u0", ":u1", ":u2", ":u3" modifiers are not implemented.

       o   The ":perl5" modifier is not supported. If you want a Perl 5
           pattern under "use Perl6::Rules", just use "qr/.../" or a raw
           "/.../" (i.e. no "m" before the delimiters).

       o   "Bare" Perl 6 patterns are not supported. Every Perl 6 pattern must
           be specified with an explicit "rx", "m", "s", or "rule" keyword.
           Bare "/.../" patterns and "qr/.../" patterns are treated as Perl 5
           patterns.

       o   The match string's "pos" is only set correctly when the ":cont"
           modifier is specified.

       o   You cannot use arbitrary delimiters when specifying a rule. Only
           "m{...}", "m[...]", "m<...>", and "m/.../" are supported. Likewise
           for "rx", "rule", and "s".

       o   Lookbehinds (<after...> and <!after...>) are restricted to
           fixed-length patterns.

       o   Repetitions must be statically defined (i.e. a variable can't be
           used in an <n,m> quantifier).

       o   The "&" operator is not yet implemented.

       o   Variables used anywhere in a rule/rx pattern must be specified in
           Perl 6 syntax (i.e. $a[0] always means $a->[0]).

       o   Any subpattern interpolated by a "<$scalar>", "<@array>",
           "<%hash>", or "<{block}>" construct must be a precompiled regular
           expression, not a raw string.

       o   <.> does not always work correctly (esp. for combining characters)
           due to bugs in Perl 5.8.3.

       o   Due to bugs in the handling of match-time interpolations in the
           Perl 5.8.3 regex engine, subrules that capture may produce
           segfaults during or immediately after the match.

       o   Due to problems in Perl 5.8.3's handling of lexical variables in
           patterns (and especially in code blocks inside patterns), the
           module does not allow lexical variables to be used in Perl 6 rules.
           To enforce this, all variables used in a Perl 6 rule must include
           at least one explicit "::" in their name. That is:

	       our ($keyword, %valid);

	       # and later...

               m/ $::keyword:=(<ident>) <( %::valid{$::keyword} )> /

	   but not:

	       my ($keyword, %valid);

	       # and later...

	       m/ $keyword:=(<ident>) <( %valid{$keyword} )> /

       o   The Perl 5 nonstandard "L&" property (which is equivalent to "Lu" +
           "Ll" + "Lt") has been renamed to "Lr" (mnemonic: Letter-regular).

       o   The various "cut" operators (except for ":") are not implemented.
           That is, "::", ":::", "<commit>", and "<cut>" are not supported.

       o   Rules cannot	be specified with parameter lists.  Consequently
	   subrules cannot be called with arguments.

WARNING
       The syntax and semantics of Perl 6 are still being finalized and
       consequently are at any time subject to change. That means the same
       caveat applies to this module.

DEPENDENCIES
       Filter::Simple, Attribute::Handlers

AUTHOR
       Damian Conway (DCONWAY@cpan.org)

BUGS AND IRRITATIONS
       No doubt there are many. You are strongly advised not to use this
       module in production code yet.

       Comments, suggestions, and patches are welcome, but due to the volume
       of email I now receive from Nigerian widows and dispossessed heirs to
       mining fortunes, I have some very tight mail filters deployed. If you'd
       like me to actually see your message regarding this module, please
       include the marker:
       include the marker:

	   [P6R]

       somewhere in your subject line.

       Also please be patient if I am not able to respond immediately (i.e.
       within a few months) to your bug report.

SPONSORSHIP
       This module was developed under a grant from The Perl Foundation.
       Hence it was made possible by the generosity of people like yourself.
       Thank-you.

       If you'd like to help the Foundation continue to work for the
       betterment of the entire Perl community you can find out how at:

	   http://www.perlfoundation.org/index.cgi?page=contrib

COPYRIGHT
       Copyright (c) 2004, The Perl Foundation. All Rights Reserved.
       This module is free software. It may be used, redistributed
       and/or modified under the same terms as Perl itself.

perl v5.24.1			  2004-04-12			      Rules(3)
