Regex(3)	      User Contributed Perl Documentation	      Regex(3)

       YAPE::Regex - Yet Another Parser/Extractor for Regular Expressions

       This document refers to YAPE::Regex version 4.00.

	 use YAPE::Regex;
	 use strict;

	 my $regex = qr/reg(ular\s+)?exp?(ression)?/i;
	 my $parser = YAPE::Regex->new($regex);

	 # here is the tokenizing part
	 while (my $chunk = $parser->next) {
	   # ...
	 }

       The "YAPE" hierarchy of modules is an attempt at	a unified means	of
       parsing and extracting content.	It attempts to maintain	a generic
       interface, to promote simplicity	and reusability.  The API is powerful,
       yet simple.  The	modules	do tokenization	(which can be intercepted) and
       build trees, so that extraction of specific nodes is doable.

       This module is yet another (?) parser and tree-builder for Perl regular
       expressions.  It builds a tree out of a regex, but at the moment, the
       extent of the extraction tool for the tree is quite limited (see
       "Extracting Sections").  However, the tree can be useful to extension
       modules.

       In addition to the base class, "YAPE::Regex", there is the auxiliary
       class "YAPE::Regex::Element" (common to all "YAPE" base classes)	that
       holds the individual nodes' classes.  There is documentation for	the
       node classes in that module's documentation.

   Methods for "YAPE::Regex"
       o   "use	YAPE::Regex;"

       o   "use	YAPE::Regex qw(	MyExt::Mod );"

	   If no arguments are supplied, the module is loaded normally, and the
	   node classes are given the proper inheritance (from
	   "YAPE::Regex::Element").  If you supply a module (or list of
	   modules), "import" will automatically include them (if needed) and
	   set up their node classes with the proper inheritance -- that is,
	   it will append "YAPE::Regex" to @MyExt::Mod::ISA, and
	   "YAPE::Regex::xxx" to each node class's @ISA (where "xxx" is the
	   name of the specific node class).

	     package MyExt::Mod;
	     use YAPE::Regex 'MyExt::Mod';

	     # does the	work of:
	     # @MyExt::Mod::ISA	= 'YAPE::Regex'
	     # @MyExt::Mod::text::ISA =	'YAPE::Regex::text'
	     # ...

       o   "my $p = YAPE::Regex->new($REx);"

	   Creates a "YAPE::Regex" object, using the contents of $REx as a
	   regular expression.  The "new" method will attempt to convert $REx
	   to a compiled regex (using "qr//") if $REx isn't already one.  If
	   the regex contains an error, this conversion fails, but the parser
	   carries on as if it had succeeded and reports the bad token when it
	   reaches it in the course of parsing.

       o   "my $text = $p->chunk($len);"

	   Returns the next $len characters in the input string; $len defaults
	   to 30 characters.  This is useful for figuring out why a parsing
	   error occurs.

       o   "my $done = $p->done;"

	   Returns true if the parser is done with the input string, and false
	   otherwise.

       o   "my $errstr = $p->error;"

	   Returns the parser error message.

       o   "my $backref	= $p->extract;"

	   Returns a code reference that, each time it is called, returns the
	   next back-reference in the regex.  For more information on
	   enhancements in upcoming versions of this module, check
	   "Extracting Sections".

       o   "my $node = $p->display(...);"

	   Returns a string representation of the entire content.  It calls
	   the "parse" method in case there is more data that has not yet been
	   parsed, then calls the "fullstring" method on the root nodes.
	   Check the "YAPE::Regex::Element" docs on the arguments to
	   "fullstring".

       o   "my $node = $p->next;"

	   Returns the next token, or "undef" if there is no valid token.
	   There will be an error message (accessible with the "error" method)
	   if there was	a problem in the parsing.

       o   "my $node = $p->parse;"

	   Calls "next"	until all the data has been parsed.

       o   "my $node = $p->root;"

	   Returns the root node of the	tree structure.

       o   "my $state =	$p->state;"

	   Returns the current state of the parser.  It is one of the
	   following values: "alt", "anchor", "any", "backref", "capture(N)",
	   "Cchar", "class", "close", "code", "comment", "cond(TYPE)", "ctrl",
	   "cut", "done", "error", "flags", "group", "hex", "later",
	   "lookahead(neg|pos)", "lookbehind(neg|pos)", "macro", "named",
	   "oct", "slash", "text", and "utf8hex".

	   For "capture(N)", N will be the number the captured pattern
	   represents.

	   For "cond(TYPE)", TYPE will either be a number representing the
	   back-reference that the conditional depends on, or the string

	   For "lookahead" and "lookbehind", one of "neg" and "pos" will be
	   there, depending on the type	of assertion.

       o   "my $node = $p->top;"

	   Synonymous to "root".
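
       As a minimal sketch of how the methods above fit together, the
       following loop tokenizes a regex with "next", prints the class of each
       node next to the parser's "state", and falls back on "error" and
       "chunk" if parsing stops early.  The @tokens array and the output
       layout are illustrative choices, not part of the module.

```perl
use strict;
use warnings;
use YAPE::Regex;

my $parser = YAPE::Regex->new(qr/reg(ular\s+)?exp?(ression)?/i);

# Collected only for illustration; the module does not require this.
my @tokens;

# next() hands back one node per call and undef when no valid token remains.
while (my $node = $parser->next) {
  push @tokens, $node;
  # ref() gives the node's class; state() gives the parser's current state.
  printf "%-28s %s\n", ref($node), $parser->state;
}

# If the parser is not done, the loop ended because of a parsing problem.
unless ($parser->done) {
  warn "parse error: ", $parser->error, "\n";
  warn "remaining input starts with: ", $parser->chunk(20), "\n";
}
```

       Calling "parse" instead of looping over "next" should consume the same
       tokens when they do not need to be inspected one by one.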

   Extracting Sections
       While extraction	of nodes is the	goal of	the "YAPE" modules, the	author
       is at a loss for	words as to what needs to be extracted from a regex.
       At the current time, all	the "extract" method does is allow you access
       to the regex's set of back-references:

	 my $extor = $parser->extract;
	 while (my $backref = $extor->()) {
	   # ...
	 }

       "japhy" is very open to suggestions as to the approach to node
       extraction (in how the API should look, in addition to what should be
       proffered).  Preliminary	ideas include extraction keywords like the
       output of -Dr (or the "re" module's "debug" option).
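
       A slightly fuller sketch of the same loop collects a string form of
       each back-reference.  It assumes that the iterator returns node
       objects and that they respond to "fullstring", as described in the
       "YAPE::Regex::Element" documentation; the @groups array is purely
       illustrative.

```perl
use strict;
use warnings;
use YAPE::Regex;

my $parser = YAPE::Regex->new(qr/reg(ular\s+)?exp?(ression)?/i);

# extract() returns a code reference that yields one back-reference per call.
my $extor = $parser->extract;

my @groups;
while (my $backref = $extor->()) {
  # Assumption: each back-reference is a node that supports fullstring().
  push @groups, $backref->fullstring;
}

print "$_\n" for @groups;
```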

       The following modules extend "YAPE::Regex":

       o   "YAPE::Regex::Explain"

	   Presents an explanation of a	regular	expression, node by node.

       o   "YAPE::Regex::Reverse" (Not released)

	   Reverses the	nodes of a regular expression.
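
       For illustration, "YAPE::Regex::Explain" can be used along these
       lines.  This sketch assumes the module is installed (it is
       distributed separately on CPAN) and follows the "new" and "explain"
       calls given in that module's own documentation; the variable names
       are arbitrary.

```perl
use strict;
use warnings;
use YAPE::Regex::Explain;

my $regex = qr/reg(ular\s+)?exp?(ression)?/i;

# explain() returns the node-by-node explanation as a single string.
my $explanation = YAPE::Regex::Explain->new($regex)->explain;

print $explanation;
```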

       This is a listing of things to add to future versions of	this module.

       o   Create a robust "extract" method

	   Open	to suggestions.

       Following is a list of known or reported	bugs.

       o   "use	charnames ':full'"

	   To understand "\N{...}" properly, you must be using Perl 5.6.0 or
	   higher.  However, the parser only knows how to resolve full names
	   (those made using "use charnames ':full'").  There might be an
	   option in the future to specify a class name.

       See the "YAPE::Regex::Element" documentation for information on the
       node classes.  See also "Text::Balanced", Damian Conway's excellent
       module, which is used for the matching of "(?{ ... })" and
       "(??{ ... })" blocks.

       The original author is Jeff "japhy" Pinyan (CPAN	ID: PINYAN).

       Gene Sullivan is a co-maintainer.

       This module is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.  See perlartistic.

perl v5.32.1			  2011-02-02			      Regex(3)

