Skip site navigation (1)Skip section navigation (2)

FreeBSD Manual Pages

  
 
  

home | help
Marpa::Marpa(3)	      User Contributed Perl Documentation      Marpa::Marpa(3)

NAME
       Marpa - Parse any Language You Can Describe in BNF

SYNOPSIS
	   use Marpa;

	   my $grammar = Marpa::Grammar->new(
	       {   start   => 'Expression',
		   actions => 'My_Actions',
		   default_action => 'first_arg',
		   rules   => [
		       { lhs =>	'Expression', rhs => [qw/Term/]	},
		       { lhs =>	'Term',	rhs => [qw/Factor/] },
		       { lhs =>	'Factor', rhs => [qw/Number/] },
		       { lhs =>	'Term',	rhs => [qw/Term	Add Term/], action => 'do_add' },
		       {   lhs	  => 'Factor',
			   rhs	  => [qw/Factor	Multiply Factor/],
			   action => 'do_multiply'
		       },
		   ],
	       }
	   );

	   $grammar->precompute();

	   my $recce = Marpa::Recognizer->new( { grammar => $grammar } );

	   my @tokens =	(
	       [ 'Number', 42 ],
	       [ 'Multiply', ],
	       [ 'Number', 1 ],
	       [ 'Add',	],
	       [ 'Number', 7 ],
	   );

	   $recce->tokens( \@tokens );

	   sub My_Actions::do_add {
	       my ( undef, $t1,	undef, $t2 ) = @_;
	       return $t1 + $t2;
	   }

	   sub My_Actions::do_multiply {
	       my ( undef, $t1,	undef, $t2 ) = @_;
	       return $t1 * $t2;
	   }

	   sub My_Actions::first_arg { shift; return shift; }

	   my $value_ref = $recce->value;
	   my $value = $value_ref ? ${$value_ref} : 'No	Parse';

DESCRIPTION
   This	release	is OBSOLETE
       This release is obsolete.  Please use the more up-to-date versions of
       Marpa instead.  As of this writing, I recommend Marpa::XS, which	is
       beta, and whose interface I expect to keep stable.  For those without a
       C compiler, Marpa::PP is	a Pure Perl alternative.

       This is alpha software.	There may be bugs.  Please be careful.	Do not
       rely on it for anything mission-critical.

   Overview
       Marpa parses any	language whose grammar can be written in BNF.  That
       includes	recursive grammars, ambiguous grammars,	infinitely ambiguous
       grammars	and grammars with useless or empty productions.

       This document contains a	top-level overview of the API for the Marpa
       parse engine.  The two examples in this document	show the typical flows
       of Marpa	method calls.  This document will use these examples to
       describe	the basic features of Marpa in semi-tutorial fashion.  Marpa's
       advanced	features, and full reference details of	all features, can be
       found in	the other Marpa	API documents.

   The Three Phases
       A parser	needs to:

       o   Accept a grammar.

       o   Read	input.

       o   Return values from the parses, according to a semantics.

       In Marpa	these three tasks are, for the most part, distinct phases.
       Grammars	are Marpa::Grammar objects.  The reading of input and the
       evaluation of the parse according to the	semantics is performed by
       Marpa::Recognizer objects.

EXAMPLE	1: A SIMPLE CALCULATOR
       The synopsis shows the code for a very simple calculator.  It handles
       only addition and multiplication	of integers.  This section explains,
       line by line, how it works.

   Marpa::Grammar::new
	   my $grammar = Marpa::Grammar->new(
	       {   start   => 'Expression',
		   actions => 'My_Actions',
		   default_action => 'first_arg',
		   rules   => [
		       { lhs =>	'Expression', rhs => [qw/Term/]	},
		       { lhs =>	'Term',	rhs => [qw/Factor/] },
		       { lhs =>	'Factor', rhs => [qw/Number/] },
		       { lhs =>	'Term',	rhs => [qw/Term	Add Term/], action => 'do_add' },
		       {   lhs	  => 'Factor',
			   rhs	  => [qw/Factor	Multiply Factor/],
			   action => 'do_multiply'
		       },
		   ],
	       }
	   );

       Marpa grammars are Marpa::Grammar objects.  They	are created with the
       Marpa::Grammar::new constructor.	 The arguments to Marpa::Grammar::new
       are references to hashes	of named arguments.  In	the key/value pairs of
       these hashes, the hash key is the name of the argument, and the hash
       value is	the value of the named argument.

       The start Named Argument

	   start => 'Expression',

       The "start" named argument is required.	Its value is a string
       containing the name of the grammar's start symbol.

       Named Arguments for the Semantics

		   actions => 'My_Actions',
		   default_action => 'first_arg',

       The "actions" and "default_action" named	arguments specify semantics.
       Their argument values are strings, which	acquire	their semantics	during
       evaluation.

       Evaluation will be described later.  Peeking ahead, the
       "default_action"	named argument will be interpreted as an action	name.
       This action name	will resolve to	a Perl closure that implements the
       semantics for rules without a per-rule semantics.  "actions" will be
       the name	of a Perl package where	Marpa looks for	the semantic Perl
       closures.

       The rules Named Argument

	   rules => [
	       { lhs =>	'Expression', rhs => [qw/Term/]	},
	       { lhs =>	'Term',	      rhs => [qw/Factor/] },
	       { lhs =>	'Factor',     rhs => [qw/Number/] },
	       { lhs =>	'Term',	rhs => [qw/Term	Add Term/], action => 'do_add' },
	       {   lhs	  => 'Factor',
		   rhs	  => [qw/Factor	Multiply Factor/],
		   action => 'do_multiply'
	       },
	   ],

       The value of the	"rules"	named argument is a reference to an array of
       rule descriptors.  In this example, all the rule	descriptors are	in the
       "long" form -- they are references to hashes of rule properties.	 In
       each key/value pair of a	rule descriptor	hash, the key is the name of a
       rule property, and the hash value is the	value of that rule property.

       The lhs Property

       The value of the	"lhs" rule property must be a string containing	the
       name of the rule's left hand side symbol.  Every	Marpa rule must	have a
       left hand side symbol.

       The rhs Property

       The value of the	"rhs" property is a reference to an array of strings
       containing names	of the rule's right hand symbols, in order.  This
       array may be zero length, in which case this is an empty	rule --	a rule
       with no symbols on the right hand side.	There are no empty rules in
       this example.

       The action Property

       The value of the	"action" rule property is a string.  Peeking ahead,
       each "action" property string will be interpreted as an action name.
       This action name	will be	resolved to a Perl closure that	implements the
       rule's semantics.

   Marpa::Grammar::precompute
	   $grammar->precompute();

       Before a	Marpa grammar object can be used by a Marpa recognizer,	it
       must be precomputed.  Precomputation compiles data structures that the
       recognizer will need.

   Marpa::Recognizer::new
	   my $recce = Marpa::Recognizer->new( { grammar => $grammar } );

       "Marpa::Recognizer::new"	creates	a new recognizer.  Its arguments are
       references to hashes of named arguments.	 In this example the only
       named argument is the required argument:	""grammar"".  The value	of the
       "grammar" named argument	must be	a precomputed Marpa grammar for	the
       recognizer to use in building its tables.

   Marpa::Recognizer::tokens
	   my @tokens =	(
	       [ 'Number', 42 ],
	       [ 'Multiply', ],
	       [ 'Number', 1 ],
	       [ 'Add',	],
	       [ 'Number', 7 ],
	   );

	   $recce->tokens( \@tokens );

       "Marpa::Recognizer::tokens" reads the input to be recognized.  Its
       first (and in our example, only)	argument is a reference	to an array of
       token descriptors.  Each	token descriptor is an array reference.

       The first element of a token descriptor is a string containing the
       token name.  The	token name must	be the name of a valid terminal	symbol
       in the grammar.	By default all symbols are valid as terminal symbols,
       unless a	grammar	contains empty rules.

       The grammars in our examples do not contain empty rules,	and therefore
       we are free to use any symbol in	the grammar as a token name.  For more
       on terminals, including how to explicitly mark terminal symbols when
       the grammar contains empty rules, see "Terminals" in Marpa::Grammar.

       The second element of a token descriptor	is the token value.  A token
       value must be a Perl scalar, but	otherwise its form and semantics are
       entirely	up to the application.	If the token value is omitted, it is a
       Perl "undef".  In the calculator	example, the values of the ""Add"" and
       ""Multiply"" tokens are never used, so they are allowed to default to
       "undef".

   Marpa::Recognizer::value
	   my $value_ref = $recce->value;
	   my $value = $value_ref ? ${$value_ref} : 'No	Parse';

       The "Marpa::Recognizer::value" method returns a reference to the	parse
       result's	value, if there	was a parse result.  If	there was no parse
       result, "Marpa::Recognizer::value" returns "undef".

   Resolving the Semantics
       The first thing "Marpa::Recognizer::value" needs	to do is to resolve
       the semantics.  Resolving the semantics means mapping the action	names
       into semantic Perl closures.  Semantic Perl closures are	Perl closures
       which directly implement	semantics.  In this example, the "actions"
       named argument is interpreted as	a Perl package name.  Marpa will look
       for its semantic	Perl closures in that package.

	   actions => 'My_Actions',

	   { lhs => 'Factor', rhs => [qw/Factor	Multiply Factor/], action => 'do_multiply' },

       For example, the	"action" property for the above	rule is
       ""do_multiply"" and the "actions" named argument	to the grammar was
       ""My_Actions"".	So Marpa looks for a closure whose fully qualified
       name is "My_Actions::do_multiply", which	it finds:

	   sub My_Actions::do_multiply {
	       my ( undef, $t1,	undef, $t2 ) = @_;
	       return $t1 * $t2;
	   }

       Rules do	not always have	"action" properties.  That is the case with
       these rules in this example:

	   { lhs => 'Expression', rhs => [qw/Term/] },
	   { lhs => 'Term', rhs	=> [qw/Factor/]	},
	   { lhs => 'Factor', rhs => [qw/Number/] },

       Where there is no "action" rule property, Marpa tries to	use the	"lhs"
       property	as an action name.  When Marpa cannot resolve the "lhs"
       property	as an action name, it will fall	back to	using the default
       action for the grammar.

       For example, in the first rule in the above display, Marpa will look
       for a Perl closure with the fully qualified name
       ""My_Actions::Expression"".  When Marpa does not	find a closure by that
       name, Marpa will	fall back to trying to find a semantic Perl closure
       using the default action.

       The other two rules in the above	display	work similarly.	 Marpa will
       look for	Perl closures named ""My_Actions::Term"" and
       ""My_Actions::Factor"".	It will	not find them and will fall back to
       trying to use the default action, as described next.

	   default_action => 'first_arg',

       The "default_action" named argument is resolved in the same way as are
       the "action" properties of the rules.  In this example, default_action
       is specified as ""first_arg"" and resolves to "My_Actions::first_arg".

   Semantic Perl Closures
	   sub My_Actions::first_arg { shift; return shift; }

	   sub My_Actions::do_add {
	       my ( undef, $t1,	undef, $t2 ) = @_;
	       return $t1 + $t2;
	   }

       The semantic Perl closures are callbacks, called	when each node in a
       parse tree is evaluated.	 The callbacks receive one or more arguments.
       The first argument to a semantic	Perl closure is	always a per-parse-
       result object, which the	callbacks can use as a scratchpad.  In these
       examples, the per-parse-result object is	not used.

       For a non-empty rule, the second	and any	subsequent arguments to	the
       callback	are the	values,	in lexical order, of the symbols on the	right
       hand side of the	rule.  If the semantic Perl closure is for an empty
       rule, the per-parse-result object will be its only argument.

       Every semantic Perl closure is expected to return a value.  With	one
       exception, this value is	passed up to a parent node as an argument.
       The exception is	the value for the start	rule.  The return value	for
       the start rule becomes the value	of the parse result.

       Rules with no action specified for them take their semantics from the
       "default_action"	named argument.	 If there is no	default	action for a
       grammar,	rules without no action	specified for them return a Perl
       "undef".

EXAMPLE	2: AN AMBIGUOUS	PARSE
       This is the same	calculator as before, rewritten	to be ambiguous.
       Rather than give	multiplication precedence over addition, the rewritten
       calculator allows any order of operations.  In this example, the
       semantic	Perl closures ("My_Actions::do_add", etc.)  and	the @tokens
       array remain the	same as	before.

       Eliminating precedence makes the	grammar	shorter, but it	also means
       there can be multiple parse results, and	the different parse results
       can have	different values.  In this application we decide, for each
       input, to return	the value for every one	of the parse results.

	   use Marpa;

	   my $ambiguous_grammar = Marpa::Grammar->new(
	       {   start   => 'E',
		   actions => 'My_Actions',
		   rules   => [
		       [ 'E', [qw/E Add	E/],	  'do_add' ],
		       [ 'E', [qw/E Multiply E/], 'do_multiply'	],
		       [ 'E', [qw/Number/],	  ],
		   ],
		   default_action => 'first_arg',
	       }
	   );

	   $ambiguous_grammar->precompute();

	   my $ambiguous_recce =
	       Marpa::Recognizer->new( { grammar => $ambiguous_grammar } );

	   $ambiguous_recce->tokens( \@tokens );

	   my @values =	();
	   while ( defined( my $ambiguous_value_ref = $ambiguous_recce->value()	) ) {
	       push @values, ${$ambiguous_value_ref};
	   }

   Short Form Rule Descriptors
	   rules => [
	       [ 'E', [qw/E Add	E/],	  'do_add' ],
	       [ 'E', [qw/E Multiply E/], 'do_multiply'	],
	       [ 'E', [qw/Number/], ],
	   ],

       The rule	descriptors in the ambiguous example demonstrate the "short"
       or array	form of	rule descriptors.  Array form rule descriptors are
       references to arrays.  Here the elements	are, in	order, the "lhs"
       property, the "rhs" property, and the "action" property.

   Marpa::Recognizer::value
	   my @values =	();
	   while ( defined( my $ambiguous_value_ref = $ambiguous_recce->value()	) ) {
	       push @values, ${$ambiguous_value_ref};
	   }

       When called more	than once, the "Marpa::Recognizer::value" method
       iterates	through	the parse results.  For	each call, it returns a
       reference to the	value of a parse result.  At the end of	the iteration,
       after all parse results have been returned, "Marpa::Recognizer::value"
       returns "undef".	 If there were no parse	results,
       "Marpa::Recognizer::value" returns "undef" the first time that it is
       called.

ERRORS AND EXCEPTIONS
       Methods in the Marpa API	do not return errors.  When there are errors,
       Marpa API methods throw an exception.

INHERITANCE
       Classes in the Marpa API	are not	designed to be inherited.

OTHER DOCUMENTS
   Basic Documents
       This document gives a semi-tutorial overview of the entire Marpa	API.
       For full	details	on Marpa's grammar objects and their methods, see the
       Marpa::Grammar document.	 For full details on Marpa's recognizer
       objects and their methods, see the Marpa::Recognizer document.

       Marpa::Parse_Terms is intended as a quick refresher in parsing
       terminology.  Marpa's standard semantics	are fully described in the
       Marpa::Semantics	document.  Techniques for tracing and for debugging
       your Marpa grammars are described in the	Marpa::Tracing document	and
       the Marpa::Debug	document.

   Advanced Documents
       The Marpa::Advanced::Implementation describes Marpa's internals.	 The
       Marpa::Advanced::Models document	(yet to	be written) will provide
       details about alternative models	of the input for Marpa.

   Academic Documents
       For those with a	theoretical bent, my sources, and other	useful
       references, are described in Marpa::Advanced::Bibliography.
       Marpa::Advanced::Algorithm describes the	Marpa algorithm	itself.

AUTHOR
       Jeffrey Kegler

   Why is it Called "Marpa"?
       Marpa is	the name of the	greatest of the	Tibetan	"translators".	In his
       time (the 11th century AD) Indian Buddhism was at its height.  A
       generation of scholars was devoting itself to producing Tibetan
       versions	of Buddhism's Sanskrit scriptures.  Marpa became the greatest
       of them,	and today is known as Marpa Lotsawa: "Marpa the	Translator".

   Blatant Plug
       Marpa is	a character in my novel, The God Proof.	 The God Proof centers
       around Kurt Goedel's proof of God's existence.  Yes, that Kurt Goedel,
       and yes,	he really did work out a God Proof (it's in his	Collected
       Works, Vol. 3, pp. 403-404).  The God Proof is available	as a free
       download	(<http://www.lulu.com/content/933192>).	 It can	be purchased
       in print	form at	Amazon.com:
       <http://www.amazon.com/God-Proof-Jeffrey-Kegler/dp/1434807355>.

ACKNOWLEDGMENTS
       Marpa is	directly derived from two other	parsers.  The first was
       discovered by John Aycock and R.	 Nigel Horspool	and is described in
       their Aycock and	Horspool 2002.	The second was described by Joop Leo
       and is described	in Leo 1991.  Aycock, Horspool,	and Leo, in turn,
       based their algorithms on the algorithm discovered by Jay Earley.  I
       combined	the Aycock-Horspool algorithm with the Leo algorithm, and
       added significant changes of my own.  More details on the algorithm
       itself are in another document.

       I'm grateful to Randal Schwartz for his support over the	years that
       I've been working on Marpa.  My chats with Larry	Wall have been few and
       brief, but his openness to new ideas has	been a major encouragement and
       his insight into	the relationship between "natural language" and
       computer	language has been a major influence.  More recently, Allison
       Randal and Patrick Michaud have been generous with their	very valuable
       time.  They might have preferred	that I volunteered as a	Parrot cage-
       cleaner,	but if so, they	were too polite	to say.

       Many at perlmonks.org answered questions	for me.	 I used	answers	from
       chromatic, Corion, dragonchild, jdporter, samtregar and Juerd, among
       others, in writing this module.	I'm just as grateful to	those whose
       answers I didn't	use.  My inquiries were	made while I was thinking out
       the code	and it wasn't always 100% clear	what I was after.  If the butt
       is moved	after the round, it shouldn't count against the	archer.

       In writing the Pure Perl	version	of Marpa, I benefited from studying
       the work	of Francois Desarmenien	("Parse::Yapp"), Damian	Conway
       ("Parse::RecDescent") and Graham	Barr ("Scalar::Util").	Adam Kennedy
       patiently instructed me in module writing, both on the finer points and
       on issues about which I really should have know better.

SUPPORT
       Marpa comes without warranty.  Support is provided on a volunteer basis
       through the standard mechanisms for CPAN	modules.  The Support document
       has details.

LICENSE	AND COPYRIGHT
       Copyright 2007-2010 Jeffrey Kegler, all rights reserved.	 Marpa is free
       software	under the Perl license.	 For details see the LICENSE file in
       the Marpa distribution.

perl v5.32.0			  2020-08-10		       Marpa::Marpa(3)

NAME | SYNOPSIS | DESCRIPTION | EXAMPLE 1: A SIMPLE CALCULATOR | EXAMPLE 2: AN AMBIGUOUS PARSE | ERRORS AND EXCEPTIONS | INHERITANCE | OTHER DOCUMENTS | AUTHOR | ACKNOWLEDGMENTS | SUPPORT | LICENSE AND COPYRIGHT

Want to link to this manual page? Use this URL:
<https://www.freebsd.org/cgi/man.cgi?query=Marpa::Marpa&sektion=3&manpath=FreeBSD+12.2-RELEASE+and+Ports>

home | help