yecc(3)			   Erlang Module Definition		       yecc(3)

       yecc - LALR-1 Parser Generator

       An  LALR-1  parser  generator  for Erlang, similar to yacc. Takes a BNF
       grammar definition as input, and	produces Erlang	code for a parser.

       To understand this text,	you also have to look at the  yacc  documenta-
       tion  in	 the UNIX(TM) manual. This is most probably necessary in order
       to understand the idea of a parser generator,  and  the	principle  and
       problems	of LALR	parsing	with finite look-ahead.

       error_info() =
	   {erl_anno:location()	| none,
	    ErrorDescriptor :: term()}

	      The  standard  error_info()  structure that is returned from all
	      I/O modules. ErrorDescriptor is formattable by format_error/1.

       file(FileName) -> yecc_ret()

       file(Grammarfile, Options) -> yecc_ret()


		 Grammarfile = file:filename()
		 Options = Option | [Option]
		 Option	=
		     {error_location, column | line} |
		     {includefile, Includefile :: file:filename()} |
		     {report_errors, boolean()}	|
		     {report_warnings, boolean()} |
		     {report, boolean()} |
		     {return_errors, boolean()}	|
		     {return_warnings, boolean()} |
		     {return, boolean()} |
		     {parserfile, Parserfile ::	file:filename()} |
		     {verbose, boolean()} |
		     {warnings_as_errors, boolean()} |
                     report_errors | report_warnings | report | return_errors |
                     return_warnings | return | verbose | warnings_as_errors
		 yecc_ret() = ok_ret() | error_ret()
		 ok_ret() =
		     {ok, Parserfile ::	file:filename()} |
		     {ok, Parserfile ::	file:filename(), warnings()}
		 error_ret() =
		     error | {error, Errors :: errors(), Warnings :: warnings()}
		 errors() = [{file:filename(), [error_info()]}]
		 warnings() = [{file:filename(), [error_info()]}]

	      Grammarfile  is  the file	of declarations	and grammar rules. Re-
	      turns ok upon success, or	error if there are errors.  An	Erlang
	      file  containing	the  parser is created if there	are no errors.
	      The options are:

                {includefile, Includefile}:
		  Indicates a customized prologue file which the user may want
		  to  use  instead  of	the  default  file  lib/parsetools/in-
		  clude/yeccpre.hrl which is otherwise included	at the	begin-
		  ning	of  the	resulting parser file. N.B. The	Includefile is
		  included 'as is' in the parser file, so it must not  have  a
		  module  declaration  of  its	own, and it should not be com-
		  piled. It must, however, contain the necessary export	decla-
		  rations. The default is indicated by "".

                {parserfile, Parserfile}:
                  Parserfile is the name of the file that will contain the
                  Erlang parser code that is generated. The default ("") is to
                  add the extension .erl to Grammarfile stripped of the .yrl
                  extension.

                {report_errors, boolean()}:
		  Causes errors	to be printed as they occur. Default is	true.

                {report_warnings, boolean()}:
                  Causes warnings to be printed as they occur. Default is
                  true.

                {report, boolean()}:
                  This is a short form for both report_errors and
                  report_warnings.

                {return_errors, boolean()}:
		  If this flag is set, {error, Errors, Warnings}  is  returned
		  when there are errors. Default is false.

                {return_warnings, boolean()}:
		  If  this  flag is set, an extra field	containing Warnings is
		  added	to the tuple returned upon success. Default is false.

                {return, boolean()}:
                  This is a short form for both return_errors and
                  return_warnings.

                {verbose, boolean()}:
		  Determines whether the parser	generator should give full in-
		  formation about resolved and unresolved  parse  action  con-
		  flicts  (true), or only about	those conflicts	that prevent a
		  parser from being generated from the input  grammar  (false,
		  the default).

		{warnings_as_errors, boolean()}:
		  Causes warnings to be	treated	as errors.

                {error_location, column | line}:
		  If  the value	of this	flag is	line, the location of warnings
		  and errors is	a line number. If the value is column, the lo-
		  cation  includes  a line number and a	column number. Default
		  is column.

	      Any of the Boolean options can be	set to	true  by  stating  the
	      name  of the option. For example,	verbose	is equivalent to {ver-
	      bose, true}.

              The value of the Parserfile option stripped of the .erl
              extension is used by Yecc as the module name of the generated
              parser file.

	      Yecc will	add the	extension .yrl to the  Grammarfile  name,  the
	      extension	 .hrl  to the Includefile name,	and the	extension .erl
	      to the Parserfile	name, unless the extension is already there.
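
              As an illustration of the return values described above, the
              following minimal sketch calls yecc:file/2 with the return
              option and with reporting turned off, then normalizes the
              result into a three-element tuple. The module name
              yecc_file_example and the function generate/1 are hypothetical
              names chosen for this example only.

```erlang
-module(yecc_file_example).
-export([generate/1]).

%% Hypothetical helper: run Yecc on Grammarfile and hand errors and
%% warnings back to the caller instead of printing them.
generate(Grammarfile) ->
    %% 'return' is short for return_errors and return_warnings;
    %% {report, false} switches off both report_errors and report_warnings.
    Options = [return, {report, false}],
    case yecc:file(Grammarfile, Options) of
        {ok, Parserfile} ->
            {ok, Parserfile, []};
        {ok, Parserfile, Warnings} ->
            {ok, Parserfile, Warnings};
        {error, Errors, Warnings} ->
            {error, Errors, Warnings};
        error ->
            %% Only returned when return_errors is not set.
            {error, [], []}
    end.
```

              Calling yecc_file_example:generate("mygrammar") would, under
              these assumptions, read mygrammar.yrl and write mygrammar.erl
              on success.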

       format_error(ErrorDescriptor) ->	io_lib:chars()


		 ErrorDescriptor = term()

	      Returns a	descriptive string in English of an error  reason  Er-
	      rorDescriptor returned by	yecc:file/1,2. This function is	mainly
	      used by the compiler invoking Yecc.

       The (host operating system) environment	variable  ERL_COMPILER_OPTIONS
       can be used to give default Yecc	options. Its value must	be a valid Er-
       lang term. If the value is a list, it is	used as	is. If	it  is	not  a
       list, it	is put into a list.

       The list	is appended to any options given to file/2.

       The list	can be retrieved with  compile:env_compiler_options/0.

       A  scanner  to pre-process the text (program, etc.) to be parsed	is not
       provided	in the yecc module. The	scanner	serves as a  kind  of  lexicon
       look-up routine.	It is possible to write	a grammar that uses only char-
       acter tokens as terminal	symbols, thereby eliminating the  need	for  a
       scanner,	but this would make the	parser larger and slower.

       The  user  should implement a scanner that segments the input text, and
       turns it	into one or more lists of tokens. Each token should be a tuple
       containing  information	about syntactic	category, position in the text
       (e.g. line number), and the actual terminal symbol found	in  the	 text:
       {Category, Position, Symbol}.

       If  a  terminal symbol is the only member of a category,	and the	symbol
       name is identical to the	category name, the token format	may  be	 {Sym-
       bol, Position}.

       A  list	of  tokens  produced  by the scanner should end	with a special
       end_of_input tuple which	the parser is looking for. The format of  this
       tuple should be {Endsymbol, EndPosition}, where Endsymbol is an identi-
       fier that is distinguished from all the terminal	and non-terminal cate-
       gories  of the syntax rules. The	Endsymbol may be declared in the gram-
       mar file	(see below).

       The simplest case is to segment the input string	into a list of identi-
       fiers  (atoms) and use those atoms both as categories and values	of the
       tokens. For example, the	input string aaa bbb 777,  X  may  be  scanned
       (tokenized) as:

       [{aaa, 1}, {bbb,	1}, {777, 1}, {',' , 1}, {'X', 1},
	{'$end', 1}].

       This  assumes  that  this is the	first line of the input	text, and that
       '$end' is the distinguished end_of_input	symbol.

       The Erlang scanner in the io module can be used	as  a  starting	 point
       when  writing  a	 new scanner. Study yeccscan.erl in order to see how a
       filter can be added on top of io:scan_erl_form/3	to provide  a  scanner
       for Yecc	that tokenizes grammar files before parsing them with the Yecc
       parser. A more general approach to scanner implementation is to use a
       scanner generator. A scanner generator for Erlang called leex is
       included in the parsetools application.

       Erlang style comments, starting with a '%', are allowed in grammar
       files.

       Each declaration	or rule	ends with a dot	(the character '.').

       The  grammar  starts with an optional header section. The header	is put
       first in	the generated file, before the module declaration. The purpose
       of the header is	to provide a means to make the documentation generated
       by EDoc look nicer. Each	header	line  should  be  enclosed  in	double
       quotes, and newlines will be inserted between the lines.	For example:

       Header "%% Copyright (C)"
       "%% @private"
       "%% @Author John".

       Next  comes  a  declaration of the nonterminal categories to be used in
       the rules. For example:

       Nonterminals sentence nounphrase	verbphrase.

       A non-terminal category can be used at the left hand side  (=  lhs,  or
       head) of a grammar rule. It can also appear at the right hand side of
       rules.

       Next comes a declaration	of the terminal	categories, which are the cat-
       egories of tokens produced by the scanner. For example:

       Terminals article adjective noun	verb.

       Terminal	 categories may	only appear in the right hand sides (= rhs) of
       grammar rules.

       Next comes a declaration	of the rootsymbol, or start  category  of  the
       grammar.	For example:

       Rootsymbol sentence.

       This symbol should appear in the	lhs of at least	one grammar rule. This
       is the most general syntactic category which the	parser ultimately will
       parse every input string	into.

       After  the  rootsymbol declaration comes	an optional declaration	of the
       end_of_input symbol that	your scanner is	expected to use. For example:

       Endsymbol '$end'.

       Next comes one or more declarations of operator precedences, if needed.
       These are used to resolve shift/reduce conflicts (see the yacc
       documentation).

       Examples	of operator declarations:

       Right 100 '='.
       Nonassoc	200 '==' '=/='.
       Left 300	'+'.
       Left 400	'*'.
       Unary 500 '-'.

       These declarations mean that '='	is defined as a	right associative  bi-
       nary operator with precedence 100, '==' and '=/=' are operators with no
       associativity, '+' and '*' are left associative binary operators, where
       '*' takes precedence over '+' (the normal case),	and '-'	is a unary op-
       erator of higher	precedence than	'*'. The fact that '=='	has  no	 asso-
       ciativity  means	 that  an  expression like a ==	b == c is considered a
       syntax error.

       Certain rules are assigned precedence: each rule	 gets  its  precedence
       from  the  last terminal	symbol mentioned in the	right hand side	of the
       rule. It	is also	possible to declare precedence for non-terminals, "one
       level  up".  This is practical when an operator is overloaded (see also
       example 3 below).

       Next come the grammar rules. Each rule has the general form

       Left_hand_side -> Right_hand_side : Associated_code.

       The left	hand side is a non-terminal category. The right	hand side is a
       sequence	 of  one  or more non-terminal or terminal symbols with	spaces
       between.	The associated code is a sequence of zero or more  Erlang  ex-
       pressions  (with	 commas	 ',' as	separators). If	the associated code is
       empty, the separating colon ':' is also omitted.	A final	dot marks  the
       end of the rule.

       Symbols	such  as  '{', '.', etc., have to be enclosed in single	quotes
       when used as terminal or	non-terminal symbols in	grammar	rules. The use
       of the symbols '$empty',	'$end',	and '$undefined' should	be avoided.

       The  last  part	of the grammar file is an optional section with	Erlang
       code (= function	definitions) which is included 'as is' in the  result-
       ing  parser  file. This section must start with the pseudo declaration,
       or key words

       Erlang code.

       No syntax rule definitions or other declarations	may follow  this  sec-
       tion.  To  avoid	conflicts with internal	variables, do not use variable
       names beginning with two	underscore characters  ('__')  in  the	Erlang
       code  in	 this  section,	 or in the code	associated with	the individual
       syntax rules.

       The optional expect declaration can be placed anywhere before the  last
       optional	section	with Erlang code. It is	used for suppressing the warn-
       ing about conflicts that	is ordinarily given if the grammar is  ambigu-
       ous. An example:

       Expect 2.

       The  warning  is	 given if the number of	shift/reduce conflicts differs
       from 2, or if there are reduce/reduce conflicts.

       A grammar to parse list expressions (with empty associated code):

       Nonterminals list elements element.
       Terminals atom '(' ')'.
       Rootsymbol list.
       list -> '(' ')'.
       list -> '(' elements ')'.
       elements	-> element.
       elements	-> element elements.
       element -> atom.
       element -> list.

       This grammar can	be used	to generate a parser which parses list expres-
       sions, such as (), (a), (peter charles),	(a (b c) d (())), ... provided
       that your scanner tokenizes, for example, the input (peter charles) as:

       [{'(', 1} , {atom, 1, peter}, {atom, 1, charles}, {')', 1},
	{'$end', 1}]
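
       A scanner producing tokens of this shape could look like the following
       minimal sketch. The module name list_scanner is hypothetical, and the
       sketch assumes that atoms consist of lowercase letters, digits and
       underscores and start with a lowercase letter. Any other character
       makes the call fail with a function_clause error.

```erlang
-module(list_scanner).
-export([scan/1]).

%% scan(String) -> [Token]
%% Tokens are {Category, Line, Symbol} or {Symbol, Line}; the list ends
%% with the end_of_input tuple {'$end', Line}.
scan(String) ->
    scan(String, 1, []).

scan([], Line, Acc) ->
    lists:reverse([{'$end', Line} | Acc]);
scan([$( | Rest], Line, Acc) ->
    scan(Rest, Line, [{'(', Line} | Acc]);
scan([$) | Rest], Line, Acc) ->
    scan(Rest, Line, [{')', Line} | Acc]);
scan([$\n | Rest], Line, Acc) ->
    %% A newline only advances the line counter.
    scan(Rest, Line + 1, Acc);
scan([C | Rest], Line, Acc) when C =:= $\s; C =:= $\t ->
    scan(Rest, Line, Acc);
scan([C | _] = Input, Line, Acc) when C >= $a, C =< $z ->
    {Name, Rest} = lists:splitwith(fun is_name_char/1, Input),
    scan(Rest, Line, [{atom, Line, list_to_atom(Name)} | Acc]).

is_name_char(C) ->
    (C >= $a andalso C =< $z)
        orelse (C >= $0 andalso C =< $9)
        orelse C =:= $_.
```

       With this sketch, list_scanner:scan("(peter charles)") returns the
       token list shown above.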

       When  a grammar rule is used by the parser to parse (part of) the input
       string as a grammatical phrase, the associated code is  evaluated,  and
       the  value  of  the  last  expression  becomes  the value of the	parsed
       phrase. This value may be used by the parser later to build  structures
       that  are  values  of  higher  phrases of which the current phrase is a
       part. The values	initially associated with terminal  category  phrases,
       i.e. input tokens, are the token	tuples themselves.

       Below is an example of the grammar above with structure building code
       added:

       list -> '(' ')' : nil.
       list -> '(' elements ')'	: '$2'.
       elements	-> element : {cons, '$1', nil}.
       elements	-> element elements : {cons, '$1', '$2'}.
       element -> atom : '$1'.
       element -> list : '$1'.

       With this code added to the grammar rules, the parser produces the
       following value (structure) when parsing the input string (a b c).
       This still assumes that this was the first input line that the scanner
       tokenized:

       {cons, {atom, 1, a}, {cons, {atom, 1, b},
                                   {cons, {atom, 1, c}, nil}}}
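
       Code that consumes such a structure can simply pattern match on it. As
       a minimal sketch, the hypothetical module cons_to_list below converts
       the cons/nil structure into an ordinary Erlang list of atoms, with
       nested lists for nested list expressions. It assumes exactly the value
       shapes built by the rules above.

```erlang
-module(cons_to_list).
-export([to_list/1]).

%% to_list(Structure) -> list()
%% Structure is nil or {cons, Head, Tail} as built by the grammar rules.
to_list(nil) ->
    [];
to_list({cons, Head, Tail}) ->
    [element_value(Head) | to_list(Tail)].

%% An element is either an atom token or the value of a nested list.
element_value({atom, _Line, Name}) ->
    Name;
element_value(Nested) ->
    to_list(Nested).
```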

       The  associated	code  contains pseudo variables	'$1', '$2', '$3', etc.
       which refer to (are bound to) the values	associated previously  by  the
       parser  with the	symbols	of the right hand side of the rule. When these
       symbols are terminal categories,	the values are token tuples of the in-
       put string (see above).

       The associated code may not only	be used	to build structures associated
       with phrases, but may also be used for syntactic	 and  semantic	tests,
       printout	 actions  (for	example	 for tracing), etc. during the parsing
       process.	Since tokens contain positional	(line number) information,  it
       is  possible  to	 produce error messages	which contain line numbers. If
       there is	no associated code after the right hand	side of	the rule,  the
       value '$undefined' is associated	with the phrase.

       The  right  hand	side of	a grammar rule may be empty. This is indicated
       by using	the special symbol '$empty' as	rhs.  Then  the	 list  grammar
       above may be simplified to:

       list -> '(' elements ')'	: '$2'.
       elements	-> element elements : {cons, '$1', '$2'}.
       elements	-> '$empty' : nil.
       element -> atom : '$1'.
       element -> list : '$1'.

       To call the parser generator, use the following command:

       yecc:file(Grammarfile).

       An error message from Yecc will be shown if the grammar is not of the
       LALR type (for example too ambiguous). Shift/reduce conflicts are
       resolved in favor of shifting if there are no operator precedence
       declarations. Refer to the yacc documentation on the use of operator
       precedence.

       The  output  file  contains Erlang source code for a parser module with
       module name equal to the	Parserfile parameter. After  compilation,  the
       parser can be called as follows (the module name is assumed to be
       myparser):

       myparser:parse(myscanner:scan(Inport))

       The call	format may be different	if a customized	prologue file has been
       included when generating the parser instead of the default file
       lib/parsetools/include/yeccpre.hrl.

       With the	standard prologue, this	call will return either	{ok,  Result},
       where  Result  is  a structure that the Erlang code of the grammar file
       has built, or {error, {Position,	Module,	Message}} if there was a  syn-
       tax error in the	input.

       Message	is  something  which may be converted into a string by calling
       Module:format_error(Message) and	printed	with io:format/3.

       By default, the parser that was generated will not print	out error mes-
       sages  to  the screen. The user will have to do this either by printing
       the returned error messages, or by inserting tests and  print  instruc-
       tions  in the Erlang code associated with the syntax rules of the gram-
       mar file.
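
       Printing a returned error could be done along the lines of the
       following minimal sketch. The module name parse_report and the
       function report/1 are hypothetical. The sketch assumes the return
       values of the standard prologue described above and writes the
       message to standard_error.

```erlang
-module(parse_report).
-export([report/1]).

%% report(ParseResult) -> {ok, Result} | {error, string()}
%% ParseResult is the value returned by the generated parse function.
report({ok, Result}) ->
    {ok, Result};
report({error, {Position, Module, Message}}) ->
    %% Module:format_error/1 turns Message into printable characters.
    Text = lists:flatten(Module:format_error(Message)),
    io:format(standard_error, "~w: ~ts~n", [Position, Text]),
    {error, Text}.
```

       A call such as parse_report:report(myparser:parse(Tokens)) would then
       pass a successful result through unchanged and print a message for a
       syntax error.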

       It is also possible to make the parser ask for more input  tokens  when
       needed if the following call format is used:

       myparser:parse_and_scan({Function, Args})
       myparser:parse_and_scan({Mod, Tokenizer,	Args})

       The tokenizer Function is either	a fun or a tuple {Mod, Tokenizer}. The
       call apply(Function, Args) or apply({Mod, Tokenizer}, Args) is executed
       whenever	a new token is needed. This, for example, makes	it possible to
       parse from a file, token	by token.

       The tokenizer used above	has to be implemented so as to return  one  of
       the following:

       {ok, Tokens, EndPosition}
       {eof, EndPosition}
       {error, Error_description, EndPosition}

       This  conforms  to  the format used by the scanner in the Erlang	io li-
       brary module.

       If  {eof,  EndPosition}	is   returned	immediately,   the   call   to
       parse_and_scan/1	 returns  {ok, eof}. If	{eof, EndPosition} is returned
       before the parser expects  end  of  input,  parse_and_scan/1  will,  of
       course, return an error message (see above). Otherwise {ok, Result} is
       returned.

       1. A grammar for	parsing	infix arithmetic expressions into prefix nota-
       tion, without operator precedence:

       Nonterminals E T	F.
       Terminals '+' '*' '(' ')' number.
       Rootsymbol E.
       E -> E '+' T: {'$2', '$1', '$3'}.
       E -> T :	'$1'.
       T -> T '*' F: {'$2', '$1', '$3'}.
       T -> F :	'$1'.
       F -> '('	E ')' :	'$2'.
       F -> number : '$1'.

       2. The same with	operator precedence becomes simpler:

       Nonterminals E.
       Terminals '+' '*' '(' ')' number.
       Rootsymbol E.
       Left 100	'+'.
       Left 200	'*'.
       E -> E '+' E : {'$2', '$1', '$3'}.
       E -> E '*' E : {'$2', '$1', '$3'}.
       E -> '('	E ')' :	'$2'.
       E -> number : '$1'.
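
       The prefix tuples built by the two grammars above can be evaluated
       with a few clauses of Erlang. The sketch below is only an
       illustration: the module name prefix_eval is hypothetical, and it
       assumes that the scanner delivers number tokens as
       {number, Line, Value} and operator tokens as {'+', Line} and
       {'*', Line}.

```erlang
-module(prefix_eval).
-export([eval/1]).

%% eval(Tree) -> number()
%% Tree is a number token or {OperatorToken, Left, Right}, as built by
%% the associated code {'$2', '$1', '$3'}.
eval({number, _Line, Value}) ->
    Value;
eval({{'+', _Line}, Left, Right}) ->
    eval(Left) + eval(Right);
eval({{'*', _Line}, Left, Right}) ->
    eval(Left) * eval(Right).
```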

       3. An overloaded	minus operator:

       Nonterminals E uminus.
       Terminals '*' '-' number.
       Rootsymbol E.

       Left 100	'-'.
       Left 200	'*'.
       Unary 300 uminus.

       E -> E '-' E.
       E -> E '*' E.
       E -> uminus.
       E -> number.

       uminus -> '-' E.

       4. The Yecc grammar that is used for parsing grammar files, including
       itself:

       Nonterminals
       grammar declaration rule head symbol symbols attached_code
       token tokens.
       Terminals
       atom float integer reserved_symbol reserved_word string char var
       '->' ':' dot.
       Rootsymbol grammar.
       Endsymbol '$end'.
       grammar -> declaration :	'$1'.
       grammar -> rule : '$1'.
       declaration -> symbol symbols dot: {'$1', '$2'}.
       rule -> head '->' symbols attached_code dot: {rule, ['$1' | '$3'],
               '$4'}.
       head -> symbol :	'$1'.
       symbols -> symbol : ['$1'].
       symbols -> symbol symbols : ['$1' | '$2'].
       attached_code ->	':' tokens : {erlang_code, '$2'}.
       attached_code ->	'$empty' : {erlang_code,
			[{atom,	0, '$undefined'}]}.
       tokens -> token : ['$1'].
       tokens -> token tokens :	['$1' |	'$2'].
       symbol -> var : value_of('$1').
       symbol -> atom :	value_of('$1').
       symbol -> integer : value_of('$1').
       symbol -> reserved_word : value_of('$1').
       token ->	var : '$1'.
       token ->	atom : '$1'.
       token ->	float :	'$1'.
       token ->	integer	: '$1'.
       token ->	string : '$1'.
       token ->	char : '$1'.
       token ->	reserved_symbol	: {value_of('$1'), line_of('$1')}.
       token ->	reserved_word :	{value_of('$1'), line_of('$1')}.
       token ->	'->' : {'->', line_of('$1')}.
       token ->	':' : {':', line_of('$1')}.
       Erlang code.
       value_of(Token) ->
	   element(3, Token).
       line_of(Token) ->
	   element(2, Token).

       The symbols '->', and ':' have to be treated in a special way, as they
       are  meta  symbols of the grammar notation, as well as terminal symbols
       of the Yecc grammar.

       5. The file erl_parse.yrl in the	lib/stdlib/src directory contains  the
       grammar for Erlang.

       Syntactic tests are used	in the code associated with some rules,	and an
       error is	thrown (and caught by the generated parser to produce an error
       message)	when a test fails. The same effect can be achieved with	a call
       to return_error(ErrorPosition, Message_string), which is	defined	in the
       yeccpre.hrl default header file.
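
       As a minimal sketch of such a test, the small grammar below rejects a
       literal zero divisor by calling return_error/2 from the code
       associated with a rule. The grammar itself, the helper
       check_divisor/1 and the assumed number token format
       {number, Line, Value} are illustrations only and are not taken from
       erl_parse.yrl.

```erlang
%% Hypothetical grammar file, for example divide.yrl.
Nonterminals E.
Terminals '/' number.
Rootsymbol E.
Left 100 '/'.

%% The test runs first; the value of the last expression becomes the
%% value of the phrase.
E -> E '/' E : check_divisor('$3'), {'$2', '$1', '$3'}.
E -> number : '$1'.

Erlang code.

%% return_error/2 comes from the default yeccpre.hrl prologue and makes
%% the generated parser return an error.
check_divisor({number, Line, 0}) ->
    return_error(Line, "division by zero");
check_divisor(_) ->
    ok.
```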


       Aho & Johnson: 'LR Parsing', ACM	Computing Surveys, vol.	6:2, 1974.

Ericsson AB		       parsetools 2.3.2			       yecc(3)

