RE2C(1)								       RE2C(1)

NAME
       re2c - convert regular expressions to C/C++

SYNOPSIS
       re2c [OPTIONS] FILE

DESCRIPTION
       re2c is a lexer generator for C/C++. It finds regular expression
       specifications inside of	C/C++ comments and replaces them with a
       hard-coded DFA. The user	must supply some interface code	in order to
       control and customize the generated DFA.

EXAMPLE
       Given the following code:

	   unsigned int	stou (const char * s)
	   {
	   #   define YYCTYPE char
	       const YYCTYPE * YYCURSOR	= s;
	       unsigned	int result = 0;

	       for (;;)
	       {
		   /*!re2c
		       re2c:yyfill:enable = 0;

		       "\x00" {	return result; }
                       [0-9]  { result = result * 10 + (YYCURSOR[-1] - '0'); continue; }
                   */
               }
           }

       re2c -is will generate:

           /* Generated by re2c 0.13.7.dev on Mon Jul 14 13:37:46 2014 */
           unsigned int stou (const char * s)
           {
           #   define YYCTYPE char
               const YYCTYPE * YYCURSOR = s;
               unsigned int result = 0;

               for (;;)
               {

           {
                   YYCTYPE yych;

                   yych = *YYCURSOR;
                   if (yych <= 0x00) goto yy3;
                   if (yych <= '/') goto yy2;
                   if (yych <= '9') goto yy5;
           yy2:
           yy3:
                   ++YYCURSOR;
                   { return result; }
           yy5:
                   ++YYCURSOR;
                   { result = result * 10 + (YYCURSOR[-1] - '0'); continue; }
	   }

	       }
	   }
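
       In the [0-9] action the digit that was just matched is no longer under
       YYCURSOR, because the generated code advances the cursor before the
       action runs. The sketch below is a hand-written equivalent of the
       scanner (not re2c output; the name stou_by_hand is made up for
       illustration) that reads the digit back with cursor[-1] - '0'. Like
       the generated code above, it returns on the terminating NUL and on any
       other non-digit.

```c
/* Hand-written equivalent of the example scanner; not produced by re2c. */
unsigned int stou_by_hand(const char *s)
{
    const char *cursor = s;          /* plays the role of YYCURSOR */
    unsigned int result = 0;

    for (;;) {
        char yych = *cursor;
        ++cursor;                    /* the cursor moves past the symbol first */
        if (yych < '0' || yych > '9')
            return result;           /* NUL or any other non-digit ends the number */
        /* the matched digit is now one position behind the cursor */
        result = result * 10 + (unsigned int)(cursor[-1] - '0');
    }
}
```

       For instance, stou_by_hand ("123") yields 123 and stou_by_hand ("")
       yields 0.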

OPTIONS
       -?, -h
	   Invoke a short help.

       -b
	   Implies -s. Use bit vectors as well in the attempt to coax better
	   code	out of the compiler. Most useful for specifications with more
	   than	a few keywords (e.g. for most programming languages).

       -c
           Enable (f)lex-like condition support.

       -d
	   Creates a parser that dumps information about the current position
	   and in which	state the parser is while parsing the input. This is
	   useful to debug parser issues and states. If	you use	this switch
	   you need to define a	macro YYDEBUG that is called like a function
	   with	two parameters:	void YYDEBUG (int state, char current).	The
	   first parameter receives the	state or -1 and	the second parameter
	   receives the	input at the current cursor.

       -D
	   Emit	Graphviz dot data. It can then be processed with e.g.  dot
	   -Tpng input.dot > output.png. Please	note that scanners with	many
	   states may crash dot.

       -e
	   Generate a parser that supports EBCDIC. The generated code can deal
	   with	any character up to 0xFF. In this mode re2c assumes that input
	   character size is 1 byte. This switch is incompatible with -w, -x,
	   -u and -8.

       -f
	   Generate a scanner with support for storable	state. For details see
	   below at SCANNER WITH STORABLE STATES.

       -F
           Partial support for flex syntax. When this flag is active then
           named definitions must be surrounded by curly braces and can be
           defined without an equal sign and the terminating semicolon.
           Bare names are instead treated as double quoted strings.

       -g
	   Generate a scanner that utilizes GCC's computed goto	feature. That
	   is re2c generates jump tables whenever a decision is	of a certain
	   complexity (e.g. a lot of if	conditions are otherwise necessary).
           This is only usable with GCC and produces output that cannot be
	   compiled with any other compiler. Note that this implies -b and
	   that	the complexity threshold can be	configured using the inplace
	   configuration cgoto:threshold.

       -i
           Do not output #line information. This is useful when you want to
           use a CMS tool with the re2c output, which you might want if you do
           not require your users to have re2c themselves when building from
           your source.

       -o OUTPUT
	   Specify the output file.

       -r
	   Allows reuse	of scanner definitions with /*!use:re2c	after
	   /*!rules:re2c. In this mode no /*!re2c block	and exactly one
	   /*!rules:re2c must be present. The rules are	being saved and	used
	   by every /*!use:re2c	block that follows. These blocks can contain
	   inplace configurations, especially re2c:flags:e, re2c:flags:w,
	   re2c:flags:x, re2c:flags:u and re2c:flags:8.	That way it is
	   possible to create the same scanner multiple	times for different
	   character types, different input mechanisms or different output
	   mechanisms. The /*!use:re2c blocks can also contain additional
	   rules that will be appended to the set of rules in /*!rules:re2c.

       -s
	   Generate nested ifs for some	switches. Many compilers need this
	   assist to generate better code.

       -t
	   Create a header file	that contains types for	the (f)lex-like
	   condition support. This can only be activated when -c is in use.

       -u
	   Generate a parser that supports UTF-32. The generated code can deal
	   with	any valid Unicode character up to 0x10FFFF. In this mode re2c
	   assumes that	input character	size is	4 bytes. This switch is
	   incompatible	with -e, -w, -x	and -8.	This implies -s.

       -v
	   Show	version	information.

       -V
	   Show	the version as a number	XXYYZZ.

       -w
	   Generate a parser that supports UCS-2. The generated	code can deal
	   with	any valid Unicode character up to 0xFFFF. In this mode re2c
	   assumes that	input character	size is	2 bytes. This switch is
	   incompatible	with -e, -x, -u	and -8.	This implies -s.

       -x
	   Generate a parser that supports UTF-16. The generated code can deal
	   with	any valid Unicode character up to 0x10FFFF. In this mode re2c
	   assumes that	input character	size is	2 bytes. This switch is
	   incompatible	with -e, -w, -u	and -8.	This implies -s.

       -1
           Force single pass generation. This cannot be combined with -f and
           disables YYMAXFILL generation prior to the last re2c block.

       -8
	   Generate a parser that supports UTF-8. The generated	code can deal
	   with	any valid Unicode character up to 0x10FFFF. In this mode re2c
	   assumes that	input character	size is	1 byte.	This switch is
	   incompatible	with -e, -w, -x	and -u.

       --case-insensitive
	   All strings are case	insensitive, so	all "-expressions are treated
	   in the same way '-expressions are.

       --case-inverted
	   Invert the meaning of single	and double quoted strings. With	this
	   switch single quotes	are case sensitive and double quotes are case
	   insensitive.

       --no-generation-date
	   Suppress date output	in the generated output	so that	it only	shows
	   the re2c version.

       --encoding-policy POLICY
	   Specify how re2c must treat Unicode surrogates.  POLICY can be one
	   of the following: fail (abort with error when surrogate
	   encountered), substitute (silently substitute surrogate with	error
	   code	point 0xFFFD), ignore (treat surrogates	as normal code
	   points). By default re2c ignores surrogates (for backward
           compatibility). The Unicode standard says that standalone surrogates
	   are invalid code points, but	different libraries and	programs treat
	   them	differently.

INTERFACE CODE
       The user	must supply interface code either in the form of C/C++ code
       (macros,	functions, variables, etc.) or in the form of inplace
       configurations. Which symbols must be defined and which are optional
       depends on a particular use case.

       YYCONDTYPE
	   In -c mode you can use -t to	generate a file	that contains the
	   enumeration used as conditions. Each	of the values refers to	a
	   condition of	a rule set.

       YYCTXMARKER
           l-value of type YYCTYPE *. The generated code saves trailing
	   context backtracking	information in YYCTXMARKER. The	user only
	   needs to define this	macro if a scanner specification uses trailing
	   context in one or more of its regular expressions.

       YYCTYPE
	   Type	used to	hold an	input symbol (code unit). Usually char or
	   unsigned char for ASCII, EBCDIC and UTF-8, unsigned short for
	   UTF-16 or UCS-2 and unsigned	int for	UTF-32.

       YYCURSOR
           l-value of type YYCTYPE * that points to the current input symbol.
	   The generated code advances YYCURSOR	as symbols are matched.	On
	   entry, YYCURSOR is assumed to point to the first character of the
	   current token. On exit, YYCURSOR will point to the first character
	   of the following token.

       YYDEBUG (state, current)
           This is only needed if the -d flag was specified. It makes the
           generated parser easier to debug by calling a user-defined
           function for every state. The function should have the following
           signature: void YYDEBUG (int state, char current). The first
           parameter receives the state or -1 and the second parameter
           receives the input at the current cursor.
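
           A minimal sketch of such a definition follows. The helper names
           format_trace and trace_state and the output format are assumptions
           made for illustration; only the macro name and its two arguments
           come from the description above.

```c
#include <stdio.h>

/* Hypothetical helper: formats one trace line into buf. */
static int format_trace(char *buf, size_t size, int state, char current)
{
    return snprintf(buf, size, "state=%d input=0x%02x",
                    state, (unsigned int)(unsigned char)current);
}

/* Hypothetical helper: prints the trace line to stderr. */
static void trace_state(int state, char current)
{
    char line[64];
    format_trace(line, sizeof line, state, current);
    fprintf(stderr, "%s\n", line);
}

/* The macro that code generated with -d expects to find. */
#define YYDEBUG(state, current) trace_state((state), (current))
```

           Routing the macro through a function keeps the arguments evaluated
           exactly once and makes it easy to set a breakpoint on every state.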

       YYFILL (n)
	   The generated code "calls" YYFILL (n) when the buffer needs
	   (re)filling:	at least n additional characters should	be provided.
	   YYFILL (n) should adjust YYCURSOR, YYLIMIT, YYMARKER	and
	   YYCTXMARKER as needed. Note that for	typical	programming languages
	   n will be the length	of the longest keyword plus one. The user can
	   place a comment of the form /*!max:re2c*/ once to insert a
	   YYMAXFILL (n) definition that is set	to the maximum length value.
	   If -1 switch	is used	then YYMAXFILL can be triggered	only once
	   after the last /*!re2c ... */ block.
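
           One common way to implement the refill is to move the unfinished
           token to the start of the buffer, shift all pointers by the same
           amount and read new data into the freed space. The sketch below
           assumes an in-memory data source standing in for a file; the
           struct, the function names and the constants BUF_SIZE and MAXFILL
           are hypothetical, MAXFILL stands in for the generated YYMAXFILL,
           and YYCTXMARKER is left out for brevity.

```c
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 16   /* deliberately tiny, for illustration */
#define MAXFILL  4    /* stands in for the YYMAXFILL that re2c would emit */

struct input {
    const char *src;       /* hypothetical source: bytes not yet buffered */
    size_t      src_left;
    char        buf[BUF_SIZE + MAXFILL];
    char       *tok;       /* start of the token being scanned */
    char       *cur;       /* role of YYCURSOR */
    char       *mar;       /* role of YYMARKER */
    char       *lim;       /* role of YYLIMIT */
    int         eof;
};

static void input_init(struct input *in, const char *src, size_t len)
{
    in->src = src;
    in->src_left = len;
    in->tok = in->cur = in->mar = in->lim = in->buf + BUF_SIZE;
    in->eof = 0;
}

/* Returns 0 on success, nonzero when the request cannot be satisfied. */
static int input_fill(struct input *in, size_t need)
{
    size_t shift = (size_t)(in->tok - in->buf);   /* bytes already consumed */
    size_t kept  = (size_t)(in->lim - in->tok);   /* bytes still needed */
    size_t got;

    if (in->eof) return 1;         /* nothing more to read */
    if (shift < need) return 2;    /* current token does not fit in the buffer */

    memmove(in->buf, in->tok, kept);
    in->tok -= shift; in->cur -= shift; in->mar -= shift; in->lim -= shift;

    got = in->src_left < shift ? in->src_left : shift;
    memcpy(in->lim, in->src, got);
    in->src += got; in->src_left -= got; in->lim += got;

    if (got < shift) {             /* source exhausted: pad with NULs */
        in->eof = 1;
        memset(in->lim, 0, MAXFILL);
        in->lim += MAXFILL;
    }
    return 0;
}
```

           Inside the scanning function, YYFILL (n) could then be defined as a
           call to input_fill that returns an error code to the caller when
           the result is nonzero.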

       YYGETCONDITION ()
	   This	define is used to get the condition prior to entering the
	   scanner code	when using -c switch. The value	must be	initialized
	   with	a value	from the enumeration YYCONDTYPE	type.

       YYGETSTATE ()
	   The user only needs to define this macro if the -f flag was
	   specified. In that case, the	generated code "calls" YYGETSTATE ()
	   at the very beginning of the	scanner	in order to obtain the saved
	   state.  YYGETSTATE () must return a signed integer. The value must
	   be either -1, indicating that the scanner is	entered	for the	first
	   time, or a value previously saved by	YYSETSTATE (s).	In the second
	   case, the scanner will resume operations right after	where the last
	   YYFILL (n) was called.

       YYLIMIT
           Expression of type YYCTYPE * that marks the end of the buffer
	   (YYLIMIT[-1]	is the last character in the buffer). The generated
	   code	repeatedly compares YYCURSOR to	YYLIMIT	to determine when the
	   buffer needs	(re)filling.

       YYMARKER
           l-value of type YYCTYPE *. The generated code saves backtracking
           information in YYMARKER. Simple scanners might not need it.

       YYMAXFILL
	   This	will be	automatically defined by /*!max:re2c*/ blocks as
	   explained above.

       YYSETCONDITION (c)
	   This	define is used to set the condition in transition rules. This
	   is only being used when -c is active	and transition rules are being
	   used.

       YYSETSTATE (s)
	   The user only needs to define this macro if the -f flag was
	   specified. In that case, the	generated code "calls" YYSETSTATE just
	   before calling YYFILL (n). The parameter to YYSETSTATE is a signed
	   integer that	uniquely identifies the	specific instance of YYFILL
           (n) that is about to be called. Should the user wish to save the
           state of the scanner and have YYFILL (n) return to the caller, all
           they have to do is store that unique identifier in a variable.
           Later, when the scanner is called again, it will call YYGETSTATE ()
           and resume execution right where it left off. The generated code
           will contain both YYSETSTATE (s) and YYGETSTATE () even if YYFILL
           (n) is disabled.

SYNTAX
       Code for	re2c consists of a set of rules, named definitions and inplace
       configurations.

       rules consist of a regular-expression along with a block of C/C++ code
       that is to be executed when the associated regular-expression is
       matched. You can start the code either with an opening curly brace or
       with the sequence :=. When the code starts with a curly brace, re2c
       counts the brace depth and stops looking for code automatically.
       Otherwise curly braces are not allowed and re2c stops looking for code
       at the first line that does not begin with whitespace. If two or more
       rules overlap, the first rule is preferred.

       regular-expression { C/C++ code }

       regular-expression := C/C++ code

       There is	one special rule: default rule *:

       * { C/C++ code }

       * := C/C++ code

	   Note
	   [^] differs from *: * has the lowest	priority, matches any code
	   unit	(either	valid or invalid) and always consumes one character;
	   [^] matches any valid code point (not code unit) and	can consume
	   multiple characters.	In fact, when variable-length encoding is
	   used, * is the only possible	way to match invalid input character.

       If -c is active then each regular-expression is preceded by a list of
       comma separated condition names. Besides normal naming rules there are
       two special cases. A rule may contain the single condition name * or
       no condition name at all. In the latter case the rule cannot have a
       regular-expression. Non-empty rules may furthermore specify the new
       condition. In that case re2c will generate the necessary code to
       change the condition automatically. Just as above, code can be started
       with a curly brace or the sequence :=. Furthermore, rules can use :=>
       as a shortcut to automatically generate code that not only sets the new
       condition state but also continues execution with the new state. A
       shortcut rule should not be used in a loop where there is code between
       the start of the loop and the re2c block unless re2c:cond:goto is
       changed to continue. If code is necessary before all rules (though not
       simple jumps) you can do so by using <! pseudo-rules.

       <condition-list>	regular-expression { C/C++ code	}

       <condition-list>	regular-expression := C/C++ code

       <condition-list>	* { C/C++ code }

       <condition-list>	* := C/C++ code

       <condition-list>	regular-expression => condition	{ C/C++	code }

       <condition-list>	regular-expression => condition	:= C/C++ code

       <condition-list>	regular-expression :=> condition

       <*> regular-expression {	C/C++ code }

       <*> regular-expression := C/C++ code

       <*> * { C/C++ code }

       <*> * :=	C/C++ code

       <*> regular-expression => condition { C/C++ code	}

       <*> regular-expression => condition := C/C++ code

       <*> regular-expression :=> condition

       <> { C/C++ code }

       <> := C/C++ code

       <> => condition { C/C++ code }

       <> => condition := C/C++	code

       <> :=> condition

       <!condition-list> { C/C++ code }

       <!condition-list> := C/C++ code

       <!*> { C/C++ code }

       <!*> := C/C++ code

       named definitions are of	the form:

       name = regular-expression;

       If -F is	active,	then named definitions are also	of the form:

       name regular-expression

       inplace configurations are of the form:

       re2c:name = value;

       re2c:name = "_value_";
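
       As an illustration, the three kinds of entries can be combined in one
       block: an inplace configuration, two named definitions, and three
       rules, the last of which is the default rule. The names digit, number,
       NUMBER, PLUS and ERROR are invented for this sketch, and the three
       return values are assumed to be defined by the surrounding C code.

```c
/*!re2c
    re2c:yyfill:enable = 0;

    digit  = [0-9];
    number = digit+;

    number { return NUMBER; }
    "+"    { return PLUS; }
    *      { return ERROR; }
*/
```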

REGULAR	EXPRESSIONS
       "foo"
	   literal string "foo". ANSI-C	escape sequences can be	used.

       'foo'
           literal string "foo" (characters [a-zA-Z] treated
           case-insensitive). ANSI-C escape sequences can be used.

       [xyz]
	   character class; in this case, regular-expression matches either
	   `x',	`y', or	`z'.

       [abj-oZ]
	   character class with	a range	in it; matches `a', `b', any letter
	   from	`j' through `o'	or `Z'.

       [^class]
	   inverted character class.

       r \ s
	   match any r which isn't s.  r and s must be regular-expressions
	   which can be	expressed as character classes.

       r *
	   zero	or more	r's, where r is	any regular-expression.

       r +
	   one or more r's.

       r ?
	   zero	or one r's (that is, an	optional r).

       name
	   the expansion of the	named definition.

       ( r )
           r; parentheses are used to override precedence.

       r s
           r followed by s (concatenation).

       r | s
	   either r or s (alternative).

       r / s
           r but only if it is followed by s. Note that s is not part of the
           matched text. This type of regular-expression is called "trailing
           context". Trailing context can only be the end of a rule and not
           part of a named definition.

       r { n }
	   matches r exactly n times.

       r { n , }
	   matches r at	least n	times.

       r { n , m }
	   matches r at	least n	times, but not more than m times.

       .
	   match any character except newline.

       def
	   matches named definition as specified by def	only if	-F is off. If
	   -F is active	then this behaves like it was enclosed in double
	   quotes and matches the string "def".

       Character classes and string literals may contain octal or hexadecimal
       character definitions and the following set of escape sequences:	\a,
       \b, \f, \n, \r, \t, \v, \\. An octal character is defined by a
       backslash followed by its three octal digits (e.g. \377). Hexadecimal
       characters from 0 to 0xFF are defined by a backslash, a lowercase `x'
       and two hexadecimal digits (e.g. \x12). Hexadecimal characters from
       0x100 to 0xFFFF are defined by a backslash, a lowercase `u' (or an
       uppercase `X') and four hexadecimal digits (e.g. \u1234). Hexadecimal
       characters from 0x10000 to 0xFFFFFFFF are defined by a backslash, an
       uppercase `U' and eight hexadecimal digits (e.g. \U12345678).

       The only	portable "any" rule is the default rule	*.

INPLACE	CONFIGURATIONS
       It is possible to configure code	generation inside re2c blocks. The
       following lists the available configurations:

       re2c:condprefix = yyc_;
	   Allows to specify the prefix	used for condition labels. That	is
	   this	text is	prepended to any condition label in the	generated
	   output file.

       re2c:condenumprefix = yyc;
	   Allows to specify the prefix	used for condition values. That	is
	   this	text is	prepended to any condition enum	value in the generated
	   output file.

       re2c:cond:divider = "/* *********************************** */";
           Allows to customize the divider for condition blocks. You can use
           `@@' to put the name of the condition or customize the placeholder
           using re2c:cond:divider@cond.

       re2c:cond:divider@cond =	@@;
	   Specifies the placeholder that will be replaced with	the condition
	   name	in re2c:cond:divider.

       re2c:cond:goto =	"goto @@;";
	   Allows to customize the condition goto statements used with :=>
	   style rules.	You can	use `@@' to put	the name of the	condition or
           customize the placeholder using re2c:cond:goto@cond. You can also
	   change this to `continue;', which would allow you to	continue with
	   the next loop cycle including any code between loop start and re2c
	   block.

       re2c:cond:goto@cond = @@;
           Specifies the placeholder that will be replaced with the condition
	   label in re2c:cond:goto.

       re2c:indent:top = 0;
           Specifies the minimum level of indentation to use. Requires a
           numeric value greater than or equal to zero.

       re2c:indent:string = "\t";
           Specifies the string to use for indentation. Requires a string that
           should contain only whitespace unless you need this for external
           tools. The easiest way to specify spaces is to enclose them in
           single or double quotes. If you do not want any indentation at all
           you can simply set this to "".

       re2c:yych:conversion = 0;
	   When	this setting is	non zero, then re2c automatically generates
	   conversion code whenever yych gets read. In this case the type must
	   be defined using re2c:define:YYCTYPE.

       re2c:yych:emit =	1;
	   Generation of yych can be suppressed	by setting this	to 0.

       re2c:yybm:hex = 0;
	   If set to zero then a decimal table is being	used else a
	   hexadecimal table will be generated.

       re2c:yyfill:enable = 1;
           Set this to zero to suppress generation of YYFILL (n). When using
           this be sure to verify that the generated scanner does not read
           past the end of the input. Allowing this behavior might introduce
           severe security issues into your programs.

       re2c:yyfill:check = 1;
           This can be set to 0 to suppress output of the precondition using
           YYCURSOR and YYLIMIT, which becomes useful when YYLIMIT + max
           (YYFILL) is always accessible.

       re2c:yyfill:parameter = 1;
           Allows to suppress parameter passing to YYFILL calls. If set to
           zero then no parameter is passed to YYFILL. However
           re2c:define:YYFILL@len allows to specify a replacement string for
           the actual length value. If set to a non zero value then YYFILL
           usage will be followed by the number of requested characters in
           parentheses unless re2c:define:YYFILL:naked is set. Also look at
           re2c:define:YYFILL:naked and re2c:define:YYFILL@len.

       re2c:startlabel = 0;
	   If set to a non zero	integer	then the start label of	the next
	   scanner blocks will be generated even if not	used by	the scanner
	   itself. Otherwise the normal	yy0 like start label is	only being
	   generated if	needed.	If set to a text value then a label with that
	   text	will be	generated regardless of	whether	the normal start label
	   is being used or not. This setting is being reset to	0 after	a
	   start label has been	generated.

       re2c:labelprefix	= yy;
           Allows to change the prefix of numbered labels. The default is yy
           and can be set to any string that is a valid label.

       re2c:state:abort	= 0;
	   When	not zero and switch -f is active then the YYGETSTATE block
	   will	contain	a default case that aborts and a -1 case is used for
	   initialization.

       re2c:state:nextlabel = 0;
	   Used	when -f	is active to control whether the YYGETSTATE block is
	   followed by a yyNext: label line. Instead of	using yyNext you can
	   usually also	use configuration startlabel to	force a	specific start
	   label or default to yy0 as start label. Instead of using a
	   dedicated label it is often better to separate the YYGETSTATE code
	   from	the actual scanner code	by placing a /*!getstate:re2c*/
	   comment.

       re2c:cgoto:threshold = 9;
	   When	-g is active this value	specifies the complexity threshold
	   that	triggers generation of jump tables rather than using nested
	   if's	and decision bitfields.	The threshold is compared against a
	   calculated estimation of if-s needed	where every used bitmap
	   divides the threshold by 2.

       re2c:yych:conversion = 0;
	   When	the input uses signed characters and -s	or -b switches are in
	   effect re2c allows to automatically convert to the unsigned
	   character type that is then necessary for its internal single
	   character. When this	setting	is zero	or an empty string the
	   conversion is disabled. Using a non zero number the conversion is
	   taken from YYCTYPE. If that is given	by an inplace configuration
	   that	value is being used. Otherwise it will be (YYCTYPE) and
	   changes to that configuration are no	longer possible. When this
           setting is a string the parentheses must be specified. Now assuming your
	   input is a char * buffer and	you are	using above mentioned switches
	   you can set YYCTYPE to unsigned char	and this setting to either 1
	   or (unsigned	char).
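
           Why the conversion matters can be seen in plain C, independent of
           re2c: where char is signed, a byte such as 0xE9 holds a negative
           value, so range comparisons against code unit values above 0x7F go
           wrong unless the value is first converted to unsigned char. A
           minimal sketch, with a function name made up for illustration:

```c
/* Returns 1 when the code unit lies in the upper half 0x80..0xFF. */
static int is_upper_half(char unit)
{
    /* Convert first: comparing a signed char directly against 0x80
       can never succeed, because its value never exceeds 0x7F. */
    return (unsigned char)unit >= 0x80;
}
```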

       re2c:define:YYCONDTYPE = YYCONDTYPE;
	   Enumeration used for	condition support with -c mode.

       re2c:define:YYCTXMARKER = YYCTXMARKER;
	   Allows to overwrite the define YYCTXMARKER and thus avoiding	it by
	   setting the value to	the actual code	needed.

       re2c:define:YYCTYPE = YYCTYPE;
	   Allows to overwrite the define YYCTYPE and thus avoiding it by
	   setting the value to	the actual code	needed.

       re2c:define:YYCURSOR = YYCURSOR;
	   Allows to overwrite the define YYCURSOR and thus avoiding it	by
	   setting the value to	the actual code	needed.

       re2c:define:YYDEBUG = YYDEBUG;
	   Allows to overwrite the define YYDEBUG and thus avoiding it by
	   setting the value to	the actual code	needed.

       re2c:define:YYFILL = YYFILL;
	   Allows to overwrite the define YYFILL and thus avoiding it by
	   setting the value to	the actual code	needed.

       re2c:define:YYFILL:naked	= 0;
           When set to 1 neither parentheses, parameter nor semicolon gets
           emitted.

       re2c:define:YYFILL@len =	@@;
           When using re2c:define:YYFILL and re2c:yyfill:parameter is 0 then
           any occurrence of this text inside YYFILL will be replaced with the
           actual length value.

       re2c:define:YYGETCONDITION = YYGETCONDITION;
	   Allows to overwrite the define YYGETCONDITION.

       re2c:define:YYGETCONDITION:naked	= 0;
           When set to 1 neither parentheses, parameter nor semicolon gets
           emitted.

       re2c:define:YYGETSTATE =	YYGETSTATE;
	   Allows to overwrite the define YYGETSTATE and thus avoiding it by
	   setting the value to	the actual code	needed.

       re2c:define:YYGETSTATE:naked = 0;
           When set to 1 neither parentheses, parameter nor semicolon gets
           emitted.

       re2c:define:YYLIMIT = YYLIMIT;
	   Allows to overwrite the define YYLIMIT and thus avoiding it by
	   setting the value to	the actual code	needed.

       re2c:define:YYMARKER = YYMARKER;
	   Allows to overwrite the define YYMARKER and thus avoiding it	by
	   setting the value to	the actual code	needed.

       re2c:define:YYSETCONDITION = YYSETCONDITION;
	   Allows to overwrite the define YYSETCONDITION.

       re2c:define:YYSETCONDITION@cond = @@;
           When using re2c:define:YYSETCONDITION then any occurrence of this
           text inside YYSETCONDITION will be replaced with the actual new
           condition value.

       re2c:define:YYSETSTATE =	YYSETSTATE;
	   Allows to overwrite the define YYSETSTATE and thus avoiding it by
	   setting the value to	the actual code	needed.

       re2c:define:YYSETSTATE:naked = 0;
           When set to 1 neither parentheses, parameter nor semicolon gets
           emitted.

       re2c:define:YYSETSTATE@state = @@;
           When using re2c:define:YYSETSTATE then any occurrence of this text
           inside YYSETSTATE will be replaced with the actual new state value.

       re2c:label:yyFillLabel =	yyFillLabel;
	   Allows to overwrite the name	of the label yyFillLabel.

       re2c:label:yyNext = yyNext;
	   Allows to overwrite the name	of the label yyNext.

       re2c:variable:yyaccept =	yyaccept;
	   Allows to overwrite the name	of the variable	yyaccept.

       re2c:variable:yybm = yybm;
	   Allows to overwrite the name	of the variable	yybm.

       re2c:variable:yych = yych;
	   Allows to overwrite the name	of the variable	yych.

       re2c:variable:yyctable =	yyctable;
	   When	both -c	and -g are active then re2c uses this variable to
	   generate a static jump table	for YYGETCONDITION.

       re2c:variable:yystable =	yystable;
	   When	both -f	and -g are active then re2c uses this variable to
	   generate a static jump table	for YYGETSTATE.

       re2c:variable:yytarget =	yytarget;
	   Allows to overwrite the name	of the variable	yytarget.

SCANNER	WITH STORABLE STATES
       When the	-f flag	is specified, re2c generates a scanner that can	store
       its current state, return to the	caller,	and later resume operations
       exactly where it	left off.

       The default operation of	re2c is	a "pull" model,	where the scanner asks
       for extra input whenever	it needs it. However, this mode	of operation
       assumes that the scanner is the "owner" of the parsing loop, and that may
       not always be convenient.

       Typically, if there is a	preprocessor ahead of the scanner in the
       stream, or for that matter any other procedural source of data, the
       scanner cannot "ask" for	more data unless both scanner and source live
       in separate threads.

       The -f flag is useful for just this situation: it lets users design
       scanners	that work in a "push" model, i.e. where	data is	fed to the
       scanner chunk by	chunk. When the	scanner	runs out of data to consume,
       it just stores its state and returns to the caller. When more input
       data is fed to the scanner, it resumes operations exactly where it left
       off.

       When using the -f option	re2c does not accept stdin because it has to
       do the full generation process twice which means	it has to read the
       input twice. That means re2c would fail in case it cannot open the
       input twice or reading the input	for the	first time influences the
       second read attempt.

       Changes needed compared to the "pull" model:

        1. User has to supply macros YYSETSTATE (state) and YYGETSTATE ().

	2. The -f option inhibits declaration of yych and yyaccept. So the
	   user	has to declare these. Also the user has	to save	and restore
	   these. In the example examples/push.re these	are declared as	fields
           of the (C++) class of which the scanner is a method, so they do
	   not need to be saved/restored explicitly. For C they	could e.g. be
	   made	macros that select fields from a structure passed in as
	   parameter. Alternatively, they could	be declared as local
	   variables, saved with YYFILL	(n) when it decides to return and
	   restored at entry to	the function. Also, it could be	more efficient
	   to save the state from YYFILL (n) because YYSETSTATE	(state)	is
	   called unconditionally.  YYFILL (n) however does not	get state as
	   parameter, so we would have to store	state in a local variable by
	   YYSETSTATE (state).

	3. Modify YYFILL (n) to	return (from the function calling it) if more
	   input is needed.

	4. Modify caller to recognise "more input is needed" and respond
	   appropriately.

        5. The generated code will contain a switch block that is used to
           restore the last state by jumping to just after the corresponding
           YYFILL (n) call. This code is automatically generated in the epilog
           of the first /*!re2c */ block. It is possible to trigger generation
           of the YYGETSTATE () block earlier by placing a /*!getstate:re2c*/
           comment. This is especially useful when the scanner code should be
           wrapped inside a loop.

       Please see examples/push.re for push-model scanner. The generated code
       can be tweaked using inplace configurations state:abort and
       state:nextlabel.
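
       The resume mechanism can be imitated by hand to see how the pieces fit
       together. The sketch below is not re2c output: it is a hand-written
       digit counter that follows the same pattern, with YYSETSTATE called
       just before YYFILL, YYFILL returning to the caller, and a switch on
       YYGETSTATE () jumping back to the point after that YYFILL. The struct,
       the function names and the convention that an empty chunk means end of
       input are assumptions made for illustration.

```c
#include <stddef.h>

enum scan_status { SCAN_NEED_MORE, SCAN_DONE };

struct scanner {
    int         state;   /* -1 on first entry, else saved by YYSETSTATE */
    const char *cursor;  /* next unread character of the current chunk */
    const char *limit;   /* one past the end of the current chunk */
    unsigned    digits;  /* result: number of digits seen so far */
};

#define YYGETSTATE()   (s->state)
#define YYSETSTATE(n)  (s->state = (n))
#define YYFILL(n)      return SCAN_NEED_MORE

static void scanner_init(struct scanner *s)
{
    s->state = -1;
    s->cursor = s->limit = NULL;
    s->digits = 0;
}

/* The caller pushes the next chunk; an empty chunk signals end of input. */
static void scanner_feed(struct scanner *s, const char *chunk, size_t len)
{
    s->cursor = chunk;
    s->limit = chunk + len;
}

static enum scan_status scan(struct scanner *s)
{
    switch (YYGETSTATE()) {
    case 0:  goto resume_0;   /* re-entered after the YYFILL below */
    default: break;           /* -1: first entry, start from the top */
    }

    for (;;) {
        if (s->cursor == s->limit) {
            YYSETSTATE(0);    /* remember which YYFILL is about to run */
            YYFILL(1);        /* returns to the caller */
resume_0:
            if (s->cursor == s->limit)
                return SCAN_DONE;
        }
        if (*s->cursor >= '0' && *s->cursor <= '9')
            ++s->digits;
        ++s->cursor;
    }
}
```

       The caller loops on scan (), feeding a new chunk each time it returns
       SCAN_NEED_MORE. Because the counter lives in the struct rather than in
       a local variable, it survives the return, which mirrors the need to
       save and restore yych and yyaccept described in item 2.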

SCANNER	WITH CONDITION SUPPORT
       You can precede regular expressions with a list of condition names when
       using the -c switch. In this case re2c generates scanner blocks for
       each condition, where each of the generated blocks has its own
       precondition. The precondition is given by the interface define
       YYGETCONDITION () and must be of type YYCONDTYPE.

       There are two special rule types. First, the rules of the condition *
       are merged into all conditions. Second, the empty condition list
       allows you to provide a code block that does not have a scanner part,
       meaning it does not allow any regular expression. The condition value
       referring to this special block is always the one with the enumeration
       value 0. This way the code of this special rule can be used to
       initialize a scanner. It is in no way necessary to have these rules,
       but sometimes it is helpful to have a dedicated uninitialized condition
       state.

       Non-empty rules allow you to specify the new condition, which makes
       them transition rules. Besides generating calls for the define
       YYSETCONDITION no other special code is generated.

       There is another kind of special rule that lets you prepend code to
       the code blocks of all rules in a certain set of conditions, or to the
       code blocks of all rules. This can be helpful when some operation is
       common among rules. For instance, this can be used to store the length
       of the scanned string. These special setup rules start with an
       exclamation mark followed by either a list of conditions <! condition,
       ... > or a star <!*>. When re2c generates the code for a rule whose
       state does not have a setup rule and a starred setup rule is present,
       then that code will be used as setup code.
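
       The dispatch on conditions can be pictured with a small hand-written
       sketch. This is not re2c output: the Scanner struct, the condition
       names (yycinit, yycskip, yycword, yycnumber) and the scan() function
       are assumptions made for illustration. Only YYGETCONDITION,
       YYSETCONDITION and YYCONDTYPE come from the text above. The condition
       with value 0 only initializes the scanner, as described for the empty
       condition list.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical condition set; value 0 is the initialization-only state. */
typedef enum { yycinit = 0, yycskip, yycword, yycnumber } YYCONDTYPE;

/* Hypothetical scanner state, not part of re2c. */
typedef struct {
    const char *cursor;   /* current input position (NUL-terminated input) */
    YYCONDTYPE  cond;     /* current condition */
    size_t      words;    /* number of word tokens seen */
    size_t      numbers;  /* number of number tokens seen */
} Scanner;

/* Interface defines, written against a local variable named s. */
#define YYGETCONDITION()   (s->cond)
#define YYSETCONDITION(c)  (s->cond = (c))

/* Each case plays the role of one per-condition scanner block. */
static void scan(Scanner *s)
{
    for (;;) {
        unsigned char ch = (unsigned char)*s->cursor;
        switch (YYGETCONDITION()) {
        case yycinit:                 /* no scanner part: setup only */
            s->words = 0;
            s->numbers = 0;
            YYSETCONDITION(yycskip);
            break;
        case yycskip:
            if (ch == '\0') return;
            if (isalpha(ch))      { s->words++;   YYSETCONDITION(yycword); }
            else if (isdigit(ch)) { s->numbers++; YYSETCONDITION(yycnumber); }
            else s->cursor++;         /* skip separators */
            break;
        case yycword:
            if (isalpha(ch)) s->cursor++;
            else YYSETCONDITION(yycskip);   /* transition back */
            break;
        case yycnumber:
            if (isdigit(ch)) s->cursor++;
            else YYSETCONDITION(yycskip);   /* transition back */
            break;
        }
    }
}
```

       Calling scan() on a Scanner whose cond field is yycinit first resets
       the counters and then counts words and numbers in the input. In a real
       re2c scanner the per-condition blocks and the enumeration would be
       generated from the specification instead of being written by hand.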

ENCODINGS
       re2c supports the following encodings: ASCII (default), EBCDIC (-e),
       UCS-2 (-w), UTF-16 (-x), UTF-32 (-u) and UTF-8 (-8). You can either
       pass the command-line flag or use an inplace configuration in the form
       re2c:flags.

       The following concepts should be clarified when talking about encoding.
       A code point is an abstract number that represents a single encoding
       symbol. A code unit is the smallest unit of memory used in the encoded
       text (it corresponds to one character in the input stream). One or
       more code units can be needed to represent a single code point,
       depending on the encoding. In a fixed-length encoding, each code point
       is represented with an equal number of code units. In a variable-length
       encoding, different code points can be represented with different
       numbers of code units.

       ASCII
	   is a	fixed-length encoding. Its code	space includes 0x100 code
	   points, from	0 to 0xFF (note	that this is re2c-specific
	   understanding of ASCII). One	code point is represented with exactly
	   one 1-byte code unit, which has the same value as the code point.
	   Size	of YYCTYPE must	be 1 byte.

       EBCDIC
	   is a	fixed-length encoding. Its code	space includes 0x100 code
	   points, from	0 to 0xFF. One code point is represented with exactly
	   one 1-byte code unit, which has the same value as the code point.
	   Size	of YYCTYPE must	be 1 byte.

       UCS-2
	   is a	fixed-length encoding. Its code	space includes 0x10000 code
	   points, from	0 to 0xFFFF. One code point is represented with
	   exactly one 2-byte code unit, which has the same value as the code
	   point. Size of YYCTYPE must be 2 bytes.

       UTF-16
	   is a	variable-length	encoding. Its code space includes all Unicode
	   code	points,	from 0 to 0xD7FF and from 0xE000 to 0x10FFFF. One code
	   point is represented	with one or two	2-byte code units. Size	of
	   YYCTYPE must	be 2 bytes.

       UTF-32
	   is a	fixed-length encoding. Its code	space includes all Unicode
	   code	points,	from 0 to 0xD7FF and from 0xE000 to 0x10FFFF. One code
	   point is represented	with exactly one 4-byte	code unit. Size	of
	   YYCTYPE must	be 4 bytes.

       UTF-8
	   is a	variable-length	encoding. Its code space includes all Unicode
	   code	points,	from 0 to 0xD7FF and from 0xE000 to 0x10FFFF. One code
	   point is represented with a sequence of one, two, three or four
	   1-byte code units. Size of YYCTYPE must be 1 byte.
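
       The relation between code points and code units in the two
       variable-length encodings can be sketched in plain C. The function
       names below (is_valid_code_point, utf8_encode, utf16_encode) are
       hypothetical helpers written for this illustration and are not part of
       re2c. Each encoder returns the number of code units it produced, or 0
       for a value outside the code space listed above.

```c
#include <stddef.h>
#include <stdint.h>

/* 1 if cp lies in 0..0xD7FF or 0xE000..0x10FFFF, else 0. */
static int is_valid_code_point(uint32_t cp)
{
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

/* Encode cp as UTF-8: one to four 1-byte code units. */
static size_t utf8_encode(uint32_t cp, uint8_t out[4])
{
    if (!is_valid_code_point(cp)) return 0;
    if (cp < 0x80) {
        out[0] = (uint8_t)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (uint8_t)(0xC0 | (cp >> 6));
        out[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (uint8_t)(0xE0 | (cp >> 12));
        out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (uint8_t)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (uint8_t)(0xF0 | (cp >> 18));
    out[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (uint8_t)(0x80 | (cp & 0x3F));
    return 4;
}

/* Encode cp as UTF-16: one or two 2-byte code units. */
static size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (!is_valid_code_point(cp)) return 0;
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;                               /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

       For example, the code point 0x20AC takes three code units in UTF-8 and
       one in UTF-16, while 0x1F600 takes four code units in UTF-8 and two in
       UTF-16.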

       In Unicode, values in the range 0xD800 to 0xDFFF (surrogates) are not
       valid Unicode code points. Any encoded sequence of code units that
       would map to Unicode code points in the range 0xD800-0xDFFF is
       ill-formed. The user can control how re2c treats such ill-formed
       sequences with the --encoding-policy policy flag (see the OPTIONS
       section for a full explanation).

       For some encodings, there are code units that never occur in a valid
       encoded stream (e.g. the 0xFF byte in UTF-8). If the generated scanner
       must check for invalid input, the only reliable way to do so is to use
       the default rule *. Note that the full range rule [^] won't catch
       invalid code units when a variable-length encoding is used ([^] means
       "all valid code points", while the default rule * means "all possible
       code units"; see the note about the default rule in the SYNTAX
       section).

GENERIC	INPUT API
       re2c usually operates on	input using pointer-like primitives YYCURSOR,
       YYMARKER, YYCTXMARKER and YYLIMIT.

       The generic input API (enabled with the --input custom switch) lets
       you customize input operations. In this mode, re2c will express all
       operations on input in terms of the following primitives:

	1.  YYPEEK () --- get current input character

	2.  YYSKIP () --- advance to the next character

	3.  YYBACKUP () --- back up current input position

	4.  YYBACKUPCTX () --- back up current input position for trailing
	   context

	5.  YYRESTORE () --- restore current input position

	6.  YYRESTORECTX () ---	restore	current	input position for trailing
	   context

	7.  YYLESSTHAN (n) --- check if	less than n input characters are left

       This article
       (http://skvadrik.github.io/aleph_null/posts/re2c/2015-01-13-input_model.html)
       has more details, and some usage examples are available
       (http://skvadrik.github.io/aleph_null/posts/re2c/2015-01-15-input_model_custom.html).
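
       As a minimal sketch, the seven primitives could be defined over a
       bounded buffer as shown below. The Input struct, its field names and
       the match_number() function are assumptions made for this example.
       match_number() is written by hand to show how the primitives
       cooperate and is not code generated by re2c. It recognizes digits
       optionally followed by a dot and more digits, using YYBACKUP and
       YYRESTORE to fall back to the shorter match.

```c
#include <stddef.h>

/* Hypothetical input state; the macros expect a local variable named in. */
typedef struct {
    const char *buf;        /* input characters */
    size_t      len;        /* number of characters in buf */
    size_t      pos;        /* current input position */
    size_t      marker;     /* position saved by YYBACKUP */
    size_t      ctxmarker;  /* position saved by YYBACKUPCTX */
} Input;

#define YYPEEK()        (in->pos < in->len ? in->buf[in->pos] : '\0')
#define YYSKIP()        (++in->pos)
#define YYBACKUP()      (in->marker = in->pos)
#define YYBACKUPCTX()   (in->ctxmarker = in->pos)
#define YYRESTORE()     (in->pos = in->marker)
#define YYRESTORECTX()  (in->pos = in->ctxmarker)
#define YYLESSTHAN(n)   (in->len - in->pos < (size_t)(n))

/* Match [0-9]+ ("." [0-9]+)? at the current position.
   Returns the number of characters consumed (0 if there is no match). */
static size_t match_number(Input *in)
{
    size_t start = in->pos;

    if (YYPEEK() < '0' || YYPEEK() > '9') return 0;
    while (YYPEEK() >= '0' && YYPEEK() <= '9') YYSKIP();

    YYBACKUP();                    /* the integer part already matches */
    if (YYPEEK() == '.') {
        YYSKIP();
        if (YYPEEK() >= '0' && YYPEEK() <= '9') {
            while (YYPEEK() >= '0' && YYPEEK() <= '9') YYSKIP();
        } else {
            YYRESTORE();           /* "." without digits: keep integer part */
        }
    }
    return in->pos - start;
}
```

       Because every access goes through the macros, the same matching code
       would work unchanged if Input were backed by a file or a ring buffer,
       as long as the primitives were redefined accordingly.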

UNDERSTANDING RE2C
       The subdirectory lessons of the re2c distribution contains a few
       step-by-step lessons to get you started with re2c. All examples in the
       lessons subdirectory can be compiled and actually work.

BUGS
	1. Difference only works for character sets, and not in	UTF-8 mode.

	2. The generated DFA is	not minimal.

	3. Features that are naturally orthogonal (such as reusable rules,
	   conditions, setup rules and default rules) cannot always be
	   combined. E.g., one cannot set a setup/default rule for a
	   condition in a scanner with reusable rules.

	4. re2c does too much unnecessary work: e.g., if a /*!use:re2c ... */
	   block has additional rules, these rules are parsed 4 times, while
	   they should be parsed only once.

	5. The re2c internal algorithms	need documentation.

SEE ALSO
       flex(1),	lex(1),	quex (http://quex.sourceforge.net)

       More information	on re2c	can be found here: http://re2c.org/.

AUTHORS
	1. Peter Bumbulis peter@csg.uwaterloo.ca

	2. Brian Young bayoung@acm.org

	3. Dan Nuffer nuffer@users.sourceforge.net

	4. Marcus Boerger helly@users.sourceforge.net

	5. Hartmut Kaiser hkaiser@users.sourceforge.net

	6. Emmanuel Mogenet mgix@mgix.com (added storable state)

	7. Ulya	Trofimovich skvadrik@gmail.com

VERSION	INFORMATION
       This manpage describes re2c, version 0.14.3, package date 20 May	2015.

				  05/20/2015			       RE2C(1)
