REFLEX(1)		    General Commands Manual		     REFLEX(1)

NAME
       reflex - fast lexical analyzer generator

SYNOPSIS
       reflex [-bcdfhilnpstvwBFILTV78+? -C[aefFmr] -ooutput -Pprefix
       -Sskeleton] [--help --version] [filename ...]

OVERVIEW
       This manual describes reflex, a tool for generating programs that
       perform pattern-matching on text.  The manual includes both tutorial
       and reference sections:

           Description
               a brief overview of the tool

           Some Simple Examples

           Format Of The Input File

           Patterns
               the extended regular expressions used by reflex

           How The Input Is Matched
               the rules for determining what has been matched

           Actions
               how to specify what to do when a pattern is matched

           The Generated Scanner
               details regarding the scanner that reflex produces;
               how to control the input source

           Start Conditions
               introducing context into your scanners, and
               managing "mini-scanners"

           Multiple Input Buffers
               how to manipulate multiple input sources; how to
               scan from strings instead of files

           End-of-file Rules
               special rules for matching the end of the input

           Miscellaneous Macros
               a summary of macros available to the actions

           Values Available To The User
               a summary of values available to the actions

           Interfacing With Yacc
               connecting reflex scanners together with yacc parsers

           Options
               reflex command-line options, and the "%option"
               directive

           Performance Considerations
               how to make your scanner go as fast as possible

           Generating C++ Scanners
               the (experimental) facility for generating C++
               scanner classes

           Incompatibilities With Lex And POSIX
               how reflex differs from AT&T lex and the POSIX lex
               standard

           Diagnostics
               those error messages produced by reflex (or scanners
               it generates) whose meanings might not be apparent

           Files
               files used by reflex

           Deficiencies / Bugs
               known problems with reflex

           See Also
               other documentation, related tools

           Author
               includes contact information

DESCRIPTION
       reflex is a tool for generating scanners: programs which recognize
       lexical	patterns  in text.  reflex reads the given input files,	or its
       standard	input if no file names are given, for a	description of a scan-
       ner  to	generate.   The	description is in the form of pairs of regular
       expressions and C code, called rules. reflex generates as  output  a  C
       source  file,  lex.yy.c,	which defines a	routine	yylex().  This file is
       compiled	and linked with	the -lrefl library to produce  an  executable.
       When  the  executable  is run, it analyzes its input for	occurrences of
       the regular expressions.	 Whenever it finds one,	it executes the	corre-
       sponding	C code.

SOME SIMPLE EXAMPLES
       First some simple examples to get the flavor of how one uses reflex.
       The following reflex input specifies a scanner which, whenever it
       encounters the string "username", will replace it with the user's
       login name:

           %%
           username    printf( "%s", getlogin() );

       By default, any text not	matched	by a reflex scanner is copied  to  the
       output,	so the net effect of this scanner is to	copy its input file to
       its output with each occurrence of "username" expanded.	In this	input,
       there  is just one rule.	 "username" is the pattern and the "printf" is
       the action.  The	"%%" marks the beginning of the	rules.

       Here's another simple example:

                   int num_lines = 0, num_chars = 0;

           %%
           \n      ++num_lines; ++num_chars;
           .       ++num_chars;

           %%
           int main( void )
                   {
                   yylex();
                   printf( "# of lines = %d, # of chars = %d\n",
                           num_lines, num_chars );
                   }

       This scanner counts the number of characters and	the number of lines in
       its  input  (it	produces  no output other than the final report	on the
       counts).	  The  first  line  declares  two  globals,  "num_lines"   and
       "num_chars", which are accessible both inside yylex() and in the	main()
       routine declared	after the second "%%".	There are two rules, one which
       matches	a  newline  ("\n")  and	increments both	the line count and the
       character count,	and one	which matches any character other than a  new-
       line (indicated by the "." regular expression).

       A somewhat more complicated example:

           /* scanner for a toy Pascal-like language */

           %{
           /* need this for the calls to atoi() and atof() below */
           #include <stdlib.h>
           %}

           DIGIT    [0-9]
           ID       [a-z][a-z0-9]*

           %%

           {DIGIT}+    {
                       printf( "An integer: %s (%d)\n", yytext,
                               atoi( yytext ) );
                       }

           {DIGIT}+"."{DIGIT}*        {
                       printf( "A float: %s (%g)\n", yytext,
                               atof( yytext ) );
                       }

           if|then|begin|end|procedure|function        {
                       printf( "A keyword: %s\n", yytext );
                       }

           {ID}        printf( "An identifier: %s\n", yytext );

           "+"|"-"|"*"|"/"   printf( "An operator: %s\n", yytext );

           "{"[^}\n]*"}"     /* eat up one-line comments */

           [ \t\n]+          /* eat up whitespace */

           .           printf( "Unrecognized character: %s\n", yytext );

           %%

           int main( int argc, char **argv )
               {
               ++argv, --argc;  /* skip over program name */
               if ( argc > 0 )
                       yyin = fopen( argv[0], "r" );
               else
                       yyin = stdin;

               yylex();
               return 0;
               }


       This is the beginning of a simple scanner for a language like Pascal.
       It identifies different types of tokens and reports on what it has
       seen.

       The details of this example will be explained in the following
       sections.

FORMAT OF THE INPUT FILE
       The reflex input file consists of three sections, separated by a line
       with just %% in it:

           definitions
           %%
           rules
           %%
           user code

       The  definitions	 section  contains declarations	of simple name defini-
       tions to	simplify the scanner specification, and	declarations of	 start
       conditions, which are explained in a later section.

       Name definitions	have the form:

	   name	definition

       The  "name"  is	a  word	beginning with a letter	or an underscore ('_')
       followed	by zero	or more	letters, digits, '_', or '-' (dash).  The def-
       inition	is  taken to begin at the first	non-white-space	character fol-
       lowing the name and continuing to the end of the	line.  The  definition
       can  subsequently  be  referred to using	"{name}", which	will expand to
       "(definition)".	For example,

	   DIGIT    [0-9]
	   ID	    [a-z][a-z0-9]*

       defines "DIGIT" to be a	regular	 expression  which  matches  a	single
       digit, and "ID" to be a regular expression which matches a letter
       followed by zero-or-more letters-or-digits.  A subsequent reference to

           {DIGIT}+"."{DIGIT}*

       is identical to

           ([0-9])+"."([0-9])*

       and matches one-or-more digits followed by a '.'	followed  by  zero-or-
       more digits.

       The rules section of the reflex input contains a series of rules of the
       form:

	   pattern   action

       where the pattern must be unindented and	the action must	begin  on  the
       same line.

       See below for a further description of patterns and actions.

       Finally,	 the  user code	section	is simply copied to lex.yy.c verbatim.
       It is used for companion	routines which call or are called by the scan-
       ner.   The  presence of this section is optional; if it is missing, the
       second %% in the	input file may be skipped, too.

       In the definitions and rules sections, any indented text	 or  text  en-
       closed  in  %{  and %} is copied	verbatim to the	output (with the %{}'s
       removed).  The %{}'s must appear	unindented on lines by themselves.

       In the rules section, any indented or %{}  text	appearing  before  the
       first  rule  may	 be  used  to declare variables	which are local	to the
       scanning	routine	and (after the declarations) code which	is to be  exe-
       cuted  whenever the scanning routine is entered.	 Other indented	or %{}
       text in the rule	section	is still copied	to the output, but its meaning
       is  not	well-defined  and  it may well cause compile-time errors (this
       feature is present for POSIX compliance; see below for other such
       features).

       In  the	definitions  section  (but not in the rules section), an unin-
       dented comment (i.e., a line beginning with "/*") is also copied	verba-
       tim to the output up to the next	"*/".

PATTERNS
       The patterns in the input are written using an extended set of regular
       expressions.  These are:

	   x	      match the	character 'x'
	   .	      any character (byte) except newline
	   [xyz]      a	"character class"; in this case, the pattern
			matches	either an 'x', a 'y', or a 'z'
	   [abj-oZ]   a	"character class" with a range in it; matches
			an 'a',	a 'b', any letter from 'j' through 'o',
			or a 'Z'
	   [^A-Z]     a	"negated character class", i.e., any character
			but those in the class.	 In this case, any
			character EXCEPT an uppercase letter.
	   [^A-Z\n]   any character EXCEPT an uppercase	letter or
			a newline
	   r*	      zero or more r's,	where r	is any regular expression
	   r+	      one or more r's
	   r?	      zero or one r's (that is,	"an optional r")
	   r{2,5}     anywhere from two	to five	r's
	   r{2,}      two or more r's
	   r{4}	      exactly 4	r's
	   {name}     the expansion of the "name" definition
		      (see above)
           "[xyz]\"foo"
                      the literal string: [xyz]"foo
	   \X	      if X is an 'a', 'b', 'f',	'n', 'r', 't', or 'v',
			then the ANSI-C	interpretation of \x.
			Otherwise, a literal 'X' (used to escape
			operators such as '*')
	   \0	      a	NUL character (ASCII code 0)
	   \123	      the character with octal value 123
	   \x2a	      the character with hexadecimal value 2a
	   (r)	      match an r; parentheses are used to override
			precedence (see	below)

	   rs	      the regular expression r followed	by the
			regular	expression s; called "concatenation"

	   r|s	      either an	r or an	s

	   r/s	      an r but only if it is followed by an s.	The
			text matched by	s is included when determining
			whether	this rule is the "longest match",
			but is then returned to	the input before
			the action is executed.	 So the	action only
			sees the text matched by r.  This type
                        of pattern is called "trailing context".
			(There are some	combinations of	r/s that reflex
			cannot match correctly;	see notes in the
			Deficiencies / Bugs section below regarding
			"dangerous trailing context".)
	   ^r	      an r, but	only at	the beginning of a line	(i.e.,
                        when just starting to scan, or right after a
			newline	has been scanned).
	   r$	      an r, but	only at	the end	of a line (i.e., just
			before a newline).  Equivalent to "r/\n".

		      Note that	reflex's notion	of "newline" is	exactly
		      whatever the C compiler used to compile reflex
		      interprets '\n' as; in particular, on some DOS
		      systems you must either filter out \r's in the
		      input yourself, or explicitly use	r/\r\n for "r$".

	   <s>r	      an r, but	only in	start condition	s (see
			below for discussion of	start conditions)
           <s1,s2,s3>r
                      same, but in any of start conditions s1,
			s2, or s3
	   <*>r	      an r in any start	condition, even	an exclusive one.

	   <<EOF>>    an end-of-file
           <s1,s2><<EOF>>
                      an end-of-file when in start condition s1 or s2

       Note that inside	of a character class, all regular expression operators
       lose  their special meaning except escape ('\') and the character class
       operators, '-', ']', and, at the	beginning of the class,	'^'.

       The regular expressions listed above are	grouped	 according  to	prece-
       dence,  from  highest  precedence  at  the top to lowest	at the bottom.
       Those grouped together have equal precedence.  For example,

           foo|bar*

       is the same as

           (foo)|(ba(r*))

       since the '*' operator has higher precedence than concatenation, and
       concatenation higher than alternation ('|').  This pattern therefore
       matches either the string "foo" or the string "ba" followed by
       zero-or-more r's.  To match "foo" or zero-or-more "bar"'s, use:

           foo|(bar)*

       and to match zero-or-more "foo"'s-or-"bar"'s:

           (foo|bar)*

       In  addition  to	characters and ranges of characters, character classes
       can also	contain	character class	expressions.   These  are  expressions
       enclosed	 inside	[: and :] delimiters (which themselves must appear be-
       tween the '[' and ']' of	the character class; other elements may	 occur
       inside the character class, too).  The valid expressions	are:

	   [:alnum:] [:alpha:] [:blank:]
	   [:cntrl:] [:digit:] [:graph:]
	   [:lower:] [:print:] [:punct:]
	   [:space:] [:upper:] [:xdigit:]

       These  expressions  all designate a set of characters equivalent	to the
       corresponding standard C	isXXX function.	 For example, [:alnum:]	desig-
       nates those characters for which	isalnum() returns true - i.e., any al-
       phabetic	or numeric.  Some systems don't	provide	isblank(),  so	reflex
       defines [:blank:] as a blank or a tab.
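
       The correspondence between class expressions and the C isXXX
       functions can be sketched in plain C.  This is only an illustration of
       the mapping described above, not code taken from reflex; the names
       class_table and in_class_expr are hypothetical.  The results depend on
       the current C locale, as with the isXXX functions themselves.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical lookup table pairing each class-expression name with the
   standard <ctype.h> predicate the manual says it corresponds to. */
static const struct {
    const char *name;
    int (*test)( int );
} class_table[] = {
    { "alnum", isalnum }, { "alpha", isalpha }, { "cntrl", iscntrl },
    { "digit", isdigit }, { "graph", isgraph }, { "lower", islower },
    { "print", isprint }, { "punct", ispunct }, { "space", isspace },
    { "upper", isupper }, { "xdigit", isxdigit },
};

/* Returns 1 if c is in the class called name (given without the "[:" and
   ":]" delimiters), 0 if it is not, and -1 if name is not a valid class. */
int in_class_expr( const char *name, int c )
{
    size_t i;

    /* [:blank:] is handled directly, as a blank or a tab. */
    if ( strcmp( name, "blank" ) == 0 )
        return c == ' ' || c == '\t';

    for ( i = 0; i < sizeof class_table / sizeof class_table[0]; ++i )
        if ( strcmp( name, class_table[i].name ) == 0 )
            return class_table[i].test( (unsigned char) c ) != 0;

    return -1;
}
```

       For instance, in_class_expr( "alnum", 'a' ) reports membership, while
       in_class_expr( "blank", '\n' ) does not, since a newline is neither a
       blank nor a tab.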

       For example, the following character classes are all equivalent:

           [[:alnum:]]
           [[:alpha:][:digit:]]
           [[:alpha:]0-9]
           [a-zA-Z0-9]

       If  your	 scanner is case-insensitive (the -i flag), then [:upper:] and
       [:lower:] are equivalent	to [:alpha:].

       Some notes on patterns:

       o   A negated character class such as the example "[^A-Z]"  above  will
	   match  a  newline unless "\n" (or an	equivalent escape sequence) is
	   one of the characters explicitly present in the  negated  character
	   class  (e.g.,  "[^A-Z\n]").	 This is unlike	how many other regular
	   expression tools treat negated character classes, but unfortunately
	   the	inconsistency  is  historically	entrenched.  Matching newlines
	   means that a	pattern	like [^"]* can match the entire	 input	unless
	   there's another quote in the	input.

       o   A  rule  can	have at	most one instance of trailing context (the '/'
	   operator or the '$'	operator).   The  start	 condition,  '^',  and
	   "<<EOF>>"  patterns	can  only occur	at the beginning of a pattern,
	   and,	as well	as with	'/' and	'$', cannot be grouped	inside	paren-
	   theses.  A '^' which	does not occur at the beginning	of a rule or a
	   '$' which does not occur at the end of a  rule  loses  its  special
	   properties and is treated as	a normal character.

           The following are illegal:

               foo/bar$
               <sc1>foo<sc2>bar

           Note that the first of these can be written "foo/bar\n".

           The following will result in '$' or '^' being treated as a normal
           character:

               foo|(bar$)
               foo|^bar

           If what's wanted is a "foo" or a bar-followed-by-a-newline, the
           following could be used (the special '|' action is explained
           below):

               foo      |
               bar$     /* action goes here */

           A similar trick will work for matching a foo or a
           bar-at-the-beginning-of-a-line.
       o   Character  classes  are  evaluated  when reflex processes the file,
	   rather than at the time the resulting scanner is run.

HOW THE INPUT IS MATCHED
       When the generated scanner is run, it analyzes its input looking for
       strings	which  match  any  of its patterns.  If	it finds more than one
       match, it takes the one matching	the most text  (for  trailing  context
       rules,  this  includes  the length of the trailing part,	even though it
       will then be returned to	the input).  If	it finds two or	 more  matches
       of the same length, the rule listed first in the reflex input file is
       chosen.
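
       The selection rule (longest match wins, ties go to the rule listed
       first) can be sketched in plain C.  This is a simplified model, not
       the generated scanner's algorithm: the patterns here are literal
       strings rather than regular expressions, trailing context is ignored,
       and the name choose_rule is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Picks the pattern that matches the longest prefix of input.  Patterns
   are literal strings in this model.  Returns the index of the chosen
   pattern and stores the match length in *len, or returns -1 (with *len
   set to 0) when nothing matches and the default rule would apply. */
int choose_rule( const char *input, const char *const *patterns,
                 size_t count, size_t *len )
{
    int best = -1;
    size_t best_len = 0;
    size_t i;

    for ( i = 0; i < count; ++i )
        {
        size_t n = strlen( patterns[i] );

        /* The strict '>' keeps the earlier rule when lengths are equal. */
        if ( n > best_len && strncmp( input, patterns[i], n ) == 0 )
            {
            best = (int) i;
            best_len = n;
            }
        }

    *len = best_len;
    return best;
}
```

       With the patterns "if", "iffy", and a second "if", the input "iffy x"
       selects the second pattern (the longest match), while "if x" selects
       the first, because it is listed before the duplicate.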

       Once the	match is determined,  the  text	 corresponding	to  the	 match
       (called	the  token)  is	made available in the global character pointer
       yytext, and its length in the global integer yyleng.  The action	corre-
       sponding	 to  the matched pattern is then executed (a more detailed de-
       scription of actions follows), and then the remaining input is  scanned
       for another match.

       If no match is found, then the default rule is executed:	the next char-
       acter in	the input is considered	matched	and  copied  to	 the  standard
       output.  Thus, the simplest legal reflex input is:

           %%

       which  generates	 a scanner that	simply copies its input	(one character
       at a time) to its output.

       Note that yytext	can be defined in two  different  ways:	 either	 as  a
       character pointer or as a character array.  You can control which defi-
       nition reflex uses by including one of the special directives  %pointer
       or %array in the	first (definitions) section of your reflex input.  The
       default is %pointer, unless you use the -l lex compatibility option, in
       which case yytext will be an array.  The	advantage of using %pointer is
       substantially faster scanning and no buffer overflow when matching very
       large  tokens (unless you run out of dynamic memory).  The disadvantage
       is that you are restricted in how your actions can modify  yytext  (see
       the next section), and calls to the unput() function destroy the
       present contents	 of  yytext,  which  can  be  a	 considerable  porting
       headache	when moving between different lex versions.

       The  advantage  of  %array  is  that you	can then modify	yytext to your
       heart's content,	and calls to unput() do	not destroy  yytext  (see  be-
       low).   Furthermore,  existing lex programs sometimes access yytext ex-
       ternally	using declarations of the form:
	   extern char yytext[];
       This definition is erroneous when used with %pointer, but correct for
       %array.

       %array  defines	yytext	to be an array of YYLMAX characters, which de-
       faults to a fairly large	value.	You can	change the size	by simply #de-
       fine'ing	 YYLMAX	 to a different	value in the first section of your re-
       flex input.  As mentioned above,	with %pointer yytext grows dynamically
       to  accommodate	large  tokens.	While this means your %pointer scanner
       can accommodate very large tokens (such as matching  entire  blocks  of
       comments),  bear	 in mind that each time	the scanner must resize	yytext
       it also must rescan the entire token from the  beginning,  so  matching
       such tokens can prove slow.  yytext presently does not dynamically grow
       if a call to unput() results in too much	text being  pushed  back;  in-
       stead, a	run-time error results.

       Also  note that you cannot use %array with C++ scanner classes (the c++
       option; see below).

ACTIONS
       Each pattern in a rule has a corresponding action, which can be any
       arbitrary C statement.  The pattern ends at the first non-escaped
       whitespace character; the remainder of the line is its action.  If the
       action is empty, then when the pattern is matched the input token is
       simply discarded.  For example, here is the specification for a program
       which deletes all occurrences of "zap me" from its input:

           %%
           "zap me"

       (It  will  copy	all  other characters in the input to the output since
       they will be matched by the default rule.)

       Here is a program which compresses multiple blanks and tabs down	 to  a
       single blank, and throws	away whitespace	found at the end of a line:

           %%
           [ \t]+        putchar( ' ' );
	   [ \t]+$	 /* ignore this	token */

       If  the action contains a '{', then the action spans till the balancing
       '}' is found, and the action may	cross multiple	lines.	 reflex	 knows
       about C strings and comments and	won't be fooled	by braces found	within
       them, but also allows actions to	begin with %{ and  will	 consider  the
       action  to  be  all  the	text up	to the next %} (regardless of ordinary
       braces inside the action).

       An action consisting solely of a	vertical bar ('|') means "same as  the
       action for the next rule."  See below for an illustration.

       Actions	can  include  arbitrary	C code,	including return statements to
       return a	value to whatever routine called yylex().  Each	 time  yylex()
       is  called  it  continues processing tokens from	where it last left off
       until it	either reaches the end of the file or executes a return.

       Actions are free	to modify yytext except	 for  lengthening  it  (adding
       characters to its end--these will overwrite later characters in the in-
       put stream).  This however  does	 not  apply  when  using  %array  (see
       above); in that case, yytext may	be freely modified in any way.

       Actions	are  free to modify yyleng except they should not do so	if the
       action also includes use	of yymore() (see below).

       There are a number of special directives	which can be  included	within
       an action:

       o   ECHO	copies yytext to the scanner's output.

       o   BEGIN  followed by the name of a start condition places the scanner
	   in the corresponding	start condition	(see below).

       o   REJECT directs the scanner to proceed on to the "second best"  rule
	   which  matched  the	input (or a prefix of the input).  The rule is
	   chosen as described above in	"How the Input is Matched", and	yytext
	   and	yyleng	set  up	 appropriately.	  It  may  either be one which
	   matched as much text	as the originally chosen rule but  came	 later
	   in  the reflex input	file, or one which matched less	text.  For ex-
	   ample, the following	will both count	the words  in  the  input  and
	   call	the routine special() whenever "frob" is seen:

                       int word_count = 0;
               %%

	       frob	   special(); REJECT;
	       [^ \t\n]+   ++word_count;

	   Without  the	REJECT,	any "frob"'s in	the input would	not be counted
	   as words, since the scanner normally	executes only one  action  per
	   token.   Multiple  REJECT's	are allowed, each one finding the next
	   best	choice to the currently	active rule.  For  example,  when  the
	   following  scanner  scans  the token	"abcd",	it will	write "abcdab-
	   caba" to the	output:

               %%
               a        |
	       ab	|
	       abc	|
	       abcd	ECHO; REJECT;
	       .|\n	/* eat up any unmatched	character */

	   (The	first three rules share	the fourth's action since they use the
	   special '|' action.)	 REJECT	is a particularly expensive feature in
	   terms of scanner performance; if it is used in any of the scanner's
           actions it will slow down all of the scanner's matching.
           Furthermore, REJECT cannot be used with the -Cf or -CF options (see
           below).

	   Note	 also  that  unlike  the  other	 special  actions, REJECT is a
           branch; code immediately following it in the action will not be
           executed.

       o   yymore()  tells  the	 scanner that the next time it matches a rule,
	   the corresponding token should be appended onto the	current	 value
           of yytext rather than replacing it.  For example, given the input
           "mega-kludge" the following will write "mega-mega-kludge" to the
           output:

               %%
               mega-    ECHO; yymore();
	       kludge	ECHO;

	   First  "mega-"  is matched and echoed to the	output.	 Then "kludge"
	   is matched, but the previous	"mega-"	is still hanging around	at the
	   beginning of	yytext so the ECHO for the "kludge" rule will actually
	   write "mega-kludge".

       Two notes regarding use of yymore().  First, yymore()  depends  on  the
       value  of yyleng	correctly reflecting the size of the current token, so
       you must	not modify yyleng if you  are  using  yymore().	  Second,  the
       presence	 of  yymore()  in the scanner's	action entails a minor perfor-
       mance penalty in	the scanner's matching speed.

       o   yyless(n) returns all but the first n characters of the current to-
	   ken back to the input stream, where they will be rescanned when the
	   scanner looks for the next match.  yytext and yyleng	 are  adjusted
           appropriately (e.g., yyleng will now be equal to n).  For example,
           on the input "foobar" the following will write out "foobarbar":

               %%
               foobar    ECHO; yyless(3);
	       [a-z]+	 ECHO;

	   An argument of 0 to yyless will  cause  the	entire	current	 input
	   string  to be scanned again.	 Unless	you've changed how the scanner
	   will	subsequently process its input	(using	BEGIN,	for  example),
	   this	will result in an endless loop.

       Note  that  yyless  is a	macro and can only be used in the reflex input
       file, not from other source files.

       o   unput(c) puts the character c back onto the input stream.  It  will
	   be  the next	character scanned.  The	following action will take the
	   current token and cause it to be rescanned enclosed in parentheses.

               {
               int i;
               /* Copy yytext because unput() trashes yytext */
               char *yycopy = strdup( yytext );
               unput( ')' );
               for ( i = yyleng - 1; i >= 0; --i )
                   unput( yycopy[i] );
               unput( '(' );
               free( yycopy );
               }

	   Note	that since each	unput()	puts the given character back  at  the
           beginning of the input stream, pushing back strings must be done
           back-to-front.

       An important potential problem when using unput() is that  if  you  are
       using  %pointer	(the default), a call to unput() destroys the contents
       of yytext, starting with	its  rightmost	character  and	devouring  one
       character  to the left with each	call.  If you need the value of	yytext
       preserved after a call to unput() (as in	the above example),  you  must
       either  first copy it elsewhere,	or build your scanner using %array in-
       stead (see How The Input	Is Matched).

       Finally,	note that you cannot put back EOF to attempt to	mark the input
       stream with an end-of-file.
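
       Why strings have to be pushed back from their last character to their
       first can be seen in a small stand-alone model.  The sketch below does
       not use the scanner's real buffer: model_unput(), model_input(), and
       model_unput_string() are hypothetical stand-ins that treat pushed-back
       input as a simple last-in, first-out stack.

```c
#include <string.h>

#define PUSHBACK_MAX 64

/* Hypothetical LIFO model of pushed-back input (not reflex's buffer). */
static char pushback[PUSHBACK_MAX];
static int pushback_len = 0;

/* Model of unput(): the character pushed last is the next one read.
   Characters beyond the fixed capacity are silently dropped here. */
static void model_unput( int c )
{
    if ( pushback_len < PUSHBACK_MAX )
        pushback[pushback_len++] = (char) c;
}

/* Model of input(): returns the next character, or -1 when empty. */
static int model_input( void )
{
    return pushback_len > 0 ? (unsigned char) pushback[--pushback_len] : -1;
}

/* Pushes s back so that it is re-read in its original order: the last
   character goes first, the first character goes last. */
static void model_unput_string( const char *s )
{
    size_t i;

    for ( i = strlen( s ); i > 0; --i )
        model_unput( s[i - 1] );
}
```

       After model_unput_string( "(abc)" ), successive model_input() calls
       yield '(', 'a', 'b', 'c', ')' in that order.  Pushing the characters
       front-to-back instead would hand them back reversed.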

       o   input() reads the next character from the input stream.  For
           example, the following is one way to eat up C comments:

               %%
               "/*"        {
                           register int c;

                           for ( ; ; )
                               {
                               while ( (c = input()) != '*' &&
                                       c != EOF )
                                   ;    /* eat up text of comment */

                               if ( c == '*' )
                                   {
                                   while ( (c = input()) == '*' )
                                       ;
                                   if ( c == '/' )
                                       break;    /* found the end */
                                   }

                               if ( c == EOF )
                                   {
                                   error( "EOF in comment" );
                                   break;
                                   }
                               }
                           }

	   (Note that if the scanner is	compiled using C++,  then  input()  is
	   instead  referred  to  as yyinput(),	in order to avoid a name clash
	   with	the C++	stream by the name of input.)

       o   YY_FLUSH_BUFFER flushes the scanner's internal buffer so  that  the
	   next	 time the scanner attempts to match a token, it	will first re-
	   fill	the buffer using YY_INPUT (see The Generated Scanner,  below).
	   This	action is a special case of the	more general yy_flush_buffer()
	   function, described below in	the section Multiple Input Buffers.

       o   yyterminate() can be	used in	lieu of	a return statement in  an  ac-
	   tion.   It  terminates the scanner and returns a 0 to the scanner's
	   caller, indicating "all done".  By default, yyterminate()  is  also
	   called  when	 an end-of-file	is encountered.	 It is a macro and may
	   be redefined.

THE GENERATED SCANNER
       The output of reflex is the file lex.yy.c, which contains the scanning
       routine yylex(),	a number of tables used	by it for matching tokens, and
       a number	of auxiliary routines and macros.  By default, yylex() is  de-
       clared as follows:

           int yylex()
               {
               ... various definitions and the actions in here ...
               }

       (If your	environment supports function prototypes, then it will be "int
       yylex( void )".)	 This  definition  may	be  changed  by	 defining  the
       "YY_DECL" macro.	 For example, you could	use:

	   #define YY_DECL float lexscan( a, b ) float a, b;

       to  give	 the scanning routine the name lexscan,	returning a float, and
       taking two floats as arguments.	Note that if you give arguments	to the
       scanning	routine	using a	K&R-style/non-prototyped function declaration,
       you must	terminate the definition with a	semi-colon (;).
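
       For comparison, a prototype-style YY_DECL might look like the sketch
       below.  The name lexscan and its two float arguments come from the
       example above; the function body is a stand-in added only so that the
       fragment compiles on its own, since the real body is generated by
       reflex and follows the YY_DECL expansion directly.

```c
/* Prototype-style counterpart of the K&R definition shown above.  With a
   prototype there are no separate parameter declarations, so the macro
   needs no trailing semicolon. */
#define YY_DECL float lexscan( float a, float b )

/* Declaration a caller would need; hypothetical, not emitted by reflex. */
float lexscan( float a, float b );

/* Stand-in for the generated scanner body, which follows YY_DECL directly.
   It only demonstrates that the macro expands to the function header. */
YY_DECL
{
    return a + b;
}
```

       Callers then invoke lexscan( x, y ) wherever they would otherwise have
       called yylex().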

       Whenever	yylex()	is called, it scans tokens from	the global input  file
       yyin  (which  defaults to stdin).  It continues until it	either reaches
       an end-of-file (at which	point it returns the value 0) or  one  of  its
       actions executes	a return statement.

       If  the	scanner	reaches	an end-of-file,	subsequent calls are undefined
       unless either yyin is pointed at	a new input file (in which case	 scan-
       ning  continues from that file),	or yyrestart() is called.  yyrestart()
       takes one argument, a FILE * pointer (which can be NULL, if you've set
       up  YY_INPUT  to	 scan  from a source other than	yyin), and initializes
       yyin for	scanning from that file.  Essentially there is	no  difference
       between just assigning yyin to a	new input file or using	yyrestart() to
       do so; the latter is available for compatibility	with previous versions
       of reflex, and because it can be	used to	switch input files in the mid-
       dle of scanning.	 It can	also be	used to	throw away the	current	 input
       buffer,	by  calling  it	with an	argument of yyin; but better is	to use
       YY_FLUSH_BUFFER (see above).  Note that yyrestart() does	not reset  the
       start condition to INITIAL (see Start Conditions, below).

       If yylex() stops	scanning due to	executing a return statement in	one of
       the actions, the	scanner	may then be called again and  it  will	resume
       scanning	where it left off.

       By  default  (and  for purposes of efficiency), the scanner uses	block-
       reads rather than simple	getc() calls to	 read  characters  from	 yyin.
       The  nature  of how it gets its input can be controlled by defining the
       YY_INPUT macro.  YY_INPUT's calling sequence is
       "YY_INPUT(buf,result,max_size)".  Its action is to place up to max_size
       characters in
       the character array buf and return in the integer variable  result  ei-
       ther  the  number of characters read or the constant YY_NULL (0 on Unix
       systems)	to indicate EOF.  The default YY_INPUT reads from  the	global
       file-pointer "yyin".

       A  sample definition of YY_INPUT	(in the	definitions section of the in-
       put file):

           %{
           #define YY_INPUT(buf,result,max_size) \
               { \
               int c = getchar(); \
               result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \
               }
           %}

       This definition will change the input processing	to occur one character
       at a time.
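
       The same calling sequence can serve a source other than a file.  The
       sketch below fills the buffer from an in-memory string, up to max_size
       characters per call.  The names my_src and my_pos are hypothetical,
       and YY_NULL is defined here only so that the fragment compiles outside
       a generated scanner, which supplies its own definition.  It is meant
       to illustrate the YY_INPUT contract only; the routines described under
       Multiple Input Buffers are the supported way to scan strings.

```c
#include <string.h>

#ifndef YY_NULL
#define YY_NULL 0   /* stand-in: the generated scanner defines this itself */
#endif

/* Hypothetical in-memory source and read position (not part of reflex). */
static const char *my_src = "hello, world";
static size_t my_pos = 0;

/* Copies up to max_size characters from my_src into buf and stores the
   count in result, or YY_NULL once the string is exhausted. */
#define YY_INPUT(buf,result,max_size) \
    { \
    size_t n = strlen( my_src ) - my_pos; \
    if ( n > (size_t) (max_size) ) \
        n = (size_t) (max_size); \
    memcpy( (buf), my_src + my_pos, n ); \
    my_pos += n; \
    (result) = n ? (int) n : YY_NULL; \
    }
```

       Unlike the getchar() version, this definition hands the scanner a
       whole block per call, so the default block-read behavior is kept.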

       When  the  scanner receives an end-of-file indication from YY_INPUT, it
       then checks the yywrap()	function.  If yywrap() returns	false  (zero),
       then  it	is assumed that	the function has gone ahead and	set up yyin to
       point to	another	input file, and	scanning  continues.   If  it  returns
       true  (non-zero),  then	the  scanner  terminates,  returning  0	to its
       caller.	Note that in either case,  the	start  condition  remains  un-
       changed;	it does	not revert to INITIAL.

       If you do not supply your own version of	yywrap(), then you must	either
       use %option noyywrap (in	which case the scanner behaves as  though  yy-
       wrap()  returned	1), or you must	link with -lrefl to obtain the default
       version of the routine, which always returns 1.
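
       A user-supplied yywrap() that moves on to further input files might
       look like the following sketch.  It assumes a hypothetical
       NULL-terminated list of file names, next_file, set up by the caller.
       The definition of yyin is a stand-in so that the fragment compiles on
       its own; inside a real scanner, yyin is already provided by the
       generated code and that line would be dropped.

```c
#include <stdio.h>

FILE *yyin;   /* stand-in: the generated scanner provides yyin itself */

/* Hypothetical: points at the next file name to open; the list ends with
   a NULL entry.  A NULL next_file means there are no further files. */
static char **next_file;

/* Returns 0 after pointing yyin at the next readable file, so scanning
   continues; returns 1 when no files remain, so the scanner terminates. */
int yywrap( void )
{
    if ( yyin != NULL && yyin != stdin )
        fclose( yyin );
    yyin = NULL;

    while ( next_file != NULL && *next_file != NULL )
        {
        yyin = fopen( *next_file++, "r" );
        if ( yyin != NULL )
            return 0;   /* more input is available */
        }

    return 1;   /* all done */
}
```

       Files that cannot be opened are skipped silently in this sketch; a
       real program would probably want to report them.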

       Three routines are available for	scanning from in-memory	buffers	rather
       than  files:  yy_scan_string(),	yy_scan_bytes(), and yy_scan_buffer().
       See the discussion of them below	in the section Multiple	Input Buffers.

       The scanner writes its ECHO output to the yyout global  (default,  std-
       out), which may be redefined by the user	simply by assigning it to some
       other FILE pointer.

START CONDITIONS
       reflex provides a mechanism for conditionally activating rules.  Any
       rule whose pattern is prefixed with "<sc>" will only be active when the
       scanner is in the start condition named "sc".  For example,

           <STRING>[^"]*        { /* eat up the string body ... */
                       ...
                       }

       will be active only when the scanner is in the "STRING" start
       condition, and

           <INITIAL,STRING,QUOTE>\.        { /* handle an escape ... */
                       ...
                       }

       will  be	 active	 only when the current start condition is either "INI-
       TIAL", "STRING",	or "QUOTE".

       Start conditions	are declared in	the definitions	(first)	section	of the
       input using unindented lines beginning with either %s or	%x followed by
       a list of names.	 The former declares inclusive start  conditions,  the
       latter  exclusive start conditions.  A start condition is activated us-
       ing the BEGIN action.  Until the	next BEGIN action is  executed,	 rules
       with  the  given	 start	condition  will	be active and rules with other
       start conditions	will be	inactive.  If the start	 condition  is	inclu-
       sive,  then  rules with no start	conditions at all will also be active.
       If it is	exclusive, then	only rules qualified with the start  condition
       will  be	active.	 A set of rules	contingent on the same exclusive start
       condition describe a scanner which is independent of any	of  the	 other
       rules in	the reflex input.  Because of this, exclusive start conditions
       make it easy to specify "mini-scanners" which scan portions of the  in-
       put that	are syntactically different from the rest (e.g., comments).

       If  the distinction between inclusive and exclusive start conditions is
       still a little vague, here's a simple example illustrating the  connec-
       tion between the	two.  The set of rules:

	   %s example

	   <example>foo	  do_something();

	   bar		  something_else();

       is equivalent to

	   %x example

	   <example>foo	  do_something();

	   <INITIAL,example>bar	   something_else();

       Without	the <INITIAL,example> qualifier, the bar pattern in the	second
       example wouldn't	be active (i.e., couldn't match) when in start	condi-
       tion  example.	If we just used	<example> to qualify bar, though, then
       it would	only be	active in example and not in  INITIAL,	while  in  the
       first example it's active in both, because in the first example the ex-
       ample start condition is an inclusive (%s) start condition.
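
       The activation rule described above can be modelled in a few lines of
       C.  This is only a sketch of the semantics, not the way reflex
       represents start conditions internally; the function rule_is_active,
       the SC_BIT macro and the numbering of the conditions are all
       hypothetical.

```c
#include <stdbool.h>

/* Hypothetical numbering; reflex assigns the real values itself. */
enum { SC_INITIAL = 0, SC_EXAMPLE = 1 };

/* Bit used to record that a rule's <...> prefix names condition sc. */
#define SC_BIT(sc) (1u << (sc))

/*
 * rule_scs is the set of start conditions named in the rule's prefix,
 * or 0 for a rule with no prefix.  current_is_exclusive says whether
 * the current start condition was declared with %x (INITIAL is not).
 */
bool rule_is_active(unsigned rule_scs, int current_sc,
                    bool current_is_exclusive)
{
    if (rule_scs != 0)
        return (rule_scs & SC_BIT(current_sc)) != 0;

    /* Unprefixed rules are active only in inclusive conditions. */
    return !current_is_exclusive;
}
```

       With these definitions, the unprefixed bar rule is active in example
       when it is declared with %s (current_is_exclusive false) and inactive
       when it is declared with %x, unless the rule is prefixed with
       <INITIAL,example>.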

       Also note that the special start-condition specifier <*>	matches	 every
       start condition.  Thus, the above example could also have been written:

	   %x example

	   <example>foo	  do_something();

	   <*>bar    something_else();

       The  default  rule  (to ECHO any	unmatched character) remains active in
       start conditions.  It is	equivalent to:

	   <*>.|\n     ECHO;

       BEGIN(0)	returns	to the original	state where only  the  rules  with  no
       start conditions	are active.  This state	can also be referred to	as the
       start-condition "INITIAL", so BEGIN(INITIAL) is equivalent to BEGIN(0).
       (The  parentheses  around the start condition name are not required but
       are considered good style.)

       BEGIN actions can also be given as indented code	at  the	 beginning  of
       the  rules  section.  For example, the following	will cause the scanner
       to enter	the "SPECIAL" start condition whenever yylex() is  called  and
       the global variable enter_special is true:

		   int enter_special;

	   %x SPECIAL
	   %%
		   if ( enter_special )
		       BEGIN(SPECIAL);

	   ...more rules follow...

       To  illustrate  the  uses  of start conditions, here is a scanner which
       provides	two different interpretations of a string like "123.456".   By
       default it will treat it	as three tokens:

       o   the integer "123",

       o   a dot ('.'),	and

       o   the integer "456".

       But  if	the  string is preceded	earlier	in the line by the string "ex-
       pect-floats" it will treat it as	a  single  token,  the	floating-point
       number 123.456:

	   %{
	   #include <math.h>
	   #include <stdlib.h>	/* atof(), atoi() */
	   %}
	   %s expect

	   %%
	   expect-floats	BEGIN(expect);

	   <expect>[0-9]+"."[0-9]+	{
		       printf( "found a float, = %f\n",
			       atof( yytext ) );
		       }
	   <expect>\n		{
		       /* that's the end of the line, so
			* we need another "expect-floats"
			* before we'll recognize any more
			* numbers
			*/
		       BEGIN(INITIAL);
		       }

	   [0-9]+      {
		       printf( "found an integer, = %d\n",
			       atoi( yytext ) );
		       }

	   "."	       printf( "found a dot\n" );

       Here  is	 a  scanner  which  recognizes (and discards) C	comments while
       maintaining a count of the current input	line.

	   %x comment
	   %%
		   int line_num = 1;

	   "/*"		BEGIN(comment);

	   <comment>[^*\n]*	   /* eat anything that's not a	'*' */
	   <comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
	   <comment>\n		   ++line_num;
	   <comment>"*"+"/"	   BEGIN(INITIAL);

       This scanner goes to a bit of trouble to match as much text as possible
       with each rule.  In general, when attempting to write a high-speed
       scanner, try to match as much as possible in each rule, as it's a big
       win.

       Note that start condition names are really integer values and can be
       stored as such.  Thus, the above could be extended in the following
       fashion:

	   %x comment foo
	   %%
		   int line_num = 1;
		   int comment_caller;

	   "/*"		{
			comment_caller = INITIAL;
			BEGIN(comment);
			}

	   ...

	   <foo>"/*"	{
			comment_caller = foo;
			BEGIN(comment);
			}
	   <comment>[^*\n]*	   /* eat anything that's not a	'*' */
	   <comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
	   <comment>\n		   ++line_num;
	   <comment>"*"+"/"	   BEGIN(comment_caller);

       Furthermore, you	can access the current start condition using the inte-
       ger-valued  YY_START macro.  For	example, the above assignments to com-
       ment_caller could instead be written

	   comment_caller = YY_START;

       Reflex provides YYSTATE as an alias for YY_START	(since that is	what's
       used by AT&T lex).

       Note  that  start conditions do not have	their own name-space; %s's and
       %x's declare names in the same fashion as #define's.

       Finally,	here's an example of how to match C-style quoted strings using
       exclusive  start	 conditions,  including	expanded escape	sequences (but
       not including checking for a string that's too long):

	   %x str

	   %%
		   char string_buf[MAX_STR_CONST];
		   char *string_buf_ptr;

	   \"	   string_buf_ptr = string_buf; BEGIN(str);

	   <str>\"	  { /* saw closing quote - all done */
		   BEGIN(INITIAL);
		   *string_buf_ptr = '\0';
		   /* return string constant token type and
		    * value to parser
		    */
		   }

	   <str>\n	  {
		   /* error - unterminated string constant */
		   /* generate error message */
		   }

	   <str>\\[0-7]{1,3} {
		   /* octal escape sequence */
		   unsigned int result;

		   (void) sscanf( yytext + 1, "%o", &result );

		   if ( result > 0xff )
			   {
			   /* error, constant is out-of-bounds */
			   }

		   *string_buf_ptr++ = result;
		   }

	   <str>\\[0-9]+ {
		   /* generate error - bad escape sequence; something
		    * like '\48' or '\0777777'
		    */
		   }

	   <str>\\n  *string_buf_ptr++ = '\n';
	   <str>\\t  *string_buf_ptr++ = '\t';
	   <str>\\r  *string_buf_ptr++ = '\r';
	   <str>\\b  *string_buf_ptr++ = '\b';
	   <str>\\f  *string_buf_ptr++ = '\f';

	   <str>\\(.|\n)  *string_buf_ptr++ = yytext[1];

	   <str>[^\\\n\"]+	  {
		   char *yptr = yytext;

		   while ( *yptr )
			   *string_buf_ptr++ = *yptr++;
		   }

       Often, such as in some of the examples above, you  wind	up  writing  a
       whole  bunch of rules all preceded by the same start condition(s).  Re-
       flex makes this a little	easier and cleaner by introducing a notion  of
       start condition scope.  A start condition scope is begun with:

	   <SCs>{

       where SCs is a list of one or more start conditions.  Inside the start
       condition scope, every rule automatically has the prefix <SCs> applied
       to it, until a '}' which matches the initial '{'.  So, for example,

	   <ESC>{
	       "\\n"   return '\n';
	       "\\r"   return '\r';
	       "\\f"   return '\f';
	       "\\0"   return '\0';
	   }

       is equivalent to:

	   <ESC>"\\n"  return '\n';
	   <ESC>"\\r"  return '\r';
	   <ESC>"\\f"  return '\f';
	   <ESC>"\\0"  return '\0';

       Start condition scopes may be nested.

       Three routines are available for manipulating stacks of start condi-
       tions:

       void yy_push_state(int new_state)
	      pushes the current start condition onto the top of the start
	      condition stack and switches to new_state as though you had used
	      BEGIN new_state (recall that start condition names are also in-
	      tegers).

       void yy_pop_state()
	      pops the top of the stack and switches to it via BEGIN.

       int yy_top_state()
	      returns the top of the stack without altering the stack's con-
	      tents.
       The start condition stack grows dynamically and so has no built-in size
       limitation.  If memory is exhausted, program execution aborts.

       To  use	start  condition  stacks,  your	scanner	must include a %option
       stack directive (see Options below).
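
       The behavior of the three stack routines can be modelled in plain C.
       The sketch below is an illustration of the semantics described above
       and is not reflex's own implementation; the type sc_stack and the
       sc_* functions are hypothetical names, and "current" stands for the
       start condition that BEGIN would set.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical model of a scanner's start condition state. */
struct sc_stack {
    int *items;         /* saved start conditions, oldest first */
    size_t depth;       /* number of saved start conditions */
    size_t capacity;    /* allocated slots in items */
    int current;        /* the active start condition */
};

/* Save the current start condition, then switch to new_state. */
void sc_push_state(struct sc_stack *s, int new_state)
{
    if (s->depth == s->capacity) {
        size_t new_cap = s->capacity ? s->capacity * 2 : 8;
        int *p = realloc(s->items, new_cap * sizeof *p);

        if (p == NULL) {            /* out of memory: give up */
            fputs("out of memory growing start condition stack\n", stderr);
            abort();
        }
        s->items = p;
        s->capacity = new_cap;
    }
    s->items[s->depth++] = s->current;
    s->current = new_state;
}

/* Switch back to the most recently saved start condition. */
void sc_pop_state(struct sc_stack *s)
{
    if (s->depth == 0) {
        fputs("start condition stack underflow\n", stderr);
        abort();
    }
    s->current = s->items[--s->depth];
}

/* Peek at the saved start condition; the stack must not be empty. */
int sc_top_state(const struct sc_stack *s)
{
    return s->items[s->depth - 1];
}
```

       Note that the value on top of the stack is the condition that was
       active before the most recent push, not the condition currently in
       effect.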

       Some scanners (such as those which  support  "include"  files)  require
       reading	from  several  input  streams.	 As reflex scanners do a large
       amount of buffering, one	cannot control where the next  input  will  be
       read  from by simply writing a YY_INPUT which is	sensitive to the scan-
       ning context.  YY_INPUT is only called when the scanner reaches the end
       of its buffer, which may	be a long time after scanning a	statement such
       as an "include" which requires switching	the input source.

       To negotiate these sorts	of problems, reflex provides a	mechanism  for
       creating	and switching between multiple input buffers.  An input	buffer
       is created by using:

	   YY_BUFFER_STATE yy_create_buffer( FILE *file, int size )

       which takes a FILE pointer and a	size and creates a  buffer  associated
       with  the  given	file and large enough to hold size characters (when in
       doubt, use YY_BUF_SIZE for the size).   It  returns  a  YY_BUFFER_STATE
       handle,	which  may  then be passed to other routines (see below).  The
       YY_BUFFER_STATE type is a pointer to an opaque  struct  yy_buffer_state
       structure,  so  you  may	safely initialize YY_BUFFER_STATE variables to
       ((YY_BUFFER_STATE) 0) if	you wish, and also refer to the	opaque	struc-
       ture  in	order to correctly declare input buffers in source files other
       than that of your scanner.  Note	that the FILE pointer in the  call  to
       yy_create_buffer	is only	used as	the value of yyin seen by YY_INPUT; if
       you redefine YY_INPUT so	it no longer uses yyin,	then  you  can	safely
       pass  a	nil FILE pointer to yy_create_buffer.  You select a particular
       buffer to scan from using:

	   void	yy_switch_to_buffer( YY_BUFFER_STATE new_buffer	)

       This switches the scanner's input buffer so subsequent tokens come from
       new_buffer.  Note that yy_switch_to_buffer() may	be used	by yywrap() to
       set things up for continued scanning, instead of	opening	a new file and
       pointing	yyin at	it.  Note also that switching input sources via	either
       yy_switch_to_buffer() or	yywrap() does not change the start condition.

	   void	yy_delete_buffer( YY_BUFFER_STATE buffer )

       is used to reclaim the storage associated with a buffer.  (buffer can
       be nil, in which case the routine does nothing.)  You can also clear
       the current contents of a buffer using:

	   void	yy_flush_buffer( YY_BUFFER_STATE buffer	)

       This function discards the buffer's contents,  so  the  next  time  the
       scanner	attempts  to match a token from	the buffer, it will first fill
       the buffer anew using YY_INPUT.

       yy_new_buffer() is an alias for yy_create_buffer(), provided  for  com-
       patibility with the C++ use of new and delete for creating and destroy-
       ing dynamic objects.

       Finally,	the YY_CURRENT_BUFFER macro returns a  YY_BUFFER_STATE	handle
       to the current buffer.

       Here  is	an example of using these features for writing a scanner which
       expands include files (the <<EOF>> feature is discussed below):

	   /* the "incl" state is used for picking up the name
	    * of an include file
	    */
	   %x incl

	   %{
	   #define MAX_INCLUDE_DEPTH 10
	   YY_BUFFER_STATE include_stack[MAX_INCLUDE_DEPTH];
	   int include_stack_ptr = 0;
	   %}

	   %%
	   include	       BEGIN(incl);

	   [a-z]+	       ECHO;
	   [^a-z\n]*\n?	       ECHO;

	   <incl>[ \t]*	     /* eat the whitespace */
	   <incl>[^ \t\n]+   { /* got the include file name */
		   if ( include_stack_ptr >= MAX_INCLUDE_DEPTH )
		       {
		       fprintf( stderr, "Includes nested too deeply\n" );
		       exit( 1 );
		       }

		   include_stack[include_stack_ptr++] =
		       YY_CURRENT_BUFFER;

		   yyin = fopen( yytext, "r" );

		   if ( ! yyin )
		       error( ... );

		   yy_switch_to_buffer(
		       yy_create_buffer( yyin, YY_BUF_SIZE ) );

		   BEGIN(INITIAL);
		   }

	   <<EOF>> {
		   if ( --include_stack_ptr < 0 )
		       {
		       yyterminate();
		       }

		   else
		       {
		       yy_delete_buffer( YY_CURRENT_BUFFER );
		       yy_switch_to_buffer(
			    include_stack[include_stack_ptr] );
		       }
		   }

       Three routines are available for	setting	up input buffers for  scanning
       in-memory  strings  instead  of	files.	All of them create a new input
       buffer for scanning the string,	and  return  a	corresponding  YY_BUF-
       FER_STATE  handle (which	you should delete with yy_delete_buffer() when
       done  with  it).	  They	also  switch   to   the	  new	buffer	 using
       yy_switch_to_buffer(),  so the next call	to yylex() will	start scanning
       the string.

       yy_scan_string(const char *str)
	      scans a NUL-terminated string.

       yy_scan_bytes(const char *bytes, int len)
	      scans len bytes (including possibly NUL's) starting at location
	      bytes.

       Note  that both of these	functions create and scan a copy of the	string
       or bytes.  (This	may be desirable, since	yylex()	modifies the  contents
       of the buffer it	is scanning.)  You can avoid the copy by using:

       yy_scan_buffer(char *base, yy_size_t size)
	      which scans in place the buffer starting at base, consisting of
	      size bytes, the last two bytes of which must be YY_END_OF_BUF-
	      FER_CHAR (ASCII NUL).  These last two bytes are not scanned;
	      thus, scanning consists of base[0] through base[size-3], inclu-
	      sive.

	      If  you fail to set up base in this manner (i.e.,	forget the fi-
	      nal two YY_END_OF_BUFFER_CHAR bytes), then yy_scan_buffer()  re-
	      turns a nil pointer instead of creating a	new input buffer.

	      The  type	yy_size_t is an	integral type to which you can cast an
	      integer expression reflecting the	size of	the buffer.
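
       Because yy_scan_buffer() rejects a buffer that lacks the two trailing
       NUL bytes, it can help to build the buffer with a small helper.  The
       following is a minimal sketch; make_scan_buffer is a hypothetical name
       and not part of reflex, and it assumes the text to scan is an ordinary
       NUL-terminated string.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Return a freshly allocated copy of text followed by the two NUL
 * bytes that yy_scan_buffer() requires, and store the total size
 * (text length plus 2) in *size_out.  Returns NULL if allocation
 * fails.  The caller frees the result after deleting the buffer.
 */
char *make_scan_buffer(const char *text, size_t *size_out)
{
    size_t len = strlen(text);
    char *base = malloc(len + 2);

    if (base == NULL)
        return NULL;
    memcpy(base, text, len);
    base[len] = '\0';       /* first end-of-buffer byte */
    base[len + 1] = '\0';   /* second end-of-buffer byte */
    *size_out = len + 2;
    return base;
}
```

       Inside a scanner, the result would then be handed over with a call
       such as yy_scan_buffer(base, (yy_size_t) size), checking for a nil
       return as described above.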

       The special rule	"<<EOF>>" indicates actions which are to be taken when
       an  end-of-file is encountered and yywrap() returns non-zero (i.e., in-
       dicates no further files	to process).  The action must finish by	 doing
       one of four things:

       o   assigning yyin to a new input file (in previous versions of reflex,
	   after doing the assignment you  had	to  call  the  special	action
	   YY_NEW_FILE;	this is	no longer necessary);

       o   executing a return statement;

       o   executing the special yyterminate() action;

       o   or,	switching to a new buffer using	yy_switch_to_buffer() as shown
	   in the example above.

       <<EOF>> rules may not be	used with other	patterns;  they	 may  only  be
       qualified  with	a list of start	conditions.  If	an unqualified <<EOF>>
       rule is given, it applies to all	start conditions which do not  already
       have  <<EOF>> actions.  To specify an <<EOF>> rule for only the initial
       start condition, use

	   <INITIAL><<EOF>>

       These rules are useful for catching things like unclosed comments.  An
       example:

	   %x quote
	   %%

	   ...other rules for dealing with quotes...

	   <quote><<EOF>>   {
		    error( "unterminated quote" );
		    yyterminate();
		    }
	   <<EOF>>  {
		    if ( *++filelist )
			yyin = fopen( *filelist, "r" );
		    else
		       yyterminate();
		    }

       The  macro  YY_USER_ACTION can be defined to provide an action which is
       always executed prior to	the matched rule's action.   For  example,  it
       could  be  #define'd to call a routine to convert yytext	to lower-case.
       When YY_USER_ACTION is invoked, the variable yy_act gives the number of
       the  matched  rule  (rules  are numbered	starting with 1).  Suppose you
       want to profile how often each of your rules is matched.	 The following
       would do	the trick:

	   #define YY_USER_ACTION ++ctr[yy_act]

       where ctr is an array to hold the counts for the different rules.  Note
       that the macro YY_NUM_RULES gives the total number of rules (including
       the default rule, even if you use -s), so a correct declaration for ctr
       is:

	   int ctr[YY_NUM_RULES];

       The macro YY_USER_INIT may be defined to	provide	an action which	is al-
       ways  executed before the first scan (and before	the scanner's internal
       initializations are done).  For example,	it could be  used  to  call  a
       routine to read in a data table or open a logging file.

       The  macro  yy_set_interactive(is_interactive)  can  be used to control
       whether the current buffer is considered	interactive.   An  interactive
       buffer  is  processed  more slowly, but must be used when the scanner's
       input source is indeed interactive to avoid problems due	to waiting  to
       fill  buffers  (see  the	 discussion of the -I flag below).  A non-zero
       value in	the macro invocation marks the buffer as interactive,  a  zero
       value  as  non-interactive.  Note that use of this macro	overrides %op-
       tion always-interactive or %option never-interactive (see  Options  be-
       low).   yy_set_interactive() must be invoked prior to beginning to scan
       the buffer that is (or is not) to be considered interactive.

       The macro yy_set_bol(at_bol) can	be used	to control whether the current
       buffer's	scanning context for the next token match is done as though at
       the beginning of	a line.	 A non-zero macro  argument  makes  rules  an-
       chored with '^' active, while a zero argument makes '^' rules inactive.

       The  macro  YY_AT_BOL() returns true if the next	token scanned from the
       current buffer will have	'^' rules active, false	otherwise.

       In the generated scanner, the actions are all gathered in one large
       switch statement and separated using YY_BREAK, which may be redefined.
       By default, it is simply a "break", to separate each rule's action from
       the following rule's.  Redefining YY_BREAK to do nothing allows, for
       example, C++ users to avoid unreachable-statement warnings, which arise
       when a rule's action ends with "return" and the YY_BREAK after it can
       never be reached.  If you do so, be very careful that every rule ends
       with a "break" or a "return"!

       This section summarizes the various values available to the user	in the
       rule actions.

       o   char	*yytext	holds the text of the current token.  It may be	 modi-
	   fied	but not	lengthened (you	cannot append characters to the	end).

	   If the special directive %array appears in the first	section	of the
	   scanner description,	then  yytext  is  instead  declared  char  yy-
	   text[YYLMAX], where YYLMAX is a macro definition that you can rede-
	   fine	in the first section if	you don't like the default value (gen-
	   erally 8KB).	 Using %array results in somewhat slower scanners, but
	   the value of	yytext becomes immune to calls to input() and unput(),
	   which  potentially  destroy	its  value  when yytext	is a character
	   pointer.  The opposite of %array is %pointer, which is the default.

	   You cannot use %array when generating C++ scanner classes (the -+
	   flag).

       o   int yyleng holds the	length of the current token.

       o   FILE	 *yyin is the file which by default reflex reads from.	It may
	   be redefined	but doing so only makes	sense before  scanning	begins
	   or  after an	EOF has	been encountered.  Changing it in the midst of
	   scanning will have unexpected results since reflex buffers its in-
	   put; use yyrestart() instead.  Once scanning terminates because an
	   end-of-file has been seen, you can point yyin at the new input
	   file and then call the scanner again to continue scanning.

       o   void	yyrestart( FILE	*new_file ) may	be called to point yyin	at the
	   new input file.  The	switch-over to the new file is immediate  (any
	   previously	buffered-up   input   is  lost).   Note	 that  calling
	   yyrestart() with yyin as an argument	thus throws away  the  current
	   input buffer	and continues scanning the same	input file.

       o   FILE	 *yyout	is the file to which ECHO actions are done.  It	can be
	   reassigned by the user.

       o   YY_CURRENT_BUFFER returns a YY_BUFFER_STATE handle to the current
	   buffer.

       o   YY_START  returns  an  integer  value  corresponding	to the current
	   start condition.  You can subsequently use this value with BEGIN to
	   return to that start	condition.

       One  of	the  main uses of reflex is as a companion to the yacc parser-
       generator.  yacc	parsers	expect to call a routine named yylex() to find
       the  next  input	 token.	 The routine is	supposed to return the type of
       the next	token as well as putting any associated	value  in  the	global
       yylval.  To use reflex with yacc, one specifies the -d option to yacc
       to instruct it to generate the file y.tab.h containing definitions of
       all the %tokens appearing in the yacc input.  This file is then in-
       cluded in the reflex scanner.  For example, if one of the tokens is
       "TOK_NUMBER", part of the scanner might look like:

	   %{
	   #include "y.tab.h"
	   %}

	   %%

	   [0-9]+	 yylval = atoi( yytext ); return TOK_NUMBER;

       reflex has the following	options:

       -b     Generate	backing-up  information	to lex.backup.	This is	a list
	      of scanner states	which require backing up and the input charac-
	      ters  on which they do so.  By adding rules one can remove back-
	      ing-up states.  If all backing-up	states are eliminated and  -Cf
	      or  -CF  is used,	the generated scanner will run faster (see the
	      -p flag).	 Only users who	wish to	squeeze	every last  cycle  out
	      of  their	 scanners need worry about this	option.	 (See the sec-
	      tion on Performance Considerations below.)

       -c     is a do-nothing, deprecated option included for POSIX compli-
	      ance.

       -d     makes  the generated scanner run in debug	mode.  Whenever	a pat-
	      tern is recognized and  the  global  yy_flex_debug  is  non-zero
	      (which  is the default), the scanner will	write to stderr	a line
	      of the form:

		  --accepting rule at line 53 ("the matched text")

	      The line number refers to	the location of	the rule in  the  file
	      defining	the  scanner  (i.e., the file that was fed to reflex).
	      Messages are also	generated when the scanner backs  up,  accepts
	      the  default  rule,  reaches the end of its input	buffer (or en-
	      counters a NUL; at this point, the two look the same as  far  as
	      the scanner's concerned),	or reaches an end-of-file.

       -f     specifies	 fast scanner.	No table compression is	done and stdio
	      is bypassed.  The	result is large	 but  fast.   This  option  is
	      equivalent to -Cfr (see below).

       -h     generates	 a  "help"  summary  of	reflex's options to stdout and
	      then exits.  -?  and --help are synonyms for -h.

       -i     instructs	reflex to generate a  case-insensitive	scanner.   The
	      case  of	letters	given in the reflex input patterns will	be ig-
	      nored, and tokens	in the input will  be  matched	regardless  of
	      case.   The matched text given in	yytext will have the preserved
	      case (i.e., it will not be folded).

       -l     turns on maximum compatibility with the original AT&T lex	imple-
	      mentation.   Note	 that  this  does not mean full	compatibility.
	      Use of this option costs a considerable amount  of  performance,
	      and  it cannot be	used with the -+, -f, -F, -Cf, or -CF options.
	      For details on the compatibilities it provides, see the  section
	      "Incompatibilities  With Lex And POSIX" below.  This option also
	      results in the name YY_FLEX_LEX_COMPAT being  #define'd  in  the
	      generated	scanner.

       -n     is another do-nothing, deprecated option included only for POSIX
	      compliance.

       -p     generates	a performance report to	stderr.	 The  report  consists
	      of  comments  regarding  features	of the reflex input file which
	      will cause a serious loss	of performance in the resulting	 scan-
	      ner.  If you give	the flag twice,	you will also get comments re-
	      garding features that lead to minor performance losses.

	      Note that	the use	of  REJECT,  %option  yylineno,	 and  variable
	      trailing context (see the	Deficiencies / Bugs section below) en-
	      tails a substantial performance penalty; use of yymore(),	the  ^
	      operator,	and the	-I flag	entail minor performance penalties.

       -s     causes  the default rule (that unmatched scanner input is	echoed
	      to stdout) to be suppressed.  If the  scanner  encounters	 input
	      that  does  not match any	of its rules, it aborts	with an	error.
	      This option is useful for	finding	holes in a scanner's rule set.

       -t     instructs	reflex to write	the scanner it generates  to  standard
	      output instead of	lex.yy.c.

       -v     specifies	 that  reflex should write to stderr a summary of sta-
	      tistics regarding	the scanner it generates.  Most	of the statis-
	      tics  are	 meaningless  to the casual reflex user, but the first
	      line identifies the version of reflex (same as reported by  -V),
	      and  the	next  line the flags used when generating the scanner,
	      including	those that are on by default.

       -w     suppresses warning messages.

       -B     instructs	reflex to generate a batch scanner,  the  opposite  of
	      interactive  scanners  generated by -I (see below).  In general,
	      you use -B when you are certain that your	scanner	will never  be
	      used  interactively,  and	you want to squeeze a little more per-
	      formance out of it.  If your goal	is instead to  squeeze	out  a
	      lot  more	 performance,  you should  be using the	-Cf or -CF op-
	      tions (discussed below), which turn on -B	automatically anyway.

       -F     specifies	that the fast scanner table representation  should  be
	      used (and	stdio bypassed).  This representation is about as fast
	      as the full table	representation (-f), and for some sets of pat-
	      terns will be considerably smaller (and for others, larger).  In
	      general, if the pattern  set  contains  both  "keywords"	and  a
	      catch-all, "identifier" rule, such as in the set:

		  "case"    return TOK_CASE;
		  "switch"  return TOK_SWITCH;
		  "default" return TOK_DEFAULT;
		  [a-z]+    return TOK_ID;

	      then  you're better off using the	full table representation.  If
	      only the "identifier" rule is present and	you then  use  a  hash
	      table or some such to detect the keywords, you're	better off us-
	      ing -F.

	      This option is equivalent	to -CFr	(see  below).	It  cannot  be
	      used with	-+.

       -I     instructs	 reflex	to generate an interactive scanner.  An	inter-
	      active scanner is	one that only looks ahead to decide what token
	      has  been	 matched if it absolutely must.	 It turns out that al-
	      ways looking one extra character ahead, even if the scanner  has
	      already seen enough text to disambiguate the current token, is a
	      bit faster than only looking ahead when necessary.  But scanners
	      that  always  look  ahead	give dreadful interactive performance;
	      for example, when	a user types a newline,	it is  not  recognized
	      as  a  newline token until they enter another token, which often
	      means typing in another whole line.

	      Reflex scanners default to interactive unless you	use the	-Cf or
	      -CF  table-compression  options  (see below).  That's because if
	      you're looking for high-performance you should be	using  one  of
	      these  options,  so  if  you didn't, reflex assumes you'd	rather
	      trade off	a bit of run-time performance for  intuitive  interac-
	      tive  behavior.  Note also that you cannot use -I	in conjunction
	      with -Cf or -CF.	Thus, this option is not really	needed;	it  is
	      on by default for	all those cases	in which it is allowed.

	      You can force a scanner to not be interactive by using -B (see
	      above).

       -L     instructs	reflex not to generate #line directives.  Without this
	      option,  reflex  peppers the generated scanner with #line	direc-
	      tives so error messages in the actions will be correctly located
	      with  respect  to	 either	the original reflex input file (if the
	      errors are due to	code in	the input file), or lex.yy.c  (if  the
	      errors  are  reflex's  fault -- you should report	these sorts of
	      errors to	the email address given	below).

       -T     makes reflex run in trace	mode.  It will generate	a lot of  mes-
	      sages  to	stderr concerning the form of the input	and the	resul-
	      tant non-deterministic and deterministic finite automata.	  This
	      option is	mostly for use in maintaining reflex.

       -V     prints  the  version number to stdout and	exits.	--version is a
	      synonym for -V.

       -7     instructs reflex to generate a 7-bit scanner, i.e., one which
	      can only recognize 7-bit characters in its input.  The advan-
	      tage of using -7 is that the scanner's tables can be up to half
	      the  size	 of  those  generated using the	-8 option (see below).
	      The disadvantage is that such scanners often hang	 or  crash  if
	      their input contains an 8-bit character.

	      Note,  however,  that unless you generate	your scanner using the
	      -Cf or -CF table compression options, use	of -7 will save	only a
	      small  amount of table space, and	make your scanner considerably
	      less portable.  Reflex's default	behavior  is  to  generate  an
	      8-bit scanner unless you use -Cf or -CF, in which case reflex
	      defaults to generating 7-bit scanners unless your site was
	      always  configured  to generate 8-bit scanners (as will often be
	      the case with non-USA sites).  You can tell whether reflex  gen-
	      erated  a	 7-bit or an 8-bit scanner by inspecting the flag sum-
	      mary in the -v output as described above.

	      Note that if you use -Cfe or -CFe (those table compression op-
	      tions, but also using equivalence classes as discussed below),
	      reflex still defaults to generating an 8-bit scanner,
	      since  usually  with these compression options full 8-bit	tables
	      are not much more	expensive than 7-bit tables.

       -8     instructs	reflex to generate an 8-bit scanner, i.e.,  one	 which
	      can  recognize  8-bit  characters.  This flag is only needed for
	      scanners generated using -Cf or -CF,  as	otherwise  reflex  de-
	      faults to	generating an 8-bit scanner anyway.

	      See the discussion of -7 above for reflex's default behavior and
	      the tradeoffs between 7-bit and 8-bit scanners.

       -+     specifies	that you want reflex to	generate a C++ scanner	class.
	      See the section on Generating C++	Scanners below for details.

       -C[aefFmr]
	      controls the degree of table compression and, more generally,
	      trade-offs between small scanners and fast scanners.

	      -Ca ("align") instructs reflex to	trade off larger tables	in the
	      generated	scanner	for faster performance because the elements of
	      the tables are better aligned for	memory access and computation.
	      On  some RISC architectures, fetching and	manipulating longwords
	      is more efficient	than with smaller-sized	units such  as	short-
	      words.   This  option  can double	the size of the	tables used by
	      your scanner.

	      -Ce directs reflex to construct equivalence classes, i.e.,  sets
	      of characters which have identical lexical properties (for exam-
	      ple, if the only appearance of digits in the reflex input	is  in
	      the  character  class "[0-9]" then the digits '0', '1', ..., '9'
	      will all be put in the  same  equivalence	 class).   Equivalence
	      classes  usually give dramatic reductions	in the final table/ob-
	      ject file	sizes (typically a factor of 2-5) and are pretty cheap
	      performance-wise (one array look-up per character	scanned).

              -Cf specifies that the full scanner tables should be generated -
              reflex should not compress the tables by taking advantage of
              similar transition functions for different states.

	      -CF  specifies  that  the	 alternate fast	scanner	representation
	      (described above under the -F flag) should be used.  This	option
	      cannot be	used with -+.

	      -Cm  directs reflex to construct meta-equivalence	classes, which
	      are sets of equivalence classes (or characters,  if  equivalence
	      classes  are  not	 being	used) that are commonly	used together.
	      Meta-equivalence classes are often a big	win  when  using  com-
	      pressed tables, but they have a moderate performance impact (one
	      or two "if" tests	and one	array look-up per character scanned).

	      -Cr causes the generated scanner to bypass use of	 the  standard
	      I/O  library  (stdio)  for input.	 Instead of calling fread() or
	      getc(), the scanner will use the read() system  call,  resulting
	      in a performance gain which varies from system to	system,	but in
	      general is probably negligible unless you	are also using -Cf  or
	      -CF.   Using -Cr can cause strange behavior if, for example, you
	      read from	yyin using stdio prior to calling the scanner (because
	      the  scanner will	miss whatever text your	previous reads left in
	      the stdio	input buffer).

	      -Cr has no effect	if you	define	YY_INPUT  (see	The  Generated
	      Scanner above).

	      A	lone -C	specifies that the scanner tables should be compressed
	      but neither equivalence  classes	nor  meta-equivalence  classes
	      should be	used.

	      The  options  -Cf	 or  -CF  and -Cm do not make sense together -
	      there is no opportunity for meta-equivalence classes if the  ta-
	      ble  is  not  being  compressed.	 Otherwise  the	options	may be
	      freely mixed, and	are cumulative.

	      The default setting is -Cem, which specifies that	reflex	should
	      generate equivalence classes and meta-equivalence	classes.  This
	      setting provides the highest degree of table  compression.   You
	      can  trade  off  faster-executing	scanners at the	cost of	larger
	      tables with the following	generally being	true:

		  slowest & smallest
		  fastest & largest

	      Note that	scanners with the smallest tables are  usually	gener-
	      ated  and	 compiled the quickest,	so during development you will
	      usually want to use the default, maximal compression.

	      -Cfe is often a good compromise between speed and	size for  pro-
	      duction scanners.
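
              The equivalence classes built by -Ce can be pictured with a
              small C sketch.  It is only a model of the idea described
              above (characters that belong to exactly the same character
              sets share a class); build_ecs and add_range are hypothetical
              names, and the algorithm reflex actually uses may differ.

```c
#include <assert.h>

#define NCHARS 256

/*
 * Partition the 256 byte values into equivalence classes: two bytes land in
 * the same class when they are members of exactly the same character sets.
 * sets[i][c] is nonzero when byte c belongs to set i (at most 32 sets here).
 * ec[c] receives a class number starting at 1; the return value is the
 * number of classes found.
 */
int build_ecs(unsigned char sets[][NCHARS], int nsets, int ec[NCHARS])
{
    unsigned long signature[NCHARS];
    int nclasses = 0;

    assert(nsets >= 0 && nsets <= 32);

    /* The signature of a byte is a bitmask of the sets that contain it. */
    for (int c = 0; c < NCHARS; c++) {
        signature[c] = 0;
        for (int i = 0; i < nsets; i++)
            if (sets[i][c])
                signature[c] |= 1UL << i;
    }

    /* Bytes with equal signatures share a class number. */
    for (int c = 0; c < NCHARS; c++) {
        ec[c] = 0;
        for (int prev = 0; prev < c; prev++)
            if (signature[prev] == signature[c]) {
                ec[c] = ec[prev];
                break;
            }
        if (ec[c] == 0)
            ec[c] = ++nclasses;
    }
    return nclasses;
}

/* Convenience helper: mark the inclusive range lo..hi as members of a set. */
void add_range(unsigned char set[NCHARS], int lo, int hi)
{
    for (int c = lo; c <= hi; c++)
        set[c] = 1;
}
```

              With a single set holding '0' through '9', the sketch yields
              two classes: the digits and everything else.  Transition
              tables indexed by class rather than by character need only
              that many columns, at the cost of the one extra array look-up
              per character mentioned above.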

       -ooutput
              directs reflex to write the scanner to the file output instead
              of lex.yy.c.  If you combine -o with the -t option, then the
              scanner is written to stdout but its #line directives (see the
              -L option above) refer to the file output.

       -Pprefix
              changes the default yy prefix used by reflex for all globally-
              visible variable and function names to instead be prefix.  For
	      example, -Pfoo changes the name of yytext	to footext.   It  also
	      changes  the  name  of  the default output file from lex.yy.c to  Here are all of the names affected:


              (If you are using a C++ scanner, then only yywrap and
              yyFlexLexer are affected.)  Within your scanner itself, you can
              still refer to the global variables and functions using either
              version of their name; but externally, they have the modified
              name.

	      This option lets you easily link together	multiple  reflex  pro-
	      grams  into  the same executable.	 Note, though, that using this
	      option also renames yywrap(), so you  now	 must  either  provide
	      your  own	 (appropriately-named) version of the routine for your
	      scanner, or use %option noyywrap,	 as  linking  with  -lrefl  no
	      longer provides one for you by default.

       -Sskeleton
              overrides the default skeleton file from which reflex constructs
              its scanners.  You'll never need this option unless you are
              doing reflex maintenance or development.

       reflex  also  provides  a  mechanism for	controlling options within the
       scanner specification itself, rather than from the reflex command-line.
       This  is	 done  by including %option directives in the first section of
       the scanner specification.  You can specify  multiple  options  with  a
       single  %option directive, and multiple directives in the first section
       of your reflex input file.

       Most options are	given simply as	names, optionally preceded by the word
       "no"  (with no intervening whitespace) to negate	their meaning.	A num-
       ber are equivalent to reflex flags or their negation:

	   7bit		   -7 option
	   8bit		   -8 option
	   align	   -Ca option
	   backup	   -b option
	   batch	   -B option
	   c++		   -+ option

	   caseful or
	   case-sensitive  opposite of -i (default)

	   case-insensitive or
	   caseless	   -i option

	   debug	   -d option
	   default	   opposite of -s option
	   ecs		   -Ce option
	   fast		   -F option
	   full		   -f option
	   interactive	   -I option
	   lex-compat	   -l option
	   meta-ecs	   -Cm option
	   perf-report	   -p option
	   read		   -Cr option
	   stdout	   -t option
	   verbose	   -v option
	   warn		   opposite of -w option
			   (use	"%option nowarn" for -w)

	   array	   equivalent to "%array"
	   pointer	   equivalent to "%pointer" (default)

       Some %option's provide features otherwise not available:

       always-interactive
              instructs reflex to generate a scanner which always considers
	      its  input  "interactive".  Normally, on each new	input file the
	      scanner calls isatty() in	an attempt to  determine  whether  the
	      scanner's	 input source is interactive and thus should be	read a
	      character	at a time.  When this option is	used, however, then no
	      such call	is made.

       main   directs reflex to	provide	a default main() program for the scan-
	      ner, which simply	calls yylex().	This option  implies  noyywrap
	      (see below).

       never-interactive
              instructs reflex to generate a scanner which never considers its
	      input "interactive" (again, no call made to isatty()).  This  is
	      the opposite of always-interactive.

       stack  enables the use of start condition stacks (see Start Conditions
              above).

       stdinit
              if set (i.e., %option stdinit) initializes yyin and yyout to
	      stdin  and stdout, instead of the	default	of nil.	 Some existing
	      lex programs depend on this behavior, even though	it is not com-
	      pliant  with  ANSI C, which does not require stdin and stdout to
	      be compile-time constant.

       yylineno
              directs reflex to generate a scanner that maintains the number
	      of  the  current line read from its input	in the global variable
	      yylineno.	 This option is	implied	by %option lex-compat.

       yywrap if unset (i.e., %option noyywrap), makes the  scanner  not  call
	      yywrap()	upon  an end-of-file, but simply assume	that there are
	      no more files to scan (until the user points yyin	at a new  file
	      and calls	yylex()	again).

       reflex scans your rule actions to determine whether you use the REJECT
       or yymore() features.  The reject and yymore options are available to
       override its decision as to whether you use these features, either by
       setting them (e.g., %option reject) to indicate the feature is indeed
       used, or by unsetting them (e.g., %option noyymore) to indicate it is
       actually not used.

       Three options take string-delimited values, offset with '=':

	   %option outfile="ABC"

       is equivalent to	-oABC, and

	   %option prefix="XYZ"

       is equivalent to	-PXYZ.	Finally,

	   %option yyclass="foo"

       only applies when generating a C++ scanner (-+ option).  It informs
       reflex that you have derived foo as a subclass of yyFlexLexer, so
       reflex will place your actions in the member function foo::yylex()
       instead of yyFlexLexer::yylex().  It also generates a
       yyFlexLexer::yylex() member function that emits a run-time error (by
       invoking yyFlexLexer::LexerError()) if called.  See Generating C++
       Scanners, below, for additional information.

       A number of options are available for lint purists who want to suppress
       the appearance of unneeded routines in the generated scanner.  Each of
       the following, if unset (e.g., %option nounput), results in the
       corresponding routine not appearing in the generated scanner:

	   input, unput
	   yy_push_state, yy_pop_state,	yy_top_state
	   yy_scan_buffer, yy_scan_bytes, yy_scan_string

       (though	yy_push_state()	and friends won't appear anyway	unless you use
       %option stack).

       The main	design goal of reflex is  that	it  generate  high-performance
       scanners.   It  has  been optimized for dealing well with large sets of
       rules.  Aside from the effects on scanner speed of the  table  compres-
       sion  -C	 options outlined above, there are a number of options/actions
       which degrade performance.  These are, from most	expensive to least:

	   %option yylineno
	   arbitrary trailing context

	   pattern sets	that require backing up
	   %option interactive
	   %option always-interactive

	   '^' beginning-of-line operator

       with the first three all being quite expensive and the last two being
       quite cheap.  Note also that unput() is implemented as a routine call
       that potentially does quite a bit of work, while yyless() is a
       quite-cheap macro; so if just putting back some excess text you
       scanned, use yyless().

       REJECT should be	avoided	at all costs when  performance	is  important.
       It is a particularly expensive option.

       Getting rid of backing up is messy and often may be an enormous amount
       of work for a complicated scanner.  In principle, one begins by using
       the -b flag to generate a lex.backup file.  For example, on the input

           %%
           foo        return TOK_KEYWORD;
           foobar     return TOK_KEYWORD;

       the file	looks like:

	   State #6 is non-accepting -
	    associated rule line numbers:
		  2	  3
	    out-transitions: [ o ]
	    jam-transitions: EOF [ \001-n  p-\177 ]

	   State #8 is non-accepting -
	    associated rule line numbers:
	    out-transitions: [ a ]
	    jam-transitions: EOF [ \001-`  b-\177 ]

	   State #9 is non-accepting -
	    associated rule line numbers:
	    out-transitions: [ r ]
	    jam-transitions: EOF [ \001-q  s-\177 ]

	   Compressed tables always back up.

       The  first  few	lines tell us that there's a scanner state in which it
       can make	a transition on	an 'o' but not on  any	other  character,  and
       that  in	that state the currently scanned text does not match any rule.
       The state occurs	when trying to match the rules found at	lines 2	and  3
       in  the	input  file.   If  the scanner is in that state	and then reads
       something other than an 'o', it will have to back up  to	 find  a  rule
       which  is  matched.  With a bit of headscratching one can see that this
       must be the state it's in when it has seen "fo".	 When  this  has  hap-
       pened,  if  anything  other  than another 'o' is	seen, the scanner will
       have to back up to simply match the 'f' (by the default rule).

       The comment regarding State #8 indicates there's a problem when "foob"
       has been scanned.  Indeed, on any character other than an 'a', the
       scanner will have to back up to accept "foo".  Similarly, the comment
       for State #9 concerns when "fooba" has been scanned and an 'r' does not
       follow.

       The final comment reminds us that there's no point going to all the
       trouble of removing backing up from the rules unless we're using -Cf or
       -CF, since there's no performance gain doing so with compressed
       scanners.

       The way to remove the backing up	is to add "error" rules:

	   foo	       return TOK_KEYWORD;
	   foobar      return TOK_KEYWORD;

           fooba       |
           foob        |
           fo          {
                       /* false alarm, not really a keyword */
                       return TOK_ID;
                       }

       Eliminating  backing up among a list of keywords	can also be done using
       a "catch-all" rule:

	   foo	       return TOK_KEYWORD;
	   foobar      return TOK_KEYWORD;

	   [a-z]+      return TOK_ID;

       This is usually the best	solution when appropriate.

       Backing up messages tend to cascade.  With a complicated set of rules
       it's not uncommon to get hundreds of messages.  If one can decipher
       them, though, it often only takes a dozen or so rules to eliminate the
       backing up (though it's easy to make a mistake and have an error rule
       accidentally match a valid token).  A possible future reflex feature
       will be to automatically add rules to eliminate backing up.

       It's  important to keep in mind that you	gain the benefits of eliminat-
       ing backing up only if you eliminate  every  instance  of  backing  up.
       Leaving just one	means you gain nothing.
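
       To make the cost of backing up concrete, here is a small C model of
       longest-match scanning for just the two rules "foo" and "foobar".  It
       is a hand-written sketch, not code that reflex generates, and the
       name scan_keyword is hypothetical.  It reports how many characters
       were consumed past the last accepting state and therefore have to be
       given back.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Longest-match scan for the rules "foo" and "foobar" at the start of input.
 * Returns the length of the token finally accepted and stores in *backed_up
 * how many already-consumed characters had to be given back.  When neither
 * keyword matches, the default rule accepts a single character.
 */
size_t scan_keyword(const char *input, size_t *backed_up)
{
    static const char longest[] = "foobar";
    size_t consumed = 0;        /* characters consumed so far */
    size_t last_accept = 0;     /* token length at the last accepting state */

    assert(input != NULL && backed_up != NULL);

    /* Follow the out-transitions for as long as the input allows. */
    while (consumed < strlen(longest) && input[consumed] == longest[consumed]) {
        consumed++;
        if (consumed == 3 || consumed == 6)     /* "foo" or "foobar" seen */
            last_accept = consumed;
    }

    if (last_accept == 0 && input[0] != '\0')
        last_accept = 1;        /* default rule: match one character */

    /* Anything consumed beyond the accepted token must be rescanned. */
    *backed_up = consumed > last_accept ? consumed - last_accept : 0;
    return last_accept;
}
```

       On "foobaz" the model accepts "foo" after giving back two characters,
       and on "fox" it falls back to the default rule for 'f' after giving
       back one, which mirrors the states reported in lex.backup above.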

       Variable	trailing context (where	both the leading and trailing parts do
       not have	a fixed	length)	entails	almost the same	 performance  loss  as
       REJECT (i.e., substantial).  So when possible a rule like:

	   mouse|rat/(cat|dog)	 run();

       is better written:

	   mouse/cat|dog	 run();
	   rat/cat|dog		 run();

       or as

	   mouse|rat/cat	 run();
	   mouse|rat/dog	 run();

       Note that here the special '|' action does not provide any savings, and
       can even	make things worse (see Deficiencies / Bugs below).

       Another area where the user can increase	a scanner's  performance  (and
       one  that's  easier  to implement) arises from the fact that the	longer
       the tokens matched, the faster the scanner will run.  This  is  because
       with long tokens	the processing of most input characters	takes place in
       the (short) inner scanning loop,	and does not often have	to go  through
       the  additional	work of	setting	up the scanning	environment (e.g., yy-
       text) for the action.  Recall the scanner for C comments:

           %x comment
           %%
                   int line_num = 1;

           "/*"         BEGIN(comment);

           <comment>[^*\n]*
           <comment>"*"+[^*/\n]*
           <comment>\n             ++line_num;
           <comment>"*"+"/"        BEGIN(INITIAL);

       This could be sped up by	writing	it as:

           %x comment
           %%
                   int line_num = 1;

           "/*"         BEGIN(comment);

           <comment>[^*\n]*
           <comment>[^*\n]*\n      ++line_num;
           <comment>"*"+[^*/\n]*
           <comment>"*"+[^*/\n]*\n ++line_num;
           <comment>"*"+"/"        BEGIN(INITIAL);

       Now instead of each newline requiring the processing of another action,
       recognizing  the	newlines is "distributed" over the other rules to keep
       the matched text	as long	as possible.  Note that	adding rules does  not
       slow  down the scanner!	The speed of the scanner is independent	of the
       number of rules or (modulo the considerations given at the beginning of
       this  section)  how  complicated	the rules are with regard to operators
       such as '*' and '|'.
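
       The saving can be estimated with a hand-written C stand-in for the
       comment scanner.  This is only a sketch under the assumption that
       each matched token costs one pass through the action set-up; the
       function count_matches is hypothetical and simply counts tokens for
       the two styles of rule set.

```c
#include <assert.h>
#include <stddef.h>

// Hand-written stand-in for a generated scanner: counts the rule matches
// needed to scan a comment body up to and including its closing star-slash.
// merge_newlines == 0: a newline is always a token of its own.
// merge_newlines != 0: a newline is absorbed into the token that precedes it.
size_t count_matches(const char *body, int merge_newlines)
{
    size_t matches = 0;
    const char *p = body;

    assert(body != NULL);

    while (*p != '\0') {
        if (*p == '*') {
            while (*p == '*')
                p++;
            if (*p == '/') {
                matches++;              // the closing "*"+"/" rule
                break;
            }
            // text after the stars, stopping at '*', '/' or newline
            while (*p != '\0' && *p != '*' && *p != '/' && *p != '\n')
                p++;
        } else if (*p == '\n' && !merge_newlines) {
            p++;                        // a newline matched on its own
            matches++;
            continue;
        } else {
            // ordinary text, stopping at '*' or newline
            while (*p != '\0' && *p != '*' && *p != '\n')
                p++;
        }
        if (merge_newlines && *p == '\n')
            p++;                        // fold the newline into this token
        matches++;
    }
    return matches;
}
```

       For a two-line comment body the separate-newline style needs five
       matches and the merged style needs three, so fewer actions are set
       up for the same input.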

       A final example in speeding up a	scanner:  suppose  you	want  to  scan
       through	a  file	 containing identifiers	and keywords, one per line and
       with no other extraneous	characters, and	recognize all the keywords.  A
       natural first approach is:

	   asm	    |
	   auto	    |
	   break    |
	   ... etc ...
	   volatile |
	   while    /* it's a keyword */

	   .|\n	    /* it's not	a keyword */

       To eliminate the	back-tracking, introduce a catch-all rule:

	   asm	    |
	   auto	    |
	   break    |
	   ... etc ...
	   volatile |
	   while    /* it's a keyword */

	   [a-z]+   |
	   .|\n	    /* it's not	a keyword */

       Now, if it's guaranteed that there's exactly one	word per line, then we
       can reduce the total number of matches by a  half  by  merging  in  the
       recognition of newlines with that of the	other tokens:

	   asm\n    |
	   auto\n   |
	   break\n  |
	   ... etc ...
	   volatile\n |
	   while\n  /* it's a keyword */

	   [a-z]+\n |
	   .|\n	    /* it's not	a keyword */

       One has to be careful here, as we have now reintroduced backing up into
       the scanner.  In particular, while we know that there will never be any
       characters in the input stream other than letters or newlines, reflex
       can't figure this out, and it will plan for possibly needing to back up
       when it has scanned a token like "auto" and then the next character is
       something other than a newline or a letter.  Previously it would then
       just match the "auto" rule and be done, but now it has no "auto" rule,
       only an "auto\n" rule.  To eliminate the possibility of backing up, we
       could either duplicate all rules but without final newlines, or, since
       we never expect to encounter such an input and therefore don't care how
       it's classified, we can introduce one more catch-all rule, this one
       which doesn't include a newline:

	   asm\n    |
	   auto\n   |
	   break\n  |
	   ... etc ...
	   volatile\n |
	   while\n  /* it's a keyword */

	   [a-z]+\n |
	   [a-z]+   |
	   .|\n	    /* it's not	a keyword */

       Compiled	with -Cf, this is about	as fast	as one can get a reflex	 scan-
       ner to go for this particular problem.

       A  final	 note: reflex is slow when matching NUL's, particularly	when a
       token contains multiple NUL's.  It's best to write  rules  which	 match
       short  amounts of text if it's anticipated that the text	will often in-
       clude NUL's.

       Another final note regarding performance: as mentioned above in the
       section How the Input is Matched, dynamically resizing yytext to
       accommodate huge tokens is a slow process because it presently requires
       that the (huge) token be rescanned from the beginning.  Thus if
       performance is vital, you should attempt to match "large" quantities of
       text but not "huge" quantities, where the cutoff between the two is at
       about 8K characters per token.

       reflex provides two different ways to generate scanners for use with
       C++.  The first way is to simply compile a scanner generated by reflex
       using a C++ compiler instead of a C compiler.  You should not encounter
       any compilation errors (please report any you find to the email address
       given in the Author section below).  You can then use C++ code in your
       rule actions instead of C code.  Note that the default input source for
       your scanner remains yyin, and default echoing is still done to yyout.
       Both of these remain FILE * variables and not C++ streams.

       You  can	 also use reflex to generate a C++ scanner class, using	the -+
       option (or, equivalently, %option c++), which is	 automatically	speci-
       fied  if	 the  name of the reflex executable ends in a '+', such	as re-
       flex++.	When using this	option,	 reflex	 defaults  to  generating  the
       scanner to the file instead of	lex.yy.c.  The generated scan-
       ner includes the	header file reFlexLexer.h, which defines the interface
       to two C++ classes.

       The first class, FlexLexer, provides an abstract base class defining
       the general scanner class interface.  It provides the following member
       functions:

       const char* YYText()
	      returns the text of the most recently matched token, the equiva-
	      lent of yytext.

       int YYLeng()
	      returns the length of  the  most	recently  matched  token,  the
	      equivalent of yyleng.

       int lineno() const
	      returns the current input	line number (see %option yylineno), or
	      1	if %option yylineno was	not used.

       void set_debug( int flag	)
	      sets the debugging flag for the scanner, equivalent to assigning
	      to yy_flex_debug (see the	Options	section	above).	 Note that you
	      must build the scanner using %option debug to include  debugging
	      information in it.

       int debug() const
	      returns the current setting of the debugging flag.

       Also provided are member functions equivalent to yy_switch_to_buffer(),
       yy_create_buffer() (though the first argument is an istream* object
       pointer and not a FILE*), yy_flush_buffer(), yy_delete_buffer(), and
       yyrestart() (again, the first argument is an istream* object pointer).

       The second class defined in reFlexLexer.h is yyFlexLexer, which is
       derived from FlexLexer.  It defines the following additional member
       functions:

       yyFlexLexer( istream* arg_yyin =	0, ostream* arg_yyout =	0 )
	      constructs a yyFlexLexer object using the	given streams for  in-
	      put  and	output.	  If not specified, the	streams	default	to cin
	      and cout,	respectively.

       virtual int yylex()
              performs the same role as yylex() does for ordinary reflex
              scanners: it scans the input stream, consuming tokens, until a
	      rule's action returns a value.  If you derive a subclass S  from
	      yyFlexLexer  and	want  to access	the member functions and vari-
	      ables of S inside	yylex(), then you  need	 to  use  %option  yy-
	      class="S"	 to inform reflex that you will	be using that subclass
	      instead of yyFlexLexer.  In this case,  rather  than  generating
	      yyFlexLexer::yylex(), reflex generates S::yylex()	(and also gen-
	      erates a dummy yyFlexLexer::yylex() that calls yyFlexLexer::Lex-
	      erError()	if called).

       virtual void switch_streams(istream* new_in = 0, ostream* new_out = 0)
              reassigns yyin to new_in (if non-nil) and yyout to new_out
              (ditto), deleting the previous input buffer if yyin is
              reassigned.

       int yylex( istream* new_in, ostream* new_out = 0	)
	      first  switches  the  input  streams via switch_streams( new_in,
	      new_out )	and then returns the value of yylex().

       In addition, yyFlexLexer	defines	the following protected	virtual	 func-
       tions which you can redefine in derived classes to tailor the scanner:

       virtual int LexerInput( char* buf, int max_size )
	      reads  up	to max_size characters into buf	and returns the	number
	      of characters read.  To indicate end-of-input, return 0  charac-
	      ters.   Note  that  "interactive"	 scanners  (see	 the -B	and -I
              flags) define the macro YY_INTERACTIVE.  If you redefine
              LexerInput() and need to take different actions depending on
              whether or not the scanner might be scanning an interactive
              input source, you can test for the presence of this name via
              #ifdef.

       virtual void LexerOutput( const char* buf, int size )
	      writes out size characters from the  buffer  buf,	 which,	 while
	      NUL-terminated,  may  also contain "internal" NUL's if the scan-
	      ner's rules can match text with NUL's in them.

       virtual void LexerError(	const char* msg	)
	      reports a	fatal error message.   The  default  version  of  this
	      function writes the message to the stream	cerr and exits.

       Note  that  a  yyFlexLexer  object  contains its	entire scanning	state.
       Thus you	can use	such objects to	create reentrant  scanners.   You  can
       instantiate  multiple  instances	of the same yyFlexLexer	class, and you
       can also	combine	multiple C++ scanner classes together in the same pro-
       gram using the -P option	discussed above.

       Finally,	 note  that the	%array feature is not available	to C++ scanner
       classes;	you must use %pointer (the default).

       Here is an example of a simple C++ scanner:

               // An example of using the reflex C++ scanner class.

           %{
           int mylineno = 0;
           %}

           string  \"[^\n"]+\"

           ws      [ \t]+

           alpha   [A-Za-z]
           dig     [0-9]
           name    ({alpha}|{dig}|\$)({alpha}|{dig}|[_.\-/$])*
           num1    [-+]?{dig}+\.?([eE][-+]?{dig}+)?
           num2    [-+]?{dig}*\.{dig}+([eE][-+]?{dig}+)?
           number  {num1}|{num2}

           %%

           {ws}    /* skip blanks and tabs */

           "/*"    {
                   int c;

                   while((c = yyinput()) != 0)
                       {
                       if(c == '\n')
                           ++mylineno;

                       else if(c == '*')
                           {
                           if((c = yyinput()) == '/')
                               break;
                           else
                               unput(c);
                           }
                       }
                   }

           {number}  cout << "number " << YYText() << '\n';

           \n        mylineno++;

           {name}    cout << "name " << YYText() << '\n';

           {string}  cout << "string " << YYText() << '\n';

           %%

           int main( int /* argc */, char** /* argv */ )
               {
               FlexLexer* lexer = new yyFlexLexer;
               while(lexer->yylex() != 0)
                   ;
               return 0;
               }

       If you want to create multiple (different) lexer	classes, you  use  the
       -P  flag	 (or  the  prefix=  option) to rename each yyFlexLexer to some
       other xxFlexLexer.  You then can	include	<reFlexLexer.h>	in your	 other
       sources once per	lexer class, first renaming yyFlexLexer	as follows:

	   #undef yyFlexLexer
	   #define yyFlexLexer xxFlexLexer
	   #include <reFlexLexer.h>

	   #undef yyFlexLexer
	   #define yyFlexLexer zzFlexLexer
	   #include <reFlexLexer.h>

       if,  for	example, you used %option prefix="xx" for one of your scanners
       and %option prefix="zz" for the other.

       IMPORTANT: the present form of the scanning class is  experimental  and
       may change considerably between major releases.

       reflex  is a rewrite of the AT&T	Unix lex tool (the two implementations
       do not share any	code, though), with some extensions and	incompatibili-
       ties,  both of which are	of concern to those who	wish to	write scanners
       acceptable to either implementation.  Reflex is	fully  compliant  with
       the  POSIX  lex specification, except that when using %pointer (the de-
       fault), a call to unput() destroys the contents	of  yytext,  which  is
       counter to the POSIX specification.

       In  this	 section  we discuss all of the	known areas of incompatibility
       between reflex, AT&T lex, and the POSIX specification.

       reflex's	-l option turns	on maximum  compatibility  with	 the  original
       AT&T  lex  implementation, at the cost of a major loss in the generated
       scanner's performance.  We note below which  incompatibilities  can  be
       overcome	using the -l option.

       reflex is fully compatible with lex with	the following exceptions:

       o   The undocumented lex	scanner	internal variable yylineno is not sup-
	   ported unless -l or %option yylineno	is used.

	   yylineno should be maintained on a per-buffer basis,	rather than  a
	   per-scanner (single global variable)	basis.

	   yylineno is not part	of the POSIX specification.

       o   The	input()	routine	is not redefinable, though it may be called to
	   read	characters following whatever has been matched by a rule.   If
	   input() encounters an end-of-file the normal	yywrap() processing is
	   done.  A "real" end-of-file is returned by input() as EOF.

	   Input is instead controlled by defining the YY_INPUT	macro.

	   The reflex restriction that input() cannot be redefined is  in  ac-
	   cordance  with the POSIX specification, which simply	does not spec-
	   ify any way of controlling the scanner's input other	than by	making
	   an initial assignment to yyin.

       o   The unput() routine is not redefinable.  This restriction is	in ac-
	   cordance with POSIX.

       o   reflex scanners are not as reentrant	as lex scanners.  In  particu-
	   lar,	 if  you  have an interactive scanner and an interrupt handler
	   which long-jumps out	of the scanner,	 and  the  scanner  is	subse-
	   quently called again, you may get the following message:

	       fatal reflex scanner internal error--end	of buffer missed

	   To reenter the scanner, first use

	       yyrestart( yyin );

	   Note	 that  this  call  will	throw away any buffered	input; usually
	   this	isn't a	problem	with an	interactive scanner.

	   Also	note that reflex C++ scanner classes are reentrant, so if  us-
	   ing	C++  is	 an  option for	you, you should	use them instead.  See
	   "Generating C++ Scanners" above for details.

       o   output() is not supported.  Output from the ECHO macro is  done  to
	   the file-pointer yyout (default stdout).

	   output() is not part	of the POSIX specification.

       o   lex	does  not support exclusive start conditions (%x), though they
	   are in the POSIX specification.

       o   When	definitions are	expanded, reflex encloses them in parentheses.
	   With	lex, the following:

	       NAME    [A-Z][A-Z0-9]*
	       foo{NAME}?      printf( "Found it\n" );

	   will	 not match the string "foo" because when the macro is expanded
	   the rule is equivalent to "foo[A-Z][A-Z0-9]*?"  and the  precedence
	   is  such that the '?' is associated with "[A-Z0-9]*".  With reflex,
	   the rule will be expanded  to  "foo([A-Z][A-Z0-9]*)?"  and  so  the
	   string "foo"	will match.

	   Note	that if	the definition begins with ^ or	ends with $ then it is
	   not expanded	with parentheses, to allow these operators  to	appear
	   in definitions without losing their special meanings.  But the <s>,
	   /, and <<EOF>> operators cannot be used in a	reflex definition.

           Using -l results in the lex behavior of no parentheses around the
           definition.

           The POSIX specification is that the definition be enclosed in
           parentheses.

       o   Some	implementations	of lex allow a rule's action  to  begin	 on  a
	   separate line, if the rule's	pattern	has trailing whitespace:

	       foo|bar<space here>
		 { foobar_action(); }

	   reflex does not support this	feature.

       o   The lex %r (generate	a Ratfor scanner) option is not	supported.  It
	   is not part of the POSIX specification.

       o   After a call	to unput(), yytext is undefined	until the  next	 token
	   is matched, unless the scanner was built using %array.  This	is not
	   the case with lex or	the POSIX specification.  The -l  option  does
	   away	with this incompatibility.

       o   The	precedence  of	the  {}	(numeric range)	operator is different.
	   lex interprets "abc{1,3}" as	"match one, two, or three  occurrences
	   of  'abc'", whereas reflex interprets it as "match 'ab' followed by
	   one,	two, or	three occurrences of 'c'".  The	latter is in agreement
	   with	the POSIX specification.

       o   The	precedence  of	the  ^	operator is different.	lex interprets
	   "^foo|bar" as "match	either 'foo' at	the beginning of  a  line,  or
	   'bar'  anywhere",  whereas  reflex  interprets  it as "match	either
	   'foo' or 'bar' if they come at the beginning	of a line".  The  lat-
	   ter is in agreement with the	POSIX specification.

       o   The special table-size declarations such as %a supported by lex are
	   not required	by reflex scanners; reflex ignores them.

       o   The name FLEX_SCANNER is #define'd so scanners may be  written  for
	   use	with  either reflex or lex.  Scanners also include YY_FLEX_MA-
	   JOR_VERSION and YY_FLEX_MINOR_VERSION indicating which  version  of
	   reflex  generated  the  scanner  (for example, for the 2.5 release,
	   these defines would be 2 and	5 respectively).
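
       The binding of the {} operator noted in the list above can be checked
       outside of reflex with the C library's POSIX extended regular
       expressions.  This is only an analogy: regcomp() and regexec() are
       the standard <regex.h> interface, not reflex's own matcher, and the
       helper name ere_matches is hypothetical.  The sketch shows that in a
       POSIX extended regular expression "abc{1,3}" repeats only the 'c',
       while explicit parentheses are needed to repeat "abc" as a whole.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Returns 1 when the POSIX extended regular expression matches text, 0 when
 * it does not, and -1 when the pattern fails to compile.  Callers anchor the
 * pattern with ^ and $ to require that the whole string match.
 */
int ere_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && text != NULL);

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

       Here "^abc{1,3}$" accepts "abccc" but rejects "abcabc", whereas
       "^(abc){1,3}$" accepts "abcabc".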

       The following reflex features are not included in lex or the POSIX
       specification:

	   C++ scanners
	   start condition scopes
	   start condition stacks
	   interactive/non-interactive scanners
	   yy_scan_string() and	friends
	   #line directives
	   %{}'s around	actions
	   multiple actions on a line

       plus  almost  all  of  the  reflex flags.  The last feature in the list
       refers to the fact that with reflex you can put multiple	actions	on the
       same line, separated with semi-colons, while with lex, the following

	   foo	  handle_foo();	++num_foos_seen;

       is (rather surprisingly)	truncated to

	   foo	  handle_foo();

       reflex  does not	truncate the action.  Actions that are not enclosed in
       braces are simply terminated at the end of the line.

       warning,	rule cannot be matched indicates that the given	rule cannot be
       matched	because	it follows other rules that will always	match the same
       text as it.  For	example, in the	following "foo"	cannot be matched  be-
       cause it	comes after an identifier "catch-all" rule:

	   [a-z]+    got_identifier();
	   foo	     got_foo();

       Using REJECT in a scanner suppresses this warning.

       warning,	 -s option given but default rule can be matched means that it
       is possible (perhaps only in a particular start condition) that the de-
       fault rule (match any single character) is the only one that will match
       a particular input.  Since -s was given, presumably this is not
       intended.

       reject_used_but_not_detected  undefined or yymore_used_but_not_detected
       undefined - These errors	can occur at compile time.  They indicate that
       the  scanner  uses  REJECT or yymore() but that reflex failed to	notice
       the fact, meaning that reflex scanned the first	two  sections  looking
       for  occurrences	 of  these actions and failed to find any, but somehow
       you snuck some in (via a	#include file, for example).  Use %option  re-
       ject  or	 %option  yymore  to indicate to reflex	that you really	do use
       these features.

       reflex scanner jammed - a scanner compiled with -s has  encountered  an
       input  string which wasn't matched by any of its	rules.	This error can
       also occur due to internal problems.

       token too large,	exceeds	YYLMAX - your scanner uses %array and  one  of
       its rules matched a string longer than the YYLMAX constant (8K bytes by
       default).  You can increase the value by	#define'ing YYLMAX in the def-
       initions	section	of your	reflex input.

       scanner requires	-8 flag	to use the character 'x' - Your	scanner	speci-
       fication	includes recognizing the 8-bit character 'x' and you  did  not
       specify	the  -8	 flag, and your	scanner	defaulted to 7-bit because you
       used the	-Cf or -CF table compression options.  See the	discussion  of
       the -7 flag for details.

       reflex  scanner	push-back  overflow - you used unput() to push back so
       much text that the scanner's buffer could not hold both the pushed-back
       text  and  the current token in yytext.	Ideally	the scanner should dy-
       namically resize	the buffer in this case, but at	present	it does	not.

       input buffer overflow, can't enlarge buffer because scanner uses	REJECT
       -  the  scanner	was  working  on matching an extremely large token and
       needed to expand	the input buffer.  This	 doesn't  work	with  scanners
       that use	REJECT.

       fatal  reflex  scanner  internal	error--end of buffer missed - This can
       occur in a scanner which is reentered after a long-jump has jumped out
       (or  over) the scanner's	activation frame.  Before reentering the scan-
       ner, use:

	   yyrestart( yyin );

       or, as noted above, switch to using the C++ scanner class.
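       A minimal sketch of the yyrestart() recovery pattern follows.  So that
       it can be compiled on its own, yyin, yyrestart() and yylex() are
       replaced here by small stand-ins; in a real program these come from
       the generated scanner.  The names bail_out and
       scan_once_with_recovery() are hypothetical and used only for
       illustration.

```c
#include <setjmp.h>
#include <stdio.h>

/* Stand-ins (assumptions for this sketch) for symbols that a generated
 * scanner normally supplies; a real program links against the scanner. */
static FILE *yyin;
static jmp_buf bail_out;

static void yyrestart(FILE *input_file)
{
    yyin = input_file;      /* the real routine also resets buffer state */
}

static int yylex(void)
{
    /* Pretend an action hit an error and jumped out of the scanner. */
    longjmp(bail_out, 1);
    return 0;               /* not reached */
}

/* Calls the scanner once.  If the scanner is left via longjmp(), its
 * state is reset with yyrestart() before anyone calls yylex() again.
 * Returns 1 if a recovery took place, 0 otherwise. */
int scan_once_with_recovery(FILE *in)
{
    yyin = in;
    if (setjmp(bail_out) != 0) {
        yyrestart(yyin);    /* required before reentering the scanner */
        return 1;
    }
    (void)yylex();
    return 0;
}
```

       The point of the pattern is only the ordering: the call to
       yyrestart( yyin ) happens after the long-jump lands and before the
       next call to yylex().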

       too many start conditions in <> construct! - you listed more start
       conditions in a <> construct than exist (so you must have listed at
       least one of them twice).

       -lrefl library with which scanners must be linked.

	      generated	scanner	(called	lexyy.c	on some	systems).
	      generated	C++ scanner class, when	using -+.

	      header file defining the C++ scanner base	class, FlexLexer,  and
	      its derived class, yyFlexLexer.

	      skeleton	scanner.  This file is only used when building reflex,
	      not when reflex executes.

	      backing-up information for -b flag (called lex.bck on some
	      systems).

       Some  trailing context patterns cannot be properly matched and generate
       warning messages	("dangerous trailing context").	  These	 are  patterns
       where the ending	of the first part of the rule matches the beginning of
       the second part,	such as	"zx*/xy*", where the 'x*' matches the  'x'  at
       the  beginning  of  the	trailing  context.  (Note that the POSIX draft
       states that the text matched by such patterns is	undefined.)

       For some	trailing context rules,	parts which are	actually  fixed-length
       are  not	 recognized as such, leading to	the abovementioned performance
       loss.  In particular, parts using '|' or	{n} (such as "foo{3}") are al-
       ways considered variable-length.

       Combining  trailing  context  with the special '|' action can result in
       fixed trailing context being turned into	the  more  expensive  variable
       trailing	context.  For example, in the following:

	   abc	    |

       Use  of unput() invalidates yytext and yyleng, unless the %array	direc-
       tive or the -l option has been used.

       Pattern-matching of NUL's is substantially slower than matching other
       characters.

       Dynamic	resizing of the	input buffer is	slow, as it entails rescanning
       all the text matched so far by the current (generally huge) token.

       Due to both buffering of	input  and  read-ahead,	 you  cannot  intermix
       calls  to <stdio.h> routines, such as, for example, getchar(), with re-
       flex rules and expect it	to work.  Call input() instead.

       The total table entries listed by the -v	flag excludes  the  number  of
       table entries needed to determine what rule has been matched.  The num-
       ber of entries is equal to the number of	DFA states if the scanner does
       not use REJECT, and somewhat greater than the number of states if it
       does.

       REJECT cannot be	used with the -f or -F options.

       The reflex internal algorithms need documentation.

       lex(1), yacc(1),	sed(1),	awk(1).

       John Levine, Tony Mason, and Doug Brown, Lex & Yacc, O'Reilly and
       Associates.  Be sure to get the 2nd edition.

       M. E. Lesk and E. Schmidt, LEX -	Lexical	Analyzer Generator

       Alfred Aho, Ravi	Sethi and Jeffrey Ullman, Compilers: Principles, Tech-
       niques and Tools, Addison-Wesley	(1986).	 Describes the	pattern-match-
       ing techniques used by reflex (deterministic finite automata).

       Vern  Paxson, with the help of many ideas and much inspiration from Van
       Jacobson.  Original version by Jef Poskanzer.  The fast table represen-
       tation  is  a  partial implementation of	a design done by Van Jacobson.
       The implementation was done by Kevin Gong and Vern Paxson.

       Thanks to the many reflex beta-testers, feedbackers, and	 contributors,
       especially Francois Pinard, Casey Leedom, Robert Abramovitz, Stan
       Adermann, Terry Allen, David Barker-Plummer, John Basrai, Neal Becker,
       Nelson H.F. Beebe, Karl Berry, Peter A. Bigot, Simon Blanchard, Keith
       Bostic, Frederic Brehm, Ian Brockbank, Kin Cho, Nick
       Christopher,  Brian  Clapper,  J.T.  Conklin, Jason Coughlin, Bill Cox,
       Nick Cropper, Dave Curtis, Scott	David  Daniels,	 Chris	G.  Demetriou,
       Theo  Deraadt,  Mike  Donahue,  Chuck Doucette, Tom Epperly, Leo	Eskin,
       Chris Faylor, Chris Flatters, Jon Forrest, Jeffrey Friedl,  Joe	Gayda,
       Kaveh R.	Ghazi, Wolfgang	Glunz, Eric Goldman, Christopher M. Gould, Ul-
       rich Grepel, Peer Griebel, Jan Hajic,  Charles  Hemphill,  NORO	Hideo,
       Jarkko  Hietaniemi, Scott Hofmann, Jeff Honig, Dana Hudes, Eric Hughes,
       John Interrante,	Ceriel Jacobs, Michal  Jaegermann,  Sakari  Jalovaara,
       Jeffrey R. Jones, Henry Juengst, Klaus Kaempf, Jonathan I. Kamens,
       Terrence O Kane, Amir Katz, Kevin B. Kenny, Steve
       Kirsch,	Winfried  Koenig, Marq Kole, Ronald Lamprecht, Greg Lee, Rohan
       Lenard, Craig Leres, John Levine, Steve Liddle,	David  Loffredo,  Mike
       Long,  Mohamed  el  Lozy,  Brian	 Madsen,  Malte,  Joe  Marshall, Bengt
       Martensson, Chris Metcalf, Luke Mewburn,	 Jim  Meyering,	 R.  Alexander
       Milowski,  Erik	Naggum,	 G.T.  Nicol,  Landon Noll, James Nordby, Marc
       Nozell, Richard Ohnemus,	Karsten	Pahnke,	Sven Panne, Roland Pesch, Wal-
       ter  Pelissero, Gaumond Pierre, Esmond Pitt, Jef	Poskanzer, Joe Rahmeh,
       Jarmo Raiha, Frederic Raimbault,	Pat  Rankin,  Rick  Richardson,	 Kevin
       Rodgers,	Kai Uwe	Rommel,	Jim Roskind, Alberto Santini, Andreas Scherer,
       Darrell Schiebel, Raf Schietekat, Doug Schmidt,	Philippe  Schnoebelen,
       Andreas	Schwab,	Larry Schwimmer, Alex Siegel, Eckehard Stolz, Jan-Erik
       Strvmquist, Mike	Stump, Paul Stuart, Dave Tallman,  Ian	Lance  Taylor,
       Chris Thewalt, Richard M. Timoney, Jodi Tsai, Paul Tuinenga, Gary Weik,
       Frank Whaley, Gerhard Wilhelms, Kent Williams,  Ken  Yap,  Ron  Zellar,
       Nathan  Zelle,  David  Zuhn, and	those whose names have slipped my mar-
       ginal mail-archiving skills but whose contributions are appreciated all
       the same.

       Thanks to Keith Bostic, Jon Forrest, Noah Friedman, John	Gilmore, Craig
       Leres, John Levine, Bob Mulcahy,	G.T.   Nicol,  Francois	 Pinard,  Rich
       Salz,   and   Richard  Stallman	for  help  with	 various  distribution

       Thanks to Esmond	Pitt and Earle Horton for 8-bit	character support;  to
       Benson  Margulies  and Fred Burke for C++ support; to Kent Williams and
       Tom Epperly for C++ class support; to Ove Ewerlid for support of	NUL's;
       and to Eric Hughes for support of multiple buffers.

       This  work  was	primarily  done	 when I	was with the Real Time Systems
       Group at	the Lawrence Berkeley Laboratory in Berkeley, CA.  Many	thanks
       to all there for	the support I received.

       Send comments to

Version	2.5			  April	1995			     REFLEX(1)

