regexp(5)		      File Formats Manual		     regexp(5)

NAME
       regexp -	regular	expression and pattern matching	notation definitions

DESCRIPTION
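       A regular expression is a mechanism supported by many utilities for
       locating and manipulating patterns in text.  Pattern matching notation
       is used by shells and other utilities for file name expansion.  This
       manual entry defines two forms of regular expressions, Basic Regular
       Expressions and Extended Regular Expressions, and the one form of
       Pattern Matching Notation.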

BASIC REGULAR EXPRESSIONS
       Basic regular expression	(RE) notation and construction rules apply  to
       utilities  defined as using basic REs.  Any exceptions to the following
       rules are noted in the descriptions of the specific utilities that  use
       REs.

   REs Matching	a Single Character
       The following REs match a single character or a single collating
       element:

       Ordinary Characters
              An ordinary character is an RE that matches itself.  An
              ordinary character is any character in the supported character
              set except newline and the regular expression special
              characters listed in Special Characters below.  An ordinary
              character preceded by a backslash (\) is treated as the
              ordinary character itself, except when the character is (, ),
              {, or }, or one of the digits 1 through 9 (see REs Matching
              Multiple Characters).  Matching is based on the bit pattern
              used for encoding the character, not on the graphic
              representation of the character.

       Special Characters
              A regular expression special character preceded by a backslash
              is a regular expression that matches the special character
              itself.  When not preceded by a backslash, such characters have
              special meaning in the specification of REs.  Regular
              expression special characters and the contexts in which they
              have special meaning are:

              . [ \          The period, left square bracket, and backslash
                             are special except when used in a bracket
                             expression (see RE Bracket Expression).

              *              The asterisk is special except when used in a
                             bracket expression, as the first character of a
                             regular expression, or as the first character
                             following the character pair \( (see REs
                             Matching Multiple Characters).

              ^              The circumflex is special when used as the first
                             character of an entire RE (see Expression
                             Anchoring) or as the first character of a
                             bracket expression.

              $              The dollar sign is special when used as the last
                             character of an entire RE (see Expression
                             Anchoring).

	      delimiter	     Any character used	to bound  (i.e.,  delimit)  an
			     entire RE is special for that RE.

       A period (.), when used outside of a bracket expression, is an RE that
       matches any printable or nonprintable character except newline.

   RE Bracket Expression
       A bracket expression enclosed in	square brackets	is an RE that  matches
       a  single  collating element contained in the nonempty set of collating
       elements	represented by the bracket expression.

       The following rules apply to bracket expressions:

            A bracket expression is either a matching list expression or a
                           non-matching list expression, and consists of one
                           or more expressions in any order.  Expressions
                           can be: collating elements, collating symbols,
                           noncollating characters, equivalence classes,
                           range expressions, or character classes.  The
                           right bracket (]) loses its special meaning and
                           represents itself in a bracket expression if it
                           occurs first in the list (after an initial ^, if
                           any).  Otherwise, it terminates the bracket
                           expression (unless it is the ending right bracket
                           for a valid collating symbol, equivalence class,
                           or character class, or it is the collating
                           element within a collating symbol or equivalence
                           class expression).  The special characters

                                . * [ \

                           (period, asterisk, left bracket, and backslash)
                           lose their special meaning within a bracket
                           expression.

                           The character sequences:

                                [. [= [:

                           (left bracket followed by a period, equal sign,
                           or colon) are special inside a bracket expression
                           and are used to delimit collating symbols,
                           equivalence class expressions, and character
                           class expressions.  These symbols must be
                           followed by a valid expression and the matching
                           terminating .], =], or :].

            A matching list expression specifies a list that matches any one
                           of the characters represented in the list.  The
                           first character in the list cannot be the
                           circumflex.  For example, [abc] is an RE that
                           matches any of a, b, or c.

            A non-matching list expression begins with a circumflex (^) and
                           specifies a list that matches any character or
                           collating element except newline and the
                           characters represented in the list.  For example,
                           [^abc] is an RE that matches any character except
                           newline, a, b, or c.  The circumflex has this
                           special meaning only when it occurs first in the
                           list, immediately following the left square
                           bracket.

            A collating element is a sequence of one or more characters that
                           represents a single element in the collating
                           sequence as identified via the most current
                           setting of the locale variable LC_COLLATE (see
                           setlocale(3C)).

            A collating symbol is a collating element enclosed within
                           bracket-period ([. .]) delimiters.
                           Multicharacter collating elements must be
                           represented as collating symbols to distinguish
                           them from single-character collating elements.
                           For example, if the string ch is a valid
                           collating element, then [[.ch.]] is treated as an
                           element matching the same string of characters,
                           while [ch] is treated as a simple list of the
                           characters c and h.  If the string within the
                           bracket-period delimiters is not a valid
                           collating element in the current collating
                           sequence definition, the symbol is treated as an
                           invalid expression.

            A noncollating character is a character that is ignored for
                           collating purposes.  By definition, such
                           characters cannot participate in equivalence
                           classes or range expressions.

            An equivalence class expression represents the set of collating
                           elements belonging to an equivalence class.  It
                           is expressed by enclosing any one of the
                           collating elements in the equivalence class
                           within bracket-equal ([= =]) delimiters.  For
                           example, if a and A belong to the same
                           equivalence class, then [[=a=]b] and [[=A=]b] are
                           each equivalent to [aAb].

            A range expression represents the set of collating elements that
                           fall between two elements in the current
                           collation sequence as defined via the most
                           current setting of the locale variable LC_COLLATE
                           (see setlocale(3C)).  It is expressed as the
                           starting point and the ending point separated by
                           a hyphen (-).

                           The starting range point and the ending range
                           point must be a collating element, collating
                           symbol, or equivalence class expression.  An
                           equivalence class expression used as an end point
                           of a range expression is interpreted such that
                           all collating elements within the equivalence
                           class are included in the range.  For example, if
                           the collating order is A, a, B, b, C, c, ch, D,
                           and d, and the characters A and a belong to the
                           same equivalence class, then the expression
                           [[=a=]-D] is treated as [AaBbCc[.ch.]D].

                           Both starting and ending range points must be
                           valid collating elements, collating symbols, or
                           equivalence class expressions, and the ending
                           range point must collate equal to or higher than
                           the starting range point; otherwise the
                           expression is invalid.  For example, with the
                           above collating order and assuming that E is a
                           noncollating character, both the expressions
                           [[=A=]-E] and [d-a] are invalid.

                           An ending range point can also be the starting
                           range point in a subsequent range expression.
                           Each such range expression is evaluated
                           separately.  For example, the bracket expression
                           [a-m-o] is treated as [a-mm-o].

                           The hyphen character is treated as itself if it
                           occurs first (after an initial ^, if any) or last
                           in the list, or as the rightmost symbol in a
                           range expression.  As examples, the expressions
                           [-ac] and [ac-] are equivalent and match any of
                           the characters a, c, or -; the expressions [^-ac]
                           and [^ac-] are equivalent and match any
                           characters except newline, a, c, or -; the
                           expression [%--] matches any of the characters in
                           the defined collating sequence between % and -
                           inclusive; the expression [--@] matches any of
                           the characters in the defined collating sequence
                           between - and @ inclusive; and the expression
                           [a--@] is invalid, assuming - precedes a in the
                           collating sequence.

                           If a bracket expression must specify both - and
                           ], the ] must be placed first (after the ^, if
                           any) and the - last within the bracket
                           expression.

            A character class expression represents the set of characters
                           belonging to a character class, as defined via
                           the most current setting of the locale variable
                           LC_CTYPE.  It is expressed as a character class
                           name enclosed within bracket-colon ([: :])
                           delimiters.

			   Standard character class expressions	 supported  in
			   all locales are:

                                [:alpha:]   letters

                                [:upper:]   upper-case letters

                                [:lower:]   lower-case letters

                                [:digit:]   decimal digits

                                [:xdigit:]  hexadecimal digits

                                [:alnum:]   letters or decimal digits

                                [:space:]   characters producing white-space
                                            in displayed text

                                [:print:]   printing characters

                                [:punct:]   punctuation characters

                                [:graph:]   characters with a visible
                                            representation

                                [:cntrl:]   control characters

                                [:blank:]   blank characters

                           For example, if the locale variable LC_CTYPE is
                           set to C, the expression [[:alpha:]] is
                           equivalent to [A-Za-z].  Similarly, the
                           expression [[:upper:]] is the same as [A-Z].
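
       The class names above can be made concrete with a short sketch.
       Python's re module does not accept bracket-colon class names, so the
       hypothetical helper expand_posix_classes below rewrites them into
       explicit lists.  The table assumes the C locale with ASCII encoding
       and covers only a few classes; neither the helper nor the table is
       part of any library.

```python
import re

# Assumed contents of a few classes in the C locale (ASCII).
# Other locales define these classes differently.
POSIX_CLASSES = {
    "alpha": "A-Za-z",
    "upper": "A-Z",
    "lower": "a-z",
    "digit": "0-9",
    "xdigit": "0-9A-Fa-f",
    "alnum": "0-9A-Za-z",
}


def expand_posix_classes(bracket_expr):
    """Replace each [:name:] inside a bracket expression with a list."""
    return re.sub(r"\[:([a-z]+):\]",
                  lambda m: POSIX_CLASSES[m.group(1)],
                  bracket_expr)


pattern = expand_posix_classes("[[:upper:][:digit:]]")
print(pattern)                              # [A-Z0-9]
print(bool(re.fullmatch(pattern, "Q")))     # True
print(bool(re.fullmatch(pattern, "q")))     # False
```

       Collating symbols and equivalence classes depend on the locale's
       collation tables and are not handled by this sketch.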

   REs Matching	Multiple Characters
       The following rules may be used	to  construct  REs  matching  multiple
       characters from REs matching a single character:

            RERE           The concatenation of REs is an RE that matches
                           the first encountered concatenation of the
                           strings matched by each component of the RE.  For
                           example, the RE bc matches the second and third
                           characters of the string abcdefabcdef.

            An RE matching a single character followed by an asterisk (*)
                           is an RE that matches zero or more occurrences of
                           the RE preceding the asterisk.  The first
                           encountered string that permits a match is
                           chosen, and the matched string will encompass the
                           maximum number of characters permitted by the RE.
                           For example, in the string abbbcdeabbbbbbcde,
                           both the RE b*c and the RE bbb*c are matched by
                           the substring bbbc in the second through fifth
                           positions.  An asterisk as the first character of
                           an RE loses this special meaning and is treated
                           as itself.

            A subexpression can be defined within an RE by enclosing it
                           between the character pairs \( and \).  Such a
                           subexpression matches whatever it would have
                           matched without the \( and \).  Subexpressions
                           can be arbitrarily nested.  An asterisk
                           immediately following the \( loses its special
                           meaning and is treated as itself.  An asterisk
                           immediately following the \) is treated as an
                           invalid character.

            The expression \n matches the same string of characters as was
                           matched by a subexpression enclosed between \(
                           and \) preceding the \n.  The character n must be
                           a digit from 1 through 9, specifying the n-th
                           subexpression (the one that begins with the n-th
                           \( and ends with the corresponding paired \)).
                           For example, the expression ^\(.*\)\1$ matches a
                           line consisting of two adjacent appearances of
                           the same string.

                           If the \n is followed by an asterisk, it matches
                           zero or more occurrences of the subexpression
                           referred to.  For example, the expression
                           \(ab\(cd\)ef\)Z\2*Z\1 matches the string
                           abcdefZcdcdZabcdef.

            An RE matching a single character followed by \{m\}, \{m,\}, or
                           \{m,n\} is an RE that matches repeated
                           occurrences of the RE.  The values of m and n
                           must be decimal integers in the range 0 through
                           255, with m specifying the exact or minimum
                           number of occurrences and n specifying the
                           maximum number of occurrences.  \{m\} matches
                           exactly m occurrences of the preceding RE,
                           \{m,\} matches at least m occurrences, and
                           \{m,n\} matches any number of occurrences between
                           m and n, inclusive.

                           The first encountered string that matches the
                           expression is chosen; it will contain as many
                           occurrences of the RE as possible.  For example,
                           in the string abbbbbbbc the RE b\{3\} is matched
                           by characters two through four, the RE b\{3,\} is
                           matched by characters two through eight, and the
                           RE b\{3,5\}c is matched by characters four
                           through nine.
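
       The subexpression, back-reference, and interval rules above can be
       tried out with a rough sketch.  It assumes Python's re module as a
       stand-in matcher and uses two hypothetical helpers, bre_to_python and
       span_1based, that are not part of any library.  The translation
       covers only a small subset of BRE syntax (it ignores bracket
       expressions and the literal leading asterisk), and Python's
       backtracking matcher is not a POSIX matcher, although the results
       coincide for the examples shown.

```python
import re


def bre_to_python(bre):
    """Translate a small subset of BRE syntax to Python re syntax."""
    out = []
    i = 0
    while i < len(bre):
        ch = bre[i]
        if ch == "\\" and i + 1 < len(bre):
            nxt = bre[i + 1]
            if nxt in "(){}":
                out.append(nxt)         # \( \) \{ \} are operators in a BRE
            else:
                out.append(ch + nxt)    # \1..\9 and escaped specials pass through
            i += 2
        elif ch in "(){}+?|":
            out.append("\\" + ch)       # ordinary in a BRE, special in Python
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def span_1based(bre, text):
    """Return (first, last) character positions of the first match, 1-based."""
    m = re.search(bre_to_python(bre), text)
    return (m.start() + 1, m.end())


print(span_1based(r"b*c", "abbbcdeabbbbbbcde"))     # (2, 5)
print(span_1based(r"b\{3\}", "abbbbbbbc"))          # (2, 4)
print(span_1based(r"b\{3,\}", "abbbbbbbc"))         # (2, 8)
print(span_1based(r"b\{3,5\}c", "abbbbbbbc"))       # (4, 9)

# A line made of two adjacent copies of the same string.
doubled = bre_to_python(r"^\(.*\)\1$")
print(bool(re.search(doubled, "abcabc")))           # True
print(bool(re.search(doubled, "abcabd")))           # False
```

       The positions are reported as 1-based, inclusive character numbers so
       that they can be compared directly with the wording used in this
       section.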

   Expression Anchoring
       An RE can be limited to matching	strings	 that  begin  or  end  a  line
       (i.e., anchored)	according to the following rules:

            o  A circumflex (^) as the first character of an RE anchors the
               expression to the beginning of a line; only strings starting
               at the first character of a line are matched by the RE.  For
               example, the RE ^ab matches the string ab in the line abcdef
               but not the same string in the line cdefab.

            o  A dollar sign ($) as the last character of an RE anchors the
               expression to the end of a line; only strings ending at the
               last character of a line are matched by the RE.  For example,
               the RE ab$ matches the string ab in the line cdefab but not
               the same string in the line abcdef.

            o  An RE anchored by both ^ and $ matches only strings that are
               lines.  For example, the RE ^abcdef$ matches only lines
               consisting of the string abcdef.

       The use of duplication characters (+,*) following anchors is illegal.
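
       The anchoring rules can be checked with a small sketch.  It assumes
       Python's re module, whose ^ and $ behave like the anchors described
       here when each line is tested separately and carries no trailing
       newline.  The helper name matching_lines is illustrative only.

```python
import re


def matching_lines(regex, lines):
    """Return the lines in which the regular expression finds a match."""
    return [line for line in lines if re.search(regex, line)]


lines = ["abcdef", "cdefab"]
print(matching_lines(r"^ab", lines))        # ['abcdef']
print(matching_lines(r"ab$", lines))        # ['cdefab']
print(matching_lines(r"^abcdef$", lines))   # ['abcdef']
print(matching_lines(r"ab", lines))         # ['abcdef', 'cdefab']
```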

EXTENDED REGULAR EXPRESSIONS
       The extended regular expression (ERE) notation and  construction	 rules
       apply  to  utilities  defined as	using extended REs.  Any exceptions to
       the following rules are noted in	the descriptions of the	specific util-
       ities using EREs.

   EREs	Matching a Single Character
       The following EREs match a single character or a single collating
       element:

       Ordinary Characters
              An ordinary character is an ERE that matches itself.  An
              ordinary character is any character in the supported character
              set except newline and the regular expression special
              characters listed in Special Characters below.  An ordinary
              character preceded by a backslash is treated as the ordinary
              character itself.  Matching is based on the bit pattern used
              for encoding the character, not on the graphic representation
              of the character.

       Special Characters
              A regular expression special character preceded by a backslash
              is a regular expression that matches the special character
              itself.  When not preceded by a backslash, such characters have
              special meaning in the specification of EREs.  The extended
              regular expression special characters and the contexts in
              which they have their special meaning are:

            . [ \ ( ) * + ? $ |
                             The period, left square bracket, backslash,
                             left parenthesis, right parenthesis, asterisk,
                             plus sign, question mark, dollar sign, and
                             vertical bar are special except when used in a
                             bracket expression (see ERE Bracket
                             Expression).

            ^                The circumflex is special except when used in a
                             bracket expression in a non-leading position.

	    delimiter	     Any  character  used  to bound (i.e., delimit) an
			     entire ERE	is special for that ERE.

       A period (.), when used outside of a bracket expression, is an ERE
       that matches any printable or nonprintable character except newline.

   ERE Bracket Expression
       The syntax and rules for	ERE bracket expressions	are the	same as	for RE
       bracket expressions found above.

   EREs	Matching Multiple Characters
       The following rules may be used to  construct  EREs  matching  multiple
       characters from EREs matching a single character:

            EREERE         A concatenation of EREs matches the first
                           encountered concatenation of the strings matched
                           by each component of the ERE.  Such a
                           concatenation of EREs enclosed in parentheses
                           matches whatever the concatenation without the
                           parentheses matches.  For example, both the ERE
                           bc and the ERE (bc) match the second and third
                           characters of the string abcdefabcdef.  The
                           longest overall string is matched.

            The special character plus (+), when following an ERE matching a
                           single character, or a concatenation of EREs
                           enclosed in parentheses, is an ERE that matches
                           one or more occurrences of the ERE preceding the
                           plus sign.  The string matched will contain as
                           many occurrences as possible.  For example, the
                           ERE b+(bc) matches the fourth through seventh
                           characters in the string acabbbcde.

            The special character asterisk (*), when following an ERE
                           matching a single character, or a concatenation
                           of EREs enclosed in parentheses, is an ERE that
                           matches zero or more occurrences of the ERE
                           preceding the asterisk.  For example, the ERE b*c
                           matches the first character in the string
                           cabbbcde.  If there is any choice, the longest
                           left-most string that permits a match is chosen.
                           For example, the ERE b*cd matches the third
                           through seventh characters in the string
                           cabbbcdebbbbbbcdbc.

            The special character question mark (?), when following an ERE
                           matching a single character, or a concatenation
                           of EREs enclosed in parentheses, is an ERE that
                           matches zero or one occurrence of the ERE
                           preceding the question mark.  The string matched
                           will contain as many occurrences as possible.
                           For example, the ERE b?c matches the second
                           character in the string acabbbcde.

            An interval expression following an ERE functions the same way
                           as in basic regular expression syntax.
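
       The behavior of the plus sign, asterisk, and question mark can be
       illustrated with a short sketch.  It assumes Python's re module,
       whose syntax for these three operators matches ERE syntax.  Python
       uses backtracking rather than the longest left-most rule, but the
       results agree for the simple cases below.  The helper name
       ere_span is hypothetical.

```python
import re


def ere_span(ere, text):
    """Return (first, last) character positions of the first match, 1-based."""
    m = re.search(ere, text)
    return (m.start() + 1, m.end())


print(ere_span(r"b+(bc)", "acabbbcde"))          # (4, 7)
print(ere_span(r"b*c", "cabbbcde"))              # (1, 1)
print(ere_span(r"b*cd", "cabbbcdebbbbbbcdbc"))   # (3, 7)
print(ere_span(r"b?c", "acabbbcde"))             # (2, 2)
```

       In the second and fourth cases the operator matches zero occurrences
       of b, because a match that starts earlier in the string takes
       priority over a longer match that starts later.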

   Alternation
       Two EREs separated by the special character vertical bar (|) match a
       string that is matched by either ERE.  For example, the ERE ((ab)|c)d
       matches the string abd and the string cd.  A vertical bar is subject
       to the following restrictions:

            o  It may not appear first or last in an ERE.

            o  It may not appear immediately following another vertical bar.

            o  It may not appear immediately following a left parenthesis.

            o  It may not appear immediately preceding a right parenthesis.

   Precedence
       The order of precedence is as follows, from high	to low:

            [ ]            square brackets

            * + ?          asterisk, plus sign, question mark

            ^ $            anchoring

                           concatenation

            |              alternation

       For example, the ERE abba|cde is interpreted as "match either abba or
       cde."  It does not mean "match abb followed by a or c, followed in
       turn by de" (because concatenation has a higher order of precedence
       than alternation).
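
       The precedence of concatenation over alternation can be seen in a
       brief sketch, assuming Python's re module as a stand-in.  Grouping,
       alternation, and precedence behave the same way there for these
       patterns.  One difference worth noting is that Python tries
       alternatives from left to right and keeps the first that succeeds,
       whereas the rules in this manual entry select the longest match.

```python
import re


def whole(ere, text):
    """True if the ERE matches the entire string."""
    return re.fullmatch(ere, text) is not None


print(whole(r"((ab)|c)d", "abd"))       # True
print(whole(r"((ab)|c)d", "cd"))        # True

# Without parentheses, | separates the two complete concatenations.
print(whole(r"abba|cde", "abba"))       # True
print(whole(r"abba|cde", "cde"))        # True
print(whole(r"abba|cde", "abbade"))     # False

# Parentheses are needed to limit the alternation to a and c.
print(whole(r"abb(a|c)de", "abbade"))   # True

# Python keeps the first successful alternative, not the longest one.
print(re.match(r"a|ab", "ab").group())  # a
```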

   Expression Anchoring
       An ERE can be limited to	matching strings that  begin  or  end  a  line
       (i.e., anchored)	according to the following rules:

            o  A circumflex (^) matches the beginning of a line (anchors the
               expression to the beginning of a line).  For example, the ERE
               ^ab matches the string ab in the line abcdef but not the same
               string in the line cdefab.

            o  A dollar sign ($) matches the end of a line (anchors the
               expression to the end of a line).  For example, the ERE ab$
               matches the string ab in the line cdefab but not the same
               string in the line abcdef.

            o  An ERE anchored by both ^ and $ matches only strings that are
               lines.  For example, the ERE ^abcdef$ matches only lines
               consisting of the string abcdef.  Only empty lines match the
               ERE ^$.

       The use of duplication characters (+,*) following anchors is illegal.

PATTERN	MATCHING NOTATION
       The following rules apply to pattern matching notation except as	 noted
       in the descriptions of the specific utilities using pattern matching.

   Patterns Matching a Single Character
       The following patterns match a single character or a single collating
       element:

       Ordinary Characters
              An ordinary character is a pattern that matches itself.  An
              ordinary character is any character in the supported character
              set except newline and the pattern matching special characters
              listed in Special Characters below.  Matching is based on the
              bit pattern used for encoding the character, not on the
              graphic representation of the character.

       Special Characters
              A pattern matching special character preceded by a backslash
              is a pattern that matches the special character itself.  When
              not preceded by a backslash, such characters have special
              meaning in the specification of patterns.  The pattern
              matching special characters and the contexts in which they
              have their special meaning are:

            ? * [          The question mark, asterisk, and left square
                           bracket are special except when used in a bracket
                           expression (see Pattern Bracket Expression).

       A question mark (?), when used outside of a bracket expression, is a
       pattern that matches any printable or nonprintable character except
       newline.

   Pattern Bracket Expression
       The syntax and rules for	pattern	bracket	expressions are	 the  same  as
       for RE bracket expressions found	above with the following exceptions:

              !              The exclamation point character replaces the
                             circumflex character in its role in a
                             non-matching list in the regular expression
                             notation.

              \              The backslash is used as an escape character
                             within bracket expressions.

   Patterns Matching Multiple Characters
       The following rules may be used to construct patterns matching multiple
       characters from patterns	matching a single character:

              *              The asterisk (*) is a pattern that matches any
                             string, including the null string.

              RERE           The concatenation of patterns matching a single
                             character is a valid pattern that matches the
                             concatenation of the single characters or
                             collating elements matched by each of the
                             concatenated patterns.  For example, the
                             pattern a[bc] matches the strings ab and ac.

                             The concatenation of one or more patterns
                             matching a single character with one or more
                             asterisks is a valid pattern.  In such
                             patterns, each asterisk matches a string of
                             zero or more characters, up to the first
                             character that matches the character following
                             the asterisk in the pattern.

                             For example, the pattern a*d matches the
                             strings ad, abd, and abcd, but not the string
                             abc.  When an asterisk is the first or last
                             character in a pattern, it matches zero or more
                             characters that precede or follow the
                             characters matched by the remainder of the
                             pattern.  For example, the pattern a*d* matches
                             the strings ad, abcd, abcdef, aaaad, and adddd;
                             the pattern *a*d matches the strings ad, abcd,
                             efabcd, aaaad, and adddd.
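
       The pattern rules above can be tried with a minimal sketch that
       assumes Python's fnmatch module as a stand-in for the matcher used
       by the shells.  fnmatchcase accepts the ?, *, and bracket notation,
       including a leading ! for a non-matching list, but it applies no
       locale collation and none of the filename expansion rules described
       later.  The helper name matches is illustrative only.

```python
from fnmatch import fnmatchcase


def matches(pattern, names):
    """Return the names that the pattern matches in full."""
    return [name for name in names if fnmatchcase(name, pattern)]


print(matches("a[bc]", ["ab", "ac", "ad"]))           # ['ab', 'ac']
print(matches("a[!bc]", ["ab", "ad"]))                # ['ad']
print(matches("a?d", ["abd", "ad", "abcd"]))          # ['abd']
print(matches("a*d", ["ad", "abd", "abcd", "abc"]))   # ['ad', 'abd', 'abcd']
print(matches("a*d*", ["ad", "abcdef", "bad"]))       # ['ad', 'abcdef']
print(matches("*a*d", ["ad", "efabcd", "abcde"]))     # ['ad', 'efabcd']
```

       Unlike a regular expression, a pattern must account for the whole
       string, which is why a*d does not match abc.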

   Rule	Qualification for Patterns Used	for Filename Expansion
       The rules described above for pattern matching  are  qualified  by  the
       following rules when the	pattern	matching notation is used for filename
       expansion by sh(1), csh(1), ksh(1), and make(1).

	      If a filename (including the component of a pathname that
	      follows the slash character) begins with a period (.), the
	      period must be explicitly matched by using a period as the
	      first character of the pattern; it cannot be matched by the
	      asterisk special character, the question mark special
	      character, or a bracket expression.  This rule does not apply
	      to make(1).

	      The  slash character in a	pathname must be explicitly matched by
	      using a slash in the pattern; it cannot be matched by either the
	      asterisk special character, the question mark special character,
	      or a bracket expression.	For make(1) only the part of the path-
	      name following the last slash character can be matched by	a spe-
	      cial character.  That is,	all special characters	preceding  the
	      last slash character lose	their special meaning.

	      Specified	 patterns  are	matched	against	existing filenames and
	      pathnames, as appropriate.  If the pattern matches any  existing
	      filenames	or pathnames, the pattern is replaced with those file-
	      names and	pathnames, sorted according to the collating  sequence
	      in effect.  If the pattern does not match	any existing filenames
	      or pathnames, the	pattern	string is left unchanged.

	      If the pattern begins with a tilde character, all	of  the	 ordi-
	      nary  characters preceding the first slash (or all characters if
	      there is no slash) are treated as	a possible login name.	If the
	      login name is null (i.e.,	the pattern contains only the tilde or
	      the tilde	is immediately followed	by a slash), the tilde is  re-
	      placed  by  a pathname of	the process's home directory, followed
	      by a slash.  Otherwise, the combination of tilde and login  name
	      are replaced by a	pathname of the	home directory associated with
	      the login	name, followed by a slash.  If the system cannot iden-
	      tify the login name, the result is implementation-defined.  This
	      rule does	not apply to sh(1) or make(1).

	      If the pattern contains a $ character, variable substitution can
	      take place.  Environmental variables can be embedded within
	      patterns as:

		   $name

	      or:

		   ${name}

	      Braces are used to guarantee that	characters following name  are
	      not  interpreted	as  belonging to name.	Substitution occurs in
	      the order	specified only once; that is, the resulting string  is
	      not  examined  again  for	new names that occurred	because	of the
	      substitution.
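
       Several of the filename expansion rules can be observed with a rough
       sketch that assumes Python's glob and os.path modules as stand-ins
       for a shell.  The helper expand and the variable names DEMO_INNER
       and DEMO_OUTER are hypothetical.  Results are sorted by character
       code rather than by the locale's collating sequence, and tilde
       expansion is not shown.

```python
import glob
import os
import tempfile


def expand(pattern, directory):
    """Sorted names matching pattern, or the pattern itself if none match."""
    hits = glob.glob(os.path.join(glob.escape(directory), pattern))
    names = sorted(os.path.relpath(hit, directory) for hit in hits)
    return names or [pattern]


with tempfile.TemporaryDirectory() as d:
    for name in (".profile", "notes.txt", "b.txt"):
        open(os.path.join(d, name), "w").close()
    os.mkdir(os.path.join(d, "sub"))
    open(os.path.join(d, "sub", "c.txt"), "w").close()

    print(expand("*", d))        # ['b.txt', 'notes.txt', 'sub']
    print(expand(".*", d))       # ['.profile']
    print(expand("*.txt", d))    # ['b.txt', 'notes.txt']
    print(expand("*/*.txt", d))  # one entry: c.txt inside sub
    print(expand("*.log", d))    # ['*.log']

# Variable substitution happens once; the result is not examined again.
os.environ["DEMO_INNER"] = "value"
os.environ["DEMO_OUTER"] = "$DEMO_INNER"
print(os.path.expandvars("${DEMO_INNER}s"))  # values
print(os.path.expandvars("$DEMO_OUTER"))     # $DEMO_INNER
```

       The first two calls show that a leading period must be matched
       explicitly, the third and fourth that an asterisk does not match a
       slash, and the fifth that a pattern with no matches is left
       unchanged.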

   Rule	Qualification for Patterns Used	in the case Command
       The rules described above for pattern matching  are  qualified  by  the
       following  rule	when the pattern matching notation is used in the case
       command of sh(1)	and ksh(1).

	      Multiple alternative patterns in a single clause can be
	      specified by separating individual patterns with the vertical
	      bar character (|); strings matching any of the patterns
	      separated this way will cause the corresponding command list to
	      be selected.
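
       The selection rule for alternative patterns can be modeled with a
       small sketch.  It assumes Python's fnmatch module for the pattern
       matching and a hypothetical helper, select_clause, that mimics how a
       case command picks a clause.  Splitting on every vertical bar is a
       simplification: a real shell also honors quoting and escaping.

```python
from fnmatch import fnmatchcase


def select_clause(word, clauses):
    """Return the result of the first clause with a pattern matching word."""
    for patterns, result in clauses:
        if any(fnmatchcase(word, p) for p in patterns.split("|")):
            return result
    return None


clauses = [
    ("*.c|*.h", "C source"),
    ("*.py", "Python source"),
    ("*", "other"),
]

print(select_clause("regexp.h", clauses))   # C source
print(select_clause("demo.py", clauses))    # Python source
print(select_clause("README", clauses))     # other
```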

SEE ALSO
       ksh(1), sh(1), fnmatch(3C), glob(3C), regcomp(3C), setlocale(3C), envi-
       ron(5).

STANDARDS CONFORMANCE
								     regexp(5)
