
FreeBSD Manual Pages


AGREP(l)							      AGREP(l)

NAME
       agrep - search a file for a string or regular expression, with
       approximate matching capabilities

SYNOPSIS
       agrep [ -#cdehiklnpstvwxBDGIS ] pattern [ -f patternfile ] [ filename... ]

DESCRIPTION
       agrep searches the input filenames (standard input is the default, but
       see a warning under LIMITATIONS) for records containing strings which
       either exactly or approximately match a pattern.  A record is by
       default a line, but it can be defined differently using the -d option
       (see below).  Normally, each record found is copied to the standard
       output.  Approximate matching allows finding records that contain the
       pattern with several errors, including substitutions, insertions, and
       deletions.  For example, Massechusets matches Massachusetts with two
       errors (one substitution and one insertion).  Running
       agrep -2 Massechusets foo outputs all lines in foo containing any
       string with at most 2 errors from Massechusets.
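The following minimal Python sketch makes the error count concrete. It uses the textbook dynamic-programming method for approximate substring matching, which finds the fewest insertions, deletions, and substitutions needed to turn the pattern into some substring of a record. It illustrates the error model only. It is not agrep's own algorithm: Wu and Manber describe a faster bit-parallel method. The function name min_errors is hypothetical.

```python
def min_errors(pattern: str, record: str) -> int:
    """Fewest insertions, deletions and substitutions that turn `pattern`
    into some substring of `record` (textbook DP, not agrep's algorithm)."""
    m = len(pattern)
    # col[i]: cheapest way to match pattern[:i] ending at the current
    # position of record; before reading record, pattern[:i] costs i deletions.
    col = list(range(m + 1))
    best = col[m]
    for ch in record:
        diag = col[0]
        col[0] = 0  # a match may start anywhere in the record
        for i in range(1, m + 1):
            prev = col[i]
            col[i] = min(
                diag + (pattern[i - 1] != ch),  # match or substitution
                prev + 1,                       # extra record char (insertion)
                col[i - 1] + 1,                 # missing pattern char (deletion)
            )
            diag = prev
        best = min(best, col[m])
    return best


# Under this model, `agrep -2 Massechusets foo` reports a line when this is <= 2.
print(min_errors("Massechusets", "Massachusetts"))  # 2
```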

       agrep supports many kinds of queries, including arbitrary wild cards,
       sets of patterns, and, in general, regular expressions.  See PATTERNS
       below.  It supports most of the options supported by the grep family
       plus several more (but it is not 100% compatible with grep).  For more
       information on the algorithms used by agrep, see Wu and Manber, "Fast
       Text Searching With Errors," Technical report #91-11, Department of
       Computer Science, University of Arizona, June 1991 (available by
       anonymous ftp from in agrep/), and Wu and Manber, "Agrep -- A Fast
       Approximate Pattern Searching Tool," to appear in USENIX Conference,
       January 1992 (available by anonymous ftp from cs.ari- in agrep/).

       As with the rest of the grep family, the characters `$', `^', `*', `[',
       `]', `|', `(', `)', `!', and `\' can cause unexpected results when
       included in the pattern, as these characters are also meaningful to the
       shell.  To avoid these problems, always enclose the entire pattern
       argument in single quotes, i.e., 'pattern'.  Do not use double quotes
       (").

       When agrep is applied to more than one input file, the name of the file
       is displayed preceding each line which matches the pattern.  The
       filename is not displayed when processing a single file, so if you
       actually want the filename to appear, use /dev/null as a second file in
       the list.

OPTIONS
       -#     # is a non-negative integer (at most 8) specifying the maximum
              number of errors permitted in finding the approximate matches
              (defaults to zero).  Generally, each insertion, deletion, or
              substitution counts as one error.  It is possible to adjust the
              relative cost of insertions, deletions, and substitutions (see
              the -I, -D, and -S options).

       -c     Display only the count of	matching records.

       -d 'delim'
              Define delim to be the separator between two records.  The
              default value is '$', namely a record is by default a line.
              delim can be a string of size at most 8 (with possible use of ^
              and $), but not a regular expression.  Text between two delims,
              before the first delim, and after the last delim is considered
              as one record.  For example, -d '$$' defines paragraphs as
              records and -d '^From ' defines mail messages as records.  agrep
              matches each record separately.  This option does not currently
              work with regular expressions.
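The following minimal Python sketch shows one way to emulate the -d record model. It assumes, as the text above describes, that ^ and $ inside delim stand for the start and end of a line and that every other character is literal. The helper name split_records is hypothetical. It also drops the delimiter from each record, which simplifies agrep's output format rather than reproducing it.

```python
import re


def split_records(text: str, delim: str = "$") -> list:
    """Split `text` into records separated by an agrep-style delim
    (hypothetical helper; assumes '^' = line start, '$' = end of line)."""
    regex = "".join(
        "^" if ch == "^"          # anchor at the start of a line
        else r"\n" if ch == "$"   # end of a line consumes the newline
        else re.escape(ch)        # everything else is literal
        for ch in delim
    )
    # Text before the first delim and after the last one are records too.
    return re.split(regex, text, flags=re.MULTILINE)


print(split_records("line one\nline two", "$"))   # ['line one', 'line two']
print(split_records("p1 a\np1 b\n\np2", "$$"))    # ['p1 a\np1 b', 'p2']
```

With '^From ', the sketch cuts a mailbox at every line that starts with "From ". If the file begins with such a line, the first record is empty.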

       -e pattern
              Same as a simple pattern argument, but useful when the pattern
              begins with a `-'.

       -f patternfile
              patternfile contains a set of (simple) patterns.  The output is
              all lines that match at least one of the patterns in
              patternfile.  Currently, the -f option works only for exact
              match and for simple patterns (any meta symbol is interpreted as
              a regular character); it is compatible only with the -c, -h, -i,
              -l, -s, -v, -w, and -x options.  See LIMITATIONS for size bounds.
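The following short Python sketch illustrates the -f semantics described above. Every pattern is taken literally, and a line is printed if it contains at least one of them. For brevity it builds a single alternation regex with Python's re module. That choice is an assumption of the sketch, not the multi-pattern algorithm agrep itself uses, and the function name is hypothetical.

```python
import re


def lines_matching_any(lines, patterns, ignore_case=False):
    """Return the lines containing at least one literal pattern (like -f)."""
    if not patterns:
        return []  # an empty alternation would match every line
    # re.escape makes every meta symbol an ordinary character.
    regex = re.compile(
        "|".join(re.escape(p) for p in patterns),
        re.IGNORECASE if ignore_case else 0,
    )
    return [line for line in lines if regex.search(line)]


patterns = ["a.b", "wonderful"]  # e.g. the contents of a patternfile
print(lines_matching_any(["axb", "a.b here", "Wonderful day"], patterns))
# ['a.b here']
print(lines_matching_any(["Wonderful day"], patterns, ignore_case=True))
# ['Wonderful day']
```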

       -h     Do not display filenames.

       -i     Case-insensitive search -- e.g., "A" and "a" are considered
              equivalent.

       -k     No symbol in the pattern is treated as a meta character.  For
              example, agrep -k 'a(b|c)*d' foo will find the occurrences of
              a(b|c)*d in foo, whereas agrep 'a(b|c)*d' foo will find
              substrings in foo that match the regular expression 'a(b|c)*d'.

       -l     List only the files that contain a match.  This option is useful
              for looking for files containing a certain pattern.  For
              example, agrep -l 'wonderful' * will list the names of those
              files in the current directory that contain the word
              'wonderful'.

       -n     Each line that is printed is prefixed by its record number in
              the file.

       -p     Find records in the text that contain a supersequence of the
              pattern.  For example, agrep -p DCS foo will match "Department
              of Computer Science."
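-p can be pictured as subsequence matching: the pattern's characters must appear in the record in order, with anything in between. The following Python sketch counts how many pattern characters cannot be placed in order, i.e. the pattern length minus the longest common subsequence. The function name is hypothetical. Treating that count as the number of errors is an assumption, chosen because it agrees with the agrep -5 -p abcdefghij example later in this page.

```python
def missing_in_order(pattern: str, record: str) -> int:
    """How many pattern characters cannot be found, in order, in `record`:
    len(pattern) minus the longest common subsequence of the two."""
    # lcs[i]: LCS length of pattern[:i] and the part of record read so far
    lcs = [0] * (len(pattern) + 1)
    for ch in record:
        diag = 0  # LCS with the empty pattern prefix is always 0
        for i in range(1, len(pattern) + 1):
            prev = lcs[i]
            if pattern[i - 1] == ch:
                lcs[i] = diag + 1
            else:
                lcs[i] = max(prev, lcs[i - 1])
            diag = prev
    return len(pattern) - lcs[-1]


print(missing_in_order("DCS", "Department of Computer Science."))  # 0
print(missing_in_order("abcdefghij", "academia"))                  # 5
```

A result of 0 corresponds to an exact -p match.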

       -s     Work silently, that is, display nothing except error messages.
              This is useful for checking the error status.

       -t     Output the record starting from the end of delim to (and
              including) the next delim.  This is useful for cases where delim
              should come at the end of the record.

       -v     Inverse mode -- display only those records that do not contain
              the pattern.

       -w     Search for the pattern as a word -- i.e., surrounded by
              non-alphanumeric characters.  The non-alphanumeric characters
              must surround the match; they cannot be counted as errors.  For
              example, agrep -w -1 car will match cars, but not characters.
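The following Python sketch gives one simplified model of -w with errors. It assumes the pattern is a single word. It splits the line into maximal alphanumeric runs and accepts the line if some run, taken as a whole, is within k edits of the pattern. Because the surrounding non-alphanumeric characters are never part of a run, they cannot be counted as errors, which reproduces the car / cars / characters example above. The function names are hypothetical, and this is not how agrep implements the option.

```python
import re


def levenshtein(a: str, b: str) -> int:
    """Plain edit distance between two whole strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j - 1] + (ca != cb),  # match / substitution
                           prev[j] + 1,               # delete ca
                           cur[j - 1] + 1))           # insert cb
        prev = cur
    return prev[-1]


def word_match(pattern: str, line: str, k: int) -> bool:
    """True if some whole alphanumeric word of `line` is within k edits."""
    words = re.findall(r"[A-Za-z0-9]+", line)
    return any(levenshtein(pattern, w) <= k for w in words)


print(word_match("car", "two cars here", 1))  # True
print(word_match("car", "characters", 1))     # False
```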

       -x     The pattern must match the whole line.

       -y     Used with the -B option.  When -y is on, agrep will always output
              the best matches without giving a prompt.

       -B     Best match mode.  When -B is specified and no exact matches are
              found, agrep will continue to search until the closest matches
              (i.e., the ones with the minimum number of errors) are found, at
              which point the following message will be shown: "the best match
              contains x errors, there are y matches, output them? (y/n)".
              The best match mode is not supported for standard input, e.g.,
              pipeline input.  When the -#, -c, or -l options are specified,
              the -B option is ignored.  In general, -B may be slower than -#,
              but not by very much.
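The idea behind -B can be sketched in Python: compute the smallest number of errors for each record, then keep only the records that reach the overall minimum. For clarity, the sketch uses the textbook dynamic-programming error count. That is an assumption of the sketch; agrep's search strategy is different and faster. The sketch also omits the interactive prompt, and its function names are hypothetical.

```python
def min_errors(pattern: str, record: str) -> int:
    """Fewest edits turning `pattern` into some substring of `record`."""
    col = list(range(len(pattern) + 1))
    best = col[-1]
    for ch in record:
        diag, col[0] = col[0], 0  # a match may start anywhere
        for i in range(1, len(pattern) + 1):
            prev = col[i]
            col[i] = min(diag + (pattern[i - 1] != ch), prev + 1, col[i - 1] + 1)
            diag = prev
        best = min(best, col[-1])
    return best


def best_matches(records, pattern):
    """Return (x, matches): the minimum error count x over all records and
    the records that reach it (y in agrep's prompt is len(matches))."""
    if not records:
        return None, []
    scored = [(min_errors(pattern, r), r) for r in records]
    x = min(score for score, _ in scored)
    return x, [r for score, r in scored if score == x]


print(best_matches(["Boston", "Massachusetts", "Mass"], "Massechusets"))
# (2, ['Massachusetts'])
```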

       -Dk    Set the cost of a deletion to k (k is a positive integer).  This
              option does not currently work with regular expressions.

       -G     Output the files that contain a match.

       -Ik    Set the cost of an insertion to k (k is a positive integer).
              This option does not currently work with regular expressions.

       -Sk    Set the cost of a substitution to k (k is a positive integer).
              This option does not currently work with regular expressions.
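The effect of -D, -I, and -S can be illustrated by giving each edit operation its own cost in the dynamic-programming error count. A record then matches when the total cost is at most the -# value. The following Python sketch is an assumption-based illustration: it uses a hypothetical function name and the textbook algorithm rather than agrep's. It follows the convention from the Massechusets example, where an insertion is an extra character in the text and a deletion is a pattern character missing from the text.

```python
def weighted_errors(pattern, record, ins=1, dele=1, sub=1):
    """Cheapest way to turn `pattern` into a substring of `record`, where
    insertions, deletions and substitutions cost ins, dele and sub."""
    m = len(pattern)
    # Before reading record, pattern[:i] can only be deleted.
    col = [i * dele for i in range(m + 1)]
    best = col[m]
    for ch in record:
        diag, col[0] = col[0], 0  # a match may start anywhere
        for i in range(1, m + 1):
            prev = col[i]
            col[i] = min(
                diag + (0 if pattern[i - 1] == ch else sub),  # (mis)match
                prev + ins,          # extra character in the record
                col[i - 1] + dele,   # pattern character missing in the record
            )
            diag = prev
        best = min(best, col[m])
    return best


# Costs as with -D2 -S2: an extra character still costs 1, a missing one 2.
print(weighted_errors("ABCDYZ", "ABCDxYZ", dele=2, sub=2))  # 1
print(weighted_errors("ABCDYZ", "ABCYZ", dele=2, sub=2))    # 2
```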

PATTERNS
       agrep supports a large variety of patterns, including simple strings,
       strings with classes of characters, sets of strings, wild cards, and
       regular expressions.

       Strings
              any sequence of characters, including the special symbols `^'
              for beginning of line and `$' for end of line.  The special
              characters listed above (`$', `^', `*', `[', `]', `|', `(', `)',
              `!', and `\') should be preceded by `\' if they are to be
              matched as regular characters.  For example, \^abc\\ corresponds
              to the string ^abc\, whereas ^abc corresponds to the string abc
              at the beginning of a line.

       Classes of characters
              a list of characters inside [] (in order) corresponds to any
              character from the list.  For example, [a-ho-z] is any character
              between a and h or between o and z.  The symbol `^' inside []
              complements the list.  For example, [^i-n] denotes any character
              in the character set except the characters 'i' through 'n'.  The
              symbol `^' thus has two meanings, but this is consistent with
              egrep.  The symbol `.' (don't care) stands for any symbol (except
              for the newline symbol).
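The following small Python sketch shows how a class such as [a-ho-z] or [^i-n] can be interpreted: ranges and single characters are collected, and a leading ^ inverts the result. The function name is hypothetical. The input is assumed to be a well-formed class including the brackets. Escapes inside [] and the `.' symbol are not handled.

```python
def in_class(cls: str, ch: str) -> bool:
    """Does `ch` belong to the character class `cls`, e.g. '[a-ho-z]'?"""
    body = cls[1:-1]                 # strip the surrounding [ and ]
    negate = body.startswith("^")    # '^' inside [] complements the list
    if negate:
        body = body[1:]
    found, i = False, 0
    while i < len(body):
        if i + 2 < len(body) and body[i + 1] == "-":
            found = found or body[i] <= ch <= body[i + 2]  # a range like a-h
            i += 3
        else:
            found = found or body[i] == ch                 # a single character
            i += 1
    return found != negate


print(in_class("[a-ho-z]", "k"))  # False
print(in_class("[^i-n]", "a"))    # True
```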

       Boolean operations
              agrep supports an `and' operation `;' and an `or' operation `,',
              but not a combination of both.  For example, 'fast;network'
              searches for all records containing both words.
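The `;' and `,' operators can be sketched in Python as exact keyword tests on a record. The sketch assumes keywords are plain strings, whereas agrep also allows errors and other pattern types. Mixing both operators is rejected, mirroring the restriction stated above. The function name is hypothetical.

```python
def boolean_match(query: str, record: str) -> bool:
    """Evaluate an agrep-style query: 'a;b' means a AND b, 'a,b' means a OR b."""
    if ";" in query and "," in query:
        raise ValueError("';' and ',' cannot be combined in one query")
    if ";" in query:
        return all(word in record for word in query.split(";"))
    return any(word in record for word in query.split(","))


print(boolean_match("fast;network", "a fast local network"))  # True
print(boolean_match("fast;network", "a fast car"))            # False
print(boolean_match("fast,slow", "a slow car"))               # True
```

Applied to each record produced by -d, this gives the kind of per-record test that the 'breakdown;internet' mailbox example relies on.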

       Wild cards
              The symbol '#' is used to denote a wild card.  # matches zero or
              any number of arbitrary characters.  For example, ex#e matches
              example.  The symbol # is equivalent to .* in egrep.  In fact,
              .* will work too, because it is a valid regular expression (see
              below), but unless this is part of an actual regular expression,
              # will work faster.
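Because # is equivalent to .*, a pattern built only from literal characters and # can be translated mechanically into a regular expression. The following Python sketch (function name hypothetical) performs that translation for exact matching only. It treats every other character literally and does not model errors.

```python
import re


def wildcard_regex(pattern: str) -> str:
    """Translate an agrep pattern using '#' wild cards into a Python regex."""
    # '#' becomes '.*'; everything else is escaped and matched literally.
    return "".join(".*" if ch == "#" else re.escape(ch) for ch in pattern)


print(bool(re.search(wildcard_regex("ex#e"), "an example")))      # True
print(bool(re.search(wildcard_regex("ABCD#YZ"), "ABCD 123 YZ")))  # True
print(bool(re.search(wildcard_regex("ABCD#YZ"), "ABC YZ")))       # False
```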

       Combination of exact and approximate matching
              Any pattern inside angle brackets <> must match the text exactly,
              even when the rest of the pattern is matched with errors.  For
              example, <mathemat>ics matches mathematical with one error
              (replacing the last s with an a), but mathe<matics> does not
              match mathematical no matter how many errors we allow.
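The following Python sketch adds the <> rule to the textbook error-count recurrence. Protected characters may not be substituted or deleted, and no extra text character may appear between two protected characters. This is an assumed model that reproduces the mathematical example above, not agrep's implementation. The function names are hypothetical, and two adjacent bracketed segments such as <ab><cd> are treated as one.

```python
INF = float("inf")


def parse_protected(pattern: str):
    """Split a pattern like '<mathemat>ics' into its characters and a flag
    per character telling whether it sat inside angle brackets."""
    chars, protected, inside = [], [], False
    for ch in pattern:
        if ch == "<":
            inside = True
        elif ch == ">":
            inside = False
        else:
            chars.append(ch)
            protected.append(inside)
    return chars, protected


def errors_with_exact_parts(pattern: str, record: str):
    """Fewest edits turning `pattern` into a substring of `record`, where
    characters inside <> must match exactly (INF if impossible)."""
    p, prot = parse_protected(pattern)
    m = len(p)
    col = [0] * (m + 1)
    for i in range(1, m + 1):  # only unprotected characters may be deleted
        col[i] = col[i - 1] + (INF if prot[i - 1] else 1)
    best = col[m]
    for ch in record:
        diag, col[0] = col[0], 0
        for i in range(1, m + 1):
            prev = col[i]
            if prot[i - 1]:
                match = diag if p[i - 1] == ch else INF  # no substitution
                delete = INF                              # no deletion
            else:
                match = diag + (p[i - 1] != ch)
                delete = col[i - 1] + 1
            # no extra record character between two protected characters
            between = i < m and prot[i - 1] and prot[i]
            insert = INF if between else prev + 1
            col[i] = min(match, insert, delete)
            diag = prev
        best = min(best, col[m])
    return best


print(errors_with_exact_parts("<mathemat>ics", "mathematical"))  # 1
print(errors_with_exact_parts("mathe<matics>", "mathematical"))  # inf
```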

       Regular expressions
              The syntax of regular expressions in agrep is in general the
              same as that for egrep.  The union operation `|', Kleene closure
              `*', and parentheses () are all supported.  Currently `+' is not
              supported.  Regular expressions are currently limited to
              approximately 30 characters (generally excluding meta
              characters).  Some options (-d, -w, -f, -t, -x, -D, -I, -S) do
              not currently work with regular expressions.  The maximal number
              of errors for regular expressions that use `*' or `|' is 4.

EXAMPLES
       agrep -2 -c ABCDEFG foo
              gives the number of lines in file foo that contain ABCDEFG
              within two errors.

       agrep -1 -D2 -S2 'ABCD#YZ' foo
              outputs the lines containing ABCD followed, within arbitrary
              distance, by YZ, with up to one additional insertion (-D2 and
              -S2 make deletions and substitutions too "expensive").

       agrep -5 -p abcdefghij /usr/dict/words
              outputs the list of all words containing at least 5 of the first
              10 letters of the alphabet in order.  (Try it: any list starting
              with academia and ending with sacrilegious must mean something!)

       agrep -1 'abc[0-9](de|fg)*[x-z]' foo
              outputs the lines containing, within up to one error, the string
              that starts with abc followed by one digit, followed by zero or
              more repetitions of either de or fg, followed by either x, y, or
              z.

       agrep -d '^From ' 'breakdown;internet' mbox
              outputs all mail messages (the pattern '^From ' separates mail
              messages in a mail file) that contain the keywords 'breakdown'
              and 'internet'.

       agrep -d '$$' -1 '<word1> <word2>' foo
              finds all paragraphs that contain word1 followed by word2 with
              one error in place of the blank.  In particular, if word1 is the
              last word in a line and word2 is the first word in the next
              line, then the space will be substituted by a newline symbol and
              it will match.  Thus, this is a way to overcome separation by a
              newline.  Note that -d '$$' (or another delim which spans more
              than one line) is necessary, because otherwise agrep searches
              only one line at a time.

       agrep '^agrep' <this manual>
              outputs all the examples of the use of agrep in this man page.

SEE ALSO
       ed(1), ex(1), grep(1V), sh(1), csh(1).

BUGS
       Any bug reports or comments will be appreciated!  Please mail them to
       or

LIMITATIONS
       Regular expressions do not support the '+' operator (match 1 or more
       instances of the preceding token).  These can be searched for by using
       this syntax in the pattern:

                   'pattern(pattern)*'

       (search for strings containing one instance of the pattern, followed by
       0 or more instances of the pattern).
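The workaround for the missing '+' operator relies on the identity X+ = X(X)*. The following Python sketch checks that identity with Python's re module, which does support +, on a few sample strings. It is a sanity check of the rewrite, not a model of agrep's regular-expression engine. The function names are hypothetical, and the token is assumed not to contain a top-level `|' (a|b would need parentheses first).

```python
import re


def plus_free(token: str) -> str:
    """Rewrite the regex `(token)+` without '+' as `token(token)*`."""
    return f"{token}({token})*"


def same_matches(token: str, samples) -> bool:
    """Check that (token)+ and its '+'-free rewrite accept the same samples."""
    with_plus = re.compile(f"({token})+")
    without_plus = re.compile(plus_free(token))
    return all(
        bool(with_plus.fullmatch(s)) == bool(without_plus.fullmatch(s))
        for s in samples
    )


print(plus_free("ab"))                                        # ab(ab)*
print(same_matches("ab", ["", "ab", "abab", "aba", "ba"]))   # True
```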

       The following can cause an infinite loop: agrep pattern * >
       output_file.  If the number of matches is high, they may be deposited
       in output_file before it is completely read, leading to more matches of
       the pattern within output_file (the matches are against the whole
       directory).  It's not clear whether this is a "bug" (grep will do the
       same), but be warned.

       The maximum size of the patternfile is limited to 250Kb, and the
       maximum number of patterns is limited to 30,000.

       Standard input is the default if no input file is given.  However, if
       standard input is keyed in directly (as opposed to through a pipe, for
       example), agrep may not work for some non-simple patterns.

       There is no size limit for simple patterns.  More complicated patterns
       are currently limited to approximately 30 characters.  Lines are
       limited to 1024 characters.  Records are limited to 48K, and may be
       truncated if they are larger than that.  The limit of record length can
       be changed by modifying the parameter Max_record in agrep.h.

DIAGNOSTICS
       Exit status is 0 if any matches are found, 1 if none, 2 for syntax
       errors or inaccessible files.

AUTHORS
       Sun Wu and Udi Manber, Department of Computer Science, University of
       Arizona, Tucson, AZ 85721.  {sw|udi}

				 Jan 17, 1992			      AGREP(l)

