GAWK(1)			       Utility Commands			       GAWK(1)

       gawk - pattern scanning and processing language

       gawk [ POSIX or GNU style options ] -f program-file [ --	] file ...
       gawk [ POSIX or GNU style options ] [ --	] program-text file ...

       Gawk  is	 the  GNU Project's implementation of the AWK programming lan-
       guage.  It conforms to the definition of	 the  language	in  the	 POSIX
       1003.2  Command	Language And Utilities Standard.  This version in turn
       is based	on the description in The AWK Programming  Language,  by  Aho,
       Kernighan,  and	Weinberger,  with the additional features found	in the
       System V	Release	4 version of UNIX awk.	Gawk also provides more	recent
       Bell Labs awk extensions, and some GNU-specific extensions.

       The  command  line  consists of options to gawk itself, the AWK program
       text (if	not supplied via the -f	or --file options), and	values	to  be
       made available in the ARGC and ARGV pre-defined AWK variables.

       Gawk options may	be either the traditional POSIX	one letter options, or
       the GNU style long options.  POSIX options start	 with  a  single  "-",
       while long options start	with "--".  Long options are provided for both
       GNU-specific features and for POSIX mandated features.

       Following the POSIX standard, gawk-specific options  are	 supplied  via
       arguments to the -W option.  Multiple -W options may be supplied.  Each
       -W option has a corresponding long option, as  detailed	below.	 Argu-
       ments  to  long options are either joined with the option by an = sign,
       with no intervening spaces, or they may be provided in the next command
       line  argument.	Long options may be abbreviated, as long as the	abbre-
       viation remains unique.

       Gawk accepts the	following options.

       -F fs
       --field-separator fs
	      Use fs for the input field separator (the	value of the FS	prede-
	      fined variable).

       -v var=val
       --assign	var=val
              Assign the value val to the variable var before execution of
	      the program begins.  Such	variable values	are available  to  the
	      BEGIN block of an	AWK program.

       -f program-file
       --file program-file
	      Read  the	AWK program source from	the file program-file, instead
	      of from the  first  command  line	 argument.   Multiple  -f  (or
	      --file) options may be used.

       -mf NNN
       -mr NNN
	      Set various memory limits	to the value NNN.  The f flag sets the
	      maximum number of	fields,	and the	r flag sets the	maximum	record
	      size.   These two	flags and the -m option	are from the Bell Labs
	      research version of UNIX awk.  They are ignored by  gawk,	 since
	      gawk has no pre-defined limits.

       -W traditional
       -W compat
	      Run  in compatibility mode.  In compatibility mode, gawk behaves
	      identically to UNIX awk; none of the GNU-specific	extensions are
	      recognized.   The	 use  of  --traditional	 is preferred over the
              other forms of this option.  See GNU EXTENSIONS, below, for more
              information.

       -W copyleft
       -W copyright
	      Print the	short version of the GNU copyright information message
              on the standard output, and exit successfully.

       -W help
       -W usage
	      Print a relatively short summary of the available	options	on the
	      standard	output.	  (Per the GNU Coding Standards, these options
	      cause an immediate, successful exit.)

       -W lint
       --lint Provide warnings about constructs	that are dubious or non-porta-
	      ble to other AWK implementations.

       -W lint-old
	      Provide  warnings	 about constructs that are not portable	to the
	      original version of Unix awk.

       -W posix
              This turns on compatibility mode, with the following additional
              restrictions:

	      o	\x escape sequences are	not recognized.

              o Only space and tab act as field separators when FS is set to a
                single space; newline does not.

	      o	The synonym func for the keyword function is not recognized.

	      o	The operators ** and **= cannot	be used	in place of ^ and  ^=.

	      o	The fflush() function is not available.

       -W re-interval
	      Enable  the  use	of  interval expressions in regular expression
	      matching (see Regular Expressions, below).  Interval expressions
	      were not traditionally available in the AWK language.  The POSIX
	      standard added them, to make awk and egrep consistent with  each
	      other.   However,	their use is likely to break old AWK programs,
	      so gawk only provides them  if  they  are	 requested  with  this
	      option, or when --posix is specified.

       -W source program-text
       --source	program-text
	      Use program-text as AWK program source code.  This option	allows
	      the easy intermixing of library functions	(used via the  -f  and
	      --file  options)	with  source code entered on the command line.
	      It is intended primarily for medium to large AWK	programs  used
	      in shell scripts.

       -W version
	      Print  version  information  for this particular copy of gawk on
	      the standard output.  This is useful mainly for knowing  if  the
	      current  copy  of	gawk on	your system is up to date with respect
	      to whatever the Free Software Foundation is distributing.	  This
	      is  also	useful when reporting bugs.  (Per the GNU Coding Stan-
	      dards, these options cause an immediate, successful exit.)

       --     Signal the end of	options.  This	is  useful  to	allow  further
	      arguments	 to  the AWK program itself to start with a "-".  This
	      is mainly	for consistency	with the argument  parsing  convention
	      used by most other POSIX programs.

       In  compatibility  mode,	 any other options are flagged as illegal, but
       are otherwise ignored.  In normal operation, as long  as	 program  text
       has  been supplied, unknown options are passed on to the	AWK program in
       the ARGV	array for processing.  This is particularly useful for running
       AWK programs via	the "#!" executable interpreter	mechanism.

       An  AWK program consists	of a sequence of pattern-action	statements and
       optional	function definitions.

	      pattern	{ action statements }
	      function name(parameter list) { statements }

       Gawk first reads	the program source from	the program-file(s) if	speci-
       fied, from arguments to --source, or from the first non-option argument
       on the command line.  The -f and	--source options may be	used  multiple
       times  on  the command line.  Gawk will read the	program	text as	if all
       the program-files and command line source texts had  been  concatenated
       together.   This	 is  useful  for  building libraries of	AWK functions,
       without having to include them in each new AWK program that uses	 them.
       It also provides	the ability to mix library functions with command line

       The environment variable	AWKPATH	specifies a search path	 to  use  when
       finding	source	files named with the -f	option.	 If this variable does
       not exist, the default path is ".:/usr/local/share/awk".	  (The	actual
       directory  may  vary, depending upon how	gawk was built and installed.)
       If a file name given to the -f option contains a	"/" character, no path
       search is performed.

       Gawk executes AWK programs in the following order.  First, all variable
       assignments specified via the -v	option are performed.  Next, gawk com-
       piles  the program into an internal form.  Then,	gawk executes the code
       in the BEGIN block(s) (if any), and then	proceeds  to  read  each  file
       named  in  the  ARGV array.  If there are no files named	on the command
       line, gawk reads	the standard input.

       If a filename on	the command line has the form var=val it is treated as
       a  variable  assignment.	  The  variable	var will be assigned the value
       val.  (This happens after any BEGIN block(s) have been  run.)   Command
       line  variable assignment is most useful	for dynamically	assigning val-
       ues to the variables AWK	uses to	 control  how  input  is  broken  into
       fields  and records.  It	is also	useful for controlling state if	multi-
       ple passes are needed over a single data	file.
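
       The multiple-pass use can be sketched as follows.  This is only an
       illustration: the variable name pass and the temporary data file are
       assumptions, not part of gawk.  The same file is named twice, and a
       var=val argument before each occurrence tells the program which pass
       it is in.

```sh
# Illustrative data file; "pass" is a hypothetical variable name.
data=$(mktemp)
printf '10\n20\n30\n' > "$data"

# Pass 1 accumulates a total; pass 2 prints each value and its share.
out=$(awk 'pass == 1 { total += $1 }
           pass == 2 { printf "%d %.2f\n", $1, $1 / total }' \
      pass=1 "$data" pass=2 "$data")
echo "$out"
rm -f "$data"
```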

       If the value of a particular element of ARGV is empty (""), gawk	 skips
       over it.

       For  each record	in the input, gawk tests to see	if it matches any pat-
       tern in the AWK program.	 For each pattern that the record matches, the
       associated  action  is  executed.  The patterns are tested in the order
       they occur in the program.

       Finally,	after all the input is exhausted, gawk executes	 the  code  in
       the END block(s)	(if any).

       AWK variables are dynamic; they come into existence when	they are first
       used.  Their values are either floating-point numbers  or  strings,  or
       both,  depending	 upon how they are used.  AWK also has one dimensional
       arrays; arrays with multiple dimensions may be simulated.  Several pre-
       defined variables are set as a program runs; these will be described as
       needed and summarized below.

       Normally, records are separated by newline characters.  You can control
       how  records are	separated by assigning values to the built-in variable
       RS.  If RS is any single	character, that	character  separates  records.
       Otherwise,  RS is a regular expression.	Text in	the input that matches
       this regular expression will separate the record.  However, in compati-
       bility  mode,  only the first character of its string value is used for
       separating records.  If RS is set to the	null string, then records  are
       separated  by blank lines.  When	RS is set to the null string, the new-
       line character always acts as a field separator,	in addition  to	 what-
       ever value FS may have.
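
       A small sketch of the null-string case, assuming any POSIX-style awk:
       with RS set to "", each blank-line-separated paragraph is one record,
       and newlines inside it separate fields.

```sh
# Two paragraphs separated by a blank line; each becomes one record.
out=$(printf 'a b\nc\n\nd e\nf\n' |
      awk 'BEGIN { RS = "" } { print NR ": " NF " fields, first=" $1 }')
echo "$out"
```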

       As each input record is read, gawk splits the record into fields, using
       the value of the	FS variable as the field separator.  If	FS is a	single
       character,  fields  are separated by that character.  If	FS is the null
       string, then each individual character becomes a	separate field.	  Oth-
       erwise, FS is expected to be a full regular expression.	In the special
       case that FS is a single	space, fields are separated by runs of	spaces
       and/or tabs and/or newlines.  (But see the discussion of --posix,
       above.)  Note that the value of IGNORECASE (see below) will also affect
       how  fields  are	split when FS is a regular expression, and how records
       are separated when RS is	a regular expression.
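
       The single-character and regular-expression cases can be sketched as
       below.  The input lines are made up for illustration.

```sh
# FS as a single character, set with -F: fields are split on ":".
out1=$(printf 'root:x:0\n' | awk -F: '{ print $3 }')

# FS as a regular expression: a comma or semicolon, then optional spaces.
out2=$(printf 'a, b;c\n' | awk 'BEGIN { FS = "[,;] *" } { print NF, $2 }')

echo "$out1"
echo "$out2"
```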

       If the FIELDWIDTHS variable is set to a space separated	list  of  num-
       bers,  each  field is expected to have fixed width, and gawk will split
       up the record using the specified widths.  The value of FS is  ignored.
       Assigning  a  new  value	 to  FS	 overrides the use of FIELDWIDTHS, and
       restores	the default behavior.

       Each field in the input record may be referenced	by its	position,  $1,
       $2,  and	 so  on.  $0 is	the whole record.  The value of	a field	may be
       assigned	to as well.  Fields need not be	referenced by constants:

	      n	= 5
	      print $n

       prints the fifth	field in the input record.  The	variable NF is set  to
       the total number	of fields in the input record.

       References  to  non-existent fields (i.e. fields	after $NF) produce the
       null-string.  However, assigning	to a non-existent field	(e.g., $(NF+2)
       =  5) will increase the value of	NF, create any intervening fields with
       the null	string as their	value, and cause the value of $0 to be	recom-
       puted, with the fields being separated by the value of OFS.  References
       to negative numbered fields  cause  a  fatal  error.   Decrementing  NF
       causes  the  values  of	fields	past the new value to be lost, and the
       value of	$0 to be recomputed, with the fields being  separated  by  the
       value of	OFS.
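
       As an illustration of assigning past the last field (a sketch, with
       made-up input): the record has three fields, so assigning to $(NF+2)
       creates an empty fourth field, sets NF to 5, and rebuilds $0 using
       OFS.

```sh
# Assigning to $5 of a three-field record rebuilds $0 with OFS ("-").
out=$(printf 'a b c\n' |
      awk 'BEGIN { OFS = "-" } { $(NF + 2) = "e"; print; print NF }')
echo "$out"
```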

   Built-in Variables
       Gawk's built-in variables are:

       ARGC	   The	number	of  command  line  arguments (does not include
		   options to gawk, or the program source).

       ARGIND	   The index in	ARGV of	the current file being processed.

       ARGV	   Array of command line arguments.  The array is indexed from
		   0  to  ARGC - 1.  Dynamically changing the contents of ARGV
		   can control the files used for data.

       CONVFMT	   The conversion format for numbers, "%.6g", by default.

       ENVIRON	   An array containing the values of the current  environment.
		   The	array  is  indexed  by the environment variables, each
		   element being the  value  of	 that  variable	 (e.g.,	 ENVI-
		   RON["HOME"]	might  be  /home/arnold).  Changing this array
		   does	not affect the environment seen	by programs which gawk
		   spawns via redirection or the system() function.  (This may
		   change in a future version of gawk.)

       ERRNO	   If a	system error occurs either  doing  a  redirection  for
		   getline,  during  a	read for getline, or during a close(),
		   then	ERRNO will contain a string describing the error.

       FIELDWIDTHS A white-space separated list	 of  fieldwidths.   When  set,
		   gawk	 parses	 the input into	fields of fixed	width, instead
		   of using the	value of the FS	variable as the	field  separa-
		   tor.	 The fixed field width facility	is still experimental;
		   the semantics may change as gawk evolves over time.

       FILENAME	   The name of the current input file.	If no files are	speci-
		   fied	 on  the  command  line, the value of FILENAME is "-".
		   However, FILENAME is	undefined inside the BEGIN block.

       FNR	   The input record number in the current input	file.

       FS          The input field separator, a space by default.  See Fields,
                   above.

       IGNORECASE  Controls the	case-sensitivity of all	regular	expression and
		   string operations.  If IGNORECASE  has  a  non-zero	value,
		   then	 string	 comparisons  and  pattern  matching in	rules,
		   field splitting with	FS, record separating with RS, regular
		   expression  matching	 with  ~  and  !~,  and	 the gensub(),
		   gsub(), index(), match(), split(),  and  sub()  pre-defined
		   functions  will  all	ignore case when doing regular expres-
		   sion	operations.  Thus, if IGNORECASE is not	equal to zero,
		   /aB/	matches	all of the strings "ab", "aB", "Ab", and "AB".
		   As with all AWK variables, the initial value	of  IGNORECASE
		   is  zero,  so  all regular expression and string operations
		   are normally	case-sensitive.	  Under	 Unix,	the  full  ISO
		   8859-1  Latin-1  character  set is used when	ignoring case.
		   NOTE: In versions of	gawk prior  to	3.0,  IGNORECASE  only
		   affected  regular  expression  operations.	It now affects
		   string comparisons as well.

       NF	   The number of fields	in the current input record.

       NR	   The total number of input records seen so far.

       OFMT	   The output format for numbers, "%.6g", by default.

       OFS	   The output field separator, a space by default.

       ORS	   The output record separator,	by default a newline.

       RS	   The input record separator, by default a newline.

       RT	   The record terminator.  Gawk	sets RT	to the input text that
                   matched the character or regular expression specified by
                   RS.

       RSTART	   The index of	the first character matched by match();	 0  if
		   no match.

       RLENGTH     The length of the string matched by match(); -1 if no
                   match.

       SUBSEP	   The character used to separate multiple subscripts in array
		   elements, by	default	"\034".
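
       A brief sketch of how FILENAME, NR, FNR, and NF change while two files
       are read.  The file names one and two are hypothetical and are created
       in a temporary directory.

```sh
# Hypothetical input files "one" (two records) and "two" (one record).
tmp=$(mktemp -d)
printf 'a b\nc\n' > "$tmp/one"
printf 'd e f\n'  > "$tmp/two"

# NR keeps counting across files; FNR restarts at 1 for each file.
out=$(cd "$tmp" && awk '{ print FILENAME, NR, FNR, NF }' one two)
echo "$out"
rm -rf "$tmp"
```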

       Arrays  are  subscripted	 with an expression between square brackets ([
       and ]).	If the expression is an	expression list	(expr, expr ...)  then
       the  array subscript is a string	consisting of the concatenation	of the
       (string)	value of each expression, separated by the value of the	SUBSEP
       variable.   This	 facility  is  used  to	 simulate multiply dimensioned
       arrays.	For example:

	      i	= "A"; j = "B";	k = "C"
	      x[i, j, k] = "hello, world\n"

       assigns the string "hello, world\n" to the element of the array x which
       is indexed by the string	"A\034B\034C".	All arrays in AWK are associa-
       tive, i.e. indexed by string values.

       The special operator in may be used in an if or while statement to  see
       if an array has an index	consisting of a	particular value.

	      if (val in array)
		   print array[val]

       If the array has	multiple subscripts, use (i, j)	in array.

       The in construct	may also be used in a for loop to iterate over all the
       elements	of an array.

       An element may be deleted from an array	using  the  delete  statement.
       The  delete statement may also be used to delete	the entire contents of
       an array, just by specifying the	array name without a subscript.
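
       The array facilities described above can be sketched together.  The
       array name grid and its subscripts are illustrative only.

```sh
out=$(awk 'BEGIN {
    grid["x", "y"] = 1                  # stored under "x" SUBSEP "y"
    if (("x", "y") in grid)             # membership test, two subscripts
        print "found"
    for (key in grid) {                 # iterate over all indices
        split(key, parts, SUBSEP)       # recover the separate subscripts
        print parts[1], parts[2]
    }
    delete grid["x", "y"]               # remove a single element
    if (!(("x", "y") in grid))
        print "deleted"
}')
echo "$out"
```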

   Variable Typing And Conversion
       Variables and fields may	be (floating point) numbers,  or  strings,  or
       both.  How the value of a variable is interpreted depends upon its
       context.  If used in a numeric expression, it will be treated as a
       number; if used as a string, it will be treated as a string.

       To force	a variable to be treated as a number, add 0 to it; to force it
       to be treated as	a string, concatenate it with the null string.

       When a string must be converted to a number, the	conversion  is	accom-
       plished	using atof(3).	A number is converted to a string by using the
       value of	CONVFMT	as a format string for sprintf(3),  with  the  numeric
       value  of  the variable as the argument.	 However, even though all num-
       bers in AWK are floating-point, integral	values are always converted as
       integers.  Thus,	given

	      CONVFMT =	"%2.2f"
	      a	= 12
	      b	= a ""

       the variable b has a string value of "12" and not "12.00".
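
       A runnable form of this example, extended with a non-integral value
       for contrast (the variable names c and d are added here for
       illustration): the integral value ignores CONVFMT, while 12.5 is
       converted with it.

```sh
out=$(awk 'BEGIN {
    CONVFMT = "%2.2f"
    a = 12;   b = a ""      # integral value: converted as an integer
    c = 12.5; d = c ""      # non-integral value: converted using CONVFMT
    print b
    print d
}')
echo "$out"
```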

       Gawk  performs  comparisons  as	follows: If two	variables are numeric,
       they are	compared numerically.  If one value is numeric and  the	 other
       has  a  string  value  that is a	"numeric string," then comparisons are
       also done numerically.  Otherwise, the numeric value is converted to  a
       string and a string comparison is performed.  Two strings are compared,
       of course, as strings.  According to the	POSIX standard,	 even  if  two
       strings	are  numeric strings, a	numeric	comparison is performed.  How-
       ever, this is clearly incorrect,	and gawk does not do this.

       Note that string constants, such as "57", are not numeric strings; they
       are  string  constants.	 The  idea of "numeric string" only applies to
       fields, getline input, FILENAME,	ARGV elements,	ENVIRON	 elements  and
       the  elements  of an array created by split() that are numeric strings.
       The basic idea is that user input, and  only  user  input,  that	 looks
       numeric,	should be treated that way.
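
       The difference between fields and string constants can be sketched as
       follows, with made-up input.  The fields "10" and "9" look numeric and
       come from input, so they are compared as numbers; the constants "10"
       and "9" are compared as strings, where "10" sorts before "9".

```sh
out=$(printf '10 9\n' | awk '{
    byfield = ($1 > $2)     ? "numeric" : "string"   # fields from input
    byconst = ("10" > "9")  ? "numeric" : "string"   # string constants
    print "fields: " byfield
    print "constants: " byconst
}')
echo "$out"
```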

       Uninitialized  variables	 have the numeric value	0 and the string value
       "" (the null, or	empty, string).

       AWK is a	line-oriented language.	 The pattern comes first, and then the
       action.	Action statements are enclosed in { and	}.  Either the pattern
       may be missing, or the action may be missing, but, of course, not both.
       If the pattern is missing, the action will be executed for every	single
       record of input.	 A missing action is equivalent	to

	      {	print }

       which prints the	entire record.

       Comments	begin with the "#" character, and continue until  the  end  of
       the line.  Blank	lines may be used to separate statements.  Normally, a
       statement ends with a newline; however, this is not the case for lines
       ending  in  a ",", {, ?,	:, &&, or ||.  Lines ending in do or else also
       have their statements automatically continued on	 the  following	 line.
       In  other  cases,  a  line can be continued by ending it	with a "\", in
       which case the newline will be ignored.

       Multiple	statements may be put on one line by separating	 them  with  a
       ";".   This  applies to both the	statements within the action part of a
       pattern-action pair (the	usual case), and to the	pattern-action	state-
       ments themselves.

       AWK patterns may	be one of the following:

	      /regular expression/
	      relational expression
	      pattern && pattern
	      pattern || pattern
	      pattern ?	pattern	: pattern
	      !	pattern
	      pattern1,	pattern2

       BEGIN  and  END	are two	special	kinds of patterns which	are not	tested
       against the input.  The action parts of all BEGIN patterns  are	merged
       as  if  all  the	 statements  had been written in a single BEGIN	block.
       They are	executed before	any of the input is read.  Similarly, all  the
       END blocks are merged, and executed when	all the	input is exhausted (or
       when an exit statement is executed).  BEGIN and END patterns cannot  be
       combined	 with  other  patterns	in pattern expressions.	 BEGIN and END
       patterns	cannot have missing action parts.

       For /regular expression/	patterns, the associated statement is executed
       for  each  input	 record	 that matches the regular expression.  Regular
       expressions are the same as those in egrep(1), and are summarized
       below.

       A  relational  expression may use any of	the operators defined below in
       the section on actions.	These generally	test  whether  certain	fields
       match certain regular expressions.

       The  &&,	 ||, and !  operators are logical AND, logical OR, and logical
       NOT, respectively, as in	C.  They do short-circuit evaluation, also  as
       in  C,  and  are	used for combining more	primitive pattern expressions.
       As in most languages, parentheses may be used to change the order of
       evaluation.

       The  ?:	operator is like the same operator in C.  If the first pattern
       is true then the pattern used for testing is the second pattern;
       otherwise it is the third.  Only one of the second and third patterns
       is evaluated.

       The pattern1, pattern2 form of an expression is called a	range pattern.
       It  matches  all	input records starting with a record that matches pat-
       tern1, and continuing until a record that matches pattern2,  inclusive.
       It does not combine with	any other sort of pattern expression.
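
       A short sketch of a range pattern together with an END block.  The
       marker words start and stop are arbitrary.

```sh
# Records from the one matching /start/ through the one matching /stop/
# are printed; the END action runs once after all input is read.
out=$(printf 'a\nstart\nb\nstop\nc\n' |
      awk '/start/, /stop/ { print NR ": " $0 }
           END             { print NR " records" }')
echo "$out"
```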

   Regular Expressions
       Regular	expressions  are  the  extended	kind found in egrep.  They are
       composed	of characters as follows:

       c	  matches the non-metacharacter	c.

       \c	  matches the literal character	c.

       .	  matches any character	including newline.

       ^	  matches the beginning	of a string.

       $	  matches the end of a string.

       [abc...]	  character list, matches any of the characters	abc....

       [^abc...]  negated character list, matches any character	except abc....

       r1|r2	  alternation: matches either r1 or r2.

       r1r2	  concatenation: matches r1, and then r2.

       r+	  matches one or more r's.

       r*	  matches zero or more r's.

       r?	  matches zero or one r's.

       (r)	  grouping: matches r.

       r{n,m}	  One  or two numbers inside braces denote an interval expres-
		  sion.	 If there is one number	in the braces,	the  preceding
		  regexp r is repeated n times.	 If there are two numbers sep-
		  arated by a comma, r is repeated n to	m times.  If there  is
		  one  number followed by a comma, then	r is repeated at least
		  n times.
		  Interval expressions are only	available if either --posix or
		  --re-interval	is specified on	the command line.

       \y	  matches  the empty string at either the beginning or the end
		  of a word.

       \B	  matches the empty string within a word.

       \<	  matches the empty string at the beginning of a word.

       \>	  matches the empty string at the end of a word.

       \w         matches any word-constituent character (letter, digit, or
                  underscore).

       \W	  matches any character	that is	not word-constituent.

       \`         matches the empty string at the beginning of a buffer
                  (string).

       \'	  matches the empty string at the end of a buffer.

       The escape sequences that are valid in string constants (see below) are
       also legal in regular expressions.

       Character  classes  are a new feature introduced	in the POSIX standard.
       A character class is a special notation for describing lists of charac-
       ters  that  have	 a specific attribute, but where the actual characters
       themselves can vary from	country	to country and/or from	character  set
       to  character  set.   For  example, the notion of what is an alphabetic
       character differs in the	USA and	in France.

       A character class is only valid in a regexp inside the  brackets	 of  a
       character  list.	  Character  classes consist of	[:, a keyword denoting
       the class, and :].  Here are the character classes defined by the POSIX
       standard.

       [:alnum:]  Alphanumeric characters.

       [:alpha:]  Alphabetic characters.

       [:blank:]  Space or tab characters.

       [:cntrl:]  Control characters.

       [:digit:]  Numeric characters.

       [:graph:]  Characters that are both printable and visible.  (A space is
                  printable, but not visible, while an a is both.)

       [:lower:]  Lower-case alphabetic characters.

       [:print:]  Printable characters (characters that are not control
                  characters).

       [:punct:]  Punctuation characters (characters that are not letters,
                  digits, control characters, or space characters).

       [:space:]  Space characters (such as space, tab, and formfeed, to name
                  a few).

       [:upper:]  Upper-case alphabetic characters.

       [:xdigit:] Characters that are hexadecimal digits.

       For  example,  before the POSIX standard, to match alphanumeric charac-
       ters, you would have had	to write /[A-Za-z0-9]/.	 If your character set
       had other alphabetic characters in it, this would not match them.  With
       the POSIX character classes, you	can write /[[:alnum:]]/, and this will
       match  all the alphabetic and numeric characters	in your	character set.
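
       A minimal sketch of a character class in use, assuming an awk whose
       regexp library recognizes POSIX classes (gawk does, outside of
       --traditional mode).  The input lines are made up.

```sh
# Only the line made up entirely of alphanumeric characters matches.
out=$(printf 'abc123\n---\nx y\n' |
      awk '/^[[:alnum:]]+$/ { print "alnum: " $0 }')
echo "$out"
```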

       Two additional special sequences	can appear in character	lists.	 These
       apply  to  non-ASCII  character	sets,  which  can  have	single symbols
       (called collating elements) that	are represented	 with  more  than  one
       character, as well as several characters that are equivalent for
       collating, or sorting, purposes.  (E.g., in French, a plain "e" and a
       grave-accented "è" are equivalent.)

       Collating Symbols
              A collating symbol is a multi-character collating element
	      enclosed in [.  and .].  For example, if ch is a collating  ele-
	      ment,  then  [[.ch.]]   is  a regexp that	matches	this collating
	      element, while [ch] is a regexp that matches either c or h.

       Equivalence Classes
	      An equivalence class is a	locale-specific	name  for  a  list  of
	      characters  that are equivalent.	The name is enclosed in	[= and
              =].  For example, the name e might be used to represent all of
              "e", "é", and "è".  In this case, [[=e=]] is a regexp that
              matches any of e, é, or è.

       These features are very valuable	in non-English speaking	locales.   The
       library	functions  that	gawk uses for regular expression matching cur-
       rently only recognize POSIX character classes; they  do	not  recognize
       collating symbols or equivalence	classes.

       The  \y,	\B, \<,	\>, \w,	\W, \`,	and \' operators are specific to gawk;
       they are	extensions based on facilities in the GNU regexp libraries.

       The various command line	options	control	how gawk interprets characters
       in regexps.

       No options
              In the default case, gawk provides all the facilities of POSIX
	      regexps and the GNU regexp operators described above.   However,
	      interval expressions are not supported.

       --posix
              Only POSIX regexps are supported; the GNU operators are not
              special.  (E.g., \w matches a literal w.)  Interval expressions
              are allowed.

       --traditional
              Traditional Unix awk regexps are matched.  The GNU operators are
              not special, interval expressions are not available, and neither
              are the POSIX character classes ([[:alnum:]] and so on).
              Characters described by octal and hexadecimal escape sequences
              are treated literally, even if they represent regexp
              metacharacters.

       --re-interval
              Allow interval expressions in regexps, even if --traditional has
              been provided.

       Action  statements  are enclosed	in braces, { and }.  Action statements
       consist of the usual assignment,	conditional,  and  looping  statements
       found  in  most	languages.   The  operators,  control  statements, and
       input/output statements available are patterned after those in C.

       The operators in	AWK, in	order of decreasing precedence,	are

       (...)	   Grouping

       $	   Field reference.

       ++ --	   Increment and decrement, both prefix	and postfix.

       ^	   Exponentiation (** may  also	 be  used,  and	 **=  for  the
		   assignment operator).

       + - !	   Unary plus, unary minus, and	logical	negation.

       * / %	   Multiplication, division, and modulus.

       + -	   Addition and	subtraction.

       space	   String concatenation.

       < >
       <= >=
       != ==	   The regular relational operators.

       ~ !~	   Regular  expression match, negated match.  NOTE: Do not use
		   a constant regular expression (/foo/) on the	left-hand side
		   of  a  ~  or	!~.  Only use one on the right-hand side.  The
		   expression /foo/ ~ exp has  the  same  meaning  as  (($0  ~
		   /foo/) ~ exp).  This	is usually not what was	intended.

       in	   Array membership.

       &&	   Logical AND.

       ||	   Logical OR.

       ?:	   The	C  conditional	expression.  This has the form expr1 ?
		   expr2 : expr3.  If expr1 is true, the value of the  expres-
		   sion	 is  expr2,  otherwise it is expr3.  Only one of expr2
		   and expr3 is	evaluated.

       = += -=
       *= /= %=	^= Assignment.	Both absolute assignment  (var	=  value)  and
		   operator-assignment (the other forms) are supported.
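
       Two consequences of this precedence order, as a small sketch:
       exponentiation binds more tightly than unary minus, and addition
       binds more tightly than string concatenation.

```sh
out=$(awk 'BEGIN {
    print -2 ^ 2          # parsed as -(2 ^ 2)
    print 1 " " 2 + 3     # parsed as 1 " " (2 + 3)
}')
echo "$out"
```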

   Control Statements
       The control statements are as follows:

	      if (condition) statement [ else statement	]
	      while (condition)	statement
	      do statement while (condition)
	      for (expr1; expr2; expr3)	statement
	      for (var in array) statement
	      delete array[index]
	      delete array
	      exit [ expression	]
	      {	statements }
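
       Several of these statements are combined in the sketch below.  The
       variable names and the exit value 3 are arbitrary; the shell variable
       status simply captures what exit passed back.

```sh
status=0
out=$(awk 'BEGIN {
    for (i = 1; i <= 3; i++)      # for loop: sum becomes 1 + 2 + 3
        sum += i
    n = 0
    do {                          # do-while: the body runs at least once
        n++
    } while (n < 5)
    if (sum == 6 && n == 5)
        print "ok"
    else
        print "unexpected"
    exit 3                        # becomes the exit status of awk
}') || status=$?
echo "$out (exit status $status)"
```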

   I/O Statements
       The input/output	statements are as follows:

       close(file)	     Close file	(or pipe, see below).

       getline		     Set $0 from next input record; set	NF, NR,	FNR.

       getline <file	     Set $0 from next record of	file; set NF.

       getline var	     Set var from next input record; set NR, FNR.

       getline var <file     Set var from next record of file.

       next		     Stop  processing  the  current input record.  The
			     next input	record is read and  processing	starts
			     over  with	 the first pattern in the AWK program.
			     If	the end	of the input data is reached, the  END
			     block(s), if any, are executed.

       nextfile		     Stop processing the current input file.  The next
			     input record read comes from the next input file.
			     FILENAME  and ARGIND are updated, FNR is reset to
			     1,	and processing starts over with	the first pat-
			     tern in the AWK program.  If the end of the input
			     data is reached, the END block(s),	 if  any,  are
			     executed.	 NOTE:	Earlier	 versions of gawk used
			     next file,	as two words.	While  this  usage  is
			     still  recognized,	it generates a warning message
			     and will eventually be removed.

       print		     Prints the	current	record.	 The output record  is
			     terminated	with the value of the ORS variable.

       print expr-list	     Prints expressions.  Each expression is separated
			     by	the value of the  OFS  variable.   The	output
			     record is terminated with the value of the ORS
			     variable.

       print expr-list >file Prints expressions	on file.  Each	expression  is
			     separated	by the value of	the OFS	variable.  The
			     output record is terminated with the value	of the
			     ORS variable.

       printf fmt, expr-list Format and	print.

       printf fmt, expr-list >file
			     Format and	print on file.

       system(cmd-line)	     Execute the command cmd-line, and return the exit
			     status.  (This may not be available on non-POSIX
			     systems.)

       fflush([file])	     Flush any buffers associated with the open	output
			     file or pipe file.	  If  file  is	missing,  then
			     standard  output is flushed.  If file is the null
			     string, then all open output files	and pipes have
			     their buffers flushed.

       Other  input/output  redirections  are  also  allowed.	For  print and
       printf, >> file appends output to the file, while | command writes on a
       pipe.  In a similar fashion, command | getline pipes into getline.  The
       getline command will return 0 on	end of file, and -1 on an error.

       NOTE: If	using a	pipe to	getline, or from  print	 or  printf  within  a
       loop, you must use close() to create new	instances of the command.  AWK
       does not	automatically close pipes when they return EOF.
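
       The note above can be sketched as follows.  The command string
       "echo ping", the function name run_twice, and the path /no/such/file
       are assumptions chosen only for the demonstration.

```shell
# Hypothetical helper: run the same command twice through a pipe to getline.
run_twice() {
    awk 'BEGIN {
        cmd = "echo ping"
        for (i = 1; i <= 2; i++) {
            n = 0
            # getline returns 1 for a record, 0 at end of file, -1 on error
            while ((cmd | getline line) > 0)
                n++
            close(cmd)      # without this, the second pass would read nothing
            print "pass " i ": " n " line(s), last = " line
        }
        r = (getline x < "/no/such/file")   # assumed not to exist
        print "missing file: " r
    }'
}

run_twice
```

       Comparing the result of getline with > 0 matters: a loop written as
       while (cmd | getline) would spin forever on an error, since -1 is
       also treated as true.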

   The printf Statement
       The AWK versions	of the printf statement	and  sprintf()	function  (see
       below) accept the following conversion specification formats:

       %c     An  ASCII	character.  If the argument used for %c	is numeric, it
	      is treated as a character	and printed.  Otherwise, the  argument
	      is assumed to be a string, and only the first character of that
	      string is printed.

       %d
       %i     A	decimal	number (the integer part).

       %e
       %E     A	floating point number of the form [-]d.dddddde[+-]dd.  The  %E
	      format uses E instead of e.

       %f     A	floating point number of the form [-]ddd.dddddd.

       %g
       %G     Use  %e or %f conversion,	whichever is shorter, with nonsignifi-
	      cant zeros suppressed.  The %G format uses %E instead of %e.

       %o     An unsigned octal	number (also an	integer).

       %u     An unsigned decimal number (again, an integer).

       %s     A	character string.

       %x
       %X     An unsigned hexadecimal number (an integer).  The	%X format uses
	      ABCDEF instead of	abcdef.

       %%     A	single % character; no argument	is converted.

       There  are  optional,  additional parameters that may lie between the %
       and the control letter:

       -      The expression should be left-justified within its field.

       space  For numeric conversions, prefix positive values  with  a	space,
	      and negative values with a minus sign.

       +      The  plus	sign, used before the width modifier (see below), says
	      to always	supply a sign for numeric  conversions,	 even  if  the
	      data to be formatted is positive.  The + overrides the space
	      modifier.

       #      Use an "alternate	form" for certain control  letters.   For  %o,
	      supply  a	 leading zero.	For %x,	and %X,	supply a leading 0x or
	      0X for a nonzero result.	For %e,	%E, and	%f,  the  result  will
	      always  contain a	decimal	point.	For %g,	and %G,	trailing zeros
	      are not removed from the result.

       0      A	leading	0 (zero) acts as a flag, that indicates	output	should
	      be  padded  with zeroes instead of spaces.  This applies even to
	      non-numeric output formats.  This	flag only has an  effect  when
	      the field	width is wider than the	value to be printed.

       width  The field	should be padded to this width.	 The field is normally
	      padded with spaces.  If the 0 flag has been used,	it  is	padded
	      with zeroes.

       .prec  A	number that specifies the precision to use when	printing.  For
	      the %e, %E, and %f formats, this specifies the number of	digits
	      you want printed to the right of the decimal point.  For the %g,
	      and %G formats, it specifies the maximum number  of  significant
	      digits.	For  the %d, %o, %i, %u, %x, and %X formats, it	speci-
	      fies the minimum number of digits	to print.  For	a  string,  it
	      specifies	 the maximum number of characters from the string that
	      should be	printed.

       The dynamic width and prec capabilities of the ANSI C printf() routines
       are supported.  A * in place of either the width	or prec	specifications
       will cause their	values to be taken from	the argument list to printf or
       sprintf().

   Special File	Names
       When  doing I/O redirection from	either print or	printf into a file, or
       via getline from	a file,	 gawk  recognizes  certain  special  filenames
       internally.   These  filenames  allow  access  to open file descriptors
       inherited from gawk's parent process (usually the shell).   Other  spe-
       cial  filenames	provide	 access	 to information	about the running gawk
       process.	 The filenames are:

       /dev/pid	   Reading this	file returns the process  ID  of  the  current
		   process, in decimal,	terminated with	a newline.

       /dev/ppid   Reading this	file returns the parent	process	ID of the cur-
		   rent	process, in decimal, terminated	with a newline.

       /dev/pgrpid Reading this	file returns the process group ID of the  cur-
		   rent	process, in decimal, terminated	with a newline.

       /dev/user   Reading this	file returns a single record terminated	with a
		   newline.  The fields	are separated with spaces.  $1 is  the
		   value  of the getuid(2) system call,	$2 is the value	of the
		   geteuid(2) system call, $3 is the value  of	the  getgid(2)
		   system  call,  and $4 is the	value of the getegid(2)	system
		   call.  If there are any additional  fields,	they  are  the
		   group  IDs  returned	 by getgroups(2).  Multiple groups may
		   not be supported on all systems.

       /dev/stdin  The standard	input.

       /dev/stdout The standard	output.

       /dev/stderr The standard	error output.

       /dev/fd/n   The file associated with the	open file descriptor n.

       These are particularly useful for error messages.  For example:

	      print "You blew it!" > "/dev/stderr"

       whereas you would otherwise have	to use

	      print "You blew it!" | "cat 1>&2"

       These file names	may also be used on the	 command  line	to  name  data
       files.
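
       As a small sketch of the error-message use, the function below sends
       one line to standard output and one to standard error.  The name
       warn_demo and both messages are invented for the example; on systems
       that provide a real /dev/stderr, other awk implementations behave
       the same way.

```shell
# Hypothetical helper: separate normal output from diagnostics.
warn_demo() {
    awk 'BEGIN {
        print "result line"                             # standard output
        print "warning: check input" > "/dev/stderr"    # standard error
    }'
}

# Discarding standard error leaves only the normal output visible.
warn_demo 2>/dev/null
```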

   Numeric Functions
       AWK has the following pre-defined arithmetic functions:

       atan2(y,	x)   returns the arctangent of y/x in radians.

       cos(expr)     returns the cosine	of expr, which is in radians.

       exp(expr)     the exponential function.

       int(expr)     truncates to integer.

       log(expr)     the natural logarithm function.

       rand()	     returns a random number between 0 and 1.

       sin(expr)     returns the sine of expr, which is	in radians.

       sqrt(expr)    the square	root function.

       srand([expr]) uses  expr	as a new seed for the random number generator.
		     If	no expr	is provided, the time of  day  will  be	 used.
		     The return	value is the previous seed for the random num-
		     ber generator.
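
       The arithmetic functions can be tried out with a sketch such as the
       one below.  The function name math_demo and the seed value 42 are
       arbitrary assumptions.

```shell
# Hypothetical helper: exercise the built-in arithmetic functions.
math_demo() {
    awk 'BEGIN {
        pi = atan2(0, -1)                     # arctangent of 0/-1 is pi
        printf "%.4f %d %d %.1f\n", pi, int(3.9), int(-3.9), sqrt(16)
        printf "%.3f %.3f\n", exp(1), log(exp(2))
        srand(42); a = rand()                 # reseeding with the same value
        srand(42); b = rand()                 # reproduces the same sequence
        same = (a == b) ? "same sequence" : "different"
        inrange = (a >= 0 && a < 1)
        print same, inrange
    }'
}

math_demo
```

       Note that int() truncates toward zero, so int(-3.9) yields -3 rather
       than -4.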

   String Functions
       Gawk has	the following pre-defined string functions:

       gensub(r, s, h [, t])   search the target string	t for matches  of  the
			       regular	expression r.  If h is a string	begin-
			       ning with g or G, then replace all matches of r
			       with  s.	  Otherwise,  h	is a number indicating
			       which match of r	to replace.  If	no t  is  sup-
			       plied, $0 is used instead.  Within the replace-
			       ment text s, the	sequence  \n,  where  n	 is  a
			       digit from 1 to 9, may be used to indicate just
			       the text	that matched  the  n'th	 parenthesized
			       subexpression.	The sequence \0	represents the
			       entire matched text, as does the	 character  &.
			       Unlike sub() and	gsub(),	the modified string is
			       returned	as the result of the function, and the
			       original	target string is not changed.

       gsub(r, s [, t])	       for each	substring matching the regular expres-
			       sion r in the string t, substitute  the	string
			       s,  and return the number of substitutions.  If
			       t is  not  supplied,  use  $0.	An  &  in  the
			       replacement text	is replaced with the text that
			       was actually matched.  Use \& to	get a  literal
			       &.   See	Effective AWK Programming for a	fuller
			       discussion of the rules for &'s and backslashes
			       in the replacement text of sub(), gsub(), and
			       gensub().

       index(s,	t)	       returns the index of the	string t in the	string
			       s, or 0 if t is not present.

       length([s])	       returns	the  length  of	 the  string s,	or the
			       length of $0 if s is not	supplied.

       match(s,	r)	       returns the position in	s  where  the  regular
			       expression  r occurs, or	0 if r is not present,
			       and sets	the values of RSTART and RLENGTH.

       split(s,	a [, r])       splits the string s into	the  array  a  on  the
			       regular expression r, and returns the number of
			       fields.	If r is	omitted, FS is	used  instead.
			       The   array  a  is  cleared  first.   Splitting
			       behaves	 identically   to   field   splitting,
			       described above.

       sprintf(fmt, expr-list) prints  expr-list according to fmt, and returns
			       the resulting string.

       sub(r, s	[, t])	       just like gsub(), but only the  first  matching
			       substring is replaced.

       substr(s, i [, n])      returns	the at most n-character	substring of s
			       starting	at i.  If n is omitted,	the rest of  s
			       is used.

       tolower(str)	       returns	a copy of the string str, with all the
			       upper-case  characters  in  str	translated  to
			       their  corresponding  lower-case	 counterparts.
			       Non-alphabetic characters are left unchanged.

       toupper(str)	       returns a copy of the string str, with all  the
			       lower-case  characters  in  str	translated  to
			       their  corresponding  upper-case	 counterparts.
			       Non-alphabetic characters are left unchanged.
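
       The sketch below exercises the string functions that are also
       available in POSIX awk; gensub() is left out because it is specific
       to gawk.  The function name string_demo and the sample strings are
       assumptions made for the example.

```shell
# Hypothetical helper: exercise the portable string functions.
string_demo() {
    awk 'BEGIN {
        s = "hello world"
        n = gsub(/o/, "[&]", s)              # & stands for the matched text
        print n, s
        t = "hello world"
        sub(/o/, "0", t)                     # only the first match is replaced
        print t
        print index("hello world", "wor"), length("hello")
        pos = match("foobar123", /[0-9]+/)   # also sets RSTART and RLENGTH
        print pos, RSTART, RLENGTH
        cnt = split("a:b:c", parts, ":")
        print cnt, parts[1], parts[3]
        print substr("hello", 2, 3), toupper("abc"), tolower("ABC")
    }'
}

string_demo
```

       Both sub() and gsub() modify their target in place and return a
       count, so the changed text is read back from s and t.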

   Time	Functions
       Since  one  of the primary uses of AWK programs is processing log files
       that contain time stamp information, gawk provides  the	following  two
       functions for obtaining time stamps and formatting them.

       systime() returns the current time of day as the number of seconds
		 since the Epoch (Midnight UTC, January 1, 1970 on POSIX systems).

       strftime([format	[, timestamp]])
		 formats  timestamp  according to the specification in format.
		 The timestamp should be of the	same form as returned by  sys-
		 time().   If timestamp	is missing, the	current	time of	day is
		 used.	If format is missing, a	default	format	equivalent  to
		 the  output  of  date(1) will be used.	 See the specification
		 for the strftime() function in	ANSI C for the format  conver-
		 sions	that  are guaranteed to	be available.  A public-domain
		 version of strftime(3)	and a man page for it come with	 gawk;
		 if  that version was used to build gawk, then all of the con-
		 versions described in that man	page are available to gawk.

   String Constants
       String constants	in AWK are sequences of	 characters  enclosed  between
       double quotes (").  Within strings, certain escape sequences are	recog-
       nized, as in C.	These are:

       \\   A literal backslash.

       \a   The	"alert"	character; usually the ASCII BEL character.

       \b   backspace.

       \f   form-feed.

       \n   newline.

       \r   carriage return.

       \t   horizontal tab.

       \v   vertical tab.

       \xhex digits
	    The	character represented by the string of hexadecimal digits fol-
	    lowing the \x.  As in ANSI C, all following	hexadecimal digits are
	    considered part of the escape sequence.  (This feature should tell
	    us something about language	design by committee.)  E.g., "\x1B" is
	    the	ASCII ESC (escape) character.

       \ddd The	character represented by the 1-, 2-, or	 3-digit  sequence  of
	    octal digits.  E.g., "\033"	is the ASCII ESC (escape) character.

       \c   The	literal	character c.

       The  escape  sequences may also be used inside constant regular expres-
       sions (e.g., /[ \t\f\n\r\v]/ matches whitespace characters).

       In compatibility	mode, the characters represented by octal and hexadec-
       imal  escape  sequences	are treated literally when used	in regexp con-
       stants.	Thus, /a\52b/ is equivalent to /a\*b/.
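
       A brief sketch of the escape sequences in string constants and in a
       regexp constant.  Only escapes that are portable across awk
       implementations are used (\x is omitted); the function name
       escape_demo is an assumption.

```shell
# Hypothetical helper: show escape sequences in strings and regexps.
escape_demo() {
    awk 'BEGIN {
        print "A\tB"                        # \t is a horizontal tab
        print "\101\102\103"                # octal escapes for A, B and C
        print "say \"hi\""                  # \c gives the literal character c
        print "back\\slash"                 # \\ is a literal backslash
        n = split("a \t b", f, /[ \t]+/)    # escapes also work inside regexps
        print n
    }'
}

escape_demo
```

       The split() call sees one run of space, tab, space between "a" and
       "b", so it reports two fields.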

       Functions in AWK	are defined as follows:

	      function name(parameter list) { statements }

       Functions are executed when they	are called from	within expressions  in
       either patterns or actions.  Actual parameters supplied in the function
       call are	used to	instantiate the	 formal	 parameters  declared  in  the
       function.   Arrays  are passed by reference, other variables are	passed
       by value.

       Since functions were not	originally part	of the AWK language, the  pro-
       vision for local	variables is rather clumsy: They are declared as extra
       parameters in the parameter list.  The convention is to separate	 local
       variables  from	real parameters	by extra spaces	in the parameter list.
       For example:

	      function	f(p, q,	    a, b)   # a	& b are	local

	      /abc/	{ ... ;	f(1, 2)	; ... }

       The left	parenthesis in a function call is required to immediately fol-
       low the function	name, without any intervening white space.  This is to
       avoid a syntactic ambiguity  with  the  concatenation  operator.	  This
       restriction does	not apply to the built-in functions listed above.

       Functions  may  call each other and may be recursive.  Function parame-
       ters used as local variables are	initialized to the null	string and the
       number zero upon	function invocation.
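
       The points above (local variables declared as extra parameters,
       recursion, arrays passed by reference, scalars passed by value) can
       be sketched as follows.  The names func_demo, fact, fill, and bump
       are hypothetical and exist only in this example.

```shell
# Hypothetical helper: user-defined functions in an awk program.
func_demo() {
    awk '
    # fact: n is a real parameter; r, after the extra spaces, is a local.
    function fact(n,    r) {
        if (n <= 1)
            return 1
        r = n * fact(n - 1)      # recursion; each call gets its own r
        return r
    }
    # fill: arrays are passed by reference, so the caller sees the change.
    function fill(arr, k,    i) {
        for (i = 1; i <= k; i++)
            arr[i] = i * i
    }
    # bump: scalars are passed by value, so x in the caller is untouched.
    function bump(v) { v++ }
    BEGIN {
        print fact(5)
        fill(sq, 3); print sq[1], sq[2], sq[3]
        x = 10; bump(x); print x
    }'
}

func_demo
```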

       Use return expr to return a value from a	function.  The return value is
       undefined if no value is	 provided,  or	if  the	 function  returns  by
       "falling	off" the end.

       If  --lint  has	been provided, gawk will warn about calls to undefined
       functions at parse time,	instead	of at run time.	 Calling an  undefined
       function	at run time is a fatal error.

       The word	func may be used in place of function.

       Print and sort the login	names of all users:

	    BEGIN     {	FS = ":" }
		 { print $1 | "sort" }

       Count lines in a	file:

		 { nlines++ }
	    END	 { print nlines	}

       Precede each line by its	number in the file:

	    { print FNR, $0 }

       Concatenate and line number (a variation	on a theme):

	    { print NR,	$0 }

       egrep(1),  getpid(2),  getppid(2),  getpgrp(2),	getuid(2), geteuid(2),
       getgid(2), getegid(2), getgroups(2)

       The AWK Programming Language, Alfred V. Aho, Brian W. Kernighan,	 Peter
       J. Weinberger, Addison-Wesley, 1988.  ISBN 0-201-07981-X.

       Effective  AWK Programming, Edition 1.0,	published by the Free Software
       Foundation, 1995.

       A primary goal for gawk is compatibility	with the  POSIX	 standard,  as
       well  as	with the latest	version	of UNIX	awk.  To this end, gawk	incor-
       porates the following user visible features which are not described  in
       the  AWK	book, but are part of the Bell Labs version of awk, and	are in
       the POSIX standard.

       The -v option for assigning variables before program  execution	starts
       is  new.	 The book indicates that command line variable assignment hap-
       pens when awk would otherwise open the argument as  a  file,  which  is
       after  the  BEGIN  block	 is executed.  However,	in earlier implementa-
       tions, when such	an assignment appeared	before	any  file  names,  the
       assignment  would  happen before	the BEGIN block	was run.  Applications
       came to depend on this "feature."  When awk was changed	to  match  its
       documentation,  this  option was	added to accommodate applications that
       depended	upon the old behavior.	(This feature was agreed upon by  both
       the AT&T	and GNU	developers.)
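
       The difference between -v and a command line variable assignment
       can be observed with a sketch like the one below.  The variable
       names x and y and the function name assign_demo are arbitrary; input
       is taken from /dev/null so that only BEGIN and END run.

```shell
# Hypothetical helper: compare -v with a command-line assignment operand.
assign_demo() {
    # x is set with -v, so BEGIN already sees it; y is an operand, so it
    # is assigned only when awk reaches it while looking for input files.
    awk -v x=1 'BEGIN { printf "BEGIN: x=[%s] y=[%s]\n", x, y }
                END   { printf "END:   x=[%s] y=[%s]\n", x, y }' y=2 </dev/null
}

assign_demo
```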

       The -W option for implementation specific features is from the POSIX
       standard.

       When processing arguments, gawk uses the	special	option "--" to	signal
       the  end	 of arguments.	In compatibility mode, it will warn about, but
       otherwise ignore, undefined options.  In	normal operation,  such	 argu-
       ments are passed	on to the AWK program for it to	process.

       The  AWK	 book  does not	define the return value	of srand().  The POSIX
       standard	has it return the seed it was using, to	allow keeping track of
       random  number  sequences.   Therefore srand() in gawk also returns its
       current seed.

       Other new features are: The use of multiple -f options (from MKS	 awk);
       the  ENVIRON array; the \a, and \v escape sequences (done originally in
       gawk and	fed back into AT&T's); the tolower()  and  toupper()  built-in
       functions  (from	 AT&T);	 and  the  ANSI	C conversion specifications in
       printf (done first in AT&T's version).

       Gawk has	a number of extensions to POSIX	awk.  They  are	 described  in
       this  section.	All  the  extensions described here can	be disabled by
       invoking	gawk with the --traditional option.

       The following features of gawk are not available	in POSIX awk.

	      o	The \x escape sequence.	 (Disabled with	--posix.)

	      o	The fflush() function.	(Disabled with --posix.)

	      o	The systime(), strftime(), and gensub()	functions.

	      o	The special file names available for I/O redirection  are  not
		recognized.

	      o	The ARGIND, ERRNO, and RT variables are	not special.

	      o	The IGNORECASE variable and its side-effects are not available.

	      o	The FIELDWIDTHS	variable and fixed-width field splitting.

	      o	The use	of RS as a regular expression.

	      o	The ability to split out individual characters using the  null
		string as the value of FS, and as the third argument to
		split().

	      o	No path	search is performed for	files named via	the -f option.
		Therefore the AWKPATH environment variable is not special.

	      o	The use	of nextfile to abandon processing of the current input
		file.

	      o	The use	of delete array	to delete the entire  contents	of  an
		array.

       The  AWK	book does not define the return	value of the close() function.
       Gawk's close() returns the value	from  fclose(3),  or  pclose(3),  when
       closing a file or pipe, respectively.

       When  gawk is invoked with the --traditional option, if the fs argument
       to the -F option	is "t",	then FS	will be	 set  to  the  tab  character.
       Note that typing gawk -F\t ... simply causes the shell to quote the
       "t", and does not pass "\t" to the -F option.  Since this is a rather
       ugly special case, it is not the default behavior.  This behavior also
       does not occur if --posix has been specified.  To really get a tab
       character as the field separator, it is best to use quotes: gawk -F'\t'
       ...
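
       A minimal sketch of the quoted form, using a made-up function name
       second_field and a sample line containing one tab:

```shell
# Hypothetical helper: split input on tabs and print the second field.
second_field() {
    awk -F'\t' '{ print $2 }'
}

# The spaces inside each field are preserved because only the tab splits.
printf 'one two\tthree four\n' | second_field
```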

       There are two features of historical AWK	implementations	that gawk sup-
       ports.	First,	it  is possible	to call	the length() built-in function
       not only	with no	argument, but even without parentheses!	 Thus,

	      a	= length     # Holy Algol 60, Batman!

       is the same as either of

	      a	= length()
	      a	= length($0)

       This feature is marked as "deprecated" in the POSIX standard, and  gawk
       will  issue  a warning about its	use if --lint is specified on the com-
       mand line.
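
       The three spellings can be compared side by side with a sketch such
       as this one; the function name line_lengths is an assumption.

```shell
# Hypothetical helper: print the record length using all three spellings.
line_lengths() {
    awk '{ print length, length(), length($0) }'
}

printf 'hello\n' | line_lengths
```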

       The other feature is the	use of either the continue or the break	state-
       ments  outside  the  body of a while, for, or do	loop.  Traditional AWK
       implementations have treated such  usage	 as  equivalent	 to  the  next
       statement.   Gawk  will	support	 this  usage if	--traditional has been

       If POSIXLY_CORRECT exists in the	environment, then gawk behaves exactly
       as  if  --posix	had been specified on the command line.	 If --lint has
       been specified, gawk will issue a warning message to this effect.

       The AWKPATH environment variable	can be	used  to  provide  a  list  of
       directories  that gawk will search when looking for files named via the
       -f and --file options.

       The -F option is	not necessary given the	command	line variable  assign-
       ment feature; it	remains	only for backwards compatibility.

       If  your	 system	 actually  has	support	for /dev/fd and	the associated
       /dev/stdin, /dev/stdout,	and /dev/stderr	files, you may	get  different
       output  from  gawk  than	you would get on a system without those	files.
       When gawk interprets these files	internally, it synchronizes output  to
       the  standard output with output	to /dev/stdout,	while on a system with
       those files, the	output is actually to different	 open  files.	Caveat
       Emptor.

       Syntactically  invalid  single  character programs tend to overflow the
       parse stack, generating a rather	unhelpful message.  Such programs  are
       surprisingly  difficult to diagnose in the completely general case, and
       the effort to do	so really is not worth it.

       This man	page documents gawk, version 3.0.6.

       The original version of UNIX awk	was designed and implemented by	Alfred
       Aho,  Peter  Weinberger,	 and Brian Kernighan of	AT&T Bell Labs.	 Brian
       Kernighan continues to maintain and enhance it.

       Paul Rubin and Jay Fenlason, of the  Free  Software  Foundation,	 wrote
       gawk,  to be compatible with the	original version of awk	distributed in
       Seventh Edition UNIX.  John Woods contributed a number  of  bug	fixes.
       David  Trueman,	with contributions from	Arnold Robbins,	made gawk com-
       patible with the	new version of UNIX awk.  Arnold Robbins is  the  cur-
       rent maintainer.

       The  initial  DOS  port	was  done  by Conrad Kwok and Scott Garfinkle.
       Scott Deifik is the current DOS maintainer.  Pat	Rankin did the port to
       VMS,  and  Michal Jaegermann did	the port to the	Atari ST.  The port to
       OS/2 was	done by	Kai Uwe	Rommel,	with contributions and help from  Dar-
       rel Hankerson.  Fred Fish supplied support for the Amiga.

       If you find a bug in gawk, please send electronic mail to bug-
       Please include your operating system and its revision, the version
       of gawk, what C compiler you used to compile it, and a test program
       and data that are as small as possible for reproducing the problem.

       Before  sending a bug report, please do two things.  First, verify that
       you have	the latest version of gawk.  Many bugs (usually	 subtle	 ones)
       are fixed at each release, and if yours is out of date, the problem may
       already have been solved.  Second, please read this man	page  and  the
       reference  manual  carefully  to	 be  sure that what you	think is a bug
       really is, instead of just a quirk in the language.

       Whatever	you do,	do NOT post a bug report in comp.lang.awk.  While  the
       gawk  developers	 occasionally read this	newsgroup, posting bug reports
       there is	an unreliable way to report bugs.   Instead,  please  use  the
       electronic mail addresses given above.

       Brian  Kernighan	of Bell	Labs provided valuable assistance during test-
       ing and debugging.  We thank him.

       Copyright (C) 1996-2000 Free Software Foundation, Inc.

       Permission is granted to	make and distribute verbatim  copies  of  this
       manual  page  provided  the copyright notice and	this permission	notice
       are preserved on	all copies.

       Permission is granted to	copy and distribute modified versions of  this
       manual  page  under  the	conditions for verbatim	copying, provided that
       the entire resulting derived work is distributed	under the terms	 of  a
       permission notice identical to this one.

       Permission  is granted to copy and distribute translations of this man-
       ual page	into another language, under the above conditions for modified
       versions,  except that this permission notice may be stated in a	trans-
       lation approved by the Foundation.

Free Software Foundation	  May 17 2000			       GAWK(1)

