GAWK(1)			       Utility Commands			       GAWK(1)

       gawk - pattern scanning and processing language

       gawk [ POSIX or GNU style options ] -f program-file [ --	] file ...
       gawk [ POSIX or GNU style options ] [ --	] program-text file ...

       Gawk  is	 the  GNU Project's implementation of the AWK programming lan-
       guage.  It conforms to the definition of	 the  language	in  the	 POSIX
       1003.2  Command	Language And Utilities Standard.  This version in turn
       is based	on the description in The AWK Programming  Language,  by  Aho,
       Kernighan,  and	Weinberger,  with the additional features found	in the
       System V	Release	4 version of UNIX awk.	Gawk also provides more	recent
       Bell Labs awk extensions, and some GNU-specific extensions.

       The  command  line  consists of options to gawk itself, the AWK program
       text (if	not supplied via the -f	or --file options), and	values	to  be
       made available in the ARGC and ARGV pre-defined AWK variables.

       Gawk options may	be either the traditional POSIX	one letter options, or
       the GNU style long options.  POSIX options start	 with  a  single  "-",
       while long options start	with "--".  Long options are provided for both
       GNU-specific features and for POSIX mandated features.

       Following the POSIX standard, gawk-specific options  are	 supplied  via
       arguments  to  the -W option.  Multiple -W options may be supplied.  Each
       -W option has a corresponding long option, as  detailed	below.	 Argu-
       ments  to  long options are either joined with the option by an = sign,
       with no intervening spaces, or they may be provided in the next command
       line  argument.	Long options may be abbreviated, as long as the	abbre-
       viation remains unique.

       Gawk accepts the	following options.

       -F fs
       --field-separator fs
	      Use fs for the input field separator (the	value of the FS	prede-
	      fined variable).

       -v var=val
       --assign	var=val
              Assign the value val to the variable var before execution of
              the program begins.  Such variable values are available to the
              BEGIN block of an AWK program.

       -f program-file
       --file program-file
	      Read  the	AWK program source from	the file program-file, instead
	      of from the  first  command  line	 argument.   Multiple  -f  (or
	      --file) options may be used.

       -mf NNN
       -mr NNN
	      Set various memory limits	to the value NNN.  The f flag sets the
	      maximum number of	fields,	and the	r flag sets the	maximum	record
	      size.   These two	flags and the -m option	are from the Bell Labs
	      research version of UNIX awk.  They are ignored by  gawk,	 since
	      gawk has no pre-defined limits.

       -W traditional
       -W compat
	      Run  in compatibility mode.  In compatibility mode, gawk behaves
	      identically to UNIX awk; none of the GNU-specific	extensions are
	      recognized.   The	 use  of  --traditional	 is preferred over the
              other forms of this option.  See GNU EXTENSIONS, below, for more
              information.

       -W copyleft
       -W copyright
              Print the short version of the GNU copyright information message
              on the standard output, and exit successfully.

       -W help
       -W usage
	      Print a relatively short summary of the available	options	on the
	      standard	output.	  (Per the GNU Coding Standards, these options
	      cause an immediate, successful exit.)

       -W lint
       --lint Provide warnings about constructs	that are dubious or non-porta-
	      ble to other AWK implementations.

       -W lint-old
	      Provide  warnings	 about constructs that are not portable	to the
	      original version of Unix awk.

       -W posix
              This turns on compatibility mode, with the following additional
              restrictions:

	      o	\x escape sequences are	not recognized.

              o Only space and tab act as field separators when FS is set to a
                single space; newline does not.

	      o	The synonym func for the keyword function is not recognized.

	      o	The operators ** and **= cannot	be used	in place of ^ and ^=.

	      o	The fflush() function is not available.

       -W re-interval
	      Enable the use of	interval  expressions  in  regular  expression
	      matching (see Regular Expressions, below).  Interval expressions
	      were not traditionally available in the AWK language.  The POSIX
	      standard	added them, to make awk	and egrep consistent with each
	      other.  However, their use is likely to break old	AWK  programs,
	      so  gawk	only provides them if they are requested with this op-
	      tion, or when --posix is specified.

       -W source program-text
       --source	program-text
	      Use program-text as AWK program source code.  This option	allows
	      the  easy	 intermixing of	library	functions (used	via the	-f and
	      --file options) with source code entered on  the	command	 line.
	      It  is  intended primarily for medium to large AWK programs used
	      in shell scripts.

       -W version
	      Print version information	for this particular copy  of  gawk  on
	      the  standard  output.  This is useful mainly for	knowing	if the
	      current copy of gawk on your system is up	to date	 with  respect
	      to  whatever the Free Software Foundation	is distributing.  This
	      is also useful when reporting bugs.  (Per	the GNU	 Coding	 Stan-
	      dards, these options cause an immediate, successful exit.)

       --     Signal  the end of options.  This	is useful to allow further ar-
	      guments to the AWK program itself	to start with a	"-".  This  is
	      mainly for consistency with the argument parsing convention used
	      by most other POSIX programs.

       In compatibility	mode, any other	options	are flagged  as	 illegal,  but
       are  otherwise  ignored.	  In normal operation, as long as program text
       has been	supplied, unknown options are passed on	to the AWK program  in
       the ARGV	array for processing.  This is particularly useful for running
       AWK programs via	the "#!" executable interpreter	mechanism.
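
       As a minimal sketch of the -F and -v options described above, the
       following assumes that awk on the search path is gawk or another POSIX
       awk.  The variable name label and the sample input are made up for
       illustration.

```shell
# Split each record on ":" and pass a value into the program before it starts.
# "label" is a hypothetical variable name, not one predefined by gawk.
out=$(printf 'root:x:0\n' | awk -F: -v label=user '{ print label "=" $1 }')
echo "$out"    # user=root
```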

       An AWK program consists of a sequence of	pattern-action statements  and
       optional	function definitions.

	      pattern	{ action statements }
	      function name(parameter list) { statements }

       Gawk  first reads the program source from the program-file(s) if	speci-
       fied, from arguments to --source, or from the first non-option argument
       on  the command line.  The -f and --source options may be used multiple
       times on	the command line.  Gawk	will read the program text as  if  all
       the  program-files  and command line source texts had been concatenated
       together.  This is useful for  building	libraries  of  AWK  functions,
       without	having to include them in each new AWK program that uses them.
       It also provides the ability to mix library functions with command line
       programs.

       The  environment	 variable  AWKPATH specifies a search path to use when
       finding source files named with the -f option.  If this	variable  does
       not  exist,  the	default	path is	".:/usr/local/share/awk".  (The	actual
       directory may vary, depending upon how gawk was built  and  installed.)
       If a file name given to the -f option contains a	"/" character, no path
       search is performed.

       Gawk executes AWK programs in the following order.  First, all variable
       assignments specified via the -v	option are performed.  Next, gawk com-
       piles the program into an internal form.	 Then, gawk executes the  code
       in  the	BEGIN  block(s)	 (if any), and then proceeds to	read each file
       named in	the ARGV array.	 If there are no files named  on  the  command
       line, gawk reads	the standard input.

       If a filename on	the command line has the form var=val it is treated as
       a variable assignment.  The variable var	will  be  assigned  the	 value
       val.   (This  happens after any BEGIN block(s) have been	run.)  Command
       line variable assignment	is most	useful for dynamically assigning  val-
       ues  to	the  variables	AWK  uses  to control how input	is broken into
       fields and records.  It is also useful for controlling state if	multi-
       ple passes are needed over a single data	file.
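
       The timing of command line assignments can be seen in a small sketch
       (assuming awk is gawk or another POSIX awk; the variable n and the
       input line are illustrative only).  The BEGIN block runs before the
       operands n=7 and FS=: are processed, so it sees n as empty.

```shell
# n is still empty while BEGIN runs; the operands n=7 and FS=: take effect
# only when awk reaches them in ARGV, just before it reads "-" (standard input).
out=$(printf 'a:b:c\n' |
    awk 'BEGIN { print "BEGIN sees [" n "]" } { print n, $2 }' n=7 FS=: -)
echo "$out"
```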

       If  the value of	a particular element of	ARGV is	empty (""), gawk skips
       over it.

       For each	record in the input, gawk tests	to see if it matches any  pat-
       tern in the AWK program.	 For each pattern that the record matches, the
       associated action is executed.  The patterns are	tested	in  the	 order
       they occur in the program.

       Finally,	 after	all  the input is exhausted, gawk executes the code in
       the END block(s)	(if any).

       AWK variables are dynamic; they come into existence when	they are first
       used.   Their  values  are either floating-point	numbers	or strings, or
       both, depending upon how	they are used.	AWK also has  one  dimensional
       arrays; arrays with multiple dimensions may be simulated.  Several pre-
       defined variables are set as a program runs; these will be described as
       needed and summarized below.

       Normally, records are separated by newline characters.  You can control
       how records are separated by assigning values to	the built-in  variable
       RS.   If	 RS is any single character, that character separates records.
       Otherwise, RS is	a regular expression.  Text in the input that  matches
       this regular expression will separate the record.  However, in compati-
       bility mode, only the first character of	its string value is  used  for
       separating  records.  If	RS is set to the null string, then records are
       separated by blank lines.  When RS is set to the	null string, the  new-
       line  character	always acts as a field separator, in addition to what-
       ever value FS may have.
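
       A short sketch of the null-string case, assuming awk is gawk or
       another POSIX awk and using made-up input: with RS set to "", the
       blank line separates two records, and the newline inside the first
       record acts as a field separator.

```shell
# Two blank-line-separated records; "a b\nc" splits into three fields.
out=$(printf 'a b\nc\n\nd e\n' |
    awk 'BEGIN { RS = "" } { print NR ": " NF " fields, last=" $NF }')
echo "$out"
```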

       As each input record is read, gawk splits the record into fields, using
       the value of the	FS variable as the field separator.  If	FS is a	single
       character, fields are separated by that character.  If FS is  the  null
       string,	then each individual character becomes a separate field.  Oth-
       erwise, FS is expected to be a full regular expression.	In the special
       case  that FS is	a single space,	fields are separated by	runs of	spaces
       and/or tabs and/or newlines.  (But see the discussion of	 --posix,  be-
       low).   Note  that the value of IGNORECASE (see below) will also	affect
       how fields are split when FS is a regular expression, and  how  records
       are separated when RS is	a regular expression.

       If  the	FIELDWIDTHS  variable is set to	a space	separated list of num-
       bers, each field	is expected to have fixed width, and gawk  will	 split
       up  the record using the	specified widths.  The value of	FS is ignored.
       Assigning a new value to	FS overrides the use of	FIELDWIDTHS,  and  re-
       stores the default behavior.

       Each  field  in the input record	may be referenced by its position, $1,
       $2, and so on.  $0 is the whole record.	The value of a	field  may  be
       assigned	to as well.  Fields need not be	referenced by constants:

	      n	= 5
	      print $n

       prints  the fifth field in the input record.  The variable NF is	set to
       the total number	of fields in the input record.

       References to non-existent fields (i.e. fields after $NF)  produce  the
       null-string.  However, assigning	to a non-existent field	(e.g., $(NF+2)
       = 5) will increase the value of NF, create any intervening fields  with
       the  null string	as their value,	and cause the value of $0 to be	recom-
       puted, with the fields being separated by the value of OFS.  References
       to  negative  numbered  fields  cause  a	 fatal error.  Decrementing NF
       causes the values of fields past	the new	value  to  be  lost,  and  the
       value  of  $0  to be recomputed,	with the fields	being separated	by the
       value of	OFS.
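
       The effect of assigning to a non-existent field can be sketched as
       follows (assuming awk is gawk or another POSIX awk; the input and the
       choice of "-" for OFS are illustrative):

```shell
# Assigning $5 on a three-field record creates an empty $4, raises NF to 5,
# and rebuilds $0 with the fields joined by OFS.
out=$(echo 'a b c' | awk 'BEGIN { OFS = "-" } { $5 = "e"; print; print NF }')
echo "$out"
```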

   Built-in Variables
       Gawk's built-in variables are:

       ARGC	   The number of command line arguments	(does not include  op-
		   tions to gawk, or the program source).

       ARGIND	   The index in	ARGV of	the current file being processed.

       ARGV	   Array of command line arguments.  The array is indexed from
		   0 to	ARGC - 1.  Dynamically changing	the contents  of  ARGV
		   can control the files used for data.

       CONVFMT	   The conversion format for numbers, "%.6g", by default.

       ENVIRON	   An  array containing	the values of the current environment.
		   The array is	indexed	by the environment variables, each el-
		   ement  being	 the  value  of	 that  variable	 (e.g.,	 ENVI-
		   RON["HOME"] might be	/home/arnold).	 Changing  this	 array
		   does	not affect the environment seen	by programs which gawk
		   spawns via redirection or the system() function.  (This may
		   change in a future version of gawk.)

       ERRNO	   If  a  system  error	 occurs	either doing a redirection for
		   getline, during a read for getline, or  during  a  close(),
		   then	ERRNO will contain a string describing the error.

       FIELDWIDTHS A  white-space  separated  list  of fieldwidths.  When set,
		   gawk	parses the input into fields of	fixed  width,  instead
		   of  using the value of the FS variable as the field separa-
		   tor.	 The fixed field width facility	is still experimental;
		   the semantics may change as gawk evolves over time.

       FILENAME	   The name of the current input file.	If no files are	speci-
		   fied	on the command line, the value	of  FILENAME  is  "-".
		   However, FILENAME is	undefined inside the BEGIN block.

       FNR	   The input record number in the current input	file.

       FS          The input field separator, a space by default.  See Fields,
                   above.

       IGNORECASE  Controls the	case-sensitivity of all	regular	expression and
		   string  operations.	 If  IGNORECASE	 has a non-zero	value,
		   then	string comparisons  and	 pattern  matching  in	rules,
		   field splitting with	FS, record separating with RS, regular
		   expression matching	with  ~	 and  !~,  and	the  gensub(),
		   gsub(),  index(),  match(),	split(), and sub() pre-defined
		   functions will all ignore case when doing  regular  expres-
		   sion	operations.  Thus, if IGNORECASE is not	equal to zero,
		   /aB/	matches	all of the strings "ab", "aB", "Ab", and "AB".
		   As  with all	AWK variables, the initial value of IGNORECASE
		   is zero, so all regular expression  and  string  operations
		   are	normally  case-sensitive.   Under  Unix,  the full ISO
		   8859-1 Latin-1 character set	is used	 when  ignoring	 case.
		   NOTE: In versions of	gawk prior to 3.0, IGNORECASE only af-
		   fected  regular  expression	operations.   It  now  affects
		   string comparisons as well.

       NF	   The number of fields	in the current input record.

       NR	   The total number of input records seen so far.

       OFMT	   The output format for numbers, "%.6g", by default.

       OFS	   The output field separator, a space by default.

       ORS	   The output record separator,	by default a newline.

       RS	   The input record separator, by default a newline.

       RT	   The record terminator.  Gawk	sets RT	to the input text that
                   matched the character or regular expression specified by
                   RS.

       RSTART	   The	index  of the first character matched by match(); 0 if
		   no match.

       RLENGTH     The length of the string matched by match(); -1 if no
                   match.

       SUBSEP	   The character used to separate multiple subscripts in array
		   elements, by	default	"\034".

       Arrays are subscripted with an expression between  square  brackets  ([
       and ]).	If the expression is an	expression list	(expr, expr ...)  then
       the array subscript is a	string consisting of the concatenation of  the
       (string)	value of each expression, separated by the value of the	SUBSEP
       variable.  This facility	is used	to simulate multiply  dimensioned  ar-
       rays.  For example:

	      i	= "A"; j = "B";	k = "C"
	      x[i, j, k] = "hello, world\n"

       assigns the string "hello, world\n" to the element of the array x which
       is indexed by the string	"A\034B\034C".	All arrays in AWK are associa-
       tive, i.e. indexed by string values.

       The  special operator in	may be used in an if or	while statement	to see
       if an array has an index	consisting of a	particular value.

	      if (val in array)
		   print array[val]

       If the array has	multiple subscripts, use (i, j)	in array.

       The in construct	may also be used in a for loop to iterate over all the
       elements	of an array.

       An  element  may	 be  deleted from an array using the delete statement.
       The delete statement may	also be	used to	delete the entire contents  of
       an array, just by specifying the	array name without a subscript.
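
       The array facilities above can be sketched together as follows.  This
       assumes awk is gawk or another POSIX awk; the array name x and the
       helper names key, part, and n are illustrative.

```shell
out=$(awk 'BEGIN {
    x["A", "B"] = "hello"                 # index is "A" SUBSEP "B"
    if (("A", "B") in x) print "found"    # membership test, two subscripts
    for (key in x) {                      # iterate over all indices
        split(key, part, SUBSEP)          # recover the individual subscripts
        print part[1], part[2]
    }
    delete x["A", "B"]                    # remove the single element
    n = 0
    for (key in x) n++
    print n                               # no elements remain
}')
echo "$out"
```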

   Variable Typing And Conversion
       Variables  and  fields  may be (floating	point) numbers,	or strings, or
       both.  How the value of a variable is interpreted depends upon its
       context.  If used in a numeric expression, it will be treated as a
       number; if used as a string, it will be treated as a string.

       To force	a variable to be treated as a number, add 0 to it; to force it
       to be treated as	a string, concatenate it with the null string.

       When  a	string must be converted to a number, the conversion is	accom-
       plished using atof(3).  A number	is converted to	a string by using  the
       value  of  CONVFMT  as a	format string for sprintf(3), with the numeric
       value of	the variable as	the argument.  However,	even though  all  num-
       bers in AWK are floating-point, integral	values are always converted as
       integers.  Thus,	given

	      CONVFMT =	"%2.2f"
	      a	= 12
	      b	= a ""

       the variable b has a string value of "12" and not "12.00".
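
       The same conversion rule can be checked from the shell.  This sketch
       assumes awk is gawk or another POSIX awk; the variables c and d are
       added here to contrast a non-integral value with the integral one.

```shell
out=$(awk 'BEGIN {
    CONVFMT = "%2.2f"
    a = 12;   b = a ""    # integral value: converted as an integer
    c = 12.5; d = c ""    # non-integral value: converted using CONVFMT
    print b
    print d
}')
echo "$out"
```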

       Gawk performs comparisons as follows: If	 two  variables	 are  numeric,
       they  are  compared numerically.	 If one	value is numeric and the other
       has a string value that is a "numeric  string,"	then  comparisons  are
       also  done numerically.	Otherwise, the numeric value is	converted to a
       string and a string comparison is performed.  Two strings are compared,
       of  course,  as	strings.  According to the POSIX standard, even	if two
       strings are numeric strings, a numeric comparison is  performed.	  How-
       ever, this is clearly incorrect,	and gawk does not do this.

       Note that string constants, such as "57", are not numeric strings; they
       are string constants.  The idea of "numeric string" only applies to
       fields,	getline	 input,	 FILENAME, ARGV	elements, ENVIRON elements and
       the elements of an array	created	by split() that	are  numeric  strings.
       The  basic idea is that user input, and only user input,	that looks nu-
       meric, should be	treated	that way.
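
       A sketch of the difference, assuming awk is gawk or another POSIX awk
       (the variable names fields and consts are illustrative): the fields 10
       and 9 look numeric and are compared as numbers, while the string
       constants "10" and "9" are compared as strings.

```shell
# Fields are numeric strings: 10 < 9 is false (0).
# String constants compare character by character: "10" < "9" is true (1).
out=$(echo '10 9' | awk '{
    fields = ($1 < $2)
    consts = ("10" < "9")
    print fields, consts
}')
echo "$out"
```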

       Uninitialized variables have the	numeric	value 0	and the	 string	 value
       "" (the null, or	empty, string).

       AWK is a	line-oriented language.	 The pattern comes first, and then the
       action.	Action statements are enclosed in { and	}.  Either the pattern
       may be missing, or the action may be missing, but, of course, not both.
       If the pattern is missing, the action will be executed for every	single
       record of input.	 A missing action is equivalent	to

	      {	print }

       which prints the	entire record.

       Comments	 begin	with  the "#" character, and continue until the	end of
       the line.  Blank	lines may be used to separate statements.  Normally, a
       statement ends with a newline; however, this is not the case for lines
       ending in a ",",	{, ?, :, &&, or	||.  Lines ending in do	or  else  also
       have  their  statements	automatically continued	on the following line.
       In other	cases, a line can be continued by ending it  with  a  "\",  in
       which case the newline will be ignored.

       Multiple	 statements  may  be put on one	line by	separating them	with a
       ";".  This applies to both the statements within	the action part	 of  a
       pattern-action  pair (the usual case), and to the pattern-action	state-
       ments themselves.

       AWK patterns may	be one of the following:

	      /regular expression/
	      relational expression
	      pattern && pattern
	      pattern || pattern
	      pattern ?	pattern	: pattern
	      !	pattern
	      pattern1,	pattern2

       BEGIN and END are two special kinds of patterns which  are  not	tested
       against	the  input.  The action	parts of all BEGIN patterns are	merged
       as if all the statements	had been written  in  a	 single	 BEGIN	block.
       They  are executed before any of	the input is read.  Similarly, all the
       END blocks are merged, and executed when	all the	input is exhausted (or
       when  an	exit statement is executed).  BEGIN and	END patterns cannot be
       combined	with other patterns in pattern	expressions.   BEGIN  and  END
       patterns	cannot have missing action parts.

       For /regular expression/	patterns, the associated statement is executed
       for each	input record that matches the regular expression.  Regular ex-
       pressions are the same as those in egrep(1), and	are summarized below.

       A  relational  expression may use any of	the operators defined below in
       the section on actions.	These generally	test  whether  certain	fields
       match certain regular expressions.

       The  &&,	 ||, and !  operators are logical AND, logical OR, and logical
       NOT, respectively, as in	C.  They do short-circuit evaluation, also  as
       in  C,  and  are	used for combining more	primitive pattern expressions.
       As in most languages, parentheses may be used to change the order of
       evaluation.

       The ?: operator is like the same operator in C.  If the first pattern
       is true then the pattern used for testing is the second pattern;
       otherwise it is the third.  Only one of the second and third patterns
       is evaluated.

       The pattern1, pattern2 form of an expression is called a	range pattern.
       It  matches  all	input records starting with a record that matches pat-
       tern1, and continuing until a record that matches pattern2,  inclusive.
       It does not combine with	any other sort of pattern expression.
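
       A range pattern with no action prints every record in the range.  The
       sketch below assumes awk is gawk or another POSIX awk; the words start
       and end are arbitrary markers in made-up input.

```shell
# Prints from the first record matching /start/ through the next one
# matching /end/, inclusive; records outside the range are skipped.
out=$(printf '1\nstart\n2\nend\n3\n' | awk '/start/,/end/')
echo "$out"
```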

   Regular Expressions
       Regular	expressions  are  the  extended	kind found in egrep.  They are
       composed	of characters as follows:

       c	  matches the non-metacharacter	c.

       \c	  matches the literal character	c.

       .	  matches any character	including newline.

       ^	  matches the beginning	of a string.

       $	  matches the end of a string.

       [abc...]	  character list, matches any of the characters	abc....

       [^abc...]  negated character list, matches any character	except abc....

       r1|r2	  alternation: matches either r1 or r2.

       r1r2	  concatenation: matches r1, and then r2.

       r+	  matches one or more r's.

       r*	  matches zero or more r's.

       r?	  matches zero or one r's.

       (r)	  grouping: matches r.

       r{n,m}	  One or two numbers inside braces denote an interval  expres-
		  sion.	  If  there is one number in the braces, the preceding
		  regexp r is repeated n times.	 If there are two numbers sep-
		  arated  by a comma, r	is repeated n to m times.  If there is
		  one number followed by a comma, then r is repeated at	 least
		  n times.
		  Interval expressions are only	available if either --posix or
		  --re-interval	is specified on	the command line.

       \y	  matches the empty string at either the beginning or the  end
		  of a word.

       \B	  matches the empty string within a word.

       \<	  matches the empty string at the beginning of a word.

       \>	  matches the empty string at the end of a word.

       \w         matches any word-constituent character (letter, digit, or
                  underscore).

       \W	  matches any character	that is	not word-constituent.

       \`         matches the empty string at the beginning of a buffer
                  (string).

       \'	  matches the empty string at the end of a buffer.

       The escape sequences that are valid in string constants (see below) are
       also legal in regular expressions.

       Character classes are a new feature introduced in the  POSIX  standard.
       A character class is a special notation for describing lists of charac-
       ters that have a	specific attribute, but	where  the  actual  characters
       themselves  can	vary from country to country and/or from character set
       to character set.  For example, the notion of  what  is	an  alphabetic
       character differs in the	USA and	in France.

       A  character  class  is only valid in a regexp inside the brackets of a
       character list.	Character classes consist of [:,  a  keyword  denoting
       the class, and :].  Here are the character classes defined by the POSIX
       standard.

       [:alnum:]  Alphanumeric characters.

       [:alpha:]  Alphabetic characters.

       [:blank:]  Space or tab characters.

       [:cntrl:]  Control characters.

       [:digit:]  Numeric characters.

       [:graph:]  Characters that are both printable and visible.  (A space is
                  printable, but not visible, while an a is both.)

       [:lower:]  Lower-case alphabetic characters.

       [:print:]  Printable characters (characters that are not control
                  characters.)

       [:punct:]  Punctuation characters (characters that are not letters,
                  digits, control characters, or space characters).

       [:space:]  Space characters (such as space, tab, and formfeed, to name
                  a few).

       [:upper:]  Upper-case alphabetic characters.

       [:xdigit:] Characters that are hexadecimal digits.

       For example, before the POSIX standard, to match	 alphanumeric  charac-
       ters, you would have had	to write /[A-Za-z0-9]/.	 If your character set
       had other alphabetic characters in it, this would not match them.  With
       the POSIX character classes, you	can write /[[:alnum:]]/, and this will
       match all the alphabetic	and numeric characters in your character set.

       Two additional special sequences	can appear in character	lists.	 These
       apply  to  non-ASCII  character	sets,  which  can  have	single symbols
       (called collating elements) that	are represented	 with  more  than  one
       character,  as  well as several characters that are equivalent for col-
       lating, or sorting, purposes.  (E.g., in	French,	 a  plain  "e"	and  a
       grave-accented e` are equivalent.)

       Collating Symbols
              A collating symbol is a multi-character collating element
              enclosed in [. and .].  For example, if ch is a collating
              element, then [[.ch.]] is a regexp that matches this collating
              element, while [ch] is a regexp that matches either c or h.

       Equivalence Classes
	      An equivalence class is a	locale-specific	name  for  a  list  of
	      characters  that are equivalent.	The name is enclosed in	[= and
              =].  For example, the name e might be used to represent all of
              "e," "e'," and "e`."  In this case, [[=e=]] is a regexp that
              matches any of e, e', or e`.

       These features are very valuable	in non-English speaking	locales.   The
       library	functions  that	gawk uses for regular expression matching cur-
       rently only recognize POSIX character classes; they  do	not  recognize
       collating symbols or equivalence	classes.

       The  \y,	\B, \<,	\>, \w,	\W, \`,	and \' operators are specific to gawk;
       they are	extensions based on facilities in the GNU regexp libraries.

       The various command line	options	control	how gawk interprets characters
       in regexps.

       No options
              In the default case, gawk provides all the facilities of POSIX
              regexps and the GNU regexp operators described above.  However,
              interval expressions are not supported.

       --posix
              Only POSIX regexps are supported; the GNU operators are not
              special.  (E.g., \w matches a literal w.)  Interval expressions
              are allowed.

       --traditional
              Traditional Unix awk regexps are matched.  The GNU operators are
              not special, interval expressions are not available, and neither
              are the POSIX character classes ([[:alnum:]] and so on).
              Characters described by octal and hexadecimal escape sequences
              are treated literally, even if they represent regexp
              metacharacters.

       --re-interval
              Allow interval expressions in regexps, even if --traditional has
              been provided.

       Action statements are enclosed in braces, { and }.   Action  statements
       consist	of  the	 usual assignment, conditional,	and looping statements
       found in	most languages.	 The operators,	control	 statements,  and  in-
       put/output statements available are patterned after those in C.

       The operators in	AWK, in	order of decreasing precedence,	are

       (...)	   Grouping

       $	   Field reference.

       ++ --	   Increment and decrement, both prefix	and postfix.

       ^	   Exponentiation  (**	may  also be used, and **= for the as-
		   signment operator).

       + - !	   Unary plus, unary minus, and	logical	negation.

       * / %	   Multiplication, division, and modulus.

       + -	   Addition and	subtraction.

       space	   String concatenation.

       < >
       <= >=
       != ==	   The regular relational operators.

       ~ !~	   Regular expression match, negated match.  NOTE: Do not  use
		   a constant regular expression (/foo/) on the	left-hand side
		   of a	~ or !~.  Only use one on the  right-hand  side.   The
		   expression  /foo/  ~	 exp  has  the	same meaning as	(($0 ~
		   /foo/) ~ exp).  This	is usually not what was	intended.

       in	   Array membership.

       &&	   Logical AND.

       ||	   Logical OR.

       ?:	   The C conditional expression.  This has the	form  expr1  ?
		   expr2  : expr3.  If expr1 is	true, the value	of the expres-
		   sion	is expr2, otherwise it is expr3.  Only	one  of	 expr2
		   and expr3 is	evaluated.

       = += -=
       *= /= %=	^= Assignment.	Both absolute assignment (var =	value) and op-
		   erator-assignment (the other	forms) are supported.

   Control Statements
       The control statements are as follows:

	      if (condition) statement [ else statement	]
	      while (condition)	statement
	      do statement while (condition)
	      for (expr1; expr2; expr3)	statement
	      for (var in array) statement
	      delete array[index]
	      delete array
	      exit [ expression	]
	      {	statements }

   I/O Statements
       The input/output	statements are as follows:

       close(file)	     Close file	(or pipe, see below).

       getline		     Set $0 from next input record; set	NF, NR,	FNR.

       getline <file	     Set $0 from next record of	file; set NF.

       getline var	     Set var from next input record; set NR, FNR.

       getline var <file     Set var from next record of file.

       next		     Stop processing the current  input	 record.   The
			     next  input  record is read and processing	starts
			     over with the first pattern in the	 AWK  program.
			     If	 the end of the	input data is reached, the END
			     block(s), if any, are executed.

       nextfile		     Stop processing the current input file.  The next
			     input record read comes from the next input file.
			     FILENAME and ARGIND are updated, FNR is reset  to
			     1,	and processing starts over with	the first pat-
			     tern in the AWK program.  If the end of the input
			     data  is  reached,	 the END block(s), if any, are
			     executed.	NOTE: Earlier versions	of  gawk  used
			     next  file,  as  two  words.  While this usage is
			     still recognized, it generates a warning  message
			     and will eventually be removed.

       print		     Prints  the current record.  The output record is
			     terminated	with the value of the ORS variable.

       print expr-list	     Prints expressions.  Each expression is separated
			     by	 the  value  of	 the OFS variable.  The	output
			     record is terminated with the value of the ORS
			     variable.

       print expr-list >file Prints  expressions  on file.  Each expression is
			     separated by the value of the OFS variable.   The
			     output record is terminated with the value	of the
			     ORS variable.

       printf fmt, expr-list Format and	print.

       printf fmt, expr-list >file
			     Format and	print on file.

       system(cmd-line)	     Execute the command cmd-line, and return the exit
			     status.  (This may not be available on non-POSIX
			     systems.)

       fflush([file])	     Flush any buffers associated with the open	output
			     file  or  pipe  file.   If	 file is missing, then
			     standard output is	flushed.  If file is the  null
			     string, then all open output files	and pipes have
			     their buffers flushed.

       Other input/output  redirections	 are  also  allowed.   For  print  and
       printf, >> file appends output to the file, while | command writes on a
       pipe.  In a similar fashion, command | getline pipes into getline.  The
       getline command will return 0 on	end of file, and -1 on an error.

       NOTE:  If  using	 a  pipe  to getline, or from print or printf within a
       loop, you must use close() to create new	instances of the command.  AWK
       does not	automatically close pipes when they return EOF.
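
       A small sketch of the close() requirement described in the NOTE,
       assuming a POSIX shell with an echo command; cmd, line, and n are
       illustrative names:

```shell
out=$(awk 'BEGIN {
    cmd = "echo hello"
    for (i = 1; i <= 2; i++) {
        while ((cmd | getline line) > 0)   # > 0 means a record was read
            n++
        close(cmd)       # without this, the second pass would read nothing
    }
    print n, line
}')
echo "$out"
```

       Comparing the getline result with > 0 also stops the loop on the -1
       error return, which a bare while (cmd | getline line) would not.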

   The printf Statement
       The  AWK	 versions  of the printf statement and sprintf() function (see
       below) accept the following conversion specification formats:

       %c     An ASCII character.  If the argument used	for %c is numeric,  it
	      is  treated as a character and printed.  Otherwise, the argument
	      is assumed to be a string, and only the first character of that
	      string is printed.

       %d
       %i     A decimal number (the integer part).

       %e
       %E     A floating point number of the form [-]d.dddddde[+-]dd.  The %E
	      format uses E instead of e.

       %f     A	floating point number of the form [-]ddd.dddddd.

       %g
       %G     Use %e or %f conversion, whichever is shorter, with
	      nonsignificant zeros suppressed.  The %G format uses %E instead
	      of %e.

       %o     An unsigned octal	number (also an	integer).

       %u     An unsigned decimal number (again, an integer).

       %s     A	character string.

       %x
       %X     An unsigned hexadecimal number (an integer).  The %X format uses
	      ABCDEF instead of	abcdef.

       %%     A	single % character; no argument	is converted.

       There are optional, additional parameters that may lie  between	the  %
       and the control letter:

       -      The expression should be left-justified within its field.

       space  For  numeric  conversions,  prefix positive values with a	space,
	      and negative values with a minus sign.

       +      The plus sign, used before the width modifier (see below),  says
	      to  always  supply  a  sign for numeric conversions, even	if the
	      data to be formatted is positive.  The + overrides the space
	      modifier.

       #      Use  an  "alternate  form" for certain control letters.  For %o,
	      supply a leading zero.  For %x, and %X, supply a leading	0x  or
	      0X  for  a  nonzero result.  For %e, %E, and %f, the result will
	      always contain a decimal point.  For %g, and %G, trailing	 zeros
	      are not removed from the result.

       0      A	 leading 0 (zero) acts as a flag, that indicates output	should
	      be padded	with zeroes instead of spaces.	This applies  even  to
	      non-numeric  output  formats.  This flag only has	an effect when
	      the field	width is wider than the	value to be printed.

       width  The field	should be padded to this width.	 The field is normally
	      padded  with  spaces.  If	the 0 flag has been used, it is	padded
	      with zeroes.

       .prec  A	number that specifies the precision to use when	printing.  For
	      the  %e, %E, and %f formats, this	specifies the number of	digits
	      you want printed to the right of the decimal point.  For the %g,
	      and  %G  formats,	it specifies the maximum number	of significant
	      digits.  For the %d, %o, %i, %u, %x, and %X formats,  it	speci-
	      fies  the	 minimum  number of digits to print.  For a string, it
	      specifies	the maximum number of characters from the string  that
	      should be	printed.

       The dynamic width and prec capabilities of the ANSI C printf() routines
       are supported.  A * in place of either the width	or prec	specifications
       will cause their values to be taken from the argument list to printf or
       sprintf().
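
       The flags, width, and precision described above can be tried out
       with a sketch such as this one (assuming a POSIX shell and an awk on
       the PATH; the values are arbitrary):

```shell
out=$(awk 'BEGIN {
    printf "[%5d]\n", 42            # width 5, right-justified
    printf "[%-5d]\n", 42           # - flag: left-justified
    printf "[%05d]\n", 42           # 0 flag: pad with zeroes
    printf "[%+d]\n", 42            # + flag: always show a sign
    printf "[%.3f]\n", 3.14159      # .prec: digits after the decimal point
    printf "[%.2s]\n", "abcdef"     # .prec on a string: maximum characters
    printf "[%*d]\n", 4, 7          # * takes the width from the argument list
    printf "[%x %X %o]\n", 255, 255, 8
    printf "[%c%c]\n", 65, "hello"  # number as character; first char of string
}')
echo "$out"
```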

   Special File	Names
       When doing I/O redirection from either print or printf into a file,  or
       via  getline from a file, gawk recognizes certain special filenames in-
       ternally.  These	filenames allow	access to open file descriptors	inher-
       ited  from  gawk's  parent  process (usually the	shell).	 Other special
       filenames provide access	to information about the running gawk process.
       The filenames are:

       /dev/pid	   Reading  this  file	returns	 the process ID	of the current
		   process, in decimal,	terminated with	a newline.

       /dev/ppid   Reading this	file returns the parent	process	ID of the cur-
		   rent	process, in decimal, terminated	with a newline.

       /dev/pgrpid Reading  this file returns the process group	ID of the cur-
		   rent	process, in decimal, terminated	with a newline.

       /dev/user   Reading this	file returns a single record terminated	with a
		   newline.   The fields are separated with spaces.  $1	is the
		   value of the	getuid(2) system call, $2 is the value of  the
		   geteuid(2)  system  call,  $3 is the	value of the getgid(2)
		   system call,	and $4 is the value of the  getegid(2)	system
		   call.   If  there  are  any additional fields, they are the
		   group IDs returned by getgroups(2).	 Multiple  groups  may
		   not be supported on all systems.

       /dev/stdin  The standard	input.

       /dev/stdout The standard	output.

       /dev/stderr The standard	error output.

       /dev/fd/n   The file associated with the	open file descriptor n.

       These are particularly useful for error messages.  For example:

	      print "You blew it!" > "/dev/stderr"

       whereas you would otherwise have	to use

	      print "You blew it!" | "cat 1>&2"

       These file names may also be used on the command line to name data
       files.

   Numeric Functions
       AWK has the following pre-defined arithmetic functions:

       atan2(y,	x)   returns the arctangent of y/x in radians.

       cos(expr)     returns the cosine	of expr, which is in radians.

       exp(expr)     the exponential function.

       int(expr)     truncates to integer.

       log(expr)     the natural logarithm function.

       rand()	     returns a random number between 0 and 1.

       sin(expr)     returns the sine of expr, which is	in radians.

       sqrt(expr)    the square	root function.

       srand([expr]) uses expr as a new	seed for the random number  generator.
		     If	 no  expr  is  provided, the time of day will be used.
		     The return	value is the previous seed for the random num-
		     ber generator.
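
       A brief sketch of the arithmetic functions, assuming a POSIX shell
       and an awk on the PATH.  The seeds 5 and 7 are arbitrary, and prev
       and r are illustrative names:

```shell
out=$(awk 'BEGIN {
    print int(3.9), int(-3.9)         # truncation, not rounding
    printf "%.4f\n", atan2(0, -1)     # pi
    printf "%.4f\n", exp(1)           # e
    print sqrt(16), log(1)
    srand(5)
    prev = srand(7)                   # return value is the previous seed
    print prev
    r = rand()
    print (r >= 0 && r < 1) ? "in range" : "out of range"
}')
echo "$out"
```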

   String Functions
       Gawk has	the following pre-defined string functions:

       gensub(r, s, h [, t])   search  the  target string t for	matches	of the
			       regular expression r.  If h is a	string	begin-
			       ning with g or G, then replace all matches of r
			       with s.	Otherwise, h is	 a  number  indicating
			       which  match  of	r to replace.  If no t is sup-
			       plied, $0 is used instead.  Within the replace-
			       ment  text  s,  the  sequence  \n, where	n is a
			       digit from 1 to 9, may be used to indicate just
			       the  text  that	matched	the n'th parenthesized
			       subexpression.  The sequence \0 represents  the
			       entire  matched	text, as does the character &.
			       Unlike sub() and	gsub(),	the modified string is
			       returned	as the result of the function, and the
			       original	target string is not changed.

       gsub(r, s [, t])	       for each	substring matching the regular expres-
			       sion  r	in the string t, substitute the	string
			       s, and return the number	of substitutions.   If
			       t  is  not  supplied,  use $0.  An & in the re-
			       placement text is replaced with the  text  that
			       was  actually matched.  Use \& to get a literal
			       &.  See Effective AWK Programming for a	fuller
			       discussion of the rules for &'s and backslashes
			       in the replacement text of sub(), gsub(), and
			       gensub().

       index(s,	t)	       returns the index of the	string t in the	string
			       s, or 0 if t is not present.

       length([s])	       returns the length of  the  string  s,  or  the
			       length of $0 if s is not	supplied.

       match(s,	r)	       returns the position in s where the regular ex-
			       pression	r occurs, or 0 if r  is	 not  present,
			       and sets	the values of RSTART and RLENGTH.

       split(s,	a [, r])       splits  the  string  s  into the	array a	on the
			       regular expression r, and returns the number of
			       fields.	 If  r is omitted, FS is used instead.
			       The array a is cleared first.  Splitting behaves
			       identically to field splitting, described above.

       sprintf(fmt, expr-list) prints expr-list	according to fmt, and  returns
			       the resulting string.

       sub(r, s	[, t])	       just  like  gsub(), but only the	first matching
			       substring is replaced.

       substr(s, i [, n])      returns the at most n-character substring of  s
			       starting	 at i.	If n is	omitted, the rest of s
			       is used.

       tolower(str)	       returns a copy of the string str, with all  the
			       upper-case  characters  in  str	translated  to
			       their  corresponding  lower-case	 counterparts.
			       Non-alphabetic characters are left unchanged.

       toupper(str)	       returns	a copy of the string str, with all the
			       lower-case  characters  in  str	translated  to
			       their  corresponding  upper-case	 counterparts.
			       Non-alphabetic characters are left unchanged.
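
       The following sketch exercises several of the string functions
       above.  It assumes a POSIX shell and an awk on the PATH; gensub() is
       left out because it is specific to gawk, and all variable names are
       illustrative.

```shell
out=$(awk 'BEGIN {
    s = "foo bar foo"
    n = gsub(/foo/, "[&]", s)          # & stands for the matched text
    print n, s
    t = "hello world"
    sub(/o/, "0", t)                   # only the first match is replaced
    print t
    print index("banana", "nan"), length("banana"), substr("banana", 2, 3)
    if (match("xxabcyy", /abc/)) print RSTART, RLENGTH
    k = split("a:b:c", parts, ":")     # parts is cleared, then filled
    print k, parts[1], parts[3]
    print toupper("MiXed"), tolower("MiXed")
}')
echo "$out"
```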

   Time	Functions
       Since one of the	primary	uses of	AWK programs is	processing  log	 files
       that  contain  time  stamp information, gawk provides the following two
       functions for obtaining time stamps and formatting them.

       systime() returns the current time of day  as  the  number  of  seconds
		 since the Epoch (Midnight UTC, January 1, 1970 on POSIX
		 systems).

       strftime([format	[, timestamp]])
		 formats timestamp according to	the specification  in  format.
		 The  timestamp	should be of the same form as returned by sys-
		 time().  If timestamp is missing, the current time of day  is
		 used.	 If  format is missing,	a default format equivalent to
		 the output of date(1) will be used.   See  the	 specification
		 for  the strftime() function in ANSI C	for the	format conver-
		 sions that are	guaranteed to be available.   A	 public-domain
		 version  of strftime(3) and a man page	for it come with gawk;
		 if that version was used to build gawk, then all of the  con-
		 versions described in that man	page are available to gawk.

   String Constants
       String  constants  in  AWK are sequences	of characters enclosed between
       double quotes (").  Within strings, certain escape sequences are	recog-
       nized, as in C.	These are:

       \\   A literal backslash.

       \a   The	"alert"	character; usually the ASCII BEL character.

       \b   backspace.

       \f   form-feed.

       \n   newline.

       \r   carriage return.

       \t   horizontal tab.

       \v   vertical tab.

       \xhex digits
	    The	character represented by the string of hexadecimal digits fol-
	    lowing the \x.  As in ANSI C, all following	hexadecimal digits are
	    considered part of the escape sequence.  (This feature should tell
	    us something about language	design by committee.)  E.g., "\x1B" is
	    the	ASCII ESC (escape) character.

       \ddd The	 character  represented	 by the	1-, 2-,	or 3-digit sequence of
	    octal digits.  E.g., "\033"	is the ASCII ESC (escape) character.

       \c   The	literal	character c.

       The escape sequences may	also be	used inside constant  regular  expres-
       sions (e.g., /[ \t\f\n\r\v]/ matches whitespace characters).

       In compatibility	mode, the characters represented by octal and hexadec-
       imal escape sequences are treated literally when	used  in  regexp  con-
       stants.	Thus, /a\52b/ is equivalent to /a\*b/.
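
       A short sketch of a few of these escape sequences, assuming a POSIX
       shell (the single quotes keep the shell from touching the
       backslashes) and an awk on the PATH; w and n are illustrative names:

```shell
out=$(awk 'BEGIN {
    print "a\tb"                       # horizontal tab between a and b
    print "\101\102\103"               # octal escapes for A, B, C
    print "back\\slash"                # one literal backslash
    n = split("x y\tz", w, /[ \t]+/)   # escape used inside a regexp constant
    print n
}')
echo "$out"
```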

       Functions in AWK	are defined as follows:

	      function name(parameter list) { statements }

       Functions  are executed when they are called from within	expressions in
       either patterns or actions.  Actual parameters supplied in the function
       call  are  used	to  instantiate	 the formal parameters declared	in the
       function.  Arrays are passed by reference; other variables are passed
       by value.

       Since  functions	were not originally part of the	AWK language, the pro-
       vision for local	variables is rather clumsy: They are declared as extra
       parameters  in the parameter list.  The convention is to	separate local
       variables from real parameters by extra spaces in the  parameter	 list.
       For example:

	      function	f(p, q,	    a, b)   # a	& b are	local

	      /abc/	{ ... ;	f(1, 2)	; ... }

       The left	parenthesis in a function call is required to immediately fol-
       low the function	name, without any intervening white space.  This is to
       avoid  a	syntactic ambiguity with the concatenation operator.  This re-
       striction does not apply	to the built-in	functions listed above.

       Functions may call each other and may be	recursive.   Function  parame-
       ters used as local variables are	initialized to the null	string and the
       number zero upon	function invocation.

       Use return expr to return a value from a	function.  The return value is
       undefined if no value is	provided, or if	the function returns by	"fall-
       ing off"	the end.

       If --lint has been provided, gawk will warn about  calls	 to  undefined
       functions  at parse time, instead of at run time.  Calling an undefined
       function	at run time is a fatal error.

       The word	func may be used in place of function.
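
       The points above (recursion, local variables declared as extra
       parameters, arrays by reference, scalars by value) might be sketched
       as follows.  The functions fact and sum are hypothetical examples,
       and a POSIX shell and an awk on the PATH are assumed.

```shell
out=$(awk '
# fact calls itself; it needs no local variables.
function fact(n) { return (n <= 1) ? 1 : n * fact(n - 1) }

# i and total are locals, set apart from the real parameters by extra spaces.
function sum(arr, n,    i, total) {
    for (i = 1; i <= n; i++) total += arr[i]   # total starts out as zero
    arr[1] = "changed"      # seen by the caller: arrays are by reference
    n = 0                   # not seen by the caller: scalars are by value
    return total
}

BEGIN {
    v[1] = 10; v[2] = 20; v[3] = 30; count = 3
    f = fact(5)
    s = sum(v, count)
    print f, s, v[1], count
}')
echo "$out"
```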

       Print and sort the login	names of all users:

	    BEGIN     {	FS = ":" }
		 { print $1 | "sort" }

       Count lines in a	file:

		 { nlines++ }
	    END	 { print nlines	}

       Precede each line by its	number in the file:

	    { print FNR, $0 }

       Concatenate and line number (a variation	on a theme):

	    { print NR,	$0 }

       egrep(1), getpid(2),  getppid(2),  getpgrp(2),  getuid(2),  geteuid(2),
       getgid(2), getegid(2), getgroups(2)

       The  AWK	Programming Language, Alfred V.	Aho, Brian W. Kernighan, Peter
       J. Weinberger, Addison-Wesley, 1988.  ISBN 0-201-07981-X.

       Effective AWK Programming, Edition 1.0, published by the	Free  Software
       Foundation, 1995.

       A  primary  goal	 for gawk is compatibility with	the POSIX standard, as
       well as with the	latest version of UNIX awk.  To	this end, gawk	incor-
       porates	the following user visible features which are not described in
       the AWK book, but are part of the Bell Labs version of awk, and are  in
       the POSIX standard.

       The  -v	option for assigning variables before program execution	starts
       is new.	The book indicates that	command	line variable assignment  hap-
       pens when awk would otherwise open the argument as a file, which	is af-
       ter the BEGIN block is executed.	 However, in earlier  implementations,
       when  such an assignment	appeared before	any file names,	the assignment
       would happen before the BEGIN block was run.  Applications came to  de-
       pend  on	 this "feature."  When awk was changed to match	its documenta-
       tion, this option was added to accommodate applications	that  depended
       upon  the old behavior.	(This feature was agreed upon by both the AT&T
       and GNU developers.)
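
       The timing difference between -v and an ordinary command line
       assignment can be observed with a sketch like this one (assuming a
       POSIX shell and an awk on the PATH; x is an arbitrary variable name):

```shell
a=$(awk -v x=1 'BEGIN { print "[" x "]" }')               # set before BEGIN
b=$(awk 'BEGIN { print "[" x "]" }' x=1)                  # BEGIN runs first
c=$(printf 'line\n' | awk '{ print "[" x "]" }' x=1 -)    # set before reading -
echo "$a $b $c"
```

       Only the -v form makes the value visible inside the BEGIN block.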

       The -W option for implementation specific features is from the POSIX
       standard.

       When  processing	arguments, gawk	uses the special option	"--" to	signal
       the end of arguments.  In compatibility mode, it	will warn  about,  but
       otherwise  ignore,  undefined options.  In normal operation, such argu-
       ments are passed	on to the AWK program for it to	process.

       The AWK book does not define the	return value of	 srand().   The	 POSIX
       standard	has it return the seed it was using, to	allow keeping track of
       random number sequences.	 Therefore srand() in gawk  also  returns  its
       current seed.

       Other  new features are:	The use	of multiple -f options (from MKS awk);
       the ENVIRON array; the \a, and \v escape	sequences (done	originally  in
       gawk  and  fed  back into AT&T's); the tolower()	and toupper() built-in
       functions (from AT&T); and the  ANSI  C	conversion  specifications  in
       printf (done first in AT&T's version).

       Gawk  has  a  number of extensions to POSIX awk.	 They are described in
       this section.  All the extensions described here	can be disabled	by in-
       voking gawk with	the --traditional option.

       The following features of gawk are not available	in POSIX awk.

	      o	The \x escape sequence.	 (Disabled with	--posix.)

	      o	The fflush() function.	(Disabled with --posix.)

	      o	The systime(), strftime(), and gensub()	functions.

	      o	The special file names available for I/O redirection are not
		recognized.

	      o	The ARGIND, ERRNO, and RT variables are	not special.

	      o	The IGNORECASE variable and its side-effects are not available.

	      o	The FIELDWIDTHS	variable and fixed-width field splitting.

	      o	The use	of RS as a regular expression.

	      o	The  ability to	split out individual characters	using the null
		string as the value of FS, and as the third argument to
		split().

	      o	No path	search is performed for	files named via	the -f option.
		Therefore the AWKPATH environment variable is not special.

	      o	The use of nextfile to abandon processing of the current input
		file.

	      o	The use of delete array to delete the entire contents of an
		array.

       The AWK book does not define the	return value of	the close()  function.
       Gawk's  close()	returns	 the  value from fclose(3), or pclose(3), when
       closing a file or pipe, respectively.

       When gawk is invoked with the --traditional option, if the fs  argument
       to  the	-F  option  is	"t", then FS will be set to the	tab character.
       Note that typing gawk -F\t ... simply causes the shell to quote the
       "t", and does not pass "\t" to the -F option.  Since this is a rather
       ugly special case, it is	not the	default	behavior.  This	behavior  also
       does  not  occur	 if  --posix  has been specified.  To really get a tab
       character as the	field separator, it is best to use quotes: gawk	-F'\t'
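
       A minimal sketch of the quoted form, assuming a POSIX shell whose
       printf expands \t and an awk on the PATH.  The input has a space and
       a tab, so the two commands split it differently:

```shell
tabbed=$(printf 'a b\tc\n' | awk -F'\t' '{ print $2 }')   # split on tab only
plain=$(printf 'a b\tc\n' | awk '{ print $2 }')           # default: blanks and tabs
echo "$tabbed $plain"
```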

       There are two features of historical AWK	implementations	that gawk sup-
       ports.  First, it is possible to	call the  length()  built-in  function
       not only	with no	argument, but even without parentheses!	 Thus,

	      a	= length     # Holy Algol 60, Batman!

       is the same as either of

	      a	= length()
	      a	= length($0)

       This  feature is	marked as "deprecated" in the POSIX standard, and gawk
       will issue a warning about its use if --lint is specified on  the  com-
       mand line.

       The other feature is the	use of either the continue or the break	state-
       ments outside the body of a while, for, or do  loop.   Traditional  AWK
       implementations	have  treated  such  usage  as	equivalent to the next
       statement.  Gawk will support this usage if --traditional has been
       specified.

       If POSIXLY_CORRECT exists in the	environment, then gawk behaves exactly
       as if --posix had been specified	on the command line.   If  --lint  has
       been specified, gawk will issue a warning message to this effect.

       The  AWKPATH  environment variable can be used to provide a list	of di-
       rectories that gawk will	search when looking for	files named via	the -f
       and --file options.

       The  -F option is not necessary given the command line variable assign-
       ment feature; it	remains	only for backwards compatibility.

       If your system actually has support  for	 /dev/fd  and  the  associated
       /dev/stdin,  /dev/stdout,  and /dev/stderr files, you may get different
       output from gawk	than you would get on a	system	without	 those	files.
       When  gawk interprets these files internally, it	synchronizes output to
       the standard output with	output to /dev/stdout, while on	a system  with
       those files, the output is actually to different open files.  Caveat
       emptor.

       Syntactically invalid single character programs tend  to	 overflow  the
       parse  stack, generating	a rather unhelpful message.  Such programs are
       surprisingly difficult to diagnose in the completely general case,  and
       the effort to do	so really is not worth it.

       This man	page documents gawk, version 3.0.6.

       The original version of UNIX awk	was designed and implemented by	Alfred
       Aho, Peter Weinberger, and Brian	Kernighan of AT&T  Bell	 Labs.	 Brian
       Kernighan continues to maintain and enhance it.

       Paul  Rubin  and	 Jay  Fenlason,	of the Free Software Foundation, wrote
       gawk, to	be compatible with the original	version	of awk distributed  in
       Seventh	Edition	 UNIX.	 John Woods contributed	a number of bug	fixes.
       David Trueman, with contributions from Arnold Robbins, made  gawk  com-
       patible	with  the new version of UNIX awk.  Arnold Robbins is the cur-
       rent maintainer.

       The initial DOS port was	done  by  Conrad  Kwok	and  Scott  Garfinkle.
       Scott Deifik is the current DOS maintainer.  Pat	Rankin did the port to
       VMS, and	Michal Jaegermann did the port to the Atari ST.	 The  port  to
       OS/2  was done by Kai Uwe Rommel, with contributions and	help from Dar-
       rel Hankerson.  Fred Fish supplied support for the Amiga.

       If you find a  bug  in  gawk,  please  send  electronic	mail  to  bug-   Please  include your operating system and its revision,
       the version of gawk, what C compiler you	used to	compile	it, and	a test
       program and data that are as small as possible for reproducing the
       problem.

       Before sending a	bug report, please do two things.  First, verify  that
       you  have  the latest version of	gawk.  Many bugs (usually subtle ones)
       are fixed at each release, and if yours is out of date, the problem may
       already	have  been  solved.  Second, please read this man page and the
       reference manual	carefully to be	sure that what you think is a bug  re-
       ally is,	instead	of just	a quirk	in the language.

       Whatever	 you do, do NOT	post a bug report in comp.lang.awk.  While the
       gawk developers occasionally read this newsgroup, posting  bug  reports
       there  is  an  unreliable  way to report	bugs.  Instead,	please use the
       electronic mail addresses given above.

       Brian Kernighan of Bell Labs provided valuable assistance during	 test-
       ing and debugging.  We thank him.

       Copyright (C) 1996-2000 Free Software Foundation, Inc.

       Permission  is  granted	to make	and distribute verbatim	copies of this
       manual page provided the	copyright notice and  this  permission	notice
       are preserved on	all copies.

       Permission  is granted to copy and distribute modified versions of this
       manual page under the conditions	for verbatim  copying,	provided  that
       the  entire  resulting derived work is distributed under	the terms of a
       permission notice identical to this one.

       Permission is granted to	copy and distribute translations of this  man-
       ual page	into another language, under the above conditions for modified
       versions, except	that this permission notice may	be stated in a	trans-
       lation approved by the Foundation.

Free Software Foundation	  May 17 2000			       GAWK(1)

