nawk(1)				 User Commands			       nawk(1)

NAME
       nawk - pattern scanning and processing language

SYNOPSIS
       /usr/bin/nawk   [-F ERE]	 [-v assignment]  'program'  |	-f progfile...
       [argument...]

       /usr/xpg4/bin/awk  [-F ERE]  [-v	assignment...]	'program'  |  -f prog-
       file... [argument...]

DESCRIPTION
       The  /usr/bin/nawk  and	/usr/xpg4/bin/awk  utilities  execute programs
       written in the nawk programming language, which is specialized for tex-
       tual  data  manipulation.  A nawk program is a sequence of patterns and
       corresponding actions. The string specifying program must  be  enclosed
       in  single  quotes  (') to protect it from interpretation by the	shell.
       The sequence of pattern-action statements can be specified on the
       command line as program or in one or more files specified by the -f
       progfile option.  When input is read that matches a pattern, the action
       associated with the pattern is performed.

       Input  is interpreted as	a sequence of records. By default, a record is
       a line, but this	can be changed by  using  the  RS  built-in  variable.
       Each  record  of	 input	is matched to each pattern in the program. For
       each pattern matched, the associated action is executed.

       The nawk	utility	interprets each	input record as	a sequence  of	fields
       where,  by  default, a field is a string	of non-blank characters.  This
       default white-space field delimiter (blanks and/or tabs)	can be changed
       by  using the FS	built-in variable or the -F ERE	option.	The nawk util-
       ity denotes the first field in a	record	$1,  the  second  $2,  and  so
       forth.  The  symbol  $0	refers to the entire record; setting any other
       field causes the	reevaluation of	$0. Assigning to $0 resets the	values
       of all fields and the NF	built-in variable.
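
       A minimal sketch of the field rules above. The shell variable AWK is a
       hypothetical convenience introduced here (it defaults to awk and can be
       set to nawk or /usr/xpg4/bin/awk), and the sample input is invented for
       illustration:

```shell
AWK=${AWK:-awk}   # assumption: any POSIX awk; use AWK=nawk where available

# Default splitting on blanks: $1 and $2 name the first two fields.
out1=$(printf 'alpha beta gamma\n' | "$AWK" '{ print $2, $1 }')
printf '%s\n' "$out1"     # beta alpha

# -F changes the separator; assigning to a field rebuilds $0 with OFS.
out2=$(printf 'root:x:0\n' | "$AWK" -F: '{ $2 = "hidden"; print $0 }')
printf '%s\n' "$out2"     # root hidden 0
```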

OPTIONS
       The following options are supported:

       -F ERE
	     Define  the  input	 field	separator  to  be the extended regular
	     expression	ERE, before any	input is read (can be a	character).

       -f progfile
	     Specifies the pathname of the file	 progfile  containing  a  nawk
	     program.  If multiple instances of	this option are	specified, the
	     concatenation of the files	specified as  progfile	in  the	 order
	     specified is the nawk program.
             The nawk program can alternatively be specified on the command
             line as a single argument.

       -v assignment
	     The assignment argument must be in	the same form as an assignment
	     operand.  The  assignment	is of the form var=value, where	var is
	     the name of one of	the variables described	below.	The  specified
	     assignment	 occurs	 before	 executing the nawk program, including
	     the actions associated with BEGIN patterns	 (if  any).   Multiple
	     occurrences of this option	can be specified.

OPERANDS
       The following operands are supported:

       program
	     If	 no  -f	 option	is specified, the first	operand	to nawk	is the
	     text of the nawk program.	The application	supplies  the  program
	     operand as	a single argument to nawk. If the text does not	end in
	     a newline character, nawk interprets the text as if it did.

       argument
	     Either of the following two types of argument can be intermixed:

	     file  A pathname of a file	that contains the input	 to  be	 read,
		   which  is  matched  against the set of patterns in the pro-
		   gram.  If no	file operands are specified, or	if a file  op-
		   erand is -, the standard input is used.

	     assignment
		   An  operand	that  begins  with an underscore or alphabetic
		   character from the portable character set,  followed	 by  a
		   sequence  of	 underscores,  digits and alphabetics from the
		   portable character set, followed by the = character	speci-
		   fies	 a  variable  assignment  rather  than a pathname. The
		   characters before the = represent the name of a nawk	 vari-
		   able;  if that name is a nawk reserved word the behavior is
                   undefined. The characters following the equal sign are
                   interpreted as if they appeared in the nawk program
                   preceded and followed by a double-quote (") character, as
                   a STRING token, except that if the last character is an
                   unescaped backslash, it is interpreted as a literal
                   backslash rather than as the first character of the
                   sequence \". The variable is assigned the value of that
                   STRING token. If the value is considered a numeric string,
                   the variable is assigned its numeric value. Each such variable
		   assignment  is  performed just before the processing	of the
		   following file, if any.  Thus,  an  assignment  before  the
		   first file argument is executed after the BEGIN actions (if
		   any), while an assignment after the last file  argument  is
		   executed  before the	END actions (if	any).  If there	are no
		   file	arguments, assignments are executed before  processing
		   the standard	input.
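
       The difference in timing between -v and an assignment operand can be
       sketched as follows. The variable names a and b, the input text, and
       the AWK shell variable (defaulting to awk) are assumptions made for
       this example:

```shell
AWK=${AWK:-awk}   # assumption: any POSIX awk; use AWK=nawk where available

# a is set with -v, so BEGIN already sees it.  b is an assignment operand
# placed before the file operand "-", so it is set only after BEGIN has run.
out=$(printf 'line\n' | "$AWK" -v a=1 '
    BEGIN { print "BEGIN: a=" a " b=" b }
          { print "main: a=" a " b=" b }
' b=2 -)
printf '%s\n' "$out"
# BEGIN: a=1 b=
# main: a=1 b=2
```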

INPUT FILES
       Input files to the nawk program from any	of the following sources:

	  o  any file operands or their	equivalents, achieved by modifying the
	     nawk variables ARGV and ARGC

	  o  standard input in the absence of any file operands

	  o  arguments to the getline function

       must be text files. Whether the variable	RS is set  to  a  value	 other
       than  a newline character or not, for these files, implementations sup-
       port records terminated with the	specified separator up	to  {LINE_MAX}
       bytes and may support longer records.

       If  -f  progfile	 is specified, the files named by each of the progfile
       option-arguments	must be	text files containing an nawk program.

       The standard input is used only if no file operands are specified, or
       if a file operand is -.

EXTENDED DESCRIPTION
       A nawk program is composed of pairs of the form:

	      pattern {	action }

       Either the pattern or the action	(including the enclosing brace charac-
       ters) can be omitted.  Pattern-action statements	 are  separated	 by  a
       semicolon or by a newline.

       A  missing pattern matches any record of	input, and a missing action is
       equivalent to an	action that writes the	matched	 record	 of  input  to
       standard	output.

       Execution  of  the  nawk	 program starts	by first executing the actions
       associated with all BEGIN patterns in the order they occur in the  pro-
       gram.  Then each	file operand (or standard input	if no files were spec-
       ified) is processed by reading data from	the file until a record	 sepa-
       rator  is  seen (a newline character by default), splitting the current
       record into fields using	the current value of FS	, evaluating each pat-
       tern  in	 the  program  in  the	order of occurrence, and executing the
       action associated with each pattern that	matches	 the  current  record.
       The action for a matching pattern is executed before evaluating
       subsequent patterns.  Last, the actions associated with all END
       patterns are executed in the order they occur in the program.
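
       The execution order described above can be observed with a small
       program. This is only a sketch: the two-line input is invented, and AWK
       is a hypothetical shell variable defaulting to awk:

```shell
AWK=${AWK:-awk}   # assumption: any POSIX awk; use AWK=nawk where available

# END is written first on purpose: BEGIN still runs before any input is
# read, and END runs after the last record.
out=$(printf 'x\ny\n' | "$AWK" '
    END   { print "end after " NR " records" }
    BEGIN { print "begin" }
          { print NR ": " $0 }
    /y/   { print "matched y" }
')
printf '%s\n' "$out"
```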

   Expressions in nawk
       Expressions  describe computations used in patterns and actions.	In the
       following table,	valid expression operations are	given in  groups  from
       highest	precedence  first to lowest precedence last, with equal-prece-
       dence operators grouped between horizontal lines. In expression evalua-
       tion, where the grammar is formally ambiguous, higher precedence	opera-
       tors are	evaluated before lower precedence operators.   In  this	 table
       expr,  expr1,  expr2,  and expr3	represent any expression, while	lvalue
       represents any entity that can be assigned to (that  is,	 on  the  left
       side of an assignment operator).

       Syntax            Name                       Type of Result     Associativity
       ( expr )          Grouping                   type of expr       n/a
       $expr             Field reference            string             n/a
       ++ lvalue         Pre-increment              numeric            n/a
       -- lvalue         Pre-decrement              numeric            n/a
       lvalue ++         Post-increment             numeric            n/a
       lvalue --         Post-decrement             numeric            n/a
       expr ^ expr       Exponentiation             numeric            right
       ! expr            Logical not                numeric            n/a
       + expr            Unary plus                 numeric            n/a
       - expr            Unary minus                numeric            n/a
       expr * expr       Multiplication             numeric            left
       expr / expr       Division                   numeric            left
       expr % expr       Modulus                    numeric            left
       expr + expr       Addition                   numeric            left
       expr - expr       Subtraction                numeric            left
       expr expr         String concatenation       string             left
       expr < expr       Less than                  numeric            none
       expr <= expr      Less than or equal to      numeric            none
       expr != expr      Not equal to               numeric            none
       expr == expr      Equal to                   numeric            none
       expr > expr       Greater than               numeric            none
       expr >= expr      Greater than or equal to   numeric            none
       expr ~ expr       ERE match                  numeric            none
       expr !~ expr      ERE non-match              numeric            none
       expr in array     Array membership           numeric            left
       ( index ) in      Multi-dimension array      numeric            left
           array         membership
       expr && expr      Logical AND                numeric            left
       expr || expr      Logical OR                 numeric            left
       expr1 ? expr2     Conditional expression     type of selected   right
           : expr3                                  expr2 or expr3
       lvalue ^= expr    Exponentiation assignment  numeric            right
       lvalue %= expr    Modulus assignment         numeric            right
       lvalue *= expr    Multiplication assignment  numeric            right
       lvalue /= expr    Division assignment        numeric            right
       lvalue += expr    Addition assignment        numeric            right
       lvalue -= expr    Subtraction assignment     numeric            right
       lvalue = expr     Assignment                 type of expr       right

       Each  expression	 has  either  a	string value, a	numeric	value or both.
       Except as stated	for specific contexts, the value of an	expression  is
       implicitly  converted to	the type needed	for the	context	in which it is
       used.  A	string value is	converted to a numeric value by	the equivalent
       of the following	calls:

	      setlocale(LC_NUMERIC, "");
	      numeric_value = atof(string_value);

       A  numeric  value  that	is exactly equal to the	value of an integer is
       converted to a string by	the equivalent of a call to the	sprintf	 func-
       tion with the string %d as the fmt argument and the numeric value being
       converted as the	first and only expr argument.  Any other numeric value
       is  converted  to  a  string by the equivalent of a call	to the sprintf
       function	with the value of the variable CONVFMT as the fmt argument and
       the  numeric value being	converted as the first and only	expr argument.
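
       A short sketch of these conversion rules, assuming an awk that supports
       CONVFMT (reachable through the hypothetical shell variable AWK, which
       defaults to awk). The values are chosen only for illustration:

```shell
AWK=${AWK:-awk}   # assumption: a POSIX awk with CONVFMT support

out=$("$AWK" 'BEGIN {
    CONVFMT = "%.2f"
    a = 17; b = 3.14159
    # Concatenating with "" forces number-to-string conversion:
    # the integer value uses %d, the other value uses CONVFMT.
    print (a "") " " (b "")
    # String-to-number conversion behaves like atof(): the leading
    # numeric prefix is used and the rest is ignored.
    print "3.5abc" + 1
}')
printf '%s\n' "$out"
# 17 3.14
# 4.5
```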

       A string	value is considered to be a numeric string  in	the  following
       case:

       1. Any leading and trailing blank characters are ignored.

       2. If the first unignored character is a	+ or -,	it is ignored.

       3. If  the remaining unignored characters would be lexically recognized
	  as a NUMBER token, the string	is considered a	numeric	string.

       If a - character	is ignored in the above	steps, the  numeric  value  of
       the  numeric  string is the negation of the numeric value of the	recog-
       nized NUMBER token. Otherwise the numeric value of the  numeric	string
       is  the	numeric	value of the recognized	NUMBER token. Whether or not a
       string is a numeric string is relevant only in contexts where that term
       is used in this section.

       When  an	 expression  is	used in	a Boolean context, if it has a numeric
       value, a	value of zero is treated as  false  and	 any  other  value  is
       treated	as  true.  Otherwise,  a  string  value	 of the	null string is
       treated as false	and any	other value is treated as true.	A Boolean con-
       text is one of the following:

	  o  the first subexpression of	a conditional expression.

	  o  an	expression operated on by logical NOT, logical AND, or logical
	     OR.

	  o  the second	expression of a	for statement.

	  o  the expression of an if statement.

	  o  the expression of the while clause	in either a while  or  do  ...
	     while statement.

	  o  an	 expression  used  as  a pattern (as in	Overall	Program	Struc-
	     ture).

       The nawk	language supplies arrays that are used for storing numbers  or
       strings.	 Arrays	 need  not  be declared. They are initially empty, and
       their sizes change dynamically. The subscripts, or element
       identifiers, are strings, providing a type of associative array capability.
       An array	name followed by a subscript within  square  brackets  can  be
       used  as	 an  lvalue and	as an expression, as described in the grammar.
       Unsubscripted array names are used in only the following	contexts:

	  o  a parameter in a function definition or function call.

	  o  the NAME token following any use of the keyword in.

       A valid array index consists of one  or	more  comma-separated  expres-
       sions, similar to the way in which multi-dimensional arrays are indexed
       in some programming languages.  Because nawk  arrays  are  really  one-
       dimensional,  such  a  comma-separated  list  is	 converted to a	single
       string by concatenating the string values of the	separate  expressions,
       each separated from the other by	the value of the SUBSEP	variable.

       Thus, the following two index operations	are equivalent:

       var[expr1, expr2, ... exprn]
       var[expr1 SUBSEP	expr2 SUBSEP ... SUBSEP	exprn]

       A  multi-dimensioned  index  used  with	the in operator	must be	put in
       parentheses. The	in operator, which tests for the existence of  a  par-
       ticular	array  element,	 does  not  create  the	element	if it does not
       exist.  Any other reference to a	non-existent array  element  automati-
       cally creates it.
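
       The array rules above can be checked with a sketch like the following.
       The array name grid and its subscripts are invented for the example,
       and AWK is a hypothetical shell variable defaulting to awk:

```shell
AWK=${AWK:-awk}   # assumption: any POSIX awk; use AWK=nawk where available

out=$("$AWK" 'BEGIN {
    grid["r1", "c2"] = "x"
    if (("r1", "c2") in grid)          print "comma index found"
    if (("r1" SUBSEP "c2") in grid)    print "SUBSEP index found"
    if (!("nope" in grid))             print "in did not create nope"
    unused = grid["ghost"]             # a plain reference creates the element
    for (k in grid) n++
    print n " elements"
}')
printf '%s\n' "$out"
```

       The final count is 2: the element stored explicitly and the one
       created by the reference to grid["ghost"]. The in test adds nothing.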

   Variables and Special Variables
       Variables can be	used in	an nawk	program	by referencing them.  With the
       exception of function parameters, they  are  not	 explicitly  declared.
       Uninitialized  scalar  variables	and array elements have	both a numeric
       value of	zero and a string value	of the empty string.

       Field variables are designated by a $ followed by a number or numerical
       expression.  The	 effect	 of  the field number expression evaluating to
       anything	other than a non-negative integer is  unspecified;  uninitial-
       ized variables or string	values need not	be converted to	numeric	values
       in this context.	 New field variables are created by assigning a	 value
       to them.	 References to non-existent fields (that is, fields after $NF)
       produce the null	string.	 However, assigning to	a  non-existent	 field
       (for example, $(NF+2) = 5) increases the value of NF, creates any
       intervening fields with the null string as their values, and causes the value
       of $0 to	be recomputed, with the	fields being separated by the value of
       OFS. Each field variable	has a  string  value  when  created.   If  the
       string,	with  any  occurrence  of the decimal-point character from the
       current locale changed to a period character, is	considered  a  numeric
       string (see Expressions in nawk above), the field variable also has the
       numeric value of	the numeric string.
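
       As a sketch of the field-assignment behavior just described (the input
       line and the OFS value are chosen for illustration; AWK is a
       hypothetical shell variable defaulting to awk):

```shell
AWK=${AWK:-awk}   # assumption: any POSIX awk; use AWK=nawk where available

out=$(printf 'a b c\n' | "$AWK" 'BEGIN { OFS = "-" }
{
    print ($5 == "" ? "empty" : "set")   # reading $5 does not create it
    $5 = "e"                             # NF becomes 5, $4 is created empty
    print NF
    print                                # $0 rebuilt with OFS
}')
printf '%s\n' "$out"
# empty
# 5
# a-b-c--e
```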

       nawk sets the following special variables:

       ARGC  The number	of elements in the ARGV	array.

       ARGV  An	array of command line arguments,  excluding  options  and  the
	     program argument, numbered	from zero to ARGC-1.

	     The  arguments  in	 ARGV can be modified or added to; ARGC	can be
	     altered.  As each input file ends,	nawk treats the	next  non-null
	     element of	ARGV, up to the	current	value of ARGC-1, inclusive, as
	     the name of the next input	file.  Setting an element of  ARGV  to
	     null  means  that it is not treated as an input file.  The	name -
	     indicates the standard input.  If an argument matches the	format
	     of	 an assignment operand,	this argument is treated as an assign-
	     ment rather than a	file argument.

   /usr/xpg4/bin/awk
       CONVFMT
	     The printf	format for converting numbers to strings  (except  for
	     output statements,	where OFMT is used); %.6g by default.

       ENVIRON
	     The  variable  ENVIRON  is	an array representing the value	of the
	     environment. The indices of the array are strings	consisting  of
	     the  names	 of  the  environment variables, and the value of each
	     array element is a	string consisting of the value of  that	 vari-
	     able.  If	the  value  of an environment variable is considered a
	     numeric string, the array element also has	its numeric value.

	     In	all cases where	nawk behavior is affected by environment vari-
	     ables  (including	the environment	of any commands	that nawk exe-
	     cutes via the system function or via pipeline  redirections  with
	     the  print	 statement, the	printf statement, or the getline func-
	     tion), the	environment used is the	environment at the  time  nawk
	     began executing.

       FILENAME
	     A	pathname  of the current input file. Inside a BEGIN action the
	     value is undefined. Inside	an END action the value	is the name of
	     the last input file processed.

       FNR   The  ordinal  number  of  the current record in the current file.
	     Inside a BEGIN action the value is	zero. Inside an	END action the
	     value is the number of the	last record processed in the last file
	     processed.

       FS    Input field separator regular expression; a  space	 character  by
	     default.

       NF    The  number  of  fields  in  the  current	record.	Inside a BEGIN
	     action, the use of	NF is  undefined  unless  a  getline  function
	     without  a	 var  argument	is  executed previously. Inside	an END
	     action, NF	retains	the value it had for  the  last	 record	 read,
	     unless  a	subsequent, redirected,	getline	function without a var
	     argument is performed prior to entering the END action.

       NR    The ordinal number	of the current record from the start of	input.
	     Inside a BEGIN action the value is	zero. Inside an	END action the
	     value is the number of the	last record processed.

       OFMT  The printf format for converting numbers to strings in output
             statements; "%.6g" by default. The result of the conversion is
	     unspecified if the	value of OFMT is not a	floating-point	format
	     specification.

       OFS   The  print	statement output field separator; a space character by
	     default.

       ORS   The  print	 output	 record	 separator;  a	newline	 character  by
	     default.

       RLENGTH
	     The length	of the string matched by the match function.

       RS    The first character of the	string value of	RS is the input	record
	     separator;	a newline character by default.	If  RS	contains  more
	     than  one	character, the results are unspecified.	If RS is null,
	     then records are separated	by sequences  of  one  or  more	 blank
	     lines:  leading  or  trailing  blank  lines  do not produce empty
	     records at	the beginning or end of	input, and the field separator
	     is	always newline,	no matter what the value of FS.

       RSTART
	     The  starting  position  of the string matched by the match func-
	     tion, numbering from 1.  This is always equivalent	to the	return
	     value of the match	function.

       SUBSEP
	     The  subscript separator string for multi-dimensional arrays; the
             default value is "\034".
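
       Two of the special variables above can be sketched briefly: RS set to
       the null string, and the RSTART and RLENGTH values set by match. The
       sample text is invented, and AWK is a hypothetical shell variable
       defaulting to awk:

```shell
AWK=${AWK:-awk}   # assumption: any POSIX awk; use AWK=nawk where available

# RS = "": records are separated by blank lines, and newline always
# separates fields in addition to FS.
out1=$(printf 'a b\nc\n\n\nd\n' | "$AWK" 'BEGIN { RS = "" } { print NR ": NF=" NF }')
printf '%s\n' "$out1"
# 1: NF=3
# 2: NF=1

# match() sets RSTART (1-based position) and RLENGTH (match length).
out2=$("$AWK" 'BEGIN { if (match("foobar", /o+/)) print RSTART, RLENGTH }')
printf '%s\n' "$out2"     # 2 2
```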

   Regular Expressions
       The nawk	utility	makes use of the extended regular expression  notation
       (see  regex(5)) except that it allows the use of	C-language conventions
       to escape special characters within the EREs, namely \\,	 \a,  \b,  \f,
       \n,  \r,	 \t,  \v,  and	those specified	in the following table.	 These
       escape sequences	are recognized both inside and outside bracket expres-
       sions.	Note  that records need	not be separated by newline characters
       and string constants can	contain	newline	characters,  so	 even  the  \n
       sequence	 is  valid  in	nawk EREs.  Using a slash character within the
       regular expression requires escaping as shown in	the table below:

       Escape Sequence	       Description		     Meaning
	     \"		 Backslash quotation-mark   Quotation-mark character
	     \/		 Backslash slash	    Slash character
	    \ddd	 A  backslash	character   The	character encoded by
			 followed  by the longest   the	one-, two- or three-
			 sequence of one, two, or   digit   octal   integer.
			 three	octal-digit char-   Multi-byte	  characters
			 acters	 (01234567).   If   require  multiple,	con-
			 all of	the digits are 0,   catenated	      escape
			 (that is, representation   sequences, including the
			 of  the NULL character),   leading \ for each byte.
			 the  behavior	is  unde-
			 fined.
	     \c		 A   backslash	character   Undefined
			 followed by any  charac-
			 ter   not  described  in
			 this  table  or  special
			 characters  (\\, \a, \b,
			 \f, \n, \r, \t, \v).

       A regular expression can	be matched against a specific field or	string
       by  using  one  of the two regular expression matching operators, ~ and
       !~. These operators interpret their right-hand  operand	as  a  regular
       expression  and	their  left-hand  operand  as a	string.	If the regular
       expression matches the string, the ~ expression evaluates to the	 value
       1,  and	the  !~	 expression  evaluates	to the value 0.	If the regular
       expression does not match the string, the ~ expression evaluates	to the
       value  0, and the !~ expression evaluates to the	value 1. If the	right-
       hand operand is any expression other than the lexical  token  ERE,  the
       string  value  of  the expression is interpreted	as an extended regular
       expression, including the escape conventions described above. Note that
       these same escape conventions are also applied in determining the
       value of a string literal (the lexical token STRING), and are applied a
       second time when a string literal is used in this context.
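
       The double application of escapes is easiest to see by comparing an
       ERE token with a string literal used as a regular expression. A sketch
       with invented test strings (AWK is a hypothetical shell variable
       defaulting to awk):

```shell
AWK=${AWK:-awk}   # assumption: any POSIX awk; use AWK=nawk where available

out=$("$AWK" 'BEGIN {
    print ("a.b" ~ /a\.b/)     # ERE token: \. is a literal period
    print ("a.b" ~ "a\\.b")    # string: \\ yields \, then \. in the ERE
    print ("axb" ~ "a\\.b")    # the literal period does not match x
    print ("axb" ~ "a.b")      # an unescaped . matches any character
}')
printf '%s\n' "$out"
# 1
# 1
# 0
# 1
```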

       When an ERE token appears as an expression in any context other than as
       the right-hand side of the ~ or !~ operator or as one of the built-in
       function arguments described below, the value of the resulting expression
       is the equivalent of:

       $0 ~ /ere/

       The ere argument to the gsub, match, and sub functions, and the fs
       argument to the split function (see String Functions), are interpreted
       as extended regular expressions. These can be either ERE tokens or
       arbitrary expressions, and are interpreted in the same manner as the
       right-hand side of the ~ or !~ operator.

       An extended regular expression can be used to separate fields by	 using
       the -F ERE option or by assigning a string containing the expression to
       the built-in variable FS. The default value of the  FS  variable	 is  a
       single space character. The following describes FS behavior:

       1. If FS	is a single character:

	     o	If  FS is the space character, skip leading and	trailing blank
		characters; fields are delimited by sets of one	or more	 blank
		characters.

	     o	Otherwise,  if	FS is any other	character c, fields are	delim-
		ited by	each single occurrence of c.

       2. Otherwise, the string	value of FS is considered to  be  an  extended
	  regular  expression.	Each  occurrence  of  a	 sequence matching the
	  extended regular expression delimits fields.
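
       The three FS cases can be sketched as follows. The input strings are
       invented for the example, and AWK is a hypothetical shell variable
       defaulting to awk:

```shell
AWK=${AWK:-awk}   # assumption: any POSIX awk; use AWK=nawk where available

# FS is a space: leading and trailing blanks are skipped, runs collapse.
out1=$(printf '  a   b  \n' | "$AWK" '{ print NF }')
printf '%s\n' "$out1"     # 2

# FS is another single character: every occurrence delimits a field,
# so "a::b" has an empty middle field.
out2=$(printf 'a::b\n' | "$AWK" -F: '{ print NF }')
printf '%s\n' "$out2"     # 3

# FS is longer than one character: it is treated as an ERE.
out3=$(printf 'a1b22c\n' | "$AWK" -F'[0-9]+' '{ print $1 $2 $3 }')
printf '%s\n' "$out3"     # abc
```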

       Except in the gsub, match, split, and sub built-in  functions,  regular
       expression  matching is based on	input records; that is,	record separa-
       tor characters (the first character of the value	of the variable	RS,  a
       newline character by default) cannot be embedded	in the expression, and
       no expression matches the record	separator character.   If  the	record
       separator  is  not  a newline character,	newline	characters embedded in
       the expression can be matched.  In those four built-in functions,
       regular expression matching is based on text strings. So, any character
       (including the newline character	 and  the  record  separator)  can  be
       embedded	in the pattern and an appropriate pattern will match any char-
       acter.  However,	in all nawk regular expression matching,  the  use  of
       one  or more NUL	characters in the pattern, input record	or text	string
       produces	undefined results.

   Patterns
       A pattern is any	valid expression, a range specified by two expressions
       separated by comma, or one of the two special patterns BEGIN or END.

   Special Patterns
       The  nawk  utility recognizes two special patterns, BEGIN and END. Each
       BEGIN pattern is	matched	once and its associated	action executed	before
       the  first  record of input is read (except possibly by use of the get-
       line function in	a prior	BEGIN action) and before command line  assign-
       ment  is	 done.	Each  END  pattern  is matched once and	its associated
       action executed after the last record of input has been read. Each of
       these two patterns must have an associated action.

       BEGIN  and  END do not combine with other patterns.  Multiple BEGIN and
       END patterns are	allowed. The actions associated	with  the  BEGIN  pat-
       terns  are  executed  in	the order specified in the program, as are the
       END actions. An END pattern can precede a BEGIN pattern in a program.

       If an nawk program consists of only actions with	the pattern BEGIN, and
       the BEGIN action	contains no getline function, nawk exits without read-
       ing its input when the last statement in	the last BEGIN action is  exe-
       cuted.	If  an	nawk program consists of only actions with the pattern
       END or only actions with	the patterns BEGIN and END, the	input is  read
       before the statements in	the END	actions	are executed.

   Expression Patterns
       An  expression  pattern	is  evaluated as if it were an expression in a
       Boolean context.	 If the	result is true,	the pattern is	considered  to
       match,  and  the	associated action (if any) is executed.	 If the	result
       is false, the action is not executed.

   Pattern Ranges
       A pattern range consists	of two expressions separated by	a  comma.   In
       this  case,  the	action is performed for	all records between a match of
       the first expression and	the following match of the second  expression,
       inclusive.   At	this point, the	pattern	range can be repeated starting
       at input	records	subsequent to the end of the matched range.
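
       A sketch of a pattern range, including its repetition later in the
       input. The six input lines are invented, and AWK is a hypothetical
       shell variable defaulting to awk:

```shell
AWK=${AWK:-awk}   # assumption: any POSIX awk; use AWK=nawk where available

# The range opens on a record matching /2/ and closes on the next record
# matching /4/, inclusive.  It opens again at record 6 and, with no later
# match of /4/, runs to the end of input.
out=$(printf '1\n2\n3\n4\n5\n2\n' | "$AWK" '/2/,/4/ { print NR ": " $0 }')
printf '%s\n' "$out"
# 2: 2
# 3: 3
# 4: 4
# 6: 2
```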

   Actions
       An action is a sequence of statements. A	statement may be  one  of  the
       following:

	      if ( expression )	statement [ else statement ]
	      while ( expression ) statement
	      do statement while ( expression )
	      for ( expression ; expression ; expression ) statement
	      for ( var	in array ) statement
	      delete array[subscript] #delete an array element
	      break
	      continue
	      {	[ statement ] ... }
	      expression	# commonly variable = expression
	      print [ expression-list ]	[ >expression ]
	      printf format [ ,expression-list ] [ >expression ]
	      next		# skip remaining patterns on this input	line
	      exit [expr] # skip the rest of the input;	exit status is expr
	      return [expr]

       Any  single  statement  can be replaced by a statement list enclosed in
       braces.	The statements are terminated by newline characters  or	 semi-
       colons, and are executed	sequentially in	the order that they appear.

       The  next  statement causes all further processing of the current input
       record to be abandoned. The behavior is undefined if a  next  statement
       appears or is invoked in	a BEGIN	or END action.

       The exit statement invokes all END actions in the order in which they
       occur in the program source and then terminates the program without
       reading	further	 input.	 An exit statement inside an END action	termi-
       nates the program without further execution  of	END  actions.	If  an
       expression  is specified	in an exit statement, its numeric value	is the
       exit status of nawk, unless subsequent errors are encountered or	a sub-
       sequent exit statement with an expression is executed.
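
       The behavior of exit can be sketched with a three-line input (invented
       here) and a hypothetical AWK shell variable that defaults to awk:

```shell
AWK=${AWK:-awk}   # assumption: any POSIX awk; use AWK=nawk where available

# exit on the second record: no further input is read, the END action
# still runs, and 3 becomes the exit status.
status=0
out=$(printf 'a\nb\nc\n' |
      "$AWK" 'NR == 2 { exit 3 } { print } END { print "END ran" }') || status=$?
printf '%s\n' "$out"
printf 'status=%s\n' "$status"
# a
# END ran
# status=3
```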

   Output Statements
       Both  print  and	printf statements write	to standard output by default.
       The output is written to	the location specified	by  output_redirection
       if one is supplied, as follows:

       > expression
       >> expression
       | expression

       In  all	cases, the expression is evaluated to produce a	string that is
       used as a full pathname to write	into (for > or >>) or as a command  to
       be executed (for |).  Using the first two forms, if the file of that
       name is not currently open, it is opened, creating it if necessary and,
       when using the first form, truncating the file.  The output is then
       appended to the file.  As long as the file remains open, subsequent
       calls in which expression evaluates to the same string value simply
       append output to the file.  The file remains open until the close
       function is called with an expression that evaluates to the same string
       value.
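
       A sketch of the > and >> forms. It assumes the mktemp utility is
       available to supply a scratch file; the variable names tmp and f are
       invented, and AWK is a hypothetical shell variable defaulting to awk:

```shell
AWK=${AWK:-awk}   # assumption: any POSIX awk; use AWK=nawk where available

tmp=$(mktemp)     # assumption: mktemp exists; it picks the scratch file name
"$AWK" -v f="$tmp" 'BEGIN {
    print "one" > f      # opens f, truncating it
    print "two" > f      # f is already open: appended, not truncated
    close(f)
    print "three" >> f   # reopened in append mode
}'
out=$(cat "$tmp")
printf '%s\n' "$out"
rm -f "$tmp"
# one
# two
# three
```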

       The third form writes output onto a stream piped	to the input of	a com-
       mand.  The stream is created if no stream is currently  open  with  the
       value of	expression as its command name.	 The stream created is equiva-
       lent to one created by a	call to	the popen(3C) function with the	 value
       of  expression  as  the	command	 argument and a	value of w as the mode
       argument.  As long as the stream	 remains  open,	 subsequent  calls  in
       which expression evaluates to the same string value write output to
       the existing stream. The	stream will remain open	until the close	 func-
       tion  is	 called	 with  an expression that evaluates to the same	string
       value.  At that time, the stream	is closed as  if  by  a	 call  to  the
       pclose function.
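
       The pipe form can be sketched with the sort utility as the command.
       The words being sorted are invented, and AWK is a hypothetical shell
       variable defaulting to awk:

```shell
AWK=${AWK:-awk}   # assumption: any POSIX awk; use AWK=nawk where available

out=$("$AWK" 'BEGIN {
    cmd = "sort"
    print "pear"  | cmd   # first use opens the stream to sort
    print "apple" | cmd   # same string value: same stream
    close(cmd)            # sort sees end of input and writes its output
    print "after close"
}')
printf '%s\n' "$out"
# apple
# pear
# after close
```

       Calling close before the final print makes the ordering deterministic,
       because close waits for the command to finish.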

       These output statements take a comma-separated list of expressions,
       referred to in the grammar by the non-terminal symbols expr_list,
       print_expr_list, or print_expr_list_opt. This list is referred to here
       as the expression list, and each	member is referred to as an expression
       argument.

       The  print  statement writes the	value of each expression argument onto
       the indicated output stream separated by	the current output field sepa-
       rator  (see  variable  OFS  above), and terminated by the output	record
       separator (see variable ORS above). All expression arguments are taken
       as strings, being converted if necessary, with the exception that the
       printf format in OFMT is used instead of the value in CONVFMT. An empty
       expression list stands for the whole input record ($0).
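
       A short sketch of how OFS, ORS and OFMT affect print, assuming an awk
       binary named awk. The separator values are arbitrary choices made for
       the example.

```shell
out=$(awk 'BEGIN {
    OFS = "-"; ORS = "!\n"
    print "a", "b"       # arguments joined by OFS, terminated by ORS
    OFMT = "%.2f"
    print 3.14159        # non-integer number converted with OFMT
}')
echo "$out"
```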

       The printf statement produces output based on a notation similar to the
       File Format Notation used to describe file formats in this document.
       Output is produced as specified with the first expression argument as
       the string format and subsequent expression arguments as the strings
       arg1 to argn, inclusive, with the following exceptions:

       1. The  format  is  an  actual character	string rather than a graphical
	  representation. Therefore, it	cannot contain empty  character	 posi-
	  tions.  The  space  character	 in  the format	string,	in any context
	  other	than a flag of a conversion specification, is  treated	as  an
	  ordinary character that is copied to the output.

       2. If  the  character set contains a Delta character and	that character
	  appears in the format	string,	it is treated as an ordinary character
	  that is copied to the	output.

       3. The escape sequences beginning with a backslash character are treated
          as sequences of ordinary characters that are copied to the output.
          Note that these same sequences are interpreted lexically by nawk when
          they appear in literal strings, but they are not treated specially by
          the printf statement.

       4. A  field  width  or  precision  can  be specified as the * character
	  instead of a digit string. In	this case the next argument  from  the
	  expression  list is fetched and its numeric value taken as the field
	  width	or precision.

       5. The implementation does not precede or follow	output from the	d or u
	  conversion specifications with blank characters not specified	by the
	  format string.

       6. The implementation does not precede output  from  the	 o  conversion
	  specification	with leading zeros not specified by the	format string.

       7. For the c conversion specification: if the argument  has  a  numeric
	  value, the character whose encoding is that value is output.	If the
	  value	is zero	or is not the encoding of any character	in the charac-
	  ter set, the behavior	is undefined.  If the argument does not	have a
	  numeric value, the first character of	the string value will be  out-
	  put;	if  the	string does not	contain	any characters the behavior is
	  undefined.

       8. For each conversion specification that  consumes  an	argument,  the
	  next	expression  argument  will be evaluated. With the exception of
	  the c	conversion, the	value will be  converted  to  the  appropriate
	  type for the conversion specification.

       9. If  there  are  insufficient expression arguments to satisfy all the
	  conversion specifications in the  format  string,  the  behavior  is
	  undefined.

       10. If any character sequence in the format string begins with a %
           character, but does not form a valid conversion specification, the
           behavior is unspecified.

       Both print and printf can output	at least {LINE_MAX} bytes.
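
       The sketch below exercises a few of the conversions described in the
       list above, including both cases of the c conversion. It assumes an awk
       binary named awk and a character set in which 65 encodes the letter A
       (true for ASCII-compatible locales).

```shell
out=$(awk 'BEGIN {
    # %5d pads on the left, %-5s pads on the right,
    # %c takes the first character of a string or the character for a number
    printf "%5d|%-5s|%c|%c\n", 42, "ab", "xyz", 65
}')
echo "$out"
```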

   Functions
       The  nawk  language  has	 a  variety of built-in	functions: arithmetic,
       string, input/output and	general.

   Arithmetic Functions
       The arithmetic functions, except	for int, are based on the ISO C	 stan-
       dard. The behavior is undefined in cases	where the ISO C	standard spec-
       ifies that an error be returned or  that	 the  behavior	is  undefined.
       Although	the grammar permits built-in functions to appear with no argu-
       ments or	parentheses, unless the	argument or parentheses	are  indicated
       as  optional  in	 the following list (by	displaying them	within the [ ]
       brackets), such use is undefined.

       atan2(y,x)
	     Return arctangent of y/x.

       cos(x)
	     Return cosine of x, where x is in radians.

       sin(x)
	     Return sine of x, where x is in radians.

       exp(x)
	     Return the	exponential function of	x.

       log(x)
	     Return the	natural	logarithm of x.

       sqrt(x)
	     Return the	square root of x.

       int(x)
	     Truncate its argument to an integer. It will be truncated	toward
	     0 when x >	0.

       rand()
	     Return a random number n, such that 0 <= n	< 1.

       srand([expr])
	     Set  the  seed  value  for	rand to	expr or	use the	time of	day if
	     expr is omitted. The previous seed	value will be returned.
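
       As a rough illustration of the arithmetic functions, the following
       sketch assumes an awk binary named awk. The seed value 1 is arbitrary;
       the point is only that reseeding with the same value repeats the
       sequence.

```shell
out=$(awk 'BEGIN {
    printf "%d %d %.4f\n", int(3.9), sqrt(16), atan2(0, -1)
    srand(1); a = rand()
    srand(1); b = rand()       # same seed, so the same first value
    r = (a == b) ? "same" : "different"
    print r
}')
echo "$out"
```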

   String Functions
       The string functions in the following list shall	be supported. Although
       the  grammar  permits built-in functions	to appear with no arguments or
       parentheses, unless  the	 argument  or  parentheses  are	 indicated  as
       optional	 in  the  following  list  (by	displaying them	within the [ ]
       brackets), such use is undefined.

       gsub(ere,repl[,in])
	     Behave like sub (see below), except  that	it  will  replace  all
	     occurrences of the	regular	expression (like the ed	utility	global
	     substitute) in $0 or in the in argument, when specified.

       index(s,t)
	     Return the	position, in characters, numbering from	1, in string s
	     where string t first occurs, or zero if it	does not occur at all.

       length[([s])]
	     Return the	length,	in characters, of  its	argument  taken	 as  a
	     string, or	of the whole record, $0, if there is no	argument.

       match(s,ere)
	     Return the	position, in characters, numbering from	1, in string s
	     where the extended	regular	expression ere occurs, or zero	if  it
	     does  not	occur at all. RSTART will be set to the	starting posi-
	     tion (which is the	same as	the returned value), zero if no	 match
	     is	 found;	 RLENGTH  will	be  set	 to  the length	of the matched
	     string, -1	if no match is found.

       split(s,a[,fs])
	     Split the string s	into array elements a[1], a[2],	..., a[n], and
	     return  n.	 The separation	will be	done with the extended regular
	     expression	fs or with the field separator FS if fs	is not	given.
	     Each  array element will have a string value when created.	If the
	     string assigned to	any array element, with	any occurrence of  the
	     decimal-point  character  from  the  current  locale changed to a
	     period character, would be	considered a numeric string; the array
	     element  will  also have the numeric value	of the numeric string.
	     The effect	of a null string as the	value of fs is unspecified.

       sprintf(fmt,expr,expr,...)
	     Format the	expressions according to the printf  format  given  by
	     fmt and return the	resulting string.

       sub(ere,repl[,in])
             Substitute the string repl in place of the first instance of the
             extended regular expression ere in string in and return the
             number of substitutions. An ampersand (&) appearing in the string
             repl will be replaced by the string from in that matches the
             regular expression. For each occurrence of backslash (\) encountered
	     when scanning the string repl from	beginning  to  end,  the  next
	     character	is  taken literally and	loses its special meaning (for
	     example, \& will be interpreted as	a  literal  ampersand  charac-
	     ter).  Except  for	 &  and	 \, it is unspecified what the special
	     meaning of	any such character is. If in is	specified  and	it  is
	     not  an  lvalue the behavior is undefined.	If in is omitted, nawk
	     will substitute in	the current record ($0).

       substr(s,m[,n])
	     Return the	at most	n-character substring  of  s  that  begins  at
	     position  m, numbering from 1. If n is missing, the length	of the
	     substring will be limited by the length of	the string s.

       tolower(s)
	     Return a string based on the string s. Each character in  s  that
	     is	 an  upper-case	 letter	specified to have a tolower mapping by
	     the LC_CTYPE category of the current locale will be  replaced  in
	     the  returned  string  by	the lower-case letter specified	by the
	     mapping. Other characters in s will be unchanged in the  returned
	     string.

       toupper(s)
	     Return  a	string based on	the string s. Each character in	s that
	     is	a lower-case letter specified to have a	toupper	mapping	by the
	     LC_CTYPE  category	 of the	current	locale will be replaced	in the
	     returned string by	the upper-case letter specified	 by  the  map-
	     ping.  Other  characters  in  s will be unchanged in the returned
	     string.

       All of the preceding functions that take	ERE as a  parameter  expect  a
       pattern	or  a string valued expression that is a regular expression as
       defined below.
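
       The string functions above can be tried with a sketch such as the
       following. It assumes an awk binary named awk; the sample strings and
       the array name parts are made up for the example.

```shell
out=$(awk 'BEGIN {
    s = "hello world"
    n = gsub(/o/, "0", s)          # every match replaced; n counts them
    print n, s
    print index("banana", "nan"), length("banana")
    m = match("foobar", /o+/)      # also sets RSTART and RLENGTH
    print m, RSTART, RLENGTH
    n = split("a:b:c", parts, ":")
    print n, parts[1], parts[3]
    print substr("abcdef", 2, 3), toupper("abc") tolower("XYZ")
    t = "cat"
    sub(/a/, "[&]", t)             # & stands for the matched text
    print t
}')
echo "$out"
```

       Note that sub and gsub modify their third argument in place, which is
       why it must be an lvalue.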

   Input/Output	and General Functions
       The input/output	and general functions are:

       close(expression)
	     Close the file or pipe opened by a	print or printf	statement or a
	     call  to  getline	with the same string-valued expression.	If the
	     close was successful, the function	will return 0;	otherwise,  it
	     will return non-zero.

       expression|getline[var]
	     Read  a  record of	input from a stream piped from the output of a
	     command. The stream will be created if  no	 stream	 is  currently
	     open with the value of expression as its command name. The	stream
	     created will be equivalent	to one created by a call to the	 popen
	     function with the value of	expression as the command argument and
	     a value of	r as the mode argument.	As long	as the stream  remains
	     open,  subsequent calls in	which expression evaluates to the same
             string value will read subsequent records from the stream. The
	     stream  will  remain open until the close function	is called with
	     an	expression that	evaluates to the same string  value.  At  that
	     time,  the	 stream	 will  be closed as if by a call to the	pclose
	     function. If var is missing, $0 and NF will  be  set;  otherwise,
	     var will be set.

	     The getline operator can form ambiguous constructs	when there are
	     operators that are	not in parentheses (including concatenate)  to
	     the  left of the |	(to the	beginning of the expression containing
	     getline). In the context of the $ operator, | behaves  as	if  it
             had a lower precedence than $. The result of evaluating other
             operators is unspecified, and portable applications must
             parenthesize all such uses properly.

       getline
	     Set $0 to the next	input record from the current input file. This
	     form of getline will set the NF, NR, and FNR variables.

       getline var
	     Set variable var to the next input	record from the	current	 input
	     file. This	form of	getline	will set the FNR and NR	variables.

       getline [var] < expression
	     Read  the	next record of input from a named file.	The expression
	     will be evaluated to produce a string that	 is  used  as  a  full
	     pathname. If the file of that name	is not currently open, it will
	     be	opened.	As long	as the stream remains open,  subsequent	 calls
	     in	 which expression evaluates to the same	string value will read
	     subsequent	records	from the file. The file	will remain open until
	     the close function	is called with an expression that evaluates to
	     the same string value. If var is missing, $0 and NF will be  set;
	     otherwise,	var will be set.

             The getline operator can form ambiguous constructs when there are
             binary operators that are not in parentheses (including
             concatenate) to the right of the < (up to the end of the expression
             containing the getline). The result of evaluating such a construct
             is unspecified, and portable applications must parenthesize all
             such uses properly.

       system(expression)
	     Execute the command given by expression in	a manner equivalent to
	     the  system(3C)  function	and return the exit status of the com-
	     mand.

       All forms of getline will return	1 for successful input,	0 for  end  of
       file, and -1 for	an error.

       Where  strings  are used	as the name of a file or pipeline, the strings
       must be textually identical.  The  terminology  ``same  string  value''
       implies	that  ``equivalent  strings'',	even those that	differ only by
       space characters, represent different files.
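
       The following sketch combines the getline forms, close and system. It
       assumes an awk binary named awk; the command strings "echo hello" and
       "exit 3" and the scratch file from mktemp are hypothetical choices
       made for the example.

```shell
tmp=$(mktemp)
printf 'first\nsecond\n' > "$tmp"
out=$(awk -v f="$tmp" 'BEGIN {
    cmd = "echo hello"
    cmd | getline greeting            # read one record from the command
    close(cmd)                        # same string value closes the pipe
    n = 0
    while ((getline line < f) > 0)    # 1 = record read, 0 = end of file, -1 = error
        n++
    close(f)
    rc = system("exit 3")             # exit status of the command
    print greeting, n, rc
}')
rm -f "$tmp"
echo "$out"
```

       Holding the command and the file name in variables keeps the string
       passed to close textually identical to the one used to open the
       stream, and avoids unparenthesized operators next to | and <.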

   User-defined	Functions
       The nawk	language also provides user-defined functions. Such  functions
       can be defined as:

       function	name(args,...) { statements }

       A  function can be referred to anywhere in an nawk program; in particu-
       lar, its	use can	precede	its definition.	The scope of a	function  will
       be global.

       Function	 arguments  can	 be  either scalars or arrays; the behavior is
       undefined if an array name is passed as an argument that	 the  function
       uses  as	 a  scalar, or if a scalar expression is passed	as an argument
       that the	function uses as an array. Function arguments will  be	passed
       by  value if scalar and by reference if array name. Argument names will
       be local	to the function; all other variable names will be global.  The
       same name must not be used as both an argument name and as the name of
       a function or a special nawk variable. The same name must not  be  used
       both  as	 a  variable name with global scope and	as the name of a func-
       tion. The same name must	not be used within the same scope  both	 as  a
       scalar variable and as an array.

       The  number of parameters in the	function definition need not match the
       number of parameters in the function call. Excess formal	parameters can
       be  used	as local variables. If fewer arguments are supplied in a func-
       tion call than are in the function  definition,	the  extra  parameters
       that  are used in the function body as scalars will be initialized with
       a string	value of the null string and a numeric value of	zero, and  the
       extra  parameters  that are used	in the function	body as	arrays will be
       initialized as empty arrays. If more arguments are supplied in a	 func-
       tion  call  than	 are in	the function definition, the behavior is unde-
       fined.

       When invoking a function, no white space	 can  be  placed  between  the
       function	name and the opening parenthesis. Function calls can be	nested
       and recursive calls can be made upon functions. Upon  return  from  any
       nested  or  recursive  function	call, the values of all	of the calling
       function's parameters will be unchanged,	except	for  array  parameters
       passed  by  reference.  The  return  statement  can be used to return a
       value. If a return statement appears outside of a function  definition,
       the behavior is undefined.

       In  the function	definition, newline characters are optional before the
       opening brace and after the closing  brace.  Function  definitions  can
       appear  anywhere	in the program where a pattern-action pair is allowed.

USAGE
       The index, length, match, and substr functions should not  be  confused
       with  similar  functions	 in the	ISO C standard;	the nawk versions deal
       with characters,	while the ISO C	standard deals with bytes.

       Because the concatenation operation is represented by adjacent  expres-
       sions  rather  than  an explicit	operator, it is	often necessary	to use
       parentheses to enforce the proper evaluation precedence.
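
       For instance, the sketch below (assuming an awk binary named awk)
       shows how parentheses change the result when concatenation is mixed
       with addition.

```shell
out=$(awk 'BEGIN {
    x = "a" 1 + 2        # + binds tighter than concatenation: "a" (1 + 2)
    y = ("a" 1) + 2      # "a1" has numeric value 0, so the sum is 2
    print x
    print y
}')
echo "$out"
```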

       See largefile(5)	for the	description  of	 the  behavior	of  nawk  when
       encountering files greater than or equal	to 2 Gbyte ( 2**31 bytes).

EXAMPLES
       The nawk	program	specified in the command line is most easily specified
       within single-quotes (for example, 'program')  for  applications	 using
       sh,  because nawk programs commonly contain characters that are special
       to the shell, including double-quotes. In the cases where a  nawk  pro-
       gram contains single-quote characters, it is usually easiest to specify
       most of the program as strings within single-quotes concatenated	by the
       shell with quoted single-quote characters.  For example:

       awk '/'\''/ { print "quote:", $0	}'

       prints  all  lines  from	 the  standard input containing	a single-quote
       character, prefixed with	quote:.

       The following are examples of simple nawk programs:

       Example 1: Write	to the standard	output all input lines for which field
       3 is greater than 5:

       $3 > 5

       Example 2: Write	every tenth line:

       (NR % 10) == 0

       Example 3: Write	any line with a	substring matching the regular expres-
       sion:

       /(G|D)(2[0-9][[:alpha:]]*)/

       Example 4: Print	any line with a	substring containing a G  or  D,  fol-
       lowed by	a sequence of digits and characters:

       This  example uses character classes digit and alpha to match language-
       independent digit and alphabetic	characters, respectively.

       /(G|D)([[:digit:][:alpha:]]*)/

       Example 5: Write	any line in which the second field matches the regular
       expression and the fourth field does not:

       $2 ~ /xyz/ && $4	!~ /xyz/

       Example	6:  Write  any line in which the second	field contains a back-
       slash:

       $2 ~ /\\/

       Example 7: Write	any line in which the second field  contains  a	 back-
       slash (alternate	method):

       Note that backslash escapes are interpreted twice, once in lexical pro-
       cessing of the string and once in processing the	regular	expression.

       $2 ~ "\\\\"

       Example 8: Write	the second to the last and  the	 last  field  in  each
       line, separating	the fields by a	colon:

       {OFS=":";print $(NF-1), $NF}

       Example 9: Write	the line number	and number of fields in	each line:

       The  three strings representing the line	number,	the colon and the num-
       ber of fields are concatenated and that string is written  to  standard
       output.

       {print NR ":" NF}

       Example 10: Write lines longer than 72 characters:

       length($0) > 72

       Example	11:  Write first two fields in opposite	order separated	by the
       OFS:

       { print $2, $1 }

       Example 12: Same, with input fields separated by	comma or space and tab
       characters, or both:

       BEGIN { FS = ",[ \t]*|[ \t]+" }
	     { print $2, $1 }

       Example 13: Add up first	column,	print sum and average:

	   {s += $1 }
       END {print "sum is ", s,	" average is", s/NR}

       Example 14: Write fields	in reverse order, one per line (many lines out
       for each	line in):

       { for (i	= NF; i	> 0; --i) print	$i }

       Example 15: Write all lines between occurrences of the strings  "start"
       and "stop":

       /start/,	/stop/

       Example	16:  Write  all	 lines whose first field is different from the
       previous	one:

       $1 != prev { print; prev	= $1 }

       Example 17: Simulate the	echo command:

       BEGIN  {
	      for (i = 1; i < ARGC; ++i)
		    printf "%s%s", ARGV[i], i==ARGC-1?"\n":" "
	      }

       Example 18: Write the path prefixes contained in	the  PATH  environment
       variable, one per line:

       BEGIN  {
	      n	= split	(ENVIRON["PATH"], path,	":")
	      for (i = 1; i <= n; ++i)
		     print path[i]
	      }

       Example 19: Print the file "input", filling in page numbers starting at
       5:

       If there	is a file named	input containing page headers of the form

       Page #

       and a file named	program	that contains

       /Page/{ $2 = n++; }
       { print }

       then the	command	line

       nawk -f program n=5 input

       will print the file input, filling in page numbers starting at 5.
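
       To try the one-line programs above from a shell, they can be fed
       sample input as in the following sketch of Examples 8 and 16. The
       input records are invented for the illustration, and an awk binary
       named awk is assumed.

```shell
# Example 8: second-to-last and last field, separated by a colon.
out8=$(printf 'x y z\n' | awk '{ OFS = ":"; print $(NF-1), $NF }')
# Example 16: lines whose first field differs from the previous one.
out16=$(printf 'a 1\na 2\nb 3\n' | awk '$1 != prev { print; prev = $1 }')
echo "$out8"
echo "$out16"
```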

ENVIRONMENT VARIABLES
       See environ(5) for descriptions of the following	environment  variables
       that  affect execution: LC_COLLATE, LC_CTYPE, LC_MESSAGES, and NLSPATH.

       LC_NUMERIC
	     Determine the radix  character  used  when	 interpreting  numeric
	     input,  performing	 conversions between numeric and string	values
	     and formatting numeric output. Regardless of locale,  the	period
	     character	(the  decimal-point  character of the POSIX locale) is
	     the decimal-point character recognized in processing awk programs
	     (including	assignments in command-line arguments).

EXIT STATUS
       The following exit values are returned:

       0     All input files were processed successfully.

       >0    An	error occurred.

       The  exit  status  can  be  altered within the program by using an exit
       expression.
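
       A minimal sketch of setting the exit status from within a program,
       assuming an awk binary named awk. The value 3 is arbitrary.

```shell
status=0
awk 'BEGIN { exit 3 }' || status=$?   # capture the non-zero exit status
echo "$status"
```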

ATTRIBUTES
       See attributes(5) for descriptions of the following attributes:

   /usr/bin/nawk
       +-----------------------------+-----------------------------+
       |      ATTRIBUTE	TYPE	     |	    ATTRIBUTE VALUE	   |
       +-----------------------------+-----------------------------+
       |Availability		     |SUNWcsu			   |
       +-----------------------------+-----------------------------+

   /usr/xpg4/bin/awk
       +-----------------------------+-----------------------------+
       |      ATTRIBUTE	TYPE	     |	    ATTRIBUTE VALUE	   |
       +-----------------------------+-----------------------------+
       |Availability		     |SUNWxcu4			   |
       +-----------------------------+-----------------------------+

SEE ALSO
       awk(1), ed(1), egrep(1), grep(1), lex(1), sed(1), popen(3C),
       printf(3C), system(3C), attributes(5), environ(5), largefile(5),
       regex(5), XPG4(5)

       Aho, A. V., B. W. Kernighan, and	P. J. Weinberger, The AWK  Programming
       Language, Addison-Wesley, 1988.

DIAGNOSTICS
       If any file operand is specified	and the	named file cannot be accessed,
       nawk will write a diagnostic message to standard	 error	and  terminate
       without any further action.

       If  the	program	 specified by either the program operand or a progfile
       operand is not a	valid nawk program (as specified in EXTENDED  DESCRIP-
       TION), the behavior is undefined.

NOTES
       Input white space is not	preserved on output if fields are involved.

       There are no explicit conversions between numbers and strings. To force
       an expression to be treated as a number, add 0 to it; to force it to be
       treated as a string, concatenate the null string ("") to it.
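
       Both notes can be observed with a sketch like the following, assuming
       an awk binary named awk. The input record and variable names are
       invented for the example.

```shell
out=$(printf 'a    b\n' | awk '{
    x = "10"; y = "9"
    s = (x < y)              # both are strings: compared as strings
    n = (x + 0 < y + 0)      # adding 0 forces a numeric comparison
    $1 = $1                  # assigning a field rebuilds $0 using OFS
    print s, n, $0
}')
echo "$out"
```

       The run of blanks between a and b in the input is replaced by a single
       OFS once a field is assigned.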

SunOS 5.9			  10 Feb 1999			       nawk(1)
