       awka-elmref - Awka API Reference	for use	with Awka-ELM libraries.

       Awka  is	 a  translator	of  AWK	programs to ANSI-C code, and a library
       (libawka.a) against which the code is  linked  to  create  executables.
       Awka is described in the	awka manpage.

       The  Extended  Library  Methods (ELM) provide a way of adding new func-
       tions to	the AWK	language, so that they appear in your AWK code	as  if
       they  were builtin functions such as substr() or	index().  The awka-elm
       manpage contains	an introduction	to Awka-ELM.

       This page lists the available data structures,  definitions,  functions
       and  macros  provided  by  libawka.h that you may use in	creating C li-
       braries that link with awka-generated code.

       I have broken the page into the following main sections:	BASIC VARIABLE
       SION METHODS.  So, without further ado...

       Data Structures


	    typedef struct {
	      double dval;	    /* the variable's numeric value */
	      char * ptr;	    /* pointer to string, array	or RE structure	*/
	      unsigned int slen;    /* length of string	ptr as per strlen */
	      unsigned int allc;    /* space mallocated	for string ptr */
	      char type;	    /* records current cast of variable	*/
	      char type2;	    /* double-typed variable flag, explained later. */
	      char temp;	    /* TRUE if a temporary variable */
	    } a_VAR;

	    The	a_VAR structure	is used	to store everything related to AWK
	    variables.	This includes those named & used in your program, and
	    transient variables	created	to return values from functions	and other
	    operations like string concatenation.  As such, this structure is
	    ubiquitous throughout libawka and awka-generated code.

	    The	type value is set to one of a number of	#define	values,
	    described in the Defines paragraph below.  Many functions and macros
	    exist for working with the contents	of a_VARs - see	the Functions &
	    Macros paragraph for details.


	    typedef struct {
	      a_VAR *var[256];
	      int used;
	    } a_VARARG;

	    This structure is typically	used to	pass variable numbers of a_VARs	to
	    functions.	Up to 256 a_VARs may be	referenced by an a_VARARG, and the
	    used value contains	the number of a_VARs present.

	  struct gvar_struct

	    struct gvar_struct {
	      char *name;
	      a_VAR *var;
	    };

	    Provides a mapping of the global variable names in an AWK script to	pointers
	    to their a_VAR structures.

       Internal	Libawka	Variables

	  a_VAR	* a_bivar[a_BIVARS]
	    This array contains all the AWK internal variables, such as ARGC, ARGV,
	    CONVFMT, ENVIRON and so on,	along with $0 and the field variables $1..$n.
	    a_BIVARS is	a define, as are the identities	of which element in the
	    array belongs to which variable.  Again, look for functions	that manage
	    these variables rather than	working	with them directly if possible.

	  extern struct	gvar_struct *_gvar;
	    This is actually created & populated by the	translated C code generated
	    by awka, rather than by libawka.a.	It is a	NULL-terminated	array
	    of the gvar_struct structure defined earlier in this page, and contains
	    the names of all global variables in an AWK script, mapped to their a_VAR
	    structures.
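
	    Because _gvar is described as a NULL-terminated array, finding a
	    global variable by name can be done with a short linear scan.  The
	    sketch below is only an illustration, not libawka code: find_gvar()
	    is a hypothetical helper, the structures are local copies of the
	    definitions on this page so that it compiles without libawka.h, and
	    it assumes the terminating element has a NULL name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local copies of the structures listed on this page.  Real library
   code would include libawka.h instead of redefining them. */
typedef struct {
  double dval;
  char *ptr;
  unsigned int slen;
  unsigned int allc;
  char type;
  char type2;
  char temp;
} a_VAR;

struct gvar_struct {
  char *name;
  a_VAR *var;
};

/* find_gvar() is a hypothetical helper, not part of libawka.
   Assumption: the last element of the table has name == NULL.
   Returns the a_VAR mapped to name, or NULL if it is not present. */
static a_VAR *find_gvar(struct gvar_struct *table, const char *name)
{
  struct gvar_struct *g;

  if (table == NULL || name == NULL)
    return NULL;
  for (g = table; g->name != NULL; g++)
    if (strcmp(g->name, name) == 0)
      return g->var;
  return NULL;
}
```

	    In a library linked against awka-generated code, the table argument
	    would presumably be _gvar itself.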


	  a_VARNUL - the type value of an a_VAR	if the variable	is unused.

	  a_VARDBL - the type value for	an a_VAR cast to a number.

	  a_VARSTR - type where	the a_VAR has been cast	to a string.

	  a_VARARR - type where	the a_VAR contains an array.

	  a_VARREG - type where	the a_VAR contains a regular expression.

	  a_VARUNK - type where	the a_VAR is a string, but could also be a
		     number.  Variables	populated by getline, the FILENAME variable,
		     and elements of an	array created by split(), are all of this
		     special type.

	  a_DBLSET - for a string a_VAR	that has been read in context as a number, the
		     type2 flag	is set to this #define to prevent the string-to-number
		     conversion	being done again.

	  a_STRSET - the opposite of the above.	 The variable is a number, has been read
		     as	a string, hence	the value of ptr is current, and the type2
		     flag is set to this #define.

	  a_BIVARS provides the	number of elements in the a_bivar[] array.

       a_FS,  a_NF,  a_NR,  a_OFMT,  a_OFS,  a_ORS, a_RLENGTH, a_RS, a_RSTART,
       provide	indexes	to which elements in the a_bivar[] array are for which
       AWK internal variable.

       Functions & Macros

	  awka_getd(a_VAR *)
	    This macro calls the awka_getdval()	function, appending the	calling	file & line number
	    for	debug purposes.	 It read-casts the variable to a number, and returns the double
	    value of the variable.  By read-cast, we mean that if the variable is a string it
	    remains so,	but dval is set, and type2 is set to a_DBLSET.	But if the a_VAR
	    is a regular expression, the re structure is dropped and the variable converted to a
	    number.  If	you're not sure	whether	an a_VAR you're	about to read is a number, and you
	    want to read it as one, simply call awka_getd(varname) - it's the easiest way.
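
	    The read-cast behaviour described above can be pictured with a small
	    stand-in.  This is a sketch under assumptions, not the source of
	    awka_getdval(): sketch_getd() and the SK_* constants are invented for
	    illustration (the real type values come from libawka.h), and only the
	    number and string cases are covered.

```c
#include <assert.h>
#include <stdlib.h>

/* Local copy of the a_VAR structure listed on this page. */
typedef struct {
  double dval;
  char *ptr;
  unsigned int slen;
  unsigned int allc;
  char type;
  char type2;
  char temp;
} a_VAR;

/* Placeholder values standing in for a_VARNUL, a_VARDBL, a_VARSTR and
   a_DBLSET.  The real values are defined in libawka.h. */
enum { SK_VARNUL = 0, SK_VARDBL = 1, SK_VARSTR = 2, SK_DBLSET = 10 };

/* sketch_getd() is hypothetical.  It imitates a read-cast to a number:
   a string variable keeps its string type, but dval is filled in and
   type2 is flagged so the conversion is not repeated. */
static double sketch_getd(a_VAR *v)
{
  if (v->type == SK_VARDBL)
    return v->dval;
  if (v->type == SK_VARSTR) {
    if (v->type2 != SK_DBLSET) {
      v->dval = v->ptr ? strtod(v->ptr, NULL) : 0.0;
      v->type2 = SK_DBLSET;
    }
    return v->dval;          /* v->type is still SK_VARSTR */
  }
  return 0.0;                /* an unused AWK variable reads as zero */
}
```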

	  awka_getd1(a_VAR *)
	    Same as awka_getd, except this will	be faster if the a_VAR * is a variable.	 Do not
	    use	this if	the a_VAR * is a function call return value, as	it'll call the function
	    several times!  In this case, use awka_getd() instead.
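
	    The warning above is the usual C hazard of a macro that mentions its
	    argument more than once.  The definition of awka_getd1 is not shown
	    on this page, so the sketch below uses an assumed macro, SK_GETD1,
	    and a trimmed stand-in structure purely to show the effect.

```c
#include <assert.h>

/* Trimmed stand-in for a_VAR: only the fields this sketch needs. */
typedef struct {
  double dval;
  char type;
} sk_var;

#define SK_VARDBL 1   /* placeholder for a_VARDBL */

/* Hypothetical fast-path macro.  Its argument appears twice, so an
   argument with side effects is evaluated twice when type matches. */
#define SK_GETD1(v) ((v)->type == SK_VARDBL ? (v)->dval : 0.0)

static int calls = 0;                        /* counts make_value() calls */
static sk_var result = { 3.0, SK_VARDBL };

/* Stands in for any function that returns a variable pointer. */
static sk_var *make_value(void)
{
  calls++;
  return &result;
}

/* Unsafe: the function call is pasted into the macro twice. */
static double read_via_macro(void)
{
  return SK_GETD1(make_value());
}

/* Safe: call once, keep the pointer, then use the macro on it. */
static double read_via_temp(void)
{
  sk_var *v = make_value();
  return SK_GETD1(v);
}
```

	    Storing the returned pointer in a local variable first, as in
	    read_via_temp(), is one way to keep the fast form without repeating
	    the call.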

	  awka_gets(a_VAR *)
	    Similar to awka_getd(), this read-casts an a_VAR to	a string, and returns the character
	    array pointed to by	ptr.

	  awka_gets1(a_VAR *)
	    Use	this where the a_VAR * is a variable, not a function call that returns an a_VAR	*,
	    for	faster performance.

	  awka_getre(a_VAR *)
	    Write-casts	the a_VAR * to a regular expression, and returns the pointer to	the awka_regexp
	    structure.	Write-cast means that the existing contents of the variable are	dropped	in
	    favour of the new contents.

	  static char *awka_strcpy(a_VAR *var, char *str)
	    This function sets var to string type, and copies to it the	contents of str.
	    It returns a pointer to var->ptr.

	  a_VAR	*awka_varcpy(a_VAR *va,	a_VAR *vb)
	    This function copies the contents of scalar	a_VAR *vb to scalar a_VAR *va,
	    and	returns	a pointer to va.

	  double awka_varcmp(a_VAR *va,	a_VAR *vb)
	    This function compares the contents	of the two scalar variables, and returns 0 if the
	    variables are equal, -1 if va is less than vb, or 1	if va is greater.  Numerical
	    comparison is used where possible, otherwise string.

	  a_VAR	*awka_vardup(a_VAR *va)
	    This function creates a new	a_VAR *, copies	the contents of	va, and	returns	a pointer
	    to the new structure.

	  awka_varinit(a_VAR *)
	    A macro that takes a NULL a_VAR *, mallocs space for it, and initialises it	to a_VARNUL.

	  void awka_killvar(a_VAR *)
	    Frees all memory used by the a_VAR,	except the structure itself.

	  static a_VAR * awka_argv()
	    You	can use	a_bivar[a_ARGV]	directly when reading the value	of elements in the array,
	    but	when you want to write to the array, use the above function instead, as	it will
	    make sure the changes are recognised elsewhere in libawka.

	  static a_VAR * awka_argc()
	    You	can use	a_bivar[a_ARGC]	directly when reading its value, but when you want to write
	    to it, use the above function instead, as it will make sure	the change is recognised
	    elsewhere in libawka.

       Data Structures & Variables
       These are strictly internal to the array	module within libawka.	If you
       need  functionality  other than that provided by	the array functions, I
       recommend creating your own custom array	data structures	and  interface
       functions,  otherwise  you could	cause serious problems.	 The structure
       definitions are too lengthy to list here, and the  foolhardy  may  find
       them in lib/array.h within the awka distribution.


	    The	'type' of an array that	has not	been initialised, or has been deleted.

	    The	'type' of an array populated by	the split() builtin function.

	    The	'type' of arrays populated within the AWK script, eg. arr["pigs"] = cows.

	    When searching arrays, specifies that an element is	to be created if it doesn't
	    already exist in the array.

	    When searching arrays, this will not create a new element if it doesn't already
	    exist in the array.

	    In an array	search,	this flag will cause the element to be deleted from the	array.


	  void awka_arraycreate( a_VAR *var, char type );
	    Allocates an array structure of type type, makes var->ptr point
	    to it, and sets var->type to a_VARARR.  The type argument may
	    be one of a_ARR_TYPE_NULL, a_ARR_TYPE_SPLIT	or a_ARR_TYPE_HSH, according to
	    how	the array will be populated.

	  void awka_arrayclear(	a_VAR *var );
	    Assuming var is an a_VARARR, this deletes the contents of the array structure
	    pointed to by var->ptr.

	  a_VAR	* awka_arraysearch1( a_VAR *v, a_VAR *element, char create, int	set );
	    Searches array variable v for index	element.  If it	does not exist,	and
	    create is a_ARR_CREATE, a new element in the array for this	value will be added.
	    If the element is found (or	created) and create is not a_ARR_DELETE, the
	    function will return a pointer to the a_VAR	for that element.  For a_ARR_DELETE, the
	    element will be deleted from the array.  The set value should be FALSE.
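
	    The create, query and delete behaviours of a single search call can
	    be pictured with a toy table.  Everything in this sketch is a
	    stand-in: sk_array, sk_elem, sk_search() and the SK_* flags are
	    invented for illustration and say nothing about how libawka stores
	    arrays internally.  Returning NULL after a delete is also an
	    assumption, as the page does not state the return value in that case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for the array search flags described on this page. */
enum { SK_CREATE, SK_QUERY, SK_DELETE };

#define SK_SLOTS 8

typedef struct {
  const char *key;   /* borrowed pointer, not copied */
  double val;
  int live;          /* non-zero while the slot holds an element */
} sk_elem;

typedef struct {
  sk_elem slot[SK_SLOTS];
} sk_array;

/* One entry point handles lookup, creation and deletion, selected by
   mode.  Returns the element, or NULL if it is absent, was deleted,
   or could not be created because the table is full. */
static sk_elem *sk_search(sk_array *a, const char *key, int mode)
{
  sk_elem *spare = NULL;
  int i;

  for (i = 0; i < SK_SLOTS; i++) {
    sk_elem *e = &a->slot[i];
    if (e->live && strcmp(e->key, key) == 0) {
      if (mode == SK_DELETE) {
        e->live = 0;
        return NULL;
      }
      return e;
    }
    if (!e->live && spare == NULL)
      spare = e;               /* remember the first free slot */
  }
  if (mode != SK_CREATE || spare == NULL)
    return NULL;
  spare->key = key;
  spare->val = 0.0;
  spare->live = 1;
  return spare;
}
```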

	  a_VAR	* awka_arraysearch( a_VAR *v, a_VARARG *va, char create	);
	    Searches array variable v as per awka_arraysearch1(), except that this works
	    with multiple index	subscripts (eg,	arr[x, y]).

	  double awka_arraysplitstr( char *str,	a_VAR *v, a_VAR	*fs, int max );
	    The	AWK builtin split() function.  It splits str into array	variable v,
	    based on fs, up to max number of fields.  If fs is NULL, then
	    a_bivar[a_FS] will be used.	 Otherwise fs may contain an empty string, a
	    single-character string, or	a regular expression.  The number of fields created
	    in v is returned.

	  int awka_arrayloop( a_ListHdr	*ah, a_VAR *v );
	    This function implements the "for (i in j)"	feature	in AWK.	 You provide ah,
	    making sure	it is initialised to zeroes.

	    The	best way to understand how to call this	function is to type:

	      awka 'BEGIN { for	(i in j) x = j[i]; }'

	    and	see what is generated as a result.  You	don't have to understand the a_ListHdr
	    structure or sub-structures	to use this function.

	  int awka_arraynext( a_VAR *v,	a_ListHdr *ah, int pos );
	    Given that ah has been populated by	a call to awka_arrayloop(), this function
	    copies the (string or integer) element at position pos in the list to v,
	    then returns pos+1,	or zero	if there are no	more elements in the array list.
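
	    The return convention of awka_arraynext() (pos+1 while elements
	    remain, zero at the end) leads to a compact loop.  The sketch below
	    only imitates that convention: sk_list and sk_next() are invented
	    stand-ins, since the layout of a_ListHdr is not described on this
	    page.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a populated a_ListHdr: a plain list of index strings. */
typedef struct {
  const char **keys;
  int count;
} sk_list;

/* Mimics the documented return convention: stores the element at
   position pos in *out and returns pos+1, or returns zero when there
   are no more elements. */
static int sk_next(const char **out, const sk_list *list, int pos)
{
  if (pos < 0 || pos >= list->count)
    return 0;
  *out = list->keys[pos];
  return pos + 1;
}

/* Walks the whole list the way a translated "for (i in j)" loop might,
   feeding each return value back in as the next position. */
static int count_keys(const sk_list *list)
{
  const char *key;
  int pos = 0, n = 0;

  while ((pos = sk_next(&key, list, pos)) > 0)
    n++;
  return n;
}
```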

	  void awka_alistfree( a_ListHdr *ah );
	    Frees the last list	element	in ah.

	  void awka_alistfreeall( a_ListHdr *ah	);
	    Frees all memory held by ah, and sets its contents to zero/NULL.

	  a_VAR	* awka_dol0(int	set);
	    The	best means of accessing	the $0 a_VAR, as it updates its	contents with any pending
	    changes.  Make set zero if you're reading the value	of $0, but if you want to
	    set	$0, make it 1.

	  a_VAR	* awka_doln(int	fld, int set);
	    This function returns the a_VAR * of the $1..$n variable identified	by fld,
	    updating the field array with any refreshed	$0 contents first if necessary.	 If you
	    want to read the value of $fld, make set zero, otherwise it	should be 1.

       These  are  documented  in lib/builtin.h	in the awka distribution.  You
       can call	any of the builtin functions as	often as you like.  Those that
       return  a_VAR's	also  provide a	keep flag that,	if TRUE, will return a
       variable	that you must free, otherwise they will	use a temporary	 vari-
       able  that  you	don't  have to worry about freeing, but	will be	reused
       elsewhere sooner	or later.  The functions  should  be  pretty  much  as
       you'd  expect  them, except that	many require an	a_VARARG as input, and
       we haven't discussed how	to create one -	we will	now.

	  a_VARARG * awka_arg0(char);

	  a_VARARG * awka_arg1(char, a_VAR *);

	  a_VARARG * awka_arg2(char, a_VAR *, a_VAR *);

	  a_VARARG * awka_arg3(char, a_VAR *, a_VAR *, a_VAR *);

	  a_VARARG * awka_vararg(char, a_VAR *var, ...);
	    These functions populate & return a	pointer	to an a_VARARG structure.  The char
	    argument, if TRUE, will make you responsible for freeing the structure, otherwise
	    it'll be a temporary one that libawka will manage.	awka_arg0() will return	an
	    empty structure (ie. no args), awka_arg1() will have one a_VAR * in	it, and	so
	    on.	 Where you want	to put more than four a_VAR *'s	inside an a_VARARG, you	can
	    call awka_vararg with as many as you like, or if there's seriously a lot, maybe
	    write your own loop of code to populate an a_VARARG - it's not rocket science.
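
	    A hand-written loop of the kind mentioned above might look like the
	    following sketch.  fill_vararg() and MAX_VARARG are hypothetical
	    names, and the structures are local copies of the definitions on
	    this page so that the fragment compiles without libawka.h.

```c
#include <assert.h>
#include <stddef.h>

/* Local copies of the structures listed on this page. */
typedef struct {
  double dval;
  char *ptr;
  unsigned int slen;
  unsigned int allc;
  char type;
  char type2;
  char temp;
} a_VAR;

typedef struct {
  a_VAR *var[256];
  int used;
} a_VARARG;

#define MAX_VARARG 256   /* hypothetical name for the 256-slot limit */

/* fill_vararg() is a hypothetical helper, not a libawka function.
   It stores up to MAX_VARARG pointers from vars[] in va, records the
   number stored in va->used, and returns that number. */
static int fill_vararg(a_VARARG *va, a_VAR **vars, int count)
{
  int i;

  if (count > MAX_VARARG)
    count = MAX_VARARG;
  if (count < 0)
    count = 0;
  for (i = 0; i < count; i++)
    va->var[i] = vars[i];
  va->used = count;
  return count;
}
```

	    The caller owns an a_VARARG built this way, so it is comparable to
	    passing TRUE as the char argument of the awka_arg functions.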

       Data Structures & Variables


	    typedef struct {
	      char *name;	/* name	of output file or device */
	      FILE *fp;		/* file	pointer	*/
	      char *buf;	/* input buffer	*/
	      char *current;	/* where up to in buffer */
	      char *end;	/* end of data in buffer */
	      int alloc;	/* size	of input buffer	*/
	      char io;		/* input or output stream flag */
	      char pipe;	/* true/false */
	      char interactive;	/* whether from	a /dev/xxx stream or not */
	    } _a_IOSTREAM;

	    extern _a_IOSTREAM *_a_iostream;
	    extern int _a_ioallc, _a_ioused;

	    Controls input and output streams used by AWK's getline, print and printf
	    builtin functions.	The two	int variables record the space allocated in the
	    _a_iostream	array, and the number of elements used,	respectively.  I list this
	    information	here in	case you wish to create	fread, fwrite and fseek	functions for
	    awka, as these will	need low-level access to the streams.


	  a_VAR	* awka_getline(char keep, a_VAR	*target, char *input, int pipe,	char main);
	    As previously described, keep controls whether you want to be responsible for
	    freeing the	a_VAR the function returns or not.  Moving on, target is the a_VAR
	    to hold the	line of	data to	be read	(you provide this one).	 input is the name
	    of the input file or command.  pipe	is TRUE	if input is a command rather
	    than a file, eg. "sort stuff | getline x".	main should always be false.

	    If input is	NULL, getline will try to read from the	file identified	by
	    a_bivar[a_FILENAME], or from the next element in the a_bivar[a_ARGV] array.
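
	    The keep argument taken by awka_getline() and the other builtin
	    functions amounts to an ownership convention.  The sketch below
	    illustrates that convention only: sk_result() is a hypothetical
	    function, and using a single static variable as the temporary is an
	    assumption made to keep the example short, not a description of how
	    libawka manages its temporaries.

```c
#include <assert.h>
#include <stdlib.h>

/* Local copy of the a_VAR structure listed on this page. */
typedef struct {
  double dval;
  char *ptr;
  unsigned int slen;
  unsigned int allc;
  char type;
  char type2;
  char temp;
} a_VAR;

/* sk_result() is hypothetical.  With keep non-zero the caller receives
   a freshly allocated variable and must free it.  With keep zero the
   caller receives a shared temporary that the next call overwrites. */
static a_VAR *sk_result(char keep, double value)
{
  static a_VAR scratch;          /* reused on every keep == 0 call */
  a_VAR *r = keep ? calloc(1, sizeof *r) : &scratch;

  if (r == NULL)
    return NULL;                 /* allocation failed */
  r->dval = value;
  r->temp = !keep;               /* mark temporaries, as a_VAR.temp does */
  return r;
}
```

	    A value that must survive later calls should therefore be requested
	    with keep set, or copied out of the temporary straight away.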

       I won't go into detail about awka_fflush, awka_close, awka_printf &  so
       on,  as	these  should  be  easy	 enough	to understand and use, and the
       chances are you should use the native C variety anyway where possible.

       Ah, now we're in	murky water indeed, as awka inherited its  RE  library
       from  Libc,  and	 treats	it like	a magical black	box that does its bid-
       ding.  Want my advice?  Treating	the RE	library	 &  structure  like  a
       black box is a wise thing to do, as it's ugly-looking stuff.

       Ok,  we know that when an a_VAR has been	set to a_VARREG, its ptr value
       will point to an	awka_regexp structure.	Do we need to know  what's  in
       this  structure?	  I don't think	so.  What we do	need are the functions
       that help us compile and	execute	regular	 expressions.	Oops,  getting
       ahead  of  myself.   RE's are like C programs, they need	to be compiled
       before they can be used to search strings.  This	basically is a parsing
       of  the	RE  pattern  into  a tree structure that is easier to navigate
       while searching,	and is a one-off task.

	  awka_getre(a_VAR *)
	    This macro is the easiest method of	creating & compiling a regexp.	Providing you've
	    set	the a_VAR to the string	value of the re	pattern, this macro call works a treat.

	  a_VAR	*awka_match(char keep, char fcall, a_VAR *va, a_VAR *rva);
	    This function is the implementation of AWK's match() function, and is the
	    simplest way of evaluating an RE against a string.  keep is as previously
	    discussed, fcall should be set to TRUE if you want a_bivar[a_RSTART] and
	    a_bivar[a_RLENGTH] to be set, otherwise FALSE, va contains the string, and
	    rva	contains the regular expression.  The numerical	a_VAR returned is
	    1 on success, zero on failure.

       I was going to describe the lower-level methods of compiling and	match-
       ing  against RE's, but when I looked at it, there seemed	to be a	lot of
       complexity for no real gain in functionality.  All you get is the abil-
       ity  to avoid using a_VAR structures to manage the regular expressions,
       and honestly I don't see	what you'd gain	from this given	how much  more
       complexity you'd	have to	deal with.

       I haven't described all of the functions	available in libawka.h,	not by
       any means.  But I have tried to avoid functions that  are  really  only
       meant  for internal use,	or that	are only needed	by translated code and
       should be done in other ways by library code.  In  the  same  way  I've
       avoided	describing  structures that were intended to remain privy to a
       module within libawka, and you really shouldn't need to tamper with
       them.

       Any  questions  at  all,	or suggestions for improving this page,	let me
       know via  Make sure you preface any message title with the word
       "awka" so I know it's not spam.

       awka(1),	awka-elm(5).

Version	0.7.x			  Aug 8	2000			AWKA-ELMREF(5)

