Skip site navigation (1)Skip section navigation (2)

FreeBSD Manual Pages


home | help

       awka-elm	- Awka Extended	Library	Methods

       Awka  is	 a  translator	of  AWK	programs to ANSI-C code, and a library
       (libawka.a) against which the code is  linked  to  create  executables.
       Awka is described in the	awka manpage.

       The  Extended  Library  Methods (ELM) provide a way of adding new func-
       tions to	the AWK	language, so that they appear in your AWK code	as  if
       they were builtin functions such	as substr() or index().

       ELM  code  interfaces  with  the	 internal Awka variable	structures and
       functions, and is suitable for anyone with some experience  and	profi-
       ciency in C programming.

       This  document  is a step-by-step introduction to how the ELM works, so
       by the end of it	you can	write your own libraries  to  extend  the  AWK
       programming  language  using Awka.  For example,	you could write	an in-
       terface to allow	AWK programs to	communicate with  ODBC	databases,  or
       solve  the  travelling salesman problem given input of town locations -
       whatever	you require AWK	to do should now be possible.

       The C code produced by awka from	AWK programs is	heavily	populated with
       calls  to  functions  in	the awka library (libawka).  Hence after it is
       compiled, this code must	be linked to the library to produce a  working

       When  parsing  an AWK program, awka checks to see if each function call
       in the program is (a) a core builtin function, (b) a call to a user-de-
       fined AWK function in the program, or (c) a call	to one of the extended
       builtin functions.  The above order of priority is applied, so a	 user-
       defined function	(b) overrides (c), and (a) overrides (b) to avoid con-

       If none of these	prove to be true, the function call is written in  the
       code  in	 the format of a user-defined function,	even though that func-
       tion doesn't exist to its knowledge.  Awka is  assuming	that  by  link
       time  you will provide another object file or library that contains the
       missing function	and resolve the	call.

       So if I pass awka the following code:

	  BEGIN	{ print	mymath(3,4) }

       The call	it generates will look like this...

	  mymath_fn(awka_arg2(a_TEMP, _litd0_awka, _litd1_awka))

       So all we need to do is write the mymath_fn()  function,	 and  link  it
       with the	awka-generated code, and bingo!	 AWK has been extended by you,
       to do what you want.  And the only restrictions on what a function like
       mymath_fn() might do are	those imposed by the C language!

       So,  you	 write the function, compile it	into a library,	use it in your
       AWK program, translate it, link it in, and you're away -	its that  sim-
       ple (fingers crossed).

       Ok,  the	 first	thing  to  notice is that the function name in the AWK
       code, mymath, has been appended with _fn	in the C code.	 This  happens
       with all	unresolved AWK function	calls (also with user-defined function
       names, but that doesn't matter here).  It's done	to avoid unintentional
       conflicts with functions	in other libraries.

       The definition of any function is this:-

	  funcname_fn( a_VARARG	* )

       Ugh!   What's  this a_VARARG thingy?  Yes, learned reader, the time has
       come to get acquainted with the dreaded	Awka  data  structures.	  Well
       they're	pretty	simple	actually.   The	two you	need to	know about are
       a_VAR and a_VARARG, and as the latter contains arrays  of  the  former,
       I'll deal with a_VAR first.

	  The a_VAR Structure

	  typedef struct {
	      double dval;	    /* the variable's numeric value */
	      char * ptr;	    /* pointer to string, array	or RE structure	*/
	      unsigned int slen;    /* length of string	ptr as per strlen */
	      unsigned int allc;    /* space mallocated	for string ptr */
	      char type;	    /* records current cast of variable	*/
	      char type2;	    /* special flag for	dual-type variables */
	      char temp;	    /* TRUE if a temporary variable */
	    } a_VAR;

       These  are used prolifically throughout the AWK library,	and are	at the
       heart of	how it manipulates data.  Remember, AWK	variables  are	essen-
       tially  typeless,  as they can be cast to number, string	or regular ex-
       pression	at your	whim throughout	a program.  The	only thing  you	 can't
       cast  to	 &  from is arrays, as a variable is only either an array or a
       scalar (the other types).

       Recall our mymath example earlier.   In	the  AWK  code,	 we  had  "my-
       math(3,4)",   but   the	 C   code   was	  "mymath_fn(awka_arg2(a_TEMP,
       _litd0_awka, _litd1_awka))".

       The numeric value of 3 has  been	 changed  to  _litd0_awka,  and	 4  to
       _litd1_awka.   If  you run awka with this example program & examine the
       output, you'll see that both _litd0_awka	and _litd1_awka	 are  pointers
       to  a_VAR  structures, and each has been	set to the appropriate numeric
       values.	Hence, all data	passed to our functions	will be	 embodied  in-
       side a_VAR's.

       Confused?  Yes?	No?  Take heart, it doesn't get	much worse, and	with a
       few more	examples I hope	things should be clearer.  Looking at the call
       to mymath_fn above, you'll notice a call	to awka_arg2().	 Remember that
       mymath_fn only takes a pointer to an a_VARARG, so awka_arg2() obviously
       returns one of these.

       What an a_VARARG	contains is an array of	a_VARs,	and an integer showing
       how many	there are in the array - thats all!  Don't believe  me?	  Then
       here's the structure in all its glory:

	  The a_VARARG Structure

	  typedef struct {
	      a_VAR *var[256];
	      int used;
	    } a_VARARG;

       The  a_VARARG structure gives us	an easy	means of passing around	flexi-
       ble numbers of a_VARS to	functions, much	as you'd use  vararg  in  a  C
       program.	  If you don't know what vararg	does and have some time, check
       the stdarg manpage.

       So, to conclude,	awka_arg2() takes two a_VARs and packages them	nicely
       into  an	a_VARARG to make life easy for our function.  Another thing to
       note - the a_VARARG function allows up to 256  arguments.   No  parame-
       ters, only arguments, and they always win them!	Sorry, on with the se-
       rious stuff...

       So when we come to write	mymath_fn, what	type of	thing should  it  con-
       tain?   Ok,  lets  assume  we want mymath to add	the two	numbers	it re-
       ceives as arguments, then add on	the two	numbers	multiplied, and	return
       the result, ie. (n1+n2)+n1*n2.

       Well, here goes...

	  #include <libawka.h>

	    a_VAR *
	    mymath_fn( a_VARARG	*va )
	      a_VAR *ret = NULL;

	      if (va->used < 2)
		awka_error("function mymath expecting 2	arguments, only	got %d.\n",va->used);

	      ret = awka_getdoublevar(FALSE);
	      ret->dval	= (awka_getd(va->var[0]) + awka_getd(va->var[1])) +
			      va->var[0]->dval * va->var[1]->dval;

	      return ret;

       Ok, there's not a lot to	it, so lets start at the top.  You need	to in-
       clude libawka.h,	as it defines the data structures plus the whole  Awka
       API that	you'll be calling.

       The  definition	of mymath_fn is	as described earlier.  It will need to
       return a	numeric	value, but as we're in AWK (conceptually),  this  will
       need to be enclosed in an a_VAR,	hence the existence of ret.

       The  incoming a_VARARG can contain any number of	a_VAR's	- we only care
       about the first two, so we check	to see whether these exist, and	if not
       spit  an	 error	through	the awka_error function	(or you	could use your
       own error handler).  When writing your own functions,  you'll  need  to
       remember	 that  any  number  of	arguments could	be passed in, and they
       could be	of any type, so	you'll need to check them.

       So far, ret is NULL, so we need to create a structure to	point  it  to.
       Better  than  that, we call awka_getdoublevar(),	which gets us a	tempo-
       rary variable, already initialised to contain  a	 numeric  value.   You
       guessed	it,  there's  an  awka_getstringvar() that we could use	if our
       function	was to	return	a  string.   The  value	 of  FALSE  passed  to
       awka_getdoublevar()  means  that	 we  don't  want to be responsible for
       freeing this structure, but prefer to leave it  to  libawka's  internal
       garbage	collection.  I can't see any reason why	you'd choose TRUE, but
       its there just in case.

       The next	2 lines	do the core stuff.  Ok,	ret->dval is set,  that	 makes
       sense.	The  expression	 refers	to the contents	of the a_VARARG->a_VAR
       array, again this is expected.  At first, though, it calls  awka_getd()
       for  each of the	arguments, but on the next line	it references the dval
       value directly.	Why the	calls to awka_getd?

       Because it can't	be sure	that the incoming variables are	 already  cast
       to numbers, so these functions (actually	macros)	do the casting for us,
       and return the value of dval after the cast is done.  Subsequently,  we
       can look	at dval	directly as we know its	been set to the	current	numer-
       ical value of the variable.

       Lastly, we return ret.

       Alright,	let's get this working.	 Follow	these steps:

	    1. Create mymath.c with mymath_fn(), exactly as its	written	above.
	    2. Create mymath.h containing:  a_VAR * mymath_fn( a_VARARG	*va );
	    3. gcc -c mymath.c	  (or use whatever C compiler you have).
	    4. awka -i mymath.h	'BEGIN { print mymath(3,4) }' >test.c
	    5. gcc -I. test.c mymath.o -lawka -lm -o mytest
	    6. mytest

       The output from running mytest should be	19.  Magic!

       A more comprehensive example is the awkatk library available  from  the
       awka website.  Hopefully	you'll find it helpful,	and who	knows, you may
       even use	it to write GUI	interfaces from	AWK!

       Obviously, this is intended to extend the limits	of the	AWK  universe,
       as  you could introduce any functionality written in C as a new builtin
       function	within AWK.

       There may be complex functions you've written in	AWK and	 use  all  the
       time that are just plain	inefficient, even using	Awka.  They're stable,
       you have	the skill to implement them in C, so now you can, and your AWK
       programs	 become	 shorter in the	process.  It's no longer a choice of C
       or AWK, now you can migrate sections to C as & when you like.

       There are many functions	in standard C libraries	that AWK doesn't have.
       Things  like strcasecmp(), fread(), cbrt(), and so on.  Now you can im-
       plement them.

       Lastly, I'd love	to see Awka have functions to read & write proprietary
       formats	like  MS Excel,	to communicate with ODBC databases, to perform
       complex mathematical or scientific operations, to implement true	multi-
       dimensional  arrays,  to	 provide  Fast Fourier Transform functions - I
       know its	possible.  If you do develop something neat like this, it'd be
       very cool if you	were to	make it	available for everyone to share.  Just
       send an email to,	and I'd	be happy  to  host  it
       on, or link it from the Awka website.

       So  you've  created  quite a few	Awka-ELM functions that	you've put to-
       gether into a library.  Let's say they calculate	 the  time  needed  to
       build the Sydney	Harbour	Bridge given a volume of manpower and the num-
       ber of supervisors.  Internally,	there's	quite a	 few  algorithms  that
       take into account strikes by unions, material shortages,	and casualties
       as workers fall off the bridge.

       Because of this complexity, within your library functions will need  to
       call  other  functions.	This is	fine.  What you	need to	do is not have
       an API function call another API	function, but instead keep  any	 func-
       tions they call hidden within the library, and also ensure these	inter-
       nal functions do	not use	the  awka_getdoublevar(),  awka_getstringvar()
       or awka_tmpvar()	calls.

       Apart  from  keeping  your  library structure nice and hierarchical and
       your API	simple,	it avoids overloading awka's internal pool  of	tempo-
       rary  variables.	  If this pool is overloaded, random chaos will	ensue,
       so please avoid it.

       All global variables in your AWK	program	are accessible by your library
       functions.  Herein lies the potential for great danger, so be careful!

       Global  variables  are,	of  course,  pointers to a_VAR structures, and
       their name is the same as in the	AWK script, with  _awk	appended.   So
       the variable 'myvar' in the script would	be myvar_awk in	the translated
       C code.	If you know what the variable name is, you can put  an	extern
       declaration  of it in your library code then work with it directly, but
       this may	be very	restrictive, as	it would mean that every  script  that
       uses  your  library  would need that variable name reserved.  There are
       other methods.

       One of the easiest is with arrays.  You can pass	them in	 as  arguments
       to  your	 functions, as their address is	passed over rather than	a copy
       of their	contents.  Scalars are not as easy.   Just  say	 our  function
       will  work with a global	variable, however it expects a string argument
       to contain the variable name in order to	 identify  which  variable  to
       work with - this	would make it pretty flexible.

       You  have  available  to	 you  the gvar_struct variable _gvar (both de-
       scribed in awka-elmref(5)).  This contains the  name  of	 every	global
       variable	in the script, and its a simple	matter to search down the list
       to find a pointer to the	a_VAR structure	of the variable	 you  want  to

       Looking	again  at the a_VAR structure, you may note that it contains a
       char * pointer that can reference strings, arrays and  regular  expres-
       sions.	There  is no reason why	you couldn't introduce your own	custom
       data structure and attach it to a global	variable within	 one  of  your
       functions, as long as you adhere	to the following rules:

       1. Don't	set the	variable to anything in	AWK after you set it to	your
	  customised  value,  as libawka will try (and fail) to	free the value
	  causing all sorts of flow-on problems.

       2. Don't	use the	AWK language to	copy or	compare	this variable to  oth-
	  even	with  two  variables  of  the same custom type (ie. custvar1 =
	  as libawka will have no idea how the copy should  be	done,  and  it
       will stuff
	  it up.  Instead, provide your	own copy and comparison	functions.

       3.  If your structures are memory intensive, you	may consider providing
       a method
	  of freeing the structures when they are no longer needed.

       4. Document what	your data structures and  methods  do,	and  how  they
       should be used
	  in  the  AWK script.	Please,	please do this,	as it could save you a
       lot of grief
	  later.  If your library becomes publicly  available  this  is	 espe-
       cially necessary.

       This has	been a very brief introduction indeed, but hopefully enough to
       get you started.	 I recommend you refer to the  awka-elmref(5)  manpage
       for  a  listing	of key libawka API functions and data definitions that
       are available for you to	use (but hopefully not abuse).	 If  you  have
       any  questions  at all, don't be	afraid to contact me (andrewsumner@ya-  Put the word "awka" at the front of your message title so  I
       know its	not spam.

       awka(1),	awka-elmref(5),	gcc(1)

       Bound to	be plenty.  Let	me know	if you find a bug with the libawka in-
       terface,	or get stuck with a problem.  I	am not,	though,	in any way re-
       sponsible  for  bugs  that are introduced by your code, nor am I	liable
       for any damages or expenses incurred as a result.  Nor am I liable  for
       anything	you do using Awka.

       I'll help where I can, and I'll usually help debug someone's library if
       I have a	personal interest in it.  If you're not	sure, try  me  anyway,
       the  worst  I  can do is	say no,	and I might be able to help.  I	really
       like folk who send fixes	along with bug reports,	though.	  And  I  love
       the  folk who send cash inducements (at last count, um, zero folk).  Oh
       well, enough rambling, time to finish.

       Andrew Sumner, August 2000 (

Version	0.7.x			  Aug 8	2000			   AWKA-ELM(5)


Want to link to this manual page? Use this URL:

home | help