FreeBSD Manual Pages


bt_postprocess(3)		    btparse		     bt_postprocess(3)

NAME
       bt_postprocess - post-processing of BibTeX strings, values, and entries

SYNOPSIS
	  void bt_postprocess_string (char * s,
				      ushort options);

	  char * bt_postprocess_value (AST *   value,
				       ushort  options,
				       boolean replace);

	  char * bt_postprocess_field (AST *   field,
				       ushort  options,
				       boolean replace);

	  void bt_postprocess_entry (AST *  entry,
				     ushort options);

DESCRIPTION
       When btparse parses a BibTeX entry, it initially stores the results in
       an abstract syntax tree (AST), in a form exactly mirroring the parsed
       data.  For example, the entry

            AuThOr = "Bob   Jones" # and # "Jim Smith ",
            TITLE = "Feeding Habits of
                     the Common Cockroach",
            JoUrNaL = j_ent,
            YEAR = 1997

       would parse to an AST that could be represented as follows:

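              (entry,"Article")
                (field,"AuThOr")
                  (string,"Bob   Jones")
                  (macro,"and")
                  (string,"Jim Smith ")
                (field,"TITLE")
                  (string,"Feeding Habits of                      the Common Cockroach")
                (field,"JoUrNaL")
                  (macro,"j_ent")
                (field,"YEAR")
                  (number,"1997")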

       The advantage of this form is that all the important information in the
       entry is readily available by traversing the tree using the functions
       described in bt_traversal.  The obvious problem is that the data is a
       little too raw to be immediately useful: entry types and field names
       are inconsistently capitalized, strings are full of unwanted
       whitespace, field values are not reduced to single strings, and so
       forth.

       All of these problems are addressed by btparse's post-processing
       functions, described here.  Normally, you won't have to call these
       functions---the library does the Right Thing for you after parsing
       each entry, and you can customize what exactly the Right Thing is for
       your application.  (For instance, you can tell it to expand macros, but
       not to concatenate substrings together.)  However, it's conceivable
       that you might wish to move the post-processing into your own code and
       out of the library's control.  More likely, you could have strings that
       come from something other than BibTeX files that you would like to have
       treated as BibTeX strings; for that situation, the post-processing
       functions are essential.  Finally, you might just be curious about what
       exactly happens to your data after it's parsed.  If so, you've come to
       the right place for excruciatingly detailed explanations.

       btparse offers four points of entry to its post-processing code.  Of
       these, probably only the first and last---for processing individual
       strings and whole entries---will be commonly used.

       Post-processing entry points

       To understand why four entry points are offered, an explanation of the
       sample AST shown above will help.  First of all, the whole entry is
       represented by the "(entry,"Article")" node; this node has the entry
       key and all its field/value pairs as children.  Entry nodes are
       returned by "bt_parse_entry()" and "bt_parse_entry_s()" (see bt_input)
       as well as "bt_next_entry()" (which traverses a list of entries
       returned from "bt_parse_file()"---see bt_traversal).  Whole entries may
       be post-processed with "bt_postprocess_entry()".

       You may also need to post-process a single field, or just the value
       associated with it.  (The difference is that processing the field can
       change the field name---e.g. to lowercase---in addition to the field
       value.)  The "(field,"AuThOr")" node above is an example of a field
       sub-AST, and "(string,"Bob   Jones")" is the first node in the list of
       simple values representing that field's value.  (Recall that a field
       value is, in general, a list of simple values.)  Field nodes are
       returned by "bt_next_field()", value nodes by "bt_next_value()".  The
       former may be passed to "bt_postprocess_field()" for post-processing,
       the latter to "bt_postprocess_value()".
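
       To make the shape of such a tree concrete, here is a minimal C sketch
       of a field node heading its list of simple values.  The "node" type,
       its "down" (first child) and "right" (next sibling) links, and the
       "N_*" kinds are hypothetical stand-ins, not btparse's actual AST
       declarations; use the functions in bt_traversal to walk real ASTs.

```c
#include <stddef.h>

/* Hypothetical node kinds mirroring the AST notation used above. */
typedef enum { N_ENTRY, N_KEY, N_FIELD, N_STRING, N_MACRO, N_NUMBER } node_kind;

/* Hypothetical AST node: 'down' is the first child, 'right' the next
   sibling.  A field's value is the chain of children under the field. */
typedef struct node {
    node_kind    kind;
    const char  *text;
    struct node *down;
    struct node *right;
} node;

/* Count the simple values (strings, macros, numbers) that make up the
   value of a field node. */
size_t count_simple_values (const node *field)
{
    size_t n = 0;
    for (const node *v = field->down; v != NULL; v = v->right)
        n++;
    return n;
}
```

       For the "AuThOr" field above, the value list holds a string, a macro,
       and another string, so "count_simple_values()" would return 3.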

       Finally, individual strings may wander into your program from many
       places other than a btparse AST.  For that reason,
       "bt_postprocess_string()" is available for post-processing arbitrary
       strings.

       Post-processing options

       All of the post-processing routines have an "options" parameter, which
       you can use to fine-tune the post-processing.  (This is just like the
       per-metatype string-processing options that you can set before parsing
       entries; see "bt_set_stringopts()" in bt_input.)  As elsewhere in the
       library, "options" is a bitmap constructed by or'ing together various
       predefined constants.  These constants and their effects are documented
       in "String processing option macros" in btparse.

       bt_postprocess_string ()
	      void bt_postprocess_string (char * s,
					  ushort options);

           Post-processes an individual string, "s", which is modified in
           place.  The only post-processing option that makes sense on
           individual strings is whether to collapse whitespace according to
           the BibTeX rules; thus, if "options & BTO_COLLAPSE" is false, this
           function has no effect.  (It does still make a complete pass over
           the string, though; this is for future expansion.)

           The exact rules for collapsing whitespace are simple: non-space
           whitespace characters (mainly tabs and newlines) are converted to
           spaces, any run of more than one space is collapsed to a single
           space, and any leading or trailing spaces are deleted.  (Ensuring
           that all whitespace is spaces is actually done by btparse's lexical
           scanner, so strings in btparse ASTs will never contain whitespace
           other than spaces.  Likewise, any strings passed to
           "bt_postprocess_string()" should not contain non-space whitespace
           characters.)
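
           The rules above are easy to express in C.  The following sketch is
           an illustrative reimplementation, not btparse's source, and the
           name "collapse_whitespace" is hypothetical.  It works in place and,
           unlike "bt_postprocess_string()" (which expects spaces only), also
           treats tabs and newlines as whitespace.

```c
#include <ctype.h>

/* Collapse whitespace in place, BibTeX style: every run of whitespace
   becomes a single space, and leading/trailing whitespace is dropped.
   The write pointer never overtakes the read pointer, so in-place
   rewriting is safe. */
void collapse_whitespace (char *s)
{
    char *src = s, *dst = s;
    int   pending_space = 0;      /* whitespace seen since last kept char */

    while (*src != '\0')
    {
        if (isspace ((unsigned char) *src))
        {
            /* Only remember the gap once something has been kept;
               this is what discards leading whitespace. */
            if (dst != s)
                pending_space = 1;
        }
        else
        {
            if (pending_space)
            {
                *dst++ = ' ';     /* one space stands for the whole run */
                pending_space = 0;
            }
            *dst++ = *src;
        }
        src++;
    }
    *dst = '\0';                  /* a trailing gap is never emitted */
}
```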

       bt_postprocess_value ()
	      char * bt_postprocess_value (AST *   value,
					   ushort  options,
					   boolean replace);

           Post-processes a single field value, which is the head of a list of
           simple values as returned by "bt_next_value()".  All of the
           relevant string-processing options come into play here: conversion
           of numbers to strings ("BTO_CONVERT"), macro expansion
           ("BTO_EXPAND"), collapsing of whitespace ("BTO_COLLAPSE"), and
           string pasting ("BTO_PASTE").  Since pasting substrings together
           without first expanding macros and converting numbers would be
           nonsensical, attempting to do so is a fatal error.
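
           The consistency rule in the last sentence can be checked before any
           processing starts.  In this sketch the "OPT_*" constants are
           hypothetical stand-ins for the "BTO_*" macros (whose real values
           are defined in btparse.h), and "paste_options_ok()" is a
           hypothetical helper; btparse itself treats the bad combination as a
           fatal error rather than returning a status.

```c
/* Hypothetical stand-ins for the BTO_* option bits. */
enum {
    OPT_CONVERT  = 1 << 0,    /* convert numbers to strings */
    OPT_EXPAND   = 1 << 1,    /* expand macros */
    OPT_PASTE    = 1 << 2,    /* paste substrings together */
    OPT_COLLAPSE = 1 << 3     /* collapse whitespace */
};

/* Return 1 if 'options' is a sensible combination, 0 if it asks for
   pasting without also expanding macros and converting numbers. */
int paste_options_ok (unsigned short options)
{
    const unsigned short needed = OPT_EXPAND | OPT_CONVERT;

    if (options & OPT_PASTE)
        return (options & needed) == needed;
    return 1;                 /* no pasting requested: nothing to check */
}
```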

           If "replace" is true, then the list headed by "value" will be
           replaced by a list representing the processed value.  That is, if
           string pasting is turned on ("options & BTO_PASTE" is true), then
           this list will be collapsed to a single node containing the single
           string that results from pasting together all the substrings.  If
           string pasting is not on, then each node in the list will be left
           intact, but will have its text replaced by processed text.

           If "replace" is false, then a new string will be built on the fly
           and returned by the function.  Note that if pasting is not on in
           this case, you will only get the last string in the list.  (It
           doesn't really make a lot of sense to post-process a value without
           pasting unless you're replacing it with the new value, though.)

           Returns the string that resulted from processing the whole value,
           which only makes sense if pasting was on or there was only one
           simple value in the list.  If a multiple-value list was processed
           without pasting, the last string in the list is returned (after
           processing).

           Consider what might be done to the value of the "author" field in
           the above example, which is the concatenation of a string, a macro,
           and another string.  Assume that the macro "and" expands to
           " and ", and that the variable "value" points to the sub-AST for
           this value.  The original sub-AST corresponding to this value is

              (string,"Bob   Jones")
              (macro,"and")
              (string,"Jim Smith ")

           To fully process this value in place, you would call

	      bt_postprocess_value (value, BTO_FULL, TRUE);

           This would convert the value to a single-element list,

              (string,"Bob Jones and Jim Smith")

           and return the fully-processed string "Bob Jones and Jim Smith".
           Note that the "and" macro has been expanded and interpolated
           between the two literal strings, everything has been pasted
           together, and finally whitespace has been collapsed.  (Collapsing
           whitespace before concatenating the strings would be a bad idea.)
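
           Why the order matters shows up clearly if both orders are tried.
           The sketch below is hypothetical and is not how
           "bt_postprocess_value()" is implemented: it assumes the "and" macro
           has already been expanded to " and ", pastes the three substrings,
           and collapses whitespace either after pasting or, for comparison,
           separately on each piece first.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Collapse whitespace in place: each run of whitespace becomes one
   space, and leading and trailing whitespace is removed. */
static void collapse (char *s)
{
    char *dst = s;
    int   gap = 0;                 /* whitespace seen since last kept char */

    for (const char *src = s; *src != '\0'; src++)
    {
        if (isspace ((unsigned char) *src))
            gap = (dst != s);      /* ignore leading whitespace */
        else
        {
            if (gap)
                *dst++ = ' ';
            gap = 0;
            *dst++ = *src;
        }
    }
    *dst = '\0';                   /* trailing whitespace is never copied */
}

/* Hypothetical helper: paste 'n' already-expanded substrings into 'out'
   (capacity 'outsz' bytes, at least 1), then collapse the result.  If
   'collapse_first' is nonzero, each piece is collapsed before pasting.
   Pieces longer than 255 characters are truncated in this sketch. */
void paste_value (const char *parts[], size_t n, int collapse_first,
                  char *out, size_t outsz)
{
    out[0] = '\0';
    for (size_t i = 0; i < n; i++)
    {
        char piece[256];
        strncpy (piece, parts[i], sizeof piece - 1);
        piece[sizeof piece - 1] = '\0';
        if (collapse_first)
            collapse (piece);
        /* Append as much as fits, always leaving room for the NUL. */
        strncat (out, piece, outsz - strlen (out) - 1);
    }
    collapse (out);
}
```

           Pasting first yields "Bob Jones and Jim Smith".  Collapsing each
           piece first turns " and " into "and" and yields
           "Bob JonesandJim Smith".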

           (Incidentally, "BTO_FULL" is just a macro for the combination of
           all possible string-processing options, currently:

              BTO_CONVERT | BTO_EXPAND | BTO_PASTE | BTO_COLLAPSE

           There are two other similar shortcut macros: "BTO_MACRO" to express
           the special string-processing done on macro values, which is the
           same as "BTO_FULL" except for the absence of "BTO_COLLAPSE"; and
           "BTO_MINIMAL", which means no string-processing is to be done.)

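           Because the options form a bitmap, shortcuts like these are plain
           bitwise expressions.  The sketch below mirrors the relationships
           just described using hypothetical "OPT_*" names and values, plus a
           hypothetical "has_options()" test helper; the real "BTO_*" macros
           and their values are defined in btparse.h.

```c
/* Hypothetical stand-ins for the BTO_* option bits. */
enum {
    OPT_CONVERT  = 1 << 0,    /* convert numbers to strings */
    OPT_EXPAND   = 1 << 1,    /* expand macros */
    OPT_PASTE    = 1 << 2,    /* paste substrings together */
    OPT_COLLAPSE = 1 << 3,    /* collapse whitespace */

    /* Shortcuts built the way the text describes them. */
    OPT_FULL     = OPT_CONVERT | OPT_EXPAND | OPT_PASTE | OPT_COLLAPSE,
    OPT_MACRO    = OPT_FULL & ~OPT_COLLAPSE,   /* everything but collapsing */
    OPT_MINIMAL  = 0                           /* no string processing */
};

/* Nonzero if every bit in 'wanted' is set in 'options'. */
int has_options (unsigned short options, unsigned short wanted)
{
    return (options & wanted) == wanted;
}
```

           For example, "has_options (OPT_MACRO, OPT_COLLAPSE)" is false,
           which is exactly the difference between the two shortcuts.
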
           Let's say you'd rather preserve the list nature of the value, while
           expanding macros and converting any numbers to strings.  (This
           conversion is trivial: it just changes the type of the node from
           "BTAST_NUMBER" to "BTAST_STRING".  "Number" values are always
           stored as a string of digits, just as they appear in the file.)
           This would be done with the call

              bt_postprocess_value (value,
                                    BTO_CONVERT | BTO_EXPAND | BTO_COLLAPSE,
                                    TRUE);

           which would change the list to

              (string,"Bob Jones")
              (string,"and")
              (string,"Jim Smith")

           Note that whitespace is collapsed here before any concatenation can
           be done; this is probably a bad idea.  But you can do it if you
           wish.  (If you get any ideas about cooking up your own value
           post-processing scheme by doing it in little steps like this, take
           a look at the source to "bt_postprocess_value()"; it should
           dissuade you from such a venture.)

       bt_postprocess_field ()
	      char * bt_postprocess_field (AST *   field,
					   ushort  options,
					   boolean replace);

           This is little more than a front-end to "bt_postprocess_value()";
           the only difference is that you pass it a "field" AST node (e.g.
           the "(field,"AuThOr")" node in the above example), and that it
           transforms the field name in addition to its value.  In
           particular, the field name is forced to lowercase; this behaviour
           is (currently) not optional.

           Returns the string returned by "bt_postprocess_value()".
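
           The field-name step amounts to an in-place lowercase conversion.
           A minimal sketch follows; "lowercase_field_name()" is a
           hypothetical name, and btparse's own implementation may differ.

```c
#include <ctype.h>

/* Force a field name to lowercase in place (hypothetical helper that
   mirrors the field-name step described above). */
void lowercase_field_name (char *name)
{
    for (; *name != '\0'; name++)
        *name = (char) tolower ((unsigned char) *name);
}
```

           Applied to "AuThOr", it leaves "author".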

       bt_postprocess_entry ()
              void bt_postprocess_entry (AST *  entry,
                                         ushort options);

           Post-processes all values in an entry.  If "entry" points to the
           AST for a "regular" or "macro definition" entry, then the values
           are just what you'd expect: everything on the right-hand side of a
           field or macro "assignment."  You can also post-process comment and
           preamble entries, though.  Comment entries are essentially one big
           string, so only whitespace collapsing makes sense on them.
           Preambles may have multiple strings pasted together, so all the
           string-processing options apply to them.  (And there's nothing to
           prevent you from using macros in a preamble.)
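
           Put differently, which options matter depends on the kind of entry.
           The sketch below encodes that rule only; "entry_kind",
           "effective_options()" and the "OPT_*" values are hypothetical
           stand-ins (btparse has its own metatype and option definitions),
           and the function selects options rather than processing anything.

```c
/* Hypothetical entry kinds; btparse defines its own metatypes. */
typedef enum {
    ENTRY_REGULAR, ENTRY_MACRODEF, ENTRY_COMMENT, ENTRY_PREAMBLE
} entry_kind;

/* Hypothetical stand-ins for the BTO_* option bits. */
enum {
    OPT_CONVERT  = 1 << 0,
    OPT_EXPAND   = 1 << 1,
    OPT_PASTE    = 1 << 2,
    OPT_COLLAPSE = 1 << 3
};

/* Select the subset of 'options' that is meaningful for an entry of
   the given kind, following the rules described above. */
unsigned short effective_options (entry_kind kind, unsigned short options)
{
    switch (kind)
    {
    case ENTRY_COMMENT:
        /* One big string: only whitespace collapsing makes sense. */
        return (unsigned short) (options & OPT_COLLAPSE);
    case ENTRY_REGULAR:
    case ENTRY_MACRODEF:
    case ENTRY_PREAMBLE:
    default:
        /* Field values, macro values and preambles may all involve
           macros, numbers and pasting, so every option applies. */
        return options;
    }
}
```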

SEE ALSO
       btparse, bt_input, bt_traversal

AUTHOR
       Greg Ward <>

btparse, version 0.34		  2003-10-25		     bt_postprocess(3)

