bt_misc(3)			    btparse			    bt_misc(3)

       bt_misc - miscellaneous BibTeX-like string-processing utilities

	  void bt_purify_string	(char *	string,	ushort options);
	  void bt_change_case (char transform, char * string, ushort options);

	      void bt_purify_string (char * string, ushort options);

	   "Purifies" a	"string" in the	BibTeX way (usually used for generat-
	   ing sort keys).  "string" is	modified in-place.  "options" is cur-
	   rently unused; just set it to zero for future compatibility.	 Pu-
	   rification consists of copying alphanumeric characters, converting
	   hyphens and ties to space, copying spaces, and skipping (almost)
	   everything else.

	   "Almost" because "special characters" (used for accented and	non-
	   English letters) are	handled	specially.  Recall that	a BibTeX spe-
	   cial	character is any brace-group that starts at brace-depth	zero
	   whose first character is a backslash.  For instance,	the string

	      {\foo bar}Herr M\"uller went from	{P{\r r}erov} to {\AA}rhus

	   contains two special characters: "{\foo bar}" and "{\AA}".  Neither
	   the "\"u" nor the "{\r r}" is a special character: "\"u" is not
	   in a brace group at all, and "{\r r}" starts at brace depth one
	   rather than zero.
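	   The definition above can be illustrated with a small scanner.  This
	   is only a sketch, not code from btparse: the function name
	   "sketch_count_special" is hypothetical, and escaped braces such as
	   "\{" are not treated specially.

```c
#include <assert.h>
#include <stddef.h>

/* Count BibTeX special characters: brace groups that open at brace
 * depth zero and whose first character is a backslash.
 * Hypothetical helper for illustration; not part of btparse. */
size_t sketch_count_special(const char *s)
{
    assert(s != NULL);
    size_t count = 0;
    int depth = 0;

    for (size_t i = 0; s[i] != '\0'; i++) {
        if (s[i] == '{') {
            /* Only a group opening at depth zero can be special. */
            if (depth == 0 && s[i + 1] == '\\')
                count++;
            depth++;
        } else if (s[i] == '}' && depth > 0) {
            depth--;
        }
    }
    return count;
}
```

	   Applied to the example string, this counts two special characters;
	   the nested "{\r r}" is skipped because it opens at depth one.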

	   Special characters are handled as follows: if the control sequence
	   (the TeX command that follows the backslash) is recognized as one
	   of LaTeX's "foreign letters" ("\oe", "\ae", "\o", "\l", "\aa",
	   "\ss", plus uppercase versions), then it is converted to a
	   reasonable English approximation by stripping the backslash and
	   converting the second character (if any) to lowercase; thus,
	   "{\AA}" in the above example would become simply "Aa".  All other
	   control sequences in a special character are stripped, as are all
	   non-alphabetic characters.

	   For example, the above string, after "purification," becomes

	      barHerr Muller went from Pr rerov	to Aarhus

	   Obviously, something	has gone wrong with the	word "P{\r r}erov" (a
	   town	in the Czech Republic).	 The accented `r' should be a special
	   character, starting at brace-depth zero.  If	the original string
	   were	instead

	      {\foo bar}Herr M\"uller went from	P{\r r}erov to {\AA}rhus

	   then	the purified result would be more sensible:

	      barHerr Muller went from Prerov to Aarhus
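	   The rules described so far can be sketched in C as follows.  This
	   is an illustrative reimplementation written from the description
	   above, not the btparse source; "sketch_purify" and
	   "is_foreign_letter" are hypothetical names, and the list of foreign
	   letters is an assumption based on the list given earlier.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Is name[0..len) one of the foreign letters listed in the text?
 * Hypothetical helper; not part of btparse. */
static int is_foreign_letter(const char *name, size_t len)
{
    static const char *const names[] = {
        "oe", "OE", "ae", "AE", "aa", "AA", "o", "O", "l", "L", "ss"
    };
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
        if (strlen(names[i]) == len && strncmp(names[i], name, len) == 0)
            return 1;
    return 0;
}

/* Purify s in place, following the description above. */
void sketch_purify(char *s)
{
    assert(s != NULL);
    size_t in = 0, out = 0;
    int depth = 0;
    int special = 0;            /* nonzero inside a special character */

    while (s[in] != '\0') {
        unsigned char c = (unsigned char) s[in];

        if (c == '{') {
            if (depth == 0 && s[in + 1] == '\\')
                special = 1;
            depth++;
            in++;
        } else if (c == '}') {
            if (depth > 0 && --depth == 0)
                special = 0;
            in++;
        } else if (special && c == '\\') {
            size_t start = ++in;        /* first char of the name */
            while (isalpha((unsigned char) s[in]))
                in++;
            size_t len = in - start;
            if (len == 0) {
                if (s[in] != '\0')
                    in++;               /* one-char symbol such as \" */
            } else if (is_foreign_letter(s + start, len)) {
                /* Keep the name, lowercasing its second letter. */
                s[out++] = s[start];
                if (len > 1)
                    s[out++] = (char) tolower((unsigned char) s[start + 1]);
            }                           /* other sequences are stripped */
        } else if (special) {
            if (isalpha(c))             /* only letters survive here */
                s[out++] = (char) c;
            in++;
        } else if (isalnum(c)) {
            s[out++] = (char) c;
            in++;
        } else if (c == ' ' || c == '-' || c == '~') {
            s[out++] = ' ';             /* hyphens and ties become spaces */
            in++;
        } else {
            in++;                       /* skip everything else */
        }
    }
    s[out] = '\0';
}
```

	   Run on the two example strings above, this sketch yields the two
	   purified results shown, including the stray "Pr rerov" for the
	   first one.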

	   Note	the use	of a "nonsense"	special	character "{\foo bar}":	this
	   trick is often used to put certain text in a	string solely for gen-
	   erating sort	keys; the text is then ignored when the	document is
	   processed by	TeX (as	long as	"\foo" is defined as a no-op TeX
	   macro).  This assumes, of course, that the output is	eventually
	   processed by	TeX; if	not, then this trick will backfire on you.

	   Also, "bt_purify_string()" is adequate for generating sort keys
	   when	you want to sort according to English-language conventions.
	   To follow the conventions of	other languages, though, a more	so-
	   phisticated approach	will be	needed;	hopefully, future versions of
	   btparse will	address	this deficiency.

	      void bt_change_case (char	transform, char	* string, ushort options);

	   Converts a string to	lowercase, uppercase, or "non-book title capi-
	   talization",	with special attention paid to BibTeX special charac-
	   ters	and other brace-groups.	 The form of conversion	is selected by
	   the single character	"transform": 'u' to convert to uppercase, 'l'
	   for lowercase, and 't' for "title capitalization".  "string"	is
	   modified in-place, and "options" is currently unused; set it	to
	   zero	for future compatibility.

	   Lowercase and uppercase conversion are obvious, with the proviso
	   that text in braces is treated differently (explained below).
	   Title capitalization simply means that everything is converted to
	   lowercase, except the first letter of the first word and the first
	   letter of any word immediately following a colon or
	   sentence-ending punctuation.  For example,

	      Flying Squirrels:	Their Peculiar Habits. Part One

	   would be converted to

	      Flying squirrels:	Their peculiar habits. Part one
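	   Ignoring braces for the moment, title capitalization can be
	   sketched as below.  This is a simplified illustration, not the
	   btparse implementation: the name "sketch_title_case" is
	   hypothetical, and treating ".", "?" and "!" followed by whitespace
	   as sentence-ending punctuation is an assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Title-capitalize s in place, ignoring brace groups entirely.
 * Hypothetical helper for illustration; not part of btparse. */
void sketch_title_case(char *s)
{
    assert(s != NULL);
    int keep = 1;         /* next letter starts a word whose case is kept */
    int after_punct = 0;  /* previous character was ':', '.', '?' or '!' */

    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char) *s;

        if (isalpha(c)) {
            if (!keep)
                *s = (char) tolower(c);
            keep = 0;
            after_punct = 0;
        } else if (strchr(":.?!", c) != NULL) {
            after_punct = 1;
        } else if (isspace(c)) {
            /* Punctuation followed by whitespace protects the next word. */
            if (after_punct)
                keep = 1;
            after_punct = 0;
        } else {
            after_punct = 0;
        }
    }
}
```

	   Note that the protected letters are left as they are rather than
	   forced to uppercase, so a string that starts in lowercase stays
	   that way.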

	   Text within braces is handled as follows.  First, in a "special
	   character" (see above for definition), control sequences that
	   constitute one of LaTeX's non-English letters are converted
	   appropriately (e.g., when converting to lowercase, "\AE" becomes
	   "\ae").  Any other control sequence in a special character
	   (including accents) is preserved, and all text in a special
	   character, regardless of depth and punctuation, is converted to
	   lowercase or uppercase.  (For "title capitalization," all text in
	   a special character is converted to lowercase.)

	   Brace groups that are not special characters are left completely
	   untouched: neither the text nor the control sequences inside them
	   are changed.
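	   The brace rules can be sketched for the lowercase case as follows.
	   This is an illustration written from the description above, not
	   btparse code; "sketch_lowercase" and "is_upper_foreign" are
	   hypothetical names, and the set of uppercase foreign letters
	   recognized here is an assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Is name[0..len) an uppercase foreign letter such as AE or O?
 * Hypothetical helper; not part of btparse. */
static int is_upper_foreign(const char *name, size_t len)
{
    static const char *const names[] = { "OE", "AE", "AA", "O", "L" };
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
        if (strlen(names[i]) == len && strncmp(names[i], name, len) == 0)
            return 1;
    return 0;
}

/* Lowercase s in place, honoring the brace rules described above. */
void sketch_lowercase(char *s)
{
    assert(s != NULL);
    int depth = 0;
    int special = 0;      /* nonzero inside a special character */
    size_t i = 0;

    while (s[i] != '\0') {
        unsigned char c = (unsigned char) s[i];

        if (c == '{') {
            if (depth == 0 && s[i + 1] == '\\')
                special = 1;
            depth++;
            i++;
        } else if (c == '}') {
            if (depth > 0 && --depth == 0)
                special = 0;
            i++;
        } else if (special && c == '\\') {
            size_t start = ++i;             /* first char of the name */
            while (isalpha((unsigned char) s[i]))
                i++;
            if (i == start) {
                if (s[i] != '\0')
                    i++;                    /* one-char symbol such as \' */
            } else if (is_upper_foreign(s + start, i - start)) {
                for (size_t k = start; k < i; k++)
                    s[k] = (char) tolower((unsigned char) s[k]);
            }                               /* other sequences are kept */
        } else {
            /* Text at depth zero or in a special character is converted;
             * ordinary brace groups are left untouched. */
            if (depth == 0 || special)
                s[i] = (char) tolower(c);
            i++;
        }
    }
}
```

	   With this sketch, "{NASA}" survives unchanged, "{\AE}ther" becomes
	   "{\ae}ther", and "{\'E}cole" becomes "{\'e}cole" with the accent
	   command preserved.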

	   For example,	the string

	      A	Guide to \LaTeXe: Document Preparation ...

	   would, when "transform" is 't' (title capitalization), be converted
	   to

	      A	guide to \latexe: Document preparation ...

	   which is probably not the desired result.  A	better attempt is

	      A	Guide to {\LaTeXe}: Document Preparation ...

	   which becomes

	      A	guide to {\LaTeXe}: Document preparation ...

	   However, if you go back and re-read the description of
	   "bt_purify_string()", you'll discover that "{\LaTeXe}" here is a
	   special character, but not a non-English letter, so the control
	   sequence is stripped.  A sort key generated from this title would
	   therefore be

	      A	Guide to  Document Preparation

	   ...oops!  The right solution	(and this applies to any title with a
	   TeX command that becomes actual text) is to bury the	control	se-
	   quence at brace-depth two:

	      A	Guide to {{\LaTeXe}}: Document Preparation ...


       Greg Ward <>

btparse, version 0.34		  2003-10-25			    bt_misc(3)

