BT_MISC(1)			    btparse			    BT_MISC(1)

       bt_misc - miscellaneous BibTeX-like string-processing utilities

	  void bt_purify_string (char * string, btshort options);
	  void bt_change_case (char transform, char * string, btshort options);

	      void bt_purify_string (char * string, btshort options);

	   "Purifies" a	"string" in the	BibTeX way (usually used for
	   generating sort keys).  "string" is modified	in-place.  "options"
	   is currently	unused;	just set it to zero for	future compatibility.
	   Purification	consists of copying alphanumeric characters,
	   converting hyphens and ties to space, copying spaces, and skipping
	   (almost) everything else.

	   "Almost" because "special characters" (used for accented and	non-
	   English letters) are	handled	specially.  Recall that	a BibTeX
	   special character is	any brace-group	that starts at brace-depth
	   zero whose first character is a backslash.  For instance, the string

	      {\foo bar}Herr M\"uller went from	{P{\r r}erov} to {\AA}rhus

	   contains two special characters: "{\foo bar}" and "{\AA}".  Neither
	   the "\"u" nor the "\r r" are	special	characters, because they are
	   not at the right brace depth.
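
	   The brace-depth rule above can be sketched in a few lines of C.
	   This is only an illustration under stated assumptions, not btparse
	   code: "count_special_chars" is a hypothetical helper, and it
	   ignores escaped braces such as "\{".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of btparse): count the BibTeX special
 * characters in s, i.e. brace groups that open at brace-depth zero and
 * whose first character is a backslash.  Escaped braces are not handled. */
static size_t count_special_chars(const char *s)
{
    size_t count = 0;
    int depth = 0;

    assert(s != NULL);
    for (const char *p = s; *p != '\0'; p++) {
        if (*p == '{') {
            /* Only a group opened at depth zero can be special. */
            if (depth == 0 && p[1] == '\\')
                count++;
            depth++;
        } else if (*p == '}' && depth > 0) {
            depth--;
        }
    }
    return count;
}
```

	   Applied to the example string, the sketch counts two special
	   characters; "{\r r}" is not counted because it opens at
	   brace-depth one.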

	   Special characters are handled as follows: if the control sequence
	   (the	TeX command that follows the backslash)	is recognized as one
	   of LaTeX's "foreign letters" ("\oe", "\ae", "\o", "\l", "\aa",
	   "\ss", plus uppercase versions), then it is converted to a
	   reasonable English approximation by stripping the backslash and
	   converting the second character (if any) to lowercase; thus,
	   "{\AA}" in the above	example	would become simply "Aa".  All other
	   control sequences in	a special character are	stripped, as are all
	   non-alphabetic characters.

	   For example, the above string, after "purification," becomes

	      barHerr Muller went from Pr rerov	to Aarhus

	   Obviously, something	has gone wrong with the	word "P{\r r}erov" (a
	   town	in the Czech Republic).	 The accented `r' should be a special
	   character, starting at brace-depth zero.  If	the original string
	   were	instead

	      {\foo bar}Herr M\"uller went from	P{\r r}erov to {\AA}rhus

	   then	the purified result would be more sensible:

	      barHerr Muller went from Prerov to Aarhus
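
	   The purification rules described so far can be approximated by the
	   following sketch.  It is a simplified re-implementation written
	   for illustration, not the btparse source: "sketch_purify" and
	   "is_foreign_letter" are hypothetical names, "~" is assumed to be
	   the tie character, and the foreign-letter list is assumed to be
	   "\oe", "\ae", "\o", "\l", "\aa", "\ss" in either case.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: case-insensitive match of a control-sequence name
 * against the assumed list of LaTeX "foreign letters". */
static int is_foreign_letter(const char *name, size_t len)
{
    static const char *const letters[] = { "oe", "ae", "o", "l", "aa", "ss" };

    for (size_t i = 0; i < sizeof letters / sizeof letters[0]; i++) {
        size_t j = 0;
        if (strlen(letters[i]) != len)
            continue;
        while (j < len && tolower((unsigned char)name[j]) == letters[i][j])
            j++;
        if (j == len)
            return 1;
    }
    return 0;
}

/* Simplified in-place purification; the output never outgrows the input. */
static void sketch_purify(char *s)
{
    char *out = s;
    int depth = 0;
    int in_special = 0;

    assert(s != NULL);
    for (const char *p = s; *p != '\0'; p++) {
        unsigned char c = (unsigned char)*p;

        if (c == '{') {
            /* A special character opens at depth zero with a backslash. */
            if (depth == 0 && p[1] == '\\')
                in_special = 1;
            depth++;
        } else if (c == '}') {
            if (depth > 0 && --depth == 0)
                in_special = 0;
        } else if (in_special && c == '\\') {
            const char *name = p + 1;
            size_t len = 0;
            while (isalpha((unsigned char)name[len]))
                len++;
            if (is_foreign_letter(name, len)) {
                /* Keep the first letter, lowercase the second (if any). */
                *out++ = name[0];
                if (len > 1)
                    *out++ = (char)tolower((unsigned char)name[1]);
            }
            /* Skip the control word, or a one-character control symbol. */
            if (len > 0)
                p += len;
            else if (p[1] != '\0')
                p++;
        } else if (in_special) {
            if (isalpha(c))
                *out++ = (char)c;   /* non-alphabetic text is stripped */
        } else if (isalnum(c) || c == ' ') {
            *out++ = (char)c;
        } else if (c == '-' || c == '~') {
            *out++ = ' ';           /* hyphens and ties become spaces */
        }
    }
    *out = '\0';
}
```

	   Run on the two example strings above, this sketch reproduces the
	   "Pr rerov" and "Prerov" results respectively.  Edge cases not
	   covered by the description may well differ from what
	   "bt_purify_string()" actually does.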

	   Note	the use	of a "nonsense"	special	character "{\foo bar}":	this
	   trick is often used to put certain text in a	string solely for
	   generating sort keys; the text is then ignored when the document is
	   processed by	TeX (as	long as	"\foo" is defined as a no-op TeX
	   macro).  This assumes, of course, that the output is	eventually
	   processed by	TeX; if	not, then this trick will backfire on you.

	   Also, "bt_purify_string()" is adequate for generating sort keys
	   when	you want to sort according to English-language conventions.
	   To follow the conventions of	other languages, though, a more
	   sophisticated approach will be needed; hopefully, future versions
	   of btparse will address this	deficiency.

	      void bt_change_case (char	transform, char	* string, btshort options);

	   Converts a string to	lowercase, uppercase, or "non-book title
	   capitalization", with special attention paid	to BibTeX special
	   characters and other	brace-groups.  The form	of conversion is
	   selected by the single character "transform": 'u' to	convert	to
	   uppercase, 'l' for lowercase, and 't' for "title capitalization".
	   "string" is modified	in-place, and "options"	is currently unused;
	   set it to zero for future compatibility.

	   Lowercase and uppercase conversion are obvious, with	the proviso
	   that	text in	braces is treated differently (explained below).
	   Title capitalization	simply means that everything is	converted to
	   lowercase, except the first letter of the first word, and words
	   immediately following a colon or sentence-ending punctuation.  For
	   instance, the string

	      Flying Squirrels:	Their Peculiar Habits. Part One

	   would be converted to

	      Flying squirrels:	Their peculiar habits. Part one

	   Text	within braces is handled as follows.  First, in	a "special
	   character" (see above for definition), control sequences that
	   constitute one of LaTeX's non-English letters are converted
	   appropriately (e.g., when converting to lowercase, "\AE" becomes
	   "\ae").  Any other control sequence in a special character
	   (including accents) is preserved, and all text in a special
	   character, regardless of depth and punctuation, is converted	to
	   lowercase or	uppercase.  (For "title	capitalization," all text in a
	   special character is	converted to lowercase.)

	   Brace groups that are not special characters are left completely
	   untouched: neither their text nor their control sequences are
	   changed.
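
	   The title-capitalization rules, including the treatment of braces,
	   can be sketched as follows.  This is an illustrative
	   approximation, not the btparse implementation:
	   "sketch_title_case" and "is_foreign_letter" are hypothetical
	   names, sentence-ending punctuation is assumed to be ".", "?" and
	   "!", and an opening brace is assumed to cancel the "keep the next
	   letter" state.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: case-insensitive match of a control-sequence name
 * against the assumed list of LaTeX non-English letters. */
static int is_foreign_letter(const char *name, size_t len)
{
    static const char *const letters[] = { "oe", "ae", "o", "l", "aa", "ss" };

    for (size_t i = 0; i < sizeof letters / sizeof letters[0]; i++) {
        size_t j = 0;
        if (strlen(letters[i]) != len)
            continue;
        while (j < len && tolower((unsigned char)name[j]) == letters[i][j])
            j++;
        if (j == len)
            return 1;
    }
    return 0;
}

/* Simplified in-place "title capitalization". */
static void sketch_title_case(char *s)
{
    int depth = 0;
    int in_special = 0;
    int keep_next = 1;  /* the next letter starts a sentence: keep its case */

    assert(s != NULL);
    for (char *p = s; *p != '\0'; p++) {
        unsigned char c = (unsigned char)*p;

        if (c == '{') {
            if (depth == 0 && p[1] == '\\')
                in_special = 1;
            depth++;
            keep_next = 0;  /* assumption: a brace group uses up the slot */
        } else if (c == '}') {
            if (depth > 0 && --depth == 0)
                in_special = 0;
        } else if (depth > 0 && !in_special) {
            continue;       /* ordinary brace group: left untouched */
        } else if (in_special && c == '\\') {
            char *name = p + 1;
            size_t len = 0;
            while (isalpha((unsigned char)name[len]))
                len++;
            /* Only non-English letters are converted; others are kept. */
            if (is_foreign_letter(name, len))
                for (size_t j = 0; j < len; j++)
                    name[j] = (char)tolower((unsigned char)name[j]);
            if (len > 0)
                p += len;
            else if (p[1] != '\0')
                p++;
        } else if (isalpha(c)) {
            if (!in_special && keep_next)
                keep_next = 0;
            else
                *p = (char)tolower(c);
        } else if (!in_special &&
                   (c == ':' || c == '.' || c == '?' || c == '!')) {
            keep_next = 1;
        }
    }
}
```

	   Under these assumptions the sketch turns "Flying Squirrels: Their
	   Peculiar Habits. Part One" into "Flying squirrels: Their peculiar
	   habits. Part one", lowercases a bare control sequence at
	   brace-depth zero, and leaves ordinary brace groups alone.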

	   For example,	the string

	      A	Guide to \LaTeXe: Document Preparation ...

	   would, when "transform" is 't' (title capitalization), be converted
	   to

	      A	guide to \latexe: Document preparation ...

	   which is probably not the desired result.  A	better attempt is

	      A	Guide to {\LaTeXe}: Document Preparation ...

	   which becomes

	      A	guide to {\LaTeXe}: Document preparation ...

	   However, if you go back and re-read the description of
	   "bt_purify_string()", you'll discover that "{\LaTeXe}" here is a
	   special character, but not a non-English letter, so the control
	   sequence is stripped.  A sort key generated from this title would
	   therefore be

	      A	Guide to  Document Preparation

	   ...oops!  The right solution	(and this applies to any title with a
	   TeX command that becomes actual text) is to bury the	control
	   sequence at brace-depth two:

	      A	Guide to {{\LaTeXe}}: Document Preparation ...


       Greg Ward <>

btparse, version 0.88		  2019-04-29			    BT_MISC(1)

