Text::Capitalize(3)   User Contributed Perl Documentation  Text::Capitalize(3)

       Text::Capitalize	- capitalize strings ("to WORK AS titles" becomes "To
       Work as Titles")

	  use Text::Capitalize;

	  print	capitalize( "...and justice for	all" ),	"\n";
	     ...And Justice For	All

	  print	capitalize_title( "...and justice for all" ), "\n";
	     ...And Justice for	All

	  print	capitalize_title( "agent of SFPUG", PRESERVE_ALLCAPS=>1	), "\n";
	     Agent of SFPUG

	  print	capitalize_title( "the ring:  symbol or	cliche?",
				  PRESERVE_WHITESPACE=>1 ), "\n";
	     The Ring:	Symbol or Cliche?
	     (Note, double-space after colon is	still there.)

          # To work on international characters, you may need to set the locale
          use Env qw( LANG );
          $LANG = "en_US";
          print capitalize_title( "über maus" ), "\n";
             Über Maus

	  use Text::Capitalize qw( scramble_case );
	  print	scramble_case( 'It depends on what you mean by "mean"' );
	     It	dEpenDS	On wHAT	YOu mEan by "meAn".

	 Text::Capitalize is for capitalizing strings in a manner
       suitable	for use	in titles.

       Text::Capitalize provides some routines for title-like formatting of
       strings.

       The simple capitalize function just makes the initial character of each
       word uppercase, and forces the rest to lowercase.
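
       The behavior described above can be approximated in a few lines.  This
       is only a sketch: sketch_capitalize is a hypothetical name, and the
       definition of a "word" used here (a letter followed by letters or
       apostrophes) is an assumption, not necessarily what the module does.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the idea behind capitalize: uppercase the first
# letter of every word and lowercase the rest.
sub sketch_capitalize {
    my ($string) = @_;
    # Assumed word definition: a letter followed by letters or apostrophes,
    # so that "didn't" is treated as one word.
    $string =~ s/([[:alpha:]][[:alpha:]']*)/\u\L$1/g;
    return $string;
}

print sketch_capitalize("...and justice for all"), "\n";  # ...And Justice For All
print sketch_capitalize("KiLLiNG TiMe"), "\n";            # Killing Time
```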

       The capitalize_title function applies English title case	rules
       (discussed below) where only the	"important" words are supposed to be
       capitalized.  There are also some customization features	provided to
       allow the user to choose	variant	rules.

       Comparing capitalize and capitalize_title:

	 Input:		    "lost watches of splitsville"
	 capitalize:	    "Lost Watches Of Splitsville"
	 capitalize_title:  "Lost Watches of Splitsville"

       Some examples of	formatting with	capitalize_title:

	 Input:		    "KiLLiNG TiMe"
	 capitalize_title:  "Killing Time"

	 Input:		    "we	have come to wound the autumnal	city"
	 capitalize_title:  "We	Have Come to Wound the Autumnal	City"

	 Input:		    "ask for whom they ask for"
         capitalize_title:  "Ask for Whom They Ask For"

       Text::Capitalize also provides some functions for special effects such
       as scramble_case, which would typically be used for this sort of
       transformation:

	 Input:		   "get	whacky"
	 scramble_case:	   "gET	wHaCkY"	 (or something similar)

   default exports
       capitalize
           Makes the initial character of each word uppercase, and forces the
           rest to lowercase.

           The original routine by Stanislaw Y. Pusep.

       capitalize_title
           Applies English title case rules (see BACKGROUND) where only the
           "important" words are supposed to be capitalized.

	   The one required argument is	the string to be capitalized.

	   Some	customization options may be passed in as pairs	of names and
	   values following the	required argument.

           The following customizations are allowed:

           Boolean:

             PRESERVE_WHITESPACE
             PRESERVE_ALLCAPS
             PRESERVE_ANYCAPS

           Array reference:

             NOT_CAPITALIZED

           See "Customizing the Exceptions to Capitalization".

   optional exports
       @exceptions
           The list of minor words that don't usually get capitalized in
           titles (used by capitalize_title).  Defaults to:

		a an the
		and or nor for but so yet
		to of by at for	but in with has
		de von

           Defines the default arguments for the capitalize_title function.
           Initially, this is set up to shut off the features
           PRESERVE_WHITESPACE, PRESERVE_ALLCAPS and PRESERVE_ANYCAPS; it
           has @exceptions as the NOT_CAPITALIZED list.

       scramble_case
           This routine provides a special effect: sCraMBliNg tHe CaSe.

           The algorithm here uses a modified probability distribution to get
           a weirder looking effect than simple randomization such as with
           random_case.

           For a discussion of the algorithm, see "SPECIAL EFFECTS".

       random_case
           Randomizes the case of each character with a 50-50 chance of each
           one becoming upper or lower case.

       zippify_case
           Function to provide a special effect: "RANDOMLY upcasing WHOLE
           WORDS at a TIME".

           This uses a similar algorithm to scramble_case, though it also
           ignores words on the @exceptions list, just as capitalize_title
           does.

BACKGROUND
       The capitalize_title function tries to do the right thing by default:
       adjust an arbitrary chunk of text so that it can be used as a title.
       But as with many aspects of human languages, it is extremely
       difficult to come up with a set of programmatic rules that will cover
       all cases.

   Words that don't get	capitalized
       This web	page:

       presents	some admirably clear rules for capitalizing titles:

	 ALL words in EVERY title are capitalized except
	 (1) a,	an, and	the,
	 (2) two and three letter conjunctions (and, or, nor, for, but,	so, yet),
	 (3) prepositions.
	 Exceptions:  The first	and last words are always capitalized even
	 if they are among the above three groups.

       But consider the	case:

	 "It Waits Underneath the Sea"

       Should the word "underneath" be downcased because it's a	preposition?
       Most English speakers would be surprised	to see it that way.
       Consequently, the default list of exceptions to capitalization in this
       module only includes the	shortest of the	common prepositions (to	of by
       at for but in).

       The default entries on the exception list are:

	    a an the
	    and	or nor for but so yet
	    to of by at	for but	in with	has
	    de von

       The observant may note that the last row is not composed of English
       words.  The honorary "de" has been included in honor of "Honoré de
       Balzac".  And "von" was added for the sake of equal time.
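
       The rules quoted above (minor words stay lowercase, except in first or
       last position) can be sketched as follows.  This is an illustration
       only, not the module's implementation: sketch_title_case and %minor
       are hypothetical names, and the sketch ignores punctuation, sentence
       breaks and the PRESERVE_* options.

```perl
use strict;
use warnings;

# Hypothetical lookup table built from the default exception list above.
my %minor = map { $_ => 1 } qw( a an the and or nor for but so yet
                                to of by at in with has de von );

# Hypothetical helper: lowercase minor words, capitalize everything else,
# and always capitalize the first and last words.
sub sketch_title_case {
    my ($string) = @_;
    my @words = split ' ', $string;    # splits on runs of whitespace
    for my $i ( 0 .. $#words ) {
        my $lower = lc $words[$i];
        my $edge  = ( $i == 0 || $i == $#words );
        $words[$i] = ( $minor{$lower} && !$edge ) ? $lower : ucfirst $lower;
    }
    return join ' ', @words;
}

print sketch_title_case("lost watches of splitsville"), "\n";
print sketch_title_case("ask for whom they ask for"), "\n";
```

       The second call shows the first/last word rule: the trailing "for" is
       capitalized even though it is on the exception list.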

   Customizing the Exceptions to Capitalization
       If you have different ideas about the "rules" of	English	(or perhaps if
       you're trying to	use this code with another language with different
       rules) you might	like to	substitute a new exception list	of your	own:

	 capitalize_title( "Dude, we, like, went to Old	Slavy, and uh, they didn't have	it",
			    NOT_CAPITALIZED => [ qw( uh	duh huh	wha like man you know )	] );

       This should return:

	  Dude,	We, like, Went To Old Slavy, And uh, They Didn't Have It

       Less radically, you might like to simply	add a word to the list,	for
       example "from":

	  use Text::Capitalize 0.2 qw( capitalize_title	@exceptions );
	  push @exceptions, "from";

	  print	capitalize_title( "fungi from yuggoth",
				  NOT_CAPITALIZED => \@exceptions);

       This should output:

	   Fungi from Yuggoth

   All Uppercase Words
       In order	to work	with a wide range of input strings, by default
       capitalize_title	presumes that upper-case input needs to	be adjusted
       (e.g. "DOOM APPROACHES!"	would become "Doom Approaches!").  But,	this
       doesn't allow for the possibilities such	as an acronym in a title (e.g.
       "RAM Prices Plummet" ideally should not become "Ram Prices Plummet").
       If the PRESERVE_ALLCAPS option is set, then it will be presumed that an
       all-uppercase word is that way for a reason, and	will be	left alone:

	  print	capitalize_title( "ram more RAM	down your throat",
				  PRESERVE_ALLCAPS => 1	);

       This should output:

	     Ram More RAM Down Your Throat

   Preserving Any Usage	of Uppercase for Mixed-case Words
       There are some other odd	cases that are difficult to handle well,
       notably mixed-case words	such as	"iMac",	"CHiPs", and so	on.  For these
       purposes, a PRESERVE_ANYCAPS option has been provided which presumes
       that any	usage of uppercase is there for	a reason, in which case	the
       entire word should be passed through untouched.	With PRESERVE_ANYCAPS
       on, only	the case of all	lowercase words	will ever be adjusted:

	  print	capitalize_title( "TLAs	i have known and loved",
			      PRESERVE_ANYCAPS => 1 );

       This should output:

	  TLAs I Have Known and	Loved

	  print	capitalize_title( "the next iMac: just another NeXt?",
				   PRESERVE_ANYCAPS => 1);

       This should output:

	  The Next iMac: Just Another NeXt?
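
       The difference between the two options comes down to which words are
       considered "already cased on purpose".  A minimal sketch of that
       test, assuming a word is judged only by its letters (keeps_case is a
       hypothetical helper, not a function exported by this module):

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if a word would be passed through untouched
# under the given option, 0 if its case would be adjusted.
sub keeps_case {
    my ( $word, %opt ) = @_;
    ( my $letters = $word ) =~ s/[^[:alpha:]]//g;   # ignore punctuation, digits
    return 0 unless length $letters;
    # ANYCAPS: any uppercase letter at all protects the word.
    return 1 if $opt{PRESERVE_ANYCAPS} && $letters =~ /[[:upper:]]/;
    # ALLCAPS: only words with no lowercase letters are protected.
    return 1 if $opt{PRESERVE_ALLCAPS} && $letters !~ /[[:lower:]]/;
    return 0;
}

for my $word (qw( RAM iMac ram )) {
    printf "%-5s allcaps=%d anycaps=%d\n", $word,
        keeps_case( $word, PRESERVE_ALLCAPS => 1 ),
        keeps_case( $word, PRESERVE_ANYCAPS => 1 );
}
```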

   Handling Whitespace
       By default, the capitalize_title function presumes that you're trying
       to clean up potential title strings.  As an extra feature it collapses
       multiple spaces and tabs into single spaces.  If this feature doesn't
       seem desirable and you want it to restrict itself to adjusting
       capitalization, you can force that behavior with the
       PRESERVE_WHITESPACE option:

          print capitalize_title( "it came from texas:  the new new world order?",
                                  PRESERVE_WHITESPACE => 1 );

       This should output:

	     It	Came From Texas:  The New New World Order?

       (Note: the double-space after the colon is still	there.)
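
       The whitespace cleanup itself amounts to a single substitution.  A
       sketch, assuming only spaces and tabs are collapsed (tidy_whitespace
       is a hypothetical name; how the module treats leading or trailing
       whitespace is not covered here):

```perl
use strict;
use warnings;

# Hypothetical helper: collapse runs of spaces and tabs into one space,
# unless the caller asks for whitespace to be left alone.
sub tidy_whitespace {
    my ( $string, %opt ) = @_;
    $string =~ s/[ \t]+/ /g unless $opt{PRESERVE_WHITESPACE};
    return $string;
}

print tidy_whitespace("the ring:  symbol or\tcliche?"), "\n";
print tidy_whitespace( "the ring:  symbol or cliche?",
                       PRESERVE_WHITESPACE => 1 ), "\n";
```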

   Comparison to Text::Autoformat
       As you might expect, there's more than one way to do this, and these
       two pieces of code perform very similar functions:

	  use Text::Capitalize 0.2;
	  print	capitalize_title( $t ),	"\n";

	  use Text::Autoformat;
	  print	autoformat { case => "highlight", right	=> length( $t )	}, $t;

       Note: with autoformat, supplying	the length of the string as the	"right
       margin" is much faster than plugging in an arbitrarily large number.
       There doesn't seem to be	any other way of turning off line-breaking
       (e.g. by	using the "fill" parameter) though possibly there will be in
       the future.

       As of this writing, "capitalize_title" has some advantages:

       1.  It works on characters outside the English 7-bit ASCII range, for
           example with my locale setting (en_US) the ISO-8859-1 International
           characters are handled correctly, so that "über maus" becomes
           "Über Maus".

       2.  Minor words following leading punctuation become upper case:

	      "...And Justice for All"

       3.  It works with multiple sentence input (e.g. "And sooner. And
	   later."  should probably not	be "And	sooner.	and later.")

       4.  The list of minor words is more extensive (i.e. includes: so, yet,
	   nor), and is	also customizable.

       5.  There's a way of preserving acronyms via the PRESERVE_ALLCAPS
           option and similarly, mixed-case words ("iMac", "NeXt", etc.) with
           the PRESERVE_ANYCAPS option.

       6.  capitalize_title is roughly ten times faster.

       Another difference is that Text::Autoformat's "highlight" always
       preserves whitespace, much as capitalize_title does with the
       PRESERVE_WHITESPACE option set.

       However,	it should be pointed out that Text::Autoformat is under	active
       maintenance by Damian Conway.  It also does far more than this module,
       and you may want	to use it for other reasons.

   Still more ways to do it
       Late breaking news: The second edition of the Perl Cookbook has just
       come out.  It now includes: "Properly Capitalizing a Title or Headline"
       as recipe 1.14.	You should familiarize yourself	with this if you want
       to become a true	master of all title capitalization routines.

       (And I see that recipe 1.13 includes a "randcap"	program	as an example,
       which as	it happens does	something like the random_case function
       described below...)

SPECIAL EFFECTS
       Some functions have been provided to make strings look weird by
       scrambling their capitalization ("lIKe tHiS"): random_case and
       scramble_case.  The function "random_case" does a straightforward
       randomization of capitalization so that each letter has a 50-50 chance
       of being upper or lower case.  The function "scramble_case" performs a
       very similar function, but does a slightly better job of producing
       something "weird-looking".

       The difficulty is that there are	differences between human perception
       of randomness and actual	randomness.  Consider the fact that of the
       sixteen ways that the four letter word "word" can be capitalized, three
       of them are rather boring: "word", "Word" and "WORD".  To make it less
       likely that scramble_case will produce dull output when you want
       "weird" output, a modified probability distribution has been used that
       records the history of previous outcomes, and tweaks the	likelihood of
       the next	decision in the	opposite direction, back toward	the expected
       average.	 In effect, this simulates a world in which the	Gambler's
       Fallacy is correct ("Hm... red has come up a lot, I bet that black is
       going to	come up	now.").	"Streaks" are much less	likely with
       scramble_case than with random_case.

       Additionally, with scramble_case	the probability	that the first
       character of the	input string will become upper-case has	been tweaked
       to less than 50%.  (Future versions may apply this tweak	on a per-word
       basis rather than just on a per-string basis).
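
       The idea of a history-sensitive coin toss can be sketched as follows.
       This is not the module's algorithm: sketch_scramble is a hypothetical
       name, and the running balance, the 0.15 step and the 0.1/0.9 limits
       are arbitrary assumptions chosen only to show how streaks can be
       made less likely.

```perl
use strict;
use warnings;

# Hypothetical sketch: track the surplus of uppercase over lowercase
# outcomes and bias the next toss against whichever case is ahead.
sub sketch_scramble {
    my ($string) = @_;
    my $balance = 1;    # pretend one uppercase letter already occurred, so
                        # the first letter has a below-50% chance of uppercase
    my $out = '';
    for my $char ( split //, $string ) {
        if ( $char =~ /[[:alpha:]]/ ) {
            my $p_upper = 0.5 - 0.15 * $balance;    # assumed step size
            $p_upper = 0.1 if $p_upper < 0.1;       # never fully certain
            $p_upper = 0.9 if $p_upper > 0.9;
            if ( rand() < $p_upper ) { $char = uc $char; $balance++ }
            else                     { $char = lc $char; $balance-- }
        }
        $out .= $char;    # non-letters pass through unchanged
    }
    return $out;
}

print sketch_scramble("get whacky"), "\n";
```

       Because the output is random, repeated runs give different results;
       only the letters themselves, not their case, are guaranteed.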

       There is	also a function	that scrambles capitalization on a word-by-
       word basis called "zippify_case", which should produce output like: "In
       my PREVIOUS life	i was a	LATEX-novelty REPAIRMAN!"

       By default, this	version	of the module provides the two functions
       capitalize and capitalize_title.	 Future	versions will have no further
       additions to the	default	export list.

       Optionally, the following functions may also be exported:

       scramble_case
           A function to scramble capitalization in a wEiRD loOOkInG wAy.
           Supposed to look a little stranger than the simpler random_case.

       random_case
           Function to randomize capitalization of each letter in the string.
           Compare to "scramble_case".

       zippify_case
           A function like "scramble_case" that acts on a word-by-word basis
           (Somewhat LIKE this, YOU know?).

       It is also possible to export the following variables:

       @exceptions
           The list of minor words that capitalize_title uses by default to
           determine the exceptions to capitalization.

	   The hash of allowed arguments (with defaults) that the
	   capitalize_title function uses.

BUGS
       1. In capitalize_title, quoted sentence terminators are treated as
       actual sentence breaks, e.g. in this case:

	    'say "yes but!" and	"know what?"'

       The program sees	the ! and effectively treats this as two separate
       sentences: the word "but" becomes "But" (under the rule that last words
       must always be uppercase, even if they're on the	exception list)	and
       the word	"and" becomes "And" (under the first word rule).

       2. There's no good way to automatically handle names like "McCoy".
       Consider	the difficulty of disambiguating "Macadam Roads" from "MacAdam
       Rode".  If you need to solve problems like this,	consider using the
       case_surname function of	Lingua::En::NameParse.

       3. In general, Text::Capitalize is a very parochial English oriented
       module that looks like it belongs in the	"Lingua::En::*"	tree.

       4. There's currently no way of doing a PRESERVE_ANYCAPS that *also*
       adjusts capitalization of words on the exception	list, so that "iMac Or
       iPod" would become "iMac	or iPod".


       "The Perl Cookbook", second edition, recipes 1.13 and 1.14


       About "scramble_case":

       Version 0.9

	  Joseph M. Brenner

	  Stanislaw Y. Pusep  (who wrote "capitalize")
	     ICQ UIN:  11979567

       And many	thanks (for feature suggestions	and code examples) to:

	   Belden Lyman, Yary Hcluhan, Randal Schwartz

       Copyright 2003 by Joseph	Brenner. All rights reserved.

       This library is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.


perl v5.32.1			  2019-09-27		   Text::Capitalize(3)

