FreeBSD Manual Pages

Regexp::Common::profanity_us(3)  User Contributed Perl Documentation  Regexp::Common::profanity_us(3)

NAME
       Regexp::Common::profanity_us -- provide regexes for U.S. profanity

SYNOPSIS
         use Regexp::Common qw /profanity_us/;

         my $RE = $RE{profanity}{us}{normal}{label}{-keep}{-dist=>3};

         while (<>) {
             warn "PROFANE" if /$RE/;
         }

       Or, more simply:

         use Regexp::Profanity::US;

         my $profane = profane($string);
         my @profane = profane_list($string);

OVERVIEW
       Instead of a dry technical overview, I am going to explain the
       structure of this module based on its history. I consult at a company
       that generates customer leads primarily through websites that attract
       people (e.g. lowering loan values, selling cars, buying real estate,
       etc.). For some reason we get more than our fair share of profane
       leads, so I was told to write a profanity checker.

       For the data that I was dealing with, the profanity was most often in
       the email address or in the first or last name, so I naively started
       filtering profanity with a set of regexps for that sort of data. Note
       that both names and email addresses are unlike what you are reading
       now: they are not whitespace-separated text, but are instead labels.

       Therefore, full support for profanity checking would have to work in
       two entirely different contexts: labels (email, names) and text (what
       you are reading). Because open source is driven by demand and I have
       no need to detect profanity in text, only "label" is implemented at
       the moment. And you know the next sentence: "patches welcome" :)

   Spelling Variations Dictated by Sound or Sight
       Creative use of symbols to spell words (el33t sp3@k)

       Now, within labels, you can see normal ASCII or creative use of

       Here are some normal profane labels:

       And here they are in ASCII art:

       A CPAN module which does a great job of "drawing words" is
       Acme::Tie::Eleet. I thought I knew all of the ways that someone could
       "inflate" a letter so that dirty words could bypass a profanity
       checker, but just look at all these:

        # letter => the symbol (or list of symbols) that can stand in for it
        my %letter =
           ( a => [ "4", "@" ],
             c => "(",
             e => "3",
             g => "6",
             h => [ "|-|", "]-[" ],
             k => [ "|<", "]{" ],
             i => "!",
             l => [ "1", "|" ],
             m => [ "|V|", "|\\/|" ],
             n => "|\\|",
             o => "0",
             s => [ "5", "Z" ],
             t => [ "7", "+" ],
             u => "\\_/",
             v => "\\/",
             w => [ "vv", "\\/\\/" ],
             'y' => "j",
             z => "2",
           );
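The module does not decode eleet spellings. As a rough sketch of what a pre-processing patch might do, the table above can be inverted so that symbols map back to letters before a label is checked. The `de_leet` sub is hypothetical and is not part of this module or of Acme::Tie::Eleet. Symbols are tried longest first, so "|\/|" is read as "m" rather than as "l", "v", "l".

```perl
use strict;
use warnings;

# The %letter table from above (every key quoted so none is read as an operator).
my %letter = (
    'a' => [ "4", "@" ],       'c' => "(",     'e' => "3",
    'g' => "6",                'h' => [ "|-|", "]-[" ],
    'k' => [ "|<", "]{" ],     'i' => "!",     'l' => [ "1", "|" ],
    'm' => [ "|V|", "|\\/|" ], 'n' => "|\\|",  'o' => "0",
    's' => [ "5", "Z" ],       't' => [ "7", "+" ],
    'u' => "\\_/",             'v' => "\\/",
    'w' => [ "vv", "\\/\\/" ], 'y' => "j",     'z' => "2",
);

# Invert the table: symbol => letter.
my %decode;
for my $ch (keys %letter) {
    my $syms = ref $letter{$ch} ? $letter{$ch} : [ $letter{$ch} ];
    $decode{$_} = $ch for @$syms;
}

# Longest symbols first, so multi-character symbols win over their prefixes.
my $alt = join '|', map { quotemeta($_) }
    sort { length($b) <=> length($a) || $a cmp $b } keys %decode;
my $symbol_re = qr/($alt)/;

# Hypothetical helper: replace every known symbol with the letter it stands for.
sub de_leet {
    my ($s) = @_;
    $s =~ s/$symbol_re/$decode{$1}/g;
    return $s;
}

print de_leet('p00pf4c3'), "\n";    # prints "poopface"
```

The decoding is lossy: a genuine "j" in a name becomes "y", and digits such as "1" or "2" are always read as letters, so a decoded label should only ever be an extra candidate to check, never a replacement for the original.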

       Soundex respelling

       Which, of course, brings me to the final way to take normal text and
       vary it while keeping the same meaning: soundex.

       The way a word sounds can lead to different spellings. For example, we

       Which we can soundex out as:

       Or, given:

       We can rewrite it as:

       There are two CPAN modules, Text::Soundex and Text::Metaphone, which do
       this sort of thing, but after they resolved "shit" and "shot" to the
       same soundex, I forgot about them :).
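The sketch below does not call either module. It hand-rolls the classic American Soundex rules (keep the first letter, map the remaining consonants to six digit classes, drop vowels, collapse repeated codes, pad to four characters) to show why "shit" and "shot" collide: they differ only in a vowel, and vowels carry no code. The `soundex` sub here is a local stand-in, not the function exported by Text::Soundex.

```perl
use strict;
use warnings;

# Digit classes of American Soundex; letters not listed (vowels, H, W, Y) code as 0.
my %code;
@code{qw(B F P V)}         = (1) x 4;
@code{qw(C G J K Q S X Z)} = (2) x 8;
@code{qw(D T)}             = (3) x 2;
$code{L}                   = 4;
@code{qw(M N)}             = (5) x 2;
$code{R}                   = 6;

sub soundex {
    my ($word) = @_;
    (my $w = uc $word) =~ s/[^A-Z]//g;      # letters only
    return '' unless length $w;
    my ($first, @rest) = split //, $w;
    my $out  = $first;                       # the first letter is kept as-is
    my $prev = $code{$first} // 0;
    for my $ch (@rest) {
        next if $ch eq 'H' || $ch eq 'W';    # H and W do not separate equal codes
        my $c = $code{$ch} // 0;
        $out .= $c if $c && $c != $prev;     # skip vowels and repeated codes
        $prev = $c;                          # a vowel (0) does separate equal codes
        last if length($out) == 4;
    }
    return substr($out . '000', 0, 4);       # pad to one letter + three digits
}

print join(' ', map { soundex($_) } qw(shit shot)), "\n";   # prints "S300 S300"
```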

       So, to conclude this OVERVIEW (or is that oV3r\/ieW :), this module
       does profanity checking for:

         labels and not text

       and for:

         normal and not eleet spelling

       with a bit of hedging to support soundexing. Only definite obscene
       words are searched for; ambiguous or contextual searching is left as
       an exercise for the reader.

       In Regexp::Common terminology (Regexp::Common being the infrastructure
       on which this module is built), we have only the following regexp for
       your string-matching ecstasy:

         $RE{profanity}{us}{normal}{label}

       and patches are welcome for:


       But if you plan to implement text parsing, note that "[^[:alpha:]]",
       and not "\b", should be used to delimit words, because "_" is a word
       character and so does not form a word boundary. A pattern such as
       "\bshit\b" will match

         shit head

       but not "shit_head".
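A short sketch of the difference, using "poop" as a stand-in word. The first pattern is delimited with "\b"; the second uses lookarounds that reject a neighboring letter, which is the zero-width form of "[^[:alpha:]]" and also accepts the start and end of the string. Both patterns are illustrations only, not part of this module.

```perl
use strict;
use warnings;

my $word = 'poop';

# \b sits between a \w and a \W character; "_" is \w, so "poop_face"
# has no boundary after "poop".
my $with_b = qr/\b\Q$word\E\b/i;

# Delimit with "anything that is not a letter" instead.
my $with_alpha = qr/(?<![[:alpha:]])\Q$word\E(?![[:alpha:]])/i;

for my $s ('poop face', 'poop_face', 'poopface') {
    printf "%-10s  \\b: %-3s  [^[:alpha:]]: %s\n", $s,
        ($s =~ $with_b     ? 'yes' : 'no'),
        ($s =~ $with_alpha ? 'yes' : 'no');
}
```

Neither pattern matches "poopface"; run-together words like that are what the label regexps are for.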


       Another thing about text is that it may be resolved into labels by
       splitting on whitespace. Thus, one could have one engine and a
       different pre-processor.
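A minimal sketch of that split, assuming a stand-in regexp `$label_re` (hypothetical, not the module's pattern) in place of $RE{profanity}{us}{normal}{label}. The hypothetical `profane_labels` sub is the pre-processor: it splits text on whitespace and hands each token to the same engine.

```perl
use strict;
use warnings;

# Stand-in for the label regexp; hypothetical, for illustration only.
my $label_re = qr/poop.{0,7}face/i;

# Pre-processor: turn free text into labels by splitting on whitespace,
# then keep the labels that the engine flags.
sub profane_labels {
    my ($text, $engine) = @_;
    return grep { $_ =~ $engine } split ' ', $text;
}

my @hits = profane_labels("what a poopface, honestly", $label_re);
print "@hits\n";
```

Because each word reaches the engine on its own, a phrase spread over several words, such as "poop on your face", is not matched by this sketch.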

       Please consult the manual of Regexp::Common for a general description
       of how this interface works.

       Do not use this module directly, but load it via Regexp::Common.

       This module reads one flag, "-dist", which sets the number of
       characters that can appear between components of an obscene phrase.
       For example


       will match the following regular expression


       as long as the flag "-dist" is set to 3 or greater, because this module
       changes "-" into ".{0,$dist}", with $dist defaulting to 7. Why such a
       large default? It lets the profanity list omit filler words such as
       "my" or "your". Take this:

	 poop on your face

       We have the following regular expression


       which is transformed to


       which will match the possible prepositions and adjectives in between
       "poop" and "face", and also match the hideous term "poopface".
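A minimal sketch of the "-" expansion, assuming a hypothetical list entry "poop-face" and a hypothetical builder `expand_phrase` (neither is taken from the module's source). Each "-" becomes ".{0,$dist}" with a default of 7, and a `keep` option wraps the whole pattern in one capturing group, mirroring how "-keep" makes $1 capture the entire match. The labels are run together, as labels are, so "pooponyourface" puts six characters between the two words.

```perl
use strict;
use warnings;

# Hypothetical builder: each "-" in a list entry becomes ".{0,$dist}".
sub expand_phrase {
    my ($entry, %opt) = @_;
    my $dist = $opt{dist} // 7;                  # default of 7, as described above
    my $pat  = join ".{0,$dist}", map { quotemeta($_) } split /-/, $entry;
    $pat = "($pat)" if $opt{keep};               # like -keep: $1 holds the whole match
    return qr/$pat/i;
}

my $re    = expand_phrase('poop-face');
my $tight = expand_phrase('poop-face', dist => 3);
my $kept  = expand_phrase('poop-face', keep => 1);

for my $label ('poopface', 'pooponyourface') {
    printf "%-15s default: %-3s dist=3: %s\n", $label,
        ($label =~ $re    ? 'yes' : 'no'),
        ($label =~ $tight ? 'yes' : 'no');
}

print "captured: $1\n" if 'xpooponyourfacex' =~ $kept;   # prints "captured: pooponyourface"
```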

       Under "-keep" (see Regexp::Common):

       $1  captures the entire word

SEE ALSO
       Regexp::Common for a general description of how to use this interface.

       Regexp::Common::profanity for a slightly more European set of words.

       Regexp::Profanity::US for a pair of wrapper functions that use these
       regexps.

AUTHOR
       T. M. Brannon,

       I cannot pay enough thanks to

         Matthew Simon Cavalletto,

       who refactored this module completely of his own volition and in spite
       of his hectic schedule. He turned this module from an unsophisticated
       hack into something worth others using.

       Useful brain picking came from William McKee of Knowmad Consulting on
       the Data::FormValidator mailing list.

perl v5.32.1			  2011-08-03   Regexp::Common::profanity_us(3)

