Regexp::Common::profanity_us(3)  User Contributed Perl Documentation  Regexp::Common::profanity_us(3)

NAME
       Regexp::Common::profanity_us -- provide regexes for U.S.	profanity

SYNOPSIS
	 use Regexp::Common qw /profanity_us/;

	 my $RE	= $RE{profanity}{us}{normal}{label}{-keep}{-dist=>3};

	 while (<>) {
	     warn "PROFANE" if /$RE/;
	 }

       Or, more easily:

	use Regexp::Profanity::US;

	$profane = profane     ($string);
	@profane = profane_list($string);

OVERVIEW
       Instead of a dry technical overview, I am going to explain the
       structure of this module based on its history. I consult at a company
       that generates customer leads primarily through websites that attract
       people (e.g. lowering loan values, selling cars, buying real estate).
       For some reason we get more than our fair share of profane leads, so
       I was told to write a profanity checker.

       For the data that I was dealing with, the profanity was most often in
       the email address or in the first or last name, so I naively started
       filtering profanity with	a set of regexps for that sort of data.	Note
       that both names and email addresses are unlike what you are reading
       now: they are not whitespace-separated text, but	are instead labels.

       Therefore, full support for profanity checking should work in two entirely
       different contexts: labels (email, names) and text (what	you are
       reading).  Because open-source is driven	by demand and I	have no	need
       for detecting profanity in text,	only "label" is	implemented at the
       moment. And you know the	next sentence: "patches	welcome" :)

   Spelling Variations Dictated	by Sound or Sight
       Creative	use of symbols to spell	words (el33t sp3@k)

       Now, within labels, you can see normal ASCII or creative use of
       symbols:

       Here are	some normal profane labels:
	 suckmycock@isp.com
	 shitonastick

       And here they are in ASCII art:
	 s\/cKmyc0k@aol.com
	 sh|+0naST1ck

       A CPAN module which does	a great	job of "drawing	words" is
       Acme::Tie::Eleet.  I thought I knew all of the ways that	someone	could
       "inflate" a letter so that dirty	words could bypass a profanity
       checker,	but just look at all these:

	%letter	=
	   ( a => [ "4", "@" ],
	     c => "(",
	     e => "3",
	     g => "6",
	     h => [ "|-|", "]-[" ],
	     k => [ "|<", "]{" ],
	     i => "!",
	     l => [ "1", "|" ],
	     m => [ "|V|", "|\\/|" ],
	     n => "|\\|",
	     o => "0",
	     s => [ "5", "Z" ],
	     t => [ "7", "+"],
	     u => "\\_/",
	     v => "\\/",
	     w => [ "vv", "\\/\\/" ],
	     'y' => "j",
	     z => "2",
	     );
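
       As a rough illustration of how a table like this could feed the
       not-yet-implemented "eleet" patterns, here is a minimal sketch. The
       helper name eleet_regex and the reduced table are assumptions made
       for this example; neither is part of this module or of
       Acme::Tie::Eleet.

```perl
use strict;
use warnings;

# A small subset of the substitution table shown above, with every value
# normalised to an array reference.
my %letter = (
    a => [ '4', '@' ],
    e => [ '3' ],
    i => [ '!' ],
    o => [ '0' ],
    s => [ '5', 'Z' ],
    t => [ '7', '+' ],
);

# Hypothetical helper: build a regex that matches a word in its plain
# spelling or with any letter replaced by one of its substitutes.
sub eleet_regex {
    my ($word) = @_;
    my $pattern = join '', map {
        my $ch   = $_;
        my @alts = ($ch, @{ $letter{$ch} || [] });
        # quotemeta keeps symbols such as "+" and "@" literal.
        '(?:' . join('|', map { quotemeta($_) } @alts) . ')';
    } split //, lc $word;
    return qr/$pattern/i;
}

my $re = eleet_regex('toast');
for my $label ('toast', '70@5+', 'roast') {
    printf "%-6s %s\n", $label, ($label =~ $re ? 'matches' : 'no match');
}
```

       Each letter becomes one alternation group, so the pattern grows
       linearly with the word while covering every combination of
       substitutes.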

       Soundex respelling

       Which of	course brings me to the	final way to take normal text and vary
       it for the same meaning:	soundex.

       The way a word sounds can lead to different spellings. For example, we
       have
	shitonastick

       Which we	can soundex out	as:
	shitonuhstick

       Or, given:
	nigger

       We can rewrite it as:
	nigga
	nigguh
	niggah

       There are two CPAN modules, Text::Soundex and Text::Metaphone, which
       do this sort of thing, but after they resolved "shit" and "shot" to
       the same soundex, I forgot about them :).
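
       To see why the two words collide, here is a simplified sketch of the
       classic Soundex coding. The function simple_soundex is written for
       this example only and is not the Text::Soundex implementation; it
       ignores the edge cases that module handles.

```perl
use strict;
use warnings;

# Consonant groups of the classic Soundex scheme. Vowels, h, w and y
# have no code.
my %code;
@code{qw(b f p v)}         = (1) x 4;
@code{qw(c g j k q s x z)} = (2) x 8;
@code{qw(d t)}             = (3) x 2;
$code{l}                   = 4;
@code{qw(m n)}             = (5) x 2;
$code{r}                   = 6;

# Hypothetical helper: first letter plus up to three digits, zero padded.
sub simple_soundex {
    my ($word) = @_;
    my @letters = grep { /[a-z]/ } split //, lc $word;
    return '' unless @letters;

    my $first = shift @letters;
    my $out   = uc $first;
    my $prev  = $code{$first} || 0;

    for my $ch (@letters) {
        next if $ch eq 'h' || $ch eq 'w';   # h and w do not separate equal codes
        my $c = $code{$ch} || 0;            # vowels get 0 and reset $prev
        $out .= $c if $c && $c != $prev;    # skip repeats of the same code
        $prev = $c;
    }
    return substr($out . '000', 0, 4);
}

print simple_soundex('shit'), "\n";
print simple_soundex('shot'), "\n";
```

       Because vowels carry no code, any two words that differ only in
       their vowels receive the same key, which makes the scheme too coarse
       for telling profanity from ordinary words.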

       So to conclude this OVERVIEW, (or is that oV3r\/ieW :), this module
       does profanity checking for:

	 labels	and not	text

       and for:

	 normal	and not	eleet spelling

       with a bit of hedging to support soundexing. Only definitely obscene
       words are searched for; ambiguous or contextual searching is left as
       an exercise for the reader.

       In Regexp::Common terminology, which is the infrastructure on which
       this module is built, we	have only the following	regexp for your
       string-matching ecstasy:

	   $RE{profanity}{us}{normal}{label}

       and patches are welcome for:

	   $RE{profanity}{us}{label}{eleet}
	   $RE{profanity}{us}{text}{normal}
	   $RE{profanity}{us}{text}{eleet}

       But do note this if you plan to implement text parsing:

       "[^[:alpha:]]" and not "\b" should be used to delimit words, because
       "_" counts as a word character and so does not form a word boundary.
       As a result,

	 \bshit\b

       will match

	 shit head

       and

	 shit-head

       but not

	 shit_head
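
       The difference can be checked with a short sketch. The word "darn"
       is a harmless stand-in, and the lookarounds are one assumed way to
       express "no letter on either side" so that the pattern also works at
       the start and end of a string; none of this is taken from the
       module.

```perl
use strict;
use warnings;

my $word = 'darn';   # stand-in for an entry in a profanity list

# \b treats "_" as a word character, so it sees no boundary before "_head".
my $with_b = qr/\b$word\b/;

# Assumed alternative: require a non-letter (or the string edge) on each side.
my $with_alpha = qr/(?<![[:alpha:]])$word(?![[:alpha:]])/;

for my $text ('darn head', 'darn-head', 'darn_head') {
    printf "%-10s  \\b: %-3s  alpha: %s\n", $text,
        ($text =~ $with_b     ? 'yes' : 'no'),
        ($text =~ $with_alpha ? 'yes' : 'no');
}
```

       Only the third string differs: the "\b" pattern misses it, while the
       letter-based pattern finds it.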

       Another thing about text	is that	it may be resolved into	labels by
       splitting on whitespace.	Thus, one could	have one engine	and a
       different pre-processor.
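
       A minimal sketch of such a pre-processor follows. The pattern
       $label_re is a stand-in for a label-level regex, and profane_labels
       is a hypothetical helper name; neither comes from this module.

```perl
use strict;
use warnings;

# Stand-in for a label-level regex; a real one would come from %RE.
my $label_re = qr/darn/i;

# Hypothetical pre-processor: split text on whitespace and return the
# labels that the label-level regex flags.
sub profane_labels {
    my ($text) = @_;
    return grep { $_ =~ $label_re } split ' ', $text;
}

my @hits = profane_labels('well darn_it, that was close');
print scalar(@hits), " label(s) flagged: @hits\n";
```

       The matching engine stays the same; only the splitting step would
       change for a different kind of input.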

USAGE
       Please consult the manual of Regexp::Common for a general description
       of how this interface works.

       Do not use this module directly,	but load it via	Regexp::Common.

       This module reads one flag, "-dist", which sets the maximum number of
       characters that can appear between components of an obscene phrase.
       For example

	 suck!!!my!!!cock

       will match the following	regular	expression

	 suck-my-cock

       as long as the flag "-dist" is set to 3 or greater because this module
       changes "-" into	".{0,$dist}" with $dist	defaulting to 7.  Why such a
       large default? It is done so that the profanity list can	omit certain
       words such as my	or your. Take this:

	 poop on your face

       We have the following regular expression

	 poop--face

       which is	transformed to

	 poop.{0,7}.{0,7}face

       which will match	the possible prepositions and adjectives in between
       "poop" and "face" and also match	the hideous term "poopface".
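
       The transformation described above can be sketched in a few lines.
       The helper expand_phrase is an assumption made for illustration and
       is not part of the module's interface; it only mirrors the documented
       rule that "-" becomes ".{0,$dist}" with a default of 7.

```perl
use strict;
use warnings;

# Hypothetical helper: expand each "-" in a phrase spec into a bounded
# wildcard that allows up to $dist arbitrary characters.
sub expand_phrase {
    my ($phrase, $dist) = @_;
    $dist = 7 unless defined $dist;          # default stated in the text
    (my $pattern = $phrase) =~ s/-/.{0,$dist}/g;
    return $pattern;
}

my $wide   = expand_phrase('poop--face');     # uses the default of 7
my $narrow = expand_phrase('poop--face', 3);  # same spec with -dist set to 3

print "$wide\n$narrow\n";

for my $text ('poop on your face', 'poopface') {
    printf "%-18s wide: %-3s  narrow: %s\n", $text,
        ($text =~ /$wide/   ? 'yes' : 'no'),
        ($text =~ /$narrow/ ? 'yes' : 'no');
}
```

       With a distance of 3 the two gaps allow at most six characters, which
       is too few for the nine characters of " on your ", so only the wider
       default catches the full phrase.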

   Capturing
       Under "-keep" (see Regexp::Common):

       $1  captures the	entire word

SEE ALSO
       Regexp::Common for a general description	of how to use this interface.

       Regexp::Common::profanity for a slightly	more European set of words.

       Regexp::Profanity::US for a pair	of wrapper functions that use these
       regexps.

AUTHOR
       T. M. Brannon, tbone@cpan.org

       I cannot give enough thanks to

         Matthew Simon Cavalletto, evo@cpan.org,

       who refactored this module completely of	his own	volition and in	spite
       of his hectic schedule. He turned this module from an unsophisticated
       hack into something worth others	using.

       Useful brain picking came from William McKee of Knowmad Consulting on
       the Data::FormValidator mailing list.

perl v5.24.1			  2011-08-03   Regexp::Common::profanity_us(3)
