Skip site navigation (1)Skip section navigation (2)

FreeBSD Manual Pages


home | help
Text::Match::FastAlterUservContributed Perl DoText::Match::FastAlternatives(3)

       Text::Match::FastAlternatives - efficient search	for many strings

	   use Text::Match::FastAlternatives;

	   my $expletives = Text::Match::FastAlternatives->new(@naughty);
	   while (my $line = <>) {
	       print "Do you email your	mother with that keyboard?\n"
		   if $expletives->match($line);

       This module allows you to search	for any	of a list of substrings
       ("keys")	in a larger string.  It	is particularly	efficient when the set
       of keys is large.

       This efficiency comes at	the cost of some flexibility: if you want
       case-insensitive	matching, you have to fold case	yourself:

	   my $expletives = Text::Match::FastAlternatives->new(
	       map { lc	} @naughty);
	   while (my $line = <>) {
	       print "Do you email your	mother with that keyboard?\n"
		   if $expletives->match(lc $line);

       (The same applies if you	want matching that is insensitive to Unicode
       normalization forms; see	Unicode::Normalize.)

       This module is designed as a drop-in replacement	for Perl code of the
       following form:

	   my $expletives_regex	= join '|', map	{ quotemeta } @naughty;
	   $expletives_regex = qr/$expletives_regex/;
	   while (my $line = <>) {
	       print "Do you email your	mother with that keyboard?\n"
		   if $line =~ $expletives_regex;

       Text::Match::FastAlternatives can easily	perform	this test a hundred
       times faster than the equivalent	regex, if you have enough keys.	 The
       more keys it searches for, the faster it	gets compared to the regex.

       Modules like Regexp::Trie can build an optimised	version	of such	a
       regex, designed to take advantage of the	niceties of perl's regex
       engine.	With a large number of keys, this module will substantially
       outperform even an optimised regex like that.  In one real-world
       situation with 339 keys,	running	on Perl	5.8, Regexp::Trie produced a
       regex that ran 857% faster than the naive regex (according to
       Benchmark), but using Text::Match::FastAlternatives ran 18275% faster
       than the	naive regex, or	twenty times faster than Regexp::Trie's
       optimised regex.

       The enhancements	to the regex engine in Perl 5.10 include algorithms
       similar to those	in Text::Match::FastAlternatives.  However, even with
       very small sets of keys,	Perl has to do extra work to be	fully general,
       so Text::Match::FastAlternatives	is still faster.  The difference is
       greater for larger sets of keys.	 For one test with only	5 keys,
       Text::Match::FastAlternatives was 21% faster than perl-5.10.0; with 339
       keys (as	before), the difference	was 111% (that is, slightly over twice
       as fast).

	   Constructs a	matcher	that can efficiently search for	all of the
	   @keys in parallel.  Throws an exception if any of the keys are
	   undefined, or any of	them contain malformed UTF-8.

	   Returns a boolean value indicating whether the $target string
	   contains any	of the keys in $matcher.

       $matcher->match_at($target, $pos)
	   Returns a boolean value indicating whether the $target string
	   contains any	of the keys in $matcher	at position $pos.  Returns
	   false (without emitting any warning)	if $pos	is larger than the
	   length of $string.

	   Returns a boolean value indicating whether the $target string is
	   exactly equal to any	of the keys in $matcher.

       Text::Match::FastAlternatives has a "DESTROY" method implemented	in XS.
       If you write a subclass with its	own destructor,	you will need to
       invoke the base destructor, or you will leak memory.

   Interaction with Perl internals
       Text::Match::FastAlternatives may change	the Perl-internal encoding of
       strings passed to "new" or to its "match" methods.  This	is not
       considered a bug, as the	Perl-internal encoding of a string is not
       normally	of interest to Perl code (as opposed to	"perl" internals).
       However,	you may	encounter situations where preserving a	string's
       existing	encoding is important (perhaps to work around a	bug in some
       other module).  If so, you may need to copy scalar variables before
       matching	them:

	   $matches++ if $tmfa->match(my $temporary_copy = $original);

       Text::Match::FastAlternatives manages to	be so fast by using an Aho-
       Corasick	automaton internally.  The time	to search for a	match (or
       determine that there is no match) is independent	of the number of keys
       being sought; total worst-case search time is O(n) where	n is the
       length of the target string.

       The "match_at" and "exact_match"	methods	only need to find a match at
       one position, so	they have worst-case running time of O(min(n, m))
       where m is the length of	the longest key.

       Text::Match::FastAlternatives uses an adaptive technique	to minimise
       memory usage; in	the best case, each character in a key requires	only 4
       bytes of	storage	in the automaton.  This	low memory usage has the
       additional advantage of reducing	contention for CPU cache lines,
       further improving performance.

       Regexp::Trie, Regexp::Optimizer,	Regexp::Assemble, Unicode::Normalize,
       perl5100delta, perlunitut, perlunifaq.

       Aaron Crane <>

       Copyright 2006, 2007, 2008, 2010, 2012 Aaron Crane.

       This library is free software; you can redistribute it and/or modify it
       under the terms of the Artistic License,	or (at your option) under the
       terms of	the GNU	General	Public License version 2.

perl v5.32.1			  2012-12-28  Text::Match::FastAlternatives(3)


Want to link to this manual page? Use this URL:

home | help