       Perl::Critic::Policy::RegularExpressions::ProhibitComplexRegexes -
       Split long regexps into smaller "qr//" chunks.

       This Policy is part of the core Perl::Critic distribution.

       Big regexps are hard to read, perhaps even the hardest part of Perl.  A
       good practice is to write digestible chunks of regexp and put them
       together.  This policy flags any regexp that is longer than "N"
       characters, where "N" is a configurable value that defaults to 60.  If
       the regexp uses the "x" flag, then the length is computed after parsing
       out any comments or whitespace.
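
       As a minimal sketch of the "digestible chunks" approach, the following
       builds one pattern from several short "qr//" pieces.  The timestamp
       format and all variable names here are hypothetical and chosen only
       for illustration; they do not come from Perl::Critic itself.

```perl
use strict;
use warnings;

# Hypothetical example: match a timestamp such as "2011-03-14 15:09:26".
# Each named piece is a short qr// chunk that can be read on its own.
my $year   = qr/\d{4}/;
my $month  = qr/(?:0[1-9]|1[0-2])/;
my $day    = qr/(?:0[1-9]|[12]\d|3[01])/;
my $hour   = qr/(?:[01]\d|2[0-3])/;
my $minsec = qr/[0-5]\d/;

# Combine the small chunks into slightly larger ones.
my $date = qr/$year-$month-$day/;
my $time = qr/$hour:$minsec:$minsec/;

# Under the "x" flag a literal space has to be written as "[ ]".
my $timestamp_re = qr/\A $date [ ] $time \z/x;

for my $candidate ('2011-03-14 15:09:26', '2011-13-14 15:09:26') {
    printf "%-20s %s\n", $candidate,
        $candidate =~ $timestamp_re ? 'matches' : 'does not match';
}
```

       Each chunk stays well below the default limit, and the final pattern
       reads almost like a grammar rule.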

       Descriptive (and therefore longish) variable names could otherwise
       push a regexp over the limit, so each interpolated variable is counted
       as 4 characters no matter how long its name actually is.
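
       The counting rule can be approximated with a few substitutions.  The
       helper below, "approx_regexp_length", is a hypothetical and
       deliberately naive sketch: it is not the policy's implementation, and
       it only recognizes simple scalar interpolations.  It is meant to show
       why long variable names do not inflate the measured length.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT the policy's own code: estimate the length that
# would be measured for a regexp body passed in as a plain string.
sub approx_regexp_length {
    my ($pattern, $has_x_flag) = @_;

    if ($has_x_flag) {
        $pattern =~ s/(?<!\\)\#[^\n]*//g;    # drop unescaped comments
        $pattern =~ s/\s+//g;                # drop whitespace (naively)
    }

    # Count each simple interpolation ($name or ${name}) as 4 characters.
    $pattern =~ s/(?<!\\)\$\{?\w+\}?/XXXX/g;

    return length $pattern;
}

# Long and short variable names give the same result.
print approx_regexp_length('$local_part\@$domain', 0), "\n";    # prints 10
print approx_regexp_length('$l\@$d', 0), "\n";                  # prints 10

# With the "x" flag, comments and whitespace are not counted.
my $spaced = q{
    $year - $month   # year and month
    - $day           # day of month
};
print approx_regexp_length($spaced, 1), "\n";                   # prints 14
```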

       As an example, look at the regexp used to match email addresses in
       Email::Valid::Loose (tweaked lightly to wrap for POD)


       which is constructed from the following code:

	   my $esc	   = '\\\\';
	   my $period	   = '\.';
	   my $space	   = '\040';
	   my $open_br	   = '\[';
	   my $close_br	   = '\]';
	   my $nonASCII	   = '\x80-\xff';
	   my $ctrl	   = '\000-\037';
	   my $cr_list	   = '\n\015';
	   my $qtext	   = qq/[^$esc$nonASCII$cr_list\"]/; # "
	   my $dtext	   = qq/[^$esc$nonASCII$cr_list$open_br$close_br]/;
	   my $quoted_pair = qq<$esc>.qq<[^$nonASCII]>;
	   my $atom_char   = qq/[^($space)<>\@,;:\".$esc$open_br$close_br$ctrl$nonASCII]/;# "
	   my $atom	   = qq<$atom_char+(?!$atom_char)>;
	   my $quoted_str  = qq<\"$qtext*(?:$quoted_pair$qtext*)*\">; #	"
	   my $word	   = qq<(?:$atom|$quoted_str)>;
	   my $domain_ref  = $atom;
	   my $domain_lit  = qq<$open_br(?:$dtext|$quoted_pair)*$close_br>;
	   my $sub_domain  = qq<(?:$domain_ref|$domain_lit)>;
	   my $domain	   = qq<$sub_domain(?:$period$sub_domain)*>;
	   my $local_part  = qq<$word(?:$word|$period)*>; # This part is modified
	   $Addr_spec_re   = qr<$local_part\@$domain>;

       If you read the code from bottom to top, it is quite readable.  You
       can even see the one violation of RFC 822 that Tatsuhiko Miyagawa
       deliberately put into Email::Valid::Loose to allow periods.  Look for
       the "|\." in the upper regexp to see that same deviation.

       One could certainly argue that the top regexp could be rewritten more
       legibly with "m//x" and comments.  But the bottom version is
       self-documenting and, for example, doesn't repeat "\x80-\xff" 18 times.
       Furthermore, it's much easier to compare the second version against the
       source BNF grammar in RFC 822 to judge whether the implementation is
       sound even before running tests.

       This policy allows regexps up to "N" characters long, where "N"
       defaults to 60.  You can override this to set it to a different number
       with the "max_characters" setting.  To do this, put entries in a
       .perlcriticrc file like this:

	   [RegularExpressions::ProhibitComplexRegexes]
	   max_characters = 40

       Initial development of this policy was supported by a grant from the
       Perl Foundation.

       Chris Dolan

       Copyright (c) 2007-2011 Chris Dolan.  Many rights reserved.

       This program is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.  The full text of this license can
       be found in the LICENSE file included with this module.

perl v5.32.Perl::Critic::Policy::RegularExpressions::ProhibitComplexRegexes(3)

