MojoMojo::Declaw(3)   User Contributed Perl Documentation  MojoMojo::Declaw(3)

NAME
       MojoMojo::Declaw - Cleans HTML as well as CSS of scripting and other
       executable contents, and neutralises XSS attacks. Derived from
       HTML::Defang version 1.01.

SYNOPSIS
	 my $InputHtml = "<html><body></body></html>";

	 my $Defang = MojoMojo::Declaw->new(
	   context => $Self,
	   fix_mismatched_tags => 1,
           tags_to_callback => [ qw(br embed img) ],
	   tags_callback => \&DefangTagsCallback,
	   url_callback	=> \&DefangUrlCallback,
	   css_callback	=> \&DefangCssCallback,
	   attribs_to_callback => [ qw(border src) ],
	   attribs_callback => \&DefangAttribsCallback
	 );

	 my $SanitizedHtml = $Defang->defang($InputHtml);

         # Callback for custom handling of specific HTML tags
         sub DefangTagsCallback {
           my ($Self, $Defang, $OpenAngle, $lcTag, $IsEndTag, $AttributeHash, $CloseAngle, $HtmlR, $OutR) = @_;
           return 1 if $lcTag eq 'br';    # Explicitly defang this tag, even though it is safe
           return 0 if $lcTag eq 'embed'; # Explicitly whitelist this tag, even though it is unsafe
           return 2 if $lcTag eq 'img';   # Not sure what to do with this tag, so process it as the module normally would
	 }

         # Callback for custom handling of URLs in HTML attributes as well as style tag/attribute declarations
         sub DefangUrlCallback {
           my ($Self, $Defang, $lcTag, $lcAttrKey, $AttrValR, $AttributeHash, $HtmlR) = @_;
           return 0 if $$AttrValR =~ /safesite\.com/i; # Explicitly allow this URL in tag attributes or stylesheets
           return 1 if $$AttrValR =~ /evilsite\.com/i; # Explicitly defang this URL in tag attributes or stylesheets
           return 2;                                   # Otherwise process the URL as the module normally would
         }

         # Callback for custom handling of style tags/attributes
	 sub DefangCssCallback {
	   my ($Self, $Defang, $Selectors, $SelectorRules, $Tag, $IsAttr) = @_;
	   my $i = 0;
	   foreach (@$Selectors) {
	     my	$SelectorRule =	$$SelectorRules[$i];
	     foreach my	$KeyValueRules (@$SelectorRule)	{
	       foreach my $KeyValueRule	(@$KeyValueRules) {
		 my ($Key, $Value) = @$KeyValueRule;
                 $$KeyValueRule[2] = 1 if $Value =~ '!important';                  # Comment out any '!important' directive
                 $$KeyValueRule[2] = 1 if $Key =~ 'position' && $Value =~ 'fixed'; # Comment out any 'position: fixed' declaration
	       }
	     }
	     $i++;
	   }
	 }

         # Callback for custom handling of HTML tag attributes
	 sub DefangAttribsCallback {
	   my ($Self, $Defang, $lcTag, $lcAttrKey, $AttrValR, $HtmlR) =	@_;
	   $$AttrValR =	'0' if $lcAttrKey eq 'border';	# Change all 'border' attribute	values to zero.
	   return 1 if $lcAttrKey eq 'src';		# Defang all 'src' attributes
	   return 0;
	 }

DESCRIPTION
       This module accepts an input HTML and/or	CSS string and removes any
       executable code including scripting, embedded objects, applets, etc.,
       and neutralises any XSS attacks.	A whitelist based approach is used
       which means only	HTML known to be safe is allowed through.

       HTML::Defang uses a custom HTML tag parser. The parser has been
       designed and tested to work with nasty real-world HTML and to
       emulate as closely as possible what browsers actually do with
       strange-looking constructs. The test suite has been built from
       examples from a range of sources such as http://ha.ckers.org/xss.html
       and http://imfo.ru/csstest/css_hacks/import.php to ensure that as
       many XSS attack scenarios as possible have been dealt with.

       HTML::Defang can	make callbacks to client code when it encounters the
       following:

       o   When	a specified tag	is parsed

       o   When	a specified attribute is parsed

       o   When	a URL is parsed	as part	of an HTML attribute, or CSS property
	   value.

       o   When	style data is parsed, as part of an HTML style attribute, or
	   as part of an HTML <style> tag.

       The callbacks include details about the current tag/attribute that is
       being parsed, and also give a scalar reference to the input HTML.
       Querying pos() on the input HTML should indicate how far the module
       has got with parsing. This gives the client code flexibility in
       working with HTML::Declaw.
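
       The pos() and \G behaviour mentioned above is ordinary Perl and can be
       tried without the module. The following is a minimal sketch; the input
       string and both regexes are illustrative assumptions, not patterns the
       module itself uses.

```perl
use strict;
use warnings;

# Illustrative input; inside a callback this would be reached through $$HtmlR.
my $Html  = '<b>bold</b> trailing text';
my $HtmlR = \$Html;

# Consume the opening tag the way a \G-anchored parser would.
# /g records the match position; /c keeps it if a later match fails.
$$HtmlR =~ m/\G<(\w+)>/gc or die "no opening tag";
my $Tag      = $1;
my $AfterTag = pos($$HtmlR);    # offset just past "<b>"

# Client code can resume from that same position with another \G match.
my $Text = $$HtmlR =~ m/\G([^<]*)/gc ? $1 : '';

print "tag=$Tag offset=$AfterTag text=$Text\n";
```

       Because the position is stored on the scalar itself, a match made
       through the reference and a match made on the original variable share
       the same pos().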

       HTML::Declaw can defang whole tags, any attribute in a tag, any URL
       that appears as an attribute or style property, or any CSS declaration
       in a declaration block in a style rule. This lets one block precisely
       the most specific unwanted elements in the content (for example,
       block just an offending attribute instead of the whole tag), while
       retaining any safe HTML/CSS.

CONSTRUCTOR
       MojoMojo::Declaw->new(%Options)
           Constructs a new MojoMojo::Declaw object. The following options are
           supported:

	   Options
	       tags_to_callback
                   Array reference of tags for which a callback should be
                   made. If a tag in this array is parsed, the subroutine
                   tags_callback() is invoked.

	       attribs_to_callback
                   Array reference of tag attributes for which a callback
                   should be made. If an attribute in this array is parsed,
                   the subroutine attribs_callback() is invoked.

	       tags_callback
		   Subroutine reference	to be invoked when a tag listed	in
		   @$tags_to_callback is parsed.

	       attribs_callback
		   Subroutine reference	to be invoked when an attribute	listed
		   in @$attribs_to_callback is parsed.

	       url_callback
		   Subroutine reference	to be invoked when a URL is detected
		   in an HTML tag attribute or a CSS property.

	       css_callback
		   Subroutine reference	to be invoked when CSS data is found
		   either as the contents of a 'style' attribute in an HTML
		   tag,	or as the contents of a	<style>	HTML tag.

	       fix_mismatched_tags
                   This property, if set, fixes mismatched tags in the HTML
                   input. By default, tags present in the default
                   %mismatched_tags_to_fix hash are fixed. This set of tags
                   can be overridden by passing in an array reference
                   $mismatched_tags_to_fix to the constructor. Any opened tags
                   in the set are automatically closed if no corresponding
                   closing tag is found. If an unbalanced closing tag is
                   found, it is commented out.

	       mismatched_tags_to_fix
                   Array reference of tags that the code checks for matching
                   opening and closing tags. See the fix_mismatched_tags
                   option.

	       context
                   You can pass an arbitrary scalar as a 'context' value
                   that's then passed as the first parameter to all callback
                   functions. Most commonly this is something like '$Self'.

	       Debug
		   If set, prints debugging output.
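
           As a rough sketch of how these options fit together, the snippet
           below prepares an option list and constructs the object only if
           the module can be loaded. MyTagsCallback and the context value are
           hypothetical names chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical callback; returning 2 asks for default processing.
sub MyTagsCallback { return 2 }

my %Options = (
  context             => { site => 'example' },   # any scalar; handed to callbacks first
  fix_mismatched_tags => 1,
  tags_to_callback    => [ qw(img embed) ],
  tags_callback       => \&MyTagsCallback,
);

# Construct the object only when the module is actually installed.
my $Defang;
if ( eval { require MojoMojo::Declaw; 1 } ) {
  $Defang = MojoMojo::Declaw->new(%Options);
  print $Defang->defang('<p>hello</p>'), "\n";
}
else {
  print "MojoMojo::Declaw not installed; options prepared only\n";
}
```

           Keeping the options in a hash makes it easy to reuse one
           configuration for several objects.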

CALLBACK METHODS
       COMMON PARAMETERS
           A number of the callbacks share the same parameters. These common
           parameters are documented here. Certain variables may have specific
           meanings in certain callbacks, so be sure to check the
           documentation for that method first before referring to this
           section.

	   $context
               You can pass an arbitrary scalar as a 'context' value that's
               then passed as the first parameter to all callback functions.
               Most commonly this is something like '$Self'.

	   $Defang
	       Current HTML::Declaw instance

	   $OpenAngle
               Opening angle bracket (<) of the current tag.

	   $lcTag
	       Lower case version of the HTML tag that is currently being
	       parsed.

	   $IsEndTag
	       Has the value '/' if the	current	tag is a closing tag.

	   $AttributeHash
               A reference to a hash containing the attributes of the current
               tag and their values. Each value is a scalar reference to the
               value, rather than just a scalar value. You can add attributes
               (remember to make the value a scalar ref, e.g.
               $$AttributeHash{"newattr"} = \"newval"), delete attributes, or
               modify attribute values in this hash, and any changes you make
               will be incorporated into the output HTML stream.

               The attribute values will have any entity references decoded
               before being passed to you, and any unsafe values will be
               re-encoded back into the HTML stream.

               So for instance, the tag:

                 <div title="&lt;&quot;Hi there &#x003C;">

               Will have the attribute hash:

                 { title => \q[<"Hi there <] }

               And will be turned back into the following HTML on output:

                 <div title="&lt;&quot;Hi there &lt;">

	   $CloseAngle
               Anything after the end of the last attribute, including the
               closing HTML angle bracket (>).

	   $HtmlR
               A scalar reference to the input HTML. The input HTML is parsed
               using m/\G$SomeRegex/gc constructs, so clients can use
               m/\G$SomeRegex/gc to continue processing the input from where
               the module left off. One can also use the pos() function to
               determine where the module left off. This, combined with the
               add_to_output() method, should give reasonable flexibility for
               the client to process the input.

	   $OutR
	       A scalar	reference to the processed output HTML so far.
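
           The scalar-reference convention described for $AttributeHash can
           be sketched with a hand-built hash. The attribute names and values
           below are illustrative stand-ins for what a callback would
           receive.

```perl
use strict;
use warnings;

# Stand-in for the hash a callback receives: every value is a scalar reference.
my $AttributeHash = {
  title => \q[<"Hi there <],
  class => \'note',
};

# Read a value by dereferencing it.
my $Title = ${ $AttributeHash->{title} };

# Add an attribute: the new value must also be a scalar reference.
$AttributeHash->{rel} = \'nofollow';

# Modify a value by storing a reference to the new string.
$AttributeHash->{class} = \'note plain';

# Remove an attribute.
delete $AttributeHash->{title};

print join( ', ', map { "$_=" . ${ $AttributeHash->{$_} } } sort keys %$AttributeHash ), "\n";
# prints: class=note plain, rel=nofollow
```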

       tags_callback($context, $Defang,	$OpenAngle, $lcTag, $IsEndTag,
       $AttributeHash, $CloseAngle, $HtmlR, $OutR)
           If $Defang->{tags_callback} exists, and HTML::Declaw has parsed a
           tag present in $Defang->{tags_to_callback}, the above callback is
           made to the client code. The return value of this method determines
           whether the tag is defanged or not. More details below.

	   Return values
	       0   The current tag will	not be defanged.

	       1   The current tag will	be defanged.

               2   The current tag will be processed normally by the module,
                   as if no callback method had been specified.
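
           A callback using all three return values might look like the
           sketch below. The policy itself (defang <embed>, allow <img> only
           with an alt attribute) is a made-up example, and the sub is
           exercised directly with hand-built arguments rather than through
           the module.

```perl
use strict;
use warnings;

# Hypothetical policy, for illustration only.
sub TagsPolicy {
  my ($Context, $Defang, $OpenAngle, $lcTag, $IsEndTag,
      $AttributeHash, $CloseAngle, $HtmlR, $OutR) = @_;

  return 1 if $lcTag eq 'embed';                    # always defang
  if ($lcTag eq 'img') {
    return exists $AttributeHash->{alt} ? 0 : 1;    # allow only with alt
  }
  return 2;                                         # defer to default processing
}

# Direct calls with stand-in arguments.
my ($Html, $Out) = ('', '');
my @Results = (
  TagsPolicy(undef, undef, '<', 'embed', '', {},                '>', \$Html, \$Out),
  TagsPolicy(undef, undef, '<', 'img',   '', { alt => \'logo' }, '>', \$Html, \$Out),
  TagsPolicy(undef, undef, '<', 'img',   '', {},                '>', \$Html, \$Out),
  TagsPolicy(undef, undef, '<', 'p',     '', {},                '>', \$Html, \$Out),
);
print "@Results\n";   # prints: 1 0 1 2
```

           Ending with an explicit return 2 avoids returning an undefined
           value for tags the policy does not mention.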

       attribs_callback($context, $Defang, $lcTag, $lcAttrKey, $AttrVal,
       $HtmlR, $OutR)
	   If $Defang->{attribs_callback} exists, and HTML::Declaw has parsed
	   an attribute	present	in $Defang->{attribs_to_callback}, the above
	   callback is made to the client code.	The return value of this
	   method determines whether the attribute is defanged or not. More
	   details below.

	   Method parameters
	       $lcAttrKey
		   Lower case version of the HTML attribute that is currently
		   being parsed.

	       $AttrVal
		   Reference to	the HTML attribute value that is currently
		   being parsed.

		   See $AttributeHash for details of decoding.

	   Return values
	       0   The current attribute will not be defanged.

	       1   The current attribute will be defanged.

               2   The current attribute will be processed normally by the
                   module, as if no callback method had been specified.
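
           Since the attribute value arrives as a reference, a callback can
           both rewrite it and decide its fate. A minimal sketch, assuming a
           made-up policy for 'target' and 'width' attributes:

```perl
use strict;
use warnings;

# Hypothetical policy, for illustration only.
sub AttribsPolicy {
  my ($Context, $Defang, $lcTag, $lcAttrKey, $AttrValR, $HtmlR, $OutR) = @_;

  # Writing through the reference changes the value the caller sees.
  if ($lcAttrKey eq 'target') {
    $$AttrValR = '_blank';
    return 0;                       # keep the rewritten attribute
  }

  # Defang width values that are not a plain number or percentage.
  return 1 if $lcAttrKey eq 'width' && $$AttrValR !~ /^\d+%?$/;

  return 2;                         # defer to default processing
}

my ($Html, $Out) = ('', '');
my $Target = '_top';
my $Width  = 'expression(alert(1))';
my $Alt    = 'logo';

my $TargetResult = AttribsPolicy(undef, undef, 'a',   'target', \$Target, \$Html, \$Out);
my $WidthResult  = AttribsPolicy(undef, undef, 'img', 'width',  \$Width,  \$Html, \$Out);
my $AltResult    = AttribsPolicy(undef, undef, 'img', 'alt',    \$Alt,    \$Html, \$Out);

print "$TargetResult $WidthResult $AltResult target=$Target\n";
# prints: 0 1 2 target=_blank
```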

       url_callback($context, $Defang, $lcTag, $lcAttrKey, $AttrVal,
       $AttributeHash, $HtmlR, $OutR)
	   If $Defang->{url_callback} exists, and HTML::Declaw has parsed a
	   URL,	the above callback is made to the client code. The return
	   value of this method	determines whether the attribute containing
           the URL is defanged or not. URL callbacks can be made from <style>
           tags as well as style attributes, in which case the particular
           style declaration will be commented out. More details below.

	   Method parameters
	       $lcAttrKey
		   Lower case version of the HTML attribute that is currently
                   being parsed. However, if this callback is made as a result
                   of parsing a URL in a style attribute, $lcAttrKey will be
                   set to the string 'style', or will be set to undef if this
                   callback is made as a result of parsing a URL inside a
                   <style> tag.

	       $AttrVal
		   Reference to	the URL	value that is currently	being parsed.

	       $AttributeHash
		   A reference to a hash containing the	attributes of the
		   current tag and their values. Each value is a scalar
		   reference to	the value, rather than just a scalar value.
                   You can add attributes (remember to make the value a scalar
                   ref, e.g. $$AttributeHash{"newattr"} = \"newval"), delete
                   attributes, or modify attribute values in this hash, and
                   any changes you make will be incorporated into the output
                   HTML stream. Will be set to undef if the callback is made
                   due to a URL in a <style> tag or attribute.

	   Return values
	       0   The current URL will	not be defanged.

	       1   The current URL will	be defanged.

               2   The current URL will be processed normally by the module,
                   as if no callback method had been specified.
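
           A URL callback has to cope with $lcAttrKey being undef for URLs
           found inside a <style> tag. The sketch below assumes a hypothetical
           host allow-list (%AllowedHost) and a made-up policy; neither is
           part of the module.

```perl
use strict;
use warnings;

# Hypothetical allow-list of hosts.
my %AllowedHost = map { $_ => 1 } qw(example.org static.example.org);

sub UrlPolicy {
  my ($Context, $Defang, $lcTag, $lcAttrKey, $AttrValR,
      $AttributeHash, $HtmlR, $OutR) = @_;

  # undef means the URL was found inside a <style> tag.
  my $InStyleTag = !defined $lcAttrKey;

  # Defang script-like schemes wherever they appear.
  return 1 if $$AttrValR =~ /^\s*(?:javascript|vbscript):/i;

  # Allow http(s) URLs on known hosts.
  if ($$AttrValR =~ m{^https?://([^/:?#]+)}i) {
    return 0 if $AllowedHost{ lc $1 };
  }

  # Be strict inside <style> tags, otherwise defer to default processing.
  return $InStyleTag ? 1 : 2;
}

my ($Html, $Out) = ('', '');
my @Urls = (
  [ 'a',     'href', 'javascript:alert(1)'        ],
  [ 'img',   'src',  'https://example.org/a.png'  ],
  [ 'a',     'href', 'http://other.test/page'     ],
  [ 'style', undef,  'http://other.test/bg.png'   ],
);
my @Results = map {
  my ($Tag, $Key, $Url) = @$_;
  UrlPolicy(undef, undef, $Tag, $Key, \$Url, undef, \$Html, \$Out);
} @Urls;
print "@Results\n";   # prints: 1 0 2 1
```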

       css_callback($context, $Defang, $Selectors, $SelectorRules, $lcTag,
       $IsAttr,	$OutR)
           If $Defang->{css_callback} exists, and HTML::Declaw has parsed a
           <style> tag or style attribute, the above callback is made to the
           client code. The return value of this method determines whether a
           particular declaration in the style rules is defanged or not. More
           details below.

	   Method parameters
	       $Selectors
		   Reference to	an array containing the	selectors in a style
		   tag or attribute.

	       $SelectorRules
		   Reference to	an array containing the	style declaration
		   blocks of all selectors in a	style tag or attribute.
		   Consider the	below CSS:

		     a { b:c; d:e}
		     j { k:l; m:n}

		   The declaration blocks will get parsed into the following
		   data	structure:

		     [
		       [
			 [ "b",	"c", 2],
			 [ "d",	"e", 2]
		       ],
		       [
			 [ "k",	"l", 2],
			 [ "m",	"n", 2]
		       ]
		     ]

		   So, generally each property:value pair in a declaration is
		   parsed into an array	of the form

		     ["property", "value", X]

                   where X can be 0, 1 or 2, and 2 is the default value. A
                   client can manipulate this value to instruct HTML::Declaw
                   to defang this property:value pair.

		   0 - Do not defang

                   1 - Defang the property:value pair

		   2 - Process this as if there	is no callback specified

	       $IsAttr
		   True	if the currently processed item	is a style attribute.
		   False if the	currently processed item is a style tag.
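
           The flag in each [property, value, X] triple can be set as in the
           sketch below. Because the loop in the SYNOPSIS and the structure
           shown above differ by one nesting level, this example uses a
           hypothetical helper, EachDeclaration, that descends until it
           reaches a triple; check the depth against real callback data
           before relying on either form. The CSS policy is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: visit every [property, value, flag] triple,
# whatever the nesting depth of the enclosing arrays.
sub EachDeclaration {
  my ($Node, $Code) = @_;
  return unless ref($Node) eq 'ARRAY' && @$Node;
  if (!ref $Node->[0]) {            # first element is a string: a triple
    $Code->($Node);
    return;
  }
  EachDeclaration($_, $Code) for @$Node;
  return;
}

sub CssPolicy {
  my ($Context, $Defang, $Selectors, $SelectorRules, $lcTag, $IsAttr, $OutR) = @_;
  EachDeclaration($SelectorRules, sub {
    my ($Decl) = @_;
    my ($Property, $Value) = @$Decl;
    $Decl->[2] = 1 if $Property =~ /^behavior$/i;              # defang
    $Decl->[2] = 0 if $Property =~ /^color$/i
                   && $Value    =~ /^#[0-9a-f]{3,6}$/i;        # allow
  });
  return;
}

# Stand-in data for:  a { color:#fff; behavior:url(x.htc) }  j { margin:0 }
my $Selectors = [ 'a', 'j' ];
my $Rules = [
  [ [ 'color', '#fff', 2 ], [ 'behavior', 'url(x.htc)', 2 ] ],
  [ [ 'margin', '0', 2 ] ],
];
my $Out = '';
CssPolicy(undef, undef, $Selectors, $Rules, 'style', 0, \$Out);

print join( ' ', map { $_->[2] } map { @$_ } @$Rules ), "\n";   # prints: 0 1 2
```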

METHODS
       PUBLIC METHODS
	   defang($InputHtml)
               Cleans up $InputHtml of any executable code including
               scripting, embedded objects, applets, etc., and defangs any
               XSS attacks.

	       Method parameters
		   $InputHtml
		       The input HTML string that needs	to be sanitized.

	       Returns the cleaned HTML. If fix_mismatched_tags	is set,	any
	       tags that appear	in @$mismatched_tags_to_fix that are
	       unbalanced are automatically commented or closed.

	   add_to_output($String)
	       Appends $String to the output after the current parsed tag
	       ends. Can be used by client code	in callback methods to add
	       HTML text to the	processed output. If the HTML text needs to be
	       defanged, client	code can safely	call HTML::Declaw->defang()
	       recursively from	within the callback.

	       Method parameters
		   $String
		       The string that is added	after the current parsed tag
		       ends.
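
               A callback might call add_to_output() as sketched below. So
               that the example runs without the module, it uses StubDefang,
               a hypothetical stand-in that merely collects the strings it is
               given; the real method appends them after the current tag in
               the processed output.

```perl
use strict;
use warnings;

# Hypothetical stand-in, NOT the real class: it only records what is added.
package StubDefang;
sub new {
  my ($Class) = @_;
  return bless { out => '' }, $Class;
}
sub add_to_output {
  my ($Self, $String) = @_;
  $Self->{out} .= $String;
  return;
}

package main;

# Tags callback that queues a marker comment after each opening <img> tag.
sub NoteAfterImg {
  my ($Context, $Defang, $OpenAngle, $lcTag, $IsEndTag,
      $AttributeHash, $CloseAngle, $HtmlR, $OutR) = @_;
  $Defang->add_to_output('<!-- image checked -->')
    if $lcTag eq 'img' && !$IsEndTag;
  return 2;                         # leave the tag itself to default processing
}

my ($Html, $Out) = ('', '');
my $Stub   = StubDefang->new;
my $Result = NoteAfterImg(undef, $Stub, '<', 'img', '', {}, '>', \$Html, \$Out);
print "$Result $Stub->{out}\n";     # prints: 2 <!-- image checked -->
```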

       defang_and_add_to_output
           Defangs the given input and adds the result to the output.

       INTERNAL	METHODS
	   Generally these methods never need to be called by users of the
	   class, because they'll be called internally as the appropriate tags
	   are encountered, but	they may be useful for some users in some
	   cases.

	   defang_script($OutR,	$HtmlR,	$TagOps, $OpenAngle, $IsEndTag,	$Tag,
	   $TagTrail, $Attributes, $CloseAngle)
               This method is invoked when a <script> tag is parsed. Defangs
               the <script> opening tag, and any closing tag. Any scripting
               content is also commented out, so browsers don't display it.

	       Returns 1 to indicate that the <script> tag must	be defanged.

	       Method parameters
		   $OutR
		       A reference to the processed output HTML	before the tag
		       that is currently being parsed.

		   $HtmlR
		       A scalar	reference to the input HTML.

		   $TagOps
		       Indicates what operation	should be done on a tag. Can
		       be undefined, integer or	code reference.	Undefined
		       indicates an unknown tag	to HTML::Declaw, 1 indicates a
		       known safe tag, 0 indicates a known unsafe tag, and a
		       code reference indicates	a subroutine that should be
		       called to parse the current tag.	For example, <style>
		       and <script> tags are parsed by dedicated subroutines.

		   $OpenAngle
                       Opening angle bracket (<) of the current tag.

		   $IsEndTag
		       Has the value '/' if the	current	tag is a closing tag.

		   $Tag
		       The HTML	tag that is currently being parsed.

		   $TagTrail
		       Any space after the tag,	but before attributes.

		   $Attributes
                       A reference to an array of the attributes and their
                       values, including any surrounding spaces. Each element
                       of the array is added by 'push' calls like the one
                       below.

			 push @$Attributes, [ $AttributeName, $SpaceBeforeEquals, $EqualsAndSubsequentSpace, $QuoteChar, $AttributeValue, $QuoteChar, $SpaceAfterAtributeValue ];

		   $CloseAngle
                       Anything after the end of the last attribute, including
                       the closing HTML angle bracket (>).
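
               The seven-element layout of each $Attributes entry can be
               pictured with hand-built data. The sketch below assumes every
               element is a defined string, which may not hold for all real
               input (for example, attributes without a value).

```perl
use strict;
use warnings;

# Stand-in for the parsed form of:  src = "a.png" alt='x'
# Element order: name, space before '=', '=' and following space,
# opening quote, value, closing quote, space after the value.
my $Attributes = [
  [ 'src', ' ', '= ', '"', 'a.png', '"', ' ' ],
  [ 'alt', '',  '=',  "'", 'x',     "'", ''  ],
];

# Joining the elements in order gives back the attribute text.
my $Rebuilt = join '', map { join '', @$_ } @$Attributes;
print "$Rebuilt\n";   # prints: src = "a.png" alt='x'

# The value sits at index 4 of each entry.
my @Values = map { $_->[4] } @$Attributes;
```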

	   defang_style($OutR, $HtmlR, $TagOps,	$OpenAngle, $IsEndTag, $Tag,
	   $TagTrail, $Attributes, $CloseAngle,	$IsAttr)
	       Builds a	list of	selectors and declarations from	HTML style
	       tags as well as style attributes	in HTML	tags and calls
	       defang_stylerule() to do	the actual defanging.

	       Returns 0 to indicate that style	tags must not be defanged.

	       Method parameters
		   $IsAttr
		       Whether we are currently	parsing	a style	attribute or
		       style tag. $IsAttr will be true if we are currently
		       parsing a style attribute.

                   For a description of the other parameters, see the
                   documentation of the defang_script() method.

	   cleanup_style($StyleString)
	       Helper function to clean	up CSS data. This function directly
	       operates	on the input string without taking a copy.

	       Method parameters
		   $StyleString
		       The input style string that is cleaned.

	   defang_stylerule($SelectorsIn, $StyleRules, $lcTag, $IsAttr,
	   $HtmlR, $OutR)
	       Defangs style data.

	       Method parameters
		   $SelectorsIn
		       An array	reference to the selectors in the style
		       tag/attribute contents.

		   $StyleRules
		       An array	reference to the declaration blocks in the
		       style tag/attribute contents.

		   $lcTag
		       Lower case version of the HTML tag that is currently
		       being parsed.

		   $IsAttr
		       Whether we are currently	parsing	a style	attribute or
		       style tag. $IsAttr will be true if we are currently
		       parsing a style attribute.

		   $HtmlR
		       A scalar	reference to the input HTML.

		   $OutR
		       A scalar	reference to the processed output so far.

	   defang_attributes($OutR, $HtmlR, $TagOps, $OpenAngle, $IsEndTag,
	   $Tag, $TagTrail, $Attributes, $CloseAngle)
	       Defangs attributes, defangs tags, does tag, attrib, css and url
	       callbacks.

	       Method parameters
                   For a description of the method parameters, see the
                   documentation of the defang_script() method.

	   cleanup_attribute($AttributeString)
               Helper function to clean up attributes.

	       Method parameters
		   $AttributeString
		       The value of the	attribute.

   get_applicable_charset
       Get the charset from the	content	meta attribute?

SEE ALSO
       HTML::Defang, <http://mailtools.anomy.net/>,
       <http://htmlcleaner.sourceforge.net/>, HTML::StripScripts,
       HTML::Detoxifier, HTML::Sanitizer, HTML::Scrubber

AUTHOR
       Kurian Jose Aerthail <cpan@kurianja.fastmail.fm>. Thanks	to Rob Mueller
       <cpan@robm.fastmail.fm> for initial code, guidance and support and bug
       fixes.

COPYRIGHT AND LICENSE
       HTML::Declaw is a modified version of HTML::Defang, which has the
       following license:

       Copyright (C) 2003-2009 by The FastMail Partnership

       This library is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.

perl v5.24.1			  2013-05-12		   MojoMojo::Declaw(3)
