AI::DecisionTree(3)   User Contributed Perl Documentation  AI::DecisionTree(3)

NAME
       AI::DecisionTree	- Automatically	Learns Decision	Trees

VERSION
       version 0.11

SYNOPSIS
	 use AI::DecisionTree;
         my $dtree = AI::DecisionTree->new;

	 # A set of training data for deciding whether to play tennis
	 $dtree->add_instance
	   (attributes => {outlook     => 'sunny',
			   temperature => 'hot',
			   humidity    => 'high'},
	    result => 'no');

	 $dtree->add_instance
	   (attributes => {outlook     => 'overcast',
			   temperature => 'hot',
			   humidity    => 'normal'},
	    result => 'yes');

	 ... repeat for	several	more instances,	then:
	 $dtree->train;

	 # Find	results	for unseen instances
	 my $result = $dtree->get_result
	   (attributes => {outlook     => 'sunny',
			   temperature => 'hot',
			   humidity    => 'normal'});

DESCRIPTION
       The "AI::DecisionTree" module automatically creates so-called "decision
       trees" to explain a set of training data.  A decision tree is a kind of
       categorizer that	use a flowchart-like process for categorizing new
       instances.  For instance, a learned decision tree might look like the
       following, which	classifies for the concept "play tennis":

                          OUTLOOK
                          /  |  \
                         /   |   \
                        /    |    \
                  sunny/  overcast \rainy
                      /      |      \
                 HUMIDITY    |       WIND
                 /  \      *yes*     /  \
                /    \              /    \
           high/      \normal      /      \
              /        \    strong/        \weak
           *no*       *yes*      /          \
                               *no*        *yes*

       (This example, and the inspiration for the "AI::DecisionTree" module,
       come directly from Tom Mitchell's excellent book	"Machine Learning",
       available from McGraw Hill.)

       A decision tree like this one can be learned from training data,	and
       then applied to previously unseen data to obtain	results	that are
       consistent with the training data.

       The usual goal of a decision tree is to somehow encapsulate the
       training	data in	the smallest possible tree.  This is motivated by an
       "Occam's	Razor" philosophy, in which the	simplest possible explanation
       for a set of phenomena should be	preferred over other explanations.
       Also, small trees will make decisions faster than large trees, and they
       are much	easier for a human to look at and understand.  One of the
       biggest reasons for using a decision tree instead of many other machine
       learning	techniques is that a decision tree is a	much more scrutable
       decision	maker than, say, a neural network.

       The current implementation of this module uses an extremely simple
       method for creating the decision	tree based on the training instances.
       It uses an Information Gain metric (based on expected reduction in
       entropy)	to select the "most informative" attribute at each node	in the
       tree.  This is essentially the ID3 algorithm, developed by J. R.
       Quinlan in 1986.	 The idea is that the attribute	with the highest
       Information Gain	will (probably)	be the best attribute to split the
       tree on at each point if	we're interested in making small trees.
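
       To make the split criterion concrete, here is a small, self-contained
       Perl sketch of the entropy and Information Gain calculations that an
       ID3-style learner uses; the subroutines are illustrative only and are
       not part of this module's API:

         use List::Util qw(sum);

         # Entropy of a list of result labels: -sum p_i * log2(p_i)
         sub entropy {
             my %count;
             $count{$_}++ for @_;
             my $total = @_;
             return -sum(map { my $p = $_ / $total; $p * log($p) / log(2) }
                         values %count);
         }

         # Information Gain of splitting @instances on $attr, where each
         # instance is of the form { attributes => {...}, result => '...' }
         sub information_gain {
             my ($attr, @instances) = @_;
             my %subset;
             push @{ $subset{ $_->{attributes}{$attr} } }, $_->{result}
                 for @instances;
             my $gain = entropy(map { $_->{result} } @instances);
             $gain -= @{ $subset{$_} } / @instances * entropy(@{ $subset{$_} })
                 for keys %subset;
             return $gain;
         }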

METHODS
   Building and	Querying the Tree
       new(...parameters...)
           Creates a new decision tree object and returns it.  Accepts the
           following parameters (a combined example follows this list):

	   noise_mode
	       Controls	the behavior of	the "train()" method when "noisy" data
	       is encountered.	Here "noisy" means that	two or more training
	       instances contradict each other,	such that they have identical
	       attributes but different	results.

	       If "noise_mode" is set to "fatal" (the default),	the "train()"
	       method will throw an exception (die).  If "noise_mode" is set
	       to "pick_best", the most	frequent result	at each	noisy node
	       will be selected.

	   prune
               A boolean specifying whether the tree should be pruned after
               training.  This is usually a good idea, so the default is to
               prune.  Currently we prune using a simple
               minimum-description-length criterion.

	   verbose
	       If set to a true	value, some status information will be output
	       while training a	decision tree.	Default	is false.

	   purge
	       If set to a true	value, the "do_purge()"	method will be invoked
	       during "train()".  The default is true.

	   max_depth
	       Controls	the maximum depth of the tree that will	be created
	       during "train()".  The default is 0, which means	that trees of
	       unlimited depth can be constructed.
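
           A minimal sketch combining these options (the values shown here
           are purely illustrative):

             my $dtree = AI::DecisionTree->new
               (noise_mode => 'pick_best', # tolerate contradictory instances
                prune      => 1,           # prune with the MDL criterion
                verbose    => 0,           # no status output during training
                purge      => 1,           # drop training data after train()
                max_depth  => 3);          # never grow deeper than 3 levels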

       add_instance(attributes => \%hash, result => $string, name => $string)
	   Adds	a training instance to the set of instances which will be used
	   to form the tree.  An "attributes" parameter	specifies a hash of
	   attribute-value pairs for the instance, and a "result" parameter
	   specifies the result.

	   An optional "name" parameter	lets you give a	unique name to each
	   training instance.  This can	be used	in coordination	with the
	   "set_results()" method below.

       train()
	   Builds the decision tree from the list of training instances.  If a
	   numeric "max_depth" parameter is supplied, the maximum tree depth
	   can be controlled (see also the "new()" method).

       get_result(attributes =>	\%hash)
	   Returns the most likely result (from	the set	of all results given
	   to "add_instance()")	for the	set of attribute values	given.	An
	   "attributes"	parameter specifies a hash of attribute-value pairs
	   for the instance.  If the decision tree doesn't have	enough
	   information to find a result, it will return	"undef".
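
           Because "get_result()" may return "undef", callers should check
           for that case; a short sketch:

             my $result = $dtree->get_result
               (attributes => {outlook => 'rain', wind => 'weak'});
             if (defined $result) {
                 print "Predicted result: $result\n";
             } else {
                 print "Not enough information to classify this instance\n";
             }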

       do_purge()
	   Purges training instances and their associated information from the
	   DecisionTree	object.	 This can save memory after training, and
	   since the training instances	are implemented	as C structs, this
	   turns the DecisionTree object into a	pure-perl data structure that
	   can be more easily saved with "Storable.pm",	for instance.

       purge()
	   Returns true	or false depending on the value	of the tree's "purge"
	   property.  An optional boolean argument sets	the property.
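
           For example, a purged tree can be saved to disk and restored with
           Storable; a minimal sketch (the file name is arbitrary):

             use Storable qw(nstore retrieve);

             # With purge => 1 (the default), do_purge() runs inside train();
             # otherwise, purge by hand before serializing.
             $dtree->train;
             $dtree->do_purge unless $dtree->purge;

             nstore($dtree, 'tennis.dtree');        # now a pure-perl structure
             my $restored = retrieve('tennis.dtree');
             print $restored->get_result(attributes => {outlook => 'overcast'});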

       copy_instances(from => $other_tree)
	   Allows two trees to share the same set of training instances.  More
	   commonly, this lets you train one tree, then	re-use its instances
	   in another tree (possibly changing the instance "result" values
	   using "set_results()"), which is much faster	than re-populating the
	   second tree's instances from	scratch.

       set_results(\%results)
	   Given a hash	that relates instance names to instance	result values,
	   change the result values as specified.
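
           A minimal sketch of that workflow (the instance name and the
           alternate result value are made up for illustration):

             # purge => 0 keeps the instances around after train()
             my $dtree = AI::DecisionTree->new(purge => 0);
             $dtree->add_instance
               (name       => 'day1',
                attributes => {outlook => 'sunny', humidity => 'high'},
                result     => 'no');
             # ... add further named instances, then:
             $dtree->train;

             # The second tree shares the same instances, with one result changed
             my $dtree2 = AI::DecisionTree->new;
             $dtree2->copy_instances(from => $dtree);
             $dtree2->set_results({ day1 => 'yes' });
             $dtree2->train;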

   Tree	Introspection
       instances()
	   Returns a reference to an array of the training instances used to
	   build this tree.

       nodes()
	   Returns the number of nodes in the trained decision tree.

       depth()
	   Returns the depth of	the tree.  This	is the maximum number of
	   decisions that would	need to	be made	to classify an unseen
	   instance, i.e. the length of	the longest path from the tree's root
	   to a	leaf.  A tree with a single node would have a depth of zero.
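
           For example, after training, the size of the tree can be reported
           like this:

             printf "Trained tree: %d nodes, depth %d\n",
                 $dtree->nodes, $dtree->depth;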

       rule_tree()
	   Returns a data structure representing the decision tree.  For
	   instance, for the tree diagram above, the following data structure
	   is returned:

	    [ 'outlook', {
		'rain' => [ 'wind', {
		    'strong' =>	'no',
		    'weak' => 'yes',
		} ],
		'sunny'	=> [ 'humidity', {
		    'normal' =>	'yes',
		    'high' => 'no',
		} ],
		'overcast' => 'yes',
	    } ]

           This is slightly reminiscent of how XML::Parser returns the
           parsed XML tree.

	   Note	that while the ordering	in the hashes is unpredictable,	the
	   nesting is in the order in which the	criteria will be checked at
	   decision-making time.
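
           A sketch of walking this structure by hand to classify an
           instance ("classify_by_rules" is a hypothetical helper, not part
           of the module):

             sub classify_by_rules {
                 my ($node, $attrs) = @_;
                 return $node unless ref $node;   # leaf: a plain result string
                 my ($attr, $branches) = @$node;  # [ attribute, { value => subtree } ]
                 my $value = $attrs->{$attr};
                 return undef unless defined $value and exists $branches->{$value};
                 return classify_by_rules($branches->{$value}, $attrs);
             }

             # Prints 'no' for the example tree above
             print classify_by_rules($dtree->rule_tree,
                                     {outlook => 'sunny', humidity => 'high'});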

       rule_statements()
	   Returns a list of strings that describe the tree in rule-form.  For
	   instance, for the tree diagram above, the following list would be
	   returned (though not	necessarily in this order - the	order is
	   unpredictable):

	     if	outlook='rain' and wind='strong' -> 'no'
	     if	outlook='rain' and wind='weak' -> 'yes'
	     if	outlook='sunny'	and humidity='normal' -> 'yes'
	     if	outlook='sunny'	and humidity='high' -> 'no'
	     if	outlook='overcast' -> 'yes'

	   This	can be helpful for scrutinizing	the structure of a tree.

	   Note	that while the order of	the rules is unpredictable, the	order
	   of criteria within each rule	reflects the order in which the
	   criteria will be checked at decision-making time.
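
           For example, the rules can simply be printed after training:

             print "$_\n" for $dtree->rule_statements;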

       as_graphviz()
	   Returns a "GraphViz"	object representing the	tree.  Requires	that
	   the GraphViz	module is already installed, of	course.	 The object
	   returned will allow you to create PNGs, GIFs, image maps, or
	   whatever graphical representation of	your tree you might want.

	   A "leaf_colors" argument can	specify	a fill color for each leaf
	   node	in the tree.  The keys of the hash should be the same as the
	   strings appearing as	the "result" parameters	given to
	   "add_instance()", and the values should be any GraphViz-style color
	   specification.

	   Any additional arguments given to "as_graphviz()" will be passed on
	   to GraphViz's "new()" method.  See the GraphViz docs	for more info.
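
           A sketch of rendering the tree to a PNG file (the colors and the
           file name are arbitrary):

             my $graph = $dtree->as_graphviz
               (leaf_colors => {yes => 'palegreen', no => 'lightpink'});

             open my $fh, '>', 'tennis.png' or die "Can't write tennis.png: $!";
             binmode $fh;
             print $fh $graph->as_png;   # as_png() returns the raw image data
             close $fh;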

LIMITATIONS
       A few limitations exist in the current version.	All of them could be
       removed in future versions - especially with your help. =)

       No continuous attributes
	   In the current implementation, only discrete-valued attributes are
	   supported.  This means that an attribute like "temperature" can
	   have	values like "cool", "medium", and "hot", but using actual
	   temperatures	like 87	or 62.3	is not going to	work.  This is because
	   the values would split the data too finely -	the tree-building
	   process would probably think	that it	could make all its decisions
	   based on the	exact temperature value	alone, ignoring	all other
	   attributes, because each temperature	would have only	been seen once
	   in the training data.

	   The usual way to deal with this problem is for the tree-building
	   process to figure out how to	place the continuous attribute values
	   into	a set of bins (like "cool", "medium", and "hot") and then
	   build the tree based	on these bin values.  Future versions of
	   "AI::DecisionTree" may provide support for this.  For now, you have
	   to do it yourself.
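
           Until then, a minimal sketch of binning a numeric temperature by
           hand before passing it to "add_instance()" (the cutoff values are
           arbitrary):

             sub temperature_bin {
                 my $degrees = shift;
                 return 'cool'   if $degrees < 65;
                 return 'medium' if $degrees < 80;
                 return 'hot';
             }

             $dtree->add_instance
               (attributes => {outlook     => 'sunny',
                               temperature => temperature_bin(87),  # 'hot'
                               humidity    => 'high'},
                result => 'no');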

TO DO
       All the stuff in	the LIMITATIONS	section.  Also,	revisit	the pruning
       algorithm to see	how it can be improved.

AUTHOR
       Ken Williams, ken@mathforum.org

SEE ALSO
       Mitchell, Tom (1997).  Machine Learning.  McGraw-Hill.  pp. 52-80.

       Quinlan, J. R. (1986).  Induction of decision trees.  Machine
       Learning, 1(1), pp. 81-106.

       perl, GraphViz

perl v5.24.1			  2012-03-03		   AI::DecisionTree(3)
