WordNet::Similarity::ICFinder(3)  User Contributed Perl Documentation  WordNet::Similarity::ICFinder(3)

NAME
       WordNet::Similarity::ICFinder - a module for finding the information
       content of concepts in WordNet

SYNOPSIS
	use WordNet::QueryData;
	my $wn = WordNet::QueryData->new;
	defined	$wn or die "Construction of WordNet::QueryData failed";

	use WordNet::Similarity::ICFinder;
	my $obj	= WordNet::Similarity::ICFinder->new ($wn);
	my ($err, $errString) =	$obj->getError ();
	$err and die $errString;

	my $wps1 = 'cat#n#1';
	my $wps2 = 'feline#n#1';

	my $offset1 = $wn -> offset ($wps1);
	my $offset2 = $wn -> offset ($wps2);

	# using	the wps	mode

	my $ic	 = $obj->IC ($wps1, 'n', 'wps');
	my $prob = $obj->probability ($wps1, 'n', 'wps');
	my $freq = $obj->getFrequency ($wps1, 'n', 'wps');
	print "$wps1 has frequency $freq, probability $prob, and IC $ic\n";

	# reuse the variables declared above instead of redeclaring them
	$ic   = $obj->IC ($wps2, 'n', 'wps');
	$prob = $obj->probability ($wps2, 'n', 'wps');
	$freq = $obj->getFrequency ($wps2, 'n', 'wps');
	print "$wps2 has frequency $freq, probability $prob, and IC $ic\n";

	my @lcsbyic = $obj->getLCSbyIC ($wps1, $wps2, 'n', 'wps');
	print "$wps1 and $wps2 have LCS $lcsbyic[0]->[0] with IC $lcsbyic[0]->[1]\n";

	# doing the same thing in the offset mode

	$ic   = $obj->IC ($offset1, 'n', 'offset');
	$prob = $obj->probability ($offset1, 'n', 'offset');
	$freq = $obj->getFrequency ($offset1, 'n', 'offset');
	print "$offset1 has frequency $freq, probability $prob, and IC $ic\n";

	$ic   = $obj->IC ($offset2, 'n', 'offset');
	$prob = $obj->probability ($offset2, 'n', 'offset');
	$freq = $obj->getFrequency ($offset2, 'n', 'offset');
	print "$offset2 has frequency $freq, probability $prob, and IC $ic\n";

	# offsets are passed here, so the mode must be 'offset'
	@lcsbyic = $obj->getLCSbyIC ($offset1, $offset2, 'n', 'offset');
	print "$offset1 and $offset2 have LCS $lcsbyic[0]->[0] with IC $lcsbyic[0]->[1]\n";

DESCRIPTION
   Introduction
       Three of the measures provided within the package (the Resnik,
       Jiang-Conrath and Lin measures) require information content values of
       concepts (WordNet synsets) to compute the semantic relatedness of
       concepts. Resnik (1995) describes a method for computing the
       information content of concepts from large corpora of text. That
       method requires the frequency of occurrence of every concept in a
       large corpus. We provide these frequency counts to the three measures
       in files that we call information content files. These files contain a
       list of WordNet synset offsets along with their part of speech and
       frequency count. The files are also used to determine the topmost
       nodes of the noun and verb 'is-a' hierarchies in WordNet.

       The information content file to be used is specified in the
       configuration file for the measure. If no information content file is
       specified, then the default information content file, generated when
       the WordNet::Similarity modules were installed, is used.

       A description of the format of these files follows. The FIRST LINE of
       the file must contain the hash-code of the WordNet version that the
       file was created with, as a string of the form

	 wnver::<hashcode>

       For example, if WordNet version 2.1 with the hash-code
       LL1BZMsWkr0YOuiewfbiL656+Q4 was used to create the information
       content file, the following line would be present at the start of the
       information content file.

	 wnver::LL1BZMsWkr0YOuiewfbiL656+Q4

       Each remaining line of the file contains a WordNet synset offset, a
       part of speech and a frequency count, optionally followed by a ROOT
       tag, in the form

	 <offset><part-of-speech> <frequency> [ROOT]

       without any leading or trailing spaces. For example, one of the lines
       of an information content file may be as follows.

	 63723n 667

       where '63723' is a noun synset offset and 667 is its frequency count.
       Suppose the noun synset with offset 1740 is the root node of one of the
       noun taxonomies and has a frequency count of 17625. Then this synset
       would appear in an information content file as follows:

	 1740n 17625 ROOT

       The ROOT tags determine the top of the hierarchies and must not be
       omitted. Typically, frequency counts for the noun and verb hierarchies
       are present in each information content file.  A number of support
       programs to generate these files from various corpora are present in
       the '/utils' directory of the package. A sample information content
       file has been provided in the '/samples' directory of the package.
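
       The file format described above can be read with a few lines of Perl.
       The following is a minimal sketch, not the module's own loader:
       parse_ic_text is a hypothetical helper, and the single-space field
       separator, the one-letter part of speech and the tolerance for blank
       lines are assumptions made for illustration.

```perl
use strict;
use warnings;

# parse_ic_text is a hypothetical helper, not part of WordNet::Similarity.
# It takes the full text of an information content file and returns a hash
# reference with the hash-code, the frequency counts, and the ROOT synsets.
sub parse_ic_text {
    my ($text) = @_;
    my @lines = split /\n/, $text;

    # The first line must be of the form wnver::<hashcode>.
    my $first = shift @lines;
    my ($hashcode) = defined($first) ? $first =~ /^wnver::(\S+)$/ : ();
    defined $hashcode
        or die "first line must be of the form wnver::<hashcode>\n";

    my %ic = (hashcode => $hashcode, freq => {}, roots => {});
    for my $line (@lines) {
        next if $line eq '';    # assumption: tolerate blank lines
        # <offset><part-of-speech> <frequency> [ROOT]
        my ($offset, $pos, $count, $root) =
            $line =~ /^(\d+)([a-z]) (\d+)(?: (ROOT))?$/
            or die "malformed line: '$line'\n";
        $ic{freq}{"$offset$pos"} = $count;
        $ic{roots}{"$offset$pos"} = 1 if defined $root;
    }
    return \%ic;
}

# Usage with the example lines from the text above.
my $parsed = parse_ic_text(join "\n",
    'wnver::LL1BZMsWkr0YOuiewfbiL656+Q4',
    '63723n 667',
    '1740n 17625 ROOT');
print "63723n has frequency $parsed->{freq}{'63723n'}\n";
print "root synsets: ", join(', ', sort keys %{ $parsed->{roots} }), "\n";
```

       Rejecting any line that does not match the pattern mirrors the
       requirement that lines carry no leading or trailing spaces.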

   Methods
       The following methods are provided by this module.

       Public Methods

       $module->traceOptions ()
	   Prints the status of configuration options specific to this module
	   to the trace string.  This module has only one such option:
	   infocontent.

       $module->probability ($synset, $pos, $mode)
	   Returns the probability of $synset in a corpus (using frequency
	   values from whatever information content file is being used).  If
	   $synset is a wps string, then $mode must be 'wps'; if $synset is an
	   offset, then $mode must be 'offset'.

       $module->IC ($synset, $pos, $mode)
	   Returns the information content of $synset.  If $synset is a wps
	   string, then $mode must be 'wps'; if $synset is an offset, then
	   $mode must be 'offset'.

       $module->getFrequency ($synset, $pos, $mode)
	   Returns the frequency of $synset in whatever information content
	   file is currently being used.

	   If $synset is a wps string, then $mode must be 'wps'; if $synset
	   is an offset, then $mode must be 'offset'.

	   Usually the "IC()" and "probability()" methods will be more useful
	   than this method.  This method is useful for determining whether
	   the frequency of a synset was 0.
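
	   How frequency, probability and information content relate can be
	   sketched as follows.  This is an illustration, not the module's
	   code.  It assumes that the probability is the synset's frequency
	   divided by the frequency of the ROOT synset, that information
	   content is the negative natural logarithm of that probability
	   (following Resnik), and that a zero frequency is reported as an
	   information content of 0.  The helper names and the synset
	   '99999n' are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical frequency table.  The first two counts are the examples from
# the file-format description; '99999n' is an invented zero-frequency synset.
my %freq = ('1740n' => 17625, '63723n' => 667, '99999n' => 0);
my $root = '1740n';    # assumption: the ROOT synset of this taxonomy

# Probability of a concept: its count relative to the root's count.
sub probability_of {
    my ($key) = @_;
    return $freq{$key} / $freq{$root};
}

# Information content: negative natural log of the probability.
# A zero probability has no finite IC; this sketch returns 0 in that case.
sub ic_of {
    my ($key) = @_;
    my $p = probability_of($key);
    return $p > 0 ? -log($p) : 0;
}

for my $key ('63723n', '99999n') {
    printf "%s: frequency %d, probability %.4f, IC %.4f\n",
        $key, $freq{$key}, probability_of($key), ic_of($key);
}
```

	   Under these assumptions a zero frequency and a ROOT synset both
	   yield an information content of 0, which is one reason to check
	   the raw frequency when the two cases must be told apart.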

       $module->getLCSbyIC ($synset1, $synset2, $pos, $mode)
	   Given two input synsets, finds their least common subsumer (LCS).
	   If there are multiple candidates for the LCS, the candidate with
	   the greatest information content is chosen.

	   Parameters: two synsets, a part of speech, and a mode.

	   Returns: a list of ($lcs, $ic) pairs, where $lcs is the LCS and
	   $ic is the information content of the LCS.  As in the SYNOPSIS,
	   each pair is accessed as an array reference, for example
	   $lcsbyic[0]->[0] and $lcsbyic[0]->[1].
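
	   Choosing among several candidate subsumers by information content
	   can be sketched as below.  This is a sketch under stated
	   assumptions, not the module's implementation: rank_by_ic is a
	   hypothetical helper, the candidate names and their values are made
	   up, and breaking ties by name is an arbitrary choice.

```perl
use strict;
use warnings;

# Hypothetical candidate subsumers with made-up information content values.
my %candidate_ic = (
    'entity#n#1'   => 0.0,
    'organism#n#1' => 2.5,
    'animal#n#1'   => 5.2,
);

# Pair each candidate with its IC and sort the pairs, greatest IC first.
# Ties are broken by name so that the order is deterministic.
sub rank_by_ic {
    my (%ic) = @_;
    return sort { $b->[1] <=> $a->[1] || $a->[0] cmp $b->[0] }
           map  { [ $_, $ic{$_} ] } keys %ic;
}

my @ranked = rank_by_ic(%candidate_ic);
print "LCS $ranked[0]->[0] with IC $ranked[0]->[1]\n";
```

	   Each element of @ranked is an array reference, so the first pair
	   is read with the same $list[0]->[0] and $list[0]->[1] pattern that
	   the SYNOPSIS uses.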

       $module->configure ()
	   Overrides the configure method of WordNet::Similarity to process
	   the information content file (also calls
	   WordNet::Similarity::configure() so that all the work done by that
	   method is still accomplished).

       Private Methods

       $module->_loadInfoContentFile ($file)
	   Subroutine to load frequency counts from an information content
	   file.

       $module->_isValidInfoContentFile ($filename)
	   Subroutine that checks the validity of an information content file.

AUTHORS
	 Ted Pedersen, University of Minnesota Duluth
	 tpederse at d.umn.edu

	 Jason Michelizzi, University of Minnesota Duluth
	 mich0212 at d.umn.edu

	 Siddharth Patwardhan, University of Utah, Salt Lake City
	 sidd at cs.utah.edu

BUGS
       None.

       To report a bug e-mail tpederse at d.umn.edu or go to
       http://groups.yahoo.com/group/wn-similarity/.

SEE ALSO
       WordNet::Similarity(3), WordNet::Similarity::res(3),
       WordNet::Similarity::lin(3), WordNet::Similarity::jcn(3)

COPYRIGHT
       Copyright (c) 2005, Ted Pedersen, Jason Michelizzi and Siddharth
       Patwardhan

       This program is free software; you can redistribute it and/or modify it
       under the terms of the GNU General Public License as published by the
       Free Software Foundation; either version 2 of the License, or (at your
       option) any later version.

       This program is distributed in the hope that it will be useful, but
       WITHOUT ANY WARRANTY; without even the implied warranty of
       MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
       General Public License for more details.

       You should have received a copy of the GNU General Public License along
       with this program; if not, write to

	   The Free Software Foundation, Inc.,
	   59 Temple Place - Suite 330,
	   Boston, MA  02111-1307, USA.

       Note: a copy of the GNU General Public License is available on the web
       at <http://www.gnu.org/licenses/gpl.txt> and is included in this
       distribution as GPL.txt.

perl v5.32.0			  2008-03-27  WordNet::Similarity::ICFinder(3)
