WordNet::Similarity(3)  User Contributed Perl Documentation  WordNet::Similarity(3)

NAME
       WordNet::Similarity - Perl modules for computing measures of semantic
       relatedness

SYNOPSIS
   Basic Usage Example
	 use WordNet::QueryData;

	 use WordNet::Similarity::path;

	 my $wn	= WordNet::QueryData->new;

	 my $measure = WordNet::Similarity::path->new ($wn);

	 my $value = $measure->getRelatedness("car#n#1", "bus#n#2");

	 my ($error, $errorString) = $measure->getError();

	 die $errorString if $error;

	 print "car (sense 1) <-> bus (sense 2)	= $value\n";

   Using a configuration file to initialize the	measure
	 use WordNet::Similarity::path;

	 my $sim = WordNet::Similarity::path->new($wn, "mypath.cfg");

	 my $value = $sim->getRelatedness("dog#n#1", "cat#n#1");

	 ($error, $errorString)	= $sim->getError();

	 die $errorString if $error;

	 print "dog (sense 1) <-> cat (sense 1)	= $value\n";

   Printing traces
	 print "Trace String ->	".($sim->getTraceString())."\n";

DESCRIPTION
       We observe that humans find it extremely easy to say if two words are
       related and if one word is more related to a given word than another.
       For example, if we come across two words, 'car' and 'bicycle', we know
       they are	related	as both	are means of transport.	Also, we easily
       observe that 'bicycle' is more related to 'car' than 'fork' is. But is
       there some way to assign	a quantitative value to	this relatedness? Some
       ideas have been put forth by researchers	to quantify the	concept	of
       relatedness of words, with encouraging results.

       Eight of	these different	measures of relatedness	have been implemented
       in this software	package. A simple edge counting	measure	and a random
       measure have also been provided.	These measures rely heavily on the
       vast store of knowledge available in the	online electronic dictionary
       -- WordNet. So, we use a	Perl interface for WordNet called
       WordNet::QueryData to make it easier for	us to access WordNet. The
       modules in this package REQUIRE that the	WordNet::QueryData module be
       installed on the	system before these modules are	installed.

       The following function is defined:

       addConfigOption ($name, $required, $type, $default_val)
	   Adds	the configuration option, $name, to the	list of	known config
	   options (cf.	configure()).  If $required is true, then the option
	   requires a value; otherwise,	the value is optional, and the default
	   value $default_val is used if a value is not	specified in the
	   config file.	 $type is the type of value the	option takes.  It can
	   be 'i' for integer, 'f' for floating-point, 's' for string, or 'p'
	   for a file name.

	   returns: nothing, but will "die" on error.  You can put the call to
	   this	function in an "eval" block to trap the	exception (N.B., the
	   "eval BLOCK"	form of	"eval" does not	significantly degrade
	   performance,	unlike the "eval EXPR" form of "eval".	See "perldoc
	   -f eval").
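       The trapping pattern mentioned above can be sketched as follows. This
       is a minimal, self-contained illustration: add_option and
       %known_options are hypothetical stand-ins, not part of this package,
       and the stand-in only checks the type code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a function that dies on bad input. It accepts
# only the type codes 'i', 'f', 's', and 'p'.
my %known_options;

sub add_option {
    my ($name, $required, $type, $default_val) = @_;
    die "Unknown type '$type' for option '$name'\n"
        unless $type =~ /^[ifsp]$/;
    $known_options{$name} = [$required, $type, $default_val];
    return;
}

# eval BLOCK returns undef and sets $@ if the block dies; the trailing 1
# makes the block return true on success.
my $ok = eval { add_option('myOption', 0, 'x', 1); 1 };
print "trapped: $@" unless $ok;
```

       Testing the return value of the eval block, rather than $@ alone, is a
       common way to tell success from failure.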

       The following methods are defined in this package:

       Public methods

       $obj->new ($wn, $config_file)
	   The constructor for WordNet::Similarity::* objects.

	   Parameters: $wn is a	WordNet::QueryData object, $config_file	is a
	   configuration file (optional).

	   Return value: the new blessed object

       $obj->initialize	($config_file)
	   Performs some initialization	on the module.

	   Parameter: the location of a	configuration file

	   Returns: nothing

       $obj->configure ($config_file)
	   Parses a configuration file.

	   If you write	a module and want to add a new configuration option,
	   you can use the addConfigOption function to specify the name	and
	   nature of the option.

	   The value of the option is placed in "self": $self->{optionname}.

	   parameter: a	file name

	   returns: true if parsing of config file was successful, false on
	   error

       $obj->getTraceString ()
	   Returns the current trace string and resets the trace string to
	   empty.  If tracing is turned	off, then an empty string will always
	   be returned.

       $obj->getError ()
	   Checks to see if any errors have occurred.  Returns a list of the
	   form ($level, $string).  If $level is 0, then no errors have
	   occurred; if	$level is non-zero, then an error has occurred.	 A
	   value of 1 is considered a warning, and a value of 2	is considered
	   an error.  If $level	is non-zero, then $string will have a
	   (hopefully) meaningful error	message.

       $obj->traceOptions ()
	   Prints module-specific options to the trace string.  Any module
	   that	adds configuration options via addConfigOption should override
	   this	method.

	   Options should be printed out using the following format:

	     $self->{traceString} .= "option_name :: $option_value\n"

	   Note	that the option	name is	separated from its current value by a
	   space, two colons, and another space.  The string should be
	   terminated by a newline.

	   Since multiple modules may be overriding this method, any module
	   that overrides this method should ensure that the superclass's
	   method gets called as well.  You do this by putting this line at
	   the end of your method:

	     $self->SUPER::traceOptions();

	   returns: nothing
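       The override-and-chain pattern described above can be sketched with
       two small classes. My::Base and My::Measure are hypothetical stand-ins
       for a parent class and a derived measure; the option names and values
       are made up for illustration.

```perl
use strict;
use warnings;

package My::Base;        # hypothetical stand-in for the parent class
sub new { my ($class) = @_; return bless { traceString => '' }, $class }
sub traceOptions {
    my ($self) = @_;
    $self->{traceString} .= "trace :: 1\n";
    return;
}

package My::Measure;     # hypothetical derived measure
our @ISA = ('My::Base');
sub traceOptions {
    my ($self) = @_;
    # Print this module's own option in "name :: value" form.
    $self->{traceString} .= "myOption :: 42\n";
    # Chain to the parent so that its options are traced as well.
    $self->SUPER::traceOptions();
    return;
}

package main;
my $m = My::Measure->new;
$m->traceOptions();
print $m->{traceString};
```

       Because each override calls its parent last, the options of every
       class in the inheritance chain end up in the trace string.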

       $obj->parseWps($synset1,	$synset2)
	   parameters: synset1,	synset2

	   returns: a reference	to an array [$word1, $pos1, $sense1, $offset1,
	   $word2, $pos2, $sense2, $offset2] or	undef

	   This	method checks the format of the	two input synsets by calling
	   validateSynset() for	each synset.

	   If the synsets are in wps format, a reference to an array will be
	   returned.  This array has the form [$word1, $pos1, $sense1,
	   $offset1, $word2, $pos2, $sense2, $offset2] where $word1 is the
	   word part of $wps1, $pos1 is the part of speech of $wps1, $sense1
	   is the sense from $wps1, and $offset1 is the offset for $wps1.
	   The remaining four elements hold the same parts of $wps2.

	   If an error occurs (such as a synset being poorly formed), then
	   undef is returned, the error level is set to non-zero, and an error
	   message is appended to the error string.

       $obj->validateSynset ($synset)
	   parameter: synset

	   returns: a list or undef on error

	   synset is a string in word#pos#sense	format

	   This	method does the	following:

	   1.  Verifies	that the synset	is well-formed (i.e., that it consists
	       of three	parts separated	by #s, the pos is one of {n, v,	a, r}
	       and that	sense is a natural number).  A synset that matches the
	       pattern '[^\#]+\#[nvar]\#\d+' is	considered well-formed.

	   2.  Checks if the synset exists by trying to	find the offset	for
	       the synset

	   If any of these tests fails, then the error level is set to
	   non-zero, a message is appended to the error string, and undef is
	   returned.

	   If the synset is well-formed	and exists, then a list	is returned
	   that	has the	format ($word, $pos, $sense, $offset).
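       The format check in step 1 can be sketched on its own. The helper
       split_wps below is hypothetical; it anchors the pattern quoted above
       and adds capture groups, and it does not perform the WordNet lookup of
       step 2.

```perl
use strict;
use warnings;

# Format check only: looking up the offset in WordNet is not shown.
sub split_wps {
    my ($synset) = @_;
    return unless defined $synset;
    # Anchored form of the well-formedness pattern, with capture groups.
    if ($synset =~ /^([^\#]+)\#([nvar])\#(\d+)$/) {
        return ($1, $2, $3);
    }
    return;    # empty list for a poorly formed string
}

for my $s ('car#n#1', 'bus#n', 'run#x#2') {
    my @parts = split_wps($s);
    my $desc = @parts
        ? "word=$parts[0] pos=$parts[1] sense=$parts[2]"
        : "poorly formed";
    print "$s: $desc\n";
}
```

       Only the first of the three strings passes: the second lacks a sense
       number and the third uses a part of speech outside {n, v, a, r}.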

       $obj->getRelatedness($synset1, $synset2)
	   parameters: synset1,	synset2

	   returns: a relatedness score

	   This	is a virtual method. It	must be	overridden by a	module that is
	   derived from	this class. This method	takes two synsets and returns
	   a numeric value as their score of relatedness.

       $obj->printSet ($pos, $mode, @synsets)
	   If tracing is turned	on, prints the contents	of @synsets to the
	   trace string.  The contents of @synsets can be either wps strings
	   or offsets.	If they	are wps	strings, then $mode must be the	string
	   'wps'; if they are offsets, then the	mode must be 'offset'.	Please
	   don't try to	mix wps	and offsets.

	   Returns the string that was appended	to the trace string.

       $obj->fetchFromCache($wps1, $wps2, $non_symmetric)
	   Looks for the relatedness value of ($wps1, $wps2) in	the cache.  If
	   $non_symmetric is false (or isn't specified), then the cache	is
	   searched for	($wps2,	$wps1) if ($wps1, $wps2) isn't found.

	   Returns: a relatedness value	or undef if none found in the cache.

       $obj->storeToCache ($wps1, $wps2, $score)
	   Stores the relatedness value, $score, of ($wps1, $wps2) to the
	   cache.

	   Returns: nothing
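       The symmetric lookup described above can be sketched with a plain
       hash. The names store_score, fetch_score, and %cache are hypothetical,
       and this sketch leaves out any limit on the cache size.

```perl
use strict;
use warnings;

my %cache;    # hypothetical in-memory store keyed by "wps1::wps2"

sub store_score {
    my ($wps1, $wps2, $score) = @_;
    $cache{"${wps1}::${wps2}"} = $score;
    return;
}

sub fetch_score {
    my ($wps1, $wps2, $non_symmetric) = @_;
    my $hit = $cache{"${wps1}::${wps2}"};
    return $hit if defined $hit;
    # For a symmetric measure the reversed pair is an equally valid hit.
    return $non_symmetric ? undef : $cache{"${wps2}::${wps1}"};
}

store_score('car#n#1', 'bus#n#2', 0.25);
my $sym  = fetch_score('bus#n#2', 'car#n#1');
my $asym = fetch_score('bus#n#2', 'car#n#1', 1);
print "symmetric lookup: ", (defined $sym ? $sym : 'undef'), "\n";
print "non-symmetric lookup: ", (defined $asym ? $asym : 'undef'), "\n";
```

       With the pair stored in one order only, the symmetric lookup finds the
       score under the reversed key, while the non-symmetric lookup does not.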

       This package consists of Perl modules along with supporting Perl
       programs that implement the semantic relatedness measures described by
       Leacock and Chodorow (1998), Jiang and Conrath (1997), Resnik (1995),
       Lin (1998), Wu and Palmer (1993), and Hirst and St-Onge (1998), the
       Extended Gloss Overlaps measure by Banerjee and Pedersen (2002), and a
       Gloss Vector measure introduced by Patwardhan and Pedersen. The package
       contains Perl modules designed as object classes with methods that take
       two word senses as input and return the semantic relatedness between
       them. A quantitative measure of the degree to which two word senses
       are related has applications in many areas, such as word sense
       disambiguation and information retrieval. For example, to determine
       which sense of a given word is being used in a particular context, we
       can assume that the sense most related to the senses of the surrounding
       words is the one most likely intended. Similarly, in information
       retrieval, retrieving documents that contain highly related concepts is
       more likely to yield higher precision and recall.

       A command line interface to these modules is also included in the
       package. It returns the relatedness of two given words. A number of
       switches and options are provided to modify the output and to add
       trace information and other useful output. Support programs for
       generating information content files from various corpora are also
       available in the package. These information content files are required
       by three of the measures for computing the relatedness of concepts.
       There is also a tool to find the depths of the taxonomies in WordNet.

       Configuration files

       The behavior of the measures of semantic	relatedness can	be controlled
       by using	configuration files. These configuration files specify how
       certain parameters are initialized within the object. A configuration
       file may	be specified as	a parameter during the creation	of an object
       using the new method. The configuration files must follow a fixed
       format.

       Every configuration file	starts with the	name of	the module ON THE
       FIRST LINE of the file. For example, a configuration file for the res
       module will have	on the first line 'WordNet::Similarity::res'. This is
       followed	by the various parameters, each	on a new line and having the
       form 'name::value'. The 'value' of a parameter is optional (in case of
       boolean parameters). In case 'value' is omitted,	we would have just
       'name::'	on that	line. Comments are supported in	the configuration
       file. Anything following	a '#' is ignored in the	configuration file.
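
       As an illustration of the format described above, a configuration
       file for the res module might look like the following. The values are
       examples only, chosen from the general options described in this
       section; options specific to the res module are not shown.

```text
WordNet::Similarity::res
# switch tracing on at level 1
trace::1
# keep caching on, with room for 5000 entries
cache::1
maxCacheSize::5000
```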

       Sample configuration files are present in the '/samples'	subdirectory
       of the package. Each of the modules has specific	parameters that	can be
       set/reset using the configuration files.	Please read the	manpages or
       the perldocs of the respective modules for details on the parameters
       specific	to each	of the modules.	For instance, 'man
       WordNet::Similarity::res' or 'perldoc WordNet::Similarity::res' should
       display the documentation for the Resnik	module.	 The module parses the
       configuration file and recognizes the following parameters:

       trace
	   This option is supported by all measures.

	   The value of	this parameter specifies the level of tracing that
	   should be employed for generating the traces. This value is an
	   integer equal to 0, 1, or 2.	If the value is	omitted, then the
	   default value, 0, is	used. A	value of 0 switches tracing off. A
	   value of 1 or 2 switches tracing on.	 The difference	between	a
	   value of 1 or 2 depends upon	the measure being used.

	   For vector and lesk,	a value	of 1 displays as traces	only the gloss
	   overlaps found. A value of 2 displays as traces all the text being
	   compared.

	   For the res,	lin, jcn, wup, lch, path, and hso measures, a trace of
	   level 1 means the synsets are represented as	word#pos#sense
	   strings, while for level 2, the synsets are represented as
	   word#pos#offset strings.

       cache
	   This option is supported by all measures.

	   The value of	this parameter specifies whether or not	caching	of the
	   relatedness values should be	performed.  This value is an integer
	   equal to  0 or 1.  If the value is omitted, then the	default	value,
	   1, is used. A value of 0 switches caching 'off', and	a value	of 1
	   switches caching 'on'.

       maxCacheSize
	   This option is supported by all measures.

	   The value of	this parameter indicates the size of the cache,	used
	   for storing the computed relatedness	value. The specified value
	   must	be a non-negative integer.  If the value is omitted, then the
	   default value, 5,000, is used. Setting maxCacheSize to zero has the
	   same	effect as setting cache	to zero, but setting cache to zero is
	   likely to be	more efficient.	 Caching and tracing at	the same time
	   can result in excessive memory usage	because	the trace strings are
	   also	cached.	 If you	intend to perform a large number of
	   relatedness queries,	then you might want to turn tracing off.

       The semantic relatedness modules in this distribution are built as
       classes.  The classes define four methods that are useful in finding
       relatedness values for pairs of synsets:

	 new()
	 getRelatedness()
	 getError()
	 getTraceString()


       Typical Usage Examples

       To create an object of the Resnik measure, we would have	the following
       lines of	code in	the Perl program.

	  use WordNet::Similarity::res;
	  # Perl does not expand '~' in file names, so build the path from $ENV{HOME}.
	  $object = WordNet::Similarity::res->new($wn, "$ENV{HOME}/resnik.conf");

       A reference to the initialized object is stored in the scalar
       variable '$object'. '$wn' contains a WordNet::QueryData object that
       should have been created earlier in the program. The second parameter
       to the 'new' method is the path of the configuration file for the
       resnik measure. If the 'new' method is unable to create the object,
       '$object' is undefined. This condition, as well as any other error or
       warning, can be tested as follows.

	  die "Unable to create	resnik object.\n" unless defined $object;
	  ($err, $errString) = $object->getError();
	  die $errString."\n" if($err);

       To create a Leacock-Chodorow measure object, using default values, i.e.
       no configuration	file, we would have the	following:

	  use WordNet::Similarity::lch;
	  $measure = WordNet::Similarity::lch->new($wn);

       To find the semantic relatedness	of the first sense of the noun 'car'
       and the second sense of the noun	'bus' using the	resnik measure,	we
       would write the following piece of code:

	  $relatedness = $object->getRelatedness('car#n#1', 'bus#n#2');

       To get traces for the above computation:

	  print	$object->getTraceString();

       However,	traces must be enabled using configuration files. By default
       traces are turned off.

AUTHORS
	 Ted Pedersen, University of Minnesota Duluth
	 tpederse at

	 Siddharth Patwardhan, University of Utah, Salt	Lake City
	 sidd at

	 Jason Michelizzi, University of Minnesota Duluth
	 mich0212 at

	 Satanjeev Banerjee, Carnegie Mellon University, Pittsburgh
	 banerjee+ at


BUGS
       To submit a bug report, go to or send e-mail to tpederse

SEE ALSO
       perl(1), WordNet::Similarity::jcn(3), WordNet::Similarity::res(3),
       WordNet::Similarity::lin(3), WordNet::Similarity::lch(3),
       WordNet::Similarity::hso(3), WordNet::Similarity::lesk(3),
       WordNet::Similarity::wup(3), WordNet::Similarity::path(3),
       WordNet::Similarity::random(3), WordNet::Similarity::ICFinder(3),
       WordNet::Similarity::PathFinder(3), WordNet::QueryData(3)

COPYRIGHT
       Copyright (c) 2005, Ted Pedersen, Siddharth Patwardhan, Jason
       Michelizzi and Satanjeev	Banerjee

       This program is free software; you can redistribute it and/or modify it
       under the terms of the GNU General Public License as published by the
       Free Software Foundation; either	version	2 of the License, or (at your
       option) any later version.

       This program is distributed in the hope that it will be useful, but
       WITHOUT ANY WARRANTY; without even the implied warranty of
       MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
       General Public License for more details.

       You should have received	a copy of the GNU General Public License along
       with this program; if not, write	to

	   The Free Software Foundation, Inc.,
	   59 Temple Place - Suite 330,
	   Boston, MA  02111-1307, USA.

       Note: a copy of the GNU General Public License is available on the web
       at <>	and is included	in this
       distribution as GPL.txt.

perl v5.24.1			  2015-10-04		WordNet::Similarity(3)