WordNet::Similarity::lesk(3)  User Contributed Perl Documentation  WordNet::Similarity::lesk(3)

NAME
       WordNet::Similarity::lesk - Perl module for computing semantic
       relatedness of word senses using gloss overlaps as described by
       Banerjee and Pedersen (2002) -- a method that adapts the Lesk approach
       to WordNet.

SYNOPSIS
	 use WordNet::Similarity::lesk;
	 use WordNet::QueryData;

	 my $wn = WordNet::QueryData->new();
	 my $lesk = WordNet::Similarity::lesk->new($wn);

	 my $value = $lesk->getRelatedness("car#n#1", "bus#n#2");

	 my ($error, $errorString) = $lesk->getError();
	 die "$errorString\n" if ($error);

	 print "car (sense 1) <-> bus (sense 2) = $value\n";

DESCRIPTION
       Lesk (1985) proposed that the relatedness of two words is proportional
       to the extent of overlaps of their dictionary definitions. Banerjee
       and Pedersen (2002) extended this notion to use WordNet as the
       dictionary for the word definitions. This notion was further extended
       to use the rich network of relationships between concepts present in
       WordNet. This module implements that adapted Lesk measure.
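       The intuition behind Lesk's measure can be illustrated with a minimal
       Perl sketch that counts the content words two glosses share. The
       function gloss_overlap, its small stop list and the sample glosses are
       hypothetical and not part of this module. The module's own scoring
       (related glosses, relation weights, optional stemming and
       normalization) is more involved, so treat this only as an
       illustration of the basic idea.

```perl
use strict;
use warnings;

# Hypothetical stop list; the module reads its own from the 'stop' file.
my %STOP = map { $_ => 1 } qw(a an the of to and or in on for with by is);

# Count the content words shared by two glosses. This is a simplified,
# single-word version of Lesk's overlap idea, not the module's scoring.
sub gloss_overlap {
    my ($gloss1, $gloss2) = @_;
    my %count1;
    for my $w (split /\W+/, lc $gloss1) {
        next if $w eq '' || $STOP{$w};
        $count1{$w}++;
    }
    my $score = 0;
    for my $w (split /\W+/, lc $gloss2) {
        next if $w eq '' || $STOP{$w};
        if ($count1{$w}) {    # each word in gloss1 can be matched only once
            $count1{$w}--;
            $score++;
        }
    }
    return $score;
}

# Hypothetical sample glosses, written for illustration.
my $car = "a motor vehicle with four wheels; usually propelled by an internal combustion engine";
my $bus = "a vehicle carrying many passengers; used for public transport";
print gloss_overlap($car, $bus), "\n";    # prints 1 (only "vehicle" is shared)
```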

   Methods
       $measure->initialize($file)
	   Overrides the initialize method in the parent class
	   (GlossFinder.pm). This method initializes the measure for use.

	   Parameters: $file -- configuration file.

	   Returns: none.

       $lesk->traceOptions()
	   This method is called internally to determine the extra options
	   specified by this measure (apart from the default options specified
	   in the WordNet::Similarity base class).

	   Parameters: none.

	   Returns: none.

       $lesk->getRelatedness()
	   Computes the relatedness of two word senses using the Extended
	   Gloss Overlaps algorithm.

	   Parameters: two word senses in "word#pos#sense" format.

	   Returns: Unless a problem occurs, the return value is the
	   relatedness score, which is greater than or equal to 0. If an error
	   occurs, then the error level is set to non-zero and an error string
	   is created (see the description of getError()).
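       Before calling getRelatedness, a program might check that its
       arguments have the "word#pos#sense" shape. The sketch below uses a
       hypothetical helper, parse_sense, and assumes that the
       part-of-speech tag is one of n, v, a or r and that the sense number
       is a positive integer. It is not part of the module, which reports
       its own problems through getError() as described above.

```perl
use strict;
use warnings;

# Hypothetical helper: split a "word#pos#sense" string into its parts.
# Assumes pos is one of n, v, a, r and sense is a positive integer;
# returns an empty list when the string does not have that shape.
sub parse_sense {
    my ($spec) = @_;
    return () unless defined $spec;
    my ($word, $pos, $sense) = $spec =~ /^([^#\s]+)#([nvar])#([1-9][0-9]*)$/
        or return ();
    return ($word, $pos, $sense);
}

for my $spec ('car#n#1', 'bus#n#2', 'car#x#1', 'car#n') {
    my @parts = parse_sense($spec);
    printf "%-8s => %s\n", $spec, @parts ? join(', ', @parts) : 'invalid';
}
```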

   Usage
       The semantic relatedness modules in this distribution are built as
       classes that define the following methods:

	 new()
	 getRelatedness()
	 getError()
	 getTraceString()

       See the WordNet::Similarity(3) documentation for details of these
       methods.

       Typical Usage Examples

       To create an object of the lesk measure, we would have the following
       lines of code in the Perl program:

	  use WordNet::Similarity::lesk;
	  $measure = WordNet::Similarity::lesk->new($wn, '/home/sid/lesk.conf');

       The reference of the initialized object is stored in the scalar
       variable '$measure'. '$wn' contains a WordNet::QueryData object that
       should have been created earlier in the program. The second parameter
       to the 'new' method is the path of the configuration file for the lesk
       measure. If the 'new' method is unable to create the object, '$measure'
       is undefined. This, as well as any other error or warning, may be
       tested as follows:

	  die "Unable to create object.\n" if (!defined $measure);
	  ($err, $errString) = $measure->getError();
	  die $errString."\n" if ($err);

       To find the semantic relatedness of the first sense of the noun 'car'
       and the second sense of the noun 'bus' using the measure, we would
       write the following piece of code:

	  $relatedness = $measure->getRelatedness('car#n#1', 'bus#n#2');

       To get traces for the above computation:

	  print $measure->getTraceString();

       However, traces are generated only if tracing is enabled in the
       configuration file (see the trace parameter below). By default,
       tracing is turned off.

CONFIGURATION FILE
       The behavior of the measures of semantic relatedness can be controlled
       by using configuration files. These configuration files specify how
       certain parameters are initialized within the object. A configuration
       file may be specified as a parameter during the creation of an object
       using the new method. The configuration files must follow a fixed
       format.

       Every configuration file starts with the name of the module ON THE
       FIRST LINE of the file. For example, a configuration file for the lesk
       module will have 'WordNet::Similarity::lesk' on the first line. This is
       followed by the various parameters, each on a new line and having the
       form 'name::value'. The 'value' of a parameter is optional (in the case
       of boolean parameters). If 'value' is omitted, the line contains just
       'name::'. Comments are supported in the configuration file: anything
       following a '#' is ignored until the end of the line.
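       The format above is simple enough to read with a few regular
       expressions. The sketch below is a hypothetical parser, not the
       module's own: it checks the module name on the first line, strips
       '#' comments, and stores each 'name::value' pair, recording an
       omitted value as undef so that the documented default can apply. The
       relation path in the sample file is a placeholder.

```perl
use strict;
use warnings;

# Hypothetical parser for the configuration format described above.
# Returns a hash reference mapping parameter names to values; undef means
# 'name::' was given without a value (so the default should be used).
sub parse_config {
    my ($text, $expected_module) = @_;
    my @lines = split /\n/, $text;

    # The first line must name the module.
    my $first = @lines ? shift @lines : '';
    $first =~ s/#.*//;
    $first =~ s/^\s+|\s+$//g;
    die "First line must be '$expected_module'\n"
        unless $first eq $expected_module;

    my %param;
    for my $line (@lines) {
        $line =~ s/#.*//;            # drop comments
        $line =~ s/^\s+|\s+$//g;     # trim whitespace
        next if $line eq '';
        my ($name, $value) = $line =~ /^(\w+)::(.*)$/
            or die "Malformed line: $line\n";
        $param{$name} = length $value ? $value : undef;
    }
    return \%param;
}

# Sample file; the relation path is a placeholder.
my $conf = <<'END';
WordNet::Similarity::lesk
trace::1        # show only the gloss overlaps
cache::0
stem::
relation::/path/to/lesk-relations.txt
END

my $params = parse_config($conf, 'WordNet::Similarity::lesk');
for my $name (sort keys %$params) {
    my $v = $params->{$name};
    print "$name = ", (defined $v ? $v : '(default)'), "\n";
}
```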

       The module parses the configuration file	and recognizes the following
       parameters:

       trace
	   The value of this parameter specifies the level of tracing that
	   should be employed for generating the traces. This value is an
	   integer equal to 0, 1, or 2. If the value is omitted, then the
	   default value, 0, is used. A value of 0 switches tracing off; a
	   value of 1 or 2 switches tracing on. A value of 1 traces only the
	   gloss overlaps found, while a value of 2 traces all the text being
	   compared.

       cache
	   The value of this parameter specifies whether or not caching of the
	   relatedness values should be performed. This value is an integer
	   equal to 0 or 1. If the value is omitted, then the default value,
	   1, is used. A value of 0 switches caching 'off', and a value of 1
	   switches caching 'on'.

       maxCacheSize
	   The value of this parameter indicates the size of the cache used
	   for storing the computed relatedness values. The specified value
	   must be a non-negative integer. If the value is omitted, then the
	   default value, 5,000, is used. Setting maxCacheSize to zero has the
	   same effect as setting cache to zero, but setting cache to zero is
	   likely to be more efficient. Caching and tracing at the same time
	   can result in excessive memory usage because the trace strings are
	   also cached. If you intend to perform a large number of
	   relatedness queries, then you might want to turn tracing off.

       relation
	   The value of this parameter is the path to a file that contains a
	   list of WordNet relations. The path may be either an absolute path
	   or a relative path.

	   The lesk measure combines glosses of synsets related to the target
	   synsets by these relations and then searches for overlaps in these
	   "super-glosses."

	   WARNING: the format of the relation file is different for the
	   vector and lesk measures.

       stop
	   The value of this parameter is the path of a file containing a list
	   of stop words that should be ignored in the glosses. The path may be
	   either an absolute path or a relative path.

       stem
	   The value of this parameter indicates whether or not stemming
	   should be performed. The value must be an integer equal to 0 or 1.
	   If the value is omitted, then the default value, 0, is used. A
	   value of 1 switches stemming 'on', and a value of 0 switches
	   stemming 'off'. When stemming is enabled, all the words of the
	   glosses are stemmed before their vectors are created for the vector
	   measure or their overlaps are compared for the lesk measure.

       normalize
	   The value of this parameter indicates whether or not normalization
	   of scores is performed. The value must be an integer equal to 0 or
	   1. If the value is omitted, then the default value, 0, is assumed.
	   A value of 1 switches normalizing of the score 'on', and a value of
	   0 switches normalizing 'off'. When normalizing is enabled, the
	   score obtained by counting the gloss overlaps is normalized by the
	   size of the glosses. The details are described in Banerjee and
	   Pedersen (2002).
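       The cache and maxCacheSize parameters describe a bounded store of
       computed scores. The sketch below shows one way such a cache could
       behave; the ScoreCache class, its methods and the oldest-entry-first
       eviction policy are assumptions for illustration, not the module's
       implementation. As with maxCacheSize::0, a size of zero stores
       nothing. The scores in the usage lines are placeholders.

```perl
use strict;
use warnings;

package ScoreCache;

# Hypothetical bounded cache for relatedness scores, keyed on the pair of
# sense strings. Eviction drops the oldest stored entry (an assumption).
sub new {
    my ($class, %args) = @_;
    my $max = defined $args{maxCacheSize} ? $args{maxCacheSize} : 5000;
    return bless { max => $max, data => {}, order => [] }, $class;
}

sub get {
    my ($self, $s1, $s2) = @_;
    return $self->{data}{"$s1 $s2"};
}

sub set {
    my ($self, $s1, $s2, $score) = @_;
    return if $self->{max} <= 0;    # a size of 0 disables caching
    my $key = "$s1 $s2";
    push @{ $self->{order} }, $key unless exists $self->{data}{$key};
    $self->{data}{$key} = $score;
    # Evict the oldest entries until the cache fits again.
    while (@{ $self->{order} } > $self->{max}) {
        my $oldest = shift @{ $self->{order} };
        delete $self->{data}{$oldest};
    }
    return;
}

package main;

my $cache = ScoreCache->new(maxCacheSize => 2);
$cache->set('car#n#1', 'bus#n#2', 4);     # placeholder scores
$cache->set('car#n#1', 'van#n#1', 7);
$cache->set('car#n#1', 'boat#n#1', 2);    # evicts the car/bus entry
my $status = defined $cache->get('car#n#1', 'bus#n#2') ? 'hit' : 'miss';
print "$status\n";                         # prints "miss"
```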

RELATION FILE FORMAT
       The relation file starts with the string "RelationFile" on the first
       line of the file. Following this, on each consecutive line, a relation
       is specified in the form --

       func(func(func... (func)...))-func(func(func... (func)...)) [weight]

       Where "func" can be any one of the following functions:

	 hype()      = Hypernym of
	 hypo()      = Hyponym of
	 holo()      = Holonym of
	 mero()      = Meronym of
	 attr()      = Attribute of
	 also()      = Also see
	 sim()       = Similar
	 enta()      = Entails
	 caus()      = Causes
	 part()      = Particle
	 pert()      = Pertainym of
	 glos        = gloss (without example)
	 example     = example (from the gloss)
	 glosexample = gloss + example
	 syns        = synset of the concept

       Each of these specifies a WordNet relation. The outermost function in
       the nesting can only be one of glos, example, glosexample or syns.
       The functions to the left of the "-" are applied to the first word
       sense, and the functions to the right of the "-" are applied to the
       second word sense. An optional weight can be specified to weigh the
       contribution of that relation in the overall score.

       For example,

	glos(hype(hypo))-example(hype) 0.5

       means that the gloss of the hypernym of the hyponym of the first synset
       is overlapped with the example of the hypernym of the second synset to
       get the lesk score. This score is weighted by 0.5. If "glos", "example",
       "glosexample" or "syns" is not provided as the outermost function of
       the nesting, the measure assumes "glos" as the default.

       So,

	glos(hypo(also))-glos(holo(attr))

       and

	hypo(also)-holo(attr)

       are treated the same by the measure.
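       The relation file grammar can be made concrete with a hypothetical
       parser. The functions parse_relation and parse_relation_file below
       are not part of the module. Each side of the "-" becomes a list of
       function names, outermost first, with "glos" supplied when no gloss
       type is given. Two further assumptions are made that the text above
       does not specify: an omitted weight counts as 1, and a relation
       contains no spaces.

```perl
use strict;
use warnings;

# Outermost functions allowed on each side, and the relations that may be
# nested inside them (taken from the list above).
my %GLOSS_TYPE = map { $_ => 1 } qw(glos example glosexample syns);
my %RELATION   = map { $_ => 1 } qw(hype hypo holo mero attr also sim enta caus part pert);

# Hypothetical parser for one relation line. Each side becomes a list of
# function names, outermost first: glos(hype(hypo)) => [glos, hype, hypo],
# i.e. the gloss of the hypernym of the hyponym.
sub parse_relation {
    my ($line) = @_;
    my ($spec, $weight) = $line =~ /^\s*(\S+)(?:\s+(\d*\.?\d+))?\s*$/
        or die "Malformed relation line: $line\n";
    my @sides = split /-/, $spec;
    die "Expected exactly one '-' in: $spec\n" unless @sides == 2;

    my @chains;
    for my $side (@sides) {
        my $open  = () = $side =~ /\(/g;
        my $close = () = $side =~ /\)/g;
        die "Unbalanced parentheses in: $side\n" unless $open == $close;
        my @funcs = $side =~ /([a-z]+)/g;
        die "Empty side in: $spec\n" unless @funcs;
        # Supply the default outermost function, as described above.
        unshift @funcs, 'glos' unless $GLOSS_TYPE{ $funcs[0] };
        for my $f (@funcs[1 .. $#funcs]) {
            die "Unknown relation '$f' in: $side\n" unless $RELATION{$f};
        }
        push @chains, \@funcs;
    }
    # Assumption: an omitted weight counts as 1.
    return { left => $chains[0], right => $chains[1],
             weight => defined $weight ? $weight : 1 };
}

# Parse a whole file: the "RelationFile" header, then one relation per line.
sub parse_relation_file {
    my ($text) = @_;
    my ($header, @lines) = split /\n/, $text;
    die "Missing 'RelationFile' header\n"
        unless defined $header && $header =~ /^\s*RelationFile\s*$/;
    return [ map { parse_relation($_) } grep { /\S/ } @lines ];
}

my $rels = parse_relation_file(
    "RelationFile\nglos(hype(hypo))-example(hype) 0.5\nhypo(also)-holo(attr)\n");
for my $r (@$rels) {
    # prints "glos>hype>hypo | example>hype | weight 0.5",
    # then   "glos>hypo>also | glos>holo>attr | weight 1"
    printf "%s | %s | weight %s\n",
        join('>', @{ $r->{left} }), join('>', @{ $r->{right} }), $r->{weight};
}
```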

SEE ALSO
       perl(1), WordNet::Similarity(3), WordNet::QueryData(3)

       http://www.cs.utah.edu/~sidd

       http://wordnet.princeton.edu

       http://www.ai.mit.edu/~jrennie/WordNet

       http://groups.yahoo.com/group/wn-similarity

AUTHORS
	Ted Pedersen, University of Minnesota Duluth
	tpederse at d.umn.edu

	Satanjeev Banerjee, Carnegie Mellon University, Pittsburgh
	banerjee+ at cs.cmu.edu

	Siddharth Patwardhan, University of Utah, Salt Lake City
	sidd at cs.utah.edu

BUGS
       None.

       To report bugs, go to http://groups.yahoo.com/group/wn-similarity/ or
       e-mail "tpederse at d.umn.edu".

COPYRIGHT AND LICENSE
       Copyright (c) 2005, Ted Pedersen, Satanjeev Banerjee and Siddharth
       Patwardhan

       This program is free software; you can redistribute it and/or modify it
       under the terms of the GNU General Public License as published by the
       Free Software Foundation; either version 2 of the License, or (at your
       option) any later version.

       This program is distributed in the hope that it will be useful, but
       WITHOUT ANY WARRANTY; without even the implied warranty of
       MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
       General Public License for more details.

       You should have received a copy of the GNU General Public License along
       with this program; if not, write to

	  The Free Software Foundation, Inc.,
	  59 Temple Place - Suite 330,
	  Boston, MA  02111-1307, USA.

       Note: a copy of the GNU General Public License is available on the web
       at <http://www.gnu.org/licenses/gpl.txt> and is included in this
       distribution as GPL.txt.

perl v5.24.1			  2015-10-04	  WordNet::Similarity::lesk(3)
