Skip site navigation (1)Skip section navigation (2)

FreeBSD Manual Pages

  
 
  

home | help
WordNet::Similarity::PUseriContributed Perl WordNet::Similarity::PathFinder(3)

NAME
       WordNet::Similarity::PathFinder - module	to implement path finding
       methods (by node	counting) for WordNet::Similarity measures of semantic
       relatedness

SYNOPSIS
	use WordNet::QueryData;
	my $wn = WordNet::QueryData->new;

	use WordNet::Similarity::PathFinder;
	my $obj	= WordNet::Similarity::PathFinder->new ($wn);

	my $wps1 = 'winston_churchill#n#1';
	my $wps2 = 'england#n#1';

	# parseWps returns reference to	an array that contains
	# word1	pos1 sense1 offset1 word2 pos2 sense2 offset2

	my $result = $obj->parseWps($wps1, $wps2);
	print "@$result\n";

	# path is a reference to an array that contains	the path between
	# wps1 and wps2	expressed as a series of wps values

	my @paths = $obj->getShortestPath($wps1, $wps2,	'n', 'wps');
	my ($length, $path) = @{shift @paths};
	defined	$path or die "No path between synsets";
	print "shortest	path between $wps1 and $wps2 is	$length	edges long\n";
	print "@$path\n";

	my $offset1 = $wn -> offset($wps1);
	my $offset2 = $wn -> offset($wps2);

	# path is a reference to an array that contains	the path between
	# offset1 and offset2 expressed	as a series of offset values

	my @paths = $obj->getShortestPath($offset1, $offset2, 'n', 'offset');
	my ($length, $path) = @{shift @paths};
	defined	$path or die "No path between synsets";
	print "shortest	path between $offset1 and $offset2 is $length edges  long\n";
	print "@$path\n";

DESCRIPTION
   Introduction
       This class is derived from (i.e., is a sub-class	of)
       WordNet::Similarity.

       The methods in this module are useful for finding paths between
       concepts	in WordNet's 'is-a' taxonomies.	 Concept A is-a	concept	B if,
       and only	if, B is a hypernym of A or A is in the	hypernym tree of B.
       N.B., only nouns	and verbs have hypernyms.

       The methods that	find path lengths (such	as "getShortestPath()" and
       "getAllPaths()" compute the lengths using node-counting not edge-
       counting.  In general, the length of a path using node-counting will
       always be one more than the length using	edge-counting.	For example,
       if concept A is a hyponym of concept B, then the	path length between A
       and B using node-counting is 2, but the length using edge-counting is
       1.  Likewise, the path between A	and A is 1 using node-counting and 0
       using edge-counting.

   Methods
       This module inherits all	the methods of WordNet::Similarity.
       Additionally, the following methods are also defined.

       Public methods

       $measure->setPosList()
	   Specifies the parts of speech that measures derived from this
	   module support (namely, nouns and verbs).

	   parameters: none

	   returns: true

       $self->traceOptions()
	   Overrides method of same name in WordNet::Similarity.  Prints
	   module-specific configuration options to the	trace string (if
	   tracing is on).  PathFinder supports	one module specific option:
	   rootNode.

	   Parameters: none

	   returns: nothing

       $measure->parseWps($synset1, $synset2)
	   parameters: synset1,	synset2	-- two synsets in wps format

	   returns: a reference	to an array, WordNet::Similarity::UNRELATED,
	   or undef

	   Overrides the parseWps() method in WordNet::Similarity in order to
	   run additional checks, but calls WordNet::Similarity::parseWps() to
	   get those checks accomplished as well.  Thus, this method does
	   everything that WordNet::Similarity::parseWps does.

	   quote from WordNet::Similarity::parseWps:
	       This method checks the format of	the two	input synsets by
	       calling validateSynset()	for each synset.

	       If the synsets are in wps format, a reference to	an array will
	       be returned.  This array	has the	form [$word1, $pos1, $sense1,
	       $offset1, $word2, $pos2,	$sense2, $offset2] where $word1	is the
	       word part of $wps1, $pos1, is the part of speech	of $wps1,
	       $sense1 is the sense from $wps.	$offset1 is the	offset for
	       $wps1.

	       If an error occurs (such	as a synset being poorly-formed), then
	       undef is	returned, the error level is set to non-zero, and an
	       error message is	appended to the	error string.

	   In addition,	if the two synsets are from different parts of speech,
	   then	WordNet::Similarity::UNRELATED is returned, the	error level is
	   set to 1, and a message is appended to the error string.

	   If either synset is not a noun or a verb, then the error level is
	   set to 1, a message is appended to the error	string,	and undef is
	   returned.

	   If the synsets are in wps format, a reference to an array will be
	   returned.  This array has the form [$word1, $pos1, $sense1,
	   $offset1, $word2, $pos2, $sense2, $offset2].

       $measure->getShortestPath($synset1, $synset2, $pos, $mode)
	   Given two input synsets, returns the	shortest path between the two
	   synsets.

	   Parameters: two synsets, a part-of-speech, and a mode indicator
	   (i.e., the string 'offset' or 'wps').  If the mode is 'offset',
	   then	the synsets should be WordNet offsets.	If the mode is 'wps',
	   then	the synsets should be in word#pos#sense	format.

	   Returns: a list of references to arrays.  Each array	has the	form
	   "($path_length, $path_ref)" where $path_ref is a reference to an
	   array whose elements	are the	synsets	along the shortest path
	   between the two input synsets.  There will be as many array
	   references returned as there	are shortest paths between the
	   synsets.  That is, there will be no arrays returned if there	is no
	   path	between	the synsets, and there will be at least	one array
	   returned if there is	a path between the synsets.  If	there are
	   multiple paths tied for being shortest in length, then all those
	   paths are returned (hence, this is why multiple array references
	   can be returned).

	   Upon	error, returns undef, sets the error level to non-zero,	and
	   appends a message to	the error string.

       $measure->getAllPaths($synset1, $synset2, $pos, $mode)
	   Given two input synsets, returns all	the paths between the two
	   synsets.

	   Parameters: a reference to the object, two synsets, a part-of-
	   speech, and a mode indicator	(the string 'offset' or	'wps').

	   If the mode is 'offset', then the synsets should be WordNet
	   offsets.  If	the mode is 'wps', then	they should be strings in
	   word#pos#sense format.

	   Returns: A list of all paths, sorted	by path	length in ascending
	   order.  The format for each item in the list	is a reference to an
	   array that has the format: [$top, $length, [@synsets_list]] where
	   @synset_list	is a list of synsets along the path (including the two
	   input synsets)

	   Returns undef on error.

       $measure->validateSynset($synset)
	   parameters: synset -- a string in word#pos#sense format

	   returns: a list or undef on error

	   This	method overrides the method of the same	name in
	   WordNet::Similarity to provide additional behavior but calls
	   WordNet::Similarity::validateSynset to accomplish that method's
	   behavior.  Thus, this method	does everything	that
	   WordNet::Similarity::validateSynset does.

	   quote from WordNet::Similarity::validateSynset:
	       This method does	the following:

	       1.  Verifies that the synset is well-formed (i.e., that it
		   consists of three parts separated by	#s, the	pos is one of
		   {n, v, a, r}	and that sense is a natural number).  A	synset
		   that	matches	the pattern '[^\#]+\#[nvar]\#\d+' is
		   considered well-formed.

	       2.  Checks if the synset	exists by trying to find the offset
		   for the synset

	   This	method,	however, has a slightly	different return value.
	   Instead of merely breaking the synset into three parts, it returns
	   the "safe" form of the synset.  That	is, if a synset	has multiple
	   word	senses,	this method returns the	first word sense in that
	   synset (this	is so that other path-finding methods work properly).
	   For example,	if the input to	this method is auto#n#1, the return
	   value is ('car', 'n', 1, 2853224) since the sense 'car#n#1' is the
	   first member	of the synset to which 'auto#n#1' belongs.

	   If any of these tests fails,	then the error level is	set to non-
	   zero, a message is appended to the error string, and	undef is
	   returned.

       Private methods

       $measure->_getHypernymTrees($synset, $pos, $mode)
	   This	method takes as	input a	synset and returns a list of
	   references to arrays	where these arrays are paths from the input
	   synset to the top of	the taxonomy (*Root*#[nv]#1 if the root	node
	   is on).

	   Parameters: a synset, a part-of-speech, and a mode.	The mode must
	   be either the string	'wps' or 'offset'.  If the mode	is 'wps', then
	   the synset must be in wps format; otherwise,	it must	be an offset.

	   Returns: a list of references to arrays.  These arrays are paths
	   (hypernym trees).

       getLCSbyPath($synset1, $synset2,	$pos, $mode)
	   Given two input synsets, finds the least common subsumer (LCS) of
	   them.  If there are multiple	candidates for the LCS (due to
	   multiple inheritance), the LCS that results in the shortest path
	   between in input concepts is	chosen.

	   Parameters: two synsets, a part of speech, and a mode.

	   Returns: a list of references to arrays where each array has	the
	   from	"($lcs,	$pathlength)".	$pathlength is the length of the path
	   between the two input concepts.  There can be multiple LCSs
	   returned if there are ties for the shortest path between the	two
	   synsets.  Returns undef on error.

       $measure->_getSubsumerFromTrees($treeref1, $treeref2, $mode)
	   This	subroutine returns takes two trees as produced by
	   getHypernymTrees and	returns	the most specific subsumer from	them.

	   Parameters: two references to arrays, and a string indicating mode
	   ('wps' or 'offset').

	   Returns: the	subsumer or undef

       getDepth()
	   This	method is non-functional and likely to be moved	to a different
	   module soon.

   Discussion
       Many of the methods in this module can work with	either offsets or wps
       strings internally.  There are several interesting consequences of each
       mode.

       1.  An offset is	not a unique identifier	for a synset, but neither is a
	   wps string.	An offset only indicates a byte	offset in one of the
	   WordNet data	files (data.noun, data.verb, etc. on Unix-like
	   systems).  An offset	along with a part of speech, however, does
	   uniquely identify a synset.

	   A word#pos#sense string, on the other hand, is the opposite
	   extreme.  A word#pos#sense string is	an identifier for a unique
	   word	sense.	A synset can have several word senses in it (i.e., a
	   synset is a set of word senses that are synonymous).	 The synset
	   {beer_mug#n#1, stein#n#1} has two word senses.  The wps strings
	   'beer_mug#n#1' and 'stein#n#1' can both be used to refer to the
	   synset.  For	simplicity, we usually just use	the first wps string
	   when	referring to the synset.  N.B.,	the wps	representation was
	   developed by	WordNet::QueryData.

       2.  Early versions of WordNet::Similarity::* used offsets internally
	   for finding paths, hypernym trees, subsumers, etc.  The module
	   WordNet::QueryData that is used by Similarity, however, accepts
	   only	wps strings as input to	its querySense method, which is	used
	   to find hypernyms.  We have found that it is	more efficient
	   (faster) to use wps strings internally.

AUTHORS
	Ted Pedersen, University of Minnesota Duluth
	tpederse at d.umn.edu

	Jason Michelizzi, University of	Minnesota Duluth
	mich0212 at d.umn.edu

	Siddharth Patwardhan, University of Utah, Salt Lake City
	sidd at	cs.utah.edu

BUGS
       None.

SEE ALSO
       WordNet::Similarity(3) WordNet::Similarity::path(3)
       WordNet::Similarity::lch(3) WordNet::Similarity::wup(3)

COPYRIGHT
       Copyright (c) 2005, Ted Pedersen, Siddharth Patwardhan and Jason
       Michelizzi

       This program is free software; you can redistribute it and/or modify it
       under the terms of the GNU General Public License as published by the
       Free Software Foundation; either	version	2 of the License, or (at your
       option) any later version.

       This program is distributed in the hope that it will be useful, but
       WITHOUT ANY WARRANTY; without even the implied warranty of
       MERCHANTABILITY or FITNESS FOR A	PARTICULAR PURPOSE.  See the GNU
       General Public License for more details.

       You should have received	a copy of the GNU General Public License along
       with this program; if not, write	to

	   The Free Software Foundation, Inc.,
	   59 Temple Place - Suite 330,
	   Boston, MA  02111-1307, USA.

       Note: a copy of the GNU General Public License is available on the web
       at <http://www.gnu.org/licenses/gpl.txt>	and is included	in this
       distribution as GPL.txt.

perl v5.24.1			  2008-03-27WordNet::Similarity::PathFinder(3)

NAME | SYNOPSIS | DESCRIPTION | AUTHORS | BUGS | SEE ALSO | COPYRIGHT

Want to link to this manual page? Use this URL:
<https://www.freebsd.org/cgi/man.cgi?query=WordNet::Similarity::PathFinder&sektion=3&manpath=FreeBSD+12.0-RELEASE+and+Ports>

home | help