Text::Similarity(3)   User Contributed Perl Documentation  Text::Similarity(3)

NAME
       Text::Similarity - Measure the pair-wise Similarity of Files or Strings

SYNOPSIS
             # By default getSimilarity returns an un-normalized score that just
             # gives the number of overlaps (or F1 if normalize is set).
             # In list context it also returns a hash table of other scores, with the keys
             # 'wc1', 'wc2', 'raw', 'precision', 'recall', 'F', 'dice', 'E', 'cosine', 'raw_lesk', 'lesk'.
             # wc1 and wc2 are the respective word counts; see Overlaps.pm for definitions of the other scores.

             use Text::Similarity::Overlaps;
             my $mod = Text::Similarity::Overlaps->new;
             defined $mod or die "Construction of Text::Similarity::Overlaps failed";

             # adjust file names to reflect true relative position
             # these paths are valid from lib/Text/Similarity
             my $text_file1 = 'Overlaps.pm';
             my $text_file2 = '../OverlapFinder.pm';

             my $score = $mod->getSimilarity ($text_file1, $text_file2);
             print "The similarity of $text_file1 and $text_file2 is : $score\n";

             my ($score1, %allScores) = $mod->getSimilarity ($text_file1, $text_file2);
             print "The raw similarity of $text_file1 and $text_file2 is : $allScores{'raw'}\n";
             print "The lesk score of $text_file1 and $text_file2 is : $allScores{'lesk'}\n";

             # To turn on the verbose option and provide a stoplist, pass
             # those parameters to Overlaps.pm as a reference to a hash of options.
             # (This is a separate example; it redeclares the variables used above.)

             # the verbose option causes extra scores to be printed to STDERR

             use Text::Similarity::Overlaps;
             my %options = ('verbose' => 1, 'stoplist' => '../../samples/stoplist.txt');

             my $mod = Text::Similarity::Overlaps->new (\%options);
             defined $mod or die "Construction of Text::Similarity::Overlaps failed";

             # adjust file names to reflect true relative position
             # these paths are valid from lib/Text/Similarity
             my $text_file1 = 'Overlaps.pm';
             my $text_file2 = '../OverlapFinder.pm';

             my ($score, %allScores) = $mod->getSimilarity ($text_file1, $text_file2);
             print "The raw similarity of $text_file1 and $text_file2 is : $allScores{'raw'}\n";
             print "The lesk score of $text_file1 and $text_file2 is : $allScores{'lesk'}\n";
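
       The normalized scores named in the comments above are defined in
       Overlaps.pm, which remains the authoritative reference. As a rough
       illustration only, the sketch below derives a few of them from a raw
       overlap count and the two word counts using the conventional
       definitions. The function name normalized_scores is hypothetical, and
       the choice of wc2 as the denominator for precision and wc1 for recall
       is an assumption, not something this page states.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Text::Similarity.
# Assumption: precision divides by wc2 and recall by wc1.
sub normalized_scores {
    my ($raw, $wc1, $wc2) = @_;
    my %s = (raw => $raw, wc1 => $wc1, wc2 => $wc2);

    # Guard every division so that empty inputs give 0 instead of dying.
    $s{precision} = $wc2 ? $raw / $wc2 : 0;
    $s{recall}    = $wc1 ? $raw / $wc1 : 0;

    # F1 is the harmonic mean of precision and recall.
    my $sum = $s{precision} + $s{recall};
    $s{F} = $sum ? 2 * $s{precision} * $s{recall} / $sum : 0;

    # Dice coefficient: twice the overlap over the combined length.
    $s{dice} = ($wc1 + $wc2) ? 2 * $raw / ($wc1 + $wc2) : 0;

    return %s;
}

my %scores = normalized_scores(3, 4, 6);
printf "%s=%.2f\n", $_, $scores{$_} for qw(precision recall F dice);
```

       With a raw overlap of 3 and word counts of 4 and 6, this prints a
       precision of 0.50, a recall of 0.75, and 0.60 for both F and dice.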

DESCRIPTION
       This module is a superclass for other modules and provides generic
       services such as stop word removal, compound identification, and text
       cleaning or sanitizing.

       It's important to realize that additional methods of measuring
       similarity can be added to this package; Text::Similarity::Overlaps is
       just one possible way of measuring similarity.

       Subroutine sanitizeString carries out text cleaning. Briefly, it
       removes nearly all punctuation except for underscores and embedded
       apostrophes, converts all text to lower case, and collapses multiple
       white spaces to a single space.
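
       The cleaning steps just described can be approximated as follows. This
       is a minimal sketch, not the module's own sanitizeString: the name
       sanitize_sketch is hypothetical, the patterns assume ASCII input, and
       the final trim of leading and trailing space is an added convenience
       that this page does not mention.

```perl
use strict;
use warnings;

# Hypothetical stand-in for sanitizeString; assumes ASCII text.
sub sanitize_sketch {
    my ($text) = @_;

    $text = lc $text;                   # convert to lower case
    $text =~ s/[^a-z0-9_'\s]/ /g;       # drop punctuation except _ and '

    # Drop apostrophes that are not embedded between two alphanumerics.
    $text =~ s/(?<![a-z0-9])'|'(?![a-z0-9])/ /g;

    $text =~ s/\s+/ /g;                 # collapse runs of white space
    $text =~ s/^ | $//g;                # trim the ends (added convenience)
    return $text;
}

print sanitize_sketch("Don't  STOP, 'white_house' -- now!"), "\n";
```

       For the sample input this prints "don't stop white_house now": the
       embedded apostrophe and the underscore survive, while the quoting
       apostrophes and other punctuation do not.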

       This module is where compounds are identified, although this feature
       is currently disabled. When implemented, it will check a list of
       compounds provided by the user; when a compound is found in the text,
       it will be designated via an underscore (e.g., white house might be
       converted to white_house).
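
       Since compound identification is disabled, the following is only a
       sketch of the idea, not the module's code. The name compoundify_sketch
       and the convention of passing compounds as space-separated strings are
       assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: join each known compound with underscores.
sub compoundify_sketch {
    my ($text, @compounds) = @_;

    # Try longer compounds first so that they win over their own prefixes.
    for my $compound (sort { length($b) <=> length($a) } @compounds) {
        (my $joined = $compound) =~ tr/ /_/;
        # \Q...\E quotes the compound so it is matched literally.
        $text =~ s/\b\Q$compound\E\b/$joined/g;
    }
    return $text;
}

print compoundify_sketch('the white house press office', 'white house'), "\n";
```

       This prints "the white_house press office". Once joined, the compound
       is treated as a single word by anything that splits on white space.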

       Stop words are removed here. The reported length of each document does
       not include the stop words, and overlaps are found after stop word
       removal. In effect, including a word in the stoplist says that the
       word never existed in your input.
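
       The effect on the word counts can be illustrated with a small sketch.
       It is not the module's implementation: remove_stop_words is a
       hypothetical name, the stoplist is assumed to be a plain list of words
       (the stoplist file format is not described here), and counting shared
       single words is a simplification of how overlaps are actually found.

```perl
use strict;
use warnings;

# Hypothetical helper: return the words of $text that are not stop words.
sub remove_stop_words {
    my ($text, @stop_words) = @_;
    my %is_stop = map { $_ => 1 } @stop_words;
    return grep { !$is_stop{$_} } split ' ', $text;
}

my @stop = qw(the of a);
my @doc1 = remove_stop_words('the similarity of the files', @stop);
my @doc2 = remove_stop_words('a similarity of strings', @stop);

# Count the words of the second document that also occur in the first.
my %in_doc1 = map { $_ => 1 } @doc1;
my $shared  = grep { $in_doc1{$_} } @doc2;

printf "wc1=%d wc2=%d shared=%d\n", scalar @doc1, scalar @doc2, $shared;
```

       This prints "wc1=2 wc2=2 shared=1": the stop words contribute neither
       to the document lengths nor to the overlap.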

BUGS
       o   Compoundify and stemming are currently not supported.

       o   The granularity option in getSimilarity is not supported.

       o   Cleaning should probably be optional.

SEE ALSO
       <http://text-similarity.sourceforge.net>

AUTHORS
        Ted Pedersen, University of Minnesota, Duluth
        tpederse at d.umn.edu

        Siddharth Patwardhan, University of Utah
        sidd at cs.utah.edu

        Jason Michelizzi

        Ying Liu, University of Minnesota, Twin Cities
        liux0395 at umn.edu

       Last modified by : $Id: Similarity.pm,v 1.4 2015/10/08 13:22:13
       tpederse Exp $

COPYRIGHT AND LICENSE
       Copyright (C) 2004-2010, Ted Pedersen, Jason Michelizzi, Siddharth
       Patwardhan, and Ying Liu

       This program is free software; you can redistribute it and/or modify it
       under the terms of the GNU General Public License as published by the
       Free Software Foundation; either version 2 of the License, or (at your
       option) any later version.

       This program is distributed in the hope that it will be useful, but
       WITHOUT ANY WARRANTY; without even the implied warranty of
       MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
       General Public License for more details.

       You should have received a copy of the GNU General Public License along
       with this program; if not, write to the Free Software Foundation, Inc.,
       59 Temple Place, Suite 330, Boston, MA  02111-1307  USA

perl v5.32.1			  2015-10-08		   Text::Similarity(3)
