FreeBSD Manual Pages

TEXT_SIMILARITY(1)    User Contributed Perl Documentation   TEXT_SIMILARITY(1)

NAME
       text_similarity.pl - Measure the pairwise similarity between two files
       or two strings

SYNOPSIS
	text_similarity.pl --type Text::Similarity::Overlaps --normalize
				--string '.......this is one' '????this is two'

	text_similarity.pl --type Text::Similarity::Overlaps --no-normalize
				--string '.......this is one' '????this is two'

	text_similarity.pl --type Text::Similarity::Overlaps
				--string 'sir winston churchill' 'Churchill, Winston Sir'

	text_similarity.pl --type Text::Similarity::Overlaps ../GPL.txt ../FDL.txt

	text_similarity.pl --verbose --type Text::Similarity::Overlaps ../GPL.txt ../FDL.txt

	text_similarity.pl --verbose --stoplist stoplist.txt --type Text::Similarity::Overlaps
			       ../GPL.txt ../FDL.txt

	text_similarity.pl [[--verbose] [--stoplist=FILE] [--no-normalize] [--string]
			       --type=TYPE | --help | --version] FILE1 FILE2

DESCRIPTION
       This script is a simple command-line interface to the Text::Similarity
       Perl modules. A method for computing similarity must be specified via
       the --type option, and then that method is used to measure the
       similarity of two strings or two files.

       Text::Similarity::Overlaps measures similarity by counting the number
       of words that overlap (match) between the two inputs, without regard to
       order. For example, all of the following strings have the same pairwise
       similarity: each has a raw score of 4 relative to the others, meaning
       that 4 words overlap (match).

	winston churchill was here
	here was winston churchill
	winston was here churchill
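       The order-insensitive counting described above can be sketched as a
       bag-of-words intersection. This is an illustration only, not the
       Text::Similarity::Overlaps implementation, which also reports the
       matching phrases themselves (see --verbose). The tokenizer is an
       assumption. It lowercases the text and drops punctuation, which is what
       the 'Churchill, Winston Sir' example in the SYNOPSIS suggests happens.
       The helper names tokenize and raw_overlap are hypothetical.

```python
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    # Assumption: lowercase the text and keep runs of letters/digits, so
    # "Churchill," and "churchill" become the same token.
    return re.findall(r"[a-z0-9]+", text.lower())


def raw_overlap(a: str, b: str) -> int:
    # Multiset intersection: a shared word counts min(freq_a, freq_b) times.
    # Word order is ignored entirely.
    common = Counter(tokenize(a)) & Counter(tokenize(b))
    return sum(common.values())


variants = [
    "winston churchill was here",
    "here was winston churchill",
    "winston was here churchill",
]
print(raw_overlap(variants[0], variants[1]))  # prints 4
```

       Because Counter intersection keeps the smaller count of each word,
       a word repeated in only one input is matched once.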

       By default, Text::Similarity::Overlaps returns a normalized F-measure
       between 0 and 1. Normalization can be turned off by specifying
       --no-normalize. If you specify --verbose, it also reports various other
       overlap-based scores.
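       The raw score, the normalized F-measure, and the other overlap-based
       scores can be related in a short sketch. It assumes that precision and
       recall are the raw overlap divided by the word counts of the second and
       first inputs, and that E-measure is 1 - F. These definitions are
       assumptions, not taken from the module's source. With them, F-measure
       reduces to 2 * overlap / (len1 + len2), the Dice coefficient. The
       function name overlap_scores is hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: case-insensitive, punctuation-stripped word tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_scores(a, b):
    """Raw overlap plus overlap-based measures (assumed definitions)."""
    ta, tb = tokenize(a), tokenize(b)
    raw = sum((Counter(ta) & Counter(tb)).values())
    if raw == 0:
        return {"raw": 0, "precision": 0.0, "recall": 0.0, "F": 0.0,
                "E": 1.0, "dice": 0.0, "cosine": 0.0}
    precision = raw / len(tb)  # assumption: share of the second input matched
    recall = raw / len(ta)     # assumption: share of the first input matched
    f = 2 * precision * recall / (precision + recall)
    return {
        "raw": raw,            # analogous to the --no-normalize score
        "precision": precision,
        "recall": recall,
        "F": f,                # analogous to the default normalized score
        "E": 1.0 - f,
        "dice": 2 * raw / (len(ta) + len(tb)),
        "cosine": raw / math.sqrt(len(ta) * len(tb)),
    }


scores = overlap_scores(".......this is one", "????this is two")
print(scores["raw"], round(scores["F"], 3))  # prints: 2 0.667
```

       Under these assumptions, the SYNOPSIS strings share "this is". That
       gives a raw score of 2 and an F-measure of 2/3. The actual tool's
       output may differ.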

OPTIONS
       --type=TYPE
	   The type of text similarity measure. Valid values include:

	       Text::Similarity::Overlaps

       --stoplist=FILE
	   The name of a file containing stop words. The ./sample directory
	   includes examples of both supported formats: one word per line
	   (stoplist.txt) and one regular expression per line
	   (stoplist-nsp.regex). The two formats may be mixed in a single
	   stop-word file.

       --no-normalize
	   Do not normalize scores. Normally, scores are normalized so that
	   they range from 0 to 1. Using this option will give you a raw
	   score instead.

       --string
	   Input will be provided on the command line as strings, not files.

       --verbose
	   Show all the matches found between the two inputs, along with their
	   length and frequency, as well as precision, recall, F-measure,
	   E-measure, cosine, and the Dice coefficient.

       --help
	   Show a detailed help message.

       --version
	   Show version information.
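       A stop-word file that mixes the two formats described under --stoplist
       could be handled as in the following sketch. The format details are
       assumptions. A line wrapped in slashes, such as /^the$/, is treated as
       a regular expression, in the style of the NSP package. Any other
       non-blank line is treated as a literal word. The function names
       load_stoplist, is_stopword, and remove_stopwords are hypothetical.

```python
import re


def load_stoplist(lines):
    """Split stop-list lines into literal words and regex patterns (assumed format)."""
    words, patterns = set(), []
    for line in lines:
        entry = line.strip()
        if not entry:
            continue  # skip blank lines
        if len(entry) > 2 and entry.startswith("/") and entry.endswith("/"):
            # Assumption: /.../ marks a regular expression, e.g. /^the$/
            patterns.append(re.compile(entry[1:-1]))
        else:
            words.add(entry.lower())
    return words, patterns


def is_stopword(token, words, patterns):
    # re.search: an unanchored pattern also matches inside longer tokens.
    return token in words or any(p.search(token) for p in patterns)


def remove_stopwords(tokens, lines):
    words, patterns = load_stoplist(lines)
    return [t for t in tokens if not is_stopword(t, words, patterns)]


stoplist = ["was", "/^the$/", "here"]
print(remove_stopwords(["winston", "was", "the", "churchill", "here"], stoplist))
# prints ['winston', 'churchill']
```

       Filtering would happen before overlaps are counted, so stop words
       never contribute to the raw score.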

AUTHORS
	Ted Pedersen, University of Minnesota, Duluth
	tpederse at d.umn.edu

	Jason Michelizzi

	Ying Liu, University of Minnesota, Twin Cities
	liux0395 at umn.edu

       Last modified by: $Id: text_similarity.pl,v 1.1.1.1 2013/06/26 02:38:12
       tpederse Exp $

BUGS
       The --compfile option does not work and appears to cause the program to
       hang (tdp 3/21/08).

COPYRIGHT AND LICENSE
       Copyright (C) 2004-2010, Jason Michelizzi, Ted Pedersen and Ying Liu

       This program is free software; you can redistribute it and/or modify it
       under the terms of the GNU General Public License as published by the
       Free Software Foundation; either version 2 of the License, or (at your
       option) any later version.

       This program is distributed in the hope that it will be useful, but
       WITHOUT ANY WARRANTY; without even the implied warranty of
       MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
       General Public License for more details.

       You should have received a copy of the GNU General Public License along
       with this program; if not, write to the Free Software Foundation, Inc.,
       59 Temple Place, Suite 330, Boston, MA  02111-1307  USA

perl v5.32.1			  2013-06-26		    TEXT_SIMILARITY(1)

<https://www.freebsd.org/cgi/man.cgi?query=text_similarity.pl&sektion=1&manpath=FreeBSD+13.0-RELEASE+and+Ports>