TREEBANKFREQ(1)	      User Contributed Perl Documentation      TREEBANKFREQ(1)

NAME
       treebankFreq.pl - Compute Information Content from Penn Treebank 2

SYNOPSIS
	treebankFreq.pl	[--outfile=OUTFILE [--stopfile=STOPFILE]
	      [--wnpath=WNPATH]	[--resnik] [--smooth=SCHEME] PATH
	       | --help | --version]

DESCRIPTION
       This program reads the Penn Treebank, Release 2,	from the Linguistic
       Data Consortium,	<http://ldc.upenn.edu>,	and computes the frequency
       counts for each synset in WordNet. These	frequency counts are used by
       the Lin,	Resnik,	and Jiang & Conrath measures of	semantic relatedness
       to calculate the	information content values of concepts.	The output is
       generated in a format as	required by the	WordNet::Similarity modules
       for computing semantic relatedness.
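
       As a rough illustration of how frequency counts turn into information
       content, the Python sketch below uses the standard definition
       IC(c) = -log(freq(c) / freq(root)), where a concept's frequency
       includes the counts of everything beneath it.  The toy taxonomy and
       the function names are hypothetical and are not part of
       treebankFreq.pl or WordNet::Similarity.

```python
import math

# Hypothetical toy taxonomy: child -> parent (None marks the root).
PARENT = {"entity": None, "animal": "entity", "dog": "animal", "cat": "animal"}


def cumulative_counts(raw_counts):
    """Propagate each synset's raw count to itself and all of its ancestors."""
    totals = {node: 0.0 for node in PARENT}
    for node, count in raw_counts.items():
        while node is not None:
            totals[node] += count
            node = PARENT[node]
    return totals


def information_content(totals, root="entity"):
    """IC(c) = -log(freq(c) / freq(root)); synsets with zero count are skipped."""
    return {c: -math.log(f / totals[root]) for c, f in totals.items() if f > 0}


totals = cumulative_counts({"dog": 3, "cat": 1})
ic = information_content(totals)
```

       In this sketch the rarer concept ("cat") receives a higher information
       content than the more frequent one ("dog"), and the root has an
       information content of zero.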

       A more detailed description of how information content is calculated
       can be found in rawtextFreq.pl. This program uses exactly the same
       techniques as described there.
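
       The counting scheme and smoothing named under OPTIONS can be pictured
       with the short Python sketch below.  It assumes that default counting
       credits every candidate sense of a word with a full count, that Resnik
       counting splits one count evenly among the senses, and that ADD1 adds
       one to every synset count.  These behaviours are assumptions made for
       illustration; the function names and synset labels are hypothetical.

```python
from collections import defaultdict


def count_word(counts, senses, resnik=False):
    """Credit one occurrence of a word to its candidate synsets."""
    if not senses:
        return
    # Assumed behaviour: default adds 1 to every sense,
    # Resnik counting splits a single count evenly among the senses.
    weight = 1.0 / len(senses) if resnik else 1.0
    for synset in senses:
        counts[synset] += weight


def add_one(counts, all_synsets):
    """ADD1 smoothing: every synset, seen or unseen, gains one count."""
    return {s: counts.get(s, 0.0) + 1.0 for s in all_synsets}


senses = ["bank#n#1", "bank#n#2"]  # hypothetical synset labels
default_counts = defaultdict(float)
resnik_counts = defaultdict(float)
count_word(default_counts, senses)
count_word(resnik_counts, senses, resnik=True)
smoothed = add_one(resnik_counts, senses + ["shore#n#1"])
```

       Smoothing of this kind keeps unseen synsets from having zero
       probability, which would otherwise make their information content
       undefined.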

OPTIONS
       --outfile=filename

	   The name of a file to which output should be	written

       --stopfile=filename

	   A file containing a list of stop words that are excluded from the
	   frequency counts.  A sample file can be downloaded from
	   http://www.d.umn.edu/~tpederse/Group01/WordNet/words.txt

       --wnpath=path

	   Location of the WordNet data	files (e.g.,
	   /usr/local/WordNet-3.0/dict)

       --resnik

	   Use Resnik (1995) frequency counting

       --smooth=SCHEME

	   Apply smoothing to the computed probabilities.  SCHEME can only be
	   ADD1 at this time

       --help

	   Show	a help message

       --version

	   Display version information

       PATH

	   Path to the raw Wall Street Journal portion of the Treebank corpus.
	   This	is usually in the /raw/wsj subdirectory	of the Treebank
	   installation.  Thus,	you might run this program as

	       treebankFreq.pl [OPTIONS] /home/sid/treebank/raw/wsj
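
	   When the program is driven from a script, the options above can be
	   assembled into an argument list, as in the Python sketch below.
	   Only the flags documented on this page are used.  The helper name
	   build_command and the output file name treebank.dat are
	   hypothetical, and the sketch assumes treebankFreq.pl is on the
	   PATH.

```python
import shlex


def build_command(path, outfile, stopfile=None, wnpath=None,
                  resnik=False, smooth=None):
    """Assemble a treebankFreq.pl argument list from the documented options."""
    args = ["treebankFreq.pl", f"--outfile={outfile}"]
    if stopfile:
        args.append(f"--stopfile={stopfile}")
    if wnpath:
        args.append(f"--wnpath={wnpath}")
    if resnik:
        args.append("--resnik")
    if smooth is not None:
        # The manual page lists ADD1 as the only supported scheme.
        if smooth != "ADD1":
            raise ValueError("SCHEME can only be ADD1")
        args.append(f"--smooth={smooth}")
    args.append(path)  # PATH comes last, after all options
    return args


cmd = build_command(
    "/home/sid/treebank/raw/wsj",
    "treebank.dat",                        # hypothetical output file name
    stopfile="words.txt",
    wnpath="/usr/local/WordNet-3.0/dict",
    smooth="ADD1",
)
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

	   Passing the arguments as a list avoids shell quoting problems if a
	   path contains spaces.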

BUGS
       Report to WordNet::Similarity mailing list :
	<http://groups.yahoo.com/group/wn-similarity>

SEE ALSO
       WordNet::Similarity

       Penn Treebank :
	<http://ldc.upenn.edu>

       WordNet home page :
	<http://wordnet.princeton.edu>

       WordNet::Similarity home	page :
	<http://wn-similarity.sourceforge.net>

AUTHORS
	Ted Pedersen, University of Minnesota, Duluth
	tpederse at d.umn.edu

	Satanjeev Banerjee, Carnegie Mellon University,	Pittsburgh
	banerjee+ at cs.cmu.edu

	Siddharth Patwardhan, University of Utah, Salt Lake City
	sidd at	cs.utah.edu

COPYRIGHT
       Copyright (c) 2005-2008,	Ted Pedersen, Satanjeev	Banerjee, and
       Siddharth Patwardhan

       This program is free software; you can redistribute it and/or modify it
       under the terms of the GNU General Public License as published by the
       Free Software Foundation; either	version	2 of the License, or (at your
       option) any later version.  This	program	is distributed in the hope
       that it will be useful, but WITHOUT ANY WARRANTY; without even the
       implied warranty	of MERCHANTABILITY or FITNESS FOR A PARTICULAR
       PURPOSE.	 See the GNU General Public License for	more details.

       You should have received	a copy of the GNU General Public License along
       with this program; if not, write	to the Free Software Foundation, Inc.,
       59 Temple Place - Suite 330, Boston, MA	02111-1307, USA.

perl v5.32.0			  2020-08-09		       TREEBANKFREQ(1)
