Lingua::StopWords(3)  User Contributed Perl Documentation Lingua::StopWords(3)

NAME
       Lingua::StopWords - Stop words for several languages.

SYNOPSIS
	   use Lingua::StopWords qw( getStopWords );
	   my $stopwords = getStopWords('en');

	   my @words = qw( i am the walrus goo goo g'joob );

	   # prints "walrus goo goo g'joob"
	   print join ' ', grep { !$stopwords->{$_} } @words;

DESCRIPTION
       In keyword search, it is common practice to suppress a collection of
       "stopwords": words such as "the", "and", "maybe", etc. which occur in
       a large number of documents and do not tell you anything important
       about any document which contains them.  This module provides such
       "stoplists" in several languages.

   Supported Languages
	   |-----------------------------------------------------------|
	   | Language   | ISO code | default encoding | also available |
	   |-----------------------------------------------------------|
	   | Danish     | da       | ISO-8859-1       | UTF-8          |
	   | Dutch      | nl       | ISO-8859-1       | UTF-8          |
	   | English    | en       | ISO-8859-1       | UTF-8          |
	   | Finnish    | fi       | ISO-8859-1       | UTF-8          |
	   | French     | fr       | ISO-8859-1       | UTF-8          |
	   | German     | de       | ISO-8859-1       | UTF-8          |
	   | Hungarian  | hu       | ISO-8859-1       | UTF-8          |
	   | Italian    | it       | ISO-8859-1       | UTF-8          |
	   | Norwegian  | no       | ISO-8859-1       | UTF-8          |
	   | Portuguese | pt       | ISO-8859-1       | UTF-8          |
	   | Spanish    | es       | ISO-8859-1       | UTF-8          |
	   | Swedish    | sv       | ISO-8859-1       | UTF-8          |
	   | Russian    | ru       | KOI8-R           | UTF-8          |
	   |-----------------------------------------------------------|

FUNCTIONS
   getStopWords
	   my $stoplist      = getStopWords('en');
	   my $utf8_stoplist = getStopWords('en', 'UTF-8');

       Retrieve a stoplist in the form of a hashref where the keys are all
       stopwords and the values are all 1.

	   $stoplist = {
	       and => 1,
	       if  => 1,
	       # ...
	   };
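
       Because the stoplist is a plain hashref, it can be copied and extended
       with application-specific words.  The following is only a sketch: the
       extra words ("walrus", "goo") are hypothetical additions chosen for
       illustration and are not part of any list shipped with this module.

```perl
use strict;
use warnings;
use Lingua::StopWords qw( getStopWords );

# Copy the stoplist so the extra entries stay local to this hash.
my %stopwords = %{ getStopWords('en') };

# Hypothetical domain-specific additions; not part of the module's list.
$stopwords{$_} = 1 for qw( walrus goo );

my @words = qw( i am the walrus goo goo g'joob );

# Keep only the words that are absent from the extended stoplist.
my @kept = grep { !$stopwords{$_} } @words;

print join( ' ', @kept ), "\n";
```

       Copying into a new hash keeps the additions from leaking into any
       other code that holds the hashref returned by getStopWords().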

       getStopWords() expects 1-2 arguments.  The first, which is required, is
       an ISO code representing a supported language.  If the ISO code cannot
       be found, getStopWords returns undef.
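
       Since an unsupported code yields undef, callers may want to check the
       return value before dereferencing it.  A minimal sketch, assuming a
       hypothetical helper named stoplist_or_empty() and assuming that 'xx'
       is not a supported code:

```perl
use strict;
use warnings;
use Lingua::StopWords qw( getStopWords );

# Hypothetical helper: return the stoplist for $lang, or an empty
# hashref when getStopWords() reports the language as unsupported.
sub stoplist_or_empty {
    my ($lang) = @_;
    my $stoplist = getStopWords($lang);
    return defined $stoplist ? $stoplist : {};
}

my $english = stoplist_or_empty('en');
my $unknown = stoplist_or_empty('xx');    # 'xx' is assumed to be unsupported

printf "en: %d stopwords, xx: %d stopwords\n",
    scalar keys %$english, scalar keys %$unknown;
```

       Falling back to an empty hashref means that no words are filtered for
       an unknown language; dying with an error message is an equally
       reasonable choice.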

       The second argument should be 'UTF-8' if you want the stopwords encoded
       in UTF-8.  The UTF-8 flag will be turned on, so make sure you
       understand all the implications of that.
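
       One implication is that the words being tested should be Perl
       character strings as well, for instance literals in a source file
       under "use utf8" or input that has already been decoded.  A hedged
       sketch for Russian, whose default encoding is KOI8-R; the sample words
       are illustrative, and the first two are assumed to appear in the
       Russian stoplist:

```perl
use strict;
use warnings;
use utf8;    # string literals in this file are decoded as UTF-8
use Lingua::StopWords qw( getStopWords );

# Request the UTF-8 variant so the keys are character strings.
my $stoplist = getStopWords( 'ru', 'UTF-8' );

# Sample words: "I", "and", "cat".
my @words = qw( я и кот );

# Look up character strings against character-string keys.
my @kept = grep { !$stoplist->{$_} } @words;

# Encode on output so the terminal receives UTF-8 bytes.
binmode STDOUT, ':encoding(UTF-8)';
print join( ' ', @kept ), "\n";
```

       With the default KOI8-R list, the same lookup would require the input
       words to be KOI8-R byte strings instead.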

SEE ALSO
       The stoplists supplied by this module were created as part of the
       Snowball project (see <http://snowball.tartarus.org>,
       Lingua::Stem::Snowball).

       Lingua::EN::StopWords provides a different stoplist for English.

AUTHOR
       Maintained by Marvin Humphrey <marvin at rectangular dot com>.
       Original author Fabien Potencier, <fabpot at cpan dot org>.

COPYRIGHT AND LICENSE
       Copyright 2004-2008 Fabien Potencier, Marvin Humphrey

       This library is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself, either Perl version 5.8.3 or, at
       your option, any later version of Perl 5 you may have available.

perl v5.24.1                      2008-08-22              Lingua::StopWords(3)
