HUGE-MERGE(1)	      User Contributed Perl Documentation	 HUGE-MERGE(1)

NAME
       huge-merge.pl - Merge the results of multiple huge-sort generated files
       into a single sorted file.

SYNOPSIS
       huge-merge.pl output-directory

DESCRIPTION
       Efficiently combines the sorted bigram files generated by huge-sort.pl.

       This program is used internally by huge-count.pl.
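       The following is a minimal Python sketch of how several sorted bigram
       files can be combined in one sequential pass (a k-way merge). It is not
       the huge-merge.pl implementation. It assumes a hypothetical line format
       of term1<>term2<>count, with every input file already sorted by its
       term1<>term2 key. The names parse_line and merge_sorted_files are
       illustrative only. When a bigram appears in more than one file, its
       counts are summed.

```python
import heapq
from contextlib import ExitStack
from itertools import groupby


def parse_line(line):
    # Hypothetical format "term1<>term2<>count"; the last field is the count.
    *terms, count = line.rstrip("\n").split("<>")
    return "<>".join(terms), int(count)


def merge_sorted_files(paths, out_path):
    """k-way merge of files already sorted by bigram key, summing counts."""
    with ExitStack() as stack:
        files = [stack.enter_context(open(p, encoding="utf-8")) for p in paths]
        out = stack.enter_context(open(out_path, "w", encoding="utf-8"))
        # Skip blank lines, then parse each remaining line lazily.
        streams = [map(parse_line, filter(str.strip, f)) for f in files]
        # heapq.merge keeps only one pending record per input file.
        merged = heapq.merge(*streams, key=lambda rec: rec[0])
        # Equal keys are now adjacent, so groupby can total them.
        for key, group in groupby(merged, key=lambda rec: rec[0]):
            total = sum(count for _, count in group)
            out.write(f"{key}<>{total}\n")
```

       Each input is read sequentially, and only one pending record per file
       is held at a time. Memory use in this sketch therefore grows with the
       number of input files, not with their size.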

USAGE
       huge-merge.pl [OPTIONS] SOURCEDIR

INPUT
   Required Arguments:
       SOURCEDIR

       Input to huge-merge.pl should be a single flat directory containing
       multiple plain text files generated by huge-sort.pl. The result files,
       merge.* (where * is a number), are written to the source directory;
       the final result file is the one with the largest number.
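       The final result is the merge.* file with the largest number, so the
       numbers must be compared as integers. A plain string sort would place
       merge.9 after merge.10. The following is a minimal Python sketch of
       that lookup. It assumes the suffix is a plain decimal integer, and
       final_merge_file is a hypothetical helper that is not part of the
       distribution.

```python
import os
import re


def final_merge_file(source_dir):
    """Return the path of merge.<N> with the largest N, or None if absent."""
    best_n, best_name = -1, None
    for name in os.listdir(source_dir):
        # Assumes result files are named exactly "merge.<decimal number>".
        m = re.fullmatch(r"merge\.(\d+)", name)
        if m and int(m.group(1)) > best_n:
            best_n, best_name = int(m.group(1)), name
    return os.path.join(source_dir, best_name) if best_name else None
```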

   Optional Arguments:
       --keep

       Switching on the --keep option keeps all of the intermediate merge
       files.

   Other Options:

       --help

       Displays the help information.

       --version

       Displays the version information.

BUGS

       huge-merge.pl has a known limitation. When the corpus is very large
       (>16G) and some of the bigram terms are very long (>30 chars), the
       program can run out of memory during the huge-merge.pl step. This
       happens because huge-merge.pl uses two hashes to count the frequencies
       of the first and second terms of the bigrams. Both hashes grow as the
       terms get longer and as the number of distinct terms increases. For
       normal text, where terms are of limited length and number, the program
       will not run out of memory.
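       The memory behaviour described above can be illustrated with a minimal
       Python sketch of two frequency tables. One table is keyed by first term
       and the other by second term. This is an illustration under an assumed
       input, an iterable of ((term1, term2), count) pairs, and is not the
       program's own code. marginal_counts is a hypothetical name. Every
       distinct term becomes a key that stays in memory until the end, so the
       tables grow with both the number of distinct terms and their length.

```python
from collections import Counter


def marginal_counts(bigrams):
    """Total bigram counts per first term and per second term."""
    first = Counter()   # term1 -> sum of counts of bigrams starting with it
    second = Counter()  # term2 -> sum of counts of bigrams ending with it
    for (term1, term2), count in bigrams:
        # Each new distinct term adds a key (and its string) to a table.
        first[term1] += count
        second[term2] += count
    return first, second
```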

AUTHOR
       Ying Liu, University of Minnesota, Twin Cities. liux0395 at umn.edu

       Ted Pedersen, University of Minnesota, Duluth. tpederse at umn.edu

COPYRIGHT
       Copyright (C) 2009-2011, Ying Liu and Ted Pedersen

       This program is free software; you can redistribute it and/or modify it
       under the terms of the GNU General Public License as published by the
       Free Software Foundation; either version 2 of the License, or (at your
       option) any later version.  This program is distributed in the hope
       that it will be useful, but WITHOUT ANY WARRANTY; without even the
       implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
       PURPOSE.  See the GNU General Public License for more details.

       You should have received a copy of the GNU General Public License along
       with this program; if not, write to the Free Software Foundation, Inc.,
       59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

perl v5.24.1			  2011-03-31			 HUGE-MERGE(1)
