FreeBSD Manual Pages

Sort(3)		      User Contributed Perl Documentation	       Sort(3)

NAME
       File::Sort - Sort a file or merge sort multiple files

SYNOPSIS
         use File::Sort qw(sort_file);
         sort_file({
           I => [qw(file_1 file_2)],
           o => 'file_new', k => '5.3,5.5rn', t => '|'
         });

	 sort_file('file1', 'file1.sorted');

DESCRIPTION
       This module sorts text files by lines (or records).  Comparisons are
       based on one or more sort keys extracted from each line of input, and
       are performed lexicographically unless an ordering option such as n
       says otherwise.  By default, if no keys are given, each input line is
       treated as a single field.  The sort is a merge sort.  If you don't
       like that, feel free to change it.
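
       As a rough illustration of the default case (whole lines compared
       lexicographically), here is a minimal stable merge sort over lines in
       plain Perl.  It is a sketch, not this module's code; the helper name
       merge_sort_lines is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: stable top-down merge sort of lines, comparing each
# whole line as a string (the default single-field key).
sub merge_sort_lines {
    my @lines = @_;
    return @lines if @lines <= 1;

    my $mid   = int(@lines / 2);
    my @left  = merge_sort_lines(@lines[0 .. $mid - 1]);
    my @right = merge_sort_lines(@lines[$mid .. $#lines]);

    # Merge the sorted halves; taking from the left on ties keeps it stable.
    my @out;
    while (@left && @right) {
        if (($left[0] cmp $right[0]) <= 0) { push @out, shift @left }
        else                               { push @out, shift @right }
    }
    return (@out, @left, @right);
}

my @sorted = merge_sort_lines("pear\n", "apple\n", "fig\n");
print @sorted;    # apple, fig, pear (one per line)
```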

   Options
       The following options are available.  They are passed to the function
       in a hash reference, in the format:

         OPTION => VALUE

       Where an option can take multiple values (like "I", "k", and "pos"),
       the values may be passed in an anonymous array:

         OPTION => [VALUE1, VALUE2]

       Where the OPTION is a switch, it should be passed a boolean VALUE of 1
       or 0.

       This interface will always be supported, though a more Perl-ish
       interface may also be offered in the future.  This interface is
       basically a mapping of the command-line options of the Unix sort
       utility.
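
       As an illustration of that correspondence, the hypothetical helper
       sort_argv below (not part of File::Sort) turns an option hash into a
       roughly equivalent sort(1) argument list.  It covers only a subset of
       the options, and sort(1) implementations differ in details, so treat
       it as a sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: translate a File::Sort-style option hash into a
# roughly equivalent sort(1) argument list. Only a subset is handled;
# y, F, X, R, D and pos have no counterpart in this sketch.
sub sort_argv {
    my ($opts) = @_;
    my @argv;

    # Boolean switches become single-letter flags.
    for my $flag (qw(b c d f i m n r u)) {
        push @argv, "-$flag" if $opts->{$flag};
    }
    push @argv, '-t', $opts->{t} if defined $opts->{t};

    # k may be one keydef or an array reference of them; order is kept.
    my $keys = $opts->{k};
    for my $k (ref $keys ? @$keys : defined $keys ? ($keys) : ()) {
        push @argv, '-k', $k;
    }
    push @argv, '-o', $opts->{o} if defined $opts->{o};

    # I may likewise be a single file name or an array reference.
    my $in = $opts->{I};
    push @argv, ref $in ? @$in : defined $in ? ($in) : ();
    return @argv;
}

my @argv = sort_argv({ r => 1, k => '2.2,2.2', o => 'outfile',
                       I => ['file1', 'file2'] });
print join(' ', 'sort', @argv), "\n";
# prints: sort -r -k 2.2,2.2 -o outfile file1 file2
```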

       "I" INPUT
           Pass in the input file(s).  This can be either a single string with
           the filename, or an array reference containing multiple filename
           strings.

       "c" Check that the single input file is ordered as specified by the
           arguments and the collating sequence of the current locale.  No
           output is produced; only the exit code is affected.
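
           A sketch of what such a check amounts to, assuming a
           caller-supplied comparison sub; is_ordered is a hypothetical name,
           and how File::Sort itself reports the result is not modelled here.

```perl
use strict;
use warnings;

# Hypothetical sketch of the c check: true if no line compares greater than
# the line after it. Plain string comparison is used unless $cmp is given.
sub is_ordered {
    my ($lines, $cmp) = @_;
    $cmp ||= sub { $_[0] cmp $_[1] };
    for my $i (1 .. $#$lines) {
        return 0 if $cmp->($lines->[$i - 1], $lines->[$i]) > 0;
    }
    return 1;
}

# Numeric comparison on the third ":"-separated field, as in the passwd example.
my $by_uid = sub { (split /:/, $_[0])[2] <=> (split /:/, $_[1])[2] };
my $ok = is_ordered(["root:*:0", "daemon:*:1", "nobody:*:65534"], $by_uid);
print $ok ? "sorted\n" : "not sorted\n";    # sorted
```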

       "m" Merge only; the input files are assumed to already be sorted.

       "o" OUTPUT
           Specify the name of an OUTPUT file to be used instead of the
           standard output.

       "u" Unique: suppress all but one in each set of lines having equal
           keys.  If used with the c option, also check that no two
           consecutive lines have duplicate keys, in addition to checking
           that the input file is sorted.
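
           A sketch of the u filter applied to input that is already sorted.
           The helper unique_by_key and the choice of key (the first
           whitespace-separated field) are hypothetical; keys are compared
           with plain string equality.  With c, the same idea becomes a check
           that no two adjacent keys are equal.

```perl
use strict;
use warnings;

# Hypothetical sketch of u on sorted input: keep the first line of each run
# of lines whose keys are equal. The key extractor is supplied by the caller.
sub unique_by_key {
    my ($lines, $key_of) = @_;
    my (@out, $prev);
    for my $line (@$lines) {
        my $key = $key_of->($line) // '';
        next if defined $prev && $key eq $prev;
        push @out, $line;
        $prev = $key;
    }
    return @out;
}

# Assumed key: the first whitespace-separated field.
my $first_field = sub { (split ' ', $_[0])[0] };
print join("\n", unique_by_key(["a 1", "a 2", "b 3"], $first_field)), "\n";
# prints "a 1" and "b 3"
```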

       "y" MAX_SORT_RECORDS
           Maximum number of lines (records) read before writing to a temp
           file.  Default is 200,000.  This may eventually change to be
           kbytes instead of lines; lines were easier to implement.  This can
           also be set with the MAX_SORT_RECORDS environment variable.

       "F" MAX_SORT_FILES
           Maximum number of temp files to be held open at once.  Defaults to
           40, as older Windows ports had quite a small limit.  This can also
           be set with the MAX_SORT_FILES environment variable.  No temp
           files will be used at all if MAX_SORT_RECORDS is never reached.
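
           Together, y and F describe an external merge sort.  The sketch
           below is a simplified, hypothetical version of that technique using
           File::Temp and plain string order; the helper names and the
           batch-merging strategy are assumptions, not File::Sort's internals,
           and it requires a file limit of at least 2.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sketch of an external merge sort governed by two limits that
# play the roles of y (records per temp file) and F (files merged at once).
# Plain string order is assumed; File::Sort's own implementation differs.

# Sort one chunk of lines into a new temp file and return the file's name.
sub write_chunk {
    my ($lines) = @_;
    my ($fh, $name) = tempfile(UNLINK => 1);
    print {$fh} sort @$lines;
    close $fh or die "close $name: $!";
    return $name;
}

# Merge already-sorted files into $out, taking the smallest head line each time.
sub merge_files {
    my ($names, $out) = @_;
    my @fhs  = map { open(my $fh, '<', $_) or die "open $_: $!"; $fh } @$names;
    my @head = map { scalar readline($_) } @fhs;
    while (1) {
        my $min;
        for my $i (0 .. $#fhs) {
            next unless defined $head[$i];
            $min = $i if !defined $min || $head[$i] lt $head[$min];
        }
        last unless defined $min;
        print {$out} $head[$min];
        $head[$min] = readline($fhs[$min]);
    }
    close $_ for @fhs;
}

sub external_sort {
    my ($in, $out, $max_records, $max_files) = @_;
    die "max_files must be at least 2\n" if $max_files < 2;
    my (@buf, @files);
    while (my $line = <$in>) {
        $line .= "\n" unless $line =~ /\n\z/;   # normalize a missing final newline
        push @buf, $line;
        if (@buf >= $max_records) { push @files, write_chunk(\@buf); @buf = () }
    }
    # As documented for F: no temp files at all if the record limit is never hit.
    unless (@files) { print {$out} sort @buf; return }
    push @files, write_chunk(\@buf) if @buf;

    # Merge in batches of $max_files until one final pass suffices.
    while (@files > $max_files) {
        my ($fh, $name) = tempfile(UNLINK => 1);
        merge_files([splice @files, 0, $max_files], $fh);
        close $fh or die "close $name: $!";
        push @files, $name;
    }
    merge_files(\@files, $out);
}

# Tiny limits so that temp files and batch merging are both exercised.
open my $in,  '<', \"pear\nfig\napple\nkiwi\nplum\nlime\n" or die $!;
open my $out, '>', \my $sorted or die $!;
external_sort($in, $out, 2, 2);
close $out;
print $sorted;    # apple, fig, kiwi, lime, pear, plum (one per line)
```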

       "D" Send debugging information to STDERR.  Behavior subject to change.

       The following options override the default ordering rules.  When
       ordering options appear independent of any key field specifications,
       the requested field ordering rules are applied globally to all sort
       keys.  When attached to a specific key (see k), the specified ordering
       options override all global ordering options for that key.

       "d" Specify that only blank characters and alphanumeric characters,
           according to the current locale setting, are significant in
           comparisons.  d overrides i.

       "f" Consider all lower-case characters that have upper-case
           equivalents, according to the current locale setting, to be the
           upper-case equivalent for the purposes of comparison.

       "i" Ignore all characters that are non-printable, according to the
           current locale setting.

       "n" Do a numeric instead of a string compare, using whatever Perl
           considers to be a number in numeric comparisons.

       "r" Reverse the sense of the comparisons.
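
           As an approximation of how these rules combine, the hypothetical
           make_cmp below builds a comparison sub from d, f, i, n and r
           flags.  It uses Perl's built-in (non-locale) POSIX character
           classes rather than the locale-aware classification described
           above.

```perl
use strict;
use warnings;

# Hypothetical sketch: build a comparison sub from d, f, i, n and r flags.
# ASCII/Unicode character classes stand in for locale-dependent ones.
sub make_cmp {
    my (%flag) = @_;
    my $norm = sub {
        my ($s) = @_;
        if    ($flag{d}) { $s =~ s/[^[:blank:][:alnum:]]//g }  # d overrides i
        elsif ($flag{i}) { $s =~ s/[^[:print:]]//g }
        $s = uc $s if $flag{f};                               # fold case
        return $s;
    };
    my $cmp = $flag{n}
        ? sub { no warnings 'numeric'; $norm->($_[0]) <=> $norm->($_[1]) }
        : sub { $norm->($_[0]) cmp $norm->($_[1]) };
    return $flag{r} ? sub { $cmp->($_[1], $_[0]) } : $cmp;   # r swaps operands
}

my $cmp = make_cmp(n => 1, r => 1);
print join(' ', sort { $cmp->($a, $b) } qw(9 100 20)), "\n";   # 100 20 9
```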

       "b" Ignore leading blank characters when determining the starting and
           ending positions of a restricted sort key.  If the b option is
           specified before the first k option, it is applied to all k
           options.  Otherwise, the b option can be attached independently to
           each field_start or field_end option argument (see below).

       "t" STRING
           Use STRING as the field separator character; the separator
           character is not considered to be part of a field (although it can
           be included in a sort key).  Each occurrence of the separator is
           significant (for example, <char><char> delimits an empty field).
           If t is not specified, blank characters are used as default field
           separators; each maximal non-empty sequence of blank characters
           that follows a non-blank character is a field separator.
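
           A sketch of splitting on a t separator with Perl's split;
           split_fields is a hypothetical helper, and the default
           blank-separated mode is not covered.

```perl
use strict;
use warnings;

# Hypothetical sketch of t-style splitting: the separator is not part of
# any field, and two adjacent separators delimit an empty field.
sub split_fields {
    my ($line, $sep) = @_;
    # \Q...\E makes characters such as "|" literal; the limit of -1 keeps
    # trailing empty fields instead of discarding them.
    return split /\Q$sep\E/, $line, -1;
}

my @f = split_fields('a||b|', '|');
print scalar(@f), " fields: ", join(',', map { "[$_]" } @f), "\n";
# prints: 4 fields: [a],[],[b],[]
```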

       "X" STRING
           Same as t, but STRING is interpreted as a Perl regular expression
           instead.  Do not escape any characters ("/" characters need to be
           escaped internally, and will be escaped for you).

           The string matched by STRING is not included in the fields
           themselves, unless demanded by Perl's regex and split semantics
           (e.g., capturing parentheses in the regex will add the matched
           text as an extra field).  See perlre and "split" in perlfunc.
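
           A short illustration of the split semantics referred to above,
           using plain split rather than File::Sort; the sample line and
           patterns are made up.

```perl
use strict;
use warnings;

# A regex separator; capturing parentheses make split return the matched
# separator text as extra fields.
my $line = 'alpha, beta;gamma';
my @plain    = split /\s*[,;]\s*/,   $line, -1;   # alpha | beta | gamma
my @captured = split /\s*([,;])\s*/, $line, -1;   # alpha | , | beta | ; | gamma
print join(' | ', @plain), "\n";
print join(' | ', @captured), "\n";
```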

       "R" STRING
	   Record separator, defaults to newline.
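
           In Perl terms a record separator corresponds to $/.  A sketch,
           assuming a ";" separator and an in-memory filehandle standing in
           for an input file:

```perl
use strict;
use warnings;

my $data = "pear;apple;fig;";
open my $fh, '<', \$data or die "open: $!";
my @records = do {
    local $/ = ';';    # records end at ";" instead of "\n"
    my @r = <$fh>;
    chomp @r;          # chomp strips the current $/, i.e. ";"
    @r;
};
print join(' ', sort @records), "\n";    # apple fig pear
```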

       "k" pos1[,pos2]
           The keydef argument (pos1[,pos2]) is a restricted sort key field
           definition.  The format of this definition is:

               field_start[.first_char][type][,field_end[.last_char][type]]

           where field_start and field_end define a key field restricted to a
           portion of the line, and type is a modifier from the list of
           characters b, d, f, i, n, r.  Fields and characters are numbered
           starting at 1.  The b modifier behaves like the b option, but
           applies only to the field_start or field_end to which it is
           attached.  The other modifiers behave like the corresponding
           options, but apply only to the key field to which they are
           attached; they have this effect if specified with field_start,
           field_end, or both.  If any modifier is attached to a field_start
           or a field_end, no global option applies to either.

           Occurrences of the k option are significant in the order given.
           If no k option is specified, a default sort key of the entire line
           is used.  When there are multiple key fields, later keys are
           compared only after all earlier keys compare equal.
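
           The hypothetical extract_key below sketches how a keydef selects a
           key, assuming an explicit separator as with t.  Only the b
           modifier is applied; the other type letters are returned so a
           caller could build a comparison from them.  Edge cases may differ
           from both this module and sort(1).

```perl
use strict;
use warnings;

# Hypothetical sketch: select the key named by a keydef such as '5.3,5.5rn'
# from a line split on an explicit separator. Fields and characters count
# from 1. Only b is applied here; all type letters are returned.
sub extract_key {
    my ($line, $keydef, $sep) = @_;
    my ($f1, $c1, $t1, $f2, $c2, $t2) = $keydef =~
        /^(\d+)(?:\.(\d+))?([bdfinr]*)(?:,(\d+)(?:\.(\d+))?([bdfinr]*))?$/
        or die "bad keydef: $keydef";
    $t2 //= '';
    my @fields = split /\Q$sep\E/, $line, -1;

    $c1 ||= 1;                                       # default: first character
    $f2 = @fields if !defined $f2 || $f2 > @fields;  # default: end of line
    $c2 //= 0;                                       # 0: end of field_end
    return ('', $t1 . $t2) if $f1 > $f2;

    my @span = @fields[$f1 - 1 .. $f2 - 1];
    # last_char counts from the start of field_end (after blanks, with b).
    $span[-1] =~ s/^[ \t]+// if $t2 =~ /b/;
    $span[-1] = substr($span[-1], 0, $c2) if $c2 > 0;
    # first_char counts from the start of field_start (after blanks, with b).
    $span[0] =~ s/^[ \t]+// if $t1 =~ /b/;
    $span[0] = length($span[0]) >= $c1 ? substr($span[0], $c1 - 1) : '';
    return (join($sep, @span), $t1 . $t2);
}

my ($key, $mods) = extract_key('a|b|c|d|xxABCyy', '5.3,5.5rn', '|');
print "key=$key modifiers=$mods\n";    # key=ABC modifiers=rn
```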

           Except when the u option is specified, lines that otherwise compare
           equal are ordered as if none of the options d, f, i, n or k were
           present (but with r still in effect, if it was specified) and with
           all bytes in the lines significant to the comparison.  The order in
           which lines that still compare equal are written is unspecified.
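
           A sketch of that tie-break, assuming each key comparison is
           supplied as a sub that already applies its own modifiers;
           compare_lines is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical sketch: try each key comparison in order; if all keys tie,
# compare the whole lines with plain cmp (byte-wise for byte strings, no
# locale), still honouring a global r. Key subs apply their own modifiers.
sub compare_lines {
    my ($x, $y, $keys, $reverse) = @_;
    for my $key (@$keys) {
        my $c = $key->($x, $y);
        return $c if $c;
    }
    my $c = $x cmp $y;
    return $reverse ? -$c : $c;
}

# Assumed key: numeric compare of the first whitespace-separated field.
my $num_key = sub { (split ' ', $_[0])[0] <=> (split ' ', $_[1])[0] };
my @sorted = sort { compare_lines($a, $b, [$num_key], 0) }
             ("10 b", "2 z", "10 a", "02 z");
print "$_\n" for @sorted;    # 02 z, 2 z, 10 a, 10 b
```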

       "pos" +pos1 [-pos2]
           Similar to k, these are mostly obsolete switches, but some people
           like them and want to use them.  Usage is:

               +field_start[.first_char][type] [-field_end[.last_char][type]]

           Whereas field_end in k specifies the last position to be included,
           in pos it specifies a position that is NOT included.  Also, fields
           and characters are counted from 0 instead of 1.  pos2 must
           immediately follow the corresponding +pos1.  The rest should be the
           same as the k option.

           Mixing +pos1 pos2 with k is allowed, but will result in all of the
           +pos1 pos2 options being ordered AFTER the k options.  It is best
           if you Don't Do That.  Pick one and stick with it.

           Here are some equivalencies:

               pos => '+1 -2'              ->  k => '2,2'
               pos => '+1.1 -1.2'          ->  k => '2.2,2.2'
               pos => ['+1 -2', '+3 -5']   ->  k => ['2,2', '4,5']
               pos => ['+2', '+0b -1']     ->  k => ['3', '1b,1']
               pos => '+2.1 -2.4'          ->  k => '3.2,3.4'
               pos => '+2.0 -3.0'          ->  k => '3.1,4.0'
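
           The equivalencies follow a simple pattern, sketched here as a
           hypothetical pos_to_k helper.  The rule is inferred from the table
           above, not taken from File::Sort's source: in +pos1 the 0-based
           field and character are each incremented, while in -pos2 a bare
           field number is kept and field.char becomes (field+1).char.

```perl
use strict;
use warnings;

# Hypothetical helper reproducing the equivalence table; type letters
# (b, d, f, i, n, r) are carried over unchanged.
sub pos_to_k {
    my ($spec) = @_;
    my ($start, $end) = split ' ', $spec;
    my ($sf, $sc, $st) = $start =~ /^\+(\d+)(?:\.(\d+))?([bdfinr]*)$/
        or die "bad pos1: $start";
    my $k = ($sf + 1) . (defined $sc ? '.' . ($sc + 1) : '') . $st;
    if (defined $end) {
        my ($ef, $ec, $et) = $end =~ /^-(\d+)(?:\.(\d+))?([bdfinr]*)$/
            or die "bad pos2: $end";
        $k .= ',' . (defined $ec ? ($ef + 1) . ".$ec" : $ef) . $et;
    }
    return $k;
}

for my $pos ('+1 -2', '+1.1 -1.2', '+0b -1', '+2.1 -2.4', '+2.0 -3.0') {
    printf "%-12s -> %s\n", $pos, pos_to_k($pos);
}
```

           An array of pos values maps element by element, e.g.
           map { pos_to_k($_) } '+1 -2', '+3 -5' gives '2,2' and '4,5'.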

   Not Implemented
       Options that are neither listed as implemented above nor listed in
       TODO below are not planned for implementation.  This includes T and
       z.

EXAMPLES
       Sort file by straight string compare of each line, sending output to
       STDOUT.

	   use File::Sort qw(sort_file);
	   sort_file('file');

       Sort the contents of file using a key that starts at the second field.

           sort_file({k => 2, I => 'file'});

       Sort, in reverse order, the contents of file1 and file2, placing the
       output in outfile and using the second character of the second field
       as the sort key.

	   sort_file({
	       r => 1, k => '2.2,2.2', o => 'outfile',
	       I => ['file1', 'file2']
	   });

       Same sort, but first sorting numerically on characters 3 through 5 of
       the fifth field, and returning only records with unique keys.

	   sort_file({
	       u => 1, r => 1, k => ['5.3,5.5rn', '2.2,2.2'],
	       o => 'outfile', I => ['file1', 'file2']
	   });

       Print the passwd(4) file sorted by numeric user ID.

           sort_file({t => ':', k => '3n', I => '/etc/passwd'});

       For the meticulous sysadmin, check that the passwd(4) file is sorted
       by numeric user ID.

           sort_file({c => 1, t => ':', k => '3n', I => '/etc/passwd'});

ENVIRONMENT
       Note that if you change the locale settings after the program has
       started up, you must call setlocale() for the new settings to take
       effect.  For example:

           # get constants
           use POSIX 'locale_h';

           # e.g., blank out locale
           $ENV{LC_ALL} = $ENV{LANG} = '';

           # use new ENV settings
           setlocale(LC_CTYPE, '');
           setlocale(LC_COLLATE, '');

       LC_COLLATE
           Determine the locale for ordering rules.

       LC_CTYPE
           Determine the locale for the interpretation of sequences of bytes
           of text data as characters (for example, single- versus multi-byte
           characters in arguments and input files) and the behaviour of
           character classification for the b, d, f, i and n options.

       MAX_SORT_RECORDS
           Default is 200,000.  Maximum number of records to use before
           writing to a temp file.  Overridden by the y option.

       MAX_SORT_FILES
           Default is 40.  Maximum number of open temp files to use before
           merging open temp files.  Overridden by the F option.
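
       A sketch of the precedence described above (explicit option, then
       environment variable, then default); sort_limits is a hypothetical
       name, and the module's own handling may differ, for example in
       validation.

```perl
use strict;
use warnings;

# Hypothetical sketch: an explicit option wins, then the environment
# variable, then the documented default.
sub sort_limits {
    my ($opts) = @_;
    my $records = $opts->{y} // $ENV{MAX_SORT_RECORDS} // 200_000;
    my $files   = $opts->{F} // $ENV{MAX_SORT_FILES}   // 40;
    return ($records, $files);
}

{
    local $ENV{MAX_SORT_RECORDS} = 50_000;   # affects only this block
    my ($r, $f) = sort_limits({ F => 10 });
    print "records=$r files=$f\n";           # records=50000 files=10
}
```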

EXPORT
       Exports "sort_file" on request.

TODO
       Better debugging and error reporting
       Performance hit with -u
       Do bytes instead of lines
       Better test suite
       Switch for turning off locale ... ?

HISTORY
       v1.01, Monday, January 14, 2002
           Change license to be that of Perl.

       v1.00, Tuesday, November 13, 2001
           Long overdue release.

           Add O_TRUNC to output open (D'oh!).

           Played with some of the -k options (Marco A. Romero).

           Fix filehandle close test of STDOUT (Gael Marziou).

           Some cleanup.

       v0.91, Saturday, February 12, 2000
           Closed all files in test.pl so they could be unlinked on some
           platforms.  (Hubert Toullec)

           Documented "I" option.  (Hubert Toullec)

           Removed O_EXCL flag from "sort_file".

           Fixed bug in sorting multiple files.  (Paul Eckert)

       v0.90, Friday, April 30, 1999
           Complete rewrite.  Took the code from this module to write the sort
           utility for the PPT project, then brought the changes back over.
           As a result the interface has changed slightly, mostly in regard to
           which letters are used for options, but there are also some key
           behavioral differences.  If you need the old interface, the old
           module will remain on CPAN, but will not be supported.  Sorry for
           any inconvenience this may cause.  The good news is that it should
           not be too difficult to update your code to use the new interface.

       v0.20
           Fixed bug with unique option (didn't work :).

           Switched to sysopen for better portability.

           Print to STDOUT if no output file supplied.

           Added c option to check sorting.

       v0.18 (31 January 1999)
           Tests 3 and 4 failed because we hit the open file limit in the
           standard Windows port of perl5.004_02 (50).  Adjusted the default
           for the total number of temp files from 50 to 40 (to leave room for
           other open files), changed docs.  (Mike Blazer, Gurusamy Sarathy)

       v0.17 (30 December 1998)
           Fixed bug in "_merge_files" that tried to "open" a passed
           "IO::File" object.

           Fixed up docs and did some more tests and benchmarks.

       v0.16 (24 December 1998)
           One year between releases was too long.  I made changes Miko
           O'Sullivan wanted, and I didn't even know I had made them.

           Also now use "IO::File" to create temp files, so the TMPDIR option
           is no longer supported.  Hopefully made the whole thing more robust
           and faster, while supporting more options for sorting, including
           delimited sorts and arbitrary sorts.

           Made CHUNK default a lot larger, which improves performance.  On
           low-memory systems, or where (e.g.) the MacPerl binary is not
           allocated much RAM, it might need to be lowered.

       v0.11 (04 January 1998)
           More cleanup; fixed special case of no linebreak on last line;
           wrote test suite; fixed warning for redefined subs (sort1 and
           sort2).

       v0.10 (03 January 1998)
           Some cleanup; made it not subject to system file limitations;
           separated many parts out into separate functions.

       v0.03 (23 December 1997)
           Added reverse and numeric sorting options.

       v0.02 (19 December 1997)
           Added unique and merge-only options.

       v0.01 (18 December 1997)
           First release.

THANKS
       Mike Blazer <blazer@mail.nevalink.ru>, Vicki Brown <vlb@cfcl.com>, Tom
       Christiansen <tchrist@perl.com>, Albert Dvornik <bert@mit.edu>, Paul
       Eckert <peckert@epicrealm.com>, Gene Hsu <gene@moreinfo.com>, Andrew M.
       Langmead <aml@world.std.com>, Gael Marziou <gael_marziou@hp.com>, Brian
       L. Matthews <blm@halcyon.com>, Rich Morin <rdm@cfcl.com>, Matthias
       Neeracher <neeri@iis.ee.ethz.ch>, Miko O'Sullivan <miko@idocs.com>, Tom
       Phoenix <rootbeer@teleport.com>, Marco A. Romero <mromero@iglou.com>,
       Gurusamy Sarathy <gsar@activestate.com>, Hubert Toullec
       <Hubert.Toullec@wanadoo.fr>.

AUTHOR
       Chris Nandor <pudge@pobox.com>, http://pudge.net/

       Copyright (c) 1997-2002 Chris Nandor.  All rights reserved.  This
       program is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.

VERSION
       v1.01, Monday, January 14, 2002

SEE ALSO
       sort(1), locale, PPT project, <URL:http://sf.net/projects/ppt/>.

perl v5.32.1			  2002-01-22			       Sort(3)
