
FreeBSD Manual Pages


MSORT(1)			 User Commands			      MSORT(1)

       msort - sort records in complex ways

       msort <options> [<input file>]

       msort  is  a  program for sorting text files in sophisticated ways.  It
       was developed initially for alphabetizing dictionaries of languages  in
       which  the  ordering  may  be quite different from English but has many
       other uses.

       msort allows you	to sort	blocks of text delimited in a number  of  ways
       rather  than just lines and to specify particular fields	of a record as
       sort keys using either their position, counted from either end,	or  by
       matching	regular	expressions to their tags.

       msort  is capable of sorting on multiple	keys, so that when two records
       tie on one key, the tie may be broken on	another. Any or	all  keys  may
       be  optional.   How  absent  optional  keys are ordered with respect to
       present keys may	be set separately for each key.
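       The key-by-key tie-breaking described above can be sketched in Python.
       This is a minimal illustration, not msort's implementation. Each key is
       modelled as a hypothetical pair: an extractor function that returns
       None when the key is absent, and the relation ("<", "=", or ">") that
       an absent key bears to a present one, mirroring the -o option.

```python
from functools import cmp_to_key
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical key specification: (extractor, absent_order).
# The extractor returns None when the key is missing from a record;
# absent_order says how a missing key compares with a present one.
KeySpec = Tuple[Callable[[str], Optional[str]], str]


def compare_records(a: str, b: str, keys: Sequence[KeySpec]) -> int:
    """Compare two records key by key; later keys only break ties."""
    for extract, absent_order in keys:
        ka, kb = extract(a), extract(b)
        if ka is None and kb is None:
            continue  # both absent: tie on this key, try the next one
        if ka is None or kb is None:
            rel = {"<": -1, "=": 0, ">": 1}[absent_order]
            if rel == 0:
                continue  # absent counts as equal: fall through to next key
            return rel if ka is None else -rel
        if ka != kb:
            return -1 if ka < kb else 1  # plain code point order
    return 0


def field(i: int) -> Callable[[str], Optional[str]]:
    """Hypothetical extractor: the i-th colon-separated field, if non-empty."""
    def extract(record: str) -> Optional[str]:
        parts = record.split(":")
        return parts[i] if i < len(parts) and parts[i] else None
    return extract


records = ["smith:john", "doe", "smith:anne", "doe:jane"]
keys = [(field(0), ">"), (field(1), "<")]  # surname, then optional given name
result = sorted(records, key=cmp_to_key(lambda a, b: compare_records(a, b, keys)))
print(result)  # ['doe', 'doe:jane', 'smith:anne', 'smith:john']
```

       Changing the second key's relation to ">" moves "doe" after
       "doe:jane", because an absent given name then sorts after a present one.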

       msort allows you to specify arbitrary sort orders and to define
       virtually unlimited numbers of multigraphs of effectively unlimited
       length. The sort order and multigraphs are defined separately for each
       key. If your system has locale support, you can also use locale
       collation rules instead of specifying your own sort order.
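       To illustrate how multigraphs work within a user-defined order, the
       following Python sketch ranks text by longest-match tokenization
       against a sort order supplied as a list. The list representation and
       the placement of unlisted characters are assumptions. msort reads sort
       orders from definition files in its own format, which is described in
       the reference manual.

```python
from typing import Callable, List, Sequence


def make_sort_key(order: Sequence[str]) -> Callable[[str], List[int]]:
    """Build a key function from a sort order given as a list of units.
    Units longer than one character are multigraphs; at each position the
    longest unit that matches wins."""
    rank = {unit: i for i, unit in enumerate(order)}
    longest = max(len(unit) for unit in order)

    def key(text: str) -> List[int]:
        out: List[int] = []
        i = 0
        while i < len(text):
            for n in range(min(longest, len(text) - i), 0, -1):
                unit = text[i:i + n]
                if unit in rank:
                    out.append(rank[unit])
                    i += n
                    break
            else:
                # Unlisted characters sort after every listed unit (an assumption).
                out.append(len(order) + ord(text[i]))
                i += 1
        return out

    return key


# Simplified traditional Spanish order: "ch" follows "c", "ll" follows "l".
spanish = list("abc") + ["ch"] + list("defghijkl") + ["ll"] + list("mnopqrstuvwxyz")
words = ["chico", "cuna", "llama", "luz", "casa"]
print(sorted(words, key=make_sort_key(spanish)))
# ['casa', 'cuna', 'chico', 'luz', 'llama']
```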

       msort provides twelve types of key comparison: lexicographic,  numeric,
       numeric	string,	hybrid,	by string length, by angle, by date, by	domain
       name, by	time, by ISO8601 date/time stamp, by month name, and random.

       What month names	are used is a bit complicated. If the -s flag is  used
       on the same key and its argument	is the name of a file, the month names
       are read	from the file, which should be in the same format  as  a  sort
       order definition	file. If the -s	flag is	used and its argument is a lo-
       cale name, the month names recognized will be the month names  and  ab-
       breviations associated with the specified locale. If the	-s flag	is not
       used the	month names recognized will be the month names	and  abbrevia-
       tions  associated with the current locale. If your system does not have
       locale support and you do not use the -s	flag to	read the  month	 names
       from a file, the	month names recognized will be the English month names
       and abbreviations.
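       The English fallback can be approximated as shown below. This Python
       sketch assumes that abbreviations are the first three letters of each
       name and that matching ignores case and a trailing period. msort's
       actual matching rules may differ.

```python
from typing import Optional

# English month names, used here as the fallback when neither a locale
# nor a file supplies names (assumed three-letter abbreviations).
ENGLISH_MONTHS = ["january", "february", "march", "april", "may", "june",
                  "july", "august", "september", "october", "november",
                  "december"]


def month_number(token: str) -> Optional[int]:
    """Return 1-12 for a full English month name or its three-letter
    abbreviation (case-insensitive, trailing period allowed), else None."""
    t = token.strip().lower().rstrip(".")
    for number, name in enumerate(ENGLISH_MONTHS, start=1):
        if t in (name, name[:3]):
            return number
    return None


print(sorted(["Mar", "january", "Dec.", "feb"], key=month_number))
# ['january', 'feb', 'Mar', 'Dec.']
```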

       msort can reverse the characters	in a key, allowing it to  be  used  to
       generate	reverse	dictionaries.
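       Reversing the key before comparison groups words by their endings,
       which is the basis of a reverse dictionary. The Python sketch below
       reverses code points. It assumes the text is in NFC so that accented
       letters are single code points and survive reversal intact.

```python
def reverse_key(word: str) -> str:
    # Compare words by their characters read right to left.
    return word[::-1]


words = ["walking", "sing", "talked", "walked", "ring"]
reverse_dictionary = sorted(words, key=reverse_key)
print(reverse_dictionary)  # ['talked', 'walked', 'walking', 'ring', 'sing']
```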

       A choice	of sorting algorithms is provided.

       msort fully supports Unicode. The text to be sorted, and	all specifica-
       tions, should be	in UTF-8 Unicode. (If you have plain ASCII text,  this
       is  not	a problem as ASCII is a	subset of Unicode.) Full Unicode case-
       folding is available, in	Turkic and non-Turkic variants.	 Unicode  nor-
       malization is performed before sorting.

       For usage information, execute msort with no arguments.

       Full  information about msort is	currently to be	found in the reference
       manual, which is	distributed as a PDF (Portable Document	Format)	 file.
       If  a  copy  is not available locally, you can download it from msort's
       home page:

   Informational options
	      Print usage message

	      Print version message

	      List defaults

	      List general command line	options

	      List equivalents for GNU sort command line options.

	      List informational command line options

	      List key-specific	command	line options

	      List limits

	      List the supported number	systems.

   General options
	      A	record is terminated by	two or more newlines

	      A	record consists	of a single line

       -r,--record-separator <separator>
	      A record is terminated by the separator character.

       -O,--fixed-size-record <bytes>
	      A	record consists	of the specified number	of bytes.

       -d,--field-separators <character>+
	      Fields are delimited by the named	character(s)

	      Sort on the entire text of the record

       -a,--algorithm <algorithm>
	      Use the specified sort algorithm. The choices are:
	      I(nsertionSort), M(ergeSort), Q(uickSort), and S(hellSort). Note
	      that InsertionSort and MergeSort are stable, while QuickSort and
	      ShellSort are unstable. The default is QuickSort.
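       Stability matters when records tie on the sort key. A stable algorithm
       keeps tied records in their input order, but an unstable one may not.
       The Python sketch below implements insertion sort, one of the stable
       choices listed above, to show this property. The record layout is
       hypothetical.

```python
# Hypothetical records: (surname, given name); we sort on surname only.
records = [("Smith", "Zoe"), ("Jones", "Amy"), ("Smith", "Adam"), ("Jones", "Bob")]


def insertion_sort(items, key):
    """Stable insertion sort: an element only moves past strictly greater keys,
    so records with equal keys keep their original relative order."""
    result = list(items)
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        while j >= 0 and key(result[j]) > key(current):
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = current
    return result


by_surname = insertion_sort(records, key=lambda r: r[0])
print(by_surname)
# [('Jones', 'Amy'), ('Jones', 'Bob'), ('Smith', 'Zoe'), ('Smith', 'Adam')]
```

       Because Zoe precedes Adam in the input, a stable sort keeps them in that
       order. Python's built-in sorted() is also stable and produces the same
       result here.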

       -M,--initial-maximum-records <records>
	      Set initial maximum number of records

	      End-of-line in the input data is marked by Carriage Return
	      (0x0D), as on the classic Macintosh, rather than by Line Feed
	      (0x0A), as on Unix systems.

	      Invert sense of comparisons globally

	      No characters in the input fall outside the Basic Multilingual
	      Plane (that is, none have values greater than 0xFFFF).

	      Copy the first record in the input to the	output without sorting
	      it. This is useful for sorting files with	a header.

	      Do  not  make internal use of the	Private	Use areas. By default,
	      multigraphs are assigned internally to codepoints	in the Supple-
	      mentary  Private Use areas if full Unicode is in use or to code-
	      points in	the Private Use	area if	input is restricted to the Ba-
	      sic  Multilingual	Plane by means of the -B option. If your input
	      makes use	of the Private Use areas, this option prevents	inter-
	      ference  with  your input. In this case, multigraphs will	be as-
	      signed to	the Low	and High Surrogate areas (0xD800-0xDFFF). Note
	      that this	limits the number of multigraphs to 2,048.

       -P,--random-seed	<seed>
	      Set  the	seed for the random number generator. If not set here,
	      it is set	to a value determined by the time. The	seed  used  is
	      reported in the log. This	option allows runs to be replicated.
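       The reason a recorded seed makes runs repeatable can be illustrated
       with Python's random module: two generators seeded with the same value
       produce the same shuffle. This only demonstrates the principle. msort's
       generator is not Python's, so the same seed will not give the same
       order in both programs.

```python
import random


def random_order(records, seed):
    """Shuffle a copy of records with a generator seeded explicitly."""
    rng = random.Random(seed)  # independent generator; global state untouched
    shuffled = list(records)
    rng.shuffle(shuffled)
    return shuffled


run1 = random_order(["a", "b", "c", "d", "e"], seed=42)
run2 = random_order(["a", "b", "c", "d", "e"], seed=42)
print(run1 == run2)  # True: same seed, same order
```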

	      Check whether the input is already sorted. Do not generate any
	      output. Exit status is 0 if the input is already sorted, 11 if not.

       -1,--in <input file name>

       -2,--out	<output	file name>
	      If the output file is the	same as	the input file,	the input file
	      will be overwritten. The input file will not be  overwritten  if
	      the run is unsuccessful.

	      Suppress	output	to the log. If this flag is given before there
	      is any output to the log from a command line flag, nothing  will
	      be written to the	log and	the log	file will not be created. If a
	      command line flag	generates a log	message	before	this  flag  is
	      processed, the log file will be created but no log messages will
	      be written to it once this flag is processed. To guarantee that
	      no attempt will be made to open a log file, give this flag first.

	      Be quiet - do not	chat while working

       -u,--unicode-normalization <mode>
	      Select Unicode normalization mode. The choices of	 mode  are:  c
	      for  normalization  form	C  (NFC),  d  for normalization	form D
	      (NFD), C for normalization form KC (NFKC), D  for	 normalization
	      form KD (NFKD), and n for	no normalization. The default is NFC.
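       The difference between these modes can be seen with Python's
       unicodedata module. The sketch below maps the single-letter arguments
       listed above to Python's form names. This mapping table is an
       illustration and is not part of msort.

```python
import unicodedata

# Single-letter -u arguments mapped to Unicode normalization form names.
FORMS = {"c": "NFC", "d": "NFD", "C": "NFKC", "D": "NFKD"}


def normalize(text: str, mode: str = "c") -> str:
    if mode == "n":
        return text  # no normalization
    return unicodedata.normalize(FORMS[mode], text)


decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT
precomposed = "\u00e9"   # 'é' as a single code point
ligature = "\ufb01"      # LATIN SMALL LIGATURE FI

print(decomposed == precomposed)             # False: different code points
print(normalize(decomposed) == precomposed)  # True under NFC
print(normalize(ligature, "C"))              # fi (NFKC expands the ligature)
```

       Without normalization, the two spellings of "é" would compare as
       different strings. The compatibility forms (KC and KD) additionally fold
       characters such as ligatures into their plain equivalents.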

   Key specific	options
       -e,--character-range <m,n>
	      Sort on characters m through n. Positive indices start from one.
	      Negative indices indicate	position with respect to  the  end  of
	      the  record.   For example, the range 3,-2 consists of the third
	      character	through	the next-to-last character.
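       The index arithmetic for a range such as 3,-2 can be sketched in
       Python. This function is a hypothetical illustration. It treats both
       ends as inclusive, as the description above states, and does not
       handle an index of zero.

```python
def char_range(record: str, m: int, n: int) -> str:
    """Return characters m through n inclusive. Positive indices count from
    1 at the start; negative indices count from -1 at the end."""
    length = len(record)
    start = m - 1 if m > 0 else length + m
    end = n if n > 0 else length + n + 1  # exclusive end for slicing
    return record[start:end]


print(char_range("abcdefg", 3, -2))  # cdef: third through next-to-last
```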

       -n,--position <POS>(,<POS>)
	      Sort on the specified POS	or contiguous range of POSs,  where  a
	      POS  is  of  the	form <field number>(.<character	number>). Both
	      counts begin at one.  Field numbers but  not  character  numbers
	      may  be negative,	in which case they are counted from the	right.
	      Thus, 1.2	is the second character	of the first  field;  -2.1  is
	      the first	character of the next to last field.
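       A Python sketch of how a POS resolves to a character follows. The
       helper names are hypothetical. The field splitting is an assumption:
       every occurrence of any separator character ends a field, so runs of
       separators produce empty fields.

```python
import re
from typing import Tuple


def parse_pos(pos: str) -> Tuple[int, int]:
    """Parse '<field>(.<char>)' into (field, char); char defaults to 1."""
    field_part, _, char_part = pos.partition(".")
    return int(field_part), (int(char_part) if char_part else 1)


def char_at(record: str, pos: str, separators: str = " ") -> str:
    """Return the character that a POS designates within a record."""
    fields = re.split("[" + re.escape(separators) + "]", record)
    field_no, char_no = parse_pos(pos)
    if field_no == 0 or char_no < 1:
        raise ValueError("field numbers are nonzero; character numbers start at 1")
    # Positive field numbers count from the left (1 = first field);
    # negative ones count from the right (-1 = last field).
    target = fields[field_no - 1] if field_no > 0 else fields[field_no]
    return target[char_no - 1]


record = "alpha beta gamma delta"
print(char_at(record, "1.2"))   # l: second character of the first field
print(char_at(record, "-2.1"))  # g: first character of the next-to-last field
```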

       -t,--tag	<tag regexp>
	      Sort on the field	with the specified tag

       -o,--optional <comparison>
	      The key is optional. If it is absent, it compares as less than
	      (<), equal to (=), or greater than (>) a present key, as given by
	      <comparison>.

	      Fold case

	      Fold case	with additional	Turkic conversions.

       -c,--comparison-type <comparison	type>
	      a(ngle), l(exicographic), i(so8601 date/time), t(ime), D(omain
	      name/email address), d(ate), m(onth name), n(umeric), N(umeric
	      string), s(ize), h(ybrid), r(andom)

       -y,--number-system <number system>
	      Specifies the number system expected for this key. This affects
	      only numeric and numeric string keys. There are two special
	      values. If the number system is "all", records may contain any
	      number system that msort can interpret, and different records may
	      contain different number systems. If the number system is "any",
	      records may contain any number system that msort can interpret,
	      but all records must use the same number system; msort sets the
	      number system on the basis of the first record.

       -f,--date-format	<date format>
	      Permutation of ymd with separators, e.g. y-m-d for international
	      date format, m/d/y for American date format, or a	permutation of
	      yd with separators, e.g. y-d, for day-of-year dates. All
	      components  may  be  numbers in any available number system. The
	      month field may also be a	month name, determined by the same de-
	      vices as independent month name fields.
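       How a format such as m/d/y determines which number is the year, month,
       and day can be sketched in Python. This sketch handles only the
       three-component ymd permutations with decimal digits. Day-of-year
       formats, month names, and msort's other number systems are outside its
       scope.

```python
import re
from datetime import date


def parse_date(text: str, fmt: str) -> date:
    """Parse text laid out as described by fmt, a permutation of y, m and d
    joined by separators (e.g. 'y-m-d' or 'm/d/y')."""
    order = [c for c in fmt if c in "ymd"]
    parts = re.findall(r"\d+", text)  # runs of decimal digits; separators ignored
    if sorted(order) != ["d", "m", "y"] or len(parts) != 3:
        raise ValueError(f"cannot parse {text!r} with format {fmt!r}")
    values = dict(zip(order, (int(p) for p in parts)))
    return date(values["y"], values["m"], values["d"])


dates = ["12/25/2009", "01/02/2010", "07/04/2009"]
print(sorted(dates, key=lambda s: parse_date(s, "m/d/y")))
# ['07/04/2009', '12/25/2009', '01/02/2010']
```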

       -W,--sort-order-file-separators <file name>
	      Read  the	 list of characters to be treated as separators	in the
	      sort order definition file.

       -S,--substitutions <file	name>
	      Read substitutions from named file

       -s,--sort-order <file name>|<locale name>|"locale"
	      If the argument is a file	name, it is taken to be	a  sort	 order
	      file  and	 the  sort order for the key is	read from the file. If
	      the argument is a	locale name, the collation rules for that  lo-
	      cale  are	used. If the argument is "locale", the collation rules
	      for the current locale are used.

       -T,--transformations <(d)(e)(s)>
	      Apply the	specified transformations.  d specifies	that  diacrit-
	      ics  are to be stripped. Separately encoded combining diacritics
	      are removed. Characters with  diacritics	represented  by	single
	      codepoints  are  replaced	with the corresponding ASCII character
	      without the diacritics, if there is one.	e specifies  that  en-
	      closed  characters, that is, characters within circles or	paren-
	      theses, are to be	replaced with the  corresponding  plain	 ASCII
	      character	 if there is one.  s specifies that characters in spe-
	      cial styles are to be  replaced  with  the  corresponding	 plain
	      ASCII  character if there	is one.	Stylistic equivalents include:
	      small capitals (e.g. U+1D04), script forms (e.g. U+212C),	 black
	      letter  forms  (e.g.  U+212D),  Arabic  presentation forms (e.g.
	      U+FE81), Hebrew  presentation  forms  (e.g.  U+FB1D),  fullwidth
	      forms  (e.g.  U+FF01),  halfwidth	 forms	(e.g. U+FF7B), and the
	      mathematical alphanumeric	symbols	(e.g. U+1D400).
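       The d transformation can be approximated in Python by decomposing the
       text and discarding combining marks. This sketch covers only d. Letters
       such as ø that have no canonical decomposition pass through unchanged,
       which may differ from msort's mapping to a plain ASCII letter.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Decompose to NFD, then drop nonspacing combining marks (category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


print(strip_diacritics("Ångström"))    # Angstrom
print(strip_diacritics("naïve café"))  # naive cafe
```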

       -x,--exclusion-file <file name>
	      Read exclusions from named file

       -X,--exclude-characters <exclusions>
	      Exclude specified	characters

	      Invert sense of comparisons

	      Reverse characters of key

	      Ignore all but the first character of the	field, after substitu-
	      tions, exclusions, etc.

       Note: long options may not be available on your system.

       sort(1),	uninum(3)

       Bill Poser (

       GNU General Public License (, ver-
       sion 3.

msort				 January 2010			      MSORT(1)

