UA(1)				 User Commands				 UA(1)

       ua -   find  identical  sets  of	 files	(comes from the	Hungarian word
	      ugyanaz -	meaning	"the same")

       ua [OPTION]... [FILE]...

       Given a list of files, ua finds the sets of identical ones.  ua was
       designed to take input from find or ls and to produce output that is
       trivial to process with line-oriented tools such as sed, xargs, awk,
       wc and grep.  For example, to count the number of sets of duplicates:

	      $	find ~ -type f | ua - |	wc -l

       or to find the largest such set:

	      $	find ~ -type f | ua -ssep - | \
		awk -Fsep '{if (NF>M) {	M=NF;S=$0;}} END {print(S);}'

       -i     ignore letter case

       -w     ignore white spaces

       -n     do not ask the file system for file size

       -v     verbose output (progress is reported on stderr); with -h,
              verbose help

       -m max consider only the	first max bytes	in the hash

       -2     perform two-stage hashing: first hash a prefix whose size is
              set with -m and throw away candidates with unique prefix hashes

       -s sep separator	(default SPACE)

       -p     also print the hash value

       -b size
	      set internal buffer size (default	1024)

       -h     this help	(-vh more verbose help)

       -      read  file  names	 from stdin, where each	line contains one file
	      name (this must also be the last option in the list)

       Each line of the output represents one set of identical files. The
       columns are the path names separated by sep (-ssep). When -p is set,
       the first column is the hash value. Remember that if -i or -w is set,
       the hash value will likely differ from what md5sum would give.
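
       The output format described above can be consumed outside of awk as
       well.  The following is a minimal Python sketch, not part of ua:
       the function name parse_ua_output and the sample lines are
       hypothetical, and the sample only imitates the shape of what
       `ua -p -s, ...` is described to print.

```python
def parse_ua_output(text, sep=" ", with_hash=False):
    """Split ua-style output into (hash_or_None, [paths]) tuples, one per line."""
    sets = []
    for line in text.splitlines():
        if not line:
            continue  # skip empty lines
        columns = line.split(sep)
        if with_hash:
            # With -p the first column is the hash, the rest are path names.
            sets.append((columns[0], columns[1:]))
        else:
            sets.append((None, columns))
    return sets


# Made-up sample data, shaped like the output of `ua -p -s, ...`.
sample = (
    "0cc175b9c0f1b6a831c399e269772661,a/x.h,b/x.h,c/x.h\n"
    "92eb5ffee6ae2fec3ad71c777531578f,notes.txt,notes (copy).txt\n"
)

sets = parse_ua_output(sample, sep=",", with_hash=True)
# The largest set is the one with the most path columns.
largest = max(sets, key=lambda s: len(s[1]))
```

       The second sample line contains a path with a space in it, which is
       why a separator other than the default SPACE (here a comma) is the
       safer choice when the output is parsed by another program.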

       Calculation proceeds in three steps:

	      1.  Ask  the  FS	for file size and throw	away files with	unique
	      byte counts.

              2. If so requested (-2), calculate a fast hash on a fixed-size
              prefix (given by -m) of the files with the same byte count and
              throw away the ones with unique prefix hash values.

              3. If exactly two matching files are left in a subset after
              filtering on size and prefix hash, these two are compared byte
              by byte; otherwise the files go through a full MD5 hash, and
              the ones with the same hash are deemed identical.
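
       The three steps above can be sketched in Python as follows.  This is
       an illustration under stated assumptions, not ua's actual code: all
       function names are hypothetical, and MD5 over the prefix stands in
       for the unspecified "fast hash" of step 2.

```python
import hashlib
import os
from collections import defaultdict


def _group(paths, key):
    """Group paths by key(path) and drop groups with a single member."""
    groups = defaultdict(list)
    for p in paths:
        groups[key(p)].append(p)
    return [g for g in groups.values() if len(g) > 1]


def _md5(path, limit=None, bufsize=1024):
    """MD5 of the whole file, or of its first `limit` bytes."""
    h = hashlib.md5()
    remaining = limit
    with open(path, "rb") as f:
        while remaining is None or remaining > 0:
            n = bufsize if remaining is None else min(bufsize, remaining)
            chunk = f.read(n)
            if not chunk:
                break
            h.update(chunk)
            if remaining is not None:
                remaining -= len(chunk)
    return h.hexdigest()


def _same_bytes(a, b, bufsize=1024):
    """Compare two files chunk by chunk (they already have equal size)."""
    with open(a, "rb") as fa, open(b, "rb") as fb:
        while True:
            ca, cb = fa.read(bufsize), fb.read(bufsize)
            if ca != cb:
                return False
            if not ca:
                return True


def identical_sets(paths, prefix=None):
    """Return lists of identical files; `prefix` mimics -2 with -m prefix."""
    # Step 1: keep only files whose byte count is shared with another file.
    candidates = _group(paths, os.path.getsize)
    # Step 2 (optional): split each group by a hash over a fixed-size prefix.
    if prefix is not None:
        candidates = [sub for g in candidates
                      for sub in _group(g, lambda p: _md5(p, prefix))]
    # Step 3: two files are compared directly, larger groups are fully hashed.
    result = []
    for g in candidates:
        if len(g) == 2:
            if _same_bytes(g[0], g[1]):
                result.append(g)
        else:
            result.extend(_group(g, _md5))
    return result
```

       Calling identical_sets(paths, prefix=256) corresponds roughly to
       `ua -2m256`; leaving prefix unset skips step 2.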

       -w implies -n, since the byte count is irrelevant in this case. The
       two-stage hashing algorithm first calculates identical sets
       considering only a fixed-size prefix (thus the -2 option requires -m)
       and then calculates the final result from these sets.  This can be
       much faster when there are many files with the same size or when
       comparing files with whitespace ignored. When -w and -m max are both
       set, max refers to the first max non-whitespace characters.
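
       The interaction of -i, -w and -m can be illustrated with a small
       Python sketch.  It is an assumption-laden model, not ua's code: the
       function name normalized_md5 is hypothetical, folding to lower case
       is a guess at how letter case is ignored, and "whitespace" is taken
       to mean ASCII whitespace.

```python
import hashlib


def normalized_md5(data, ignore_case=False, ignore_space=False, max_bytes=None):
    """MD5 over `data` after the transformations suggested by -i, -w and -m."""
    if ignore_space:
        # bytes.split() with no argument splits on runs of ASCII whitespace,
        # so joining the pieces removes all of it.
        data = b"".join(data.split())
    if ignore_case:
        data = data.lower()  # assumption: case is folded to lower case
    if max_bytes is not None:
        # Truncate after whitespace removal, so the limit counts
        # non-whitespace characters, as the text above describes.
        data = data[:max_bytes]
    return hashlib.md5(data).hexdigest()


# Differently formatted inputs hash the same once case and whitespace are ignored.
same = normalized_md5(b"Int  X;\n", True, True) == normalized_md5(b"int x;", True, True)
```

       Because the hash is computed over the transformed bytes, it differs
       from the plain MD5 of the file whenever a transformation actually
       changes the data, which is why the value printed with -p need not
       match md5sum.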

       Get help	on usage:

	      $	ua -h
	      $	ua -vh

       Find identical files in the current directory:

	      $	ua *
	      $	ls | ua	-p -

       In the first case, the file names are read from the command line,
       while in the second they are read from the standard input. The latter
       also prints the hash value.

       Compare text files:

	      $	ua -iwvb256 f1.txt f2.txt f3.txt

       Compares the three files ignoring letter case and whitespace.
       Intermediate steps are reported on stderr (-v). The -w implies -n,
       thus files are not grouped by size. The internal buffer size is
       reduced to 256, since ignoring whitespace causes data to be moved in
       the buffer.

       Count the sets of identical files under the home directory:

	      $	find ~ -type f | ua -2m256 - | wc -l

       Considering  the	 large	number	of files, the calculation will be per-
       formed with a two stage hash (-2).  Only	files that pass	the  256  byte
       prefix hash will	be fully hashed.

       Find identical header files:

	      $	find /usr/include -name	'*.h' |	ua -b256 -wm256	-2s, -

       Ignore whitespace (-w), and therefore use a smaller buffer (-b256).
       Perform the calculation in two stages (-2), first clustering on the
       first 256 non-whitespace characters (-m256). Also, separate the
       identical files in the output with commas (-s,).

       1.0,  ua	-h will	tell you whether you have the hashed or	the tree  ver-

       (C) Istvan T. Hernadvolgyi, EU.EDGE LLC,	2007

       This is free software.  You may redistribute copies of it under the
       terms of the Mozilla Public License.
       There is NO WARRANTY, to the extent permitted by law.

       MD5(3), md5sum(1), find(1)

ua 1.0				 November 2007				 UA(1)

