UA(1)				 User Commands				 UA(1)

       ua -   find  identical  sets  of	 files	(comes from the	Hungarian word
	      ugyanaz -	meaning	"the same")

       ua [OPTION]... [FILE]...

       Given a list of files, ua finds sets comprised of identical  ones.   ua
       was  designed  to take input from find or ls and	produce	output that is
       trivial to process by line-oriented tools, such as sed, xargs, awk, wc,
       grep etc.  For example, to count the number of sets of duplicates:

	      $	find ~ -type f | ua - |	wc -l

       or to find the largest such set:

	      $	find ~ -type f | ua -ssep - | \
		awk -Fsep '{if (NF>M) {	M=NF;S=$0;}} END {print(S);}'

       -i     ignore letter case

       -w     ignore white spaces

       -n     do not ask the file system for file size

       -v     verbose output (prints stuff to stderr), verbose help

       -m max consider only the	first max bytes	in the hash

       -2     perform two stage	hashing, first hash on the prefix of size  set
	      with -m and throw	away candidates	with unique prefix hashes

       -s sep separator	(default SPACE)

       -p     also print the hash value

       -b size
	      set internal buffer size (default	1024)

       -h     this help	(-vh more verbose help)

       -      read  file  names	 from stdin, where each	line contains one file
	      name (this must also be the last option in the list)

       Each line of the output represents one set of identical files. The
       columns are the path names separated by sep (-ssep). When -p is set,
       the first column will be the hash value. Remember that if -i or -w is
       set, the hash value will likely differ from what md5sum would give.
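
       As an illustration of this format, the following sketch post-processes
       output of the kind that ua -p -s, might produce. The two sample lines,
       their hash strings and their file names are invented placeholders, not
       real ua output; only the layout (hash first, then comma-separated
       paths) follows the description above.

```shell
# Invented sample standing in for the output of `ua -p -s, ...`;
# the hash strings and file names are placeholders.
sample='0123abcd,a.txt,b.txt
4567ef01,x.h,y.h,z.h'

# For every set, print the number of files followed by the hash column.
# NF counts all comma-separated columns; one of them is the hash.
summary=$(printf '%s\n' "$sample" | awk -F, '{ print NF - 1, $1 }')
printf '%s\n' "$summary"
```

       Paths that themselves contain the separator would be split
       incorrectly, so a separator that cannot occur in the file names is the
       safer choice.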

       Calculation proceeds in three steps:

	      1.  Ask  the  FS	for file size and throw	away files with	unique
	      byte counts.

	      2. If so requested (-2), calculate a fast	hash on	 a  fixed-size
	      prefix  (given  by -m) of	the files with the same	byte count and
	      throw away the ones with unique prefix hash values

	      3. If there are exactly two matching files left in a subset
	      after filtering on size and prefix hash, these two are compared
	      byte by byte; otherwise the files go through a full MD5 hash,
	      and the ones with the same hash are deemed identical.
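
       The filtering idea behind steps 1 and 3 can be sketched with standard
       tools. This is not ua's implementation: find_dups is a hypothetical
       helper, cksum stands in for MD5, the prefix stage and the pairwise
       byte comparison are left out, and file names are assumed to contain no
       tabs or newlines.

```shell
# Sketch only: size filter followed by a full-content checksum.
# Prints one line per set of identical files, names separated by spaces.
find_dups() {
    for f in "$@"; do
        # Emit "<size><TAB><name>" for every file.
        printf '%s\t%s\n' "$(wc -c < "$f" | tr -d ' ')" "$f"
    done |
    awk -F'\t' '
        # Step 1: remember every file and count how often each size occurs.
        { size[NR] = $1; name[NR] = $2; count[$1]++ }
        # Keep only files whose byte count is shared with another file.
        END { for (i = 1; i <= NR; i++) if (count[size[i]] > 1) print name[i] }
    ' |
    while IFS= read -r f; do
        # Step 3 stand-in: checksum of the whole file (cksum, not MD5).
        printf '%s\t%s\n' "$(cksum < "$f" | tr ' ' ':')" "$f"
    done |
    LC_ALL=C sort |
    awk -F'\t' '
        # After sorting, files with equal checksums are on adjacent lines.
        $1 == prev { line = line " " $2; n++; next }
        { if (n > 1) print line; prev = $1; line = $2; n = 1 }
        END { if (n > 1) print line }
    '
}
```

       Called as find_dups *, it reads every file at most once for the
       checksum, and skips that read entirely for files whose size is unique.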

       -w  implies  -n,	since the byte count is	irrelevant information in this
       case. The two-stage hashing algorithm first calculates  identical  sets
       considering  only  a fixed-size prefix (thus the	-2 option requires -m)
       and then	from these sets	calculates the final result.  This can be much
       faster when there are many files with the same size or when comparing
       files with whitespace ignored. When -w and -m max are both set, max
       refers to the first max non-whitespace characters.
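
       The interaction of -w and -m can be pictured as follows. prefix_sum is
       a hypothetical helper, cksum stands in for whatever hash ua uses on
       prefixes, and treating the [:space:] class as the set of ignored
       characters is an assumption.

```shell
# Checksum of the first $1 non-whitespace bytes of file $2.
prefix_sum() {
    # Drop all whitespace first, then cut the prefix, so that the
    # limit counts only the characters that remain.
    tr -d '[:space:]' < "$2" | head -c "$1" | cksum
}

# Usage sketch (f1.txt and f2.txt are placeholder names):
#   [ "$(prefix_sum 256 f1.txt)" = "$(prefix_sum 256 f2.txt)" ] && echo candidates
```

       Two files with equal prefix checksums are only candidates; they may
       still differ beyond the prefix, which is why a second stage over the
       full content is needed.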

       Get help	on usage:

	      $	ua -h
	      $	ua -vh

       Find identical files in the current directory:

	      $	ua *
	      $	ls | ua	-p -

       In the first case, the file names are read from the command line, while
       in the second they are read from the standard input. The latter also
       prints the hash value.

       Compare text files:

	      $	ua -iwvb256 f1.txt f2.txt f3.txt

       Compares the three files ignoring letter case and white space.
       Intermediate steps are reported on stderr (-v). Since -w implies -n,
       files are not grouped by size. The internal buffer size is reduced to
       256, since white space causes data to be moved within the buffer.

       Calculate the number of sets of identical files under home:

	      $	find ~ -type f | ua -2m256 - | wc -l

       Considering the large number of files, the  calculation	will  be  per-
       formed  with  a two stage hash (-2).  Only files	that pass the 256 byte
       prefix hash will	be fully hashed.

       Find identical header files:

	      $	find /usr/include -name	'*.h' |	ua -b256 -wm256	-2s, -

       Ignore white spaces -w (thus use	a smaller buffer -b256).  Perform  the
       calculation  in two stages (-2),	first cluster based on the whitespace-
       free first 256 characters (-m256). Also,	separate the  identical	 files
       in the output by	commas (-s,).

       1.0,   ua -h will tell you whether you have the hashed or the tree version.

       (C) Istvan T. Hernadvolgyi, EU.EDGE LLC,	2007

       This is free software.  You may redistribute copies  of	it  under  the
       terms of the Mozilla Public License.
       There is	NO WARRANTY, to	the extent permitted by	law.

       MD5(3), md5sum(1), find(1)

ua 1.0				 November 2007				 UA(1)

