FreeBSD Manual Pages
UA(1)                            User Commands                           UA(1)

NAME
       ua - find identical sets of files (the name comes from the Hungarian
       word ugyanaz, meaning "the same")

SYNOPSIS
       ua [OPTION]... [FILE]...

DESCRIPTION
       Given a list of files, ua finds the sets of identical ones. ua was
       designed to take input from find or ls and to produce output that is
       trivial to process with line-oriented tools such as sed, xargs, awk,
       wc and grep.

       For example, to count the number of sets of duplicates:

              $ find ~ -type f | ua - | wc -l

       or to find the largest such set:

              $ find ~ -type f | ua -ssep - | \
                awk -Fsep '{if (NF>M) { M=NF;S=$0;}} END {print(S);}'

OPTIONS
       -i     ignore letter case

       -w     ignore white space

       -n     do not ask the file system for file size

       -v     verbose output (printed to stderr), verbose help

       -m max
              consider only the first max bytes in the hash

       -2     perform two-stage hashing: first hash the prefix whose size is
              set with -m and throw away candidates with unique prefix hashes

       -s sep
              separator (default SPACE)

       -p     also print the hash value

       -b size
              set internal buffer size (default 1024)

       -h     this help (-vh gives more verbose help)

       -      read file names from stdin, where each line contains one file
              name (this must also be the last option in the list)

OUTPUT
       Each line of the output represents one set of identical files. The
       columns are the path names separated by sep (-ssep). When -p is set,
       the first column is the hash value. Remember that if -i or -w is set,
       the hash value will likely differ from what md5sum would give.

ALGORITHM
       Calculation proceeds in three steps:

       1.     Ask the file system for the file sizes and throw away files
              with unique byte counts.

       2.     If so requested (-2), calculate a fast hash on a fixed-size
              prefix (given by -m) of the files with the same byte count and
              throw away the ones with unique prefix hash values.

       3.     If exactly two matching files are left in a subset after
              filtering on size and prefix hash, compare these two byte by
              byte; otherwise run the files through a full MD5 hash and deem
              the ones with the same hash identical.

       -w implies -n, since the byte count is irrelevant when white space is
       ignored.

       The two-stage hashing algorithm first calculates identical sets
       considering only a fixed-size prefix (thus the -2 option requires -m)
       and then calculates the final result from these sets. This can be much
       faster when there are many files of the same size or when files are
       compared with white space ignored. When -w and -m max are both set,
       max refers to the first max non-white-space characters.

EXAMPLES
       Get help on usage:

              $ ua -h
              $ ua -vh

       Find identical files in the current directory:

              $ ua *
              $ ls | ua -p -

       In the first case the file names are read from the command line, while
       in the second they are read from the standard input. The latter also
       prints the hash value.

       Compare text files:

              $ ua -iwvb256 f1.txt f2.txt f3.txt

       Compares the three files ignoring letter case and white space.
       Intermediate steps are reported on stderr (-v). -w implies -n, so the
       files are not grouped by size. The internal buffer size is reduced to
       256, since ignoring white space causes data to be moved within the
       buffer.

       Count the sets of identical files under the home directory:

              $ find ~ -type f | ua -2m256 - | wc -l

       Considering the large number of files, the calculation is performed
       with a two-stage hash (-2). Only files that pass the 256-byte prefix
       hash are fully hashed.

       Find identical header files:

              $ find /usr/include -name '*.h' | ua -b256 -wm256 -2s, -

       Ignore white space with -w (and therefore use a smaller buffer,
       -b256). Perform the calculation in two stages (-2), first clustering
       on the first 256 non-white-space characters (-m256). Also separate the
       identical files in the output by commas (-s,).

VERSION
       1.0. ua -h will tell you whether you have the hashed or the tree
       version.

AUTHOR
       (C) Istvan T. Hernadvolgyi, EU.EDGE LLC, 2007
       <istvan.hernadvolgyi@gmail.com>

LICENSE
       This is free software. You may redistribute copies of it under the
       terms of the Mozilla Public License <http://www.mozilla.org/MPL/>.
       There is NO WARRANTY, to the extent permitted by law.

SEE ALSO
       MD5(3), md5sum(1), find(1)

ua 1.0                           November 2007                           UA(1)