dupd(1)			    General Commands Manual		       dupd(1)

       dupd - find duplicate files

       dupd COMMAND [OPTIONS]

       dupd scans all the files	in the given path(s) to	find files with	dupli-
       cate content.

       The sets of duplicate files are not (by default) displayed during a
       scan.  Instead, the duplicate info is saved into a database which can
       be queried with subsequent commands without having to scan all the
       files again.

       As  noted  in the synopsis, the first argument to dupd must be the com-
       mand to run.  The command is one	of:

       scan - scan files looking for duplicates

       report -	show duplicate report from last	scan

       file - check for	duplicates of one file

       ls  - list info about every file

       dups - list all duplicate files

       uniques - list all unique files

       refresh - remove	deleted	files from the database

       validate	- revalidate all duplicates in database

       rmsh - create shell script to delete all	duplicates (use	with care!)

       help - show brief usage info

       usage - show this documentation

       man - show this documentation

       license - show license info

       version - show version and exit

       scan - Perform the filesystem scan for duplicates.

       -p, --path PATH
	      Recursively scan the directory tree starting at this path.   The
	      path  option can be given	multiple times to specify multiple di-
	      rectory trees to scan.  If no path option	is given, the  default
	      is to start scanning from	the current directory.

       -m, --minsize SIZE
	      Minimum  size  (in  bytes)  to  include in scan.	By default all
	      files with 1 byte	or more	are scanned.  In  practice  duplicates
	      in  files	that small are rarely interesting, so you can speed up
	      the scan by ignoring smaller files.

       -D, --hdd
	      Select HDD (hard disk drive) scan	mode.  By default dupd is  op-
	      timized  for  scanning filesystems on SSDs (solid	state drives).
	      When scanning files located on a HDD (and	not in the  filesystem
	      cache)  setting  this  option enables an alternate scan strategy
	      which is faster on a HDD.  While the HDD mode is almost always
	      faster on HDDs and the SSD mode is nearly always faster on a
	      SSD, there can be edge case scenarios where this is not true.
	      Note in particular that if the file content is in	the filesystem
	      cache (which it will often be if you are using dupd in  the  in-
	      teractive	 filesystem  exploration  mode it was designed to sup-
	      port) then the SSD (default) mode	is usually faster even if  the
	      underlying files are stored on a HDD.

       --hidden
	      Include hidden files (and hidden directories) in the scan.  By
	      default these are not included.

       --db PATH
	      Override the default database file  location.   The  default  is
	      $HOME/.dupd_sqlite.   If	you override the path during scan, re-
	      member to	provide	this argument and the path for subsequent  op-
	      erations so the database can be found.

       --nodb Do not create a database file.  Duplicate	info is	sent to	stdout
	      instead.	Not recommended, as having the	database  is  required
	      for most of the subsequent commands documented below.

       -I, --hardlink-is-unique
	      Consider	hard links to the same file content as unique.	By de-
	      fault hard links are listed as duplicates.  See HARD LINKS  sec-
	      tion  below.   Note that if this option is given during scan, it
	      cannot be	given during interactive operations.

       --stats-file FILE
	      On completion, create (or	append to) FILE	and  save  some	 stats
	      from the run.  These are the same stats that are displayed
	      in verbose mode but are more suitable for programmatic
	      consumption.

       --file-count COUNT
	      Estimated	maximum	number of files	to scan.  The default is  five
	      million files.  This is only relevant if --hardlink-is-unique is
	      also given.
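
       For example, to scan two directory trees (the paths are illustrative)
       stored on a hard disk, ignoring files smaller than one megabyte:

	      % dupd scan --path /data/photos --path /backup/photos \
	                  --hdd --minsize 1048576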

       report -	Display	the list of duplicates.

       --cut PATHSEG
	      Remove prefix PATHSEG from the file paths	in the report  output.
	      This  can	 reduce	 clutter  in  the output text if all the files
	      scanned share a long identical prefix.

       --minsize SIZE
	      Report only duplicate sets which consume at least	this much disk
	      space.   Note  this is the total size occupied by	all the	dupli-
	      cates in a set, not their	individual file	size.

       --format	NAME
	      Produce the report in this output	format.	 NAME is one of	 text,
	      csv, json.  The default is text.

       Note:  The  database  format  generated by scan is not guaranteed to be
       compatible with future versions.	You should run	report	(and  all  the
       other  commands below which access the database)	using the same version
       of dupd that was	used to	generate the database.
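
       For example, to list only the duplicate sets which waste at least one
       megabyte, in CSV format:

	      % dupd report --format csv --minsize 1048576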

       file - Report duplicate status of one file.

       To check	whether	one given file still has known duplicates use the file
       operation.   Note  that this does not do	a new scan so it will not find
       new duplicates.	This checks whether the	duplicates  identified	during
       the  previous  scan still exist and verifies (by	hash) whether they are
       still duplicates.

       --file PATH
	      Required:	The file to check

       --cut PATHSEG
	      Remove prefix PATHSEG from the file paths	in the report output.

       --exclude PATH
	      Ignore any duplicates  under  PATH  when	reporting  duplicates.
	      This  is	useful	if  you	intend to delete the entire tree under
	      PATH, to make sure you don't delete all copies of	the file.

       -I, --hardlink-is-unique
	      Ignore the existence of hard links to the file for the purpose
	      of considering whether the file is unique.
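
       For example, to check whether a file (name illustrative) still has
       duplicates outside a directory tree you intend to delete:

	      % dupd file --file photos/trip/img_0042.jpg --exclude photos/trip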

       ls, uniques, dups - List	matching files.

       While  the  file	 command checks	the duplicate status of	a single file,
       these commands do the same for all the files in a given directory tree.

       ls - List all files, show whether they have duplicates or not.

       uniques - List all unique files.

       dups - List all files which have	known duplicates.

       --path PATH
	      Start from this directory	(default is current directory)

       --cut PATHSEG
	      Remove prefix PATHSEG from the file paths in the output.

       --exclude PATH
	      Ignore any duplicates under PATH when reporting duplicates.

       -I, --hardlink-is-unique
	      Ignore the existence of hard links to the file for the purpose
	      of considering whether the file is unique.
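
       For example, to list the files under the docs directory which have no
       known duplicates, and then those which do:

	      % dupd uniques --path docs
	      % dupd dups --path docs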

       refresh - Refreshing the	database.

       As you remove duplicate files, they remain listed in the dupd data-
       base.  Ideally you'd run the scan again to rebuild the database.  Note
       that  re-running	 the  scan  after deleting some	duplicates can be very
       fast because the	files are in the cache,	so that	is the best option.

       However,	when dealing with a set	of files large enough that they	 don't
       fit  in the cache, re-running the scan may take a long time.  For those
       cases the refresh command offers	a much faster alternative.

       The refresh command checks whether all the files	in the	dupd  database
       still exist and removes those which do not.

       Be sure to consider the limitations of this approach.  The refresh com-
       mand does not re-verify whether all  files  listed  as  duplicates  are
       still  duplicates.   It also, of	course,	does not detect	any new	dupli-
       cates which may have appeared since the last scan.

       In summary, if you have only been deleting duplicates since the	previ-
       ous scan, run the refresh command.  It will prune all the deleted files
       from the	database and will be much faster than a	scan.  However,	if you
       have been adding	and/or modifying files since the last scan, it is best
       to run a	new scan.
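
       For example, after deleting some of the reported duplicate files,
       prune them from the database with:

	      % dupd refresh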

       validate	- Validating the database.

       The validate operation is primarily for testing but is documented  here
       as it may be useful if you want to reconfirm that all duplicates	in the
       database	are still truly	duplicates.

       In most cases you will be better	off re-running the scan	operation  in-
       stead of	using validate.

       Validate is fairly slow as it will fully hash every file in the
       database.
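
       For example, to re-verify that everything still listed as a duplicate
       really is one:

	      % dupd validate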

       rmsh - Create shell script to remove duplicate files.

       As a policy dupd	never modifies the filesystem!

       As a convenience	for those times	when it	is desirable to	 automatically
       remove  files,  this operation can create a shell script	to do so.  The
       output is a shell script (to stdout) which you can run to delete your
       files (if you're feeling lucky).

       Review the generated script carefully to see if it truly does what you
       want.

       Automated deletion is generally not very	useful because it takes	 human
       intervention  to	decide which of	the duplicates is the best one to keep
       in each case.  While the	content	is the same, one of them  may  have  a
       better file name	and/or location.

       Optionally, the shell script can create either soft or hard links from
       each removed file to the copy being kept.  The options are mutually
       exclusive.
       --link Create symlinks for deleted files.

       --hardlink
	      Create hard links for deleted files.
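
       For example (the script name is illustrative), to generate the script,
       review it and only then run it:

	      % dupd rmsh > remove_dups.sh
	      % less remove_dups.sh
	      % sh remove_dups.sh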

       Additional global options

       -q     Quiet, suppress all output.

       -v     Verbose  mode.  Can be repeated multiple times for ever increas-
	      ing verbosity.

       -h     Show brief help summary.

       --db PATH
	      Override the default database file location.

       -F, --hash NAME
	      Specify a different hash function.  This applies to any command
	      which uses content hashing.  NAME is one of: md5, sha1, sha512.
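
       For example (the database path is illustrative), to run a verbose scan
       using sha512 hashing and a non-default database location, and then
       report from that database:

	      % dupd scan --path $HOME --hash sha512 --db /tmp/dupd.db -v
	      % dupd report --db /tmp/dupd.db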

       Are  hard  links	duplicates or not?  The	answer depends on "what	do you
       mean by duplicates?" and	"what are you trying to	do?"

       If your primary goal for	removing duplicates is to save disk space then
       it  makes  sense	to ignore hardlinks.  If, on the other hand, your pri-
       mary goal is to reduce filesystem clutter then it makes more  sense  to
       think of	hardlinks as duplicates.

       By  default dupd	considers hardlinks as duplicates. You can switch this
       around with the --hardlink-is-unique option.  This option can be	 given
       either  during scan or to the interactive reporting commands (file, ls,
       uniques,	dups).
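
       For example, to treat hard linked copies as unique already at scan
       time:

	      % dupd scan --path $HOME --hardlink-is-unique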

       Scan all	files in your home directory and then show the sets of	dupli-
       cates found:

	      %	dupd scan --path $HOME

	      %	dupd report

       Scan all	files in the current directory which is	on a HDD:

	      %	dupd scan --hdd

       Show duplicate status (duplicate or unique) for all files in the docs
       subdirectory:

	      %	dupd ls	--path docs

       I'm about to delete docs/old.doc	but want to check one last  time  that
       it is a duplicate and I want to review where those duplicates are:

	      %	dupd file --file docs/old.doc -v

       Read  the documentation in the dupd 'docs' directory or online documen-
       tation for more usage examples.

       dupd exits with status code 0 on	success, non-zero on error.



