FreeBSD Manual Pages


BULK_EXTRACTOR(1)	    General Commands Manual	     BULK_EXTRACTOR(1)

       bulk_extractor - Scans a disk image for regular expressions and other
       content.

       bulk_extractor -o output_dir [options] [image | -R dir]

       bulk_extractor scans a disk image (or any other file) for a large number
       of pre-defined regular expressions and other kinds of content. These
       items are called features. When it finds a feature, bulk_extractor
       writes it to an output file. Each line of the output file contains the
       byte offset at which the feature was found, a tab, and the feature
       itself. Features therefore cannot contain the end-of-line character.
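       The following Python sketch splits lines in this format into their
       fields. It is not part of bulk_extractor, and parse_feature_lines is a
       hypothetical name. The sketch assumes that lines starting with # or
       lacking a tab are not feature lines, so it skips them. It also ignores
       any fields after the second, and it keeps the offset as a string rather
       than assuming it is always a plain integer.

```python
from typing import Iterable, Iterator, Tuple


def parse_feature_lines(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (offset, feature) pairs from lines of a feature file.

    Assumptions (not from the manual): lines starting with '#' and lines
    without a tab are skipped; fields after the second are ignored.
    """
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("#") or "\t" not in line:
            continue
        fields = line.split("\t")
        # fields[0] is the byte offset, fields[1] the feature itself
        yield fields[0], fields[1]


sample = ["# banner line\n", "1024\tuser@example.com\n", "4096\thttp://example.org/\n"]
print(list(parse_feature_lines(sample)))
# [('1024', 'user@example.com'), ('4096', 'http://example.org/')]
```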

       bulk_extractor includes native support for EnCase (.E01) and AFFLIB
       (.aff) files, if it was compiled and linked on a system containing those
       libraries. Alternatively, the -R option can be used to recursively scan
       and process a directory of individual files (disk images in such a
       directory are treated as ordinary files, not as disk images).

       bulk_extractor is multi-threaded. By specifying the -j option, multiple
       analysis threads can be run. Each thread writes its results into its own
       feature file. The primary thread combines these files when all of the
       secondary threads have completed.
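       The arrangement described above can be sketched in Python: each thread
       keeps its own results, and the primary thread merges them after all
       secondary threads finish. This is an illustration only. The toy
       scanner, which reports every @ byte, is hypothetical, as are the names
       scan_chunk and run_threads. Results are kept in memory rather than in
       feature files. CPython threads also do not run Python code in parallel,
       so the sketch shows the merge structure rather than a speed-up.

```python
import threading
from typing import List, Tuple

Feature = Tuple[int, str]


def scan_chunk(base: int, data: bytes, out: List[List[Feature]], slot: int) -> None:
    """Toy scanner (hypothetical): reports the absolute offset of every '@' byte."""
    local: List[Feature] = []          # results private to this thread
    for i, b in enumerate(data):
        if b == ord("@"):
            local.append((base + i, "@"))
    out[slot] = local                  # each thread writes only its own slot


def run_threads(chunks: List[Tuple[int, bytes]]) -> List[Feature]:
    """Scan (base_offset, data) chunks in threads, then merge in the primary thread."""
    per_thread: List[List[Feature]] = [[] for _ in chunks]
    threads = [
        threading.Thread(target=scan_chunk, args=(base, data, per_thread, slot))
        for slot, (base, data) in enumerate(chunks)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                       # wait for every secondary thread
    # Combine only after all threads have completed, ordered by offset.
    return sorted(f for results in per_thread for f in results)


print(run_threads([(0, b"a@b"), (3, b"@@")]))  # [(1, '@'), (3, '@'), (4, '@')]
```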

       bulk_extractor is a two-phase program. In phase 1 the features are
       extracted. In phase 2 a histogram of the relevant features is created.
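       At its core, phase 2 counts how often each distinct feature occurs. The
       following minimal Python sketch uses collections.Counter. The
       (count, feature) output layout and the alphabetical tie-breaking are
       assumptions, not bulk_extractor's histogram file format.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def feature_histogram(features: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (count, feature) pairs, most frequent first, ties alphabetical."""
    counts = Counter(features)
    return sorted(((n, f) for f, n in counts.items()), key=lambda p: (-p[0], p[1]))


emails = ["a@example.com", "b@example.com", "a@example.com"]
print(feature_histogram(emails))  # [(2, 'a@example.com'), (1, 'b@example.com')]
```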

       bulk_extractor will also create a wordlist of all the words that are
       found in the disk image. This can be used as a dictionary for cracking
       passwords.

       The options are as follows:

       -o outdir
              Specifies the output directory, which will be created by
              bulk_extractor if necessary. If the output directory contains
              data from a partial bulk_extractor run, bulk_extractor will
              attempt to resume where the previous run left off.

       -b bannerfile.txt
              Read the contents of bannerfile.txt and stamp it at the
              beginning of each output file. This is useful when, for
              example, a privacy banner must appear at the top of every
              output file.

       -r alert_list.txt
              Specifies an alert list (or red list), which is a list of terms
              that, if found, will be specifically flagged in a special alert
              file whose name begins with the letters ALERT. The alert list
              may contain individual terms, which must be matched in their
              entirety and are case-sensitive, or wildcards using standard
              Unix globbing (e.g. *). Globbed terms are case-insensitive.
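       The following Python sketch expresses those matching rules. It assumes
       that a term counts as a glob if it contains *, ? or [. It uses the
       standard fnmatch module for Unix shell-style wildcards, and the function
       name is_alert is hypothetical. Because fnmatchcase never folds case,
       both sides of a globbed comparison are lowercased.

```python
from fnmatch import fnmatchcase
from typing import Iterable

GLOB_CHARS = set("*?[")  # assumption: these characters mark a globbed term


def is_alert(feature: str, alert_terms: Iterable[str]) -> bool:
    """Return True if feature matches any entry of the alert list."""
    for term in alert_terms:
        if GLOB_CHARS & set(term):
            # Globbed terms: case-insensitive, so lowercase both sides.
            if fnmatchcase(feature.lower(), term.lower()):
                return True
        elif feature == term:
            # Plain terms: whole feature, case-sensitive.
            return True
    return False


alerts = ["Project-X", "*@competitor.example"]
print(is_alert("project-x", alerts))              # False
print(is_alert("Bob@Competitor.Example", alerts))  # True
```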

       -w stop_list.txt
              Specifies a stop list (or white list), which is a list of terms
              that, if found, will be placed in a special stopped file (rather
              than in the main file). The stop list may also contain globbed
              terms.

       -s frac[:passes]
	      Specify random sampling parameters.

       -p path/format
              Open a disk image and print the information found at path. The
              format specification may be r for raw output or h for hex
              output.
              Specify -p - for interactive mode.
              Specify -p -http for HTTP mode.

       -F <rfile>
              Specifies a file of regular expressions to be used as search
              terms.

       -f <regex>
              Specifies a regular expression to be used as a search term.
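       The following Python sketch is a rough model of what -f and -F do. It
       runs each regular expression over a buffer and emits offset<TAB>feature
       lines in the output-file format described earlier. Python's re module
       stands in for bulk_extractor's own regular-expression engine, whose
       syntax may differ. The name regex_scan is hypothetical.

```python
import re
from typing import Iterable, List


def regex_scan(data: bytes, patterns: Iterable[str]) -> List[str]:
    """Return 'offset<TAB>feature' lines for every match of every pattern.

    Patterns and matches are treated as Latin-1 so any byte sequence round-trips.
    """
    compiled = [re.compile(p.encode("latin-1")) for p in patterns]
    hits = []
    for rx in compiled:
        for m in rx.finditer(data):
            hits.append((m.start(), m.group().decode("latin-1")))
    hits.sort()  # report features in offset order
    return [f"{off}\t{feat}" for off, feat in hits]


image = b"\x00\x00call 555-1234 or 555-9876\x00"
print(regex_scan(image, [r"\d{3}-\d{4}"]))  # ['7\t555-1234', '19\t555-9876']
```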

       -q nn  Quiet mode. Print only every nn-th status report.
              Specify -1 for no status output.

              The scan_wordlist scanner extracts only words that are between
              n1 and n2 characters in length.
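       The following Python sketch shows length-bounded word extraction. The
       manual does not say what scan_wordlist counts as a word, so the pattern
       used here (runs of ASCII letters and digits) is an assumption. The
       inclusive bounds and the order-preserving de-duplication are also
       assumptions.

```python
import re
from typing import List

# Assumption: a "word" is a run of ASCII letters and digits.
WORD_RE = re.compile(rb"[A-Za-z0-9]+")


def extract_words(data: bytes, n1: int, n2: int) -> List[str]:
    """Return distinct words with n1 <= length <= n2, in order of first appearance."""
    seen = set()
    words = []
    for m in WORD_RE.finditer(data):
        w = m.group().decode("ascii")
        if n1 <= len(w) <= n2 and w not in seen:
            seen.add(w)
            words.append(w)
    return words


print(extract_words(b"pw=hunter2; user=alice; x=1", 3, 7))  # ['hunter2', 'user', 'alice']
```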

       These options are useful for tuning operation:

       -C NN  Specifies the size of the context window.

       -S fr:<name>:window=NN
              Sets the context window for feature recorder <name> to NN.

       -S fr:<name>:window_before=NN
              Sets the context window before the feature to NN for feature
              recorder <name>.

       -S fr:<name>:window_after=NN
              Sets the context window after the feature to NN for feature
              recorder <name>.
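       The following Python sketch shows roughly how the before and after
       context around a feature can be cut, assuming the window sizes are byte
       counts. The function name context is hypothetical. The sketch does not
       show how bulk_extractor formats context when it writes a feature file.

```python
from typing import Tuple


def context(data: bytes, start: int, length: int, before: int, after: int) -> Tuple[bytes, bytes]:
    """Return (before_bytes, after_bytes) around the feature data[start:start+length].

    Both windows are clipped at the edges of the buffer.
    """
    lo = max(0, start - before)
    hi = min(len(data), start + length + after)
    return data[lo:start], data[start + length:hi]


print(context(b"0123456789abcdef", 6, 3, 4, 2))  # (b'2345', b'9a')
```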

       -G NN  Specifies the page size.

       -g NN  Specifies the size of the margin in bytes.
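       The page size and margin suggest a common design for scanning large
       images. The image is read one page at a time, together with a margin of
       extra bytes past the end of the page, so that a feature starting near
       the end of a page is not cut in half. The following Python sketch
       illustrates that idea under this assumption. The function name and the
       rule of reporting only features that start inside the page are
       illustrative and do not come from the manual.

```python
from typing import Iterator, Tuple


def pages_with_margin(data: bytes, page_size: int, margin: int) -> Iterator[Tuple[int, bytes, int]]:
    """Yield (page_offset, buffer, page_len) for each page of data.

    buffer is the page plus up to `margin` following bytes, so a feature that
    starts in the page but ends past its boundary can still be matched. To
    avoid duplicates, a scanner would report only features that start in the
    first page_len bytes of buffer.
    """
    for off in range(0, len(data), page_size):
        page_len = min(page_size, len(data) - off)
        yield off, data[off:off + page_size + margin], page_len


for off, buf, n in pages_with_margin(b"abcdefghij", 4, 2):
    print(off, buf, n)
# 0 b'abcdef' 4
# 4 b'efghij' 4
# 8 b'ij' 2
```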

       -j NN  Use NN threads for analysis. Normally you do not need to specify
              this, as the default is the number of processors on the current
              system.
       -m NN  Have bulk_extractor wait at most NN minutes for scanners to
              finish after all data have been read.

       The following options are useful for debugging:

       -V     Print the version number.

       -R outdir
              Restarts the program from where it left off for a particular
              directory.
       -B nn  Set the dedup Bloom filter to nn bits. This is used by the
              scan_wordlist scanner.
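       A Bloom filter answers "have I probably seen this word before?" using a
       fixed number of bits. It trades occasional false positives for bounded
       memory. The following Python sketch is a generic Bloom filter used for
       word de-duplication. The hash construction (slices of a SHA-256
       digest), the three probes per word, and treating nn as the bit count
       itself are all assumptions, not bulk_extractor's implementation.

```python
import hashlib
from typing import Iterable, Iterator, List


class BloomFilter:
    """Minimal Bloom filter with num_bits bits and num_hashes probes per item.

    It can report false positives (an unseen word looks already seen) but
    never false negatives.
    """

    def __init__(self, num_bits: int, num_hashes: int = 3) -> None:
        if num_bits <= 0 or not 1 <= num_hashes <= 8:
            raise ValueError("need num_bits > 0 and 1 <= num_hashes <= 8")
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        digest = hashlib.sha256(item.encode("utf-8")).digest()  # 32 bytes
        for i in range(self.num_hashes):
            # Each probe uses a different 4-byte slice of the digest.
            yield int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.num_bits

    def add(self, item: str) -> bool:
        """Insert item; return True if it was probably already present."""
        present = True
        for pos in self._positions(item):
            byte, bit = divmod(pos, 8)
            if not self.bits[byte] & (1 << bit):
                present = False
                self.bits[byte] |= 1 << bit
        return present


def dedup_words(words: Iterable[str], num_bits: int = 1 << 20) -> List[str]:
    """Keep the first occurrence of each word, using the filter as the 'seen' set."""
    bloom = BloomFilter(num_bits)
    return [w for w in words if not bloom.add(w)]


print(dedup_words(["secret", "hunter2", "secret", "alice"]))
```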

       -M nn  Specifies a maximum recursion depth of nn.
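       Suppose the recursion depth bounds how many layers of decoded data (for
       example, compressed data found inside the image) are scanned again. The
       following Python sketch shows such a limit using gzip. It only checks
       for a gzip header at the start of each buffer and handles no other
       encodings. The name expand_gzip is hypothetical.

```python
import gzip
import zlib
from typing import List, Tuple

GZIP_MAGIC = b"\x1f\x8b"


def expand_gzip(data: bytes, max_depth: int, depth: int = 0) -> List[Tuple[int, bytes]]:
    """Return (depth, buffer) for data and each nested gzip layer,
    performing at most max_depth levels of decompression."""
    layers = [(depth, data)]
    if depth < max_depth and data.startswith(GZIP_MAGIC):
        try:
            inner = gzip.decompress(data)
        except (OSError, EOFError, zlib.error):
            return layers  # corrupt or truncated stream: stop here
        layers.extend(expand_gzip(inner, max_depth, depth + 1))
    return layers


nested = gzip.compress(gzip.compress(b"user@example.com"))
print([d for d, _ in expand_gzip(nested, max_depth=5)])  # [0, 1, 2]
print([d for d, _ in expand_gzip(nested, max_depth=1)])  # [0, 1]
```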

       -z pagenum
	      Start on page number pagenum.

       -Y <o1>[-<o2>]
              Start at input offset o1, optionally ending at offset o2.
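       The following Python sketch parses the o1[-o2] argument. It assumes
       decimal byte offsets and rejects a range whose end precedes its start.
       bulk_extractor's own parser may accept other forms.

```python
from typing import Optional, Tuple


def parse_offset_range(spec: str) -> Tuple[int, Optional[int]]:
    """Parse 'o1' or 'o1-o2' into (start, end); end is None when o2 is omitted."""
    start_text, dash, end_text = spec.partition("-")
    start = int(start_text)
    end = int(end_text) if dash else None
    if end is not None and end < start:
        raise ValueError(f"end offset {end} precedes start offset {start}")
    return start, end


print(parse_offset_range("1048576"))          # (1048576, None)
print(parse_offset_range("1048576-2097152"))  # (1048576, 2097152)
```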

       -dN    Enable debugging level N.

       Finally, you can control scanners with these options:

       -P <dir>
              Specifies a directory in which to find plugins.

       -E scanner
              Turns off all scanners, then enables scanner.

       -e scanner
              Enables a scanner.

       -x scanner
              Disables a scanner.
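       The way -E, -e and -x combine can be modeled as edits to a set of
       enabled scanners. The following Python sketch assumes the options are
       applied in command-line order. The function name apply_scanner_options
       and the scanner names in the usage line are illustrative.

```python
from typing import Iterable, List, Set, Tuple


def apply_scanner_options(all_scanners: Iterable[str], default_on: Iterable[str],
                          options: List[Tuple[str, str]]) -> Set[str]:
    """Return the enabled scanners after applying (flag, name) options in order."""
    known = set(all_scanners)
    enabled = set(default_on)
    for flag, name in options:
        if name not in known:
            raise ValueError(f"unknown scanner: {name}")
        if flag == "-E":
            enabled = {name}       # turn off every scanner, then enable this one
        elif flag == "-e":
            enabled.add(name)
        elif flag == "-x":
            enabled.discard(name)
        else:
            raise ValueError(f"unknown flag: {flag}")
    return enabled


scanners = ["email", "accts", "wordlist"]
print(sorted(apply_scanner_options(scanners, ["email", "accts"],
                                   [("-E", "email"), ("-e", "wordlist")])))
# ['email', 'wordlist']
```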

       bulk_extractor is based on a feature extractor and named entity
       recognizer developed for SBook in 1991. The feature extractor was
       repurposed for disk images in 2003. The stand-alone bulk_extractor
       program was rewritten in 2005 and publicly released in 2007. The
       multi-threaded bulk_extractor was released in May 2010.

       Simson Garfinkel <>

User Manuals			   OCT 2013		     BULK_EXTRACTOR(1)

