BZIP(1)			    General Commands Manual		       BZIP(1)

NAME
       bzip, bunzip - a	block-sorting file compressor, v0.21

SYNOPSIS
       bzip [ -cdfkvVL123456789	] [ filenames ...  ]
       bunzip [	-kvVL ]	[ filenames ...	 ]

DESCRIPTION
       Bzip  compresses	 files using the Burrows-Wheeler-Fenwick block-sorting
       text compression	algorithm.  Compression	is generally considerably bet-
       ter  than  that	achieved by more conventional LZ77/LZ78-based compres-
       sors, and competitive with all but the best of the PPM family  of  sta-
       tistical	compressors.
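
       As a rough illustration of the block-sorting idea, here is a naive
       sketch of the Burrows-Wheeler transform and its inverse.  It is not
       bzip's implementation: the function names are hypothetical, and a
       real compressor sorts far more cleverly than by building every
       rotation.

```python
def bwt(data):
    """Naive block sort: sort all rotations and keep the last column.

    Returns the last column plus the row index of the original string,
    which the inverse needs in order to know where to start.
    """
    n = len(data)
    if n == 0:
        return b"", 0
    # Each rotation is identified by its starting offset in data.
    rows = sorted(range(n), key=lambda i: data[i:] + data[:i])
    last_column = bytes(data[(i - 1) % n] for i in rows)
    return last_column, rows.index(0)


def inverse_bwt(last_column, primary):
    """Undo bwt() by following the last-to-first column mapping."""
    n = len(last_column)
    if n == 0:
        return b""
    # A stable sort of positions by byte value reconstructs the first column
    # and tells us which row each rotation moves to when rotated left by one.
    order = sorted(range(n), key=lambda i: last_column[i])
    out = bytearray()
    row = primary
    for _ in range(n):
        row = order[row]
        out.append(last_column[row])
    return bytes(out)


transformed, start = bwt(b"banana")
print(transformed, start)
print(inverse_bwt(transformed, start))
```

       The transform gathers equal symbols into runs, which is what makes
       the later coding stages effective.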

       The  command-line options are deliberately very similar to those	of GNU
       Gzip, but they are not identical.

       Bzip expects a list of file names to  follow  the  command-line	flags.
       Each  file is replaced by a compressed version of itself, with the name
       "original_name.bz".  Each compressed file  has  the  same  modification
       date and	permissions as the corresponding original, so that these prop-
       erties can be correctly restored	at decompression time.	File name han-
       dling  is  naive	in the sense that there	is no mechanism	for preserving
       original	file names, permissions	and dates in  filesystems  which  lack
       these  concepts,	or have	serious	file name length restrictions, such as
       MS-DOS.

       Bzip and	bunzip will not	overwrite existing files; if you want this  to
       happen, you should delete them first.

       If  no file names are specified,	bzip compresses	from standard input to
       standard	output.	 In this case, bzip will decline to  write  compressed
       output  to  a  terminal,	as this	would be entirely incomprehensible and
       therefore pointless.

       Bunzip (or bzip -d) decompresses and restores all specified files
       whose  names  end  in  ".bz".   Files  without this suffix are ignored.
       Again, supplying	no filenames causes decompression from standard	 input
       to standard output.
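
       The naming rules described above can be sketched as follows.  The
       helper names are hypothetical and the code only models the rules
       stated in this page (append ".bz", skip names without the suffix,
       never overwrite); it is not taken from bzip's source.

```python
SUFFIX = ".bz"


def compressed_name(name):
    """Name of the file that replaces `name` after compression."""
    return name + SUFFIX


def decompressed_name(name):
    """Name restored by decompression, or None if the file is ignored."""
    if not name.endswith(SUFFIX):
        return None  # files without the ".bz" suffix are ignored
    return name[: -len(SUFFIX)]


def may_write(target, existing):
    """Existing files are never overwritten; `existing` is a set of names."""
    return target is not None and target not in existing


print(compressed_name("report.txt"))
print(decompressed_name("report.txt.bz"))
print(decompressed_name("report.txt"))
```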

       You can also compress or	decompress exactly one named file to the stan-
       dard output by giving the -c flag.

       Compression is  always  performed,  even	 if  the  compressed  file  is
       slightly	 larger	 than  the  original.  The worst case expansion	is for
       files of	zero length, which expand to  seventeen	 bytes.	  Random  data
       (including  the	output of most file compressors) is coded at about 8.1
       bits per	byte, giving an	expansion of around 1%.
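
       The expansion figure follows directly from the coding rate: data
       stored at 8 bits per byte that is coded at about 8.1 bits per byte
       grows by 8.1 / 8 - 1.  A small worked sketch (the function name is
       hypothetical):

```python
def expansion_percent(coded_bits_per_byte):
    """Growth, in percent, of data that originally used 8 bits per byte."""
    return (coded_bits_per_byte / 8.0 - 1.0) * 100.0


# 8.1 bits per byte works out to an expansion of 1.25%, i.e. "around 1%".
print(round(expansion_percent(8.1), 2))
```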

       As a self-check for your	protection, bzip uses 32-bit CRCs to make sure
       that  the  decompressed version of a file is identical to the original.
       This guards against corruption of the compressed data, and against
       undetected bugs in bzip (hopefully very unlikely).  The chance of data
       corruption going undetected is microscopic, about one in four
       billion for each file processed.  Be aware, though, that the check
       occurs upon decompression, so it can only tell you that something is
       wrong.  It can't help you recover the original uncompressed data.
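
       The shape of such a check can be sketched with the CRC-32 routine
       from Python's zlib module.  This is an assumption made for
       illustration: this page does not say which CRC-32 variant bzip
       uses, so the values need not match what bzip stores.  The "one
       chance in four billion" figure is simply 1 in 2^32.

```python
import zlib


def checksum(data):
    """32-bit CRC of data (zlib's CRC-32, assumed here for illustration)."""
    return zlib.crc32(data) & 0xFFFFFFFF


def verify(decompressed, stored_crc):
    """True if the decompressed bytes match the CRC recorded at compression."""
    return checksum(decompressed) == stored_crc


# A 32-bit check has 2**32 possible values, about 4.29 billion.
POSSIBLE_CRC_VALUES = 2 ** 32

original = b"some file contents"
stored = checksum(original)
print(verify(original, stored))
print(verify(b"some file c0ntents", stored))
print(POSSIBLE_CRC_VALUES)
```

       As the text says, a mismatch only reports that something is wrong;
       nothing in the check helps reconstruct the original data.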

       Return values: 1	for an abnormal	exit, otherwise	0.

MEMORY MANAGEMENT
       Bzip compresses large files in blocks.  The block size affects both the
       compression ratio achieved, and the amount of memory  needed  both  for
       compression  and	 decompression.	  The  flags -1	through	-9 specify the
       block size to be	100,000	bytes through 900,000 bytes (the default)  re-
       spectively.  At decompression-time, the block size used for compression
       is read from the	header of the compressed file, and bunzip  then	 allo-
       cates  itself  just  enough memory to decompress	the file.  Since block
       sizes are stored in compressed files, it follows that the flags -1 to
       -9 are irrelevant to decompression and are ignored.  Compression
       and decompression requirements, in bytes, can be estimated as:

	     Compression:   300k + ( 8 x block size )

	     Decompression: 6 x	block size

       The 300k	constant is for	a frequency-count table, used in  the  sorting
       phase of	compression.
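
       The two estimates can be written out as a small calculator.  This
       is a sketch of the formulas above only; the function names are
       hypothetical, and "k" is taken to mean 1000 bytes because the
       block sizes are given as 100,000 to 900,000 bytes.

```python
K = 1000  # "k" as used in this page: block sizes are multiples of 100,000


def block_size(level):
    """Block size in bytes selected by the flags -1 through -9."""
    if not 1 <= level <= 9:
        raise ValueError("level must be between 1 and 9")
    return level * 100 * K


def compression_bytes(level):
    """Estimated compression memory: 300k + (8 x block size)."""
    return 300 * K + 8 * block_size(level)


def decompression_bytes(level):
    """Estimated decompression memory: 6 x block size."""
    return 6 * block_size(level)


for level in (1, 5, 9):
    print(level, compression_bytes(level) // K, decompression_bytes(level) // K)
```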

       Larger  block  sizes give rapidly diminishing marginal returns; most of
       the compression comes from the first two	or three hundred  k  of	 block
       size,  a	 fact worth bearing in mind when using bzip on small machines.
       It is also important to appreciate that the  decompression  memory  re-
       quirement  is set at compression-time by	the choice of block size.  So,
       for example, if you are compressing files which you think might	possi-
       bly be decompressed on a	4-megabyte machine, you	might want to select a
       block size of 200k or 300k, so the decompressor will draw  1200	kbytes
       or 1800 kbytes respectively, which is probably the limit of what's
       comfortable on a 4-meg machine.  In general, though, you should try to
       use the largest block size memory constraints allow.  Compression and
       decompression speeds are virtually unaffected by block size.

       Another significant point applies to files which	fit in a single	 block
       -- that means most files	you'd encounter	using a	large block size.  The
       amount of real memory touched is	proportional to	the size of the	 file,
       since  the  file	 is  smaller than a block.  For	example, compressing a
       file 20,000 bytes long with the flag -9 will cause  the	compressor  to
       allocate	 [by  the formula, in practice a little	more] 7500k of memory,
       but only	touch 300k + 20000 * 8 = 460 kbytes of it.  Similarly, the de-
       compressor will allocate	5400k but only touch 20000 * 6 = 120 kbytes.
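
       The arithmetic in this example can be reproduced with a short
       sketch.  The function names are hypothetical; the code only
       restates the formulas from this section, with the file size
       standing in for the block size when the file fits in one block.

```python
K = 1000  # "k" as used in this page


def touched_when_compressing(file_bytes, level=9):
    """Memory actually touched: 300k + 8 x (bytes held in one block)."""
    in_block = min(file_bytes, level * 100 * K)
    return 300 * K + 8 * in_block


def touched_when_decompressing(file_bytes, level=9):
    """Memory actually touched: 6 x (bytes held in one block)."""
    in_block = min(file_bytes, level * 100 * K)
    return 6 * in_block


# A 20,000-byte file compressed with -9:
print(touched_when_compressing(20000) // K)    # 460 (kbytes)
print(touched_when_decompressing(20000) // K)  # 120 (kbytes)
```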

       Here is a table which summarises	the maximum memory usage for different
       block sizes.  Also recorded is the total	compressed size	for  14	 files
       of the Calgary Text Compression Corpus totalling	3,141,622 bytes.  This
       column gives some feel for how  compression  varies  with  block	 size.
       These  figures  tend  to	understate the advantage of larger block sizes
       for larger files, since the Corpus is dominated by smaller files.

                       Compress   Decompress   Corpus size
                Flag    usage       usage        (bytes)

                 -1     1100k        500k        905958
                 -2     1900k       1000k        870646
                 -3     2700k       1500k        853650
                 -4     3500k       2000k        840140
                 -5     4300k       2500k        838355
                 -6     5100k       3000k        831695
                 -7     5900k       3500k        827104
                 -8     6700k       4000k        821652
                 -9     7500k       4500k        821652
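
       To get a feel for the diminishing returns, the Corpus column can
       be turned into bits per input byte and into the bytes saved by
       each step up in block size.  The sketch below uses only the
       figures from the table; the variable names are hypothetical.

```python
CORPUS_BYTES = 3141622  # total size of the 14 corpus files

# Compressed corpus size in bytes for the flags -1 through -9.
COMPRESSED = {
    1: 905958, 2: 870646, 3: 853650, 4: 840140, 5: 838355,
    6: 831695, 7: 827104, 8: 821652, 9: 821652,
}


def bits_per_byte(level):
    """Average output bits used per input byte at this block size."""
    return COMPRESSED[level] * 8 / CORPUS_BYTES


# Bytes saved by moving from each flag to the next larger block size.
savings = [COMPRESSED[n] - COMPRESSED[n + 1] for n in range(1, 9)]

total_gain = COMPRESSED[1] - COMPRESSED[9]
gain_by_level_3 = COMPRESSED[1] - COMPRESSED[3]

for level in range(1, 10):
    print("-%d  %.3f bits/byte" % (level, bits_per_byte(level)))
print(savings)
print("share of the -1 to -9 gain reached by -3: %.2f" % (gain_by_level_3 / total_gain))
```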

OPTIONS
       -c     Compress or decompress to	standard output.  -c requires  you  to
	      supply exactly one file name, and	this file is compressed	or de-
	      compressed to standard out.

       -d     Force decompression.  Bzip and bunzip are	really the  same  pro-
	      gram,  and  the decision about whether to	compress or decompress
	      is done on the basis of which name is used.  This	flag overrides
	      that mechanism, and forces bzip to decompress.

       -f     The complement to -d: forces compression, regardless of the
              invocation name.

       -k     Keep (don't delete) input	files during compression or decompres-
	      sion.

       -v     Verbose  mode  --	 show the compression ratio for	each file pro-
	      cessed.

       -V     Be very verbose.	This spews out lots of information during com-
	      pression which is	primarily of interest for debugging purposes.

       -L     Display the software license terms and conditions.

       -1 to -9
	      Set  the	block  size to 100 k, 200 k .. 900 k when compressing.
	      Has no effect when decompressing.	 See MEMORY MANAGEMENT above.
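
       The interaction between the invocation name and the -d and -f
       flags can be sketched as below.  This models the description
       above rather than bzip's source.  Two points are assumptions: the
       name test (a base name beginning with "bunzip"), and the handling
       of -d and -f given together, which this page does not define, so
       the sketch refuses to guess.

```python
import os


def choose_mode(argv0, flags):
    """Return "compress" or "decompress" for a given invocation.

    argv0 is the name the program was started under; flags is a string of
    single-letter options, for example "dk".
    """
    force_decompress = "d" in flags
    force_compress = "f" in flags
    if force_decompress and force_compress:
        # Not specified in this page; treated as an error in this sketch.
        raise ValueError("-d and -f together: behaviour not documented")
    if force_decompress:
        return "decompress"
    if force_compress:
        return "compress"
    # No override: decide from the invocation name (assumed test).
    name = os.path.basename(argv0)
    return "decompress" if name.startswith("bunzip") else "compress"


print(choose_mode("/usr/local/bin/bzip", ""))
print(choose_mode("/usr/local/bin/bunzip", "k"))
print(choose_mode("bunzip", "f"))
```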

PERFORMANCE NOTES
       The sorting phase of compression	gathers	together  similar  strings  in
       the file.  Because of this, files containing very long runs of repeated
       symbols,	like "aabaabaabaab ..."	(repeated several hundred  times)  may
       compress	 extraordinarily slowly.  You can use the -V option to monitor
       progress	in great detail, if you	want.  Decompression  speed  is	 unaf-
       fected.	Such pathological cases	seem rare in practice.
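
       The effect can be demonstrated with a toy experiment that counts
       symbol comparisons while sorting all rotations of a string.  This
       is a deliberately naive sort, not the algorithm bzip uses, and
       the names are hypothetical; it only shows why long repeats make
       string comparisons expensive.

```python
import random
from functools import cmp_to_key


def rotation_sort_cost(text):
    """Sort all rotations of text; return the number of symbols compared."""
    n = len(text)
    cost = 0

    def compare(i, j):
        nonlocal cost
        # Compare the rotations starting at offsets i and j, symbol by symbol.
        for k in range(n):
            cost += 1
            a = text[(i + k) % n]
            b = text[(j + k) % n]
            if a != b:
                return -1 if a < b else 1
        return 0  # identical rotations force a full-length comparison

    sorted(range(n), key=cmp_to_key(compare))
    return cost


repetitive = "aab" * 100
rng = random.Random(0)
varied = "".join(rng.choice("abcdefghijklmnopqrstuvwxyz") for _ in repetitive)

print(rotation_sort_cost(repetitive))
print(rotation_sort_cost(varied))
```

       On the repetitive input many pairs of rotations agree for their
       whole length, so each comparison is far more costly than on the
       varied input of the same size.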

       Incompressible  or  virtually-incompressible data may decompress	rather
       more slowly than	one would hope.	 This is due to	 naive	implementation
       of  the move-to-front coder, and	of the frequency tables	for the	arith-
       metic coder.
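
       For reference, a naive move-to-front coder looks roughly like the
       sketch below.  This is the textbook form with hypothetical
       function names, not bzip's code.  Each symbol costs a linear
       search and a shuffle of the table, and on incompressible data the
       positions tend to be large, so that work is close to its maximum
       for every byte.

```python
def mtf_encode(data):
    """Replace each byte by its current position in a recency list."""
    table = list(range(256))
    out = []
    for byte in data:
        index = table.index(byte)            # linear search
        out.append(index)
        table.insert(0, table.pop(index))    # move the symbol to the front
    return out


def mtf_decode(codes):
    """Invert mtf_encode by replaying the same table updates."""
    table = list(range(256))
    out = bytearray()
    for index in codes:
        byte = table.pop(index)
        out.append(byte)
        table.insert(0, byte)
    return bytes(out)


codes = mtf_encode(b"banana")
print(codes)              # [98, 98, 110, 1, 1, 1]
print(mtf_decode(codes))  # b'banana'
```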

       Decompression on	Sun Sparc 1's (and  other  low-range  Sparcs)  can  be
       slow, because of	the lack of hardware implementations of	integer	multi-
       ply and divide in the SPARC v7 instruction set.	The situation is  much
       exacerbated  if	bzip  is compiled for a	full SPARC v8 instruction set,
       since this causes the machine to	trap on	each multiply and  divide  in-
       struction.  These traps take control to the relevant software emulation
       of the offending	instruction, but it is much quicker for	 the  compiler
       simply to plant a call to the emulation routine.	 Moral:	be careful how
       you compile bzip	for a Sparc.  If you use GNU C,	 investigate  the  ef-
       fects of	the -msupersparc and -mcypress flags.

       Wildcard	expansion for Windows 95 and NT	loses leading directory	infor-
       mation.	For example, the pathspec "sources\*.c"	is searched  correctly
       for  matching  files,  but the "sources\" bit is	ignored	when the files
       come to be processed, which means bzip won't be able  to	 find  any  of
       them.  This is easy to fix; perhaps some	enterprising soul will send me
       a patch?

CAVEATS
       I/O error messages are not as helpful as	they  could  be.   Bzip	 tries
       hard to detect I/O errors and exit cleanly, but the details of what the
       problem is sometimes seem rather	misleading.

       There is	no -t option to	test the integrity of a	compressed file.  How-
       ever, Unix folks	can do the following:

	  bzip -dcV file.bz > /dev/null

       which causes bzip to do a trial decompression of	file.bz, throwing away
       the result.  You'll be shown the	computed and stored  CRCs.   If	 these
       are  identical,	the  file is almost certainly OK -- see	the discussion
       above on	CRCs for a definition of "almost certainly".  If they're  not,
       bzip will complain loudly.  Note	that file.bz is	left unchanged regard-
       less of the outcome.  Win95/NT folks can	do  the	 same,	but  /dev/null
       will have to be replaced	with something suitable, perhaps NUL.
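
       The same trial decompression can be driven from a script.  The
       sketch below wraps the command shown above using Python's
       subprocess module, whose DEVNULL constant stands in for /dev/null
       or NUL as appropriate.  The function names are hypothetical.  It
       is an assumption that the CRC report is not written to standard
       output (it would otherwise be discarded), and this page does not
       say which exit status a CRC mismatch produces, so the status is
       returned to the caller rather than interpreted.

```python
import subprocess


def trial_decompress_command(path):
    """The command from the example above: bzip -dcV file.bz"""
    return ["bzip", "-dcV", path]


def trial_decompress(path):
    """Run a trial decompression, discarding the decompressed data.

    Anything bzip prints other than the decompressed data is left
    attached to the caller's terminal.  Returns the exit status.
    """
    result = subprocess.run(trial_decompress_command(path),
                            stdout=subprocess.DEVNULL)
    return result.returncode


print(trial_decompress_command("file.bz"))
```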

       This  manual page pertains to version 0.21 of bzip.  It may well	happen
       that some future	version	will use a different compressed	 file  format.
       If  you try to decompress, using	0.21, a	.bz file created with some fu-
       ture version which uses a different compressed file format,  0.21  will
       complain	 that  your  file  "is not a BZIP file".  If that happens, you
       should obtain a more recent version of bzip and use that	to  decompress
       the file.

AUTHOR
       Julian Seward, sewardj@cs.man.ac.uk.

       The  ideas embodied in bzip are due to (at least) the following people:
       Michael Burrows and David Wheeler (for the  block  sorting  transforma-
       tion), Peter Fenwick (for the structured	coding model, and many refine-
       ments), and Alistair Moffat, Radford  Neal  and	Ian  Witten  (for  the
       arithmetic  coder).  I am much indebted for their help, support and ad-
       vice.  See the file ALGORITHMS in the source distribution for  pointers
       to  sources  of	documentation.	 Christian von Roques encouraged me to
       look for	faster sorting algorithms, so  as  to  speed  up  compression.
       Many  people  sent  patches, helped with	portability problems, lent ma-
       chines, gave advice and were generally helpful.

				     local			       BZIP(1)
