DJVU(1)				 DjVuLibre-3.5			       DJVU(1)

       DjVu - DjVu and DjVuLibre.

       Although	 the Internet has given	us a worldwide infrastructure on which
       to build the universal library, much of the world's knowledge, history,
       and  literature	is  still  trapped  on	paper  in the basements	of the
       world's traditional libraries. Many libraries and content owners	are in
       the  process  of	digitizing their collections.  While many such efforts
       involve the painstaking process of converting paper documents to
       computer-friendly form, such as SGML-based formats, the high cost of
       such conversions limits their extent.  Scanning documents and
       distributing the resulting images electronically is not only
       considerably cheaper,
       but also	more faithful to the original document	because	 it  preserves
       its visual aspect.

       Despite	the quickly improving speed of network connections and comput-
       ers, the	number of scanned document images accessible on	the Web	 today
       is relatively small. There are several reasons for this.

       The first reason is the relatively high cost of scanning anything other
       than unbound sheets in black and white.  This problem is slowly going
       away with the appearance	of fast	and low-cost color scanners with sheet

       The second reason is that long-established image	compression  standards
       and  file formats have proved inadequate	for distributing scanned docu-
       ments at	high resolution, particularly color documents.	Not  only  are
       the file sizes and download times impractical, the decoding and
       rendering times are also prohibitive.  A typical magazine page scanned
       in color at 100 dpi in JPEG would occupy 100 KB to 200 KB, but the
       text would be hardly readable: insufficient for screen viewing and
       totally unacceptable for printing.  The same page at 300 dpi would have
       sufficient quality for viewing and printing, but the file size would be
       300 KB to 1000 KB at best, which is impractical for remote access.
       Another major problem is that a fully decoded 300 dpi color image of a
       letter-size page occupies 24 MB of memory and easily causes disk
       swapping.
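
       The 24 MB figure can be checked with a short calculation.  The sketch
       below assumes a US letter page of 8.5 by 11 inches and 3 bytes per
       pixel (24-bit RGB); neither value is stated above, and the function
       name is only illustrative.

```python
def decoded_size_bytes(width_in, height_in, dpi, bytes_per_pixel=3):
    """Bytes needed to hold an uncompressed raster of the given page."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel


# Assumed page size: US letter, 8.5 x 11 inches, scanned at 300 dpi.
size = decoded_size_bytes(8.5, 11, 300)
print(size, "bytes, about", round(size / 2**20, 1), "MiB")
```

       Under these assumptions the raster is 2550 by 3300 pixels, or roughly
       25 million bytes, which agrees with the figure quoted above.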

       The third reason is that digital documents are more than just a
       collection of individual page images.  Pages in a scanned document have
       a natural serial order.  Special provision must be made to ensure that
       flipping pages is instantaneous and effortless so as to maintain a good
       user experience.	Even more important, most  existing  document  formats
       force  users  to	download the entire document first before displaying a
       chosen page.  However, users often want to jump to individual pages  of
       the  document without waiting for the entire document to	download.  Ef-
       ficient browsing	requires efficient random page access, fast sequential
       page  flipping, and quick rendering. This can be	achieved with a	combi-
       nation of advanced compression,	pre-fetching,  pre-decoding,  caching,
       and progressive rendering. DjVu decomposes each page into multiple com-
       ponents (text, backgrounds,  images,  libraries	of  common  shapes...)
       that  may  be  shared  by  several pages	and downloaded on demand.  All
       these requirements call for a very sophisticated	but parsimonious  con-
       trol mechanism to handle	on-demand downloading, pre-fetching, decoding,
       caching,	and progressive	rendering of the page images.  What  is	 being
       considered here is not just a document image compression	technique, but
       a whole platform	for document delivery.

       DjVu is an image compression technique, a document format, and a
       software platform for delivering document images over the Internet
       that fulfills the above requirements.

       The DjVu	image compression is based on three technologies:

       DjVuPhoto, also known as	IW44, is a wavelet-based continuous-tone image
       compression  technique with progressive decoding/rendering.  It is best
       used for	encoding photographic images in	colors or in shades  of	 gray.
       Images are typically half the size of JPEG for the same distortion.

       DjVuBitonal,  also  known  as  JB2, is a	bitonal	image compression that
       takes advantage of repetitions of nearly	identical shapes on  the  page
       (such  as  characters) to efficiently compress text images.  It is best
       used to compress	black and white	images representing  text  and	simple
       drawings.  A typical 300 dpi page in DjVuBitonal occupies 5 to 25 KB (3
       to 8 times better than TIFF-G4 or PDF).
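
       The idea of exploiting repeated shapes can be illustrated with a toy
       shape dictionary.  This is only a sketch: it matches bitmaps exactly,
       whereas the text above speaks of nearly identical shapes, and it does
       no entropy coding.  The bitmaps and names are hypothetical.

```python
def encode_shapes(glyphs):
    """Replace repeated glyph bitmaps by indices into a shape dictionary."""
    dictionary = []   # each distinct bitmap is stored once
    index_of = {}     # bitmap -> position in the dictionary
    references = []   # one dictionary index per glyph on the page
    for bitmap in glyphs:
        if bitmap not in index_of:
            index_of[bitmap] = len(dictionary)
            dictionary.append(bitmap)
        references.append(index_of[bitmap])
    return dictionary, references


# Hypothetical 3x3 bitmaps, one string per row.
A = ("010", "111", "101")
B = ("110", "111", "110")
dictionary, references = encode_shapes([A, B, A, A, B])
print(len(dictionary), references)  # 2 [0, 1, 0, 0, 1]
```

       Five glyphs are stored as two bitmaps plus five small indices, which
       is where the gain on text pages comes from.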

       DjVuDocument is a compression technique specifically designed for color
       digital document images containing both pictures and text, such as a
       page of a magazine.  DjVuDocument represents images as separately
       compressed layers.  The foreground layer is usually compressed with
       DjVuBitonal and contains the text and drawings.  The background layer
       is usually compressed with DjVuPhoto and contains the background
       texture and the pictures at lower resolution.
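
       A rough sketch of how a viewer might recombine such layers follows.
       It assumes a bitonal mask at full resolution, a single foreground
       color, and a background stored at a fraction of the mask resolution.
       These are simplifications made for illustration and not a description
       of the actual decoder.

```python
def compose(mask, fg_color, background, scale):
    """Rebuild a page: mask pixels take fg_color, the others take the
    background, which is stored at 1/scale of the mask resolution."""
    page = []
    for y, row in enumerate(mask):
        out_row = []
        for x, bit in enumerate(row):
            if bit:
                out_row.append(fg_color)
            else:
                out_row.append(background[y // scale][x // scale])
        page.append(out_row)
    return page


# Hypothetical 4x4 mask and 2x2 background (scale factor 2).
mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 1],
]
background = [["white", "gray"], ["gray", "white"]]
page = compose(mask, "black", background, scale=2)
for row in page:
    print(row)
```

       Sharp text edges come from the full-resolution mask, while the
       background can be kept small because it is stored at lower resolution.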

       The DjVu	technology is designed from the	ground up to support the effi-
       cient  delivery	of  digital  documents over the	Internet.  It provides
       various ways to deal with multi-page documents, and various ways	to en-
       rich the	content	with hyper-links, meta-data, searchable	text, etc.

   MIME	types
       The  DjVu  format has an	official MIME type of image/vnd.djvu, which is
       the preferred content-type to be	given by http servers for DjVu	files.
       Unofficial MIME types used historically are image/x.djvu and
       image/x-djvu, which may still be encountered.  Ideally, clients should
       be configured to handle all three.  (For web server configuration help,
       see
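
       A client written in Python could accept all three types along the
       following lines.  This is a minimal sketch: the helper name is
       hypothetical, and the .djvu file extension is an assumption, since
       the text above names only the MIME types.

```python
import mimetypes

OFFICIAL = "image/vnd.djvu"
HISTORICAL = {"image/x.djvu", "image/x-djvu"}


def is_djvu(content_type):
    """True for the official type and for the two historical ones."""
    base = content_type.split(";", 1)[0].strip().lower()
    return base == OFFICIAL or base in HISTORICAL


# Private registry, so the process-wide table is left untouched.
registry = mimetypes.MimeTypes()
registry.add_type(OFFICIAL, ".djvu")  # the extension is an assumption
print(registry.guess_type("book.djvu")[0])
print(is_djvu("image/x-djvu"), is_djvu("image/jpeg"))
```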

   Bundled multi-page documents
       A bundled multi-page DjVu document uses a single file to represent the
       entire  document.   This	 single	file contains all the pages as well as
       ancillary information (e.g. the page directory, data shared by  several
       pages,  thumbnails,  etc.).   Using a single file format	is very	conve-
       nient for storing documents or for sending email	attachments.

       When you	type the URL of	a multi-page document, the DjVu	browser	plugin
       starts  downloading the whole file, but displays	the first page as soon
       as it is	available.  You	can immediately	navigate to other pages	 using
       the DjVu	toolbar.  Suppose however that the document is stored on a re-
       mote web	server.	 You can easily	access the first  page	and  see  that
       this is not the document you wanted.  Although you will never display
       the other pages, the browser is transferring data for these pages and is
       wasting the bandwidth of	your server (and the bandwidth of the Internet
       too).  You could	also see the summary of	the document on	the first page
       and  jump to page 100.  But page	100 cannot be displayed	until data for
       pages 1 to 99 has been received.	 You may have to wait for  the	trans-
       mission of unnecessary page data.  This second problem (the unnecessary
       wait) can be solved using the ``byte serving'' options of the  HTTP/1.1
       protocol.  This option has to be	supported by the web server, the prox-
       ies, the	caches and the browser.	 Byte serving however does  not	 solve
       the first problem (the waste of bandwidth).
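
       Byte serving relies on the HTTP Range request header.  The sketch
       below only builds such a request and does not send it.  The URL and
       the byte offsets are hypothetical, and how a viewer finds the offsets
       of a given page is not covered here.

```python
import urllib.request


def range_request(url, first_byte, last_byte):
    """Build (but do not send) a GET asking for an inclusive byte range."""
    request = urllib.request.Request(url)
    request.add_header("Range", f"bytes={first_byte}-{last_byte}")
    return request


# Hypothetical URL and offsets, for illustration only.
req = range_request("http://example.org/book.djvu", 4096, 8191)
print(req.get_header("Range"))  # bytes=4096-8191
```

       A server that supports byte serving answers such a request with
       status 206 (Partial Content) and only the requested bytes; a server
       that does not may simply return the whole file.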

   Indirect multi-page documents
       Indirect	 multi-page  DjVu  documents solve both	problems.  An indirect
       multi-page DjVu document	is composed of several files.  The  main  file
       is  named  the  index file.  You	can browse a document using the	URL of
       the index file, just like you do	with a	bundled	 multi-page  document.
       The  index file however is very small.  It simply contains the document
       directory and the URLs of secondary files  containing  the  page	 data.
       When  you  browse an indirect multi-page	document, the browser only ac-
       cesses data for the pages you are viewing.  This	can be done at a  rea-
       sonable	speed because the browser maintains a cache of pages and some-
       times pre-fetches a few pages ahead of the current  page.   This	 model
       uses  the  web serving bandwidth	much more effectively.	It also	elimi-
       nates unnecessary delays	when jumping ahead to pages  located  anywhere
       in a long document.
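
       The difference between the two layouts can be estimated with a toy
       transfer model.  All sizes below are hypothetical, and the model
       ignores caching, pre-fetching, and data shared between pages.

```python
def bundled_bytes(page_sizes, target, header=0):
    """Bytes received before page `target` (1-based) can be shown when a
    bundled file is streamed from the start without byte serving."""
    return header + sum(page_sizes[:target])


def indirect_bytes(page_sizes, target, index_size):
    """Bytes received for the same page of an indirect document:
    the index file plus the single page file."""
    return index_size + page_sizes[target - 1]


sizes = [40_000] * 200  # hypothetical: 200 pages of 40 KB each
print(bundled_bytes(sizes, 100))          # 4000000
print(indirect_bytes(sizes, 100, 5_000))  # 45000
```

       With these made-up numbers, jumping to page 100 costs about 4 MB for
       the bundled file and about 45 KB for the indirect document.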

       Every  DjVu image optionally includes so-called annotation chunks.  The
       annotation chunk	is often used to define	hyper-links to other  document
       pages  or  to  arbitrary	web pages.  Annotation chunks can also be used
       for other purposes such as setting the initial viewing mode of a	 page,
       defining	 highlighted  zones,  or storing arbitrary meta-data about the
       page or the document.

   Hidden text
       Every DjVu image optionally includes a hidden text layer that
       associates graphical features with the corresponding text.  The hidden
       text layer is usually generated by running Optical Character
       Recognition software.  This textual information provides for indexing
       DjVu documents and copying/pasting text from DjVu page images.

       DjVu documents sometimes	contain	pre-computed page thumbnails.

       DjVu documents sometimes	contain	a navigation chunk containing an  out-
       line,  that  is,	 a hierarchical	table of contents with pointers	to the
       corresponding document pages.

       The DjVu technology was initially created by a few researchers in AT&T
       Labs between 1995 and 1999.  Lizardtech, Inc. then obtained a
       commercial license from AT&T and continued the development.  They now
       have a variety of solutions for producing and distributing documents
       using the DjVu technology.

       The DjVuZone web site is managed by the few
       AT&T Labs researchers who created the  DjVu  technology	in  the	 first
       place.	We  promote  the  DjVu	technology by providing	an independent
       source of information about DjVu.

       Understanding how little room there is for a proprietary document
       format, Lizardtech released the DjVu Reference Library under the GNU
       General Public License in December 2000.  This library entirely defines
       the compression format and the elementary codecs.  Six months later,
       Lizardtech released an updated DjVu Reference Library as well as the
       source code of the Unix viewer.

       These  two  releases  form the basis of our initial DjVuLibre software.
       We modified the build system to comply with  the	 expectations  of  the
       open  source  community.	 Various bugs and portability issues have been
       fixed.  We also tried to	make it	simpler	to use and install, while pre-
       serving the essential structure of the Lizardtech releases.

       The DjVuLibre software contains the following components:

       bzz(1) A	general	purpose	compression command line program.  Many	inter-
	      nal DjVu data structures are compressed using this technique.

       c44(1) A	DjVuPhoto command line encoder.	This state-of-the-art  wavelet
	      compressor produces DjVuPhoto images from	PPM or JPEG images.

       cjb2(1)
              A DjVuBitonal command line encoder. This soft-pattern-matching
	      compressor produces DjVuBitonal images from PBM images.  It  can
	      encode  images without loss, or introduce	small changes in order
	      to improve the compression ratio.	 The lossless encoding mode is
	      competitive with that of the Lizardtech commercial encoders.

       cpaldjvu(1)
              A DjVuDocument command line encoder for images with few colors.
	      This encoder is well suited to compressing images	with  a	 small
	      number  of  distinct  colors  (e.g. screen-shots).  The dominant
	      color is encoded by the background layer.	 The other colors  are
	      encoded by the foreground	layer.

	      A	 DjVuDocument command line encoder for separated images.  This
	      encoder takes a file  containing	pre-segmented  foreground  and
	      background images	and produces a DjVuDocument image.

	      A	command	line decoder for DjVu images.  This program produces a
	      PNM image	representing any segment of any	page of	a  DjVu	 docu-
	      ment at any resolution.

	      A	stand-alone viewer for DjVu images.  This sophisticated	viewer
	      displays DjVu documents.	It implements document	navigation  as
	      well as fast zooming and panning.

	      A	web browser plugin for viewing DjVu images.  This small	plugin
	      allows for viewing DjVu documents	from web browsers.  It	inter-
	      nally uses djview	to perform the actual work.

              A command line tool for converting DjVu documents into
              PostScript.

	      A	command	line tool for  manipulating  bundled  multi-page  DjVu
	      documents.   This	 program  is  often used to collect individual
	      pages and	produce	a bundled document.

	      A	command	line tool for converting bundled documents to indirect
	      documents	and conversely.

	      A	 powerful  command line	tool for manipulating multi-page docu-
	      ments, creating or editing annotation chunks, creating or	 edit-
	      ing  hidden  text	 layers,  pre-computing	 thumbnail images, and

              A command line tool to extract the hidden text from DjVu
              documents.

	      A	 command  line	tool  for inspecting DjVu files	and displaying
	      their internal structure.

              A command line tool for disassembling DjVu image files.

	      A	command	line tool for assembling DjVu image files.

	      A	CGI program for	generating indirect multi-page DjVu  documents
	      on the fly.

       djvutoxml(1), djvuxmlparser(1)
	      Command line tools to edit DjVu metadata as XML files.

       DjVuLibre comes with a variety of specialized encoders, c44(1) for pho-
       tographic images, cjb2(1) for bitonal images, and cpaldjvu(1)  for  im-
       ages with few distinct colors.  Although	these encoders perform well in
       their specialized domain, they cannot handle  complex  tasks  involving
       segmentation and	multipage encoding.
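
       Choosing among these encoders can be sketched as a small lookup.  The
       category names and file names are hypothetical.  Passing the input
       file followed by the output file, with no options, is an assumption
       to check against each tool's own manual page.

```python
import shutil

# Encoder for each kind of image, as listed in the paragraph above.
ENCODERS = {
    "photo": "c44",             # photographic images
    "bitonal": "cjb2",          # bitonal images
    "few_colors": "cpaldjvu",   # images with few distinct colors
}


def encoder_command(kind, source, target):
    """Return the argument list for the encoder suited to `kind`."""
    try:
        program = ENCODERS[kind]
    except KeyError:
        raise ValueError(f"no specialized encoder for {kind!r}") from None
    return [program, source, target]


cmd = encoder_command("bitonal", "page.pbm", "page.djvu")
print(cmd, "installed:", shutil.which(cmd[0]) is not None)
```

       The resulting list can be handed to subprocess.run once the encoder
       is known to be installed.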

       The Lizardtech commercial products (see
       tions/document) can perform these complex encoding tasks

       Another	solution  is   provided	  by   the   compression   server   at
       (	 This machine uses pre-lizardtech pro-
       totype encoders from AT&T Labs and performs almost as well as the  com-
       mercial Lizardtech encoders.  Please note that the Any2DjVu compression
       server comes with no guarantee, that nothing is	done  to  ensure  that
       your  documents	will  remain  confidential, and	that there is only one
       computer	working	for the	whole planet.

       Numerous	people have contributed	to the DjVu  source  code  during  the
       last five years.  Please submit a SourceForge bug report to update the
       following list.

	  Yoshua Bengio, Leon Bottou, Chakradhar Chandaluri, Regis M. Chaplin,
	  Ming	Chen,  Parag  Deshmukh,	Royce Edwards, Andrew Erofeev, Praveen
	  Guduru, Patrick Haffner, Paul	G. Howard, Orlando Keise, Yann Le Cun,
	  Artem	 Mikheev,  Florin  Nicsa, Joseph M. Orost, Steven Pigeon, Bill
	  Riemers, Patrice Simard, Jeffery Triggs, Luc	Vincent,  Pascal  Vin-

DjVuLibre-3.5			  10/11/2001			       DJVU(1)

