FreeBSD Manual Pages


CATDVI(1)		    General Commands Manual		     CATDVI(1)

NAME
       catdvi - a DVI to plain text converter

SYNOPSIS
       catdvi [-d debuglevel, --debug=debuglevel]
              [-e outenc, --output-encoding=outenc]
              [-p pagespec, --first-page=pagespec]
              [-l pagespec, --last-page=pagespec] [-N, --list-page-numbers]
              [-s, --sequential] [-U, --show-unknown-glyphs] [-h, --help]
              [--version] [--copyright] [dvi-file]

       This manual page documents catdvi version 0.14.

DESCRIPTION
       catdvi reads the DVI (typesetter DeVice Independent) file dvi-file and
       dumps a plain text approximation of the document it describes to
       stdout. If the argument dvi-file is omitted or is a dash (`-'), catdvi
       reads from stdin. Several output encodings (character sets for the
       plain text output) are supported, most notably UTF-8.

       The current version of catdvi is a work in progress; it may not be
       robust enough for production use, but it already works fine with
       linear English text. Many mathematical symbols (e.g. the uppercase
       Greek letters) and moderately complex formulae also come out right.

       The program needs to read the TFM (TeX Font Metric) files
       corresponding to the fonts used in the DVI file. These are searched
       for (and, if necessary and possible, created on the fly) through the
       Kpathsea library.

       To translate a DVI file to text correctly, catdvi must know the input
       encoding of the fonts used in it (i.e. a meaning-preserving mapping
       from font code points to Unicode). Many different font encodings are
       in use. At the time of writing, catdvi understands the following input
       encodings:

       `TEX TEXT'
	      Knuth's original font encoding, also known as OT1.

              A variant of the above.

	      The Cork encoding, also known as T1.

	      The encoding of Knuth's math italic fonts, also known as OML.

	      The encoding of Knuth's math symbol fonts, also known as OMS.

       `TEX MATH EXTENSION' (most of it)
              The encoding of Knuth's math extension fonts (big operators,
              brackets, etc.), also known as OMX.

	      The encoding of Knuth's typewriter type fonts.

	      The encoding of the lasy fonts.

       Henrik Theiling's European currency symbol (`eurosym') font.

       `TEX TEXT COMPANION SYMBOLS 1---TS1' (almost everything)
	      The encoding of the text companion fonts.

       Martin Vogel's symbol (`MarVoSym') font.
              Both the 1998 and the 2000 versions are supported as far as
              possible; about half of the symbols are not representable in
              Unicode.

	      The encoding of the blackboard bold math (`bbm') fonts.

       All AMS fonts except the Cyrillic ones.
              This includes the AMS math symbols group A and group B, Euler
              fraktur, Euler cursive, Euler script and Euler compatible
              extension fonts.
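       Conceptually, each input encoding listed above is a table from font
       code points to Unicode. The Python sketch below illustrates the idea
       for the first sixteen slots of OT1 (`TEX TEXT'), which hold the
       uppercase Greek letters and the f-ligatures in Knuth's fonts. The
       names OT1_LOW and decode_glyphs are hypothetical and are not catdvi's
       own tables; expanding the ligatures into plain letters is likewise an
       illustrative choice. Unmapped code points become `?', as unknown
       glyphs do in catdvi's default output.

```python
# Hypothetical, partial OT1 (`TEX TEXT') table: font code point -> text.
# Only slots 0x00-0x0F are shown; a real converter needs all 128 slots.
OT1_LOW = {
    0x00: "\u0393",  # GREEK CAPITAL LETTER GAMMA
    0x01: "\u0394",  # DELTA
    0x02: "\u0398",  # THETA
    0x03: "\u039B",  # LAMDA
    0x04: "\u039E",  # XI
    0x05: "\u03A0",  # PI
    0x06: "\u03A3",  # SIGMA
    0x07: "\u03A5",  # UPSILON
    0x08: "\u03A6",  # PHI
    0x09: "\u03A8",  # PSI
    0x0A: "\u03A9",  # OMEGA
    # f-ligatures, expanded to plain letters here (an illustrative choice;
    # U+FB00..U+FB04 are the single-character Unicode alternatives).
    0x0B: "ff",
    0x0C: "fi",
    0x0D: "fl",
    0x0E: "ffi",
    0x0F: "ffl",
}


def decode_glyphs(codes, table):
    """Map font code points to text; unmapped code points become '?'."""
    return "".join(table.get(code, "?") for code in codes)
```

       For example, decode_glyphs([0x0C, 0x00], OT1_LOW) returns "fiΓ",
       while a code point missing from the table yields "?".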

       A perfect translation from DVI, which carries no markup, to plain text
       is impossible: DVI only describes the layout of a page, whereas a
       translator such as this one should really know where words and
       paragraphs end and, more importantly, which glyphs should be aligned
       vertically and which should not. The current alignment algorithm tries
       to preserve the relative horizontal positions of word beginnings; this
       works well in most cases. Word breaks are detected using simple
       heuristics; paragraphs are not detected at all (and no paragraph fill
       is attempted).

       The price of alignment is that the output will likely be more than 80
       columns wide, even though catdvi tries very hard not to use more
       columns than strictly necessary. Output is usually less than 120
       columns wide and almost always less than 132. It may be a good idea to
       switch your terminal to one of these column modes if possible.

OPTIONS
       The program follows the usual GNU command line syntax, with long
       options starting with two dashes.

       -d debuglevel, --debug=debuglevel
              Set the debug output level to debuglevel (default is 10). Large
              values will result in lots of debug output, 0 in none at all.
              The maximal debug output level currently used is 150.

       -e outenc, --output-encoding=outenc
              Specify the encoding of the output character set. outenc can be
              one of the numbers or names from the table below. Names are
              case insensitive. The following output encodings should be
              available:

	      0: UTF-8
	      1: US-ASCII
	      2: ISO-8859-1
	      3: ISO-8859-15

              The command catdvi --help (see below) gives a more up-to-date
              list of all compiled-in output encodings. The default encoding
              is 1 (US-ASCII).
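       The number-or-name lookup for outenc, together with the `?'
       substitution for characters the chosen set cannot represent, can be
       sketched in Python as follows. The names OUTPUT_ENCODINGS,
       resolve_encoding and encode_output are hypothetical; Python's built-in
       codecs, whose errors="replace" mode substitutes `?' when encoding,
       stand in for catdvi's own output tables.

```python
# Hypothetical mirror of the table above: list index = number accepted by -e.
OUTPUT_ENCODINGS = ["UTF-8", "US-ASCII", "ISO-8859-1", "ISO-8859-15"]


def resolve_encoding(outenc):
    """Accept a table number or a case-insensitive encoding name."""
    if outenc.isdigit():
        index = int(outenc)
        if index < len(OUTPUT_ENCODINGS):
            return OUTPUT_ENCODINGS[index]
    else:
        for name in OUTPUT_ENCODINGS:
            if name.lower() == outenc.lower():
                return name
    raise ValueError("unknown output encoding: %r" % outenc)


def encode_output(text, outenc="1"):
    """Encode text; characters missing from the target set become b'?'."""
    return text.encode(resolve_encoding(outenc), errors="replace")
```

       With the default encoding 1 (US-ASCII), encode_output("x ≤ y") yields
       b"x ? y", whereas encoding 3 (ISO-8859-15) can still represent the
       euro sign.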

       -p pagespec, --first-page=pagespec
              Do not output pages before page pagespec. Pages can be
              specified in three different ways; the first two are exactly
              the same as for dvips(1).

              A (possibly negative) number num specifies a TeX page number,
              which is stored as the so-called count0 value in the DVI file
              for every page. Plain TeX uses negative page numbers for
              roman-numbered frontmatter (title page, preface, TOC, etc.), so
              the count0 values compare as
                     -1 < -2 < -3 < ... < 1 < 2 < 3 < ...
              There may be several pages with the same count0 value in a
              single DVI file. This usually happens in documents with a
              per-chapter page numbering scheme.

              A number prefixed by an equals sign (`=num') specifies a
              physical page, i.e. the num-th page appearing in the DVI file.
              Numbering starts with 1. Note that with the long form of the
              option you actually need two equals signs, one as part of the
              long option and one as part of the page specification. Example:
		     catdvi --first-page==5 foo.dvi

              The third form of a page specification, two numbers separated
              by a colon (`num1:num2'), is useful for documents with
              separately numbered parts, e.g. chapters. It refers to the page
              with count0 value equal to num2 that catdvi believes to be in
              part num1. Since those part numbers are not stored in the DVI
              file, the program has to guess them: an internal chapter counter
              is increased by one every time the count0 value of the current
              page is not greater (in the above ordering) than that of the
              previous page. The counter is initialized to 1 if the first page
              has a negative count0 value and to 0 otherwise. (A document with
              separately numbered parts will probably have separately numbered
              frontmatter as well, and then this rule keeps the internal
              counter equal to the real-world part numbers.)
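       The count0 ordering, the chapter-counter guess and the three pagespec
       forms fit into a short Python sketch. It illustrates the rules stated
       above under assumptions and is not catdvi's code: the names
       count0_key, Page, number_pages and pagespec_matches are hypothetical,
       and because the ordering above does not say where a count0 value of 0
       falls, the sketch assumes it sorts just before 1. Note that with
       --first-page==5 the option value seen by the parser is "=5".

```python
from dataclasses import dataclass


def count0_key(n):
    """Sort key for -1 < -2 < -3 < ... < 1 < 2 < 3 < ...

    Negative (frontmatter) numbers come first, ordered by magnitude.
    Assumption: 0 sorts after all negatives and just before 1.
    """
    return (n >= 0, abs(n))


@dataclass
class Page:
    physical: int  # position in the DVI file, starting at 1
    count0: int    # TeX page number stored in the DVI file
    chapter: int   # guessed part number


def number_pages(count0_values):
    """Attach physical numbers and guessed chapter counts to each page."""
    pages = []
    chapter = 0
    previous = None
    for physical, count0 in enumerate(count0_values, start=1):
        if previous is None:
            # Start at 1 if the first page has a negative count0, else 0.
            chapter = 1 if count0 < 0 else 0
        elif count0_key(count0) <= count0_key(previous):
            # Not greater than the previous page's count0: a new part begins.
            chapter += 1
        pages.append(Page(physical, count0, chapter))
        previous = count0
    return pages


def pagespec_matches(spec, page):
    """True if pagespec `num', `=num' or `num1:num2' names this page."""
    if spec.startswith("="):
        return page.physical == int(spec[1:])
    if ":" in spec:
        part, count0 = spec.split(":", 1)
        return page.chapter == int(part) and page.count0 == int(count0)
    return page.count0 == int(spec)
```

       For a document numbered i, ii, 1, 2, 1, 2 (count0 values -1, -2, 1,
       2, 1, 2), number_pages assigns the chapters 1, 1, 1, 1, 2, 2, so the
       pagespec "2:1" picks physical page 5, while a plain "1" matches both
       physical pages 3 and 5. The (physical, count0, chapter) triples are
       the values that --list-page-numbers reports; its exact output format
       is not reproduced here.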

       -l pagespec, --last-page=pagespec
              Do not output pages after page pagespec. Pages are specified
              exactly as for the --first-page option above.

       -N, --list-page-numbers
              Instead of the contents of pages, output their physical page
              number, count0 value and chapter count (see the --first-page
              option above for a definition of these).

       -s, --sequential
              Do not attempt to reproduce the page layout; output glyphs in
              the order they appear in the DVI file. This may be useful with,
              e.g., multi-column page layouts.

       -U, --show-unknown-glyphs
              Show the Unicode number of unknown glyphs instead of `?'.

       -h, --help
              Show usage information and a list of available output
              encodings, then exit.

       --version
              Show version information and exit.

       --copyright
              Show copyright information and exit.

ENVIRONMENT
       The usual environment variables TFMFONTS, TEXFONTS, etc. for Kpathsea
       font search and creation apply. Refer to the Kpathsea documentation
       for details.

SEE ALSO
       xdvi(1), dvips(1), tex(1), mktextfm(1), the Kpathsea texinfo
       documentation, utf-8(7).

BUGS
       These things do not work (yet):

       o      No rules (the filled rectangles TeX draws for horizontal and
              vertical lines) are converted.

       o      Extensible recipes (very large brackets, braces, etc. built out
              of several smaller pieces) are not properly handled.

       o      Complicated math formulae are sometimes misaligned (mostly due
              to the lack of appropriate word break heuristics).

       o      Some fonts and font encodings are not recognised yet.

       o      Most mathematical symbols have no representation in the
              available output character sets except Unicode, and hence show
              up as `?' unless UTF-8 output encoding is selected. A textual
              transcription would be desirable.

       Watch out for these:

       o      If there is a space where it does not belong, or no space where
              there should be one, report this as a bug (send the DVI file to
              the catdvi maintainer, stating where in the file the bug is
              seen).

AUTHORS
       catdvi was written by Antti-Juhani Kaijanaho, based on a skeletal
       version by J.H.M. Dassen (Ray). Bjoern Brill made further improvements
       and currently maintains the program.

       The manual page was compiled by Bjoern Brill, using material written by
       the first two program authors.

				8 November 2002			     CATDVI(1)

