DICTFMT(1)							    DICTFMT(1)

NAME
       dictfmt - formats a DICT	protocol dictionary database

SYNOPSIS
       dictfmt	-c5|-t|-e|-f|-h|-j|-p [options]	 basename
       dictfmt	-i|-I [options]

DESCRIPTION
       dictfmt reads a file, FILE, on stdin and creates a dictionary database
       named basename.dict that conforms to the DICT protocol.  It also
       creates an index file named basename.index.  By default, the index is
       sorted according to the C locale, and only alphanumeric characters and
       spaces are used in sorting; this may be changed with the --locale and
       --allchars options.  (basename is commonly chosen to correspond to the
       basename of FILE, but this is not mandatory.)

       Unless the database is extremely small, it is highly recommended that
       basename.dict be compressed with /usr/bin/dictzip to create
       basename.dict.dz.  (dictzip is included in the dictd source package.)

       FILE  may  be in	any of the several formats described by	the format op-
       tions -c5, -t, -e, -f, -h, -j, -p, -i or	-I.  Exactly one of these  op-
       tions must be given.
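
       As a hedged illustration of the workflow described above, the
       following sketch builds a tiny input in the -f layout, formats it, and
       compresses the result.  The file names sample.txt and sample and the
       dictionary content are invented for this example, and the dictfmt and
       dictzip steps run only if those programs are found on PATH.

```shell
#!/bin/sh
# Work in a scratch directory so nothing is overwritten.
# All file names here are hypothetical.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Input in the -f layout: two header lines in column 0, then headwords in
# column 0 with indented definitions on the following lines.
cat > sample.txt <<'EOF'
Sample Dictionary
An invented two-entry word list
apple
   A round fruit.
banana
   A long yellow fruit.
EOF

# Format and compress only when the tools are installed.
if command -v dictfmt >/dev/null 2>&1; then
    dictfmt -f sample < sample.txt      # writes sample.dict and sample.index
    if command -v dictzip >/dev/null 2>&1; then
        dictzip sample.dict             # creates sample.dict.dz
    fi
fi
```

       Because neither -u nor -s is given here, the URL and short name of
       the resulting database will be shown as "unknown".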

       dictfmt prepends several headers to the .dict file.  The
       00-database-url header gives the value of the -u option as the URL of
       the site from which the original database was obtained.  The
       00-database-short header gives the value of the -s option as the short
       name of the dictionary.  (This "short name" is the identifying name
       listed by the "dict -D" option.)  If the -u and/or -s options are
       omitted, these values will be shown as "unknown", which is undesirable
       for a publicly distributed database.

       The date	of conversion (formatting) is given  in	 the  00-database-info
       header.	All text in the	input file prior to the	first headword (as de-
       fined by	the appropriate	formatting option) is appended to this header.
       All  text  in the input file following a	headword, up to	the next head-
       word, is	copied unchanged to the	.dict file.
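
       The header handling described above can be tried with a sketch like
       the following, which uses the -j layout.  The URL, the short name, and
       the file names are placeholders chosen for illustration, and dictfmt
       is invoked only if it is installed.

```shell
#!/bin/sh
# Scratch directory; every name and URL below is a placeholder.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Input in the -j layout: text before the first headword goes to the
# headers, and each headword is enclosed in colons in column 0.
cat > jargon.txt <<'EOF'
An invented excerpt used only for this example.
:foo: A sample metasyntactic variable.
:bar: Another sample metasyntactic variable.
EOF

if command -v dictfmt >/dev/null 2>&1; then
    # -u and -s supply the 00-database-url and 00-database-short values.
    dictfmt -j -u "http://example.org/jargon" -s "Example Jargon Excerpt" \
        jargon < jargon.txt
    # Show the generated headers at the top of the database.
    head -n 20 jargon.dict
fi
```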

FORMATTING OPTIONS
       -c5    FILE is formatted	with headwords preceded	by 5  or  more	under-
	      score  characters	(_) and	a blank	line.  All text	until the next
	      headword is considered the definition.  Any leading `@'  charac-
	      ters are stripped	out, but the file is otherwise unchanged. This
	      option was written to format the CIA WORLD FACTBOOK 1995.

       -t     Implies the -c5, --without-info and --without-headword options.
              Use this option if the input database comes from the
              dictunformat utility.

       -e     FILE is in html  format,	with  the  headword  tagged  as	 bold.
	      (<B>headword - </B>)
	      This  option  was	 written to format EASTON'S 1897 BIBLE DICTIO-
	      NARY.  A typical entry from Easton is:

	      <A NAME="T0000005">
	      <B>Abagtha - </B>
	      one of the seven eunuchs	in  Ahasuerus's	 court	(Esther	 1:10;
	      2:21).

	      This is converted	to:
	      Abagtha
		 one  of  the seven eunuchs in Ahasuerus's court (Esther 1:10;
	      2:21).

              The heading <A NAME="T0000005"> is omitted, and the headword
              `Abagtha' is indexed.

	      NOTE:  This option should	be used	with caution.  It removes sev-
	      eral html	tags (enough to	format Easton properly), but not  all.
	      The  Makefile  that was originally written to format dict-easton
	      uses sed scripts to modify certain cross reference tags.	It may
	      be  necessary  to	 pipe  the input file through a	sed script, or
	      hack the source of dictfmt in order  to  properly	 format	 other
	      html databases.

       -f     FILE is formatted with the headwords starting in column 0, with
              the definition indented at least one space (or tab character) on
              subsequent lines.  The third line starting in column 0 is taken
              as the first headword, and the first two lines starting in
              column 0 are treated as part of the 00-database-info header.
              This option was written to format the F.O.L.D.O.C.

       -h     FILE is formatted	with the headwords starting in column 0,  fol-
	      lowed  by	 a  comma,  with the definition	continuing on the same
	      line.  All text before the first single character	 line  is  in-
	      cluded in	00-database-info header, and lines with	only one char-
	      acter are	omitted	from the .dict file.  The first	headword is on
	      the  line	 following the first single character line.  The head-
	      word is indexed; the text	of the file is not changed.  This  op-
	      tion was written to format HITCHCOCK'S BIBLE NAMES DICTIONARY.

       -j     FILE is formatted with headwords starting in column 0, enclosed
              in colons, followed by the definition.  The colons surrounding the
	      headword are removed, and	the headword is	indexed.  Lines	begin-
	      ning with	'*', '=', or '-' are also removed.   All  text	before
	      the  first headword is included in the headers.  This option was
	      written to format	the JARGON FILE.
	      NOTE: Some recent	versions of the	JARGON FILE had	 three	blanks
	      inserted before the first	colon at each headword.	 These must be
	      removed before processing	with dictfmt.  (sed scripts have  been
	      used  for	this purpose. ed, awk, or perl scripts are also	possi-
	      ble.)

       -p     FILE is formatted	with `%h' in column 0, followed	 by  a	blank,
	      followed by the headword,	optionally followed by a line contain-
	      ing `%d' in column 0.  The definition starts  on	the  following
	      line.   The  first  line	beginning '%h' and any lines beginning
	      '%d' are stripped	from the .dict file, and  '%h  '  is  stripped
	      from  in front of	the headword.  All text	before the first head-
	      word is included in the headers.	The second line	beginning '%h'
	      is taken as the first headword.  This option was written to for-
	      mat Jay Kominek's	elements database.

       -i -I  These two options differ from all other formatting options.
              They re-sort (according to dictd requirements) an .index file
              given on stdin.  No .dict file is generated; only the re-sorting
              is performed.  Input resembling a three- or four-column .index
              file is expected.  -i expects decimal offset and length, while
              -I expects them in base64 format.
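
       A minimal sketch of re-sorting with -i follows.  It assumes that the
       columns are tab-separated and that the re-sorted index is written to
       stdout, neither of which is stated explicitly above.  The file names,
       offsets, and lengths are made up and refer to no real .dict file.

```shell
#!/bin/sh
# Scratch directory; file names are hypothetical.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Three columns (assumed tab-separated): headword, decimal offset,
# decimal length.  The entries are deliberately out of order.
printf 'pear\t120\t40\napple\t0\t55\nfig\t55\t65\n' > unsorted.index

if command -v dictfmt >/dev/null 2>&1; then
    # Assumption: dictfmt -i writes the re-sorted index to stdout.
    dictfmt -i < unsorted.index > sorted.index
    cat sorted.index
fi
```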

OPTIONS
       -u url Specifies the URL of the site from which the raw database was
              obtained.  If this option is specified, any 00-database-url
              headword in the input, along with its definition, is ignored.

       -s name
              Specifies the name and, optionally, the version and date, of the
              database.  (If this contains spaces, it must be quoted.)  If
              this option is specified, any 00-database-short headword in the
              input, along with its definition, is ignored.

       -L     display license and copyright information

       -V     display version information

       -D     output debugging information

       --help display a	help message

       --locale locale
              Specifies the locale used for sorting.  If no locale is
              specified, the "C" locale is used.  UTF-8 mode additionally
              requires --utf8.

       --8bit Generates the database in 8-bit mode; see also the --locale
              option.  Note: This option is deprecated.  Use it only for
              creating 8-bit (non-UTF-8) dictionaries.  To create a UTF-8
              dictionary, use the --utf8 option instead.

       --utf8 Creates a UTF-8 database.

       --allchars
              Specifies that all characters should be used for the search.  By
              default, only alphabetic and numeric characters and spaces are
              written to the .index file and therefore used in searches.
              Creates the special entry 00-database-allchars.

       --case-sensitive
	      makes  the  search  case	sensitive.   Creates the special entry
	      00-database-case-sensitive.

       --headword-separator sep
	      sets the headword	separator, which allows	several	words to  have
	      the same definition.  For	example, if '--headword-separator %%%'
	      is given,	and the	input file contains 'autumn%%%fall', both 'au-
	      tumn'  and  'fall'  will be indexed as  headwords, with the same
	      definition.

       --index-data-separator sep
              Sets the index/data separator, which allows the first and fourth
              columns of the .index file to be set independently.  The first
              column is treated as an index column (which the MATCH command
              searches) and the fourth column as a result column (from which
              MATCH takes the values it returns); the two columns are
              completely independent of each other.  The default value for
              this separator is the ASCII character "\034".

       --break-headwords
              Multiple headwords will be written on separate lines in the
              .dict file.  For use with --headword-separator.

       --index-keep-orig
              When --utf8 is specified, headwords are lowercased and
              non-alphanumeric characters are removed from them before they
              are saved to the .index file, in order to simplify the search.
              When the --index-keep-orig option is used, a fourth column is
              created (if necessary) in the .index file; it contains the
              original headword, which is returned by the MATCH command.  This
              option may be useful to prevent "AT&T" from being converted to
              "ATT" or to keep the uppercase first letter of proper nouns.

       --without-headword
	      headwords	will not be included in	.dict file

       --without-header
	      header will not be copied	to DB info entry

       --without-url
	      URL will not be copied to	DB info	entry

       --without-time
	      time of creation will not	be copied to DB	info entry

       --without-ver
	      By default dictfmt creates a special entry  00-database-dictfmt-
	      X.Y.Z  that  contains  (in .dict file) dictfmt version in	format
	      dictfmt-X.Y.Z. This option suppresses this.

       --without-info
	      DB info entry will not  be  created.   This  may	be  useful  if
	      00-database-info	headword  is expected from stdin (dictunformat
	      outputs it).

       --columns columns
              By default dictfmt wraps strings read from stdin to 72 columns.
              This option changes this default.  If it is set to zero or a
              negative value, wrapping is turned off.

       --default-strategy strategy
              Sets the default search strategy for the database.  It will be
              used instead of strategy '.'.  The special entry
              00-database-default-strategy is created for this purpose.  This
              option may be useful, for example, for dictionaries containing
              mainly phrases rather than single words.  Use this option only
              if you are absolutely sure what you are doing.

       --mime-header mime_header
              When a client sends the OPTION MIME command to dictd,
              definitions found in this database are prefixed with the
              specified MIME header.  Creates the special entry
              00-database-mime-header.
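
       Several of the options above can be combined.  The sketch below
       reuses the 'autumn%%%fall' example from --headword-separator with an
       input in the -f layout.  The file names, URL, and short name are
       invented, and dictfmt runs only if it is installed.

```shell
#!/bin/sh
# Scratch directory; every name and URL below is a placeholder.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# -f layout: two header lines, then one entry whose headword line
# carries two headwords joined by the separator %%%.
cat > seasons.txt <<'EOF'
Seasons
An invented word list for this example
autumn%%%fall
   The season between summer and winter.
EOF

if command -v dictfmt >/dev/null 2>&1; then
    dictfmt -f --headword-separator %%% --break-headwords \
        -u "http://example.org/seasons" -s "Example Seasons List" \
        seasons < seasons.txt
    # List the first column of the index; per the description above,
    # both 'autumn' and 'fall' should be indexed.
    cut -f 1 seasons.index
fi
```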

CREDITS
       dictfmt	was  written  by  Rik  Faith (faith@cs.unc.edu)	as part	of the
       dict-misc package.  dictfmt is distributed under	the terms of  the  GNU
       General	Public	License.  If you need to distribute under other	terms,
       write to	the author.

AUTHOR
       This manual page was written by Robert D. Hilliard
       <hilliard@debian.org>.

SEE ALSO
       dict(1),	 dictd(8),  dictzip(1),	 dictunformat(1), http://www.dict.org,
       RFC 2229

			       25 December 2000			    DICTFMT(1)
