
FreeBSD Manual Pages


dtsrfzkfiles(special file)			    dtsrfzkfiles(special file)

       dtsrfzkfiles -- Describes the formats of	DtSearch fzk files


       An fzk file contains one	or more	documents to be	loaded into a database
       in a  simple  canonical	format.	 It  is	 read  by  both	 dtsrload  and
       dtsrindex. It is typically a transient file created only for loading
       and indexing, and then discarded.

   Header Portion
       The header portion of each document in an fzk file consists of 4 lines
       of ASCII text, i.e., 4 ASCII strings, each ending in an ASCII linefeed
       character (\n, 0x0A).

       Line 1 of each document in a DtSearch fzk file must contain the
       hard-coded string "0,2\n".

       Line  2	must  contain the string ABSTRACT: beginning in	column 1, fol-
       lowed by	the text desired to be returned	on the results list  when  the
       document	is the result of a successful search by	the API.  The abstract
       can contain any desired text up to the maximum length in	 bytes	speci-
       fied  for  the database at creation time. Abstracts are often displayed
       to the user after a successful search as	an aid in deciding whether  to
       retrieve the full document. Alternatively, an abstract may be a file
       name or URL that the developer's application uses as a reference to
       retrieve the document without further assistance from the search engine.

       Line  3	must  contain the unique document key beginning	in column 1. A
       document	key is a text string containing	all text up to the linefeed at
       the  end	 of the	line, up to the	maximum	database key size specified by
       the DtSrMAX_DB_KEYSIZE constant.	Unique means that if the  key  already
       exists  in  the database, the load program will replace the document in
       its entirety by the new document	(an update). If	the key	does  not  al-
       ready exist, the	document will be newly created (an add).

       The first character of the unique document key is called the "keytype".
       The search engine can limit searches to user-specified subsets of
       keytypes, so keytypes form a logical second level of database
       organization. Developers typically use keytypes to distinguish document
       "types" or "sources" in a way that is meaningful to the application or
       its users.
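
       The keytype convention can be illustrated from the application side.
       The following is a minimal sketch, not part of the DtSearch API: the
       helper names and the sample keys ('M' for mail, 'N' for news) are
       hypothetical and only show how an application might derive and group
       keytypes before building its fzk files.

```python
from collections import defaultdict


def keytype(key: str) -> str:
    """Return the keytype, i.e. the first character of a document key."""
    if not key:
        raise ValueError("document key must not be empty")
    return key[0]


def group_by_keytype(keys):
    """Group document keys by keytype, preserving input order."""
    groups = defaultdict(list)
    for key in keys:
        groups[keytype(key)].append(key)
    return dict(groups)


# Hypothetical keys: 'M' marks mail messages, 'N' marks news articles.
sample_keys = ["M1996-0001", "N1996-0417", "M1996-0002"]
print(group_by_keytype(sample_keys))
```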

       Line 4 is the document date. It must begin in column 1 and conform to
       this exact pattern:

              yy/mm/dd~hh:mm

       The  slashes,  tilde,  and colon	are mandatory.	The numeric values are
       integers	based on the Gregorian calendar:

       yy	 The number of years since 1900.

       mm	 A month number	from 1 to 12.

       dd	 A day number from 1 to	31, but	valid for the indicated	month.

       hh	 A 24-hour clock hour number (military designation), where "0"
		 is midnight, "13" is one o'clock pm, etc.

       mm	 The minutes number from "0" to	"59".
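
       As an illustration of the field definitions above, the following
       sketch formats a timestamp as a Line 4 date string. It assumes the
       fields are written as plain unpadded integers separated by the
       mandatory slashes, tilde, and colon; whether zero-padded values are
       also accepted is not stated here. The function name is hypothetical.

```python
from datetime import datetime


def format_fzk_date(dt: datetime) -> str:
    """Format dt as an fzk Line 4 date string.

    The year field is the number of years since 1900, so 1996 becomes 96
    and 2001 becomes 101. Fields are emitted without zero padding, which
    is an assumption of this sketch.
    """
    yy = dt.year - 1900
    if yy < 0:
        raise ValueError("years before 1900 cannot be represented")
    return f"{yy}/{dt.month}/{dt.day}~{dt.hour}:{dt.minute}"


print(format_fzk_date(datetime(1996, 3, 7, 13, 5)))
print(format_fzk_date(datetime(2001, 1, 1, 0, 0)))
```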

       The search engine can limit searches to ranges of user-specified
       document dates. If Line 4 contains an invalid date format, the load
       program substitutes the current run date as the document date.
       Documents may be marked "undated" with the null date string
       "0/0/0~0:0".   Undated documents	always qualify for results lists irre-
       spective	of date	range qualifiers in the	API search function  DtSearch-

   Text	Portion
       All subsequent text (that is, all characters in the fzk file stream af-
       ter Line	4 and up to the	end-of-record delimiter	 string)  is  document
       text.   The text	portion	is not presumed	to be ASCII nor	presumed to be
       periodically marked by ASCII linefeeds.	Although typical,  it  is  not
       strictly	 necessary that	the text portion of a document in the fzk file
       be identical for	both programs.

       dtsrload	reads only the text portion for	 AusText  type	databases.  It
       compresses and stores AusText type text in the database document	repos-
       itory (see dtsrcreate(1)). In this case,	the text portion should	be the
       exact  desired  image to	be retrieved by	subsequent API retrieval func-
       tions. The text portion of a document in	an fzk	file  for  a  DtSearch
       type database is	discarded by dtsrload.

       On  the other hand, dtsrindex reads the text portion for	all databases,
       but only	to parse and index words for subsequent	API search  functions.
       Word  parsing  is  performed  in	 the specified language	and linguistic
       codeset of the database.

       As an example of	how the	fzk file might be different for	document load-
       ing  and	word parsing, consider a tag-formatted document.  The document
       in its entirety might be	in the text portion of the fzk file for	 dtsr-
       load, while the tags might be stripped from the file for	dtsrindex.
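
       The tag-stripping case can be sketched as follows. This assumes tags
       are delimited by angle brackets and uses a deliberately naive regular
       expression rather than a real markup parser; the function name and
       sample document are hypothetical.

```python
import re

# Naive tag pattern: anything between '<' and the next '>'.
_TAG_RE = re.compile(r"<[^>]*>")


def strip_tags(text: str) -> str:
    """Replace each tag with a space so adjacent words stay separated."""
    return _TAG_RE.sub(" ", text)


document = "<title>Release notes</title><p>Fixed the <b>index</b> rebuild.</p>"

load_text = document               # text portion of the fzk file for dtsrload
index_text = strip_tags(document)  # text portion of the fzk file for dtsrindex

print(index_text.split())
```

       Replacing each tag with a space, rather than deleting it, keeps words
       on either side of a tag from being fused into one token.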

   ETX String
       Documents  are  delimited in an fzk file	by a special end-of-text (ETX)
       string occurring at the end of the text portion. By convention, the ETX
       string is an ASCII formfeed character followed by an ASCII linefeed
       character (\f\n, 0x0C 0x0A). However, dtsrload and dtsrindex can be
       instructed to use a different string by optional command line arguments.
       The ETX string is strictly a record separator;  it  is  not  considered
       part of the text	of the previous	record and is always discarded.
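
       Putting the header, text portion, and ETX string together, a record
       layout can be sketched as below. This is an illustration under stated
       assumptions, not the code used by dtsrload or dtsrindex: it assumes
       Line 1 is the string 0,2 followed by a linefeed, uses the conventional
       formfeed-plus-linefeed ETX string, and assumes the ETX string never
       occurs inside a document's text. The function names and sample values
       are hypothetical.

```python
ETX = b"\f\n"  # conventional end-of-text string: formfeed + linefeed


def build_record(abstract: str, key: str, date_line: str, text: bytes) -> bytes:
    """Return one fzk record: 4 ASCII header lines, the text, then ETX."""
    header_lines = [
        "0,2",                   # Line 1: hard-coded string (assumed form)
        "ABSTRACT:" + abstract,  # Line 2: abstract, starting in column 1
        key,                     # Line 3: unique document key
        date_line,               # Line 4: document date
    ]
    header = "".join(line + "\n" for line in header_lines)
    return header.encode("ascii") + text + ETX


def split_records(stream: bytes, etx: bytes = ETX):
    """Split an fzk byte stream into header fields and text per record."""
    records = []
    for chunk in stream.split(etx):
        if not chunk:
            continue  # skip the empty piece after the final ETX
        # The first 4 linefeeds end the header; the rest is document text.
        line1, abstract, key, date_line, text = chunk.split(b"\n", 4)
        if not abstract.startswith(b"ABSTRACT:"):
            raise ValueError("Line 2 must begin with ABSTRACT:")
        records.append({
            "abstract": abstract[len(b"ABSTRACT:"):].decode("ascii"),
            "key": key.decode("ascii"),
            "date": date_line.decode("ascii"),
            "text": text,  # ETX is a separator and is not part of the text
        })
    return records


fzk = (
    build_record("First sample", "Mdoc1", "96/3/7~13:5", b"Hello world.\n")
    + build_record("Second sample", "Ndoc2", "0/0/0~0:0", b"Undated text.\n")
)
records = split_records(fzk)
print([r["key"] for r in records])
```

       The text is handled as bytes because the text portion is not presumed
       to be ASCII, while the header lines are encoded as ASCII.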

       dtsrcreate(1),  dtsrhan(1),  dtsrload(1), dtsrindex(1), DtSrAPI(3), Dt-


