
FreeBSD Manual Pages


dtsrhanfile(special file)			     dtsrhanfile(special file)

       dtsrhanfile -- Describes	the format and syntax of DtSearch han files


       Han files are the user generated	profile	files for dtsrhan.  They iden-
       tify fields in incoming text from which output fzk file fields  can  be
       constructed.  The data from han files are loaded	into memory by dtsrhan
       at initialization time.	dtsrhan	and han	files have not	been  interna-
       tionalized; han files may only contain ASCII characters.

   General Format
       All identifiers must begin with a letter, and must be composed entirely
       of alphanumerics	and/or the underscore.

       Observe the following points when using "strings":

	  o  If an identifying string contains quotes, use a backslash to
	     escape each quote. Example:

       "this string \"contains\" quotes"

		 would find the string this string "contains" quotes.

	  o  Because the backslash is the escape character, a double
	     backslash is needed to create a single backslash. Example:

       "this string has a \\ backslash"

		 would find the string this string has a \ backslash.

	  o  In fact, a backslash in any string causes the next character
	     to be included literally. Thus, the string "this is \a test"
	     ends up being this is a test. The backslash is dropped, and
	     the next character is embedded in the string. This is only
	     needed in the two cases described above, but can be used for
	     any purpose.
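
       The backslash rule above can be sketched in Python. This is only an
       illustration of the documented behavior, not dtsrhan's actual code;
       the function name unescape is hypothetical, and keeping a trailing
       lone backslash unchanged is an assumption.

```python
def unescape(raw: str) -> str:
    """Drop each backslash and keep the character that follows it."""
    out = []
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch == "\\" and i + 1 < len(raw):
            i += 1       # skip the backslash itself
            ch = raw[i]  # take the next character literally
        out.append(ch)   # assumption: a trailing lone backslash is kept
        i += 1
    return "".join(out)


# Raw string literals keep the backslashes exactly as a han file would.
print(unescape(r'this string \"contains\" quotes'))
print(unescape(r'this string has a \\ backslash'))
print(unescape(r'this is \a test'))
```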

   Individual Line Syntax
       # ... | blank line
		 Han file comment. Any line beginning with a pound sign	in the
		 first column, or any blank line, is discarded.

       line identifier = physical_line_number
		 Defines a line	with a physical	line  number  in  the  record.
		 physical_line_number must be a	number.

       line identifier = column_number, "string"
		 Defines a line	using a	column number and a 'signature'	string
		 that should appear at that column.  column_number  can	 be  a
		 number,  or  *	 for 'any column'. "string" should be a	string
		 that occurs on	the line in question. It is possible to	define
		 complex signatures using multiple clauses.
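
       A single signature clause can be modeled as follows. This is a
       rough sketch rather than dtsrhan's implementation: the function
       name line_matches is hypothetical, and treating column numbers as
       1-based is an assumption. Multiple-clause signatures are not
       covered.

```python
def line_matches(text: str, column: str, signature: str) -> bool:
    """Return True if `signature` appears in `text` at `column`.

    `column` is either "*" (any column) or a decimal column number.
    """
    if column == "*":
        return signature in text
    start = int(column) - 1  # assumption: columns are numbered from 1
    return start >= 0 and text.startswith(signature, start)


print(line_matches("NAME  dtsrhan", "1", "NAME"))
print(line_matches("   NAME", "1", "NAME"))
print(line_matches("   NAME", "*", "NAME"))
```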

       field identifier = line_identifier, "string", offset, length
		 Defines  a  field based on a declared line, a string found on
		 that line, the	offset from the	first letter  of  the  string,
		 and the length	of field.

		 line_identifier  is  an identifier declared with the line di-
		 rective (see above).

		 "string" is a string for relative positioning,	where a	 field
		 will  follow  a  string that may not always occur in the same
		 position on a line. If	it is known that the field will	always
		 be in the same position, an empty string ("") may be used.
		 string must be enclosed in double quotes. offset must be a
		 number, identifying the offset from the first character in
		 the string. It starts at position 1, not 0, and may be
		 negative.

		 length	 represents  the length	of the field. It may be	a num-
		 ber, or it may	be one of two special tokens:

		 eow	   End of word.	The field will	begin  at  offset  and
			   continue until the next white-space character.

		 eoln	   End	of  line.  The	field will begin at offset and
			   continue to the end of the line.
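
       The interaction of string, offset, and length can be sketched in
       Python. This is an illustration under stated assumptions, not
       dtsrhan's code: the name extract_field is hypothetical, offset 1 is
       taken to mean the first character of the matched string (or column
       1 when the string is empty), and only positive offsets are handled.

```python
from typing import Optional, Union


def extract_field(text: str, anchor: str, offset: int,
                  length: Union[int, str]) -> Optional[str]:
    """Extract a field from one line of text.

    Returns None when `anchor` does not occur on the line.
    """
    if offset < 1:
        raise ValueError("this sketch handles only positive offsets")
    base = text.find(anchor)      # an empty anchor matches at index 0
    if base < 0:
        return None
    rest = text[base + offset - 1:]  # assumption: offset 1 is the anchor's first char
    if length == "eoln":
        return rest               # everything to the end of the line
    if length == "eow":
        end = 0
        while end < len(rest) and not rest[end].isspace():
            end += 1              # stop at the next white-space character
        return rest[:end]
    return rest[:int(length)]     # fixed-length field


print(extract_field("  dtsrhan(1)   User Commands", "", 3, "eow"))
print(extract_field("Date: 1994-04-17", "Date:", 7, 4))
```

       The first call mirrors a field declared as line1,"",3,eow: with an
       empty string, the offset acts as a plain column number.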

		 An  identifier	 string	 beginning  with   3   uppercase   M's
		 ("MMM...")  will  be considered an English month name string.
		 At run	time, if the first 3 chars of the field's value	 equal
		 the  first  three  chars  of an English month name, the value
		 string	will be	translated to a	two character string of	digits
		 in  the range "01" to "12".  For example, if field MMMmymonth
		 had an	original value of "April ", it will be	translated  to
		 "04" before use.

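       The month translation can be sketched as below. It is a minimal
       model of the documented rule, not dtsrhan's code: the name
       translate_month is hypothetical, and both the case-sensitive
       comparison and returning unmatched values unchanged are
       assumptions.

```python
MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]


def translate_month(value: str) -> str:
    """Map a value starting with an English month name to "01".."12"."""
    prefix = value[:3]
    for number, name in enumerate(MONTHS, start=1):
        if prefix == name[:3]:  # assumption: comparison is case-sensitive
            return f"{number:02d}"
    return value                # assumption: unmatched values pass through


print(translate_month("April "))
```
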
		 In the case where a line identifier is associated with
		 multiple lines in a single document, the field value will be
		 determined from the last occurrence of the line within the
		 document.
       constant identifier = "value"
		 Defines a constant field that can be used  in	abstracts  and
		 keys.	The  identifier	is defined exactly the same as a field
		 identifier. The value must be enclosed	in double quotes.

       date = null | field_id [+ field_id] ...
		 Defines the document date for each document. It will be
		 converted into a correctly formatted fzk file date line.

		 null  specifies  undated  documents. Undated documents	always
		 qualify for searches irrespective of date qualifiers  in  Dt-

		 field_id  is  an  identifier declared using the field or con-
		 stant directives (see above).	"MMM" fields are often	useful
		 for date assemblies.

		 Multiple fields may be	concatenated into a date.

		 After	concatenation,	the assembled date must	be of the fol-
		 lowing	format:	YYYYMMDDhhmm (exactly 12 digits). For example,
		 199404171701 is April 17, 1994, at 5:01 pm. 200405031000 is
		 May 3, 2004, at 10:00 am (10 o'clock).

		 Dates before 1900 or after 5995 are invalid.

		 If date is not	specified or  is  invalid,  a  generated  date
		 based	on  the	current	date and time will be used, but	an in-
		 valid date will also generate an error	message.
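
       The format rules above can be checked with a short Python sketch.
       The name assemble_date is hypothetical, and only the rules stated
       here (exactly 12 digits, year from 1900 through 5995) are checked.
       Unlike dtsrhan, which falls back to a generated date, this sketch
       simply raises an error on an invalid date.

```python
def assemble_date(*parts: str) -> str:
    """Concatenate field values and validate the YYYYMMDDhhmm result."""
    stamp = "".join(parts)
    if len(stamp) != 12 or not (stamp.isascii() and stamp.isdigit()):
        raise ValueError(f"date must be exactly 12 digits: {stamp!r}")
    year = int(stamp[:4])
    if year < 1900 or year > 5995:
        raise ValueError(f"year out of range 1900..5995: {year}")
    return stamp


# Year, translated month, day, and time fields joined into one date.
print(assemble_date("1994", "04", "17", "1701"))
```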

       key = field_id [+ field_id] ... | time |	count
		 Defines the unique database key for each record in an fzk
		 file.

		 field_id  is  a  field	identifier declared using the field or
		 constant directives.

		 Multiple fields may be	concatenated into a key.

		 time is a special keyword used	to generate keys based on  the
		 current run date and time, plus a sequential count suffix.

		 count	is  a special keyword used to generate keys based on a
		 sequential count of records.

       upper	 Specifies that keys written by dtsrhan are to be converted
		 entirely to upper case. Without this directive, mixed-case
		 keys are allowed.

       keychar = A | B | ...Z
		 Defines the character used to categorize keys	for  DtSearch.
		 It must be an uppercase ASCII alphabetic character.

       delimiter = line_identifier, bottom
		 Defines  the  end  of text (ETX) delimiter that will separate

		 line_identifier is an identifier declared with the line
		 directive.

		 bottom	 is  required. It specifies that the ETX will occur at
		 the bottom of each record. Top	of record delimiters  are  not

       image = all | none
		 Defines  whether  the document	image retrieved	by DtSearchRe-
		 trieve	is to contain all or none of the record, prior to  ap-
		 plication of imageinclude or imageexclude directives later in
		 the han file. It defaults to all.

       imageinclude = line_identifier [- line_identifier]
		 Defines a line (or range of lines) to be included in the
		 image. line_identifier is an identifier declared with the
		 line directive.

       imageexclude = line_identifier [- line_identifier]
		 Defines a line	(or range of lines) to be  excluded  from  the
		 image.	  line_identifier  is  an identifier declared with the
		 line directive.

       abstract	= field(s) field_identifier [+ field_identifier]...
		 Defines the abstract to be placed into the fzk file. It is
		 created from the concatenation of fields. field_identifier
		 is an identifier declared with the field or constant
		 directives.

       delblanklines = true | false
		 Determines if blank lines are to be removed from  the	record
		 image or not. It defaults to false.

       The sample han file shown here describes a text file containing a
       concatenated set of man page documents.

       # All records in the incoming text file are delimited by the same
       # end of text convention as the default for an fzk file, namely
       # a form feed (control-L) on a line by itself.
       # Define	a line named "etx" with	that description,
       # and declare it	to be the <delimiter>.
       # Note that there must be a real	ASCII control-L	character between
       # the quotes in the line	below.
       line etx	= *,"^L"
       delimiter = etx,	bottom

       # The command name that the man page is describing is on	the first line.
       # To access it we need to define	a line directive for line number 1.
       line line1 = 1

       # The name of the man page command begins in column 3 of	line 1,
       # and the length	is variable.  So we define a field identifier
       # named "command1" from column 3	to the end of the word.
       field command1 =	line1,"",3,eow

       # We want each document abstract	to have	a constant prefix
       # followed by the name of the command.
       constant	preabs = "Man Pages for	"
       abstract	= fields preabs	+ command1

       # We want all keys to be	the name of the	command, prefixed with
       # the same identifying character, an uppercase M.
       keychar = M
       key = command1

       # We want each document date to be equivalent to the release
       # date of the original man pages, which we choose here to hard code
       # as November 1,	1994, at 1 o'clock in the afternoon.
       constant	datecons = "199411011300"
       date = datecons

       dtsrhan(1),  dtsrindex(1),   dtsrfzkfiles(4),   dtsrlangfiles(4),   Dt-

						     dtsrhanfile(special file)

