FreeBSD Manual Pages

WordType(3)		   Library Functions Manual		   WordType(3)

NAME
       WordType - defines a word in terms of allowed characters, length, etc.

SYNOPSIS
       Only called through WordContext::Initialize()

DESCRIPTION
       WordType defines an indexed word and operations to validate a word to
       be indexed. All words inserted into the mifluz index are normalized
       by the Normalize method before insertion. The configuration options
       give some control over the definition of a word.

CONFIGURATION
       For more	information on the configuration  attributes  and  a  complete
       list of attributes, see the mifluz(3) manual page.

       wordlist_locale <locale> (default C)
	      Set the locale of the program to <locale>. See setlocale(3) for
	      more information.
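	      As a minimal sketch, assuming the value is handed to the C
	      library's setlocale(3) (this page does not say how WordType
	      applies it), the hypothetical helper applyWordlistLocale below
	      switches the program locale; character tests such as
	      std::isalpha and std::tolower then follow that locale.

```cpp
#include <cassert>
#include <clocale>
#include <string>

// Hypothetical helper (not part of mifluz): apply a wordlist_locale value
// through setlocale(3). LC_ALL includes LC_CTYPE, which governs isalpha(),
// isupper() and tolower(). Returns false if the C library does not
// recognize the locale name.
bool applyWordlistLocale(const std::string& locale) {
    return std::setlocale(LC_ALL, locale.c_str()) != nullptr;
}
```

	      The "C" locale is always available, so the documented default
	      can never fail to apply.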

       wordlist_allow_numbers {true|false} <number> (default false)
	      A digit is considered a valid character within a word if this
	      configuration parameter is set to true; otherwise it is an error
	      to insert a word containing digits. See the Normalize method
	      for more information.
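	      The rule can be sketched as a predicate. digitsAcceptable is a
	      hypothetical name used only for illustration, not part of the
	      mifluz API; it assumes single-byte characters classified by
	      std::isdigit.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical check modelled on wordlist_allow_numbers: when numbers are
// not allowed, any digit makes the word invalid for insertion.
bool digitsAcceptable(const std::string& word, bool allowNumbers) {
    if (allowNumbers) return true;
    // Pass characters as unsigned char so std::isdigit never sees a
    // negative value.
    return std::none_of(word.begin(), word.end(), [](unsigned char c) {
        return std::isdigit(c) != 0;
    });
}
```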

       wordlist_minimum_word_length <number> (default 3)
	      The minimum length of a word.  See the Normalize method for more
	      information.

       wordlist_maximum_word_length <number> (default 25)
	      The maximum length of a word.  See the Normalize method for more
	      information.
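	      A minimal sketch of how the two limits classify a word, using
	      the documented defaults of 3 and 25. checkLength and LengthCheck
	      are hypothetical names, and the sketch assumes lengths are
	      counted in bytes, which matches characters only for single-byte
	      encodings. A word exactly at either limit is accepted, following
	      the strict comparisons described under WORD_NORMALIZE_TOOSHORT
	      and WORD_NORMALIZE_TOOLONG.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical classification of a word against the minimum and maximum
// word lengths (defaults 3 and 25).
enum class LengthCheck { Ok, TooShort, TooLong };

LengthCheck checkLength(const std::string& word,
                        std::size_t minLen = 3, std::size_t maxLen = 25) {
    assert(minLen <= maxLen);  // a sane configuration
    if (word.size() < minLen) return LengthCheck::TooShort;
    if (word.size() > maxLen) return LengthCheck::TooLong;
    return LengthCheck::Ok;
}
```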

       wordlist_truncate {true|false} <number> (default true)
	      If a word is longer than wordlist_maximum_word_length, it is
	      truncated if this configuration parameter is true; otherwise it
	      is considered an invalid word.
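	      A sketch of the two behaviors, under the assumption that
	      truncation simply keeps the first maxLen bytes. applyMaxLength is
	      a hypothetical helper, and std::nullopt stands in for "invalid
	      word".

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical sketch of wordlist_truncate: an over-long word is either
// cut down to maxLen (truncate == true) or rejected (std::nullopt).
std::optional<std::string> applyMaxLength(const std::string& word,
                                          std::size_t maxLen, bool truncate) {
    if (word.size() <= maxLen) return word;  // short enough: unchanged
    if (!truncate) return std::nullopt;      // too long and not truncated
    return word.substr(0, maxLen);
}
```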

       wordlist_lowercase {true|false} <number> (default true)
	      If a word contains uppercase letters, it is converted to
	      lowercase if this configuration parameter is true; otherwise it
	      is left untouched.
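	      A minimal sketch of the conversion, assuming single-byte
	      characters and the <cctype> rules of the current C locale;
	      lowercaseWord is hypothetical. Its return value corresponds to
	      the condition reported by WORD_NORMALIZE_CAPITAL.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical sketch of wordlist_lowercase. Converts uppercase letters in
// place when `lowercase` is true and reports whether anything changed.
bool lowercaseWord(std::string& word, bool lowercase) {
    if (!lowercase) return false;  // parameter off: word left untouched
    bool changed = false;
    for (char& ch : word) {
        unsigned char c = static_cast<unsigned char>(ch);
        if (std::isupper(c)) {
            ch = static_cast<char>(std::tolower(c));
            changed = true;
        }
    }
    return changed;
}
```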

       wordlist_valid_punctuation [characters] (default none)
	      A list of punctuation characters that may appear in a word.
	      These characters are removed from the word before it is
	      inserted into the index.
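	      A sketch of the stripping step; stripValidPunctuation is a
	      hypothetical helper. Any character found in the configured list
	      is dropped, and the return value reports whether something was
	      removed, which is the condition described by
	      WORD_NORMALIZE_PUNCTUATION.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical sketch of wordlist_valid_punctuation: characters listed in
// validPunctuation may appear in the input but are stripped before indexing.
bool stripValidPunctuation(std::string& word, const std::string& validPunctuation) {
    std::string kept;
    kept.reserve(word.size());
    for (char ch : word) {
        if (validPunctuation.find(ch) == std::string::npos) kept += ch;
    }
    bool removed = kept.size() != word.size();
    word = std::move(kept);
    return removed;
}
```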

METHODS
       int Normalize(String &s) const
	      Normalize a word according to configuration specifications and
	      built-in transformations. Every word inserted in the inverted
	      index goes through this function. If a word is rejected (the
	      return value has the WORD_NORMALIZE_NOTOK bit set), it will not
	      be inserted in the index. If a word is accepted (the return value
	      has the WORD_NORMALIZE_OK bit set), it will be inserted in the
	      index. In addition to these two bits, informational bits are set
	      that describe the processing done on the word. The bit field
	      values and their meanings are as follows:

       WORD_NORMALIZE_TOOLONG
	      the word length exceeds the value of the
	      wordlist_maximum_word_length configuration parameter.

       WORD_NORMALIZE_TOOSHORT
	      the word length is smaller than the value of the
	      wordlist_minimum_word_length configuration parameter.

       WORD_NORMALIZE_CAPITAL
	      the word contained capital letters and has been converted to
	      lowercase. This bit is only set if the wordlist_lowercase
	      configuration parameter is true.

       WORD_NORMALIZE_NUMBER
	      the word contains digits and the configuration parameter
	      wordlist_allow_numbers is set to false.

       WORD_NORMALIZE_CONTROL
	      the word contains control characters.

       WORD_NORMALIZE_BAD
	      the word is listed in the file pointed to by the
	      wordlist_bad_word_list configuration parameter.

       WORD_NORMALIZE_NULL
	      the word is a zero-length string.

       WORD_NORMALIZE_PUNCTUATION
	      at least one character listed in the wordlist_valid_punctuation
	      attribute was removed from the word.

       WORD_NORMALIZE_NOALPHA
	      the word does not contain any alphanumeric characters.
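	      How these bits combine can be sketched as a single pass over the
	      word. Everything below is an assumption made for illustration:
	      the NORM_* constants stand in for the WORD_NORMALIZE_* values
	      (whose numeric values are defined in the mifluz headers, not on
	      this page), the order of the steps and the set of bits treated
	      as fatal are guesses, and wordlist_bad_word_list is not
	      modelled.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical bit values standing in for the WORD_NORMALIZE_* constants.
constexpr int NORM_OK          = 1 << 0;
constexpr int NORM_NOTOK       = 1 << 1;
constexpr int NORM_TOOLONG     = 1 << 2;
constexpr int NORM_TOOSHORT    = 1 << 3;
constexpr int NORM_CAPITAL     = 1 << 4;
constexpr int NORM_NUMBER      = 1 << 5;
constexpr int NORM_CONTROL     = 1 << 6;
constexpr int NORM_NULL        = 1 << 7;
constexpr int NORM_PUNCTUATION = 1 << 8;
constexpr int NORM_NOALPHA     = 1 << 9;

// Hypothetical mirror of the wordlist_* attributes with the documented defaults.
struct NormalizeOptions {
    bool allowNumbers = false;
    std::size_t minLen = 3;
    std::size_t maxLen = 25;
    bool truncate = true;
    bool lowercase = true;
    std::string validPunctuation;  // default: none
};

// Sketch of a Normalize-style pass: informational bits accumulate, then a
// single OK or NOTOK verdict bit is added.
int normalizeSketch(std::string& s, const NormalizeOptions& o) {
    if (s.empty()) return NORM_NULL | NORM_NOTOK;
    int flags = 0;
    bool hasAlnum = false;
    std::string kept;
    for (char ch : s) {
        if (o.validPunctuation.find(ch) != std::string::npos) {
            flags |= NORM_PUNCTUATION;  // listed punctuation is dropped
            continue;
        }
        unsigned char c = static_cast<unsigned char>(ch);
        if (std::iscntrl(c)) flags |= NORM_CONTROL;
        if (std::isdigit(c) && !o.allowNumbers) flags |= NORM_NUMBER;
        if (std::isalnum(c)) hasAlnum = true;
        if (o.lowercase && std::isupper(c)) {
            c = static_cast<unsigned char>(std::tolower(c));
            flags |= NORM_CAPITAL;
        }
        kept += static_cast<char>(c);
    }
    s = kept;
    if (!hasAlnum) flags |= NORM_NOALPHA;
    if (s.size() < o.minLen) flags |= NORM_TOOSHORT;
    if (s.size() > o.maxLen) {
        flags |= NORM_TOOLONG;
        if (o.truncate) s.resize(o.maxLen);  // truncated words stay acceptable
    }
    const int fatal = NORM_NUMBER | NORM_CONTROL | NORM_NOALPHA | NORM_TOOSHORT |
                      (o.truncate ? 0 : NORM_TOOLONG);
    flags |= (flags & fatal) ? NORM_NOTOK : NORM_OK;
    return flags;
}
```

	      A caller would test the verdict bit (flags & NORM_NOTOK) rather
	      than compare the whole value, because informational bits such as
	      NORM_CAPITAL can also be set on accepted words.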

       static String NormalizeStatus(int flags)
	      Returns  a  string  explaining the return	flags of the Normalize
	      method.
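	      This page does not document the format of the returned string.
	      As a hypothetical sketch, a helper in the same spirit could list
	      the name of every bit that is set. normalizeStatusSketch is not
	      part of mifluz, and the bit values are placeholders, not the
	      real WORD_NORMALIZE_* values.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Placeholder bit values paired with the flag names documented above.
const std::vector<std::pair<int, const char*>> kNormalizeNames = {
    {1 << 0, "OK"},          {1 << 1, "NOTOK"},   {1 << 2, "TOOLONG"},
    {1 << 3, "TOOSHORT"},    {1 << 4, "CAPITAL"}, {1 << 5, "NUMBER"},
    {1 << 6, "CONTROL"},     {1 << 7, "NULL"},    {1 << 8, "PUNCTUATION"},
    {1 << 9, "NOALPHA"},     {1 << 10, "BAD"},
};

// Returns the names of all set bits, separated by single spaces.
std::string normalizeStatusSketch(int flags) {
    std::string out;
    for (const auto& [bit, name] : kNormalizeNames) {
        if (flags & bit) {
            if (!out.empty()) out += ' ';
            out += name;
        }
    }
    return out;
}
```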

AUTHORS
       Loic Dachary loic@gnu.org

       The Ht://Dig group http://dev.htdig.org/

SEE ALSO
       htdb_dump(1), htdb_stat(1), htdb_load(1), mifluzdump(1), mifluzload(1),
       mifluzsearch(1), mifluzdict(1), WordContext(3), WordList(3),
       WordDict(3), WordListOne(3), WordKey(3), WordKeyInfo(3), WordDBInfo(3),
       WordRecordInfo(3), WordRecord(3), WordReference(3), WordCursor(3),
       WordCursorOne(3), WordMonitor(3), Configuration(3), mifluz(3)

				     local			   WordType(3)

Want to link to this manual page? Use this URL:
<https://www.freebsd.org/cgi/man.cgi?query=WordType&sektion=3&manpath=FreeBSD+12.0-RELEASE+and+Ports>
