WordList(3)		   Library Functions Manual		   WordList(3)

NAME
       WordList - abstract class to manage and use an inverted index file.

SYNOPSIS
       #include	<mifluz.h>

       WordContext context;

       WordList* words = context.List();

       delete words;

DESCRIPTION
       WordList	 is the	mifluz equivalent of a database	handler. Each WordList
       object is bound to an inverted index file and implements	the operations
       to  create  it,	fill  it with word occurrences and search for an entry
       matching	a given	criterion.

       WordList is an abstract class and cannot be instantiated. The List
       method of the class WordContext will create an instance using the
       appropriate derived class, either WordListOne or WordListMulti. Refer
       to the corresponding manual pages for more information on their
       specific semantics.

       When doing bulk insertions, mifluz creates temporary files that contain
       the entries to be inserted in the index. Those files are typically
       named indexC00000000. The maximum size of a temporary file is
       wordlist_cache_size / 2. When the maximum size of the temporary file is
       reached, mifluz creates another temporary file named indexC00000001.
       This continues until mifluz has created 50 temporary files. At this
       point it merges all temporary files into one that replaces the first,
       indexC00000000. It then starts creating temporary files again and keeps
       following this algorithm until the bulk insertion is finished. When the
       bulk insertion is finished, mifluz has one big file named
       indexC00000000 that contains all the entries to be inserted in the
       index. mifluz inserts all the entries from indexC00000000 into the
       index and deletes the temporary file when done. The insertion is fast
       since all the entries in indexC00000000 are already sorted.
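
       The spill-and-merge scheme described above can be sketched in memory
       as follows. This is only an illustration, not the mifluz
       implementation: the class BatchSketch and its members are
       hypothetical, vectors stand in for the indexC* files, and the run
       capacity stands in for wordlist_cache_size / 2.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the batch mechanism; not the mifluz code.
class BatchSketch {
public:
    // run_capacity plays the role of the per-file size limit,
    // max_runs the role of the 50 temporary file threshold.
    BatchSketch(std::size_t run_capacity, std::size_t max_runs)
        : run_capacity_(run_capacity), max_runs_(max_runs) {
        assert(run_capacity_ > 0 && max_runs_ > 1);
    }

    // Buffer one entry; spill a sorted run when the buffer is full.
    void Insert(const std::string& entry) {
        buffer_.push_back(entry);
        if (buffer_.size() >= run_capacity_) Spill();
    }

    // Flush the buffer, merge every run and return the sorted entries,
    // ready to be inserted sequentially into the main index.
    std::vector<std::string> Finish() {
        if (!buffer_.empty()) Spill();
        MergeRuns();
        std::vector<std::string> result;
        if (!runs_.empty()) result.swap(runs_.front());
        runs_.clear();
        return result;
    }

    std::size_t RunCount() const { return runs_.size(); }

private:
    // Sort the buffer and store it as a new run ("temporary file").
    void Spill() {
        std::sort(buffer_.begin(), buffer_.end());
        runs_.push_back(std::move(buffer_));
        buffer_.clear();
        if (runs_.size() >= max_runs_) MergeRuns();
    }

    // Merge all sorted runs into a single sorted run.
    void MergeRuns() {
        if (runs_.size() < 2) return;
        std::vector<std::string> merged;
        for (const auto& run : runs_) {
            std::vector<std::string> next;
            std::merge(merged.begin(), merged.end(), run.begin(), run.end(),
                       std::back_inserter(next));
            merged.swap(next);
        }
        runs_.clear();
        runs_.push_back(std::move(merged));
    }

    std::size_t run_capacity_;
    std::size_t max_runs_;
    std::vector<std::string> buffer_;
    std::vector<std::vector<std::string>> runs_;
};
```

       Because every run is sorted before it is stored, the final merge
       yields entries in key order, which is why the last insertion pass
       into the index can be sequential.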

       The parameter wordlist_cache_max can be used to prevent the temporary
       files from growing indefinitely. If the total size of the indexC*
       files grows beyond this parameter, they are merged into the main index
       and deleted. For instance, setting this parameter to 500MB guarantees
       that the total size of the indexC* files will not grow above 500MB.

CONFIGURATION
       For  more  information  on  the configuration attributes	and a complete
       list of attributes, see the mifluz(3) manual page.

       wordlist_extend {true|false} (default false)
              If true, maintain a reference count of unique words. The
              Noccurrence method gives access to this count.

       wordlist_verbose	<number> (default 0)
	      Set the verbosity	level of the WordList class.

	      1	walk logic

	      2	walk logic details

	      3	walk logic lots	of details

       wordlist_page_size <bytes> (default 8192)
	      Berkeley DB page size (see Berkeley DB documentation)

       wordlist_cache_size <bytes> (default 500K)
              Berkeley DB cache size (see Berkeley DB documentation). The
              cache makes a huge difference in performance. It must be at
              least 2% of the expected total data size. Note that if
              compression is activated, the data size is eight times larger
              than the actual file size. In this case the cache must be
              scaled to 2% of the data size, not 2% of the file size. See
              Cache tuning in the mifluz guide for more hints. See
              WordList(3) for the rationale behind cache file handling.

       wordlist_cache_max <bytes> (default 0)
              Maximum total size of the cache files generated when doing
              bulk insertion with the BatchStart() function. When this limit
              is reached, the cache files are all merged into the inverted
              index. The value 0 means no limit. See WordList(3) for the
              rationale behind cache file handling.

       wordlist_cache_inserts {true|false} (default false)
              If true, all Insert calls are cached in memory. When the
              WordList object is closed or a different access method is
              called, the cached entries are flushed to the inverted index.

       wordlist_compress {true|false} (default false)
	      Activate compression of the index. The resulting index is	 eight
	      times smaller than the uncompressed index.
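
       The sizing rule given for wordlist_cache_size (2% of the data size,
       with the data size taken as eight times the file size when
       wordlist_compress is true) can be worked out as below. The helper
       SuggestedCacheBytes is hypothetical and not part of mifluz; it only
       restates the arithmetic from the text.

```cpp
#include <cstdint>

// Hypothetical helper, not a mifluz function: returns 2% of the data
// size, rounded up. With compression, the data size is assumed to be
// eight times the index file size, as stated for wordlist_compress.
std::uint64_t SuggestedCacheBytes(std::uint64_t index_file_bytes,
                                  bool compressed) {
    const std::uint64_t data_bytes =
        compressed ? index_file_bytes * 8 : index_file_bytes;
    return (data_bytes * 2 + 99) / 100;
}
```

       For a 1,000,000,000 byte index file this gives 20,000,000 bytes
       without compression and 160,000,000 bytes with compression.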

METHODS
       inline WordContext* GetContext()
	      Return  a	 pointer to the	WordContext object used	to create this
	      instance.

       inline const WordContext* GetContext() const
	      Return a pointer to the WordContext object used to  create  this
	      instance as a const.

       virtual inline int Override(const WordReference&	wordRef)
	      Insert wordRef in	index. If the Key() part of the	wordRef	exists
	      in the index, override it.  Returns OK on	success, NOTOK on  er-
	      ror.

       virtual int Exists(const	WordReference& wordRef)
	      Returns OK if wordRef exists in the index, NOTOK otherwise.

       inline int Exists(const String& word)
	      Returns OK if word exists	in the index, NOTOK otherwise.

       virtual int WalkDelete(const WordReference& wordRef)
              Delete all entries in the index whose key matches the Key()
              part of wordRef, using the Walk method. Returns the number of
              entries successfully deleted.

       virtual int Delete(const	WordReference& wordRef)
              Delete the entry in the index that exactly matches the Key()
              part of wordRef. Returns OK if deletion is successful, NOTOK
              otherwise.

       virtual int Open(const String& filename,	int mode)
	      Open  inverted  index filename.  mode may	be O_RDONLY or O_RDWR.
	      If mode is O_RDWR	it can be or'ed	with O_TRUNC to	reset the con-
	      tent of an existing inverted index.  Return OK on	success, NOTOK
	      otherwise.

       virtual int Close()
	      Close inverted index.  Return OK on success, NOTOK otherwise.

       virtual unsigned	int Size() const
	      Return the size of the index in pages.

       virtual int Pagesize() const
              Return the page size in bytes.

       virtual WordDict	*Dict()
              Return a pointer to the inverted index dictionary.

       const String& Filename()	const
	      Return the filename given	to the last call to Open.

       int Flags() const
	      Return the mode given to the last	call to	Open.

       inline List *Find(const WordReference& wordRef)
	      Returns the list of word occurrences exactly matching the	 Key()
	      part  of	wordRef.   The List returned contains pointers to Wor-
	      dReference objects. It is	the responsibility of  the  caller  to
	      free the list. See List.h	header for usage.

       inline List *FindWord(const String& word)
	      Returns  the list	of word	occurrences exactly matching the word.
	      The List returned	contains pointers to WordReference objects. It
	      is the responsibility of the caller to free the list. See	List.h
	      header for usage.

       virtual List *operator [] (const	WordReference& wordRef)
	      Alias to the Find	method.

       inline List *operator []	(const String& word)
	      Alias to the FindWord method.

       virtual List *Prefix (const WordReference& prefix)
              Returns the list of word occurrences matching the Key() part
              of prefix. In the Key(), the string (accessed with GetWord())
              matches any string that begins with it. The List returned
              contains pointers to WordReference objects. It is the
              responsibility of the caller to free the list.

       inline List *Prefix (const String& prefix)
              Returns the list of word occurrences matching the word prefix.
              In the Key(), the string (accessed with GetWord()) matches any
              string that begins with it. The List returned contains
              pointers to WordReference objects. It is the responsibility of
              the caller to free the list.
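
       The prefix matching rule used by both Prefix methods can be
       illustrated on any sorted container. The sketch below is not the
       mifluz implementation: PrefixMatches is a hypothetical function and a
       std::map from word to occurrence count stands in for the inverted
       index.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a prefix lookup over sorted keys.
std::vector<std::string> PrefixMatches(
    const std::map<std::string, int>& index, const std::string& prefix) {
    std::vector<std::string> matches;
    // lower_bound finds the first key >= prefix. In sorted order, every
    // key that begins with prefix follows it contiguously.
    for (auto it = index.lower_bound(prefix);
         it != index.end() &&
         it->first.compare(0, prefix.size(), prefix) == 0;
         ++it) {
        matches.push_back(it->first);
    }
    return matches;
}
```

       Unlike the List returned by Prefix, the vector here owns its
       elements, so nothing has to be freed by the caller.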

       virtual List *Words()
	      Returns a	list of	all unique words contained in the inverted in-
	      dex.  The	 List returned contains	pointers to String objects. It
	      is the responsibility of the caller to free the list. See	List.h
	      header for usage.

       virtual List *WordRefs()
	      Returns  a  list of all entries contained	in the inverted	index.
	      The List returned	contains pointers to WordReference objects. It
	      is the responsibility of the caller to free the list. See	List.h
	      header for usage.

       virtual WordCursor  *Cursor(wordlist_walk_callback_t  callback,	Object
       *callback_data)
              Create a cursor that searches all the occurrences in the
              inverted index and calls callback with callback_data for every
              match.

       virtual WordCursor *Cursor(const WordKey &searchKey, int action =
       HTDIG_WORDLIST_WALKER)
              Create a cursor that searches all the occurrences in the
              inverted index that match searchKey. If action is set to
              HTDIG_WORDLIST_WALKER, it calls searchKey.callback with
              searchKey.callback_data for every match. If action is set to
              HTDIG_WORDLIST_COLLECT, it pushes each match into the
              searchKey.collectRes data member as a WordReference object. It
              is the responsibility of the caller to free the
              searchKey.collectRes list.

       virtual WordCursor *Cursor(const WordKey &searchKey,
       wordlist_walk_callback_t callback, Object *callback_data)
              Create a cursor that searches all the occurrences in the
              inverted index that match searchKey and calls callback with
              callback_data for every match.
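
       The difference between the walker and collect actions of the Cursor
       methods can be sketched as follows. Everything in this block (Entry,
       Action, SearchSketch, Walk) is hypothetical and only mirrors the
       behaviour described above; it does not use the mifluz types or the
       HTDIG_WORDLIST_* constants.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical entry: a word and the location where it occurs.
using Entry = std::pair<std::string, unsigned int>;

// Hypothetical counterparts of the walker and collect actions.
enum class Action { Walker, Collect };

struct SearchSketch {
    std::string word;                            // key to match
    std::function<void(const Entry&)> callback;  // used by Action::Walker
    std::vector<Entry> collected;                // filled by Action::Collect
};

// Visit every entry matching search.word. In walker mode the callback
// is invoked per match; in collect mode matches are stored instead.
// Returns the number of matches.
int Walk(const std::vector<Entry>& index, SearchSketch& search,
         Action action) {
    int matches = 0;
    for (const Entry& entry : index) {
        if (entry.first != search.word) continue;
        ++matches;
        if (action == Action::Walker) {
            if (search.callback) search.callback(entry);
        } else {
            search.collected.push_back(entry);
        }
    }
    return matches;
}
```

       Walker mode suits processing matches one at a time without storing
       them, while collect mode trades memory for a result list that can be
       inspected after the walk.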

       virtual WordKey Key(const String& bufferin)
	      Create  a	WordKey	object and return it. The bufferin argument is
	      used to initialize the key, as in	the WordKey::Set method.   The
	      first component of bufferin must be a word that is translated to
	      the  corresponding  numerical  id	 using	the   WordDict::Serial
	      method.

       virtual WordReference Word(const	String&	bufferin, int exists = 0)
	      Create  a	WordReference object and return	it. The	bufferin argu-
	      ment is used to initialize the structure,	as in  the  WordRefer-
	      ence::Set	 method.   The	first  component of bufferin must be a
	      word that	is translated to the corresponding numerical id	 using
	      the  WordDict::Serial  method.  If the exists argument is	set to
	      1, the method WordDict::SerialExists is used instead, that is no
	      serial  is assigned to the word if it does not already have one.
	      Before translation  the  word  is	 normalized  using  the	 Word-
	      Type::Normalize  method.	The word is saved using	the WordRefer-
	      ence::SetWord method.
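
       The effect of the exists argument can be pictured with a small
       dictionary sketch. DictSketch is hypothetical: its method names echo
       WordDict::Serial and WordDict::SerialExists, but the signatures,
       the use of 0 for "no serial" and the numbering from 1 are
       assumptions made for this illustration only.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical dictionary mapping words to numerical ids.
class DictSketch {
public:
    // Return the id of word, assigning the next free id if it has none.
    unsigned int Serial(const std::string& word) {
        auto it = ids_.find(word);
        if (it != ids_.end()) return it->second;
        const unsigned int id = next_id_++;
        ids_.emplace(word, id);
        return id;
    }

    // Return the id of word, or 0 if it has none. Never assigns an id,
    // so looking up an unknown word leaves the dictionary unchanged.
    unsigned int SerialExists(const std::string& word) const {
        auto it = ids_.find(word);
        return it == ids_.end() ? 0 : it->second;
    }

private:
    std::unordered_map<std::string, unsigned int> ids_;
    unsigned int next_id_ = 1;
};
```

       A lookup-only path like this is useful when searching, since a
       query for an unknown word should not add that word to the
       dictionary.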

       virtual WordReference WordExists(const String& bufferin)
	      Alias for	Word(bufferin, 1).

       virtual void BatchStart()
              Accelerate bulk insertions in the inverted index. All
              insertions done with the Override method are batched instead
              of updating the inverted index immediately. No update of the
              inverted index file is done before the BatchEnd method is
              called.

       virtual void BatchEnd()
	      Terminate	a bulk insertion started with a	call to	the BatchStart
	      method. When all insertions are done the AllRef method is	called
	      to restore statistics.

       virtual	int  Noccurrence(const String& key, unsigned int& noccurrence)
       const
	      Return in	noccurrence the	number of occurrences  of  the	string
	      contained	 in the	GetWord() part of key.	Returns	OK on success,
	      NOTOK otherwise.

       virtual int Write(FILE* f)
              Write to the stream f an ASCII description of the index. Each
              line of the file contains a WordReference ASCII description.
              Return OK on success, NOTOK otherwise.

       virtual int WriteDict(FILE* f)
              Write to the stream f the complete dictionary with
              statistics. Return OK on success, NOTOK otherwise.

       virtual int Read(FILE* f)
              Read WordReference ASCII descriptions from f. Returns the
              number of inserted WordReference objects, or < 0 if an error
              occurs. Invalid descriptions and empty lines are ignored.
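
       The return convention of Read (count of inserted entries, negative
       on error, invalid and empty lines skipped) can be sketched as below.
       ReadEntries is hypothetical, and the line format it parses, a word
       followed by one unsigned number, is an assumption made for the
       example. It is not the WordReference ASCII format.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical reader: parses "word number" lines from in and appends
// them to out. Returns the number of entries added, or -1 if the
// stream reports an unrecoverable error.
int ReadEntries(std::istream& in,
                std::vector<std::pair<std::string, unsigned int>>& out) {
    int inserted = 0;
    std::string line;
    while (std::getline(in, line)) {
        // Skip empty or whitespace-only lines.
        if (line.find_first_not_of(" \t\r") == std::string::npos) continue;
        std::istringstream fields(line);
        std::string word;
        unsigned int location = 0;
        std::string extra;
        // Skip lines with a missing or malformed field, or trailing data.
        if (!(fields >> word >> location) || (fields >> extra)) continue;
        out.emplace_back(word, location);
        ++inserted;
    }
    return in.bad() ? -1 : inserted;
}
```

       Skipping bad lines rather than aborting lets a large dump be loaded
       even when a few lines are damaged; the returned count shows how many
       entries were actually accepted.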

AUTHORS
       Loic Dachary loic@gnu.org

       The Ht://Dig group http://dev.htdig.org/

SEE ALSO
       htdb_dump(1), htdb_stat(1), htdb_load(1), mifluzdump(1),	mifluzload(1),
       mifluzsearch(1),	mifluzdict(1), WordContext(3),	WordDict(3),  WordLis-
       tOne(3),	 WordKey(3),  WordKeyInfo(3), WordType(3), WordDBInfo(3), Wor-
       dRecordInfo(3), WordRecord(3), WordReference(3),	 WordCursor(3),	 Word-
       CursorOne(3), WordMonitor(3), Configuration(3), mifluz(3)

				     local			   WordList(3)
