Skip site navigation (1)Skip section navigation (2)

FreeBSD Manual Pages

  
 
  

home | help
ODEUM(3)		    Quick Database Manager		      ODEUM(3)

NAME
       Odeum - the inverted API	of QDBM

SYNOPSIS
       #include	<depot.h>
       #include	<cabin.h>
       #include	<odeum.h>
       #include	<stdlib.h>

       typedef struct {	int id;	int score; } ODPAIR;

       ODEUM *odopen(const char	*name, int omode);

       int odclose(ODEUM *odeum);

       int odput(ODEUM *odeum, const ODDOC *doc, int wmax, int over);

       int odout(ODEUM *odeum, const char *uri);

       int odoutbyid(ODEUM *odeum, int id);

       ODDOC *odget(ODEUM *odeum, const	char *uri);

       ODDOC *odgetbyid(ODEUM *odeum, int id);

       int odgetidbyuri(ODEUM *odeum, const char *uri);

       int odcheck(ODEUM *odeum, int id);

       ODPAIR *odsearch(ODEUM *odeum, const char *word,	int max, int *np);

       int odsearchdnum(ODEUM *odeum, const char *word);

       int oditerinit(ODEUM *odeum);

       ODDOC *oditernext(ODEUM *odeum);

       int odsync(ODEUM	*odeum);

       int odoptimize(ODEUM *odeum);

       char *odname(ODEUM *odeum);

       double odfsiz(ODEUM *odeum);

       int odbnum(ODEUM	*odeum);

       int odbusenum(ODEUM *odeum);

       int oddnum(ODEUM	*odeum);

       int odwnum(ODEUM	*odeum);

       int odwritable(ODEUM *odeum);

       int odfatalerror(ODEUM *odeum);

       int odinode(ODEUM *odeum);

       time_t odmtime(ODEUM *odeum);

       int odmerge(const char *name, const CBLIST *elemnames);

       int odremove(const char *name);

       ODDOC *oddocopen(const char *uri);

       void oddocclose(ODDOC *doc);

       void oddocaddattr(ODDOC *doc, const char	*name, const char *value);

       void oddocaddword(ODDOC *doc, const char	*normal, const char *asis);

       int oddocid(const ODDOC *doc);

       const char *oddocuri(const ODDOC	*doc);

       const char *oddocgetattr(const ODDOC *doc, const	char *name);

       const CBLIST *oddocnwords(const ODDOC *doc);

       const CBLIST *oddocawords(const ODDOC *doc);

       CBMAP *oddocscores(const	ODDOC *doc, int	max, ODEUM *odeum);

       CBLIST *odbreaktext(const char *text);

       char *odnormalizeword(const char	*asis);

       ODPAIR  *odpairsand(ODPAIR *apairs, int anum, ODPAIR *bpairs, int bnum,
       int *np);

       ODPAIR *odpairsor(ODPAIR	*apairs, int anum, ODPAIR *bpairs,  int	 bnum,
       int *np);

       ODPAIR  *odpairsnotand(ODPAIR  *apairs,	int  anum, ODPAIR *bpairs, int
       bnum, int *np);

       void odpairssort(ODPAIR *pairs, int pnum);

       double odlogarithm(double x);

       double odvectorcosine(const int *avec, const int	*bvec, int vnum);

       void odsettuning(int ibnum, int idnum, int cbnum, int csiz);

       void odanalyzetext(ODEUM	*odeum,	const char *text, CBLIST *awords,  CB-
       LIST *nwords);

       void  odsetcharclass(ODEUM  *odeum,  const char *spacechars, const char
       *delimchars, const char *gluechars);

       ODPAIR *odquery(ODEUM *odeum, const char	*query,	int *np,  CBLIST  *er-
       rors);

DESCRIPTION
       Odeum is	the API	which handles an inverted index.  An inverted index is
       a data structure	to retrieve a list of some documents that include  one
       of  words  which	 were extracted	from a population of documents.	 It is
       easy to realize a full-text  search  system  with  an  inverted	index.
       Odeum  provides	an abstract data structure which consists of words and
       attributes of a document.  It is	used when an application stores	a doc-
       ument  into a database and when an application retrieves	some documents
       from a database.

       Odeum does not provide methods to extract the text  from	 the  original
       data  of	 a  document.	It should be implemented by applications.  Al-
       though Odeum provides utilities to extract words	from  a	 text,	it  is
       oriented	to such	languages whose	words are separated with space charac-
       ters as English.	 If an application handles such	languages  which  need
       morphological  analysis or N-gram analysis as Japanese, or if an	appli-
       cation perform more such	rarefied  analysis  of	natural	 languages  as
       stemming, its own analyzing method can be adopted.  Result of search is
       expressed as an array contains elements which are  structures  composed
       of  the	ID number of documents and its score.  In order	to search with
       two or more words, Odeum	provides utilities of set operations.

       Odeum is	implemented, based on Curia, Cabin, and	Villa.	Odeum  creates
       a  database  with  a directory name.  Some databases of Curia and Villa
       are placed in the specified  directory.	 For  example,	`casket/docs',
       `casket/index', and `casket/rdocs' are created in the case that a data-
       base directory named as `casket'.  `docs' is a  database	 directory  of
       Curia.	The key	of each	record is the ID number	of a document, and the
       value is	such attributes	as URI.	 `index' is a  database	 directory  of
       Curia.	The  key  of each record is the	normalized form	of a word, and
       the value is an array whose element is a	pair of	the  ID	 number	 of  a
       document	 including the word and	its score.  `rdocs' is a database file
       of Villa.  The key of each record is the	URI of	a  document,  and  the
       value is	its ID number.

       In  order  to  use  Odeum,  you	should	include	 `depot.h', `cabin.h',
       `odeum.h' and `stdlib.h'	in the source files.  Usually,	the  following
       description will	be near	the beginning of a source file.

	      #include <depot.h>
	      #include <cabin.h>
	      #include <odeum.h>
	      #include <stdlib.h>

       A  pointer  to `ODEUM' is used as a database handle.  A database	handle
       is opened with the function `odopen' and	closed	with  `odclose'.   You
       should  not refer directly to any member	of the handle.	If a fatal er-
       ror occurs in a database, any access method via the handle except  `od-
       close'  will  not  work and return error	status.	 Although a process is
       allowed to use multiple database	handles	at the same time,  handles  of
       the same	database file should not be used.

       A  pointer  to `ODDOC' is used as a document handle.  A document	handle
       is opened with the function `oddocopen' and closed  with	 `oddocclose'.
       You  should not refer directly to any member of the handle.  A document
       consists	of attributes and words.  Each word is expressed as a pair  of
       a normalized form and a appearance form.

       Odeum  also assign the external variable	`dpecode' with the error code.
       The function `dperrmsg' is used in order	to get the message of the  er-
       ror code.

       Structures  of  `ODPAIR'	 type  is  used	 in order to handle results of
       search.

       typedef struct {	int id;	int score; } ODPAIR;
	      `id' specifies the ID number of a	document.   `score'  specifies
	      the  score  calculated from the number of	searching words	in the
	      document.

       The function `odopen' is	used in	order to get a database	handle.

       ODEUM *odopen(const char	*name, int omode);
	      `name' specifies the name	 of  a	database  directory.   `omode'
	      specifies	  the  connection  mode:  `OD_OWRITER'	as  a  writer,
	      `OD_OREADER' as a	reader.	 If the	mode is	`OD_OWRITER', the fol-
	      lowing  may  be added by bitwise or: `OD_OCREAT',	which means it
	      creates a	new database if	not exist, `OD_OTRUNC',	which means it
	      creates  a  new  database	 regardless  if	 one  exists.  Both of
	      `OD_OREADER' and `OD_OWRITER' can	be added  to  by  bitwise  or:
	      `OD_ONOLCK',  which  means it opens a database directory without
	      file locking, or `OD_OLCKNB', which means	locking	 is  performed
	      without  blocking.   The	return value is	the database handle or
	      `NULL' if	it is not successful.  While connecting	as  a  writer,
	      an  exclusive  lock is invoked to	the database directory.	 While
	      connecting as a reader, a	shared lock is invoked to the database
	      directory.   The	thread	blocks until the lock is achieved.  If
	      `OD_ONOLCK' is used, the application is responsible  for	exclu-
	      sion control.

       The function `odclose' is used in order to close	a database handle.

       int odclose(ODEUM *odeum);
	      `odeum'  specifies a database handle.  If	successful, the	return
	      value is true, else, it is  false.   Because  the	 region	 of  a
	      closed handle is released, it becomes impossible to use the han-
	      dle.  Updating a database	is assured to be written when the han-
	      dle  is closed.  If a writer opens a database but	does not close
	      it appropriately,	the database will be broken.

       The function `odput' is used in order to	store a	document.

       int odput(ODEUM *odeum, const ODDOC *doc, int wmax, int over);
	      `odeum' specifies	a  database  handle  connected	as  a  writer.
	      `doc'  specifies	a  document  handle.  `wmax' specifies the max
	      number of	words to be stored in the document database.  If it is
	      negative,	the number is unlimited.  `over' specifies whether the
	      data of the duplicated document is overwritten or	not.  If it is
	      false  and  the  URI of the document is duplicated, the function
	      returns as an error.  If successful, the return value  is	 true,
	      else, it is false.

       The function `odout' is used in order to	delete a document specified by
       a URI.

       int odout(ODEUM *odeum, const char *uri);
	      `odeum' specifies	a  database  handle  connected	as  a  writer.
	      `uri'  specifies	the  string of the URI of a document.  If suc-
	      cessful, the return value	is true, else, it is false.  False  is
	      returned when no document	corresponds to the specified URI.

       The  function  `odoutbyid' is used in order to delete a document	speci-
       fied by an ID number.

       int odoutbyid(ODEUM *odeum, int id);
	      `odeum' specifies	a database handle connected as a writer.  `id'
	      specifies	 the  ID number	of a document.	If successful, the re-
	      turn value is true, else,	it is false.  False is	returned  when
	      no document corresponds to the specified ID number.

       The  function `odget' is	used in	order to retrieve a document specified
       by a URI.

       ODDOC *odget(ODEUM *odeum, const	char *uri);
	      `odeum' specifies	a database handle.  `uri' specifies the	string
	      of  the  URI  of a document.  If successful, the return value is
	      the handle of the	corresponding document,	else,  it  is  `NULL'.
	      `NULL' is	returned when no document corresponds to the specified
	      URI.  Because the	handle of the return value is opened with  the
	      function	`oddocopen',  it  should  be  closed with the function
	      `oddocclose'.

       The function `odgetbyid'	is used	in order to retrieve a document	by  an
       ID number.

       ODDOC *odgetbyid(ODEUM *odeum, int id);
	      `odeum' specifies	a database handle.  `id' specifies the ID num-
	      ber of a document.  If successful, the return value is the  han-
	      dle  of  the corresponding document, else, it is `NULL'.	`NULL'
	      is returned when no document corresponds	to  the	 specified  ID
	      number.	Because	 the handle of the return value	is opened with
	      the function `oddocopen',	it should be closed with the  function
	      `oddocclose'.

       The  function `odgetidbyuri' is used in order to	retrieve the ID	of the
       document	specified by a URI.

       int odgetidbyuri(ODEUM *odeum, const char *uri);
	      `odeum' specifies	a database handle.  `uri' specifies the	string
	      the  URI	of a document.	If successful, the return value	is the
	      ID number	of the document, else, it is -1.  -1 is	returned  when
	      no document corresponds to the specified URI.

       The  function  `odcheck'	is used	in order to check whether the document
       specified by an ID number exists.

       int odcheck(ODEUM *odeum, int id);
	      `odeum' specifies	a database handle.  `id' specifies the ID num-
	      ber of a document.  The return value is true if the document ex-
	      ists, else, it is	false.

       The function `odsearch' is used in order	to search the  inverted	 index
       for documents including a particular word.

       ODPAIR *odsearch(ODEUM *odeum, const char *word,	int max, int *np);
	      `odeum' specifies	a database handle.  `word' specifies a search-
	      ing word.	 `max' specifies the max number	of documents to	be re-
	      trieve.	`np'  specifies	the pointer to a variable to which the
	      number of	the elements of	the return value is assigned.  If suc-
	      cessful,	the  return value is the pointer to an array, else, it
	      is `NULL'.  Each element of the array is a pair of the ID	number
	      and  the	score of a document, and sorted	in descending order of
	      their scores.  Even if no	document corresponds to	the  specified
	      word,  it	 is not	error but returns an dummy array.  Because the
	      region of	the return value is allocated with the `malloc'	 call,
	      it should	be released with the `free' call if it is no longer in
	      use.  Note that each element of the array	of  the	 return	 value
	      can be data of a deleted document.

       The  function `odsearchnum' is used in order to get the number of docu-
       ments including a word.

       int odsearchdnum(ODEUM *odeum, const char *word);
	      `odeum' specifies	a database handle.  `word' specifies a search-
	      ing word.	 If successful,	the return value is the	number of doc-
	      uments including the word, else, it is -1.  Because  this	 func-
	      tion  does  not  read  the  entity  of the inverted index, it is
	      faster than `odsearch'.

       The function `oditerinit' is used in order to initialize	 the  iterator
       of a database handle.

       int oditerinit(ODEUM *odeum);
	      `odeum'  specifies a database handle.  If	successful, the	return
	      value is true, else, it is false.	 The iterator is used in order
	      to access	every document stored in a database.

       The  function  `oditernext' is used in order to get the next key	of the
       iterator.

       ODDOC *oditernext(ODEUM *odeum);
	      `odeum' specifies	a database handle.  If successful, the	return
	      value  is	 the  handle of	the next document, else, it is `NULL'.
	      `NULL' is	returned when no document is to	be get out of the  it-
	      erator.  It is possible to access	every document by iteration of
	      calling this function.  However, it is not assured  if  updating
	      the  database is occurred	while the iteration.  Besides, the or-
	      der of this traversal access method is arbitrary,	so it  is  not
	      assured  that the	order of string	matches	the one	of the traver-
	      sal access.  Because the handle of the return  value  is	opened
	      with  the	 function  `oddocopen',	 it  should be closed with the
	      function `oddocclose'.

       The function `odsync' is	used in	order to synchronize updating contents
       with the	files and the devices.

       int odsync(ODEUM	*odeum);
	      `odeum'  specifies  a database handle connected as a writer.  If
	      successful, the return value is true, else, it is	 false.	  This
	      function is useful when another process uses the connected data-
	      base directory.

       The function `odoptimize' is used in order to optimize a	database.

       int odoptimize(ODEUM *odeum);
	      `odeum' specifies	a database handle connected as a  writer.   If
	      successful,  the	return value is	true, else, it is false.  Ele-
	      ments of the deleted documents in	the inverted index are purged.

       The function `odname' is	used in	order to get the name of a database.

       char *odname(ODEUM *odeum);
	      `odeum' specifies	a database handle.  If successful, the	return
	      value  is	the pointer to the region of the name of the database,
	      else, it is `NULL'.  Because the region of the return  value  is
	      allocated	with the `malloc' call,	it should be released with the
	      `free' call if it	is no longer in	use.

       The function `odfsiz' is	used in	order to get the total size  of	 data-
       base files.

       double odfsiz(ODEUM *odeum);
	      `odeum'  specifies a database handle.  If	successful, the	return
	      value is the total size of the database files, else, it is -1.0.

       The function `odbnum' is	used in	order to get the total number  of  the
       elements	of the bucket arrays in	the inverted index.

       int odbnum(ODEUM	*odeum);
	      `odeum'  specifies a database handle.  If	successful, the	return
	      value is the total number	of the elements	of the bucket  arrays,
	      else, it is -1.

       The  function  `odbusenum'  is used in order to get the total number of
       the used	elements of the	bucket arrays in the inverted index.

       int odbusenum(ODEUM *odeum);
	      `odeum' specifies	a database handle.  If successful, the	return
	      value is the total number	of the used elements of	the bucket ar-
	      rays, else, it is	-1.

       The function `oddnum' is	used in	order to get the number	of  the	 docu-
       ments stored in a database.

       int oddnum(ODEUM	*odeum);
	      `odeum'  specifies a database handle.  If	successful, the	return
	      value is the number of the documents  stored  in	the  database,
	      else, it is -1.

       The  function  `odwnum' is used in order	to get the number of the words
       stored in a database.

       int odwnum(ODEUM	*odeum);
	      `odeum' specifies	a database handle.  If successful, the	return
	      value  is	 the number of the words stored	in the database, else,
	      it is -1.	 Because of the	I/O buffer, the	return	value  may  be
	      less than	the hard number.

       The  function `odwritable' is used in order to check whether a database
       handle is a writer or not.

       int odwritable(ODEUM *odeum);
	      `odeum' specifies	a database handle.  The	return value  is  true
	      if the handle is a writer, false if not.

       The  function  `odfatalerror' is	used in	order to check whether a data-
       base has	a fatal	error or not.

       int odfatalerror(ODEUM *odeum);
	      `odeum' specifies	a database handle.  The	return value  is  true
	      if the database has a fatal error, false if not.

       The  function  `odinode'	 is used in order to get the inode number of a
       database	directory.

       int odinode(ODEUM *odeum);
	      `odeum' specifies	a database handle.  The	return	value  is  the
	      inode number of the database directory.

       The  function  `odmtime'	is used	in order to get	the last modified time
       of a database.

       time_t odmtime(ODEUM *odeum);
	      `odeum' specifies	a database handle.  The	return	value  is  the
	      last modified time of the	database.

       The function `odmerge' is used in order to merge	plural database	direc-
       tories.

       int odmerge(const char *name, const CBLIST *elemnames);
	      `name' specifies the name	of a  database	directory  to  create.
	      `elemnames'  specifies a list of names of	element	databases.  If
	      successful, the return value is true, else, it is	false.	If two
	      or more documents	which have the same URL	come in, the first one
	      is adopted and the others	are ignored.

       The function `odremove' is used in order	to remove  a  database	direc-
       tory.

       int odremove(const char *name);
	      `name'  specifies	the name of a database directory.  If success-
	      ful, the return value is true, else, it is  false.   A  database
	      directory	 can contain databases of other	APIs of	QDBM, they are
	      also removed by this function.

       The function `oddocopen'	is used	in order to get	a document handle.

       ODDOC *oddocopen(const char *uri);
	      `uri' specifies the URI of a document.  The return  value	 is  a
	      document	handle.	  The  ID  number of a new document is not de-
	      fined.  It is defined when the document is stored	in a database.

       The function `oddocclose' is used in order to close a document handle.

       void oddocclose(ODDOC *doc);
	      `doc' specifies a	document handle.   Because  the	 region	 of  a
	      closed handle is released, it becomes impossible to use the han-
	      dle.

       The function `oddocaddattr' is used in order to add an attribute	 to  a
       document.

       void oddocaddattr(ODDOC *doc, const char	*name, const char *value);
	      `doc'  specifies a document handle.  `name' specifies the	string
	      of the name of an	attribute.  `value' specifies  the  string  of
	      the value	of the attribute.

       The  function  `oddocaddword' is	used in	order to add a word to a docu-
       ment.

       void oddocaddword(ODDOC *doc, const char	*normal, const char *asis);
	      `doc' specifies  a  document  handle.   `normal'	specifies  the
	      string  of  the normalized form of a word.  Normalized forms are
	      treated as keys of the inverted index.  If the  normalized  form
	      of  a  word is an	empty string, the word is not reflected	in the
	      inverted index.  `asis' specifies	the string of  the  appearance
	      form  of the word.  Appearance forms are used after the document
	      is retrieved by an application.

       The function `oddocid' is used in order to get the ID number of a docu-
       ment.

       int oddocid(const ODDOC *doc);
	      `doc'  specifies	a document handle.  The	return value is	the ID
	      number of	a document.

       The function `oddocuri' is used in order	to get the URI of a document.

       const char *oddocuri(const ODDOC	*doc);
	      `doc' specifies a	document handle.   The	return	value  is  the
	      string of	the URI	of a document.

       The function `oddocgetattr' is used in order to get the value of	an at-
       tribute of a document.

       const char *oddocgetattr(const ODDOC *doc, const	char *name);
	      `doc' specifies a	document handle.  `name' specifies the	string
	      of  the name of an attribute.  The return	value is the string of
	      the value	of the attribute, or `NULL'  if	 no  attribute	corre-
	      sponds.

       The function `oddocnwords' is used in order to get the list handle con-
       tains words in normalized form of a document.

       const CBLIST *oddocnwords(const ODDOC *doc);
	      `doc' specifies a	document handle.  The return value is the list
	      handle contains words in normalized form.

       The function `oddocawords' is used in order to get the list handle con-
       tains words in appearance form of a document.

       const CBLIST *oddocawords(const ODDOC *doc);
	      `doc' specifies a	document handle.  The return value is the list
	      handle contains words in appearance form.

       The  function `oddocscores' is used in order to get the map handle con-
       tains keywords in normalized form and their scores.

       CBMAP *oddocscores(const	ODDOC *doc, int	max, ODEUM *odeum);
	      `doc' specifies a	document handle.  `max'	specifies the max num-
	      ber  of  keywords	 to  get.  `odeum' specifies a database	handle
	      with which the IDF for weighting is calculate.  If it is `NULL',
	      it  is  not  used.   The return value is the map handle contains
	      keywords and their scores.   Scores  are	expressed  as  decimal
	      strings.	 Because the handle of the return value	is opened with
	      the function `cbmapopen',	it should be closed with the  function
	      `cbmapclose' if it is no longer in use.

       The  function `odbreaktext' is used in order to break a text into words
       in appearance form.

       CBLIST *odbreaktext(const char *text);
	      `text' specifies the string of a text.  The return value is  the
	      list  handle contains words in appearance	form.  Words are sepa-
	      rated with space characters and such delimiters as period, comma
	      and  so  on.   Because  the handle of the	return value is	opened
	      with the function	`cblistopen', it should	 be  closed  with  the
	      function `cblistclose' if	it is no longer	in use.

       The  function `odnormalizeword' is used in order	to make	the normalized
       form of a word.

       char *odnormalizeword(const char	*asis);
	      `asis' specifies the string of the appearance form  of  a	 word.
	      The  return value	is is the string of the	normalized form	of the
	      word.  Alphabets of the ASCII code are unified into lower	cases.
	      Words  composed of only delimiters are treated as	empty strings.
	      Because the region of the	return value  is  allocated  with  the
	      `malloc'	call, it should	be released with the `free' call if it
	      is no longer in use.

       The function `odpairsand' is used in order to get the  common  elements
       of two sets of documents.

       ODPAIR  *odpairsand(ODPAIR *apairs, int anum, ODPAIR *bpairs, int bnum,
       int *np);
	      `apairs' specifies the pointer to	 the  former  document	array.
	      `anum'  specifies	the number of the elements of the former docu-
	      ment array.  `bpairs' specifies the pointer to the latter	 docu-
	      ment  array.  `bnum' specifies the number	of the elements	of the
	      latter document array.  `np' specifies the pointer to a variable
	      to  which	 the number of the elements of the return value	is as-
	      signed.  The return value	is the pointer to a new	document array
	      whose  elements commonly belong to the specified two sets.  Ele-
	      ments of the array are  sorted  in  descending  order  of	 their
	      scores.	Because	 the  region  of the return value is allocated
	      with the `malloc'	call, it should	be released  with  the	`free'
	      call if it is no longer in use.

       The function `odpairsor'	is used	in order to get	the sum	of elements of
       two sets	of documents.

       ODPAIR *odpairsor(ODPAIR	*apairs, int anum, ODPAIR *bpairs,  int	 bnum,
       int *np);
	      `apairs'	specifies  the	pointer	 to the	former document	array.
	      `anum' specifies the number of the elements of the former	 docu-
	      ment  array.  `bpairs' specifies the pointer to the latter docu-
	      ment array.  `bnum' specifies the	number of the elements of  the
	      latter document array.  `np' specifies the pointer to a variable
	      to which the number of the elements of the return	value  is  as-
	      signed.  The return value	is the pointer to a new	document array
	      whose elements belong to both or either  of  the	specified  two
	      sets.   Elements	of the array are sorted	in descending order of
	      their scores.  Because the region	of the return value  is	 allo-
	      cated  with  the	`malloc'  call,	it should be released with the
	      `free' call if it	is no longer in	use.

       The function `odpairsnotand' is used in order to	get the	difference set
       of documents.

       ODPAIR  *odpairsnotand(ODPAIR  *apairs,	int  anum, ODPAIR *bpairs, int
       bnum, int *np);
	      `apairs' specifies the pointer to	 the  former  document	array.
	      `anum'  specifies	the number of the elements of the former docu-
	      ment array.  `bpairs' specifies the pointer to the latter	 docu-
	      ment  array of the sum of	elements.  `bnum' specifies the	number
	      of the elements of the latter document  array.   `np'  specifies
	      the pointer to a variable	to which the number of the elements of
	      the return value is assigned.  The return	value is  the  pointer
	      to  a new	document array whose elements belong to	the former set
	      but not to the latter set.  Elements of the array	are sorted  in
	      descending order of their	scores.	 Because the region of the re-
	      turn value is allocated with the `malloc'	call, it should	be re-
	      leased with the `free' call if it	is no longer in	use.

       The  function `odpairssort' is used in order to sort a set of documents
       in descending order of scores.

       void odpairssort(ODPAIR *pairs, int pnum);
	      `pairs' specifies	the pointer to a document array.  `pnum' spec-
	      ifies the	number of the elements of the document array.

       The  function  `odlogarithm'  is	used in	order to get the natural loga-
       rithm of	a number.

       double odlogarithm(double x);
	      `x' specifies a number.  The return value	is the	natural	 loga-
	      rithm  of	 the  number.	If the number is equal to or less than
	      1.0, the return value is 0.0.  This function is useful  when  an
	      application calculates the IDF of	search results.

       The function `odvectorcosine' is	used in	order to get the cosine	of the
       angle of	two vectors.

       double odvectorcosine(const int *avec, const int	*bvec, int vnum);
	      `avec' specifies the pointer to one array	 of  numbers.	`bvec'
	      specifies	 the  pointer  to  the other array of numbers.	`vnum'
	      specifies	the number of elements	of  each  array.   The	return
	      value  is	the cosine of the angle	of two vectors.	 This function
	      is useful	when an	application  calculates	 similarity  of	 docu-
	      ments.

       The  function  `odsettuning'  is	used in	order to set the global	tuning
       parameters.

       void odsettuning(int ibnum, int idnum, int cbnum, int csiz);
	      `ibnum' specifies	the number of buckets  for  inverted  indexes.
	      `idnum'  specifies  the division number of inverted index.  `cb-
	      num' specifies the number	of buckets for dirty buffers.	`csiz'
	      specifies	 the  maximum  bytes  to use memory for	dirty buffers.
	      The default setting  is  equivalent  to  `odsettuning(32749,  7,
	      262139,  8388608)'.  This	function should	be called before open-
	      ing a handle.

       The function `odanalyzetext' is used in order  to  break	 a  text  into
       words and store appearance forms	and normalized form into lists.

       void  odanalyzetext(ODEUM *odeum, const char *text, CBLIST *awords, CB-
       LIST *nwords);
	      `odeum' specifies	 a  database  handle.	`text'	specifies  the
	      string  of  a text.  `awords' specifies a	list handle into which
	      appearance form is store.	 `nwords' specifies a list handle into
	      which normalized form is store.  If it is	`NULL',	it is ignored.
	      Words are	separated with space characters	and such delimiters as
	      period, comma and	so on.

       The  function  `odsetcharclass'	is used	in order to set	the classes of
       characters used by `odanalyzetext'.

       void odsetcharclass(ODEUM *odeum, const char  *spacechars,  const  char
       *delimchars, const char *gluechars);
	      `odeum'  specifies  a database handle.  `spacechars' spacifies a
	      string contains  space  characters.   `delimchars'  spacifies  a
	      string  contains	delimiter characters.  `gluechars' spacifies a
	      string contains glue characters.

       The function `odquery' is used in order to query	 a  database  using  a
       small boolean query language.

       ODPAIR  *odquery(ODEUM  *odeum, const char *query, int *np, CBLIST *er-
       rors);
	      `odeum' specifies	a database handle.  'query' specifies the text
	      of the query.  `np' specifies the	pointer	to a variable to which
	      the number of the	elements of  the  return  value	 is  assigned.
	      `errors'	specifies  a list handle into which error messages are
	      stored.  If it is	`NULL',	it is ignored.	If successful, the re-
	      turn value is the	pointer	to an array, else, it is `NULL'.  Each
	      element of the array is a	pair of	the ID number and the score of
	      a	 document,  and	 sorted	 in  descending	order of their scores.
	      Even if no document corresponds to the specified	condition,  it
	      is  not error but	returns	an dummy array.	 Because the region of
	      the return value is allocated with the `malloc' call, it	should
	      be  released  with  the  `free'  call if it is no	longer in use.
	      Note that	each element of	the array of the return	value  can  be
	      data of a	deleted	document.

       If  QDBM	 was  built  with  POSIX  thread  enabled, the global variable
       `dpecode' is treated as thread specific data, and  functions  of	 Odeum
       are  reentrant.	In that	case, they are thread-safe as long as a	handle
       is not accessed by threads at the same time,  on	 the  assumption  that
       `errno',	`malloc', and so on are	thread-safe.

       If  QDBM	was built with ZLIB enabled, records in	the database for docu-
       ment attributes are compressed.	In that	case, the size of the database
       is  reduced  to	30%  or	less.  Thus, you should	enable ZLIB if you use
       Odeum.  A database of Odeum created without ZLIB	enabled	is not	avail-
       able on environment with	ZLIB enabled, and vice versa.  If ZLIB was not
       enabled but LZO,	LZO is used instead.

       The query language of the function `odquery' is a basic	language  fol-
       lowing this grammar:

	      expr ::= subexpr ( op subexpr )*
	      subexpr ::= WORD
	      subexpr ::= LPAREN expr RPAREN

       Operators  are  "&"  (AND),  "|"	 (OR),	and "!"	(NOTAND).  You can use
       parenthesis to group sub-expressions together in	order to change	 order
       of operations.  The given query is broken up using the function `odana-
       lyzetext', so if	you want to specify  different	text  breaking	rules,
       then  make sure that you	at least set "&", "|", "!", "(", and ")" to be
       delimiter characters.  Consecutive words	are treated as having  an  im-
       plicit  "&"  operator  between  them,  so "zed shaw" is actually	"zed &
       shaw".

       The encoding of the query text should be	the same with the encoding  of
       target  documents.  Moreover, each of space characters, delimiter char-
       acters, and glue	characters should be single byte.

SEE ALSO
       qdbm(3),	depot(3), curia(3), relic(3),  hovel(3),  cabin(3),  villa(3),
       ndbm(3),	gdbm(3)

Man Page			  2004-04-22			      ODEUM(3)

NAME | SYNOPSIS | DESCRIPTION | SEE ALSO

Want to link to this manual page? Use this URL:
<https://www.freebsd.org/cgi/man.cgi?query=odeum&sektion=3&manpath=FreeBSD+13.0-RELEASE+and+Ports>

home | help