Skip site navigation (1)Skip section navigation (2)

FreeBSD Manual Pages

  
 
  

home | help
SWISH-LIBRARY(1)	     SWISH-E Documentation	      SWISH-LIBRARY(1)

NAME
       SWISH-LIBRARY - Interface to the	Swish-e	C library

OVERVIEW
       The C library in	an interface to	the Swish-e search code.  It provides
       a way to	embed Swish-e into your	applications.  This API	is based on
       Swish-e version 2.3.

       Note: This is a NEW API as of Swish-e version 2.3.  The C language in-
       terface has changed as has the perl interface to	Swish-e.  The new Perl
       interface is the	SWISH::API module and is included with the Swish-e
       distribution.  The old SWISHE perl module has been rewritten to work
       with the	new API.  The SWISHE perl module is no longer included with
       the Swish-e distribution, but can be downloaded from the	Swish-e	web
       site.

       The advantage of	the library is that the	index files or files can be
       opened one time and many	queries	made on	the open index.	 This saves
       the startup time	required to fork and run the swish-e binary, and the
       expensive time of opening up the	index file.  Some benchmarks have
       shown a three fold increase in speed.

       The downside is that your program now has more code and data in it (the
       index tables can	use quite a bit	of memory), and	if a fatal error hap-
       pens in swish it	will bring down	your program.  These are things	to
       think about, especially if embedding swish into a web server such as
       Apache where there are many processes serving requests.

       The best	way to learn about the library is to look at two files in-
       cluded with the Swish-e distribution that make use of the library.

       src/libtest.c
	   This	file gives a basic overview of linking a C program with	the
	   Swish-e library.  Not all available functions are used in that ex-
	   ample, but it should	give you a good	overview of building a C pro-
	   gram	with swish-e.

	   To build and	run libtest chdir to the src directory and run the
	   commands:

	       $ make libtest
	       $ ./libtest [optional name of index file]

	   You will be prompted	for the	search words.  The default index used
	   is index.swish-e.  This can be overridden by	placing	a list of in-
	   dex files in	a quote-protected string.

	       $ ./libtest 'index1 index2 index3'

       perl/API.xs
	   The API.xs file is a	Perl "xsub" interface to the C library and is
	   part	of the SWISH::API Perl module.	This is	an object-oriented in-
	   terface to the Swish-e library and demonstrates how the various
	   search "objects" are	created	by C calls and how they	are destroyed
	   when	no longer needed.

Installing the Swish-e library
       The Swish-e library is installed	when you run "make install" when
       building	Swish-e.  No extra installation	steps are required.

       The library consists of a header	file "swish-e.h" and a library "lib-
       swish-e.*" that can either be a static or shared	library	depending on
       your platform.

Library	Overview
       When you	first attach to	an index file (or index	files) you are re-
       turned a	"swish handle".	 From the handle you create one	or more
       "search objects"	which holds the	parameters to query the	index, such as
       the query string, sort order, search phrase delimiter, limit parameters
       and HTML	structure bits.	 The "object" is really	just a pointer to a C
       structure, but it's helpful to think of it as an	object that data and
       functionality associated	with it.

       The search object is used to query the index.  A	query returns a	"re-
       sults object".  The results object holds	the number of hits, the	parsed
       query per index,	and the	result set.  The results object	keeps track of
       the current position in the result set.	You may	"seek" to a specific
       record within the result	set (useful for	displaying a page of results).

       Finally,	a result object	represents a single result from	the result
       list.  A	result object provides access to the result's properties (such
       as file name, rank, etc.).

       In addition to results, there are functions available to	access the
       header values stored in the index file, functions to check and report
       errors, and a few utility functions.

Available Functions
       Below is	the list of available function included	in the Swish-e C lan-
       guage API.

       These functions (and typedefs) are defined in the swish-e.h header
       file.  The common objects (e.g. structures) used	are:

	   SW_HANDLE  -	swish handle that associates with an index file
	   SW_SEARCH  -	search "object"	that holds search parameters
	   SW_RESULTS -	results	"object" that holds a result set
	   SW_RESULT  -	a single result	used for accessing the result's	properties
	   SW_FUZZYWORD	- used for fuzzy (stemming) word conversion

       Searching

       SW_HANDLE SwishInit(char	*IndexFiles);
	   This	functions opens	and reads the header info of the index files
	   included in IndexFiles string.  The string should contain a space-
	   separated list of index files.

	       SW_HANDLE myhandle;
	       myhandle	= SwishInit("file1.idx");

	   Typically you will open a handle at the beginning of	your program
	   and use it to make multiple queries on an index.

	   This	function will always return a swish handle.  You must check
	   for errors, and on error free the memory used by the	handle,	or
	   abort.

	   Here's an example of	aborting:

	       SW_HANDLE swish_handle;
	       swish_handle = SwishInit("file1.idx file2.idx");
	       if ( SwishError(	swish_handle ) )
		   SwishAbortLastError(	swish_handle );

	   And here's an example of catching the error:

	       SW_HANDLE swish_handle;
	       swish_handle = SwishInit("file1.idx file2.idx");
	       if ( SwishError(	swish_handle ) )
	       {
		   printf("Failed to connect to	swish. %s\n", SwishErrorString(	swish_handle ) );
		   SwishClose( swish_handle );	/* free	the memory used	*/
		   return 0;
	       }

	   You may have	more than one handle active at a time.

	   Swish-e will	not tell you if	the index file changes on disk (such
	   as after reindexing).  In a persistent environment (e.g. mod_perl)
	   the calling program should check to see if the index	file has
	   changed on disk.  A common way to do	this is	to store the inode
	   number before opening the index file(s), and	then stat the file
	   name	every so often and reopen the index files if the inode number
	   changes.

       void SwishClose(SW_HANDLE handle);
	   This	function closes	and frees the memory of	a Swish	handle.	 Every
	   swish handle	should be freed	when done searching the	index.	Fail-
	   ing to close	the handle will	result in a memory leak.

       SW_SEARCH New_Search_Object(SW_HANDLE handle, const char	*query);
	   Returns a new search	"object".  The search object holds the parame-
	   ters	used for searching an index.  A	single search object can be
	   used	to query the index multiple times.  The	available settings
	   listed below	are "sticky" in	that they remain set on	the search ob-
	   ject	until change.

       int SwishGetStructure( SW_SEARCH	srch );
	   Returns the "structure" flag	of the search object passed or 0 if
	   the search object is	NULL.

       void SwishPhraseDelimiter( SW_SEARCH srch, char delimiter );
	   Sets	the phrase delimiter character.	 The default is	double-quotes.

       char SwishGetPhraseDelimiter( SW_SEARCH srch );
	   Returns the phrase delimiter	character used in the search object or
	   0 if	the search object is NULL.

       void SwishSetStructure( SW_SEARCH srch, int structure );
	   Sets	the "structure"	flag in	the search object.  The	structure flag
	   is used to limit searches to	parts of HTML files (such as to	the
	   title or headers).  The default is to not limit.  This provides the
	   functionality of the	-H command line	switch.

       void SwishPhraseDelimiter( SW_SEARCH srch, char delimiter );
	   Sets	the phrase delimiter character.	 The default is	double-quotes.

       void SwishSetSort( SW_SEARCH srch, char *sort );
	   Sets	the sort order of the results.	This is	the same as the	-s
	   switch used with the	swish-e	binary.

       void SwishSetQuery( SW_SEARCH srch, char	*query );
	   Sets	the query string in the	search object.	This typically is not
	   needed since	it can be set when creating the	search object or when
	   executing a query.

       void SwishSetSearchLimit( SW_SEARCH srch, char *propertyname, char
       *low, char *hi);
	   Sets	the limit parameters for a search.  Provides the same func-
	   tionality as	the -L command line switch.  You may specify a range
	   of property values that search results must be within.  You may
	   call	SwishSetSearchLimit() only one time for	each property (but can
	   set limits on more than one property	at a time).

	   Unlike the other settings on	the search object, once	you run	a
	   query on the	search object you must call SwishResetSearchLimit() to
	   change or clear the limit parameters.

       void SwishResetSearchLimit( SW_SEARCH srch );
	   Resets the limits set on a search object set	by SwishSetSearch-
	   Limit().

       void Free_Search_Object(	SW_SEARCH srch );
	   Frees the search object.  This must be called when done with	the
	   search object.  Generally, you can reuse a search object for	multi-
	   ple queries so typically you	would call this	right before calling
	   SwishClose().

	   You may free	the search object before freeing and generated results
	   objects.

       SW_RESULTS SwishExecute(	SW_SEARCH search, const	char *query);
	   Searches the	index or indexes based on the parameters in the	search
	   object.  Returns a results object.  See below for functions to ac-
	   cess	the data stored	in the results object.

	   You should always check for errors after calling SwishExecute().

       SW_RESULTS SwishQuery(SW_HANDLE,	const char *words );
	   This	is a short-cut function	that bypasses the creation of a	search
	   object (actually, bypasses the need to create and free a search ob-
	   ject).  This	only allows passing in a query string; other search
	   parameters cannot be	set.  The results are sorted by	rank.

	   You should always check for errors after calling SwishQuery().

       Reading Results

       int SwishHits( SW_RESULTS results );
	   Returns the number of results in the	results	object.

       SWISH_HEADER_VALUE SwishParsedWords( SW_RESULTS,	const char *index_name
       );
	   Returns the tokenized query.	 Words are split by WordCharacters and
	   stopwords are removed.  The parsed words are	useful for highlight-
	   ing search terms in your program.

	   The "index_name" is the name	of the index supplied in the
	   SwishInit() function	call.

	   Returns a SWISH_HEADER_VALUE	union of type SWISH_LIST which is a
	   char	**.  See src/libtest.c for an example of accessing the strings
	   in this list, but in	general	you may	cast this to a (char **).

       SWISH_HEADER_VALUE SwishRemovedStopwords( SW_RESULTS, const char	*in-
       dex_name	);
	   Returns a list of stopwords removed from the	input query.

	   Returns a SWISH_HEADER_VALUE	union of type SWISH_LIST which is a
	   char	**.  See src/libtest.c for an example of accessing the strings
	   in this list, but in	general	you may	cast this to a (char **).

       int SwishSeekResult( SW_RESULTS,	int position );
	   Sets	the current seek position in the list of results, with posi-
	   tion	zero being the first record (unlike -b where one is the	first
	   result).

	   Returns the position	or a negative number on	error.

       SW_RESULT SwishNextResult( SW_RESULTS );
	   Returns the next result, or NULL if not more	results	are available.

	   The result object returned does not need to be freed	after use (un-
	   like	the swish handle, search object, and results object).

       const char *SwishResultPropertyStr(SW_RESULT, char *propertyname);
	   This	function is mostly useful for testing as it returns odd	re-
	   sults on errors.

	   Aborts if called with a NULL	SW_RESULT object

	   Returns a string value of the specified property.

	   Returns the empty string "" if the current result does not have the
	   specified property assigned.

	   Returns the string "(null)" on invalid property name	(i.e. property
	   name	is not defined in the index) and sets an error (see below) in-
	   dicating the	invalid	property name.

	   The string returned does not	need to	be freed, but is only valid
	   for the current result.  If you wish	to save	the string you must
	   copy	it locally.

	   Dates are formatted using the hard-coded format string: "%Y-%m-%d
	   %H:%M:%S" in	localtime.

       unsigned	long SwishResultPropertyULong(SW_RESULT	r, char	*property-
       name);
	   Returns a numeric property as an unsigned long.  Numeric properties
	   are used for	both PropertyNamesNumeric and PropertyNamesDate	type
	   of properties.  Dates are returned as a unix	timestamp as reported
	   by the system when the index	was created.

	   Swish-e will	abort if called	with a NULL SW_RESULT object.  Without
	   the SW_RESULT object	swish-e	cannot set any error codes.

	   On error returns UMAX_LONG.	This is	commonly defined in limits.h.
	   Check SwishError() (see below) for the type of error.

	   If SwishError() returns false (zero)	then it	simply means that this
	   result does not have	any data for the specified property.

	   If SwishError() returns true	(non-zero) then	either the property-
	   name	specified is invalid, or the property requested	is not a nu-
	   meric (or date) property (e.g. it's a string	property).

	   See below on	how to fetch the specific error	message	when SwishEr-
	   ror() is true.

       PropValue *getResultPropValue (SW_RESULT	r, char	*propertyname, int ID
       );
	   This	is a low-level function	to fetch a property regardless of
	   type.  This is likely the best function for accessing properties.

	   Swish-e will	abort if called	with a NULL SW_RESULT object.  Proper-
	   tyname is the name of the property.	ID is the id number of the
	   property, if	known.	ID is not normally used	in the API, but	it's
	   purpose is to avoid looking up the property ID for every result
	   displayed.

	   The return PropValue	is a structure that contains a flag to indi-
	   cate	the type, and a	union that holds the property value.  They
	   flags and structure are defined in swish-e.h.

	   The property	must be	copied locally and the returned	"PropValue"
	   value must be freed by calling freeResultPropValue()	to avoid a
	   memory leak.

	   On error returns NULL.  Check SwishError() (see below) for the type
	   of error.

	   If returns NULL but SwishError() returns false (zero) then it sim-
	   ply means that this result does not have any	data for the specified
	   property.

	   If SwishError() returns true	(non-zero) then	the property name
	   specified is	invalid	(i.e. not defined for the index).

	   See below on	how to fetch the specific error	message	when SwishEr-
	   ror() is true.

	   See perl/API.xs for an example on using this	function.

       void freeResultPropValue(void)
	   Frees the "PropValue" returned after	calling	getResultPropValue().

       void Free_Results_Object( SW_RESULTS results );
	   Frees the results object (frees the result set).  This must be
	   called when done reading the	results	and before calling Swish-
	   Close().

       Accessing the Index Header Values

       Each index file has associated header values that describe the index.
       These functions provide access to this data.  The header	data is	re-
       turned as a union SWISH_HEADER_VALUE, and a pointer to a
       SWISH_HEADER_TYPE is passed in and the returned value indicates the
       type of data that is returned.  See src/libtest.c and perl/API.xs for
       examples.

       const char **SwishHeaderNames( SW_HANDLE	);
	   Returns the list of possible	header names.  This list is the	same
	   for all index files of a given version of Swish-e.  It provides a
	   way to gain access to all headers without having to list them in
	   your	program.

       const char **SwishIndexNames( SW_HANDLE );
	   Returns a list of index files opened.  This is just the list	of in-
	   dex files specified in the SwishInit() call.	 You need the name of
	   the index file to access a specific index's header values.

       SWISH_HEADER_VALUE SwishHeaderValue( SW_HANDLE, const char *index_name,
       const  char *cur_header,	SWISH_HEADER_TYPE *type	);
	   Fetches the header value for	the given index	file, and the header
	   name.  The call sets	the "type" passed in to	the type of value re-
	   turned.

	   See src/libtest.c and perl/API.xs for examples.

       SWISH_HEADER_VALUE SwishResultIndexValue( SW_RESULT, const char *name,
       SWISH_HEADER_TYPE *type );
	   This	is like	SwishHeaderValue() above, but instead of supplying an
	   index file name and a swish handle, supply a	result object and the
	   header value	is fetched from	the result's related index file.

       Accessing Property Meta Data

       In addition to the pre-defined standard properties, you have the	option
       of adding additional "meta" properties to be indexed and/or added to
       the list	of properties returned with each result.  Consult the sections
       on the MetaNames	and PropteryNames directives in	the CONFIGURATION FILE
       for an explanation of how to do this.

       These functions provide access to the meta data stored in an index.
       You can use them	to determine what meta/property	information is avail-
       able for	an index including all the pre-defined standard	properties.
       See libtest.c for an example.

       SWISH_META_LIST SwishMetaList( SW_HANDLE, const char *index_name	);
	   Returns the list of meta entries for	the given index	file  as a
	   null-terminated array of SW_META objects.  Use the functions	below
	   to extract specific fields from the SW_META structure.  Meta's are
	   distinct from properties.

       SWISH_META_LIST SwishPropertyList( SW_HANDLE, const char	*index_name );
	   This	function is the	same as	SwishMetaList()	but it returns an ar-
	   ray of properties as	opposed	to meta	objects.  Property attributes
	   can be extracted in the same	was as meta objects using the func-
	   tions below.

       SWISH_META_LIST SwishResultMetaList( SW_RESULT );
	   This	is like	SwishMetaList()	above but determines the index to use
	   from	a result object.

       SWISH_META_LIST SwishResultPropertyList(	SW_RESULT );
	   This	is like	SwishPropertyList() above but like SwishResultMetaL-
	   ist() uses a	result object instead of an index name.

       const char *SwishMetaName( SW_META );
	   Given a SW_META object returned by one of the above,	this function
	   will	return the meta/property's name.  You can use this name	to ac-
	   cess	a property's value for a given as described above.

       int SwishMetaType( SW_META );
	   Get the data	type for the given meta/property. Known	types are
	   listed in swish-e.h

       SwishMetaID( SW_META );
	   Get the internal ID number for the given meta/property.  These id's
	   are unique per index	file but are not unique	per results.

       Checking	for Errors

       You should check	for errors after all calls.  The last error is stored
       in the swish handle object, and is only valid until the next operation
       (which resets the error flags).

       Currently, some errors are flagged as "critical"	errors.	 In these
       cases you should	destroy	(by calling the	SwishClose() function )	the
       current swish handle.  If you have other	objects	in scope (e.g. a
       search object or	results	object)	destroy	those first.

       The types of errors that	are critical can be seen in src/error.c.  Cur-
       rently the list includes:

	   Could not open index	file
	   Unknown index file format
	   Index file(s) is empty
	   Index file error
	   Invalid swish handle
	   Invalid results object

       int  SwishError(	SW_HANDLE );
	   This	returns	true if	an error condition exists.  It returns the er-
	   ror number, which is	a integer less than zero on error.  This
	   should be checked before calling any	of the other error functions
	   below.

       const char *SwishErrorString( SW_HANDLE );
	   This	returns	a general text description of the current error.

       const char *SwishLastErrorMsg( SW_HANDLE	);
	   In some cases this will return a string with	specifics about	the
	   current error.  For example,	SwishErrorString() may return "Unknown
	   metaname", but SwishLastErrorMsg() will return a string with	the
	   name	of the unknown metaname.

       int  SwishCriticalError(	SW_HANDLE );
	   Returns true	if the current error condition is a critical error.
	   On critical errors you should free up any current objects and call
	   SwishClose()	as swish may be	in an unstable state.

       void SwishAbortLastError( SW_HANDLE );
	   This	is a convenience  function that	will format and	print the last
	   error message, and then abort the program.

       void set_error_handle( FILE *where );
	   Sets	where errors and warnings are printed (when printed by swish).
	   For historical reasons, when	swish-e	first starts up	errors and
	   warnings are	sent to	stdout.

       void SwishErrorsToStderr( void );
	   A convenience method	to send	errors to stderr instead of stdout.

       Utility Functions

       const char *SwishWordsByLetter(SWISH * sw, char *indexname, char	c);
	   Returns all the words in the	index "indexname" that begin with the
	   letter passed in.  Returns NULL if the name of the index file is
	   invalid.

	   This	fuction	may change in the future since only 8-bit chars	can
	   currently be	used.

       char * SwsishStemWord( SW_HANDLE	sw, char *in_word );
	   Deprecated

	   This	can be used to convert a word to its stem.  It uses only the
	   original Porter Stemmer.

       SW_FUZZYWORD SwishFuzzyWord( SW_RESULT r, char *word );
	   Stems "word"	based on the fuzzy mode	selected during	indexing.

	   The fuzzy mode used during indexing is stored in the	index file.
	   Since each result is	linked to a given index	file this method al-
	   lows	stemming a word	based on it's index file.

	   One possible	use for	this is	to highlight search terms in a docu-
	   ment	summary, which would be	based on a given result.

	   The methods below can be used to access the data returned.  The
	   SW_FUZZYWORD	object must be freed when done to avoid	a memory leak.

       const char **SwishFuzzyWordList(	SW_FUZZYWORD fw	);
	   Returns a null terminated list of strings returned by the stemmer.
	   In most cases this will be a	single string.

	   Here's an example:

	       SW_FYZZYWORD fuzzy_word = SwishFuzzyWord( result	);
	       const char **word_list =	SwishFuzzyWordList( fuzzy_word );
	       while ( *word_list )
	       {
		   printf("%s\n", *word_list );
		   word_list++;
	       }
	       SwishFuzzyWordFree( fuzzy_word );

	   If the stemmer does not convert the string (for example attempting
	   to stem numeric data) the word_list will contain the	original word.
	   To tell if the stemmer actually stemmed the word check the return
	   value with SwishFuzzyWordError().

       int SwishFuzzyWordError(	SW_FUZZYWORD fw	);
	   This	returns	zero if	the stemming operation was sucessfull, other-
	   wise	it returns a value indicating the reason the word was not
	   stemmed.  The return	values are defined in the swish-e src/stem-
	   mer.h file.

	   Not all stemmers set	this value correctly.  But since SwishFuzzy-
	   WordList() will return a valid string regardless of the return
	   value, you can often	just ignore this setting.  That's what I do.

       int SwishFuzzyWordCount(	SW_FUZZYWORD fw	);
	   Returns the count of	string in the word list	available by calling
	   SwishFuzzyWordList().

	   This	is normally just one, but in the case of DoubleMetaphone it
	   can be one or two (i.e. DoubleMetaphone can return one or two
	   strings).

       const char *SwishFuzzyMode( SW_RESULT r );
	   Returns the name of the stemmer used	for the	given result (which is
	   related to an index).

       void SwishFuzzyWordFree(	SW_FUZZYWORD fw	);
	   Frees the memory used by  the SW_FUZZYWORD.

Bug-Reports
       Please report bug reports to the	Swish-e	discussion group.  Feel	also
       free to improve or enhance this feature.

Author
       Original	interface: Aug 2000 Jose Ruiz jmruiz@boe.es

       Updated:	Aug 22,	2002 - Bill Moseley

       Interface redesigned for	Swish-e	version	2.3 Oct	17, 2002 - Bill	Mose-
       ley

Document Info
       $Id: SWISH-LIBRARY.pod 1906 2007-02-07 19:25:16Z	moseley	$

       .

2.4.7				  2009-04-04		      SWISH-LIBRARY(1)

NAME | OVERVIEW | Installing the Swish-e library | Library Overview | Available Functions | Bug-Reports | Author | Document Info

Want to link to this manual page? Use this URL:
<https://www.freebsd.org/cgi/man.cgi?query=SWISH-LIBRARY&sektion=1&manpath=FreeBSD+12.1-RELEASE+and+Ports>

home | help