Search::InvertedIndex(3)  User Contributed Perl Documentation  Search::InvertedIndex(3)

NAME
       Search::InvertedIndex - A manager for inverted index maps

SYNOPSIS
	  use Search::InvertedIndex;

	  my $database = Search::InvertedIndex::DB::DB_File_SplitHash->new({
				 -map_name => '/www/search-engine/databases/test-maps/test',
					-multi => 4,
				-file_mode => 0644,
				-lock_mode => 'EX',
			 -lock_timeout => 30,
		       -blocking_locks => 0,
				-cachesize => 1000000,
			-write_through => 0,
                  -read_write_mode => 'RDWR',
			});

          my $inv_map = Search::InvertedIndex->new({ -database => $database });

	##########################################################
	# Example Update
	##########################################################

	  my $index_data = "Some scalar	- complex structure refs are ok";

	  my $update = Search::InvertedIndex::Update->new({
								   -group => 'keywords',
								   -index => 'http://www.nihongo.org/',
								    -data => $index_data,
								    -keys => {
										'some' => 10,
									      'scalar' => 20,
									     'complex' => 15,
									   'structure' => 15,
										'refs' => 15,
										 'are' => 15,
										  'ok' => 15,
									     },
									     });
	  my $result = $inv_map->update({ -update => $update });

	##########################################################
	# Example Query
	# '-nodes' is an anon list of Search::InvertedIndex::Query
	# objects (this	allows constructing complex booleans by
	# nesting).
	#
	# '-leafs' is an anon list of Search::InvertedIndex::Query::Leaf
	# objects (used	for individual search terms).
	#
	##########################################################

	  my $query_leaf1 = Search::InvertedIndex::Query::Leaf->new({
									      -key => 'complex',
									    -group => 'keywords',
									   -weight => 1,
									   });

	  my $query_leaf2 = Search::InvertedIndex::Query::Leaf->new({
									      -key => 'structure',
									    -group => 'keywords',
									   -weight => 1,
									   });
	  my $query_leaf3 = Search::InvertedIndex::Query::Leaf->new({
									      -key => 'gold',
									    -group => 'keywords',
									   -weight => 1,
									   });
	  my $query1 = Search::InvertedIndex::Query->new({
						 -logic	=> 'and',
						-weight	=> 1,
						 -nodes	=> [],
						 -leafs	=> [$query_leaf1,$query_leaf2],
					  });
	  my $query2 = Search::InvertedIndex::Query->new({
						 -logic	=> 'or',
						-weight	=> 1,
						 -nodes	=> [$query1],
						 -leafs	=> [$query_leaf3],
					  });

	  my $result = $inv_map->search({ -query => $query2 });

	##########################################################

	  $inv_map->close;

DESCRIPTION
       Provides	the core of an inverted	map based search engine. By mapping
       'keys' to 'indexes' it provides ultra-fast look ups of all 'indexes'
       containing specific 'keys'. This	produces highly	scalable behavior
       where thousands,	or even	millions of records can	be searched extremely
       quickly.
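
       As a rough illustration of the idea (a plain-Perl sketch, not the
       module's implementation or storage layout), an inverted map can be
       modeled as a hash from each key to the indexes that contain it, with
       a ranking per entry. The %documents data, the %inverted hash and the
       lookup() helper are hypothetical names made up for this example.

```perl
use strict;
use warnings;

# Hypothetical sample data: index (e.g. a URL) => { key => ranking }.
my %documents = (
    'http://example.org/a' => { complex => 15, structure => 15 },
    'http://example.org/b' => { complex => 10, gold      => 20 },
);

# Build the inverted map: key => { index => ranking }.
my %inverted;
for my $index (keys %documents) {
    my $keys = $documents{$index};
    $inverted{$_}{$index} = $keys->{$_} for keys %$keys;
}

# Finding the indexes for a key is one hash access on the key,
# regardless of how many indexes are stored in total.
sub lookup {
    my ($key) = @_;
    my $hits = $inverted{$key} || {};
    my @indexes = sort keys %$hits;
    return @indexes;
}

print join(', ', lookup('complex')), "\n";
```

       The real module keeps this mapping in a database driver rather than
       in memory, so the sketch only shows the shape of the lookup.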

       Available database drivers are:

	Search::InvertedIndex::DB::DB_File_SplitHash
	Search::InvertedIndex::DB::Mysql

       Check the POD documentation for each database driver to determine
       initialization requirements.

CHANGES
	1.00 1999.06.16	- Initial release

	1.01 1999.06.17	- Documentation	fixes and fix to 'close' method	in
					  Search::InvertedIndex::DB::DB_File_SplitHash

	1.02 1999.06.18	- Major	bugfix to locking system.
					  Performance tweaking.	Roughly	3x improvement.

	1.03 1999.06.30	- Documentation	fixes.

	1.04 1999.07.01	- Documentation	fixes and caching system bugfixes.

	1.05 1999.10.20	- Altered ranking computation on search	results

	1.06 1999.10.20	- Removed 'use attrs' usage to improve portability

	1.07 1999.11.09	- "Cosmetic" changes to	avoid warnings in Perl 5.004

        1.08 2000.01.25 - Bugfix to 'Search::InvertedIndex::DB::DB_File_SplitHash' submodule
					  and documentation additions/fixes

        1.09 2000.03.23 - Bugfix to 'Search::InvertedIndex::DB::DB_File_SplitHash' submodule
					  to manage case where 'open' is not performed before close is called.

	1.10 2000.07.05	- Delayed loading of serializer	and added option to select
					  which	serializer (Storable or	Data::Dumper) to use at	instance 'new' time.
					  This should allow module to be loaded	by mod_perl via	the 'PerlModule'
					  conf directive and enable use	on platforms that do not support
					  'Storable' (such as Macintosh).

	1.11 2000.11.29	- Added	'Search::InvertedIndex::DB::Mysql' (authored by
					  Michael Cramer <cramer@webkist.com>) database	driver
					  to package.

	1.12 2002.04.09	- Squashed bug in removal of an	index from a group when	the index doesn't
					  exist	in that	group that caused index	counts for the group to	be decremented
					  in error.

	1.13 2003.09.28	- Interim release. Fixed false error return from 'first_key_in_group' for a group
			  that has not yet had any keys	set.  Tightened	calling
			  parm parses. Tweaked performance of preload updating code.
			  Added	taint fix for stringifier identifier.
			  This release was driven by the taint issue and code bug as crisis items.
			  Hopefully a 1.14 release will	be in the not too distant future.

        1.14 2003.11.14 - Patch to the MySQL database driver to accommodate changes in DBD::mysql.
			  Addition of a	test for MySQL functionality. Patch and	test thanks to
			  Kate L Pugh.

   Public API
       "new({ -database	=> $database_object [,'-search_cache_size' => 1000,
       -search_cache_dir => '/var/tmp/search_cache', -stringifier =>
       ['Storable','Data::Dumper'],  ] });"
	   Provides the	interface for obtaining	a new Search::InvertedIndex
           object for manipulating an inverted database.

	   Example 1:

	    my $database = Search::InvertedIndex::DB::DB_File_SplitHash->new({
				    -map_name => '/www/databases/test-map_names/test',
					   -multi => 4,
				   -file_mode => 0644,
				   -lock_mode => 'EX',
			    -lock_timeout => 30,
		      -blocking_locks => 0,
				   -cachesize => 1000000,
			   -write_through => 0,
		     -read_write_mode => 'RDONLY',
			   });

	    my $inv_map	= Search::InvertedIndex->new({
					   '-database' => $database,
		      '-search_cache_size' => 1000,
			   '-search_cache_dir' => '/var/tmp/search_cache',
				      -stringifier => ['Storable','Data::Dumper'],
		    });

	   Parameter explanations:

             -database          - A database interface object. Defined database interfaces
                                  are currently Search::InvertedIndex::DB::DB_File_SplitHash
                                  and Search::InvertedIndex::DB::Mysql. (Required)

             -stringifier       - Declares the stringifier used to store information in the
                                  underlying database. Currently defined stringifiers are
                                  'Storable' and 'Data::Dumper'. The default is to use
                                  'Storable' with fallback to 'Data::Dumper'. (Optional)

             -search_cache_size - Sets the number of cached searches to hold in the search
                                  cache. (Optional)

             -search_cache_dir  - Sets the directory to be used for the search cache.
                                  (Required if -search_cache_size is set to something other than 0)

           The -database parameter is required and must be a
           'Search::InvertedIndex::DB::...' type database object. The
           '-search_cache_size' and '-search_cache_dir' parameters are
           optional and define the size and location of the search cache. If
           omitted, no search caching will be done.

           The optional '-stringifier' parameter can be used to override the
           default use of 'Storable' (with fallback to 'Data::Dumper') as the
           stringifier used for storing data by the module. Specifying
           -stringifier => 'Data::Dumper' would specify using 'Data::Dumper'
           (only) as the stringifier, while specifying -stringifier =>
           ['Data::Dumper','Storable'] would specify using 'Data::Dumper' by
           preference (but falling back to 'Storable' if 'Data::Dumper' was
           not available). If a database was created using a particular
           serializer, the module will automatically detect it and attempt to
           use the correct one.
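
           The "preference list with fallback" behavior described above can
           be pictured with the following sketch. It is not the module's own
           code: pick_stringifier() is a hypothetical helper written for
           this example, and it only checks whether each named serializer
           can be loaded.

```perl
use strict;
use warnings;

# Return the first serializer from the preference list that can be loaded.
sub pick_stringifier {
    my @preferred = @_;
    for my $module (@preferred) {
        # Only the two names documented above are accepted here.
        next unless $module eq 'Storable' or $module eq 'Data::Dumper';
        (my $file = "$module.pm") =~ s{::}{/}g;   # Data::Dumper -> Data/Dumper.pm
        return $module if eval { require $file; 1 };
    }
    die "No usable stringifier found\n";
}

my $stringifier = pick_stringifier('Storable', 'Data::Dumper');
print "Using $stringifier\n";
```

           Passing a single name, as in pick_stringifier('Data::Dumper'),
           corresponds to the "only" case: there is nothing to fall back to.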

       "lock({ -lock_mode => 'EX|SH|UN' [, -lock_timeout => 30] [,
       -blocking_locks => 0] });"
           Changes a lock on the underlying database.

           Forces 'sync' if the state is changed from 'EX' to a lower lock
           state (i.e. 'SH' or 'UN'). Croaks on errors.

	   Example:

                   # -lock_timeout and -blocking_locks are optional
                   $inv_map->lock({ -lock_mode => 'EX', -lock_timeout => 30, -blocking_locks => 0 });

	   The only _required_ parameter is the	-lock_mode. The	other
	   parameters can be inherited from the	object state. If the other
	   parameters are used,	they change the	object state to	match the new
	   settings.

       "status(-open|-lock_mode);"
           Returns the requested status value for the database. Allowed
           requests are '-open' and '-lock_mode'.

           Example 1:
            my $status = $inv_map->status('-open'); # Returns either '1' or '0'

           Example 2:
            my $status = $inv_map->status('-lock_mode'); # Returns 'UN', 'SH' or 'EX'

       "update({ -update => $update });"
           Performs an update on the map. This is designed for
           adding/changing/deleting a bunch of related information in a single
           block update. It takes a Search::InvertedIndex::Update object as
           input. It assumes that you wish to remove all references to the
           specified index and replace them with a new list of references. It
           will also update the -data for the -index. If -data is passed
           and the -index does not already exist, a new index record will be
           created. It is a fatal error to pass a nonexistent index without a
           -data parameter to initialize it. It is also a fatal error to pass
           an update for a nonexistent -group.

           Passing an empty -keys has the effect of deleting the index from
           the group (but not from the system).

	   Example:

	    my $update = Search::InvertedIndex::Update->new(...);
	    $inv_map->update({ -update => $update });

           In most cases it is much faster to update an index using the update
           method than the add_entry_to_group method, because the batching of
           changes allows for efficiency optimizations when there is more than
           one key.
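
           One way to prepare the '-keys' hash for such a batched update is
           sketched below. keys_from_text() is a hypothetical helper, and
           using raw word counts as rankings is an assumption made for the
           example; the documentation above does not prescribe how rankings
           should be chosen.

```perl
use strict;
use warnings;

# Hypothetical helper: count lowercased words and use the count as the ranking.
sub keys_from_text {
    my ($text) = @_;
    my %ranking;
    $ranking{ lc $1 }++ while $text =~ /(\w+)/g;
    return \%ranking;
}

my $keys = keys_from_text('Gold is gold, silver is silver, but gold ranks first');
printf "%-6s => %d\n", $_, $keys->{$_} for sort keys %$keys;

# With the module loaded and $inv_map open, the hash could then be passed in
# one update as documented above (not executed in this sketch):
#
#   my $update = Search::InvertedIndex::Update->new({
#       -group => 'keywords',
#       -index => 'http://www.nihongo.org/',
#       -data  => 'Some scalar',
#       -keys  => $keys,
#   });
#   $inv_map->update({ -update => $update });
```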

       "preload_update({ -update => $update });"
	   'preload_update' places the passed 'update' object data into	a
	   pending queue which is not reflected	in the searchable database
	   until the 'update_group' method has been called. This allows	the
	   loading process to be streamlined for maximum performance on	large
	   full	updates. This method is	not appropriate	to incremental updates
	   as the 'update_group' method	destroys the previous searchable data
	   set on execution.

           It also places the database effectively offline during the update,
           so this is not a suitable method for updating an 'online' database.
           Updates should happen on an 'offline' copy that is then swapped
           into place with the 'online' database.

	   Example:

	    my $update = Search::InvertedIndex::Update->new(...);
	    $inv_map->preload_update({ -update => $update });
			   .
			   .
			   .
	    $inv_map->update_group({ -group => 'test' });

       "clear_preload_update_for_group({ -group	=> $group });"
	   This	clears all the data from the preload area for the specified
	   group.

       "update_group({ -group => $group[, -block_size => 65536]	});"
           This clears the specified group and loads all preloaded data
           (updates batch loaded through the 'preload_update' method pending
           finalization).

	   This	is by far the fastest way to load a large set of data into the
	   search system - but it is an	'all or	nothing' approach. No
	   'incremental' updating is possible via this interface - the
	   update_group	completely erases all previously searchable data from
	   the group and replaces it with the pending 'preload'ed data.

	   Examples:

	     $inv_map->update_group({ -group =>	'test' });

	     $inv_map->update_group({ -group =>	'test',	-block_size => 65536 });

	   -block_size determines the 'chunking	factor'	used to	limit the
	   amount of memory the	update uses (it	corresponds roughly to the
	   number of line entry	items to be processed in memory	at one time).
	   Higher '-block_size's will improve performance until	you run	out of
	   real	memory.	The default is 65536.
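
           The effect of a chunking factor can be pictured with this small
           sketch, which is only an analogy and not the module's loader: a
           pending list is consumed in blocks of at most $block_size items,
           so only one block is held for processing at a time. The
           @pending data and the tiny block size are made up for the
           example.

```perl
use strict;
use warnings;

my @pending    = (1 .. 10);   # stand-in for queued line entry items
my $block_size = 4;           # deliberately tiny; the documented default is 65536

my @sizes;
# splice removes up to $block_size items from the front on each pass.
while (my @block = splice(@pending, 0, $block_size)) {
    push @sizes, scalar @block;   # a real loader would process @block here
}
print "block sizes: @sizes\n";
```

           A larger block size means fewer passes at the cost of more memory
           per pass, which matches the trade-off described above.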

           Since an exclusive lock should be held during the entire process,
           the database is essentially inaccessible until the update is
           complete. It is probably inadvisable to use this method of updating
           without keeping an 'online' and a separate 'offline' database, and
           copying the 'offline' database over the 'online' one after
           completion of the mass update on the 'offline' database.

       "search({ -query	=> $query [,-cache => 1] });"
	   Performs a query on the map and returns the results as a
	   Search::InvertedIndex::Result object	containing the keys and
	   rankings.

	   Example:

	    my $query =	Search::InvertedIndex::Query->new(...);
	    my $result = $inv_map->search({ -query => $query });

	   Performs a complex multi-key	match search with boolean logic	and
	   optional search term	weighting.

	   The search request is formatted as follows:

	   my $result =	$inv_map->search({ -query => $query });

	   where '$query' is a Search::InvertedIndex::Query object.

           Each node can either be a specific search term with an optional
           weighting term (a Search::InvertedIndex::Query::Leaf object) or a
           logic term with its own sub-branches (a Search::InvertedIndex::Query
           object).

	   The weightings are applied to the returned matches for each search
	   term	by multiplication of their base	ranking	before combination
	   with	the other logic	terms.

	   This	allows recursive use of	search to resolve arbitrarily complex
	   boolean searches and	weight different search	terms.

           The optional -cache parameter instructs the database to cache (if
           the -search_cache_dir and -search_cache_size initialization
           parameters are configured for use) the search and results for
           performance on repeat searches. '1' means use the cache, '0' means
           do not.
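
           To make the nesting and weighting concrete, here is a toy
           evaluator over plain Perl hashes. It is a sketch under stated
           assumptions, not the module's ranking code: 'and' is taken to
           mean intersection, 'or' to mean union, and rankings are combined
           by summing. How the module actually combines rankings is not
           described here and may differ. The %map data and evaluate() are
           hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical inverted map: key => { index => base ranking }.
my %map = (
    complex   => { doc1 => 15, doc2 => 10 },
    structure => { doc1 => 15 },
    gold      => { doc3 => 20 },
);

# Evaluate a query tree. A leaf is { key => ..., weight => ... }; a node is
# { logic => 'and'|'or', weight => ..., nodes => [...], leafs => [...] }.
sub evaluate {
    my ($q) = @_;
    my $weight = defined $q->{weight} ? $q->{weight} : 1;

    if (exists $q->{key}) {    # leaf: scale each base ranking by the weight
        my $hits = $map{ $q->{key} } || {};
        return { map { ( $_ => $hits->{$_} * $weight ) } keys %$hits };
    }

    # Resolve every child first; nodes recurse, which is what allows nesting.
    my @parts = map { evaluate($_) } @{ $q->{nodes} || [] }, @{ $q->{leafs} || [] };
    my %sum;
    for my $part (@parts) {
        $sum{$_} += $part->{$_} for keys %$part;
    }
    if ($q->{logic} eq 'and') {    # keep only indexes present in every child
        for my $index (keys %sum) {
            delete $sum{$index} if grep { !exists $_->{$index} } @parts;
        }
    }
    return { map { ( $_ => $sum{$_} * $weight ) } keys %sum };
}

# (complex AND structure) OR gold, with 'gold' weighted double.
my $query = {
    logic => 'or', weight => 1,
    nodes => [ { logic => 'and', weight => 1,
                 leafs => [ { key => 'complex',   weight => 1 },
                            { key => 'structure', weight => 1 } ] } ],
    leafs => [ { key => 'gold', weight => 2 } ],
};

my $result = evaluate($query);
print "$_ => $result->{$_}\n" for sort keys %$result;
```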

       "data_for_index({ -index	=> $index });"
	   Returns the data record for the passed -index. Returns undef	if no
	   matching -index is in the system.

	   Example:
             my $data = $inv_map->data_for_index({ -index => $index });

       "clear_all;"
	   Completely clears the contents of the database and the search
	   cache.

       "clear_cache;"
	   Completely clears the contents of the search	cache.

       "close;"
	   Closes the currently	open -map and flushes all associated buffers.

       "number_of_groups;"
	   Returns the raw number of groups in the system.

	   Example: my $n = $inv_map->number_of_groups;

       "number_of_indexes;"
	   Returns the raw number of indexes in	the system.

	   Example: my $n = $inv_map->number_of_indexes;

       "number_of_keys;"
	   Returns the raw number of keys in the system.

	   Example: my $n = $inv_map->number_of_keys;

       "number_of_indexes_in_group({ -group => $group });"
	   Returns the raw number of indexes in	a specific group.

	   Example: my $n = $inv_map->number_of_indexes_in_group({ -group =>
	   $group });

       "number_of_keys_in_group({ -group => $group });"
	   Returns the raw number of keys in a specific	group.

	   Example: my $n = $inv_map->number_of_keys_in_group({	-group =>
	   $group });

       "add_group({ -group => $group });"
	   Adds	a new '-group' to the map. There is normally no	need to	call
	   this	method from outside the	module.	The addition of	new -groups is
	   done	automatically when adding new entries.

	   Example: $inv_map->add_group({ -group => $group });

           Croaks if unable to successfully create the group for some reason.

	   It silently eats attempts to	create an existing group.

       "add_index({ -index => $index, -data => $data });"
           Adds an index entry to the system.

	   Example: $inv_map->add_index({ -index => $index, -data => $data });

	   If the 'index' is the same as an existing index, the	'-data'	for
	   that	index will be updated.

           -data can be pretty much any scalar. Strings and object, hash or
           array references are ok. They will be transparently serialized
           using Storable (preferred) or Data::Dumper.

           This method should be called to set the '-data' record returned by
           searches to something useful. If you do not, you will have to
           maintain the information you want to show to users separately from
           the main search engine core.

	   The method returns the index_enum of	the index.

       "add_index_to_group({ -group => $group, -index => $index[, -data	=>
       $data] });"
	   Adds	an index entry to a group. If the index	does not already exist
	   in the system, adds it to the system	as well.

	   Examples:

	      $inv_map->add_index_to_group({ -group => $group, '-index'	=> $index});

	      $inv_map->add_index_to_group({ -group => $group, '-index'	=> $index, -data => $data});

	   Returns the 'index_enum' for	the index record.

           If the 'index' is the same as an existing index, the 'index_enum' of
           the existing index will be returned.

           There is normally no need to call this method directly. Addition of
           indexes to groups is handled automatically during addition of new
           entries.

           It cannot be used to add indexes to nonexistent groups. This is a
           feature, not a bug.

           The -data parameter is optional.

       "add_key_to_group({ -group => $group, -key => $key });"
	   Adds	a key entry to a group.

           Example: $inv_map->add_key_to_group({ -group => $group, -key => $key });

	   Returns the 'key_enum' for the key record.

	   If the 'key'	is the same as an existing key,	the 'key_enum' of the
	   existing key	will be	returned.

	   There is normally no	need to	call this method directly. Addition of
	   keys	to groups is handled automatically during addition of new
	   entries.

           It cannot be used to add keys to nonexistent groups. This is a
           feature, not a bug.

       "add_entry_to_group({ -group => $group, -key => $key, -index => $index,
       -ranking	=> $ranking });"
	   Adds	a reference to a particular index for a	key with a ranking to
	   a specific group.

	   Example: $inv_map->add_entry_to_group({ -group => $group, -key =>
	   $key, -index	=> $index, -ranking => $ranking	});

	   This	method cannot be used to create	new -indexes or	-groups. This
	   is a	feature, not a bug.  It	*will* create new -keys	as needed.

       "remove_group({ -group => $group	});"
	   Remove all entries for a group from the map.

	   Example: $inv_map->remove_group({ -group => $group });

	   This	removes	all key	and key/index entries for the group and	all
	   other group specific	data from the map.

	   Use this method when	you wish to completely delete a	searchable
	   'group' from	the map	without	disturbing other existing groups.

       "remove_entry_from_group({ -group => $group, -key => $key, -index =>
       $index });"
	   Remove a specific key<->index entry from the	map for	a group.

	   Example: $inv_map->remove_entry_from_group({	-group => $group, -key
	   => $key, -index => $index });

	   Does	not remove the -key or -index from the database	or the group -
	   only	the entries mapping the	two to each other.

       "remove_index_from_group	({ -group => $group, -index => $index });"
	   Remove all references to a specific index for all keys for a	group.

           Example: $inv_map->remove_index_from_group({ -group => $group,
           -index => $index });

           Note: This *does not* remove the index from the _system_ - just a
           specific group.

	   It is a null	operation to remove an undeclared index	or to remove a
	   declared index from a group where it	is not used.

       "remove_index_from_all ({ -index	=> $index });"
	   Remove all references to a specific index from the system.

           Example: $inv_map->remove_index_from_all({ -index => $index });

	   This	*completely* removes it	from all groups	and the	master system
	   entries.

	   It is a null	operation to remove an undefined index.

       "remove_key_from_group({	-group => $group, -key => $key });"
	   Remove all references to a specific key for all indexes for a
	   group.

           Example: $inv_map->remove_key_from_group({ -group => $group, -key => $key });

           Returns undef if the key specified was not in the database. Returns
           '1' if the key specified was in the database and has been
           successfully deleted.

           Croaks on errors.

       "list_all_keys_in_group({ -group	=> $group });"
	   Returns an anonymous	array containing a list	of all defined keys in
	   the specified group.

	   Example:
	    $keys = $inv_map->list_all_keys_in_group({ -group => $group	});

	   Note: This can result in *HUGE* returned lists. If you have a lot
	   of records in the group, you	are better off using the iteration
	   support ('first_key_in_group', 'next_key_in_group').

       "first_key_in_group({ -group => $group_name });"
	   Returns the 'first' key in the -group based on hash ordering.

	   Returns 'undef' if there are	no keys	in the group.

	   Example: my $first_key = $inv_map->first_key_in_group({-group =>
	   $group});

       "next_key_in_group({ -group => $group, -key => $key });"
	   Returns the 'next' key in the group based on	hash ordering.

	   Returns 'undef' when	there are no more keys in the group or if the
	   passed -key is not in the group map.

	   Example: my $next_key = $inv_map->next_key_in_group({ -group	=>
	   $group, -key	=> $key	});
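
           The two iteration methods are meant to be used together in a
           loop. The sketch below shows that loop against a hypothetical
           stand-in class, Local::FakeMap, which mimics only these two
           method names so the example can run without a real database.
           The stand-in sorts keys alphabetically for predictability,
           whereas the real module returns them in hash order.

```perl
use strict;
use warnings;

# Hypothetical stand-in that mimics only the two iteration methods documented
# above. It is not part of Search::InvertedIndex.
package Local::FakeMap;

sub new {
    my ($class, %groups) = @_;
    return bless { groups => {%groups} }, $class;
}

sub first_key_in_group {
    my ($self, $args) = @_;
    my @keys = sort @{ $self->{groups}{ $args->{-group} } || [] };
    return $keys[0];    # undef when the group has no keys
}

sub next_key_in_group {
    my ($self, $args) = @_;
    my @keys = sort @{ $self->{groups}{ $args->{-group} } || [] };
    for my $i (0 .. $#keys) {
        return $keys[ $i + 1 ] if $keys[$i] eq $args->{-key};
    }
    return;             # the passed key is not in the group
}

package main;

my $inv_map = Local::FakeMap->new(keywords => [qw(complex gold structure)]);

# Walk the group one key at a time instead of fetching the whole list.
my @seen;
my $key = $inv_map->first_key_in_group({ -group => 'keywords' });
while (defined $key) {
    push @seen, $key;
    $key = $inv_map->next_key_in_group({ -group => 'keywords', -key => $key });
}
print "$_\n" for @seen;
```

           Testing with 'defined' rather than plain truth keeps the loop
           from stopping early on a key such as '0'.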

       "list_all_indexes_in_group({ -group => $group });"
	   Returns an anonymous	array containing a list	of all defined indexes
	   in the group

	   Example: $indexes = $inv_map->list_all_indexes_in_group({ -group =>
	   $group });

	   Note: This can result in *HUGE* returned lists. If you have a lot
	   of records in the group, you	are better off using the iteration
	   support (first_index_in_group(), next_index_in_group())

       "first_index_in_group({ -group => $group });"
           Returns the 'first' index in the -group based on hash ordering.
           Returns 'undef' if there are no indexes in the group.

           Example: my $first_index = $inv_map->first_index_in_group({ -group
           => $group });

       "next_index_in_group({ -group => $group, -index => $index });"
           Returns the 'next' index in the -group based on hash ordering.
           Returns 'undef' if there are no more indexes.

           Example: my $next_index = $inv_map->next_index_in_group({ -group =>
           $group, -index => $index });

       "list_all_indexes;"
	   Returns an anonymous	array containing a list	of all defined indexes
	   in the map.

	   Example: $indexes = $inv_map->list_all_indexes;

	   Note: This can result in *HUGE* returned lists. If you have a lot
           of records in the map or do not have a lot of memory, you are better
	   off using the iteration support ('first_index', 'next_index')

       "first_index;"
	   Returns the 'first' index in	the system based on hash ordering.
	   Returns 'undef' if there are	no indexes.

	   Example: my $first_index = $inv_map->first_index;

       "next_index({-index => $index});"
	   Returns the 'next' index in the system based	on hash	ordering.
	   Returns 'undef' if there are	no more	indexes.

	   Example: my $next_index = $inv_map->next_index({-index => $index});

       "list_all_groups;"
	   Returns an anonymous	array containing a list	of all defined groups
	   in the map.

	   Example: $groups = $inv_map->list_all_groups;

	   If you have a lot of	groups in the map or do	not have a lot of
	   memory, you are better off using the	iteration support
	   ('first_group', 'next_group')

       "first_group;"
	   Returns the 'first' group in	the system based on hash ordering.
	   Returns 'undef' if there are	no groups.

	   Example: my $first_group = $inv_map->first_group;

       "next_group ({-group => $group });"
	   Returns the 'next' group in the system based	on hash	ordering.
	   Returns 'undef' if there are	no more	groups.

	   Example: my $next_group = $inv_map->next_group({-group => $group});

VERSION
       1.14

COPYRIGHT
       Copyright 1999-2002, Benjamin Franz
       (<URL:http://www.nihongo.org/snowhare/>)	and FreeRun Technologies, Inc.
       (<URL:http://www.freeruntech.com/>). All	Rights Reserved.  This
       software	may be copied or redistributed under the same terms as Perl
       itself.

AUTHOR
       Benjamin	Franz

TODO
       Integrate code and documentation patches from Kate Pugh. Separate POD
       into .pod files.

       Concept item for	evaluation: By storing a dense list of all indexed
       keywords, you would be able to use a regular expression or other	fuzzy
       search matching scheme comparatively efficiently, locate	possible words
       via a grep and then search on the possibilities.	Seems to make sense to
       implement that as _another_ module that uses this module	as a backend.
       'Search::InvertedIndex::Fuzzy' perhaps.

SEE ALSO
	Search::InvertedIndex::Query  Search::InvertedIndex::Query::Leaf
	Search::InvertedIndex::Result Search::InvertedIndex::Update
	Search::InvertedIndex::DB::DB_File_SplitHash
	Search::InvertedIndex::DB::Mysql

perl v5.32.0			  2003-11-14	      Search::InvertedIndex(3)
