Skip site navigation (1)Skip section navigation (2)

FreeBSD Manual Pages


home | help
Genezzo::Index::bt2(3)User Contributed Perl DocumentatioGenezzo::Index::bt2(3)

       construct comparison/equality callbacks

	   my $cmp1 = sub
	       my ($k1,	$k2) = @_;

	       # NOTE: use "spaceship" (-1,0,1)	comparison with
	       # short-circuit OR (which returns 0 or VALUE, not 0 or 1)
	       # to perform multi-column key comparison
	       # a la Schwartzian Transform

	       return (
		       (   ($k1->[0] <=> $k2->[0])
			|| ($k1->[1] <=> $k2->[1])) == -1

	   my $eq1 = sub
	       my ($k1,	$k2) = @_;
	       return (($k1->[0] == $k2->[0])
		       && ($k1->[1] == $k2->[1])

       Genezzo::Index::bt2 - basic btree

       A btree built of	row directory blocks.

	use Genezzo::Index::bt?;

	my $tt = Genezzo::Index::btree->new();

	$tt->insert(1, "hi");
	$tt->insert(7, "there");

       This btree algorithm is a bottom-up implementation based	upon ideas
       from Chapter 16 of "Algorithms in C++ (third edition)", by Robert
       Sedgewick, 1998 and Chapter 15, "Access Paths", of "Transaction
       Processing: Concepts and	Techniques" by Jim Gray	and Andreas Reuter,
       1993.  The pedagogical examples use a fixed number of entries per node,
       or fixed-size keys in each block, but this implementation has
       significant extensions to support variable numbers of variably-sized
       keys in fixed-size disk blocks, with the	associated error handling,
       plus support for	reverse	scans.

       This package supports a constructor "new", plus standard	b-tree methods
       like insert, delete, search.

   "new" constructor
       The "new" constructor takes many	arguments, but they are	all optional.
       If none are specified, the constructor will allocate 100	blocks of the
       default size for	a b-tree.  The default assumption is to	support	scalar
       string keys with	a scalar string	values.	 The tree will have a maximum
       of 50 entries per node.

       maxsize (default	50)
	   The maximum number of entries in a node.  If	set to zero, the
	   insert will pack as many entries as space allows in each node

       numblocks (default 100)
	   The constructor will	allocate a private buffer cache	for the	b-tree
	   of up to the	number of blocks specified.  If	numblocks=0, no	cache
	   is created.	In this	case, the user must create a subclass to
	   overload the	make_new_block and getarr methods.

       blocksize (default DEFBLOCKSIZE)
	   The size of each block in the b-tree

       key_type	(null by default)
	   The key type	is either a single scalar "c" (for char) or "n"	(for
	   number), or a ref to	an array of "c"	and/or "n" values.  If
	   key_type is specified, bt2 finds or constructs the appropriate
	   compare/equals and pack/unpack functions, overriding	any user-
	   supplied arguments.	If key type is not specified, bt2 processes
	   the insert keys as a	scalar strings.

       compare,	equal (default string comparison -- ignored if key_type
       argument	specified)
	   Supply methods to compare your key.	This package contains special
	   comparison methods for numeric and multi-column keys, and their
	   associated packing functions.

       pack_fn/unpack_fn (default single scalar	key and	value -- ignored if
       key_type	specified)
	   "Packing" functions convert key/value pairs to and from a byte
	   representation which	gets stored in the nodes of the	b-tree.	 The
	   b-tree package supports scalar keys and values by default.  It also
	   contains methods for	multi-column keys with a single	value.

       use_IOT (default	off)
	   special flag	for Index Organized Tables, which means	the "value"
	   can be an array, not	a scalar.  This	approach requires  a couple
	   extra checks	in the branch nodes, since branches contain (key,
	   nodeid) pairs, and leaves contain (key, array of values).
	   Normally, indexes only have a scalar	value: a nodeid	or a rid.

       unique_key (default off)
	   Enforce uniqueness (no duplicates) at insertion time

       use_keycount (default off)
	   Special case	for building non-unique	indexes	where the "value" is
	   null	because	it is already part of the key vector.  In this usage,
	   we construct	a unique index (unique_key=1) where the	key vector is
	   the key columns *plus* the table rid, and the value is null.	 The
	   key columns might be	duplicates, but	the addition of	the rid
	   guarantees uniqueness.  The fetch is	asymmetric: the	table rid is
	   returned as both the	last key column	and the	value.

	   Q:  Why not just have a non-unique index and	store the rids as
	   regular values?  A: This approach clusters related rids, so index
	   scans are more efficient and	deletes	are easier.  Note that the
	   basic index row physical storage is unaffected.  Only the unpack
	   function needs an extra argument to describe	the number of key
	   columns.  Q:	But doesn't the	extra comparison for the rid column
	   make	inserts	more expensive?	 A: Yes, but we're trading off insert
	   performance against index scan performance.	The workload of	most
	   database applications is typically dominated	by selects, not

       hash_key/array_offset iterators:	FIRSTKEY, NEXTKEY, FETCH, plus reverse
       iterators LASTKEY, PREVKEY.
       DBI-style search	interface: SQLPrepare, Execute,	Fetch


       hkey/offset functions: should be	able to	convert	between	different
       "place" formats (Array and Hash prefixes), like the common fetch
       routine,	or ASSERT that prefix matches.
       add reverse scan	to search/SQLFetch
       support multicol	keys, non-unique keys (via combo of key	+ rid as
       support transaction unique constraints -- probably via treat key+rid as
       unique, then turn on true unique	key, and scan for duplicates?
       find out	why can't do pctfree=0
       Work on RDBlk_NN	support.
       search with startkey/stopkey support, vs	supplying compare/equal
       methods.	restricting the	search api to straight "=","<" comparisons
       means can try the estimation function
       need to handle partial startkey/stopkey comparison in searchR/SQLFetch
       for multi-col keys
       semantics of nulls in multi-col keys -- sort low?
       simplify	_pack_row with splice and a supplied split position, something
       like -1 for normal indexes (n-1 key cols, 1 val col, so pop the val) or
       "N=?" for index-organized tables	(N key cols, M val cols, so splice N)
       reorganize along	the lines of "GiST" Generalized	Search Trees (Paul
       Aoki, J.	Hellerstein, UCB)
       ecount support?

       Jeffrey I. Cohen,


       Copyright (c) 2003, 2004	Jeffrey	I Cohen.  All rights reserved.

	   This	program	is free	software; you can redistribute it and/or modify
	   it under the	terms of the GNU General Public	License	as published by
	   the Free Software Foundation; either	version	2 of the License, or
	   any later version.

	   This	program	is distributed in the hope that	it will	be useful,
	   but WITHOUT ANY WARRANTY; without even the implied warranty of
	   GNU General Public License for more details.

	   You should have received a copy of the GNU General Public License
	   along with this program; if not, write to the Free Software
	   Foundation, Inc., 51	Franklin St, Fifth Floor, Boston, MA  02110-1301  USA

       Address bug reports and comments	to:

       For more	information, please visit the Genezzo homepage at

perl v5.32.0			  2006-10-15		Genezzo::Index::bt2(3)


Want to link to this manual page? Use this URL:

home | help