Skip site navigation (1)Skip section navigation (2)

FreeBSD Manual Pages


home | help
QDBM(3)			    Quick Database Manager		       QDBM(3)

       QDBM - quick database manager

       QDBM is a library of routines for managing a database.  The database is
       a simple	data file containing records, each is a	pair of	a  key	and  a
       value.  Every key and value is serial bytes with	variable length.  Both
       binary data and character string	can be used as	a  key	and  a	value.
       There  is  neither  concept of data tables nor data types.  Records are
       organized in hash table or B+ tree.

       As for database of hash table, each key must be unique within  a	 data-
       base, so	it is impossible to store two or more records with a key over-
       laps.  The following access methods are provided	to the database: stor-
       ing  a  record  with a key and a	value, deleting	a record by a key, re-
       trieving	a record by a key.  Moreover, traversal	access	to  every  key
       are  provided,  although	 the order is arbitrary.  These	access methods
       are similar to ones of DBM (or its followers: NDBM  and	GDBM)  library
       defined	in  the	UNIX standard.	QDBM is	an alternative for DBM because
       of its higher performance.

       As for database of B+ tree, records whose keys are  duplicated  can  be
       stored.	 Access	 methods of storing, deleting, and retrieving are pro-
       vided as	with the database of hash table.  Records are stored in	 order
       by  a  comparing	function assigned by a user.  It is possible to	access
       each record with	the cursor in ascending	or descending order.   Accord-
       ing  to	this  mechanism, forward matching search for strings and range
       search for integers are realized.  Moreover, transaction	 is  available
       in database of B+ tree.

       QDBM  is	 developed  referring to GDBM for the purpose of the following
       three points: higher processing speed, smaller size of a	database file,
       and  simpler  API.   They  have been achieved.  Moreover, the following
       three restrictions of traditional DBM: a	process	can  handle  only  one
       database,  the size of a	key and	a value	is bounded, a database file is
       sparse, are cleared.

       QDBM uses hash algorithm	to retrieve records.  If a  bucket  array  has
       sufficient  number  of  elements,  the  time complexity of retrieval is
       `O(1)'.	That is, time required for retrieving a	 record	 is  constant,
       regardless of the scale of a database.  It is also the same about stor-
       ing and deleting.  Collision of hash  values  is	 managed  by  separate
       chaining.  Data structure of the	chains is binary search	tree.  Even if
       a bucket	array has unusually scarce elements, the  time	complexity  of
       retrieval is `O(log n)'.

       QDBM  attains improvement in retrieval by loading RAM with the whole of
       a bucket	array.	If a bucket array is on	RAM, it	is possible to	access
       a  region  of  a	target record by about one path	of file	operations.  A
       bucket array saved in a file is not read	into RAM with the `read'  call
       but  directly  mapped to	RAM with the `mmap' call.  Therefore, prepara-
       tion time on connecting to a database is	very short, and	 two  or  more
       processes can share the same memory map.

       If  the	number	of elements of a bucket	array is about half of records
       stored within a database, although it depends on	characteristic of  the
       input,  the  probability	 of  collision	of  hash values	is about 56.7%
       (36.8% if the same, 21.3% if twice, 11.5% if four times,	6.0% if	 eight
       times).	 In  such  case, it is possible	to retrieve a record by	two or
       less paths of file operations.  If it is	made into a performance	index,
       in  order  to  handle  a	 database containing one million of records, a
       bucket array with half a	million	of elements is needed.	 The  size  of
       each  element  is 4 bytes.  That	is, if 2M bytes	of RAM is available, a
       database	containing one million records can be handled.

       QDBM provides  two  modes  to  connect  to  a  database:	 `reader'  and
       `writer'.   A  reader  can  perform  retrieving but neither storing nor
       deleting.  A writer can perform all access methods.  Exclusion  control
       between	processes  is  performed when connecting to a database by file
       locking.	 While a writer	is connected to	a  database,  neither  readers
       nor  writers  can be connected.	While a	reader is connected to a data-
       base, other readers can be connect, but writers can not.	 According  to
       this  mechanism,	 data consistency is guaranteed	with simultaneous con-
       nections	in multitasking	environment.

       Traditional DBM provides	two modes of the storing operations:  `insert'
       and  `replace'.	In the case a key overlaps an existing record, the in-
       sert mode keeps the existing value, while the replace  mode  transposes
       it to the specified value.  In addition to the two modes, QDBM provides
       `concatenate' mode.  In the mode, the specified value  is  concatenated
       at  the	end  of	the existing value and stored.	This feature is	useful
       when adding a element to	a value	as an array.  Moreover,	 although  DBM
       has  a  method to fetch out a value from	a database only	by reading the
       whole of	a region of a record, QDBM has a method	to fetch out a part of
       a region	of a value.  When a value is treated as	an array, this feature
       is also useful.

       Generally speaking, while  succession  of  updating,  fragmentation  of
       available  regions  occurs,  and	 the size of a database	grows rapidly.
       QDBM deal with this problem by coalescence of dispensable  regions  and
       reuse of	them, and featuring of optimization of a database.  When over-
       writing a record	with a value whose size	is greater than	 the  existing
       one,  it	 is  necessary to remove the region to another position	of the
       file.  Because the time complexity of the operation depends on the size
       of  the	region	of  a record, extending	values successively is ineffi-
       cient.  However,	QDBM deal with this problem by alignment.   If	incre-
       ment can	be put in padding, it is not necessary to remove the region.

       As  for many file systems, it is	impossible to handle a file whose size
       is more than 2GB.  To deal with this problem, QDBM provides a directory
       database	 containing  multiple database files.  Due to this feature, it
       is possible to handle a database	whose total size is up to 1TB in  the-
       ory.   Moreover,	 because  database  files  can be deployed on multiple
       disks, the speed	of updating operations can be improved as with	RAID-0
       (striping).   It	 is  also possible for the database files to deploy on
       multiple	file servers using NFS and so on.

       Although	B+ tree	database is slower than	hash database, it features or-
       dering  access  to  each	 record.   The order can be assigned by	users.
       Records of B+ tree are sorted and arranged in  logical  pages.	Sparse
       index organized in B tree that is multiway balanced tree	are maintained
       for each	page.  Thus, the time complexity of retrieval  and  so	on  is
       `O(log  n)'.   Cursor  is provided to access each record	in order.  The
       cursor can jump to a position specified by a key	and can	 step  forward
       or  backward  from the current position.	 Because each page is arranged
       as double linked	list,  the  time  complexity  of  stepping  cursor  is

       B+ tree database	is implemented,	based on above hash database.  Because
       each page of B+ tree is stored as each record of	hash database, B+ tree
       database	 inherits  efficiency  of storage management of	hash database.
       Because the header of each record is smaller and	alignment of each page
       is  calculated  statistically, in most cases, the size of database file
       is cut by half compared to one of hash database.	 Although operation of
       many  pages  are	required to update B+ tree, QDBM expedites the process
       by caching pages	and reducing file operations.  In most cases,  because
       whole  of  the  sparse index is cached on memory, it is possible	to re-
       trieve a	record by one or less path of file operations.

       B+ tree database	features transaction mechanism.	  It  is  possible  to
       commit  a series	of operations between the beginning and	the end	of the
       transaction in a	lump, or to abort the transaction and perform rollback
       to  the state before the	transaction.  Even if the process of an	appli-
       cation is crushed while the transaction,	the database file is not  bro-

       In case that QDBM is built with ZLIB, LZO, or BZIP2 enabled, a lossless
       data-compression	library, the content of	each page of B+	tree  is  com-
       pressed	and stored in a	file.  Because each record in a	page has simi-
       lar patterns, high efficiency of	compression is	expected  due  to  the
       Lempel-Ziv  algorithm  and  the	like.  In case handling	text data, the
       size of a database is reduced to	about 25%.  If the scale of a database
       is  large  and  disk I/O	is the bottleneck, featuring compression makes
       the processing speed improved to	a large	extent.

       QDBM provides very simple APIs.	You can	perform	database I/O as	 usual
       file  I/O  with	`FILE' pointer defined in ANSI C.  In the basic	API of
       QDBM, entity of a database is recorded as one file.   In	 the  extended
       API,  entity  of	 a database is recorded	as several files in one	direc-
       tory.  Because the two APIs are very similar with each  other,  porting
       an application from one to the other is easy.

       APIs  which  are	 compatible  with NDBM and GDBM	are also provided.  As
       there are a lot of applications using NDBM or GDBM, it is easy to  port
       them  onto QDBM.	 In most cases,	it is completed	only by	replacement of
       header including	(#include) and re-compiling.  However,	QDBM  can  not
       handle database files made by the original NDBM or GDBM.

       In  order  to  handle records on	memory easily, the utility API is pro-
       vided.  It implements memory allocating functions,  sorting  functions,
       extensible datum, array list, hash map, and so on.  Using them, you can
       handle records in C language cheaply as in  such	 script	 languages  as
       Perl or Ruby.

       B+  tree	 database  is used with	the advanced API.  The advanced	API is
       implemented using the basic API and the utility API.  Because  the  ad-
       vanced API is also similar to the basic API and the extended API, it is
       easy to learn how to use	it.

       In order	to handle an inverted index which is used by full-text	search
       systems,	 the inverted API is provided.	If it is easy to handle	an in-
       verted index of documents, an application can focus on text  processing
       and  natural  language processing.  Because this	API does not depend on
       character codes nor languages, it is possible to	implement a  full-text
       search system which can respond to various requests from	users.

       Along  with  APIs  for  C,  QDBM	provides APIs for C++, Java, Perl, and
       Ruby.  APIs for C are composed of seven kinds: the basic	API,  the  ex-
       tended API, the NDBM-compatible API, the	GDBM-compatible	API, the util-
       ity API,	the advanced API, and the inverted API.	 Command  line	inter-
       faces corresponding to each API are also	provided.  They	are useful for
       prototyping, testing, debugging,	and so on.  The	C++  API  encapsulates
       database	handling functions of the basic	API, the extended API, and the
       advanced	API with class mechanism of C++.   The	Java  API  has	native
       methods	calling	 the basic API,	the extended API, and the advanced API
       with Java Native	Interface.  The	Perl API has methods calling the basic
       API, the	extended API, and the advanced API with	XS language.  The Ruby
       API has method calling the basic	API, the extended  API,	 and  the  ad-
       vanced  API  as modules of Ruby.	 Moreover, CGI scripts for administra-
       tion of databases and full-text search are provided.

       QDBM is implemented being based on syntax of ANSI  C  (C89)  and	 using
       only  APIs  defined  in ANSI C or POSIX.	 Thus, QDBM works on most UNIX
       and its compatible OSs.	As for C API, checking	operations  have  been
       done  at	least on Linux 2.2, Linux 2.4, FreeBSD 4.8, FreeBSD 5.0, SunOS
       5.7, SunOS 5.8, SunOS 5.9, HP-UX	11.00, Cygwin 1.3.10, Mac OS  X	 10.2,
       and  RISC OS 5.03.  Although a database file created by QDBM depends on
       byte order of the processor, to do with it, utilities to	dump  data  in
       format which is independent to byte orders are provided.

       For  building a program using QDBM, the program should be linked	with a
       library file `libqdbm.a'	or `'.  For example,	the  following
       command is executed to build `sample' from `sample.c'.

	      gcc  -I/usr/local/include	 -o  sample  sample.c -L/usr/local/lib

       QDBM is written by Mikio	Hirabayashi.  You can contact  the  author  by
       e-mail to <>.  Any suggestion or bug report is welcome
       to the author.

       Copyright(c) 2000-2003 Mikio Hirabayashi

       QDBM is free software; you can redistribute it and/or modify  it	 under
       the  terms of the GNU Lesser General Public License as published	by the
       Free Software Foundation; either	version	2 of the License, or any later

       QDBM is distributed in the hope that it will be useful, but WITHOUT ANY
       WARRANTY; without even the implied warranty of MERCHANTABILITY or  FIT-
       NESS  FOR  A PARTICULAR PURPOSE.	 See the GNU Lesser General Public Li-
       cense for more details.

       You should have received	a copy of the GNU Lesser  General  Public  Li-
       cense  along  with QDBM;	if not,	write to the Free Software Foundation,
       Inc., 59	Temple Place, Suite 330, Boston, MA 02111-1307 USA.

       depot(3), curia(3), relic(3), hovel(3), cabin(3),  villa(3),  odeum(3),
       ndbm(3),	gdbm(3)

Man Page			  2004-04-22			       QDBM(3)


Want to link to this manual page? Use this URL:

home | help