Genezzo::Util(3)      User Contributed Perl Documentation     Genezzo::Util(3)

       PackRow2 takes a list of items and packs them (non-destructively) into
       a string of at most maxsize bytes.  If no offset is specified, it
       builds the string starting with the last item in the list, prepending
       each preceding item until it runs out of space or the list is fully
       consumed.  If the packer runs out of space, it returns the offset into
       the list where it stopped.  The offset may be supplied as an argument
       to this function, in which case the packer packs the remainder of the
       list starting at the offset and working back to the beginning of the
       list.  The final argument to the packer is a "next pointer", a string
       that identifies the location of the next part of a row split into
       multiple pieces.  Because the packer processes a list from back to
       front, the address of the "next" piece can be obtained before the
       preceding piece is constructed.  If the packer can process a complete
       list, it returns an array containing a single packed string: a byte
       string consisting of a count of the number of packed items, followed
       by length/value pairs for each item.  If the packer runs out of space,
       it returns an array of the packed string and the offset of the
       remaining items.

       For example, given the list @a = qw(alpha bravo charlie delta) and
       maxsize=15, PackRow2 returns a packed string (something like
       \x01\x05delta) and the offset 3.  This indicates that the last item in
       the list was processed and that the packer ran out of space at the
       third item.  The packed string could be stored in a pushhash, which
       would return an index, e.g. "5/2", suitable for a next pointer.
       Packing the remainder of the list with offset 3 and next pointer "5/2"
       generates another packed string (e.g. \x02\x07charlie\x035/2) and the
       offset 2.  The packing and storage process continues until the entire
       list is consumed.
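       The following is a minimal sketch of the back-to-front packing loop.
       It is written in Python rather than the module's Perl and is not
       Genezzo's implementation.  It assumes a simplified layout: a one-byte
       item count, then a null vector of one bit per item (the advanced topics
       below say the packed string always carries one), then a one-byte
       length and the bytes of each item.  The next pointer is stored as an
       extra trailing item.  PushHash is a hypothetical stand-in for a
       pushhash that hands out "block/slot" addresses.  With maxsize=15 the
       sketch stops at offsets 3, then 2, then 1, consistent with the example
       above.

```python
# Sketch only; the byte layout is an assumption, not Genezzo's real format:
# [count][null vector, 1 bit per field][len][bytes][len][bytes]...


def _encode(value):
    # None (a null column) is stored as zero bytes; the null vector marks it.
    return b"" if value is None else value.encode("utf-8")


def _packed_size(fields):
    # count byte + null-vector bytes + (length byte + data) per field
    return 1 + (len(fields) + 7) // 8 + sum(1 + len(_encode(f)) for f in fields)


def _build(fields):
    nullvec = bytearray((len(fields) + 7) // 8)
    body = b""
    for n, field in enumerate(fields):
        if field is None:
            nullvec[n // 8] |= 1 << (n % 8)
        data = _encode(field)
        body += bytes([len(data)]) + data  # raises ValueError if len > 255
    return bytes([len(fields)]) + bytes(nullvec) + body


def pack_row2(items, maxsize, offset=None, next_ptr=None):
    """Return [packed] if items[:offset] all fit, else [packed, new_offset]."""
    end = len(items) if offset is None else offset
    tail = [] if next_ptr is None else [next_ptr]
    start = end
    # Prepend items, last to first, while the piece still fits in maxsize.
    while start > 0 and _packed_size(items[start - 1:end] + tail) <= maxsize:
        start -= 1
    if start == end and end > 0:
        raise ValueError("item too large for maxsize (column splitting not sketched)")
    packed = _build(items[start:end] + tail)
    return [packed] if start == 0 else [packed, start]


class PushHash:
    """Hypothetical pushhash stand-in: push() stores a value, returns "block/slot"."""

    def __init__(self, block=5):
        self.block = block
        self.rows = {}

    def push(self, value):
        addr = f"{self.block}/{len(self.rows)}"
        self.rows[addr] = value
        return addr


def store_row(items, maxsize, table):
    """Pack and store a row back to front; return the head address and offsets."""
    result = pack_row2(items, maxsize)
    offsets = []
    while True:
        addr = table.push(result[0])
        if len(result) == 1:
            return addr, offsets
        offsets.append(result[1])
        # The piece just stored is the "next" piece of the one packed now.
        result = pack_row2(items, maxsize, offset=result[1], next_ptr=addr)


table = PushHash()
head, offsets = store_row(["alpha", "bravo", "charlie", "delta"], 15, table)
print(head, offsets)      # 5/3 [3, 2, 1]
print(table.rows["5/0"])  # b'\x01\x00\x05delta'
print(table.rows["5/1"])  # b'\x02\x00\x07charlie\x035/0'
```

       The delta piece is stored first, so its address is already known when
       the charlie piece is packed.  This is why packing runs back to front.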

   advanced topics
       null vector
	   The packed string always contains a bitstring (the null vector)
	   that identifies null columns.  UnPackRow uses it to distinguish
	   nulls from zero-length strings.
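           The Python sketch below shows why the bitmap is needed.  It uses
           an assumed layout: a one-byte count, then the null vector with
           bit i set when column i is null, then one-byte length/value pairs.
           The function names are hypothetical, and the real UnPackRow
           format may differ.  A null and an empty string both pack as a
           zero length, so only the bit tells them apart.

```python
# Hypothetical helpers; layout assumed: [count][null vector][len][bytes]...


def null_vector(values):
    """Bitmap with bit i (least significant first in each byte) set if values[i] is None."""
    bits = bytearray((len(values) + 7) // 8)
    for i, value in enumerate(values):
        if value is None:
            bits[i // 8] |= 1 << (i % 8)
    return bytes(bits)


def is_null(bits, i):
    return bool((bits[i // 8] >> (i % 8)) & 1)


def pack_row(values):
    body = b""
    for value in values:
        data = b"" if value is None else value.encode("utf-8")
        body += bytes([len(data)]) + data  # null and "" both get length 0
    return bytes([len(values)]) + null_vector(values) + body


def unpack_row(packed):
    count = packed[0]
    nv_len = (count + 7) // 8
    bits, pos = packed[1:1 + nv_len], 1 + nv_len
    row = []
    for i in range(count):
        length = packed[pos]
        data = packed[pos + 1:pos + 1 + length]
        pos += 1 + length
        # Only the null-vector bit separates None from an empty string here.
        row.append(None if is_null(bits, i) else data.decode("utf-8"))
    return row


packed = pack_row(["", None, "x"])
print(packed)              # b'\x03\x02\x00\x00\x01x'
print(unpack_row(packed))  # ['', None, 'x']
```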

       next pointer
	   Since the next pointer is used to find the next part of a split
	   row, it must always remain whole.  If it were split, how could you
	   find the next piece?  The next pointer is a convention supported by
	   PackRow/UnPackRow to facilitate the construction of methods that
	   manipulate split rows.  The packing function only flattens an array
	   into a byte string or series of strings; it does not provide any
	   intrinsic support to traverse these strings.  Functions that
	   manipulate packed rows may use additional structures to support
	   multi-part rows, such as external metadata in the block row
	   directory, or specialized metadata columns embedded in the row.
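           Because the packer only flattens arrays, a reader has to follow
           the chain itself.  The Python sketch below assumes that each
           row-directory entry records whether its piece ends in a next
           pointer.  This is a hypothetical flag standing in for the external
           metadata mentioned above.  Pieces are shown already unpacked to
           keep the focus on the traversal.  The addresses other than "5/2"
           are made up.

```python
def fetch_row(directory, head):
    """Reassemble a split row by following next pointers from `head`.

    directory maps address -> (fields, has_next).  has_next is a hypothetical
    row-directory flag saying the last field is a next pointer, not data.
    """
    row, addr, seen = [], head, set()
    while addr is not None:
        if addr in seen:                # guard against a corrupt chain
            raise ValueError(f"cycle in row chain at {addr}")
        seen.add(addr)
        fields, has_next = directory[addr]
        if has_next:
            row.extend(fields[:-1])     # data columns of this piece
            addr = fields[-1]           # the whole next pointer to the following piece
        else:
            row.extend(fields)          # last piece: every field is data
            addr = None
    return row


directory = {
    "5/5": (["alpha", "5/4"], True),
    "5/4": (["bravo", "5/3"], True),
    "5/3": (["charlie", "5/2"], True),
    "5/2": (["delta"], False),
}
print(fetch_row(directory, "5/5"))  # ['alpha', 'bravo', 'charlie', 'delta']
```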

       column splitting (fragmentation)
	   The packer can support rows with individual columns that exceed the
	   maxsize.  The offset can simultaneously maintain the current column
	   position, as well as the current character offset in that column.
	   It's wicked complicated.  Generally, we say that a row is split
	   into row pieces, and the row pieces are chained (via the next
	   pointers), which lets us reconstruct a complete row.  Individual
	   columns that are split are said to be fragmented.
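           The Python sketch below illustrates only the offset bookkeeping;
           it is not Genezzo's packer.  Its assumptions are:
           pieces are plain lists of strings; the budget counts characters
           and ignores length and null-vector overhead; and first_is_tail is
           a hypothetical marker meaning "this piece starts with the tail of
           a fragmented column".  The resume offset is a (column, character
           offset) pair.  A column is fragmented only when it cannot fit in
           an empty piece.

```python
def pack_piece(items, budget, offset):
    """Pack one piece back to front, resuming just before `offset`.

    offset = (col, cut): items[col][:cut] and every column before it are still
    unpacked.  Returns (fields, first_is_tail, new_offset); new_offset is None
    once the whole row has been consumed.  first_is_tail is a hypothetical flag.
    """
    col, cut = offset
    fields, room, first_is_tail = [], budget, False
    while col >= 0:
        text = items[col][:cut]
        if len(text) <= room:                    # the (rest of the) column fits
            fields.insert(0, text)
            room -= len(text)
            col -= 1
            cut = len(items[col]) if col >= 0 else 0
        elif fields:                             # leave it whole for the next piece
            break
        else:                                    # too big even for an empty piece
            fields.insert(0, text[cut - room:])  # keep its last `room` characters
            cut -= room                          # the head stays unpacked
            first_is_tail = True
            break
    return fields, first_is_tail, (None if col < 0 else (col, cut))


def split_row(items, budget):
    """Split a row into pieces (front to back) and list the resume offsets."""
    if budget < 1:
        raise ValueError("budget must be positive")
    pieces, offsets = [], []
    offset = (len(items) - 1, len(items[-1])) if items else None
    while offset is not None:
        fields, first_is_tail, offset = pack_piece(items, budget, offset)
        pieces.insert(0, (fields, first_is_tail))
        if offset is not None:
            offsets.append(offset)
    return pieces, offsets


def reassemble(pieces):
    row = []
    for fields, first_is_tail in pieces:
        if first_is_tail:
            row[-1] += fields[0]                 # glue the fragment onto its column
            row.extend(fields[1:])
        else:
            row.extend(fields)
    return row


items = ["id7", "x" * 10, "ok"]
pieces, offsets = split_row(items, 4)
print(offsets)                      # [(1, 10), (1, 6), (1, 2), (0, 3)]
print(reassemble(pieces) == items)  # True
```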

   future work
       The packer could be extended to support more complex structures than
       arrays of scalars.  In lieu of this ability, such structures can be
       flattened into large strings using Data::Dumper or YAML.
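       The Python sketch below illustrates the flattening workaround.  It
       uses the standard json module as a swapped-in stand-in for
       Data::Dumper or YAML.  A nested structure becomes one string column
       that a packer for arrays of scalars can store, and it is rebuilt
       after unpacking.  The flatten/unflatten names are hypothetical.

```python
import json


def flatten(value):
    """Serialize a nested structure into one string column (JSON stands in
    for Data::Dumper or YAML here).

    sort_keys and compact separators make the output deterministic, so the
    same structure always flattens to the same text.
    """
    return json.dumps(value, sort_keys=True, separators=(",", ":"))


def unflatten(text):
    return json.loads(text)


row = ["order-17", flatten({"lines": [1, 2, 3], "note": None}), "open"]
print(row[1])             # {"lines":[1,2,3],"note":null}
print(unflatten(row[1]))  # {'lines': [1, 2, 3], 'note': None}
```

       A JSON round trip turns tuples into lists and non-string dictionary
       keys into strings.  Structures that depend on those distinctions need
       a richer serializer.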

       Genezzo::Util - Utility functions

       Should bundle all data file utility functions, such as
       FileGetHeaderInfo, SetHeaderInfo, etc., under a separate
       Util::DataFile.

       FileGetHeaderInfo: need to handle the case of a header that exceeds a
       single block.  Probably should keep increasing the buffer size until
       the null terminator is found (within reason).

       packrow: store metadata in col0 vs. a trailing col with the next ptr.

       packrow: check the pack format for a zero-length row of zero cols.
       Does it need a nullvec?

       packrow/unpackrow: in Perl 5.8 could use the nifty repeating templates
       to our advantage.

       packrow: could generate skiplists as col zero metadata tracking byte
       position and column numbers to speed lookups.
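       The FileGetHeaderInfo item above could take roughly the shape
       sketched below in Python.  read_header, its initial size and its
       limit are hypothetical.  The sketch assumes only that the header
       starts at byte 0 and ends with a NUL byte.  It doubles the read size
       until the terminator appears, and the limit plays the role of "within
       reason".

```python
import io


def read_header(f, initial=512, limit=64 * 1024):
    """Hypothetical helper: return the bytes before the first NUL in file f.

    Re-reads from offset 0 with a doubling buffer until a NUL terminator is
    found; gives up at end of file or once `limit` bytes have been scanned.
    """
    size = initial
    while True:
        f.seek(0)
        buf = f.read(size)
        end = buf.find(b"\x00")
        if end != -1:
            return buf[:end]
        if len(buf) < size:
            raise ValueError("end of file before header terminator")
        if size >= limit:
            raise ValueError(f"no header terminator within {limit} bytes")
        size = min(size * 2, limit)


# A 1500-byte header needs several 512-byte-based reads before its NUL is seen.
sample = io.BytesIO(b"H" * 1500 + b"\x00" + b"block data")
print(len(read_header(sample)))  # 1500
```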

       Jeffrey I. Cohen,


       Copyright (c) 2003-2007 Jeffrey I Cohen.  All rights reserved.

	   This program is free software; you can redistribute it and/or modify
	   it under the terms of the GNU General Public License as published by
	   the Free Software Foundation; either version 2 of the License, or
	   any later version.

	   This program is distributed in the hope that it will be useful,
	   but WITHOUT ANY WARRANTY; without even the implied warranty of
	   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
	   GNU General Public License for more details.

	   You should have received a copy of the GNU General Public License
	   along with this program; if not, write to the Free Software
	   Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA

       Address bug reports and comments	to:

       For more	information, please visit the Genezzo homepage at

perl v5.32.1			  2007-01-23		      Genezzo::Util(3)

