
FreeBSD Manual Pages


VENTI(7)	       Miscellaneous Information Manual		      VENTI(7)

NAME
       venti - archival storage server

DESCRIPTION
       Venti is a block storage server intended for archival data.  In a Venti
       server, the SHA1 hash of a block's contents acts as the block
       identifier for read and write operations.  This approach enforces a
       write-once policy, preventing accidental or malicious destruction of
       data.  In addition, duplicate copies of a block are coalesced, reducing
       the consumption of storage and simplifying the implementation of
       clients.

       This manual page documents the basic concepts of block storage using
       Venti as well as the Venti network protocol.

       Venti(1) documents some simple clients.  Vac(1), vacfs(4), and
       vbackup(8) are more complex clients.

       Venti(3) describes a C library interface for accessing Venti servers
       and manipulating Venti data structures.

       Venti(8) describes the programs used to run a Venti server.

   Scores
       The SHA1 hash that identifies a block is called its score.  The score
       of the zero-length block is called the zero score.
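
       The following C sketch makes the score definition concrete.  It
       computes a score with a small, self-contained SHA-1 written for this
       page from the FIPS 180-4 description.  This is not the Venti
       library's own hashing routine, and the names sha1 and sha1_block are
       hypothetical.  Hashing the zero-length block should yield the zero
       score, da39a3ee5e6b4b0d3255bfef95601890afd80709.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Rotate a 32-bit word left by n bits (0 < n < 32). */
static uint32_t rol32(uint32_t x, int n)
{
    return (x << n) | (x >> (32 - n));
}

/* Hypothetical helper: process one 64-byte block, updating state h. */
static void sha1_block(uint32_t h[5], const uint8_t *p)
{
    uint32_t w[80], a, b, c, d, e, f, k, t;
    int i;

    for (i = 0; i < 16; i++)
        w[i] = (uint32_t)p[4*i] << 24 | (uint32_t)p[4*i+1] << 16 |
               (uint32_t)p[4*i+2] << 8 | (uint32_t)p[4*i+3];
    for (i = 16; i < 80; i++)
        w[i] = rol32(w[i-3] ^ w[i-8] ^ w[i-14] ^ w[i-16], 1);

    a = h[0]; b = h[1]; c = h[2]; d = h[3]; e = h[4];
    for (i = 0; i < 80; i++) {
        if (i < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999; }
        else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1; }
        else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDC; }
        else             { f = b ^ c ^ d;                   k = 0xCA62C1D6; }
        t = rol32(a, 5) + f + e + k + w[i];
        e = d; d = c; c = rol32(b, 30); b = a; a = t;
    }
    h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
}

/* Hypothetical helper: write the 20-byte SHA-1 score of data[0..len). */
void sha1(const uint8_t *data, size_t len, uint8_t score[20])
{
    uint32_t h[5] = { 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0 };
    uint64_t bits = (uint64_t)len * 8;
    uint8_t buf[64];
    size_t i, rem;

    for (i = 0; i + 64 <= len; i += 64)   /* all complete 64-byte blocks */
        sha1_block(h, data + i);

    /* Padding: leftover bytes, a 0x80 marker, zeros, 64-bit big-endian bit count. */
    rem = len - i;
    memset(buf, 0, sizeof buf);
    if (rem > 0)
        memcpy(buf, data + i, rem);
    buf[rem] = 0x80;
    if (rem >= 56) {                      /* no room left for the length field */
        sha1_block(h, buf);
        memset(buf, 0, sizeof buf);
    }
    for (i = 0; i < 8; i++)
        buf[63 - i] = (uint8_t)(bits >> (8 * i));
    sha1_block(h, buf);

    for (i = 0; i < 5; i++) {             /* serialize the state big-endian */
        score[4*i]   = (uint8_t)(h[i] >> 24);
        score[4*i+1] = (uint8_t)(h[i] >> 16);
        score[4*i+2] = (uint8_t)(h[i] >> 8);
        score[4*i+3] = (uint8_t)h[i];
    }
}
```

       For example, sha1((const uint8_t *)"", 0, score) fills score with the
       twenty bytes of the zero score.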

       Scores may have an optional label: prefix, typically used to describe
       the format of the data.  For example, vac(1) uses a vac: prefix, while
       vbackup(8) uses prefixes corresponding to the file system types:
       ext2:, ffs:, and so on.
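
       A client that accepts scores with an optional label: prefix must
       separate the label from the score.  The sketch below assumes the score
       is written as 40 hexadecimal digits and that labels contain no colons.
       parse_score and hexval are hypothetical helpers and are not part of
       any Venti library.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: value of one hex digit, or -1 if c is not one. */
static int hexval(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Hypothetical helper: split "label:hexscore" (or a bare "hexscore") into
 * label (without the colon, "" if absent) and a 20-byte score.
 * Assumes the score is 40 hex digits.  Returns 0 on success, -1 otherwise.
 */
int parse_score(const char *s, char *label, size_t labelsz, uint8_t score[20])
{
    const char *colon = strrchr(s, ':');
    const char *hex = colon ? colon + 1 : s;
    size_t llen = colon ? (size_t)(colon - s) : 0;
    int i;

    if (llen + 1 > labelsz || strlen(hex) != 40)
        return -1;
    memcpy(label, s, llen);
    label[llen] = '\0';
    for (i = 0; i < 20; i++) {
        int hi = hexval(hex[2*i]), lo = hexval(hex[2*i+1]);
        if (hi < 0 || lo < 0)
            return -1;
        score[i] = (uint8_t)(hi << 4 | lo);
    }
    return 0;
}
```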

   Files and Directories
       Venti accepts blocks up to 56 kilobytes in size.  By convention, Venti
       clients use hash trees of blocks to represent arbitrary-size data
       files.  The data to be stored is split into fixed-size blocks and
       written to the server, producing a list of scores.  The resulting list
       of scores is split into fixed-size pointer blocks (using only an
       integral number of scores per block) and written to the server,
       producing a smaller list of scores.  The process continues, eventually
       ending with the score for the hash tree's top-most block.  Each file
       stored this way is summarized by a VtEntry structure recording the
       top-most score, the depth of the tree, the data block size, and the
       pointer block size.  One or more VtEntry structures can be concatenated
       and stored as a special file called a directory.  In this manner,
       arbitrary trees of files can be constructed and stored.
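
       The block sizes determine the number of pointer levels.  The sketch
       below uses a hypothetical tree_depth function that counts how many
       rounds of packing scores into pointer blocks a file needs.  It assumes
       20-byte scores, and it assumes that depth 0 means the file fits in a
       single data block.

```c
#include <stdint.h>

#define SCORE_SIZE 20   /* a SHA1 score is 20 bytes */

/*
 * Hypothetical helper: depth of the hash tree for a file of size bytes,
 * using dsize-byte data blocks and psize-byte pointer blocks.
 * Depth 0 (an assumed convention) means a single data block.
 * Returns -1 if a pointer block cannot hold at least two scores.
 */
int tree_depth(uint64_t size, uint32_t dsize, uint32_t psize)
{
    uint64_t per_ptr = psize / SCORE_SIZE;   /* integral number of scores */
    uint64_t nblocks;
    int depth = 0;

    if (dsize == 0 || per_ptr < 2)
        return -1;
    nblocks = (size + dsize - 1) / dsize;    /* data blocks needed */
    if (nblocks == 0)
        nblocks = 1;                         /* empty file: one empty block */
    while (nblocks > 1) {                    /* add one pointer level per pass */
        nblocks = (nblocks + per_ptr - 1) / per_ptr;
        depth++;
    }
    return depth;
}
```

       With illustrative 8192-byte data and pointer blocks, each pointer block
       holds 409 scores.  A 3 MB file (384 data blocks) therefore needs one
       pointer level, and any file larger than 409 data blocks needs at least
       two.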

       Scores passed between programs conventionally refer to VtRoot blocks,
       which contain descriptive information as well as the score of a
       directory block containing a small number of directory entries.

       Conventionally, programs do not mix data and directory entries in the
       same file.  Instead, they keep two separate files, one with directory
       entries and one with metadata referencing those entries by position.
       Keeping this parallel representation is a minor annoyance, but it
       makes it possible for general programs like venti/copy (see venti(1))
       to traverse the block tree without knowing the specific details of any
       particular program's data.

   Block Types
       To allow programs to traverse these structures without needing to
       understand their higher-level meanings, Venti tags each block with a
       type.  The types are:

           VtDataType      000  data
           VtDataType+1    001  scores of VtDataType blocks
           VtDataType+2    002  scores of VtDataType+1 blocks
           VtDirType       010  VtEntry structures
           VtDirType+1     011  scores of VtDirType blocks
           VtDirType+2     012  scores of VtDirType+1 blocks
           VtRootType      020  VtRoot structure

       The octal numbers listed are the type numbers used by the commands
       below.  (For historical reasons, the type numbers used on disk and on
       the wire are different from the above.  They do not distinguish
       VtDataType+n blocks from VtDirType+n blocks.)
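
       Each pointer level adds one to the base type.  A traversal can
       therefore recover the kind of a block and its level with bit
       operations on these octal values.  The C sketch below reproduces the
       numbers from the table and is not a copy of the declarations in
       <venti.h>.  type_base, type_depth, and child_type are hypothetical
       helpers.  The sketch assumes fewer than eight pointer levels, so that
       the level fits in the low three bits.

```c
/* Type numbers copied from the table above (the commands' numbering, not
 * the on-disk or on-the-wire numbering).  Octal literals match the table. */
enum {
    VtDataType = 000,
    VtDirType  = 010,
    VtRootType = 020
};

/* Hypothetical helper: base kind (VtDataType, VtDirType or VtRootType). */
static int type_base(int t)
{
    return t & ~07;
}

/* Hypothetical helper: pointer level n in VtDataType+n or VtDirType+n
 * (assumes n < 8). */
static int type_depth(int t)
{
    return t & 07;
}

/* Hypothetical helper: type of the blocks that a pointer block's scores
 * refer to, or -1 for a level-0 block or a root block. */
static int child_type(int t)
{
    if (type_base(t) == VtRootType || type_depth(t) == 0)
        return -1;
    return t - 1;
}
```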

   Zero Truncation
       To avoid storing the same short data blocks padded with differing
       numbers of zeros, Venti clients working with fixed-size blocks
       conventionally `zero truncate' the blocks before writing them to the
       server.  For example, if a 1024-byte data block contains the 11-byte
       string `hello world' followed by 1013 zero bytes, a client would store
       only the 11-byte block.  When the client later reads the block from
       the server, it appends zero bytes to the end as necessary to reach the
       expected size.
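
       The following sketch implements the data-block convention with the
       hypothetical helpers zero_truncate and zero_extend.  It shows only the
       byte-level rule and does not cover the network transfer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of buf[0..len) without trailing zero bytes.
 * This is the length a client would write to the server. */
size_t zero_truncate(const unsigned char *buf, size_t len)
{
    while (len > 0 && buf[len - 1] == 0)
        len--;
    return len;
}

/* Hypothetical helper: after reading have bytes from the server, pad
 * buf with zero bytes up to the expected size want. */
void zero_extend(unsigned char *buf, size_t have, size_t want)
{
    if (have < want)
        memset(buf + have, 0, want - have);
}
```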

       When truncating pointer blocks (VtDataType+n and VtDirType+n blocks),
       trailing zero scores (copies of the score of the zero-length block)
       are removed instead of trailing zero bytes.
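
       For pointer blocks, the unit removed is a whole 20-byte score.  The
       value removed is the zero score (the SHA1 hash of the empty block),
       not twenty zero bytes.  The sketch below hard-codes that score and
       uses a hypothetical helper named score_truncate.

```c
#include <stddef.h>
#include <string.h>

#define SCORE_SIZE 20

/* The zero score: SHA1 of the zero-length block (da39a3ee...afd80709). */
static const unsigned char zero_score[SCORE_SIZE] = {
    0xda, 0x39, 0xa3, 0xee, 0x5e, 0x6b, 0x4b, 0x0d, 0x32, 0x55,
    0xbf, 0xef, 0x95, 0x60, 0x18, 0x90, 0xaf, 0xd8, 0x07, 0x09
};

/* Hypothetical helper: length of a pointer block after removing trailing
 * zero scores.  Assumes len is a multiple of SCORE_SIZE. */
size_t score_truncate(const unsigned char *buf, size_t len)
{
    while (len >= SCORE_SIZE &&
           memcmp(buf + len - SCORE_SIZE, zero_score, SCORE_SIZE) == 0)
        len -= SCORE_SIZE;
    return len;
}
```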

       Because of the truncation convention, any file consisting entirely of
       zero bytes, no matter what its length, will be represented by the zero
       score: the data blocks contain all zeros and are thus truncated to the
       empty block, and the pointer blocks contain all zero scores and are
       thus also truncated to the empty block, and so on up the hash tree.

   Network Protocol
       A Venti session begins when a client connects to the network address
       served by a Venti server; the conventional address is
       tcp!server!venti (the venti port is 17034).  Both client and server
       begin by sending a version string of the form
       venti-versions-comment\n.  The versions field is a list of acceptable
       versions separated by colons.  The protocol described here is version
       02.  The client is responsible for choosing a common version and
       sending it in the VtThello message, described below.
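
       Before choosing a version, a client might check whether the server's
       version line offers it.  The sketch below assumes that the versions
       field ends at the first '-' after the venti- prefix, which lets the
       comment contain hyphens.  version_offered is a hypothetical helper.

```c
#include <string.h>

/*
 * Hypothetical helper: return 1 if the line "venti-versions-comment\n"
 * lists want among its colon-separated versions, else 0 (also for
 * malformed lines).  Assumes versions contain no '-'.
 */
int version_offered(const char *line, const char *want)
{
    const char *p, *end;
    size_t wlen = strlen(want);

    if (strncmp(line, "venti-", 6) != 0)
        return 0;
    p = line + 6;
    end = strchr(p, '-');            /* versions field ends here */
    if (end == NULL)
        return 0;
    while (p < end) {
        const char *q = memchr(p, ':', (size_t)(end - p));
        size_t n;

        if (q == NULL)
            q = end;                 /* last version in the list */
        n = (size_t)(q - p);
        if (n == wlen && memcmp(p, want, n) == 0)
            return 1;
        p = q + 1;
    }
    return 0;
}
```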

       After the initial version exchange, the client transmits requests
       (T-messages) to the server, which subsequently returns replies
       (R-messages) to the client.  The combined act of transmitting
       (receiving) a request of a particular type, and receiving
       (transmitting) its reply is called a transaction of that type.

       Each message consists of a sequence of bytes.  Two-byte fields hold
       unsigned integers represented in big-endian order (most significant
       byte first).  Data items of variable lengths are represented by a
       one-byte field specifying a count, n, followed by n bytes of data.
       Text strings are represented similarly, using a two-byte count with
       the text itself stored as a UTF-8-encoded sequence of Unicode
       characters (see utf(7)).  Text strings are not NUL-terminated: n
       counts the bytes of UTF-8 data, which include no final zero byte.  The
       NUL character is illegal in text strings in the Venti protocol.  The
       maximum string length in Venti is 1024 bytes.
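
       The two length-prefixed encodings differ only in the width of the
       count.  The sketch below uses the hypothetical helpers pack_string and
       pack_var.  The first writes a string[s] field with a two-byte
       big-endian count and rejects strings longer than the 1024-byte limit.
       The second writes a parameter[n] item with a one-byte count.  The
       sketch assumes the caller passes valid UTF-8.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VT_MAX_STRING 1024   /* maximum string length stated above */

/* Hypothetical helper: write string[s] (two-byte big-endian count, then
 * the UTF-8 bytes, no NUL).  Returns bytes written, or -1 if the string
 * is too long or does not fit in cap bytes. */
int pack_string(uint8_t *out, size_t cap, const char *s)
{
    size_t n = strlen(s);    /* stops at the NUL, so no NUL is sent */

    if (n > VT_MAX_STRING || n + 2 > cap)
        return -1;
    out[0] = (uint8_t)(n >> 8);
    out[1] = (uint8_t)n;
    memcpy(out + 2, s, n);
    return (int)(n + 2);
}

/* Hypothetical helper: write parameter[n] (one-byte count, then n bytes).
 * Returns bytes written, or -1 if n > 255 or it does not fit. */
int pack_var(uint8_t *out, size_t cap, const uint8_t *data, size_t n)
{
    if (n > 255 || n + 1 > cap)
        return -1;
    out[0] = (uint8_t)n;
    memcpy(out + 1, data, n);
    return (int)(n + 1);
}
```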

       Each Venti message begins with a two-byte size field specifying the
       length in bytes of the message, not including the size field itself.
       The next byte is the message type, one of the constants in the
       enumeration in the include file <venti.h>.  The next byte is an
       identifying tag, used to match responses to requests.  The remaining
       bytes are parameters of different sizes.  In the message descriptions,
       the number of bytes in a field is given in brackets after the field
       name.  The notation parameter[n] where n is not a constant represents
       a variable-length parameter: n[1] followed by n bytes of data forming
       the parameter.  The notation string[s] (using a literal s character)
       is shorthand for s[2] followed by s bytes of UTF-8 text.  The notation
       parameter[] where parameter is the last field in the message
       represents a variable-length field that comprises all remaining bytes
       in the message.
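
       Reading a message therefore begins with the fixed four-byte prefix:
       size, type, and tag.  The sketch below parses that prefix from a
       buffer and reports whether the whole frame has arrived, so a caller
       reading from a socket knows when to stop.  parse_header is a
       hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: parse size[2] type[1] tag[1] from buf[0..len).
 * Returns the total frame length (size + 2) when the full frame is
 * present, 0 if more bytes are needed, or -1 if size is too small to
 * hold the type and tag bytes.
 */
long parse_header(const uint8_t *buf, size_t len, int *type, int *tag)
{
    size_t size;

    if (len < 2)
        return 0;
    size = (size_t)buf[0] << 8 | buf[1];   /* big-endian, excludes itself */
    if (size < 2)
        return -1;
    if (len < size + 2)
        return 0;
    *type = buf[2];
    *tag = buf[3];
    return (long)(size + 2);
}
```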

       All Venti RPC messages are prefixed with a field size[2] giving the
       length of the message that follows (not including the size field
       itself).  The message bodies are:

              VtThello tag[1] version[s] uid[s] strength[1] crypto[n] codec[n]
              VtRhello tag[1] sid[s] rcrypto[1] rcodec[1]

              VtTping tag[1]
              VtRping tag[1]

              VtTread tag[1] score[20] type[1] pad[1] count[2]
              VtRread tag[1] data[]

              VtTwrite tag[1] type[1] pad[3] data[]
              VtRwrite tag[1] score[20]

              VtTsync tag[1]
              VtRsync tag[1]

              VtRerror tag[1] error[s]

              VtTgoodbye tag[1]
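
       Combining the header with the VtTread fields gives a 28-byte request.
       It consists of the size field (value 26), the message type, the tag,
       and 24 bytes of parameters.  In this sketch the caller passes in the
       numeric message type, because that value comes from <venti.h>, which
       is not reproduced here.  build_tread is a hypothetical helper, and its
       block type argument is the on-wire type.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: build a version-02 VtTread message in out:
 *   size[2] msgtype[1] tag[1] score[20] type[1] pad[1] count[2]
 * msgtype is the VtTread constant from <venti.h> (not hard-coded here).
 * Returns the number of bytes written (always 28).
 */
size_t build_tread(uint8_t out[28], int msgtype, int tag,
                   const uint8_t score[20], int blocktype, uint16_t count)
{
    size_t n = 2;                        /* skip size[2] for now */

    out[n++] = (uint8_t)msgtype;
    out[n++] = (uint8_t)tag;
    memcpy(out + n, score, 20);
    n += 20;
    out[n++] = (uint8_t)blocktype;
    out[n++] = 0;                        /* pad[1] */
    out[n++] = (uint8_t)(count >> 8);    /* count[2], big-endian */
    out[n++] = (uint8_t)count;

    out[0] = (uint8_t)((n - 2) >> 8);    /* size excludes the size field */
    out[1] = (uint8_t)(n - 2);
    return n;
}
```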

       Each T-message has a one-byte tag field, chosen and used by the client
       to identify the message.  The server will echo the request's tag field
       in the reply.  Clients should arrange that no two outstanding messages
       have the same tag field so that responses can be distinguished.

       The type of an R-message will either be one greater than the type of
       the corresponding T-message or VtRerror, indicating that the request
       failed.  In the latter case, the error field contains a string
       describing the reason for failure.
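
       Matching a reply to its request therefore requires both the tag and
       the type.  The sketch below classifies a reply.  classify_reply is a
       hypothetical helper, and the caller passes in the VtRerror value
       instead of the sketch assuming it.

```c
/*
 * Hypothetical helper: classify a reply (rtype, rtag) against the
 * outstanding request (ttype, ttag).  rerror is the VtRerror constant
 * from <venti.h>, passed in rather than hard-coded.
 * Returns 1 for a matching success reply, 0 for a matching VtRerror,
 * and -1 if the reply does not answer this request.
 */
int classify_reply(int ttype, int ttag, int rtype, int rtag, int rerror)
{
    if (rtag != ttag)
        return -1;            /* belongs to some other outstanding request */
    if (rtype == ttype + 1)
        return 1;             /* R-type is T-type plus one */
    if (rtype == rerror)
        return 0;             /* request failed; error[s] holds the reason */
    return -1;
}
```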

       Venti connections must begin with a hello transaction.  The VtThello
       message contains the protocol version that the client has chosen to
       use.  The fields strength, crypto, and codec could be used to add
       authentication, encryption, and compression to the Venti session but
       are currently ignored.  The rcrypto and rcodec fields in the VtRhello
       response are similarly ignored.  The uid and sid fields are intended
       to be the identity of the client and server but, given the lack of
       authentication, should be treated only as advisory.  The initial hello
       should be the only hello transaction during the session.

       The ping message has no effect and is used mainly for debugging.
       Servers should respond immediately to pings.

       The read message requests a block with the given score and type.  Use
       vttodisktype and vtfromdisktype (see venti(3)) to convert between a
       block type enumeration value (VtDataType, etc.) and the type used on
       disk and in the protocol.  The count field specifies the maximum
       expected size of the block.  The data in the reply is the block's
       contents.

       The write message writes a new block of the given type with contents
       data to the server.  The response includes the score to use to read
       the block, which should be the SHA1 hash of data.

       The Venti server may buffer written blocks in memory, waiting until
       after responding to the write message before writing them to
       permanent storage.  The server will delay the response to a sync
       message until after all blocks in earlier write messages have been
       written to permanent storage.

       The goodbye message ends a session.  There is no VtRgoodbye: upon
       receiving the VtTgoodbye message, the server terminates the
       connection.

       Version 04 of the Venti protocol is similar to version 02 (described
       above) but has two changes to accommodate larger payloads.  First, it
       replaces the leading 2-byte packet size with a 4-byte size.  Second,
       the count in the VtTread packet may be either 2 or 4 bytes; the total
       packet length distinguishes the two cases.
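
       Under version 04, a receiver can use the body length to distinguish
       the two count widths in a VtTread.  This sketch assumes that the
       version-04 VtTread keeps the version-02 field order and widens only
       the count.  Under that assumption the body after the size field is
       26 bytes with a 2-byte count or 28 bytes with a 4-byte count.  The
       sketch also assumes that the 4-byte fields are big-endian like the
       2-byte ones.  tread_count_v04 and read_size_v04 are hypothetical
       helpers.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: count from a version-04 VtTread body (the bytes
 * after the 4-byte size), assumed to be laid out as
 *   msgtype[1] tag[1] score[20] type[1] pad[1] count[2 or 4]
 * Returns -1 if the body length is neither 26 nor 28.
 */
int64_t tread_count_v04(const uint8_t *body, size_t len)
{
    if (len == 26)                       /* 2-byte count at offset 24 */
        return (int64_t)body[24] << 8 | body[25];
    if (len == 28)                       /* 4-byte count at offset 24 */
        return (int64_t)body[24] << 24 | (int64_t)body[25] << 16 |
               (int64_t)body[26] << 8 | body[27];
    return -1;
}

/* Hypothetical helper: the 4-byte size prefix, assumed big-endian. */
uint32_t read_size_v04(const uint8_t p[4])
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] << 8 | p[3];
}
```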

SEE ALSO
       Sean Quinlan and Sean Dorward, ``Venti: a new approach to archival
       storage'', Usenix Conference on File and Storage Technologies, 2002.


