VENTI(7)	       Miscellaneous Information Manual		      VENTI(7)

       venti - archival	storage	server

       Venti is	a block	storage	server intended	for archival data.  In a Venti
       server, the SHA1	hash of	a block's contents acts	as the	block  identi-
       fier  for  read	and write operations.  This approach enforces a	write-
       once policy, preventing accidental or malicious	destruction  of	 data.
       In  addition,  duplicate	 copies	of a block are coalesced, reducing the
       consumption of storage and simplifying the implementation of clients.

       This manual page	documents the basic concepts of	 block	storage	 using
       Venti as	well as	the Venti network protocol.

       Venti(1)	  documents   some  simple  clients.   Vac(1),	vacfs(4),  and
       vbackup(8) are more complex clients.

       Venti(3)	describes a C library interface	for  accessing	Venti  servers
       and manipulating	Venti data structures.

       Venti(8)	describes the programs used to run a Venti server.

       The  SHA1  hash that identifies a block is called its score.  The score
       of the zero-length block	is called the zero score.
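
       As a minimal illustration of the definitions above, the following
       Python sketch computes a score with the standard hashlib module.
       The names score and ZERO_SCORE are chosen for this example only and
       are not part of any Venti library.

```python
import hashlib

SCORE_SIZE = 20  # a SHA1 digest is 20 bytes long


def score(block: bytes) -> bytes:
    """Return the score (SHA1 digest) of a block's contents."""
    return hashlib.sha1(block).digest()


# The zero score is the score of the zero-length block.
ZERO_SCORE = score(b"")

print(ZERO_SCORE.hex())  # da39a3ee5e6b4b0d3255bfef95601890afd80709
```

       Because the score depends only on the contents, writing the same
       block twice yields the same identifier, which is what lets the
       server coalesce duplicates.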

       Scores may have an optional label: prefix, typically used  to  describe
       the  format of the data.	 For example, vac(1) uses a vac: prefix, while
       vbackup(8) uses prefixes	corresponding to the file system types:	ext2:,
       ffs:, and so on.
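
       A client that accepts scores as text might separate the optional
       prefix as sketched below.  This assumes the score itself is written
       as 40 hexadecimal digits; parse_score is a hypothetical helper, not
       a function from venti(3).

```python
from typing import Optional, Tuple


def parse_score(text: str) -> Tuple[Optional[str], bytes]:
    """Split 'label:hexscore' into (label, raw 20-byte score).

    The label is None when no prefix is present.
    """
    label, sep, hexpart = text.rpartition(":")
    raw = bytes.fromhex(hexpart)
    if len(raw) != 20:
        raise ValueError("a score must be 20 bytes (40 hex digits)")
    return (label if sep else None), raw
```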

   Files and Directories
       Venti  accepts blocks up	to 56 kilobytes	in size.  By convention, Venti
       clients use hash	trees  of  blocks  to  represent  arbitrary-size  data
       files.  The data	to be stored is	split into fixed-size blocks and writ-
       ten to the server, producing a list of scores.  The resulting  list  of
       scores  is split	into fixed-size	pointer	blocks (using only an integral
       number of scores	per block) and written	to  the	 server,  producing  a
       smaller	list of	scores.	 The process continues,	eventually ending with
       the score for the hash tree's top-most block.  Each  file  stored  this
       way  is summarized by a VtEntry structure recording the top-most	score,
       the depth of the	tree, the data block size, and the pointer block size.
       One or more VtEntry structures can be concatenated and stored as	a spe-
       cial file called	a directory.  In this manner, arbitrary	trees of files
       can be constructed and stored.
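
       The hash tree construction can be sketched as follows.  The store
       dictionary is a hypothetical in-memory stand-in for a Venti server,
       and write_tree and read_tree are names invented for this example.
       The sketch omits zero truncation and VtEntry packing, and it leaves
       the final data block short rather than padding it.

```python
import hashlib

SCORE_SIZE = 20
store = {}  # hypothetical in-memory "server": score -> block contents


def write_block(block: bytes) -> bytes:
    """Store a block and return its score."""
    s = hashlib.sha1(block).digest()
    store[s] = block
    return s


def write_tree(data: bytes, dsize: int, psize: int):
    """Write data as a hash tree; return (top-most score, tree depth)."""
    per_block = psize // SCORE_SIZE  # integral number of scores per block
    if per_block < 2:
        raise ValueError("pointer blocks must hold at least two scores")
    scores = [write_block(data[i:i + dsize])
              for i in range(0, len(data), dsize)] or [write_block(b"")]
    depth = 0
    while len(scores) > 1:
        # Group the current list of scores into pointer blocks.
        scores = [write_block(b"".join(scores[i:i + per_block]))
                  for i in range(0, len(scores), per_block)]
        depth += 1
    return scores[0], depth


def read_tree(top: bytes, depth: int) -> bytes:
    """Walk the tree from the top-most block back down to the data."""
    blocks = [store[top]]
    for _ in range(depth):
        blocks = [store[b[i:i + SCORE_SIZE]]
                  for b in blocks
                  for i in range(0, len(b), SCORE_SIZE)]
    return b"".join(blocks)
```

       With 4-byte data blocks and 40-byte pointer blocks (two scores
       each), ten bytes of data produce three data blocks, two first-level
       pointer blocks, and one top-most pointer block, giving depth 2.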

       Scores  passed  between programs	conventionally refer to	VtRoot blocks,
       which contain descriptive information as	well as	the score of a	direc-
       tory block containing a small number of directory entries.

       Conventionally,	programs  do not mix data and directory	entries	in the
       same file.  Instead, they keep two separate files, one  with  directory
       entries	and  one  with metadata	referencing those entries by position.
       Keeping this parallel representation is a minor annoyance but makes  it
       possible	 for  general  programs	like venti/copy	(see venti(1)) to tra-
       verse the block tree without knowing the	specific details of  any  par-
       ticular program's data.

   Block Types
       To  allow  programs to traverse these structures	without	needing	to un-
       derstand	their higher-level meanings, Venti  tags  each	block  with  a
       type.  The types	are:

	   VtDataType	  000  data
	   VtDataType+1	  001  scores of VtDataType blocks
	   VtDataType+2	  002  scores of VtDataType+1 blocks
	   VtDirType	  010  VtEntry structures
	   VtDirType+1	  011  scores of VtDirType blocks
	   VtDirType+2	  012  scores of VtDirType+1 blocks
	   VtRootType	  020  VtRoot structure

       The  octal numbers listed are the type numbers used by the commands be-
       low.  (For historical reasons, the type numbers used on disk and	on the
       wire  are  different  from  the	above.	 They  do  not distinguish Vt-
       DataType+n blocks from VtDirType+n blocks.)
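
       The table suggests that the low octal digit of a type is the
       pointer depth and the remaining bits select the kind of block.
       Under that reading, which is an inference from the table rather
       than a statement of the on-disk or on-wire encoding, a generic
       traversal can find the type of the blocks a pointer block refers
       to.  The helper names are hypothetical.

```python
# Type numbers as listed in the table above (octal).
VT_DATA_TYPE = 0o00
VT_DIR_TYPE = 0o10
VT_ROOT_TYPE = 0o20


def depth_of(block_type: int) -> int:
    """Pointer depth: 0 for data or directory blocks themselves."""
    return block_type & 0o7


def base_of(block_type: int) -> int:
    """Underlying kind of block (data, directory, or root)."""
    return block_type & ~0o7


def child_type(block_type: int) -> int:
    """Type of the blocks whose scores a pointer block contains."""
    if depth_of(block_type) == 0:
        raise ValueError("not a pointer block")
    return block_type - 1
```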

   Zero	Truncation
       To avoid	storing	the same short data blocks padded with differing  num-
       bers of zeros, Venti clients working with fixed-size blocks convention-
       ally `zero truncate' the	blocks before writing them to the server.  For
       example,	 if  a 1024-byte data block contains the 11-byte string	`hello
       world' followed by 1013 zero bytes,  a  client  would  store  only  the
       11-byte	block.	 When the client later read the	block from the server,
       it would	append zero bytes to the end as	necessary  to  reach  the  ex-
       pected size.

       When  truncating	 pointer blocks	(VtDataType+n and VtDirType+n blocks),
       trailing	zero scores are	removed	instead	of trailing zero bytes.

       Because of the truncation convention, any file consisting  entirely  of
       zero  bytes, no matter what its length, will be represented by the zero
       score: the data blocks contain all zeros	and are	thus truncated to  the
       empty  block,  and  the	pointer	blocks contain all zero	scores and are
       thus also truncated to the empty	block, and so on up the	hash tree.
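
       The truncation convention might be sketched as below.  The function
       names are invented for this example; "zero score" is taken to mean
       the score of the zero-length block, as defined earlier.

```python
import hashlib

SCORE_SIZE = 20
ZERO_SCORE = hashlib.sha1(b"").digest()


def truncate_data(block: bytes) -> bytes:
    """Remove trailing zero bytes from a data block."""
    return block.rstrip(b"\x00")


def extend_data(block: bytes, size: int) -> bytes:
    """Pad a data block read from the server back to its expected size."""
    return block.ljust(size, b"\x00")


def truncate_pointers(block: bytes) -> bytes:
    """Remove trailing zero scores from a pointer block."""
    if len(block) % SCORE_SIZE != 0:
        raise ValueError("pointer block is not a whole number of scores")
    end = len(block)
    while end > 0 and block[end - SCORE_SIZE:end] == ZERO_SCORE:
        end -= SCORE_SIZE
    return block[:end]


def extend_pointers(block: bytes, size: int) -> bytes:
    """Append zero scores until the block holds size bytes of scores."""
    missing = (size - len(block)) // SCORE_SIZE
    return block + ZERO_SCORE * missing


padded = b"hello world" + bytes(1013)  # the 1024-byte example above
stored = truncate_data(padded)         # only 11 bytes go to the server
```

       An all-zero data block truncates to the empty block, whose score is
       the zero score; a pointer block holding only zero scores then
       truncates to the empty block as well.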

   Network Protocol
       A Venti session begins when a client connects to	 the  network  address
       served  by a Venti server; the conventional address is tcp!server!venti
       (the venti port is 17034).  Both	client and server begin	by  sending  a
       version	string	of  the	 form  venti-versions-comment\n.  The versions
       field is a list of acceptable versions separated by colons.  The proto-
       col described here is version 02.  The client is responsible for choos-
       ing a common version and sending it in the VtThello message, described
       below.
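
       The version exchange might be handled as in the following sketch.
       The example version string and the helper names are hypothetical;
       the sketch assumes the versions field itself contains no hyphen.

```python
from typing import List


def parse_versions(line: bytes) -> List[str]:
    """Extract the version list from 'venti-versions-comment\\n'."""
    text = line.decode("ascii").rstrip("\n")
    parts = text.split("-", 2)  # the comment may itself contain hyphens
    if len(parts) != 3 or parts[0] != "venti":
        raise ValueError("malformed version string")
    return parts[1].split(":")


def choose_version(server_line: bytes, supported: List[str]) -> str:
    """Pick the first client-supported version that the server offers."""
    offered = parse_versions(server_line)
    for version in supported:  # client's order of preference
        if version in offered:
            return version
    raise ValueError("no common protocol version")
```

       For instance, given the line venti-02:04-example\n and a client
       preferring 04 over 02, choose_version returns 04.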

       After  the  initial version exchange, the client	transmits requests (T-
       messages) to the	server,	which  subsequently  returns  replies  (R-mes-
       sages)  to  the client.	The combined act of transmitting (receiving) a
       request of a particular type, and receiving (transmitting) its reply is
       called a	transaction of that type.

       Each message consists of	a sequence of bytes.  Two-byte fields hold un-
       signed integers represented in big-endian order (most significant  byte
       first).	 Data  items of	variable lengths are represented by a one-byte
       field specifying	a count, n, followed by	n bytes	of data.  Text strings
       are  represented	similarly, using a two-byte count with the text	itself
       stored as a UTF-encoded sequence	of Unicode  characters	(see  utf(7)).
       Text  strings  are  not NUL-terminated: n counts	the bytes of UTF data,
       which include no	final zero byte.  The NUL character is illegal in text
       strings	in  the	Venti protocol.	 The maximum string length in Venti is
       1024 bytes.
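
       These encoding rules can be sketched with the standard struct
       module.  The helper names are invented for this example.

```python
import struct

MAX_STRING = 1024  # maximum string length in bytes


def pack_string(text: str) -> bytes:
    """Two-byte big-endian count followed by UTF-8 bytes, no NUL."""
    raw = text.encode("utf-8")
    if b"\x00" in raw:
        raise ValueError("NUL is illegal in Venti text strings")
    if len(raw) > MAX_STRING:
        raise ValueError("string too long")
    return struct.pack(">H", len(raw)) + raw


def unpack_string(buf: bytes, offset: int = 0):
    """Return (text, offset of the byte following the string)."""
    (n,) = struct.unpack_from(">H", buf, offset)
    raw = buf[offset + 2:offset + 2 + n]
    if len(raw) != n:
        raise ValueError("truncated string")
    return raw.decode("utf-8"), offset + 2 + n


def pack_param(data: bytes) -> bytes:
    """One-byte count followed by that many bytes of data."""
    if len(data) > 255:
        raise ValueError("variable-length item too long")
    return bytes([len(data)]) + data
```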

       Each Venti message begins with a	two-byte  size	field  specifying  the
       length  in bytes	of the message,	not including the length field itself.
       The next	byte is	the message type, one of the constants in the enumera-
       tion  in	 the  include file <venti.h>.  The next	byte is	an identifying
       tag, used to match responses to requests.  The remaining	bytes are  pa-
       rameters	 of  different sizes.  In the message descriptions, the	number
       of bytes	in a field is given in brackets	after the field	name.  The no-
       tation  parameter[n]  where  n is not a constant	represents a variable-
       length parameter: n[1] followed by n bytes of data forming the  parame-
       ter.  The notation string[s] (using a literal s character) is shorthand
       for s[2]	followed by s bytes of UTF-8 text.  The	 notation  parameter[]
       where parameter is the last field in the	message	represents a variable-
       length field that comprises all remaining bytes in the message.

       All Venti RPC messages are prefixed with	a  field  size[2]  giving  the
       length  of  the	message	that follows (not including the	size field it-
       self).  The message bodies are:

	      VtThello tag[1] version[s] uid[s]	strength[1] crypto[n] codec[n]
	      VtRhello tag[1] sid[s] rcrypto[1]	rcodec[1]

	      VtTping tag[1]
	      VtRping tag[1]

	      VtTread tag[1] score[20] type[1] pad[1] count[2]
	      VtRread tag[1] data[]

	      VtTwrite tag[1] type[1] pad[3] data[]
	      VtRwrite tag[1] score[20]

	      VtTsync tag[1]
	      VtRsync tag[1]

	      VtRerror tag[1] error[s]

	      VtTgoodbye tag[1]
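
       A version 02 message could be framed as sketched below.  The
       numeric message types are assumptions made for this example; the
       authoritative values are the constants in <venti.h>.  The
       block_type argument is the type as used on the wire.

```python
import struct

# Assumed message type values; consult <venti.h> for the real ones.
VT_TPING = 2
VT_TREAD = 12


def frame(msg_type: int, tag: int, params: bytes = b"") -> bytes:
    """Prefix type, tag, and parameters with the size[2] field."""
    body = bytes([msg_type, tag]) + params
    return struct.pack(">H", len(body)) + body  # size excludes itself


def pack_tread(tag: int, score: bytes, block_type: int, count: int) -> bytes:
    """VtTread tag[1] score[20] type[1] pad[1] count[2]."""
    if len(score) != 20:
        raise ValueError("score must be 20 bytes")
    params = score + bytes([block_type, 0]) + struct.pack(">H", count)
    return frame(VT_TREAD, tag, params)
```

       A VtTping with tag 7 is thus four bytes on the wire: a size of 2,
       the message type, and the tag.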

       Each T-message has a one-byte tag field,	chosen and used	by the	client
       to  identify the	message.  The server will echo the request's tag field
       in the reply.  Clients should arrange that no two outstanding  messages
       have the	same tag field so that responses can be	distinguished.

       The  type  of  an R-message will	either be one greater than the type of
       the corresponding T-message or VtRerror, indicating that the request
       failed.	In the latter case, the	error field contains a string describ-
       ing the reason for failure.
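
       A client might manage tags and check replies as in this sketch.
       TagPool, VentiError, and check_reply are hypothetical names, the
       numeric value of the error message type is an assumption (see
       <venti.h>), and the sketch assumes all 256 tag values are usable.

```python
VT_RERROR = 1  # assumed value; consult <venti.h>


class VentiError(Exception):
    """Raised when the server answers a request with an error reply."""


class TagPool:
    """Hands out one-byte tags so no two outstanding requests share one."""

    def __init__(self):
        self._free = list(range(256))

    def acquire(self) -> int:
        if not self._free:
            raise RuntimeError("too many outstanding requests")
        return self._free.pop()

    def release(self, tag: int) -> None:
        self._free.append(tag)


def check_reply(t_type: int, t_tag: int, reply: bytes) -> bytes:
    """Validate a reply (bytes after the size field); return its parameters."""
    r_type, r_tag = reply[0], reply[1]
    if r_tag != t_tag:
        raise ValueError("reply tag does not match request")
    if r_type == VT_RERROR:
        n = int.from_bytes(reply[2:4], "big")  # error[s]: two-byte count
        raise VentiError(reply[4:4 + n].decode("utf-8"))
    if r_type != t_type + 1:
        raise ValueError("unexpected reply type")
    return reply[2:]
```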

       Venti connections must begin with a hello  transaction.	 The  VtThello
       message	contains  the  protocol	 version that the client has chosen to
       use.  The fields	strength, crypto, and codec could be used to  add  au-
       thentication,  encryption, and compression to the Venti session but are
       currently ignored.  The rcrypto and rcodec fields in the VtRhello re-
       sponse are similarly ignored.  The uid and sid fields are intended to
       be the identity of the client and server	but, given the lack of authen-
       tication, should	be treated only	as advisory.  The initial hello	should
       be the only hello transaction during the	session.
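
       As a rough sketch, a VtThello for version 02 could be assembled
       like this.  The message type value and the uid "anonymous" are
       assumptions for illustration; strength, crypto, and codec are sent
       as zero or empty here since they are currently ignored.

```python
import struct

VT_THELLO = 4  # assumed value; consult <venti.h>


def pack_string(text: str) -> bytes:
    """string[s]: two-byte big-endian count, then UTF-8 text."""
    raw = text.encode("utf-8")
    return struct.pack(">H", len(raw)) + raw


def pack_thello(tag: int, version: str, uid: str) -> bytes:
    """VtThello tag[1] version[s] uid[s] strength[1] crypto[n] codec[n]."""
    body = (bytes([VT_THELLO, tag])
            + pack_string(version)
            + pack_string(uid)
            + bytes([0])    # strength
            + bytes([0])    # crypto[n] with n = 0
            + bytes([0]))   # codec[n] with n = 0
    return struct.pack(">H", len(body)) + body


hello = pack_thello(0, "02", "anonymous")
```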

       The ping	message	has no	effect	and  is	 used  mainly  for  debugging.
       Servers should respond immediately to pings.

       The  read  message requests a block with	the given score	and type.  Use
       vttodisktype and	vtfromdisktype (see venti(3)) to convert a block  type
       enumeration  value  (VtDataType,	etc.)  to the type used	on disk	and in
       the protocol.  The count	field specifies	the maximum expected  size  of
       the block.  The data in the reply is the	block's	contents.

       The  write  message  writes a new block of the given type with contents
       data to the server.  The	response includes the score to use to read the
       block, which should be the SHA1 hash of data.

       The Venti server	may buffer written blocks in memory, waiting until af-
       ter responding to the write message before writing  them	 to  permanent
       storage.	  The  server  will delay the response to a sync message until
       after all blocks	in earlier write messages have been written to	perma-
       nent storage.

       The goodbye message ends a session.  There is no VtRgoodbye: upon re-
       ceiving the VtTgoodbye message, the server terminates the connection.

       Version	04  of	the Venti protocol is similar to version 02 (described
       above) but has two changes to accommodate larger payloads.  First, it
       replaces	 the  leading  2-byte packet size with a 4-byte	size.  Second,
       the count in the	VtTread	packet may be either 2 or 4 bytes;  the	 total
       packet length distinguishes the two cases.
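
       The two version 04 changes might be handled as sketched here.  The
       function names are hypothetical, and the sketch assumes the rest
       of the VtTread layout is unchanged from version 02.

```python
import struct


def frame_v04(body: bytes) -> bytes:
    """Version 04 framing: a 4-byte big-endian size, then the message."""
    return struct.pack(">I", len(body)) + body


def parse_tread_v04(body: bytes):
    """Parse a VtTread (bytes after the size field).

    Returns (tag, score, type, count).  The count is 2 or 4 bytes wide;
    the total message length tells the two cases apart.
    """
    fixed = 1 + 1 + 20 + 1 + 1  # msgtype, tag, score, type, pad
    tag = body[1]
    score = body[2:22]
    block_type = body[22]
    tail = body[fixed:]
    if len(tail) == 2:
        (count,) = struct.unpack(">H", tail)
    elif len(tail) == 4:
        (count,) = struct.unpack(">I", tail)
    else:
        raise ValueError("malformed VtTread")
    return tag, score, block_type, count
```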

       venti(1), venti(3), venti(8)
       Sean Quinlan and Sean Dorward, ``Venti: a new approach to archival
       storage'', Usenix Conference on File and Storage Technologies, 2002.


