
FreeBSD Manual Pages


Digest(3)	      User Contributed Perl Documentation	     Digest(3)

       File::RsyncP::Digest - Perl interface to rsync message digest

	   use File::RsyncP::Digest;

	   $rsDigest = new File::RsyncP::Digest;

	   # specify rsync protocol version (default is <= 26 -> buggy digests).

	   # file MD4 digests

	   $digest = $rsDigest->digest();
	   $string = $rsDigest->hexdigest();

	   # Return 32-byte pair of digests (protocol <= 26 and >= 27).
	   $digestPair = $rsDigest->digest2();

	   $digest = File::RsyncP::Digest->hash(SCALAR);
	   $string = File::RsyncP::Digest->hexhash(SCALAR);

	   # block digests
	   $digests = $rsDigest->blockDigest($data, $blockSize, $md4DigestLen,
				       $checksumSeed);

	   $digests = $rsDigest->blockDigestUpdate($state, $blockSize,
				       $blockLastLen, $md4DigestLen, $checksumSeed);

	   $digests2 = $rsDigest->blockDigestExtract($digests16, $md4DigestLen);

       The File::RsyncP::Digest module allows you to compute rsync digests,
       including the RSA Data Security Inc. MD4 Message Digest algorithm, and
       Adler32 checksums from within Perl programs.

   Rsync Digests
       Rsync uses two main digests (or checksums) to check, with very high
       probability, that the underlying data is identical without the need to
       exchange the underlying data.

       The server (remote) side of rsync generates a checksumSeed (usually
       Unix time()) that is exchanged during the protocol startup. This seed
       is used in both the file and the block MD4 checksum calculations, which
       causes the block and file checksums to change every time rsync is run.

       File Digest
	   This is an MD4 digest of the checksum seed, followed by the entire
	   file's contents. This digest is 128 bits long. The file digest is
	   sent at the end of a file's deltas to ensure that the reconstructed
	   file is correct. This digest is also optionally computed and sent
	   as part of the file list if the --checksum option is specified to
	   rsync.

       Block digest
	   Each file is divided into blocks of default length 700 bytes. The
	   digest of each block is formed by computing the Adler32 checksum of
	   the block, and also the MD4 digest of the block followed by the
	   checksum seed. During phase 1, just the first two bytes of the MD4
	   digest are sent, meaning the total digest is 6 bytes or 48 bits (4
	   bytes for Adler32 and the first 2 bytes of the MD4 digest). During
	   phase 2 (which is necessary for received files that have an
	   incorrect file digest), the entire MD4 checksum is used (128 bits),
	   meaning the block digest is 20 bytes or 160 bits. (Prior to rsync
	   protocol XXX, the full 20-byte digest was sent every time and there
	   was only a single phase.)
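       The weak (Adler32) part of each block digest can be sketched in pure
       Perl. This is a minimal sketch, not the module's code. It assumes the
       algorithm of get_checksum1() in rsync 2.5.x, which, despite the
       Adler32 name, sums the bytes without Adler-32's starting value of 1
       and without its modulo 65521. It also assumes the 4 bytes are stored
       little-endian. The helper names weak_checksum and weak_checksum_bytes
       are hypothetical.

```perl
use strict;
use warnings;

# Sketch of rsync's weak per-block checksum (called Adler32 on this page),
# modelled on get_checksum1() from rsync 2.5.x with CHAR_OFFSET = 0.
# Assumption: bytes are read as signed chars, as rsync's C code does; for
# ASCII data signed and unsigned give the same result.
sub weak_checksum {
    my ($block) = @_;
    my ($s1, $s2) = (0, 0);
    for my $byte (unpack 'c*', $block) {
        $s1 += $byte;    # plain sum of the bytes
        $s2 += $s1;      # sum of the running sums
    }
    # Low 16 bits of s1, with s2 shifted into the high 16 bits (32-bit result).
    return (($s1 & 0xffff) | (($s2 & 0xffff) << 16));
}

# The 4 checksum bytes as they are assumed to appear in each block digest.
sub weak_checksum_bytes {
    my ($block) = @_;
    return pack 'V', weak_checksum($block);    # 'V' = 32-bit little-endian
}

my $data = ("a" x 700) . ("b" x 700) . ("c" x 600);
for my $i (0 .. 2) {
    print unpack('H*', weak_checksum_bytes(substr($data, $i * 700, 700))), "\n";
}
# prints 3c09a624, f80b0ce3 and 08e8645d, one per line
```

       The three printed values agree with the first 4 bytes of each 6-byte
       entry in the block checksum example near the end of this page, which
       supports the byte-order assumption. The 2-byte MD4 part is not
       computed here.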

       This module contains routines for computing file and block digests in a
       manner that is identical to rsync.

       Incidentally, rsync contains two bugs in its implementation of MD4 (up
       to and including rsync protocol version 26):

       o   MD4Final() is not called when the data size (i.e., the file or block
	   size plus 4 bytes for the checksum seed) is a multiple of 64.

       o   MD4 is not correct for total data sizes greater than 512MB (2^32
	   bits). Rsync's MD4 only maintains the data size in bits using a
	   32-bit counter, so it overflows for file sizes bigger than 512MB.
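       The two conditions above can be restated as predicates. This is a
       hypothetical sketch: md4_bytes_hashed, md4_final_skipped and
       md4_length_overflows are not part of this module. It assumes that the
       4 seed bytes are hashed only when the seed is non-zero. One consequence
       is that every full 700-byte block hashed with a non-zero seed hits the
       first bug, because 700 + 4 = 704 = 11 * 64.

```perl
use strict;
use warnings;

# Hypothetical helpers restating the two bug conditions listed above.
# Assumption: the 4 seed bytes are counted only when the seed is non-zero,
# matching how this page describes both block and file digests.
sub md4_bytes_hashed {
    my ($len, $seed) = @_;
    return $len + ($seed ? 4 : 0);
}

# Bug 1: MD4Final() is skipped when the hashed length is a multiple of 64.
sub md4_final_skipped {
    my ($len, $seed) = @_;
    return md4_bytes_hashed($len, $seed) % 64 == 0;
}

# Bug 2: the length in bits no longer fits in rsync's 32-bit counter.
sub md4_length_overflows {
    my ($len, $seed) = @_;
    return md4_bytes_hashed($len, $seed) * 8 >= 2**32;
}

printf "700-byte block, seed 0x12345678: %s\n",
    md4_final_skipped(700, 0x12345678) ? "affected" : "not affected";
printf "700-byte block, seed 0: %s\n",
    md4_final_skipped(700, 0) ? "affected" : "not affected";
```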

       The effects of these bugs are benign: the MD4 digest should not be
       cryptographically weakened and both sides are consistent.

       This module implements both versions of the MD4 digest: the buggy
       version for protocol versions <= 26 and the correct version for
       protocol versions >= 27. The default mode is the buggy version
       (protocol versions <= 26).

       You can specify the rsync protocol version to determine which MD4
       version is used:

	   # specify rsync protocol version (default is <= 26 -> buggy digests).

       Also, you can get both digests in a single call. The result is
       returned as a single 32-byte scalar: the first 16 bytes are the buggy
       digest and the second 16 bytes are the correct digest:

	   # Return 32-byte pair of digests (protocol <= 26 and >= 27).
	   $digestPair = $rsDigest->digest2();
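       Splitting the pair back into its two halves only needs unpack. The
       helper split_digest_pair below is hypothetical and relies only on the
       16 + 16 byte layout described above.

```perl
use strict;
use warnings;

# Hypothetical helper: split the 32-byte value returned by digest2() into
# the buggy (protocol <= 26) digest and the correct (protocol >= 27) digest.
sub split_digest_pair {
    my ($pair) = @_;
    die "expected 32 bytes, got " . length($pair) . "\n"
        unless length($pair) == 32;
    my ($buggy, $correct) = unpack 'a16 a16', $pair;
    return ($buggy, $correct);
}

# Usage with a real context (requires File::RsyncP::Digest):
#   my ($buggy, $correct) = split_digest_pair($rsDigest->digest2());
#   print unpack('H*', $correct), "\n";
```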

       A new rsync digest context object is created with the new operation.
       Multiple simultaneous digest contexts can be maintained, if desired.

   Computing Block Digests
       After a context is created, the function to compute block checksums is:

	   $digests = $rsDigest->blockDigest($data, $blockSize, $md4DigestLen,
				       $checksumSeed);

       The first argument is the data, which can contain as much raw data as
       you wish (i.e., multiple blocks). Both the Adler32 checksum and the MD4
       checksum are computed for each block in the data. The partial end block
       (if present) is also processed. The 4 bytes of the integer
       checksumSeed (the fourth argument) are added at the end of each block
       digest calculation if it is non-zero. The blockSize is specified in
       the second argument (default is 700). The third argument,
       md4DigestLen, specifies how many bytes of the MD4 digest are included
       in the returned data. Rsync uses a value of 2 for the first pass
       (meaning 6 bytes of total digests are returned per block), and all 16
       bytes for the second pass (meaning 20 bytes of total digests are
       returned per block). The returned number of bytes is the number of
       bytes in each digest (Adler32 + partial/complete MD4) times the number
       of blocks:

	   (4 + md4DigestLen) * ceil(length(data) / blockSize);
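       The size formula can be checked with a few lines of Perl. This sketch
       only evaluates the formula above and does not call the module.
       block_digest_length is a hypothetical name.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Expected size in bytes of the string returned by blockDigest():
# (4 + md4DigestLen) bytes per block, one entry per (possibly partial) block.
sub block_digest_length {
    my ($data_len, $block_size, $md4_digest_len) = @_;
    return (4 + $md4_digest_len) * ceil($data_len / $block_size);
}

print block_digest_length(2000, 700, 2), "\n";    # 18 (phase 1)
print block_digest_length(2000, 700, 16), "\n";   # 60 (phase 2)
```

       For a 2000-byte input with 700-byte blocks, the 18 bytes of phase 1
       output correspond to 36 hex digits, which is the length of the block
       checksum string printed in the example near the end of this page.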

       To allow block checksums to be cached (when checksumSeed is unknown),
       and then quickly updated with the known checksumSeed, the checksum data
       should first be computed with a digest length of -1 and a checksumSeed
       of 0:

	   $state = $rsDigest->blockDigest($data, $blockSize, -1, 0);

       The returned $state should be saved for later retrieval, together with
       the length of the last partial block (e.g., length($data) % $blockSize).
       The length of $state depends upon the number of blocks and the block
       size. In addition to the 16 bytes of MD4 state, up to 63 bytes of
       unprocessed data per block are also saved in $state. For each block,

	   16 + ($blockSize % 64)

       bytes are saved in $state, so $state is most compact when $blockSize is
       a multiple of 64. (The last, partial block might have a smaller block
       size, requiring up to 63 bytes of state even if $blockSize is a
       multiple of 64.)
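       The per-block rule above makes the effect of the block size easy to
       tabulate. This is a minimal sketch using the hypothetical helper
       state_bytes_per_block. It covers full blocks only, because the exact
       state size of the final partial block is not specified here.

```perl
use strict;
use warnings;

# Per-block size of the cached $state, from the rule stated above:
# 16 bytes of MD4 state plus (blockSize % 64) bytes of unprocessed data.
sub state_bytes_per_block {
    my ($block_size) = @_;
    return 16 + $block_size % 64;
}

# Compare the default block size with nearby multiples of 64.
for my $bs (640, 700, 704, 768) {
    printf "blockSize %4d: %2d state bytes per block\n",
        $bs, state_bytes_per_block($bs);
}
```

       With the default block size of 700, each block keeps 76 bytes of state,
       while 640, 704 and 768 each keep only the 16-byte MD4 state.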

       Once the checksumSeed is known, the updated checksums can then be
       computed using:

	   $digests = $rsDigest->blockDigestUpdate($state, $blockSize,
				       $blockLastLen, $md4DigestLen, $checksumSeed);

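       The first argument is the cached state returned by blockDigest. The
       third argument is the length of the (partial) last block. The fourth
       and fifth arguments, md4DigestLen and checksumSeed, have the same
       meaning as for blockDigest.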

       Alternatively, I hope to add a --checksum-seed=n option to rsync that
       allows the checksum seed to be set to 0. This causes the checksum seed
       to be omitted from the MD4 calculation and makes caching the
       checksums much easier. A zero checksum seed does not weaken the block
       digest. I'm not sure whether or not it weakens the file digest (the
       checksum seed is applied at the start of the file digest and at the
       end of the block digest). In this case, the full 16-byte checksums
       should be computed using:

	   $digests16 = $rsDigest->blockDigest($data, $blockSize, 16, 0);

       and for phase 1 the 2-byte MD4 substrings can be extracted with:

	   $digests2  = $rsDigest->blockDigestExtract($digests16, 2);

       The original $digests16 does not need any additional processing for
       phase 2.
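       What blockDigestExtract does can be approximated in pure Perl. This
       sketch assumes that each 20-byte entry holds the 4-byte Adler32 value
       followed by the 16-byte MD4 digest. The example output at the end of
       this page is consistent with that order, but this is not the module's
       source. extract_block_digests is a hypothetical name.

```perl
use strict;
use warnings;

# Keep each block's 4-byte weak checksum plus the first $len bytes of its
# 16-byte MD4 digest. Assumption: entries are laid out as checksum, then MD4.
sub extract_block_digests {
    my ($digests16, $len) = @_;
    die "length is not a multiple of 20\n" if length($digests16) % 20;
    return join '', map { substr($_, 0, 4 + $len) } unpack '(a20)*', $digests16;
}

my $demo = "WXYZ" . ("\x11" x 16);    # one fake 20-byte entry
print unpack('H*', extract_block_digests($demo, 2)), "\n";    # 5758595a1111
```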

   Computing File Digests
       In addition, functions identical to Digest::MD4 are provided that allow
       rsync's MD4 file digest to be computed. The checksum seed, if non-zero,
       is included at the start of the data, before the file's contents are
       added.

       The context is updated with the add operation, which adds the strings
       contained in the LIST parameter. Note, however, that "add('foo',
       'bar')", "add('foo')" followed by "add('bar')", and "add('foobar')"
       should all give the same result.
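       The equivalence of these three call patterns is a general property of
       streaming hash contexts. Digest::MD4 does not ship with Perl, so the
       sketch below demonstrates the property with the core Digest::MD5
       module instead. md5_hex_of_chunks is a hypothetical helper.

```perl
use strict;
use warnings;
use Digest::MD5;

# Feed each argument to a single context with a separate add() call.
sub md5_hex_of_chunks {
    my $ctx = Digest::MD5->new;
    $ctx->add($_) for @_;    # each add() appends to the same stream
    return $ctx->hexdigest;
}

my $one_call  = do { my $c = Digest::MD5->new; $c->add('foo', 'bar'); $c->hexdigest };
my $two_calls = md5_hex_of_chunks('foo', 'bar');
my $joined    = md5_hex_of_chunks('foobar');
print "$one_call\n$two_calls\n$joined\n";    # three identical lines
```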

       The final MD4 message digest value is returned by the digest operation
       as a 16-byte binary string. This operation delivers the result of add
       operations since the last new or reset operation. Note that the digest
       operation is effectively a destructive, read-once operation. Once it
       has been performed, the context must be reset before being used to
       calculate another digest value.

       Several convenience functions are also provided. The addfile operation
       takes an open file-handle and reads it until end-of-file in 1024-byte
       blocks, adding the contents to the context. The file-handle can either
       be specified by name or passed as a type-glob reference. The hexdigest
       operation calls digest and returns the result as a printable string of
       hexadecimal digits. This is exactly the same operation as performed by
       the unpack operation in the examples below.

       The hash operation can act as either a static member function (i.e.,
       you invoke it on the class, as in the synopsis above) or as a normal
       instance method. In both cases it performs the complete MD4 cycle
       (reset, add, digest) on the supplied scalar value. This is convenient
       for handling small quantities of data. When invoked on the class, a
       temporary context is created. When invoked through an already created
       context object, this context is used. The latter form is slightly more
       efficient. The hexhash operation is analogous to hexdigest.
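       A method that works both as a class method and as an instance method,
       as described for hash, is a common Perl pattern. The package
       My::Hasher below is hypothetical and uses Digest::MD5 as a stand-in for
       the MD4 context. It illustrates the calling convention only, not this
       module's implementation.

```perl
use strict;
use warnings;
use Digest::MD5;

package My::Hasher;

sub new { my ($class) = @_; return bless { ctx => Digest::MD5->new }, $class }

sub hash {
    my ($self, $data) = @_;
    # Called on the class: build a temporary object first.
    $self = $self->new unless ref $self;
    my $ctx = $self->{ctx};
    $ctx->reset;             # complete cycle: reset, add, digest
    $ctx->add($data);
    return $ctx->digest;
}

sub hexhash { return unpack 'H*', shift->hash(@_) }

package main;

my $obj = My::Hasher->new;
print My::Hasher->hexhash('foobar'), "\n";
print $obj->hexhash('foobar'), "\n";    # same value, reusing $obj's context
```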

	   use File::RsyncP::Digest;

	   my $rsDigest = new File::RsyncP::Digest;
	   $rsDigest->add('foo', 'bar');
	   my $digest = $rsDigest->digest();

	   print("Rsync MD4 Digest is " . unpack("H*", $digest) . "\n");

       The above example would print out the message

	   Rsync MD4 Digest is 6df23dc03f9b54cc38a0fc1483df6e21

       To compute the rsync phase 1 block checksums (4 + 2 = 6 bytes per
       block) for a 2000 byte file containing 700 a's, 700 b's and 600 c's,
       with a checksum seed of 0x12345678:

	   use File::RsyncP::Digest;

	   my $rsDigest = new File::RsyncP::Digest;
	   my $data = ("a" x 700) . ("b" x 700) . ("c" x 600);
	   my $digest = $rsDigest->blockDigest($data, 700, 2, 0x12345678);

	   print("Rsync block checksums are " . unpack("H*", $digest) . "\n");

       This will print:

	   Rsync block checksums are 3c09a624641bf80b0ce3abd208e8645d5b49

       The same result can be achieved in two steps by saving the state and
       then finishing the calculation:

	   my $state = $rsDigest->blockDigest($data, 700, -1, 0);

	   my $digest = $rsDigest->blockDigestUpdate($state, 700,
					   length($data) % 700, 2, 0x12345678);

       or by computing full-length MD4 digests and extracting the 2-byte
       substrings:

	   my $digest16 = $rsDigest->blockDigest($data, 700, 16, 0x12345678);
	   my $digest   = $rsDigest->blockDigestExtract($digest16, 2);

       This program is free software: you can redistribute it and/or modify it
       under the terms of the GNU General Public License as published by the
       Free Software Foundation, either version 3 of the License, or (at your
       option) any later version.

       This program is distributed in the hope that it will be useful, but
       WITHOUT ANY WARRANTY; without even the implied warranty of
       MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
       General Public License for more details.

       You should have received a copy of the GNU General Public License along
       with this program. If not, see <>.

       The MD4 algorithm is defined in RFC1320. The basic C code implementing
       the algorithm is derived from that in the RFC and is covered by the
       following copyright:

	  MD4 is Copyright (C) 1990-2, RSA Data Security, Inc. All rights
	  reserved.

	  License to copy and use this software is granted provided that it
	  is identified as the "RSA Data Security, Inc. MD4 Message-Digest
	  Algorithm" in all material mentioning or referencing this software
	  or this function.

	  License is also granted to make and use derivative works provided
	  that such works are identified as "derived from the RSA Data
	  Security, Inc. MD4 Message-Digest Algorithm" in all material
	  mentioning or referencing the derived work.

	  RSA Data Security, Inc. makes no representations concerning either
	  the merchantability of this software or the suitability of this
	  software for any particular purpose. It is provided "as is"
	  without express or implied warranty of any kind.

	  These notices must be retained in any copies of any part of this
	  documentation and/or software.

       This copyright does not prohibit distribution of any version of Perl
       containing this extension under the terms of the GNU or Artistic
       licenses.

       File::RsyncP::Digest was written by Craig Barratt
       <> based on Digest::MD4, and the Adler32
       implementation was based on rsync 2.5.5.

       Digest::MD4 was adapted by Mike McCauley (""), based
       entirely on MD5-1.7, written by Neil Winton.

       Rsync was written by Andrew Tridgell <> and Paul
       Mackerras. It is available under a GPL license. See

       See <> for File::RsyncP's SourceForge
       home page.

       See File::RsyncP, File::RsyncP::FileIO and File::RsyncP::FileList.

perl v5.24.1			  2015-01-18			     Digest(3)

