
FreeBSD Manual Pages



       lat_mem_rd - memory read	latency	benchmark

       lat_mem_rd  [  -P _parallelism_ ] [ -W _warmups_	] [ -N _repetitions_ ]
       size_in_megabytes stride	[ stride stride...  ]

       lat_mem_rd measures memory read latency for varying  memory  sizes  and
       strides.	  The  results	are  reported in nanoseconds per load and have
       been verified accurate to within	a few nanoseconds on an	SGI Indy.

       The entire memory hierarchy is measured,	including  onboard  cache  la-
       tency  and  size, external cache	latency	and size, main memory latency,
       and TLB miss latency.

       Only data accesses are measured;	the instruction	cache is not measured.

       The benchmark runs as two nested	loops.	The outer loop is  the	stride
       size.   The  inner  loop	 is  the array size.  For each array size, the
       benchmark creates a ring	of pointers that point	backward  one  stride.
       Traversing the array is done by

	    p =	(char **)*p;

       in a for loop (the overhead of the for loop is not significant; the
       loop is unrolled to 100 loads per iteration).

       The size	of the array  varies  from  512	 bytes	to  (typically)	 eight
       megabytes.  For the small sizes,	the cache will have an effect, and the
       loads will be much faster.  This	becomes	much more  apparent  when  the
       data is plotted.

       Since this benchmark uses fixed-stride offsets in the pointer chain, it
       may be vulnerable to smart, stride-sensitive cache prefetching
       policies.  Older machines were typically able to prefetch for
       sequential access patterns, and some were able to prefetch for strided
       forward access patterns, but only a few could prefetch for backward
       strided patterns.  These capabilities are becoming more widespread in
       newer processors.

       Output  format  is  intended as input to	xgraph or some similar program
       (we use a perl script that produces pic input).	There is a set of data
       produced	 for  each  stride.  The data set title	is the stride size and
       the data	points are the array size in megabytes (floating point	value)
       and the load latency over all points in that array.

       The output is best examined as a graph, which typically shows four
       plateaus.  The graph should be plotted with the log base 2 of the array
       size on the X axis and the latency on the Y axis.  Each stride is then
       plotted as a curve.  The plateaus that appear correspond to the onboard
       cache (if present), external cache (if present), main memory latency,
       and TLB miss latency.

       As a rough guide, you may be able to extract the	latencies of the vari-
       ous  parts  as follows, but you should really look at the graphs, since
       these rules of thumb do not always work (some systems do	not  have  on-
       board cache, for	example).

       onboard cache   Try stride of 128 and array size	of .00098.

       external	cache  Try stride of 128 and array size	of .125.

       main memory     Try stride of 128 and array size	of 8.

       TLB miss	       Try the largest stride and the largest array.

       This  program  is dependent on the correct operation of mhz(8).	If you
       are getting numbers that	seem off, check	that mhz(8) is	giving	you  a
       clock rate that you believe.

       Funding	for the	development of this tool was provided by Sun Microsys-
       tems Computer Corporation.

       lmbench(8), tlb(8), cache(8), line(8).

       Carl Staelin and	Larry McVoy

       Comments, suggestions, and bug reports are always welcome.

(c)1994	Larry McVoy		    $Date$			 LAT_MEM_RD(8)

