HPL_pdpanrlT(3)		     HPL Library Functions	       HPL_pdpanrlT(3)

       HPL_pdpanrlT - Right-looking panel factorization.

       #include	"hpl.h"

       void HPL_pdpanrlT( HPL_T_panel * PANEL, const int M, const int N,
       const int ICOFF, double * WORK );

       HPL_pdpanrlT factorizes	a panel	of columns  that is a sub-array	 of  a
       larger  one-dimensional	panel A	using the Right-looking	variant	of the
       usual one-dimensional algorithm.	 The lower triangular  N0-by-N0	 upper
       block of	the panel is stored in transpose form.

       Bi-directional exchange is used to perform the swap::broadcast
       operations at once for one column in the panel.  This results in fewer,
       slightly larger messages than usual.  On P processes, assuming
       bi-directional links and N equal to N0, the running time of this
       function can be approximated by:

	  N0 * log_2( P	) * ( lat + ( 2*N0 + 4 ) / bdwth ) +
	  N0^2 * ( M - N0/3 ) *	gam2-3

       where  M	 is the	local number of	rows of	 the panel, lat	and bdwth  are
       the latency and bandwidth of the	network	for  double   precision	  real
       words,	and   gam2-3  is an estimate of	the  Level 2 and Level 3  BLAS
       rate of execution.  The recursive algorithm indeed makes it possible to
       almost achieve Level 3 BLAS performance in the panel factorization.  On
       a large number of modern machines, however, this operation is latency
       bound, meaning that its cost can be estimated by the latency portion
       alone, N0 * log_2(P) * lat.  Mono-directional links will double this
       communication cost.

       Note that one iteration of the main loop is unrolled.  The local
       computation of the absolute value max of the next column is performed
       just after its update by the current column.  This allows the current
       column to be brought through cache only once at each step.  The current
       implementation does not perform any blocking for this sequence of BLAS
       operations; however, the design allows for plugging in an optimal
       (machine-specific) specialized BLAS-like kernel.  This idea was
       suggested to us by Fred Gustavson, IBM T.J. Watson Research Center.
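
       The idea of searching for the pivot candidate right after the column
       update can be illustrated with a fused loop.  The sketch below is not
       the HPL implementation, which uses BLAS calls; update_and_locmax is a
       hypothetical name, and the code assumes the next column is updated as
       next[i] -= mult * cur[i], where mult stands for the entry of the
       current pivot row that lies in the next column.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/*
 * Hypothetical fused kernel, not HPL code.
 * Updates next[i] -= mult * cur[i] for i in [0, m) and, in the same pass,
 * returns the index of the entry of next with the largest absolute value.
 * Ties keep the first occurrence; the result is 0 when m is 0.
 */
size_t update_and_locmax(size_t m, double mult, const double *cur, double *next)
{
    size_t imax = 0;
    double vmax = -1.0;  /* smaller than any absolute value */

    for (size_t i = 0; i < m; ++i) {
        next[i] -= mult * cur[i];     /* column update */
        double a = fabs(next[i]);     /* candidate pivot magnitude */
        if (a > vmax) {
            vmax = a;
            imax = i;
        }
    }
    return imax;
}
```

       Each element of both columns is read once per step, instead of once
       for the update and once more for a separate maximum search.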

       PANEL   (local input/output)    HPL_T_panel *
	       On entry,  PANEL	 points	to the data structure  containing  the
	       panel information.

       M       (local input)	       const int
	       On entry,  M specifies the local	number of rows of sub(A).

       N       (local input)	       const int
	       On entry,  N specifies the local	number of columns of sub(A).

       ICOFF   (global input)	       const int
	       On  entry,  ICOFF specifies the row and column offset of	sub(A)
	       in A.

       WORK    (local workspace)       double *
	       On entry, WORK is a work array of size at least 2*(4+2*N0).
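
       As a small worked example of the size requirement on WORK, the
       following sketch computes the minimum length 2*(4+2*N0) and allocates
       a buffer of that many doubles.  The helpers panel_work_len and
       panel_work_alloc are hypothetical and not part of HPL, which manages
       its own workspace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: minimum WORK length, in doubles, for a given N0. */
size_t panel_work_len(int n0)
{
    assert(n0 >= 0);
    return 2 * (4 + 2 * (size_t)n0);
}

/* Hypothetical helper: allocate WORK; returns NULL on failure.
 * The caller releases the buffer with free(). */
double *panel_work_alloc(int n0)
{
    return malloc(panel_work_len(n0) * sizeof(double));
}
```

       For N0 = 64 this gives 2*(4+128) = 264 doubles.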

       HPL_dlocmax (3),	HPL_dlocswpN (3),  HPL_dlocswpT	(3),  HPL_pdmxswp (3),
       HPL_pdpancrN (3), HPL_pdpancrT (3), HPL_pdpanllN	(3), HPL_pdpanllT (3),
       HPL_pdpanrlN (3).

HPL 2.1			       October 26, 2012		       HPL_pdpanrlT(3)

