Skip site navigation (1)Skip section navigation (2)

FreeBSD Manual Pages


home | help
PARALLELCPU(1)	      User Contributed Perl Documentation	PARALLELCPU(1)

       PDL::ParallelCPU	- Parallel Processor MultiThreading Support in PDL

       PDL has support (currently experimental)	for splitting up numerical
       processing between multiple parallel processor threads (or pthreads)
       using the set_autopthread_targ and set_autopthread_size functions.
       This can	improve	processing performance (by greater than	2-4X in	most
       cases) by taking	advantage of multi-core	and/or multi-processor

	 use PDL;

	 # Set target of 4 parallel pthreads to	create,	with a lower limit of
	 #  5Meg elements for splitting	processing into	parallel pthreads.

	 $a = zeroes(5000,5000); # Create 25Meg	element	array

	 $b = $a + 5; #	Processing will	be split up into multiple pthreads

	 # Get the actual number of pthreads for the last
	 #  processing operation.
	 $actualPthreads = get_autopthread_actual();

       The use of the term threading can be confusing with PDL,	because	it can
       refer to	PDL threading, as defined in the PDL::Threading	docs, or to
       processor multi-threading.

       To reduce confusion with	the existing PDL threading terminology,	this
       document	uses pthreading	to refer to processor multi-threading, which
       is the use of multiple processor	threads	to split up numerical
       processing into parallel	operations.

Functions that control PDL PThreads
       This is a brief listing and description of the PDL pthreading
       functions, see the PDL::Core docs for detailed information.

	    Set	the target number of processor-threads (pthreads) for multi-
	    threaded processing. Setting auto_pthread_targ to 0	means that no
	    pthreading will occur.

	    See	PDL::Core for details.

	    Set	the minimum size (in Meg-elements or 2**20 elements) of	the
	    largest PDL	involved in a function where auto-pthreading will be
	    performed. For small PDLs, it probably isn't worth starting
	    multiple pthreads, so this function	is used	to define a minimum
	    threshold where auto-pthreading won't be attempted.

	    See	PDL::Core for details.

	    Get	the actual number of pthreads executed for the last pdl
	    processing function.

	    See	PDL::get_autopthread_actual for	details.

Global Control of PDL PThreading using Environment Variables
       PDL PThreading can be globally turned on, without modifying existing
       code by setting environment variables PDL_AUTOPTHREAD_TARG and
       PDL_AUTOPTHREAD_SIZE before running a PDL script.  These	environment
       variables are checked when PDL starts up	and calls to
       set_autopthread_targ and	set_autopthread_size functions made with the
       environment variable's values.

       For example, if the environment var PDL_AUTOPTHREAD_TARG	is set to 3,
       and PDL_AUTOPTHREAD_SIZE	is set to 10, then any pdl script will run as
       if the following	lines were at the top of the file:


How It Works
       The auto-pthreading process works by analyzing threaded array
       dimensions in PDL operations and	splitting up processing	based on the
       thread dimension	sizes and desired number of pthreads (i.e. the pthread
       target or pthread_targ).	The offsets and	increments that	PDL uses to
       step thru the data in memory are	modified for each pthread so each one
       sees a different	set of data when performing processing.


	$a = sequence(20,4,3); # Small 3-D Array, size 20,4,3

	# Setup	auto-pthreading:
	set_autopthread_targ(2); # Target of 2 pthreads
	set_autopthread_size(0); # Zero	so that	the small PDLs in this example will be pthreaded

	# This will be split up	into 2 pthreads
	$c = maximum($a);

       For the above example, the maximum function has a signature of "(a(n);
       [o]c())", which means that the first dimension of $a (size 20) is a
       Core dimension of the maximum function. The other dimensions of $a
       (size 4,3) are threaded dimensions (i.e.	will be	threaded-over in the
       maximum function.

       The auto-pthreading algorithm examines the threaded dims	of size	(4,3)
       and picks the 4 dimension, since	it is evenly divisible by the
       autopthread_targ	of 2. The processing of	the maximum function is	then
       split into two pthreads on the size-4 dimension,	with dim indexes 0,2
       processed by one	pthread
	and dim	indexes	1,3 processed by the other pthread.

   Must	have POSIX Threads Enabled
       Auto-PThreading only works if your PDL installation was compiled	with
       POSIX threads enabled. This is normally the case	if you are running on
       linux, or other unix variants.

   Non-Threadsafe Code
       Not all the libraries that PDL intefaces	to are thread-safe, i.e. they
       aren't written to operate in a multi-threaded environment without
       crashing	or causing side-effects. Some examples in the PDL core is the
       fft function and	the pnmout functions.

       To operate properly with	these types of functions, the PPCode flag
       NoPthread has been introduced to	indicate a function as not being
       pthread-safe. See PDL::PP docs for details.

   Size	of PDL Dimensions and PThread Target
       Due to the way a	PDL is split-up	for operation using multiple pthreads,
       the size	of a dimension must be evenly divisible	by the pthread target.
       For example, if a PDL has threaded dimension sizes of (4,3,3) and the
       auto_pthread_targ has been set to 2, then the first threaded dimension
       (size 4)	will be	picked to be split up into two pthreads	of size	2 and
       2. However, if the threaded dimension sizes are (3,3,3) and the
       auto_pthread_targ is still 2, then pthreading won't occur, because no
       threaded	dimensions are divisible by 2.

       The algorithm that picks	the actual number of pthreads has some smarts
       (but could probably be improved)	to adjust down from the
       auto_pthread_targ to get	a number of pthreads that can evenly divide
       one of the threaded dimensions. For example, if a PDL has threaded
       dimension sizes of (9,2,2) and the auto_pthread_targ is 4, the
       algorithm will see that no dimension is divisible by 4, then adjust
       down the	target to 3, resulting in splitting up the first threaded
       dimension (size 9) into 3 pthreads.

   Speed improvement might be less than	you expect.
       If you have a 8 core machine and	call auto_pthread_targ with 8 to
       generate	8 parallel pthreads, you probably won't	get a 8X improvement
       in speed, due to	memory bandwidth issues. Even though you have 8
       separate	CPUs crunching away on data, you will have (for	most common
       machine architectures) common RAM that now becomes your bottleneck. For
       simple calculations (e.g	simple additions) you can run into a
       performance limit at about
	4 pthreads. For	more complex calculations the limit will be higher.

       Copyright 2011 John Cerney. You can distribute and/or modify this
       document	under the same terms as	the current Perl license.


perl v5.32.0			  2018-05-05			PARALLELCPU(1)

NAME | DESCRIPTION | SYNOPSIS | Terminology | Functions that control PDL PThreads | Global Control of PDL PThreading using Environment Variables | How It Works | Limitations | COPYRIGHT

Want to link to this manual page? Use this URL:

home | help