PDL::ParallelCPU • man page

PDL::ParallelCPU (1)

Leading comments

Automatically generated by Pod::Man 4.09 (Pod::Simple 3.35)


(The comments found at the beginning of the groff file "man1/PDL::ParallelCPU.1p".)

NAME

PDL::ParallelCPU - Parallel Processor MultiThreading Support in PDL (Experimental)

DESCRIPTION

PDL has support (currently experimental) for splitting up numerical processing between multiple parallel processor threads (or pthreads) using the set_autopthread_targ and set_autopthread_size functions. This can improve processing performance (by 2-4X or more in most cases) by taking advantage of multi-core and/or multi-processor machines.

SYNOPSIS

  use PDL;
  
  # Set target of 4 parallel pthreads to create, with a lower limit of
  #  5Meg elements for splitting processing into parallel pthreads.
  set_autopthread_targ(4);
  set_autopthread_size(5);
  
  $a = zeroes(5000,5000); # Create 25Meg element array
  
  $b = $a + 5; # Processing will be split up into multiple pthreads
  
  # Get the actual number of pthreads for the last
  #  processing operation.
  $actualPthreads = get_autopthread_actual();

Terminology

The use of the term threading can be confusing with PDL, because it can refer to PDL threading, as defined in the PDL::Threading docs, or to processor multi-threading.

To reduce confusion with the existing PDL threading terminology, this document uses pthreading to refer to processor multi-threading, which is the use of multiple processor threads to split up numerical processing into parallel operations.

Functions that control PDL PThreads

This is a brief listing and description of the PDL pthreading functions; see the PDL::Core docs for detailed information.

set_autopthread_targ: Set the target number of processor threads (pthreads) for multi-threaded processing. Setting the target to 0 means that no pthreading will occur.
See PDL::Core for details.
set_autopthread_size: Set the minimum size (in Meg-elements, i.e. units of 2**20 elements) that the largest PDL involved in a function must have for auto-pthreading to be performed. For small PDLs, it probably isn't worth starting multiple pthreads, so this function defines a threshold below which auto-pthreading won't be attempted.
See PDL::Core for details.
get_autopthread_actual: Get the actual number of pthreads executed for the last PDL processing function.
See PDL::get_autopthread_actual for details.

Global Control of PDL PThreading using Environment Variables

PDL PThreading can be turned on globally, without modifying existing code, by setting the environment variables PDL_AUTOPTHREAD_TARG and PDL_AUTOPTHREAD_SIZE before running a PDL script. These environment variables are checked when PDL starts up, and calls to the set_autopthread_targ and set_autopthread_size functions are made with the environment variables' values.

For example, if the environment variable PDL_AUTOPTHREAD_TARG is set to 3, and PDL_AUTOPTHREAD_SIZE is set to 10, then any PDL script will run as if the following lines were at the top of the file:

 set_autopthread_targ(3);
 set_autopthread_size(10);

How It Works

The auto-pthreading process works by analyzing threaded array dimensions in PDL operations and splitting up processing based on the thread dimension sizes and the desired number of pthreads (i.e. the pthread target or pthread_targ). The offsets and increments that PDL uses to step through the data in memory are modified for each pthread, so each one sees a different set of data when performing processing.

Example

 $a = sequence(20,4,3); # Small 3-D Array, size 20,4,3
 
 # Setup auto-pthreading:
 set_autopthread_targ(2); # Target of 2 pthreads
 set_autopthread_size(0); # Zero so that the small PDLs in this example will be pthreaded

 # This will be split up into 2 pthreads
 $c = maximum($a);

For the above example, the maximum function has a signature of "(a(n); [o]c())", which means that the first dimension of $a (size 20) is a core dimension of the maximum function. The other dimensions of $a (sizes 4 and 3) are threaded dimensions (i.e. they will be threaded over in the maximum function).

The auto-pthreading algorithm examines the threaded dims of size (4,3) and picks the size-4 dimension, since it is evenly divisible by the autopthread_targ of 2. The processing of the maximum function is then split into two pthreads on the size-4 dimension, with dim indexes 0,2 processed by one pthread and dim indexes 1,3 processed by the other pthread.
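
The index assignment described above can be pictured with a few lines of plain Perl. This is only an illustrative sketch, not PDL's internal code: the helper name split_indexes is hypothetical, and it assumes the interleaved pattern from the example, where each pthread starts at its own offset and steps by the number of pthreads.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of PDL): list the indexes of a threaded
# dimension that each pthread would visit, assuming pthread k starts at
# offset k and steps by the number of pthreads.
sub split_indexes {
    my ($dim_size, $n_pthreads) = @_;
    my @per_pthread = map { [] } 1 .. $n_pthreads;
    push @{ $per_pthread[ $_ % $n_pthreads ] }, $_ for 0 .. $dim_size - 1;
    return @per_pthread;
}

# The size-4 threaded dimension from the example, split over 2 pthreads.
my @work = split_indexes(4, 2);
print "pthread $_: dim indexes @{ $work[$_] }\n" for 0 .. $#work;
```

For the size-4 dimension and a target of 2, this prints indexes 0 2 for the first pthread and 1 3 for the second, matching the split described above.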

Limitations

Must have POSIX Threads Enabled

Auto-PThreading only works if your PDL installation was compiled with POSIX threads enabled. This is normally the case if you are running on Linux or other Unix variants.

Non-Threadsafe Code

Not all the libraries that PDL interfaces to are thread-safe, i.e. they aren't written to operate in a multi-threaded environment without crashing or causing side-effects. Some examples in the PDL core are the fft function and the pnmout functions.

To operate properly with these types of functions, the PPCode flag NoPthread has been introduced to mark a function as not being pthread-safe. See the PDL::PP docs for details.

Size of PDL Dimensions and PThread Target

Due to the way a PDL is split up for operation using multiple pthreads, the size of a dimension must be evenly divisible by the pthread target. For example, if a PDL has threaded dimension sizes of (4,3,3) and the pthread target has been set to 2, then the first threaded dimension (size 4) will be picked to be split up into two pthreads of size 2 and 2. However, if the threaded dimension sizes are (3,3,3) and the pthread target is still 2, then pthreading won't occur, because no threaded dimensions are divisible by 2.

The algorithm that picks the actual number of pthreads has some smarts (but could probably be improved) to adjust down from the pthread target to get a number of pthreads that can evenly divide one of the threaded dimensions. For example, if a PDL has threaded dimension sizes of (9,2,2) and the pthread target is 4, the algorithm will see that no dimension is divisible by 4, then adjust the target down to 3, resulting in splitting up the first threaded dimension (size 9) into 3 pthreads.
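
The adjust-down behaviour can be sketched in plain Perl as follows. This is a simplified model written for illustration, not the code PDL actually runs: the function name pick_pthreads is hypothetical, and the rule it uses (try the target, then each smaller count, and take the first threaded dimension that divides evenly) is an assumption chosen to reproduce the three examples in this section.

```perl
use strict;
use warnings;

# Hypothetical model (not PDL's real implementation): starting from the
# pthread target, step down until some threaded dimension is evenly
# divisible. Returns (pthread count, index of the chosen dimension), or an
# empty list when no split into 2 or more pthreads is possible.
sub pick_pthreads {
    my ($targ, @dims) = @_;
    for (my $n = $targ; $n > 1; $n--) {
        for my $i (0 .. $#dims) {
            return ($n, $i) if $dims[$i] % $n == 0;
        }
    }
    return;
}

# The three cases discussed in this section: [target, threaded dim sizes...]
for my $case ([2, 4, 3, 3], [2, 3, 3, 3], [4, 9, 2, 2]) {
    my ($targ, @dims) = @$case;
    my ($n, $i) = pick_pthreads($targ, @dims);
    my $label = sprintf "target %d, dims (%s)", $targ, join(',', @dims);
    if (defined $n) {
        print "$label: $n pthreads on dim $i (size $dims[$i])\n";
    }
    else {
        print "$label: no pthreading\n";
    }
}
```

In a real script, get_autopthread_actual reports the number of pthreads that was actually used, so there is no need to predict it by hand.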

Speed improvement might be less than you expect.

If you have an 8-core machine and call set_autopthread_targ with 8 to generate 8 parallel pthreads, you probably won't get an 8X improvement in speed, due to memory bandwidth issues. Even though you have 8 separate CPUs crunching away on data, you will have (for most common machine architectures) common RAM that now becomes your bottleneck. For simple calculations (e.g. simple additions) you can run into a performance limit at about 4 pthreads. For more complex calculations the limit will be higher.
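
One way to find where the limit lies on a particular machine is to time the same operation with several pthread targets. The script below is a rough sketch that assumes a PDL installation with POSIX threads enabled; the array size, repeat count, and list of targets are arbitrary choices for illustration, and the timings will vary from machine to machine.

```perl
use strict;
use warnings;
use PDL;
use Time::HiRes qw(time);

# Pthread any PDL of 1 Meg-element (2**20 elements) or more.
set_autopthread_size(1);

my $x    = zeroes(2000, 2000);   # 4,000,000 elements, above the threshold
my $reps = 20;                   # repeat so the run time is measurable

my %seconds;                     # pthread target => elapsed seconds
for my $targ (0, 2, 4, 8) {      # a target of 0 means no pthreading
    set_autopthread_targ($targ);

    my $start = time;
    for (1 .. $reps) {
        my $y = $x + 5;          # simple addition, as in the SYNOPSIS
    }
    $seconds{$targ} = time - $start;

    printf "target %d: actual pthreads %d, %.3f s\n",
        $targ, get_autopthread_actual(), $seconds{$targ};
}
```

If the elapsed time stops improving between two targets, the larger target is probably not worth using for that kind of operation.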

COPYRIGHT

See: dev.perl.org/licenses
