PDL::Indexing (1)
Leading comments
Automatically generated by Pod::Man 4.09 (Pod::Simple 3.35) Standard preamble: ========================================================================
NAME
PDL::Indexing  Introduction to indexing and slicing piddles.OVERVIEW
This man page should serve as a first tutorial on the indexing and threading features ofLike all vectorized languages,
A lot of the flexibility and power of
The values of a piddle are stored compactly as typed values in a single block of memory, not (as in a normal Perl listoflists) as individual Perl scalars.
In the sections that follow many ``methods'' are called out  these are Perl operators that apply to piddles. From the perldl (or pdl2) shell, you can find out more about each method by typing ``?'' followed by the method name.
Dimension lists
A piddle (You can access the dimensions of a piddle as a Perl list and otherwise determine the size of a piddle with several methods. The important ones are:
 nelem  the total number of elements in a piddle
 ndims  returns the number of dimensions in a piddle
 dims  returns the dimension list of a piddle as a Perl list
 dim  returns the size of a particular dimension of a piddle
Indexing and Dataflow
There are two important methods that let you control dataflow connections between a child and parent piddle:
 copy  forces an explicit copy of a piddle
 sever  breaks the dataflow connection between a piddle and its parents (if any)
Threading and Dimension Order
Most mv  moves a particular dimension somewhere else in the dimension list
 xchg  exchanges two dimensions in the dimension list, leaving the rest alone
 reorder  allows wholesale mixing of the dimensions
 clump  clumps together two or more small dimensions into one larger one
 squeeze  eliminates any dimensions of size 1
Physical and Dummy Dimensions
 *
 document Perl level threading
 *
 threadids
 *
 update and correct description of slice
 *
 new functions in slice.pd (affine, lag, splitdim)
 *
 reworking of paragraph on explicit threading
Indexing and threading with PDL
A lot of the flexibility and power ofPiddles are Perl objects that represent multidimensional arrays and operations on those. In contrast to simple Perl @x style lists the array data is compactly stored in a single block of memory thus taking up a lot less memory and enabling use of fast C code to implement operations (e.g. addition, etc) on piddles.
piddles can have children
Central to many of the indexing capabilities ofAnother way to interpret the piddles created by our indexing commands is to view them as a kind of intelligent pointer that points back to some portion or all of its parent's data. Therefore, it is not surprising that the parent's data (or a portion of it) changes when manipulated through this ``pointer''. After these introductory remarks that hopefully prepared you for what is coming (rather than confuse you too much) we are going to dive right in and start with a description of the indexing commands and some typical examples how they might be used in
There are two different implementations of this ``smart pointer'' relationship: the first one, which is a little slower but works for any transformation is simply to do the transformation forwards and backwards as necessary. The other is to consider the child piddle a ``virtual'' piddle, which only stores a pointer to the parent and access information so that routines which use the child piddle actually directly access the data in the parent. If the virtual piddle is given to a routine which cannot use it,
Currently (1.94_01) all transformations which are ``affine'', i.e. the indices of the data item in the parent piddle are determined by a linear transformation (+ constant) from the indices of the child piddle result in virtual piddles. All other indexing routines (e.g. ">index(...)") result in physical piddles. All routines compiled by
Note that whether something is affine or not does not affect the semantics of what you do in any way: both
$a>index(...) .= 5; $a>slice(...) .= 5;
change the data in $a. The affinity does, however, have a significant impact on memory usage and performance.
Slicing piddles
Probably the most important application of the concept of parent/child piddles is the representation of rectangular slices of a physical piddle by a virtual piddle. Having talked long enough about concepts let's get more specific. Suppose we are working with a 2D piddle representing a 5x5 image (its unusually small so that we can print it without filling several screens full of digits ;).
pdl> $im = sequence(5,5) pdl> p $im [ [ 0 1 2 3 4] [ 5 6 7 8 9] [10 11 12 13 14] [15 16 17 18 19] [20 21 22 23 24] ] pdl> help vars PDL variables in package main:: Name Type Dimension Flow State Mem  $im Double D [5,5] P 0.20Kb
[ here it might be appropriate to quickly talk about the "help vars" command that provides information about piddles in the interactive "perldl" or "pdl2" shell that comes with
Now suppose we want to create a 1D piddle that just references one line of the image, say line 2; or a piddle that represents all even lines of the image (imagine we have to deal with even and odd frames of an interlaced image due to some peculiar behaviour of our frame grabber). As another frequent application of slices we might want to create a piddle that represents a rectangular region of the image with top and bottom reversed. All these effects (and many more) can be easily achieved with the powerful slice function:
pdl> $line = $im>slice(':,(2)') pdl> $even = $im>slice(':,1:1:2') pdl> $area = $im>slice('3:4,3:1') pdl> help vars # or just PDL>vars PDL variables in package main:: Name Type Dimension Flow State Mem  $even Double D [5,2] C 0.00Kb $im Double D [5,5] P 0.20Kb $line Double D [5] C 0.00Kb $area Double D [2,3] C 0.00Kb
All three ``child'' piddles are children of $im or in the other (largely equivalent) interpretation pointers to data of $im. Operations on those virtual piddles access only those portions of the data as specified by the argument to slice. So we can just print line 2:
pdl> p $line [10 11 12 13 14]
Also note the difference in the ``Flow State'' of $area above and below:
pdl> p $area pdl> help $area This variable is Double D [2,3] VC 0.00Kb
The following demonstrates that $im and $line really behave as you would expect from a pointerlike object (or in the dataflow picture: the changes in $line's data are propagated back to $im):
pdl> $im++ pdl> p $line [11 12 13 14 15] pdl> $line += 2 pdl> p $im [ [ 1 2 3 4 5] [ 6 7 8 9 10] [13 14 15 16 17] [16 17 18 19 20] [21 22 23 24 25] ]
Note how assignment operations on the child virtual piddles change the parent physical piddle and vice versa (however, the basic ``='' assignment doesn't, use ``.='' to obtain that effect. See below for the reasons). The virtual child piddles are something like ``live links'' to the ``original'' parent piddle. As previously said, they can be thought of to work similar to a Cpointer. But in contrast to a Cpointer they carry a lot more information. Firstly, they specify the structure of the data they represent (the dimensionality of the new piddle) and secondly, specify how to create this structure from its parents data (the way this works is buried in the internals of
The previous examples have demonstrated typical usage of the slice function. Since the slicing functionality is so important here is an explanation of the syntax for the string argument to slice:
$vpdl = $a>slice('ind0,ind1...')
where "ind0" specifies what to do with index No 0 of the piddle $a, etc. Each element of the comma separated list can have one of the following forms:
 ':'
 Use the whole dimension
 'n'

Use only index "n". The dimension of this index in the
resulting virtual piddle is 1. An example involving those first two index
formats:
pdl> $column = $im>slice('2,:') pdl> $row = $im>slice(':,0') pdl> p $column [ [ 3] [ 8] [15] [18] [23] ] pdl> p $row [ [1 2 3 4 5] ] pdl> help $column This variable is Double D [1,5] VC 0.00Kb pdl> help $row This variable is Double D [5,1] VC 0.00Kb
 '(n)'

Use only index "n". This dimension is removed from the
resulting piddle (relying on the fact that a dimension of size 1 can always be
removed). The distinction between this case and the previous one
becomes important in assignments where left and right hand side have to
have appropriate dimensions.
pdl> $line = $im>slice(':,(0)') pdl> help $line This variable is Double D [5] C 0.00Kb pdl> p $line [1 2 3 4 5]
Spot the difference to the previous example?
 'n1:n2' or 'n1:n2:n3'

Take the range of indices from "n1" to "n2" or (second form)
take the range of indices from "n1" to "n2" with step
"n3". An example for the use of this format is the previous
definition of the subimage composed of even lines.
pdl> $even = $im>slice(':,1:1:2')
This example also demonstrates that negative indices work like they do for normal Perl style arrays by counting backwards from the end of the dimension. If "n2" is smaller than "n1" (in the example 1 is equivalent to index 4) the elements in the virtual piddle are effectively reverted with respect to its parent.
 '*[n]'

Add a dummy dimension. The size of this dimension will be 1 by default
or equal to "n" if the optional numerical argument is given.
Now, this is really something a bit strange on first sight. What is a dummy dimension? A dummy dimension inserts a dimension where there wasn't one before. How is that done ? Well, in the case of the new dimension having size 1 it can be easily explained by the way in which you can identify a vector (with "m" elements) with an "(1,m)" or "(m,1)" matrix. The same holds obviously for higher dimensional objects. More interesting is the case of a dummy dimensions of size greater than one (e.g. "slice('*5,:')"). This works in the same way as a call to the dummy function creates a new dummy dimension. So read on and check its explanation below.
 '([n1:n2[:n3]]=i)'

[Not yet implemented ??????]
With an argument like this you make generalised diagonals. The
diagonal will be dimension no. "i" of the new output piddle and (if
optional part in brackets specified) will extend along the range of
indices specified of the respective parent piddle's dimension. In general
an argument like this only makes sense if there are other arguments
like this in the same call to slice. The part in brackets is optional
for this type of argument. All arguments of this type that specify the
same target dimension "i" have to relate to the same number of
indices in their parent dimension. The best way to explain it is probably to
give an example, here we make a piddle that refers to the elements along
the space diagonal of its parent piddle (a cube):
$cube = zeroes(5,5,5); $sdiag = $cube>slice('(=0),(=0),(=0)');
The above command creates a virtual piddle that represents the diagonal along the parents' dimension no. 0, 1 and 2 and makes its dimension 0 (the only dimension) of it. You use the extended syntax if the dimension sizes of the parent dimensions you want to build the diagonal from have different sizes or you want to reverse the sequence of elements in the diagonal, e.g.
$rect = zeroes(12,3,5,6,2); $vpdl = $rect>slice('2:7,(0:1=1),(4),(5:4=1),(=1)');
So the elements of $vpdl will then be related to those of its parent in way we can express as:
vpdl(i,j) = rect(i+2,j,4,5j,j) 0<=i<5, 0<=j<2
[ work in the new index function: "$b = $a>index($c);" ???? ]
There are different kinds of assignments in PDL
The previous examples have already shown that virtual piddles can be used
to operate on or access portions of data of a parent piddle. They can
also be used as lvalues in assignments (as the use of "++" in some of
the examples above has already demonstrated). For explicit assignments
to the data represented by a virtual piddle you have to use the
overloaded ".=" operator (which in this context we call propagated
assignment). Why can't you use the normal assignment operator "="?
Well, you definitely still can use the '=' operator but it wouldn't do what you want. This is due to the fact that the '=' operator cannot be overloaded in the same way as other assignment operators. If we tried to use '=' to try to assign data to a portion of a physical piddle through a virtual piddle we wouldn't achieve the desired effect (instead the variable representing the virtual piddle (a reference to a blessed thingy) would after the assignment just contain the reference to another blessed thingy which would behave to future assignments as a ``physical'' copy of the original rvalue [this is actually not yet clear and subject of discussions in the
E.g.
pdl> $line = $im>slice(':,(2)') pdl> $line = zeroes(5); pdl> $line++; pdl> p $im [ [ 1 2 3 4 5] [ 6 7 8 9 10] [13 14 15 16 17] [16 17 18 19 20] [21 22 23 24 25] ] pdl> p $line [1 1 1 1 1]
But using ".="
pdl> $line = $im>slice(':,(2)') pdl> $line .= zeroes(5) pdl> $line++ pdl> p $im [ [ 1 2 3 4 5] [ 6 7 8 9 10] [ 1 1 1 1 1] [16 17 18 19 20] [21 22 23 24 25] ] pdl> print $line [1 1 1 1 1]
Also, you can substitute
pdl> $line .= 0;
for the assignment above (the zero is converted to a scalar piddle, with no dimensions so it can be assigned to any piddle).
A nice feature in recent perl versions is lvalue subroutines (i.e., versions 5.6.x and higher including all perls currently supported by
pdl> $im>slice(':,(2)') .= zeroes(5)>xvals>float
Related to the lvalue sub assignment feature is a little trap for the unwary: recent perls introduced a ``feature'' which breaks
pdl> ($pdl = $im>slice(':,(2)')) .= zeroes(5)>xvals>float
which works both with and without the debugger but is arguably clumsy and awkward to read.
Note that there can be a problem with assignments like this when lvalue and rvalue piddles refer to overlapping portions of data in the parent piddle:
# revert the elements of the first line of $a ($tmp = $a>slice(':,(1)')) .= $a>slice('1:0,(1)');
Currently, the parent data on the right side of the assignments is not copied before the (internal) assignment loop proceeds. Therefore, the outcome of this assignment will depend on the sequence in which elements are assigned and almost certainly not do what you wanted. So the semantics are currently undefined for now and liable to change anytime. To obtain the desired behaviour, use
($tmp = $a>slice(':,(1)')) .= $a>slice('1:0,(1)')>copy;
which makes a physical copy of the slice or
($tmp = $a>slice(':,(1)')) .= $a>slice('1:0,(1)')>sever;
which returns the same slice but severs the connection of the slice to its parent.
Other functions that manipulate dimensions
Having talked extensively about the slice function it should be noted that this is not the only dummy

inserts a dummy dimension of the size you specify (default 1) at the
chosen location. You can't wait to hear how that is achieved? Well,
all elements with index "(X,x,Y)" ("0<=x<size_of_dummy_dim") just map to
the element with index "(X,Y)" of the parent piddle (where "X" and "Y" refer to
the group of indices before and after the location where the dummy
dimension was inserted.)
This example calculates the x coordinate of the centroid of an image (later we will learn that we didn't actually need the dummy dimension thanks to the magic of implicit threading; but using dummy dimensions the code would also work in a threadless world; though once you have worked with
PDLthreads you wouldn't want to live without them again).# centroid ($xd,$yd) = $im>dims; $xc = sum($im*xvals(zeroes($xd))>dummy(1,$yd))/sum($im);
Let's explain how that works in a little more detail. First, the product:
$xvs = xvals(zeroes($xd)); print $xvs>dummy(1,$yd); # repeat the line $yd times $prod = $im*xvs>dummy(1,$yd); # form the pixelwise product with # the repeated line of xvalues
The rest is then summing the results of the pixelwise product together and normalizing with the sum of all pixel values in the original image thereby calculating the xcoordinate of the ``center of mass'' of the image (interpreting pixel values as local mass) which is known as the centroid of an image.
Next is a (from the point of view of memory consumption) very cheap conversion from greyscale to
RGB,i.e. every pixel holds now a triple of values instead of a scalar. The three values in the triple are, fortunately, all the same for a grey image, so that our trick works well in that it maps all the three members of the triple to the same source element:# a cheap greyscale to RGB conversion $rgb = $grey>dummy(0,3)
Unfortunately this trick cannot be used to convert your old B/W photos to color ones in the way you'd like. :(
Note that the memory usage of piddles with dummy dimensions is especially sensitive to the internal representation. If the piddle can be represented as a virtual affine (``vaffine'') piddle, only the control structures are stored. But if $b in
$a = zeroes(10000); $b = $a>dummy(1,10000);
is made physical by some routine, you will find that the memory usage of your program has suddenly grown by 100Mb.
 diagonal

replaces two dimensions (which have to be of equal size) by one
dimension that references all the elements along the ``diagonal'' along
those two dimensions. Here, we have two examples which should appear
familiar to anyone who has ever done some linear algebra. Firstly,
make a unity matrix:
# unity matrix $e = zeroes(float, 3, 3); # make everything zero ($tmp = $e>diagonal(0,1)) .= 1; # set the elements along the diagonal to 1 print $e;
Or the other diagonal:
($tmp = $e>slice(':1:0')>diagonal(0,1)) .= 2; print $e;
(Did you notice how we used the slice function to revert the sequence of lines before setting the diagonal of the new child, thereby setting the cross diagonal of the parent ?) Or a mapping from the space of diagonal matrices to the field over which the matrices are defined, the trace of a matrix:
# trace of a matrix $trace = sum($mat>diagonal(0,1)); # sum all the diagonal elements
 xchg and mv

xchg exchanges or ``transposes'' the two specified dimensions.
A straightforward example:
# transpose a matrix (without explicitly reshuffling data and # making a copy) $prod = $a x $a>xchg(0,1);
$prod should now be pretty close to the unity matrix if $a is an orthogonal matrix. Often "xchg" will be used in the context of threading but more about that later.
mv works in a similar fashion. It moves a dimension (specified by its number in the parent) to a new position in the new child piddle:
$b = $a>mv(4,0); # make the 5th dimension of $a the first in the # new child $b
The difference between "xchg" and "mv" is that "xchg" only changes the position of two dimensions with each other, whereas "mv" inserts the first dimension to the place of second, moving the other dimensions around accordingly.
 clump

collapses several dimensions into one. Its only argument specifies how
many dimensions of the source piddle should be collapsed (starting from
the first). An (admittedly unrealistic) example is a 3D piddle which
holds data from a stack of image files that you have just read
in. However, the data from each image really represents a 1D time
series and has only been arranged that way because it was digitized
with a frame grabber. So to have it again as an array of time
sequences you say
pdl> $seqs = $stack>clump(2) pdl> help vars PDL variables in package main:: Name Type Dimension Flow State Mem  $seqs Double D [8000,50] C 0.00Kb $stack Double D [100,80,50] P 3.05Mb
Unrealistic as it may seem, our confocal microscope software writes data (sometimes) this way. But more often you use clump to achieve a certain effect when using implicit or explicit threading.
Calls to indexing functions can be chained
As you might have noticed in some of the examples above calls to the indexing functions can be nicely chained since all of these functions return a newly created child object. However, when doing extensive index manipulations in a chain be sure to keep track of what you are doing, e.g.
$a>xchg(0,1)>mv(0,4)
moves the dimension 1 of $a to position 4 since when the second command is executed the original dimension 1 has been moved to position 0 of the new child that calls the "mv" function. I think you get the idea (in spite of my convoluted explanations).
Propagated assignments ('.=') and dummy dimensions
A subtlety related to indexing is the assignment to piddles containing dummy dimensions of size greater than 1. These assignments (using ".=") are forbidden since several elements of the lvalue piddle point to the same element of the parent. As a consequence the value of those parent elements are potentially ambiguous and would depend on the sequence in which the implementation makes the assignments to elements. Therefore, an assignment like this:
$a = pdl [1,2,3]; $b = $a>dummy(1,4); $b .= yvals(zeroes(3,4));
can produce unexpected results and the results are explicitly undefined by
From the point of view of dataflow the introduction of greatersizethanone dummy dimensions is regarded as an irreversible transformation (similar to the terminology in thermodynamics) which precludes backward propagation of assignment to a parent (which you had explicitly requested using the ".=" assignment). A similar problem to watch out for occurs in the context of threading where sometimes dummy dimensions are created implicitly during the thread loop (see below).
Reasons for the parent/child (or pointer) concept
[ this will have to wait a bit ]
XXXXX being memory efficient XXXXX in the context of threading XXXXX very flexible and powerful way of accessing portions of piddle data (in much more general way than sec, etc allow) XXXXX efficient implementation XXXXX difference to section/at, etc.
How to make things physical again
[
** When needed (xsub routine interfacing C lib function) ** How achieved (>physical) ** How to test (isphysical (explain how it works currently)) ** >copy and >sever
Threading
In the previous paragraph on indexing we have already mentioned the term occasionally but now its really time to talk explicitly about ``threading'' with piddles. The term threading has many different meanings in different fields of computing. Within the framework ofImplicit threading  a first example
There are two slightly different variants of threading. We start with what we call ``implicit threading''. Let's pick a practical example that involves looping of a function over many elements of a piddle. Suppose we have an
greyvalue = 77/256*r + 150/256*g + 29/256*b = inner([77,150,29]/256, [r,g,b])
where the last form indicates that we can write this as an inner product of the 3vector comprising the weights for red, green and blue components with the 3vector containing the color components. Traditionally, we might have written a function like the following to process the whole image:
my @dims=$im>dims; # here normally check that first dim has correct size (3), etc $grey=zeroes(@dims[1,2]); # make the piddle for the resulting grey image $w = pdl [77,150,29] / 256; # the vector of weights for ($j=0;$j<dims[2];$j++) { for ($i=0;$i<dims[1];$i++) { # compute the pixel value $tmp = inner($w,$im>slice(':,(i),(j)')); set($grey,$i,$j,$tmp); # and set it in the greyscale image } }
Now we write the same using threading (noting that "inner" is a threading aware function defined in the PDL::Primitive package)
$grey = inner($im,pdl([77,150,29]/256));
We have ended up with a oneliner that automatically creates the piddle $grey with the right number and size of dimensions and performs the loops automatically (these loops are implemented as fast C code in the internals of
How does the example work ?
The first thing to note is that every function that is threading aware (these are without exception functions compiled from concise descriptions byHere is the point when threading finally enters the game. If you call PPfunctions with piddles that have more than the required core dimensions the first dimensions of the piddle arguments are used as the core dimensions and the additional extra dimensions are threaded over. Let us demonstrate this first with our example above
$grey = inner($im,$w); # w is the weight vector from above
In this case $w is 1D and so supplied just the core dimension, $im is 3D, more specifically "(3,x,y)". The first dimension (of size 3) is the required core dimension that matches (as required by inner) the first (and only) dimension of $w. The second dimension is the first thread dimension (of size "x") and the third is here the second thread dimension (of size "y"). The output piddle is automatically created (as requested by setting $grey to ``null'' prior to invocation). The output dimensions are obtained by appending the loop dimensions (here "(x,y)") to the core output dimensions (here 0D) to yield the final dimensions of the autocreated piddle (here "0D+2D=2D" to yield a 2D output of size "(x,y)").
So the above command calls the core functionality that computes the inner product of two 1D vectors "x*y" times with $w and all 1D slices of the form "(':,(i),(j)')" of $im and sets the respective elements of the output piddle "$grey(i,j)" to the result of each computation. We could write that symbolically as
$grey(0,0) = f($w,$im(:,(0),(0))) $grey(1,0) = f($w,$im(:,(1),(0))) . . . $grey(x2,y1) = f($w,$im(:,(x2),(y1))) $grey(x1,y1) = f($w,$im(:,(x1),(y1)))
But this is done automatically by
When even more piddles and extra dimensions are involved things get a bit more complicated. We will first give the general rules how the thread dimensions depend on the dimensions of input piddles enabling you to figure out the dimensionality of an autocreated output piddle (for any given set of input piddles and core dimensions of the PPfunction in question). The general rules will most likely appear a bit confusing on first sight so that we'll set out to illustrate the usage with a set of further examples (which will hopefully also demonstrate that there are indeed many practical situations where threading comes in extremely handy).
A call for coding discipline
Before we point out the other technical details of threading, please note this call for programming discipline when using threading:In order to preserve human readability,
As a warning, look at this undocumented function and try to guess what might be going on:
sub lookup { my ($im,$palette) = @_; my $res; index($palette>xchg(0,1), $im>long>dummy(0,($palette>dim)[0]), ($res=null)); return $res; }
Would you agree that it might be difficult to figure out expected dimensions, purpose of the routine, etc ? (If you want to find out what this piece of code does, see below)
How to figure out the loop dimensions
There are a couple of rules that allow you to figure out number and size of loop dimensions (and if the size of your input piddles comply with the threading rules). Dimensions of any piddle argument are broken down into two groups in the following: Core dimensions (as defined by the PPfunction, see Appendix B for a list of R0
 Core dimensions are identified with the first N dimensions of the respective piddle argument (and are required). Any further dimensions are extra dimensions and used to determine the loop dimensions.
 R1
 The number of (implicit) loop dimensions is equal to the maximal number of extra dimensions taken over the set of piddle arguments.
 R2
 The size of each of the loop dimensions is derived from the size of the respective dimensions of the piddle arguments. The size of a loop dimension is given by the maximal size found in any of the piddles having this extra dimension.
 R3
 For all piddles that have a given extra dimension the size must be equal to the size of the loop dimension (as determined by the previous rule) or 1; otherwise you raise a runtime exception. If the size of the extra dimension in a piddle is one it is implicitly treated as a dummy dimension of size equal to that loop dim size when performing the thread loop.
 R4
 If a piddle doesn't have a loop dimension, in the thread loop this piddle is treated as if having a dummy dimension of size equal to the size of that loop dimension.
 R5
 If output autocreation is used (by setting the relevant piddle to "PDL>null" before invocation) the number of dimensions of the created piddle is equal to the sum of the number of core output dimensions + number of loop dimensions. The size of the core output dimensions is derived from the relevant dimension of input piddles (as specified in the function definition) and the sizes of the other dimensions are equal to the size of the loop dimension it is derived from. The automatically created piddle will be physical (unless dataflow is in operation).
In this context, note that you can run into the problem with assignment to piddles containing greaterthanone dummy dimensions (see above). Although your output piddle(s) didn't contain any dummy dimensions in the first place they may end up with implicitly created dummy dimensions according to R4.
As an example, suppose we have a (here unspecified) PPfunction with the signature:
func((m,n),(m,n,o),(m),[o](m,o))
and you call it with 3 piddles "a(5,3,10,11)", "b(5,3,2,10,1,12)", and "c(5,1,11,12)" as
func($a,$b,$c,($d=null))
then the number of loop dimensions is 3 (by "R0+R1" from $b and $c) with sizes "(10,11,12)" (by R2); the two output core dimensions are "(5,2)" (from the signature of func) resulting in a 5dimensional output piddle $c of size "(5,2,10,11,12)" (see R5) and (the automatically created) $d is derived from "($a,$b,$c)" in a way that can be expressed in pdl pseudocode as
$d(:,:,i,j,k) .= func($a(:,:,i,j),$b(:,:,:,i,0,k),$c(:,0,j,k)) with 0<=i<10, 0<=j<=11, 0<=k<12
If we analyze the color to greyscale conversion again with these rules in mind we note another great advantage of implicit threading. We can call the conversion with a piddle representing a pixel (im(3)), a line of rgb pixels ("im(3,x)"), a proper color image ("im(3,x,y)") or a whole stack of
Let's fill these general rules with some more life by going through a couple of further examples. The reader may try to figure out equivalent formulations with explicit forlooping and compare the flexibility of those routines using implicit threading to the explicit formulation. Furthermore, especially when using several thread dimensions it is a useful exercise to check the relative speed by doing some benchmark tests (which we still have to do).
First in the row is a slightly reworked centroid example, now coded with threading in mind.
# threaded mult to calculate centroid coords, works for stacks as well $xc = sumover(($im*xvals(($im>dims)[0]))>clump(2)) / sumover($im>clump(2));
Let's analyze what's going on step by step. First the product:
$prod = $im*xvals(zeroes(($im>dims)[0]))
This will actually work for $im being one, two, three, and higher dimensional. If $im is onedimensional it's just an ordinary product (in the sense that every element of $im is multiplied with the respective element of "xvals(...)"), if $im has more dimensions further threading is done by adding appropriate dummy dimensions to "xvals(...)" according to R4. More importantly, the two sumover operations show a first example of how to make use of the dimension manipulating commands. A quick look at sumover's signature will remind you that it will only ``gobble up'' the first dimension of a given input piddle. But what if we want to really compute the sum over all elements of the first two dimensions? Well, nothing keeps us from passing a virtual piddle into sumover which in this case is formed by clumping the first two dimensions of the ``parent piddle'' into one. From the point of view of the parent piddle the sum is now computed over the first two dimensions, just as we wanted, though sumover has just done the job as specified by its signature. Got it ?
Another little finesse of writing the code like that: we intentionally used "sumover($pdl>clump(2))" instead of "sum($pdl)" so that we can either pass just an image "(x,y)" or a stack of images "(x,y,t)" into this routine and get either just one xcoordiante or a vector of xcoordinates (of size t) in return.
Another set of common operations are what one could call ``projection operations''. These operations take a ND piddle as input and return a (N1)D ``projected'' piddle. These operations are often performed with functions like sumover, prodover, minimum and maximum. Using again images as examples we might want to calculate the maximum pixel value for each line of an image or image stack. We know how to do that
# maxima of lines (as function of line number and time) maximum($stack,($ret=null));
But what if you want to calculate maxima per column when implicit threading always applies the core functionality to the first dimension and threads over all others? How can we achieve that instead the core functionality is applied to the second dimension and threading is done over the others. Can you guess it? Yes, we make a virtual piddle that has the second dimension of the ``parent piddle'' as its first dimension using the "mv" command.
# maxima of columns (as function of column number and time) maximum($stack>mv(1,0),($ret=null));
and calculating all the sums of subslices over the third dimension is now almost too easy
# sums of pixels in time (assuming time is the third dim) sumover($stack>mv(2,0),($ret=null));
Finally, if you want to apply the operation to all elements (like max over all elements or sum over all elements) regardless of the dimensions of the piddle in question "clump" comes in handy. As an example look at the definition of "sum" (as defined in "Ufunc.pm"):
sub sum { PDL::Ufunc::sumover($name>clump(1),($tmp=null)); return $tmp>at(); # return a Perl number, not a 0D piddle }
We have already mentioned that all basic operations support threading and assignment is no exception. So here are a couple of threaded assignments
pdl> $im = zeroes(byte, 10,20) pdl> $line = exp(rvals(10)**2/9) # threaded assignment pdl> $im .= $line # set every line of $im to $line pdl> $im2 .= 5 # set every element of $im2 to 5
By now you probably see how it works and what it does, don't you?
To finish the examples in this paragraph here is a function to create an
# a threaded index lookup to generate an RGB, or RGBA or YMCK image # from a palette image (represented by a lookup table $palette and # an colorindex image $im) # you can say just dummy(0) since the rules of threading make it fit pdl> index($palette>xchg(0,1), $im>long>dummy(0,($palette>dim)[0]), ($res=null));
Let's go through it and explain the steps involved. Assuming we are dealing with an
[palette(0,4),palette(1,4),palette(2,4)]
for the new red, green and blue components of the output image. Hopefully by now you have some sort of idea what the above piece of code is supposed to do (it is often actually quite complicated to describe in detail how a piece of threading code works; just go ahead and experiment a bit to get a better feeling for it).
If you have read the threading rules carefully, then you might have noticed that we didn't have to explicitly state the size of the dummy dimension that we created for $im; when we create it with size 1 (the default) the rules of threading make it automatically fit to the desired size (by rule R3, in our example the size would be 3 assuming a palette of size "(3,x)"). Since situations like this do occur often in practice this is actually why rule R3 has been introduced (the part that makes dimensions of size 1 fit to the thread loop dim size). So we can just say
pdl> index($palette>xchg(0,1),$im>long>dummy(0),($res=null));
Again, you can convince yourself that this routine will create the right output if called with a pixel ($im is 0D), a line ($im is 1D), an image ($im is 2D), ..., an
To wrap it all up once again, the general idea is as follows. If you want to achieve looping over certain dimensions and have the core functionality applied to another specified set of dimensions you use the dimension manipulating commands to create a (or several) virtual piddle(s) so that from the point of view of the parent piddle(s) you get what you want (always having the signature of the function in question and R1R5 in mind!). Easy, isn't it ?
Output autocreation and PPfunction calling conventions
At this point we have to divert to some technical detail that has to do with the general calling conventions of PPfunctions and the automatic creation of output arguments. Basically, there are two ways of invoking
$result = func($a,$b);
and
func($a,$b,$result);
If you are only using implicit threading then the output variable can be automatically created by
func($a,$b,($result=PDL>null));
or
$result = func($a,$b)
which are exactly equivalent.
Be warned that you can not use output autocreation when using explicit threading (for reasons explained in the following section on explicit threading, the second variant of threading).
In ``tight'' loops you probably want to avoid the implicit creation of a temporary piddle in each step of the loop that comes along with the ``functional'' style but rather say
# create output piddle of appropriate size only at first invocation $result = null; for (0...$n) { func($a,$b,$result); # in all but the first invocation $result func2($b); # is defined and has the right size to # take the output provided $b's dims don't change twiddle($result,$a); # do something from $result to $a for iteration }
The takehome message of this section once more: be aware of the limitation on output creation when using explicit threading.
Explicit threading
Having so far only talked about the first flavour of threading it is now about time to introduce the second variant. Instead of shuffling around dimensions all the time and relying on the rules of implicit threading to get it all right you sometimes might want to specify in a more explicit way how to perform the thread loop. It is probably not too surprising that this variant of the game is called explicit threading. Now, before we create the wrong impression: it is not either implicit or explicit; the two flavours do mix. But more about that later.The two most used functions with explicit threading are thread and unthread. We start with an example that illustrates typical usage of the former:
[ # ** this is the worst possible example to start with ] # but can be used to show that $mat += $line is different from # $mat>thread(0) += $line # explicit threading to add a vector to each column of a matrix pdl> $mat = zeroes(4,3) pdl> $line = pdl (3.1416,2,2) pdl> ($tmp = $mat>thread(0)) += $line
In this example, "$mat>thread(0)" tells
for (j=0; j<3; j++) { for (i=0; i<4; i++) { mat(i,j) += src(j); } }
"thread" takes a list of numbers as arguments which explicitly specify which dimensions to thread over first. With the introduction of explicit threading the dimensions of a piddle are conceptually split into three different groups the latter two of which we have already encountered: thread dimensions, core dimensions and extra dimensions.
Conceptually, it is best to think of those dimensions of a piddle that have been specified in a call to "thread" as being taken away from the set of normal dimensions and put on a separate stack. So assuming we have a piddle "a(4,7,2,8)" saying
$b = $a>thread(2,1)
creates a new virtual piddle of dimension "b(4,8)" (which we call the remaining dims) that also has 2 thread dimensions of size "(2,7)". For the purposes of this document we write that symbolically as "b(4,8){2,7}". An important difference to the previous examples where only implicit threading was used is the fact that the core dimensions are matched against the remaining dimensions which are not necessarily the first dimensions of the piddle. We will now specify how the presence of thread dimensions changes the rules R1R5 for thread loops (which apply to the special case where none of the piddle arguments has any thread dimensions).
 T0
 Core dimensions are matched against the first n remaining dimensions of the piddle argument (note the difference to R1). Any further remaining dimensions are extra dimensions and are used to determine the implicit loop dimensions.
 T1a
 The number of implicit loop dimensions is equal to the maximal number of extra dimensions taken over the set of piddle arguments.
 T1b
 The number of explicit loop dimensions is equal to the maximal number of thread dimensions taken over the set of piddle arguments.
 T1c
 The total number of loop dimensions is equal to the sum of explicit loop dimensions and implicit loop dimensions. In the thread loop, explicit loop dimensions are threaded over first followed by implicit loop dimensions.
 T2
 The size of each of the loop dimensions is derived from the size of the respective dimensions of the piddle arguments. It is given by the maximal size found in any piddles having this thread dimension (for explicit loop dimensions) or extra dimension (for implicit loop dimensions).
 T3
 This rule applies to any explicit loop dimension as well as any implicit loop dimension. For all piddles that have a given thread/extra dimension the size must be equal to the size of the respective explicit/implicit loop dimension or 1; otherwise you raise a runtime exception. If the size of a thread/extra dimension of a piddle is one it is implicitly treated as a dummy dimension of size equal to the explicit/implicit loop dimension.
 T4
 If a piddle doesn't have a thread/extra dimension that corresponds to an explicit/implicit loop dimension, in the thread loop this piddle is treated as if having a dummy dimension of size equal to the size of that loop dimension.
 T4a
 All piddles that do have thread dimensions must have the same number of thread dimensions.
 T5
 Output autocreation cannot be used if any of the piddle arguments has any thread dimensions. Otherwise R5 applies.
The same restrictions apply with regard to implicit dummy dimensions (created by application of T4) as already mentioned in the section on implicit threading: if any of the output piddles has an (explicit or implicitly created) greaterthanone dummy dimension a runtime exception will be raised.
Let us demonstrate these rules at work in a generic case. Suppose we have a (here unspecified) PPfunction with the signature:
func((m,n),(m),(),[o](m))
and you call it with 3 piddles "a(5,3,10,11)", "b(3,5,10,1,12)", "c(10)" and an output piddle "d(3,11,5,10,12)" (which can here not be automatically created) as
func($a>thread(1,3),$b>thread(0,3),$c,$d>thread(0,1))
From the signature of func and the above call the piddles split into the following groups of core, extra and thread dimensions (written in the form "pdl(core dims){thread dims}[extra dims]"):
a(5,10){3,11}[] b(5){3,1}[10,12] c(){}[10] d(5){3,11}[10,12]
With this to help us along (it is in general helpful to write the arguments down like this when you start playing with threading and want to keep track of what is going on) we further deduce that the number of explicit loop dimensions is 2 (by T1b from $a and $b) with sizes "(3,11)" (by T2); 2 implicit loop dimensions (by T1a from $b and $d) of size "(10,12)" (by T2) and the elements of are computed from the input piddles in a way that can be expressed in pdl pseudocode as
for (l=0;l<12;l++) for (k=0;k<10;k++) for (j=0;j<11;j++) effect of treating it as dummy dim (index j) for (i=0;i<3;i++)  d(i,j,:,k,l) = func(a(:,i,:,j),b(i,:,k,0,l),c(k))
Ugh, this example was really not easy in terms of bookkeeping. It serves mostly as an example how to figure out what's going on when you encounter a complicated looking expression. But now it is really time to show that threading is useful by giving some more of our so called ``practical'' examples.
[ The following examples will need some additional explanations in the future. For the moment please try to live with the comments in the code fragments. ]
Example 1:
*** inverse of matrix represented by eigvecs and eigvals ** given a symmetrical matrix M = A^T x diag(lambda_i) x A ** => inverse M^1 = A^T x diag(1/lambda_i) x A ** first $tmp = diag(1/lambda_i)*A ** then A^T * $tmp by threaded inner product # index handling so that matrices print correct under pdl $inv .= $evecs*0; # just copy to get appropriately sized output $tmp .= $evecs; # initialise, no backpropagation ($tmp2 = $tmp>thread(0)) /= $evals; # threaded division # and now a matrix multiplication in disguise PDL::Primitive::inner($evecs>xchg(0,1)>thread(1,1), $tmp>thread(0,1), $inv>thread(0,1)); # alternative for matrix mult using implicit threading, # first xchg only for transpose PDL::Primitive::inner($evecs>xchg(0,1)>dummy(1), $tmp>xchg(0,1)>dummy(2), ($inv=null));
Example 2:
# outer product by threaded multiplication # stress that we need to do it with explicit call to my_biop1 # when using explicit threading $res=zeroes(($a>dims)[0],($b>dims)[0]); my_biop1($a>thread(0,1),$b>thread(1,0),$res>(0,1),"*"); # similar thing by implicit threading with autocreated piddle $res = $a>dummy(1) * $b>dummy(0);
Example 3:
# different use of thread and unthread to shuffle a number of # dimensions in one go without lots of calls to >xchg and >mv # use thread/unthread to shuffle dimensions around # just try it out and compare the child piddle with its parent $trans = $a>thread(4,1,0,3,2)>unthread;
Example 4:
# calculate a couple of bounding boxes # $bb will hold BB as [xmin,xmax],[ymin,ymax],[zmin,zmax] # we use again thread and unthread to shuffle dimensions around pdl> $bb = zeroes(double, 2,3 ); pdl> minimum($vertices>thread(0)>clump>unthread(1), $bb>slice('(0),:')); pdl> maximum($vertices>thread(0)>clump>unthread(1), $bb>slice('(1),:'));
Example 5:
# calculate a selfrationed (i.e. self normalized) sequence of images # uses explicit threading and an implicitly threaded division $stack = read_image_stack(); # calculate the average (per pixel average) of the first $n+1 images $aver = zeroes([stack>dims]>[0,1]); # make the output piddle sumover($stack>slice(":,:,0:$n")>thread(0,1),$aver); $aver /= ($n+1); $stack /= $aver; # normalize the stack by doing a threaded division # implicit versus explicit # alternatively calculate $aver with implicit threading and autocreation sumover($stack>slice(":,:,0:$n")>mv(2,0),($aver=null)); $aver /= ($n+1); #
Implicit versus explicit threading
In this paragraph we are going to illustrate when explicit threading is preferable over implicit threading and vice versa. But then again, this is probably not the best way of putting the case since you already know: the two flavours do mix. So, it's more about how to get the best of both worlds and, anyway, in the best of Perl traditions:[ Sorry, this still has to be filled in in a later release; either refer to above examples or choose some new ones ]
Finally, this may be a good place to justify all the technical detail we have been going on about for a couple of pages: why threading ?
Well, code that uses threading should be (considerably) faster than code that uses explicit forloops (or similar Perl constructs) to achieve the same functionality. Especially on supercomputers (with vector computing facilities/parallel processing)
PDL::PP
An easy way to define functions that are aware of indexing and threading (and the universe and everything)
Appendix A
Affine transformations  a special class of simple and powerful transformations
[ This is also something to be added in future releases. Do we already have the general make_affine routine inAppendix B
signatures of standard PDL::PP compiled functions
A selection of signatures of
# functions in primitive.pd # sumover ((n),[o]()) prodover ((n),[o]()) axisvalues ((n)) inplace inner ((n),(n),[o]()) outer ((n),(m),[o](n,m)) innerwt ((n),(n),(n),[o]()) inner2 ((m),(m,n),(n),[o]()) inner2t ((j,n),(n,m),(m,k),[o]()) index (1D,0D,[o]) minimum (1D,[o]) maximum (1D,[o]) wstat ((n),(n),(),[o],()) assgn ((),()) # basic operations binary operations ((),(),[o]()) unary operations ((),[o]())