PDL::Threading (1)
NAME
PDL::Threading - Tutorial for PDL's Threading feature
INTRODUCTION
One of the most powerful features of PDL is threading. Other vector based languages, such as
TERMINOLOGY: PIDDLE
A piddle consists of a series of numbers organized as an N-dimensional data set. Piddles provide efficient storage and fast computation of large N-dimensional matrices. They are highly optimized for numerical work.
THINKING IN TERMS OF THREADING
If you have used
    % pdl2
    perlDL shell v1.352
    ...
    ReadLines, NiceSlice, MultiLines  enabled
    ...
    Note: AutoLoader not enabled ('use PDL::AutoLoader' recommended)
    pdl>
In this example, NiceSlice was automatically enabled, but AutoLoader was not. To enable it, type "use PDL::AutoLoader".
Let's start with a two-dimensional piddle:
    pdl> $a = sequence(11,9)
    pdl> p $a
    [
     [ 0  1  2  3  4  5  6  7  8  9 10]
     [11 12 13 14 15 16 17 18 19 20 21]
     [22 23 24 25 26 27 28 29 30 31 32]
     [33 34 35 36 37 38 39 40 41 42 43]
     [44 45 46 47 48 49 50 51 52 53 54]
     [55 56 57 58 59 60 61 62 63 64 65]
     [66 67 68 69 70 71 72 73 74 75 76]
     [77 78 79 80 81 82 83 84 85 86 87]
     [88 89 90 91 92 93 94 95 96 97 98]
    ]
The "info" method gives you basic information about a piddle:
    pdl> p $a->info
    PDL: Double D [11,9]
This tells us that $a is an 11 x 9 piddle composed of double precision numbers. If we wanted to add 3 to all elements in an "n x m" piddle, a traditional language would use two nested for-loops:
    # Pseudo-code. Traditional way to add 3 to an array.
    for (x=0; x < n; x++) {
        for (y=0; y < m; y++) {
            a(x,y) = a(x,y) + 3
        }
    }
Note: Notice that indices start at 0, as in Perl, C and Java (and unlike
But with PDL, we can simply write:
    pdl> $b = $a + 3
    pdl> p $b
    [
     [  3   4   5   6   7   8   9  10  11  12  13]
     [ 14  15  16  17  18  19  20  21  22  23  24]
     [ 25  26  27  28  29  30  31  32  33  34  35]
     [ 36  37  38  39  40  41  42  43  44  45  46]
     [ 47  48  49  50  51  52  53  54  55  56  57]
     [ 58  59  60  61  62  63  64  65  66  67  68]
     [ 69  70  71  72  73  74  75  76  77  78  79]
     [ 80  81  82  83  84  85  86  87  88  89  90]
     [ 91  92  93  94  95  96  97  98  99 100 101]
    ]
This is the simplest example of threading, and it is something that all numerical software tools do. The "+ 3" operation was automatically applied along two dimensions. Now suppose you want to subtract a line from every row in $a:
    pdl> $line = sequence(11)
    pdl> p $line
    [0 1 2 3 4 5 6 7 8 9 10]
    pdl> $c = $a - $line
    pdl> p $c
    [
     [ 0  0  0  0  0  0  0  0  0  0  0]
     [11 11 11 11 11 11 11 11 11 11 11]
     [22 22 22 22 22 22 22 22 22 22 22]
     [33 33 33 33 33 33 33 33 33 33 33]
     [44 44 44 44 44 44 44 44 44 44 44]
     [55 55 55 55 55 55 55 55 55 55 55]
     [66 66 66 66 66 66 66 66 66 66 66]
     [77 77 77 77 77 77 77 77 77 77 77]
     [88 88 88 88 88 88 88 88 88 88 88]
    ]
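Two things to note here: First, the value of $a is still the same. Try "p $a" to check. Second, PDL automatically subtracted $line from each row in $a. To see why, look at the dimensions of $a, $line and $c: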
    pdl> p $line->info  => PDL: Double D [11]
    pdl> p $a->info     => PDL: Double D [11,9]
    pdl> p $c->info     => PDL: Double D [11,9]
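So, both $a and $line have the same number of elements in the 0th dimension! What PDL did was thread over the higher dimension of $a, repeating the same subtraction for each of its 9 rows.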
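The same experiment can be run as a standalone script instead of in the shell. This is only a minimal sketch: the variable names $grid and $diff are chosen for illustration, and "slice" is used in place of the NiceSlice syntax so that no source filter is needed.

```perl
use strict;
use warnings;
use PDL;

my $grid = sequence(11, 9);   # dims [11,9]
my $line = sequence(11);      # dims [11]

# Dimension 0 has 11 elements in both piddles, so PDL threads the
# subtraction over dimension 1 of $grid (its 9 rows).
my $diff = $grid - $line;

print "result dims: ", join(',', $diff->dims), "\n";
print "row 2: ", $diff->slice(':,2'), "\n";   # every element of row 2 is 22
```

Because the subtraction creates a new piddle, $grid itself is left unchanged.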
What if you want to subtract $line from the first line in $a only? You can do that by specifying the line explicitly:
    pdl> $a(:,0) -= $line
    pdl> p $a
    [
     [ 0  0  0  0  0  0  0  0  0  0  0]
     [11 12 13 14 15 16 17 18 19 20 21]
     [22 23 24 25 26 27 28 29 30 31 32]
     [33 34 35 36 37 38 39 40 41 42 43]
     [44 45 46 47 48 49 50 51 52 53 54]
     [55 56 57 58 59 60 61 62 63 64 65]
     [66 67 68 69 70 71 72 73 74 75 76]
     [77 78 79 80 81 82 83 84 85 86 87]
     [88 89 90 91 92 93 94 95 96 97 98]
    ]
See PDL::Indexing and PDL::NiceSlice to learn more about specifying subsets from piddles.
The true power of threading comes when you realise that the piddle can have any number of dimensions! Let's make a 4 dimensional piddle:
    pdl> $piddle_4D = sequence(11,3,7,2)
    pdl> $c = $piddle_4D - $line
Now $c is a piddle of the same dimension as $piddle_4D.
    pdl> p $piddle_4D->info  => PDL: Double D [11,3,7,2]
    pdl> p $c->info          => PDL: Double D [11,3,7,2]
This time PDL has threaded over three higher dimensions automatically, subtracting $line all the way.
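A short script can confirm the 4-dimensional case. It is a sketch with illustrative variable names; the two spot checks rely only on how "sequence" numbers its elements (dimension 0 varies fastest).

```perl
use strict;
use warnings;
use PDL;

my $piddle_4d = sequence(11, 3, 7, 2);
my $line      = sequence(11);

# $line matches dimension 0 (11 elements); PDL threads over dims 1, 2 and 3.
my $diff = $piddle_4d - $line;

print "result dims: ", join(',', $diff->dims), "\n";

# Element (5,0,0,0) is 5 - 5 = 0; element (5,1,0,0) is 16 - 5 = 11.
print $diff->at(5, 0, 0, 0), " ", $diff->at(5, 1, 0, 0), "\n";
```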
But maybe you don't want to subtract from the rows (dimension 0), but from the columns (dimension 1). How do we subtract a column of numbers from each column in $a?
    pdl> $cols = sequence(9)
    pdl> p $a->info     => PDL: Double D [11,9]
    pdl> p $cols->info  => PDL: Double D [9]
Naturally, we can't just type "$a - $cols". The dimensions don't match:
    pdl> p $a - $cols
    PDL: PDL::Ops::minus(a,b,c): Parameter 'b'
    PDL: Mismatched implicit thread dimension 0: should be 11, is 9
How do we tell PDL that we want to subtract from dimension 1 instead?
MANIPULATING DIMENSIONS
There are many PDL functions that let you rearrange the dimensions of a piddle. The three most common ones are:
    xchg
    mv
    reorder
Method: xchg
The "xchg" method "exchanges" two dimensions in a piddle:
    pdl> $a = sequence(6,7,8,9)
    pdl> $a_xchg = $a->xchg(0,3)
    pdl> p $a->info      => PDL: Double D [6,7,8,9]
    pdl> p $a_xchg->info => PDL: Double D [9,7,8,6]
                                           |     |
                                           V     V
                                       (dim 0) (dim 3)
Notice that dimensions 0 and 3 were exchanged without affecting the other dimensions. Notice also that "xchg" does not alter $a. The original variable $a remains untouched.
Method: mv
The "mv" method "moves" one dimension in a piddle, shifting other dimensions as necessary.
    pdl> $a = sequence(6,7,8,9)       (dim 0)
    pdl> $a_mv = $a->mv(0,3)             |
    pdl>                                 V _____
    pdl> p $a->info    => PDL: Double D [6,7,8,9]
    pdl> p $a_mv->info => PDL: Double D [7,8,9,6]
                                         ----- |
                                               V
                                            (dim 3)
Notice that when dimension 0 was moved to position 3, all the other dimensions had to be shifted as well. Notice also that "mv" does not alter $a. The original variable $a remains untouched.
Method: reorder
The "reorder" method is a generalization of the "xchg" and "mv" methods. It "reorders" the dimensions in any way you specify:
    pdl> $a = sequence(6,7,8,9)
    pdl> $a_reorder = $a->reorder(3,0,2,1)
    pdl>
    pdl> p $a->info         => PDL: Double D [6,7,8,9]
    pdl> p $a_reorder->info => PDL: Double D [9,6,8,7]
                                              | | | |
                                              V V V V
                               dimensions:    0 1 2 3
Notice what happened. When we wrote "reorder(3,0,2,1)" we instructed PDL to:
    * Put dimension 3 first.
    * Put dimension 0 next.
    * Put dimension 2 next.
    * Put dimension 1 next.
When you use the "reorder" method, all the dimensions are shuffled. Notice that "reorder" does not alter $a. The original variable $a remains untouched.
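The three methods can be compared side by side in a script. This is a small sketch (the variable name $p is illustrative) that prints only the resulting dimension lists, which match the "info" output shown above.

```perl
use strict;
use warnings;
use PDL;

my $p = sequence(6, 7, 8, 9);

my @xchg_dims    = $p->xchg(0, 3)->dims;           # swap dims 0 and 3
my @mv_dims      = $p->mv(0, 3)->dims;             # move dim 0 to position 3
my @reorder_dims = $p->reorder(3, 0, 2, 1)->dims;  # explicit new order

print "xchg:    @xchg_dims\n";
print "mv:      @mv_dims\n";
print "reorder: @reorder_dims\n";
print "orig:    @{[ $p->dims ]}\n";   # $p itself keeps its dimensions
```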
GOTCHA: LINKING VS ASSIGNMENT
Linking
By default, piddles are linked together so that changes on one will go back and affect the original as well.
    pdl> $a = sequence(4,5)
    pdl> $a_xchg = $a->xchg(1,0)
Here, $a_xchg is not a separate object. It is merely a different way of looking at $a. Any change in $a_xchg will appear in $a as well.
    pdl> p $a
    [
     [ 0  1  2  3]
     [ 4  5  6  7]
     [ 8  9 10 11]
     [12 13 14 15]
     [16 17 18 19]
    ]
    pdl> $a_xchg += 3
    pdl> p $a
    [
     [ 3  4  5  6]
     [ 7  8  9 10]
     [11 12 13 14]
     [15 16 17 18]
     [19 20 21 22]
    ]
Assignment
Sometimes, linking is not the behaviour you want. If you want to make the piddles independent, use the "copy" method:
    pdl> $a = sequence(4,5)
    pdl> $a_xchg = $a->copy->xchg(1,0)
Now $a and $a_xchg are completely separate objects:
    pdl> p $a
    [
     [ 0  1  2  3]
     [ 4  5  6  7]
     [ 8  9 10 11]
     [12 13 14 15]
     [16 17 18 19]
    ]
    pdl> $a_xchg += 3
    pdl> p $a
    [
     [ 0  1  2  3]
     [ 4  5  6  7]
     [ 8  9 10 11]
     [12 13 14 15]
     [16 17 18 19]
    ]
    pdl> p $a_xchg
    [
     [ 3  7 11 15 19]
     [ 4  8 12 16 20]
     [ 5  9 13 17 21]
     [ 6 10 14 18 22]
    ]
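The difference between linking and copying can be checked with a few lines of script. This is a minimal sketch; the names $orig, $view, $fresh and $indep are made up for the illustration.

```perl
use strict;
use warnings;
use PDL;

# Linked: the xchg result is a view onto $orig.
my $orig = sequence(4, 5);
my $view = $orig->xchg(1, 0);
$view += 3;                       # flows back into $orig

# Independent: copy first, then rearrange.
my $fresh = sequence(4, 5);
my $indep = $fresh->copy->xchg(1, 0);
$indep += 3;                      # $fresh is not touched

print "linked original (0,0):    ", $orig->at(0, 0),  "\n";
print "copied-from original (0,0): ", $fresh->at(0, 0), "\n";
```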
PUTTING IT ALL TOGETHER
Now we are ready to solve the problem that motivated this whole discussion:
    pdl> $a = sequence(11,9)
    pdl> $cols = sequence(9)
    pdl>
    pdl> p $a->info     => PDL: Double D [11,9]
    pdl> p $cols->info  => PDL: Double D [9]
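How do we tell PDL that we want to subtract from dimension 1 instead of dimension 0? The simplest way is to use the "xchg" method and rely on linking: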
    pdl> p $a
    [
     [ 0  1  2  3  4  5  6  7  8  9 10]
     [11 12 13 14 15 16 17 18 19 20 21]
     [22 23 24 25 26 27 28 29 30 31 32]
     [33 34 35 36 37 38 39 40 41 42 43]
     [44 45 46 47 48 49 50 51 52 53 54]
     [55 56 57 58 59 60 61 62 63 64 65]
     [66 67 68 69 70 71 72 73 74 75 76]
     [77 78 79 80 81 82 83 84 85 86 87]
     [88 89 90 91 92 93 94 95 96 97 98]
    ]
    pdl> $a->xchg(1,0) -= $cols
    pdl> p $a
    [
     [ 0  1  2  3  4  5  6  7  8  9 10]
     [10 11 12 13 14 15 16 17 18 19 20]
     [20 21 22 23 24 25 26 27 28 29 30]
     [30 31 32 33 34 35 36 37 38 39 40]
     [40 41 42 43 44 45 46 47 48 49 50]
     [50 51 52 53 54 55 56 57 58 59 60]
     [60 61 62 63 64 65 66 67 68 69 70]
     [70 71 72 73 74 75 76 77 78 79 80]
     [80 81 82 83 84 85 86 87 88 89 90]
    ]
General Strategy:
    Move the dimensions you want to operate on to the start of your piddle's dimension list. Then let PDL thread over the higher dimensions.
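As a script, the strategy might look like the sketch below. It stores the "xchg" view in a variable (here called $view, an illustrative name) before modifying it, which has the same effect as the one-line shell version because the view is linked to the original.

```perl
use strict;
use warnings;
use PDL;

my $grid = sequence(11, 9);   # dims [11,9]
my $cols = sequence(9);       # dims [9]

# Bring dimension 1 to the front: the view has dims [9,11],
# so its dimension 0 now matches $cols.
my $view = $grid->xchg(1, 0);

# In-place subtraction on the view flows back into $grid.
$view -= $cols;

print $grid->slice(':,1'), "\n";   # row 1 now runs from 10 to 20
```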
EXAMPLE: CONWAY'S GAME OF LIFE
Okay, enough theory. Let's do something a bit more interesting: We'll write Conway's Game of Life in PDL.
The Game of Life is a simulation run on a big two dimensional grid. Each cell in the grid can either be alive or dead (represented by 1 or 0). The next generation of cells in the grid is calculated with simple rules according to the number of living cells in its immediate neighbourhood:
1) If an empty cell has exactly three neighbours, a living cell is generated.
2) If a living cell has fewer than two neighbours, it dies of isolation.
3) If a living cell has 4 or more neighbours, it dies of overcrowding.
Only the first generation of cells is determined by the programmer. After that, the simulation runs completely according to these rules. To calculate the next generation, you need to look at each cell in the 2D field (requiring two loops), calculate the number of live cells adjacent to this cell (requiring another two loops) and then fill the next generation grid.
Classical implementation
Here's a classic way of writing this program in Perl. We only use PDL for addressing individual cells.
    #!/usr/bin/perl -w
    use PDL;
    use PDL::NiceSlice;

    # Make a board for the game of life.
    my $nx = 20;
    my $ny = 20;

    # Current generation.
    my $a = zeroes($nx, $ny);

    # Next generation.
    my $n = zeroes($nx, $ny);

    # Put in a simple glider.
    $a(1:3,1:3) .= pdl ( [1,1,1],
                         [0,0,1],
                         [0,1,0] );

    for (my $i = 0; $i < 100; $i++) {
        $n = zeroes($nx, $ny);
        $new_a = $a->copy;
        for ($x = 0; $x < $nx; $x++) {
            for ($y = 0; $y < $ny; $y++) {

                # For each cell, look at the surrounding neighbours.
                for ($dx = -1; $dx <= 1; $dx++) {
                    for ($dy = -1; $dy <= 1; $dy++) {
                        $px = $x + $dx;
                        $py = $y + $dy;

                        # Wrap around at the edges.
                        if ($px < 0) {$px = $nx-1};
                        if ($py < 0) {$py = $ny-1};
                        if ($px >= $nx) {$px = 0};
                        if ($py >= $ny) {$py = 0};

                        $n($x,$y) .= $n($x,$y) + $a($px,$py);
                    }
                }

                # Do not count the central cell itself.
                $n($x,$y) -= $a($x,$y);

                # Work out if cell lives or dies:
                #   Dead cell lives if n = 3
                #   Live cell dies if n is not 2 or 3
                if ($a($x,$y) == 1) {
                    if ($n($x,$y) < 2) {$new_a($x,$y) .= 0};
                    if ($n($x,$y) > 3) {$new_a($x,$y) .= 0};
                } else {
                    if ($n($x,$y) == 3) {$new_a($x,$y) .= 1}
                }
            }
        }

        print $a;

        $a = $new_a;
    }
If you run this, you will see a small glider crawl diagonally across the grid of zeroes. On my machine, it prints out a couple of generations per second.
Threaded PDL implementation
And here's the threaded version in PDL:
    #!/usr/bin/perl -w
    use PDL;
    use PDL::NiceSlice;

    my $a = zeroes(20,20);

    # Put in a simple glider.
    $a(1:3,1:3) .= pdl ( [1,1,1],
                         [0,0,1],
                         [0,1,0] );

    my $n;
    for (my $i = 0; $i < 100; $i++) {
        # Calculate the number of neighbours per cell.
        $n = $a->range(ndcoords($a)-1,3,"periodic")->reorder(2,3,0,1);
        $n = $n->sumover->sumover - $a;

        # Calculate the next generation.
        $a = ((($n == 2) + ($n == 3))* $a) + (($n==3) * !$a);

        print $a;
    }
The threaded PDL version is much faster:
    Classical => 32.79 seconds.
    Threaded  =>  0.41 seconds.
Explanation
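How does the threaded version work? There are many PDL functions designed to help you carry out threading. In this example, the key functions are "range", "ndcoords" and "sumover".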
Method: "range"
At the simplest level, the "range" method is a different way to select a portion of a piddle. Instead of using the "$a(2,3)" notation, we use another piddle.
    pdl> $a = sequence(6,7)
    pdl> p $a
    [
     [ 0  1  2  3  4  5]
     [ 6  7  8  9 10 11]
     [12 13 14 15 16 17]
     [18 19 20 21 22 23]
     [24 25 26 27 28 29]
     [30 31 32 33 34 35]
     [36 37 38 39 40 41]
    ]
    pdl> p $a->range( pdl [1,2] )
    13
    pdl> p $a(1,2)
    [
     [13]
    ]
At this point, the "range" method looks very similar to a regular PDL slice. But "range" is more general. For example, you can select several elements at once:
    pdl> $index = pdl [ [1,2],[2,3],[3,4],[4,5] ]
    pdl> p $a->range( $index )
    [13 20 27 34]
Additionally, "range" takes a second parameter which determines the size of the chunk to return:
    pdl> $size = 3
    pdl> p $a->range( pdl([1,2]) , $size )
    [
     [13 14 15]
     [19 20 21]
     [25 26 27]
    ]
We can use this to select one or more 3x3 boxes.
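Finally, "range" can take a third parameter called the "boundary" condition. It tells PDL what to do if the box you request extends beyond the edge of the piddle. With the "periodic" option, the piddle wraps around. For example: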
    pdl> p $a
    [
     [ 0  1  2  3  4  5]
     [ 6  7  8  9 10 11]
     [12 13 14 15 16 17]
     [18 19 20 21 22 23]
     [24 25 26 27 28 29]
     [30 31 32 33 34 35]
     [36 37 38 39 40 41]
    ]
    pdl> $size = 3
    pdl> p $a->range( pdl([4,2]) , $size , "periodic" )
    [
     [16 17 12]
     [22 23 18]
     [28 29 24]
    ]
    pdl> p $a->range( pdl([5,2]) , $size , "periodic" )
    [
     [17 12 13]
     [23 18 19]
     [29 24 25]
    ]
Notice how the box wraps around the boundary of the piddle.
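The three forms of "range" used above can be collected into one script. This is a sketch with illustrative variable names ($grid, $picked, $box); the calls themselves are the ones shown in the shell session.

```perl
use strict;
use warnings;
use PDL;

my $grid = sequence(6, 7);

# One coordinate pair per row of the index piddle: four single elements.
my $picked = $grid->range( pdl([ [1,2], [2,3], [3,4], [4,5] ]) );
print "picked: ", join(',', $picked->list), "\n";

# A 3x3 box starting at (4,2). With "periodic", x = 6 wraps to x = 0.
my $box = $grid->range( pdl([4,2]), 3, "periodic" );
print "box dims: ", join(',', $box->dims), "\n";
print $box, "\n";
```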
Method: "ndcoords"
The "ndcoords" method is a convenience method that returns an enumerated list of coordinates suitable for use with the "range" method.
    pdl> p $piddle = sequence(3,3)
    [
     [0 1 2]
     [3 4 5]
     [6 7 8]
    ]
    pdl> p ndcoords($piddle)
    [
     [
      [0 0]
      [1 0]
      [2 0]
     ]
     [
      [0 1]
      [1 1]
      [2 1]
     ]
     [
      [0 2]
      [1 2]
      [2 2]
     ]
    ]
This can be a little hard to read. Basically it's saying that the coordinates for every element in $piddle are given by:
    (0,0) (1,0) (2,0)
    (0,1) (1,1) (2,1)
    (0,2) (1,2) (2,2)
Combining "range" and "ndcoords"
What really matters is that "ndcoords" is designed to work together with "range". With no $size parameter, you get the same piddle back.
    pdl> p $piddle
    [
     [0 1 2]
     [3 4 5]
     [6 7 8]
    ]
    pdl> p $piddle->range( ndcoords($piddle) )
    [
     [0 1 2]
     [3 4 5]
     [6 7 8]
    ]
Why would this be useful? Because now we can ask for a series of "boxes" for the entire piddle. For example, 2x2 boxes:
    pdl> p $piddle->range( ndcoords($piddle) , 2 , "periodic" )
The output of this function is difficult to read because the "boxes" lie along the last two dimensions. We can make the result more readable by rearranging the dimensions:
    pdl> p $piddle->range( ndcoords($piddle) , 2 , "periodic" )->reorder(2,3,0,1)
    [
     [
      [
       [0 1]
       [3 4]
      ]
      [
       [1 2]
       [4 5]
      ]
      ...
    ]
Here you can see more clearly that
    [0 1]
    [3 4]
is the 2x2 box starting with the (0,0) element of $piddle.
We are not done yet. For the game of life, we want 3x3 boxes from $a:
    pdl> p $a
    [
     [ 0  1  2  3  4  5]
     [ 6  7  8  9 10 11]
     [12 13 14 15 16 17]
     [18 19 20 21 22 23]
     [24 25 26 27 28 29]
     [30 31 32 33 34 35]
     [36 37 38 39 40 41]
    ]
    pdl> p $a->range( ndcoords($a) , 3 , "periodic" )->reorder(2,3,0,1)
    [
     [
      [
       [ 0  1  2]
       [ 6  7  8]
       [12 13 14]
      ]
      ...
    ]
We can confirm that this is the 3x3 box starting with the (0,0) element of $a. But there is one catch: we actually want the 3x3 box to be centred on (0,0). That is easy to fix. Just subtract 1 from all the coordinates in "ndcoords($a)". Remember that the "periodic" option takes care of making everything wrap around.
    pdl> p $a->range( ndcoords($a) - 1 , 3 , "periodic" )->reorder(2,3,0,1)
    [
     [
      [
       [41 36 37]
       [ 5  0  1]
       [11  6  7]
      ]
      [
       [36 37 38]
       [ 0  1  2]
       [ 6  7  8]
      ]
      ...
Now we see a 3x3 box with the (0,0) element in the centre of the box.
Method: "sumover"
The "sumover" method adds along only the first dimension. If we apply it twice, we will be adding all the elements of each 3x3 box.
    pdl> $n = $a->range(ndcoords($a)-1,3,"periodic")->reorder(2,3,0,1)
    pdl> p $n
    [
     [
      [
       [41 36 37]
       [ 5  0  1]
       [11  6  7]
      ]
      [
       [36 37 38]
       [ 0  1  2]
       [ 6  7  8]
      ]
      ...
    pdl> p $n->sumover->sumover
    [
     [144 135 144 153 162 153]
     [ 72  63  72  81  90  81]
     [126 117 126 135 144 135]
     [180 171 180 189 198 189]
     [234 225 234 243 252 243]
     [288 279 288 297 306 297]
     [216 207 216 225 234 225]
    ]
Use a calculator to confirm that 144 is the sum of all the elements in the first 3x3 box and 135 is the sum of all the elements in the second 3x3 box.
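To see what a single "sumover" does before applying it twice, here is a small sketch on a 3x2 piddle (the variable names are illustrative). Each call removes dimension 0 by summing along it.

```perl
use strict;
use warnings;
use PDL;

my $small = sequence(3, 2);     # [[0 1 2] [3 4 5]], dims [3,2]

my $rows  = $small->sumover;    # sums along dim 0: one total per row
my $total = $rows->sumover;     # sums the row totals: a single value

print "row sums: ", join(',', $rows->list), "\n";
print "total:    ", $total->sclr, "\n";
```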
Counting neighbours
We are almost there!
Adding up all the elements in a 3x3 box is not quite what we want. We don't want to count the centre cell. Fortunately, this is an easy fix:
    pdl> p $n->sumover->sumover - $a
    [
     [144 134 142 150 158 148]
     [ 66  56  64  72  80  70]
     [114 104 112 120 128 118]
     [162 152 160 168 176 166]
     [210 200 208 216 224 214]
     [258 248 256 264 272 262]
     [180 170 178 186 194 184]
    ]
When applied to Conway's Game of Life, this will tell us how many living neighbours each cell has:
    pdl> $a = zeroes(10,10)
    pdl> $a(1:3,1:3) .= pdl ( [1,1,1],
    ..(    >                  [0,0,1],
    ..(    >                  [0,1,0] )
    pdl> p $a
    [
     [0 0 0 0 0 0 0 0 0 0]
     [0 1 1 1 0 0 0 0 0 0]
     [0 0 0 1 0 0 0 0 0 0]
     [0 0 1 0 0 0 0 0 0 0]
     [0 0 0 0 0 0 0 0 0 0]
     [0 0 0 0 0 0 0 0 0 0]
     [0 0 0 0 0 0 0 0 0 0]
     [0 0 0 0 0 0 0 0 0 0]
     [0 0 0 0 0 0 0 0 0 0]
     [0 0 0 0 0 0 0 0 0 0]
    ]
    pdl> $n = $a->range(ndcoords($a)-1,3,"periodic")->reorder(2,3,0,1)
    pdl> $n = $n->sumover->sumover - $a
    pdl> p $n
    [
     [1 2 3 2 1 0 0 0 0 0]
     [1 1 3 2 2 0 0 0 0 0]
     [1 3 5 3 2 0 0 0 0 0]
     [0 1 1 2 1 0 0 0 0 0]
     [0 1 1 1 0 0 0 0 0 0]
     [0 0 0 0 0 0 0 0 0 0]
     [0 0 0 0 0 0 0 0 0 0]
     [0 0 0 0 0 0 0 0 0 0]
     [0 0 0 0 0 0 0 0 0 0]
     [0 0 0 0 0 0 0 0 0 0]
    ]
For example, this tells us that cell (0,0) has 1 living neighbour, while cell (2,2) has 5 living neighbours.
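The neighbour count can be wrapped in a small subroutine. This is a sketch: the name count_neighbours is hypothetical, and "slice" stands in for the NiceSlice syntax so the script runs without a source filter.

```perl
use strict;
use warnings;
use PDL;

# Hypothetical helper: number of living neighbours for every cell,
# with the board wrapping around at the edges.
sub count_neighbours {
    my ($board) = @_;
    # 3x3 box centred on each cell, with the box dims moved to the front.
    my $boxes = $board->range(ndcoords($board) - 1, 3, "periodic")
                      ->reorder(2, 3, 0, 1);
    # Sum each box, then remove the centre cell from the count.
    return $boxes->sumover->sumover - $board;
}

my $board = zeroes(10, 10);
$board->slice('1:3,1:3') .= pdl([1,1,1], [0,0,1], [0,1,0]);   # glider

my $n = count_neighbours($board);
print $n;
```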
Calculating the next generation
At this point, the variable $n has the number of living neighbours for every cell. Now we apply the rules of the game of life to calculate the next generation.
- If an empty cell has exactly three neighbours, a living cell is generated.
Get a list of cells that have exactly three neighbours:
    pdl> p ($n == 3)
    [
     [0 0 1 0 0 0 0 0 0 0]
     [0 0 1 0 0 0 0 0 0 0]
     [0 1 0 1 0 0 0 0 0 0]
     [0 0 0 0 0 0 0 0 0 0]
     [0 0 0 0 0 0 0 0 0 0]
     [0 0 0 0 0 0 0 0 0 0]
     [0 0 0 0 0 0 0 0 0 0]
     [0 0 0 0 0 0 0 0 0 0]
     [0 0 0 0 0 0 0 0 0 0]
     [0 0 0 0 0 0 0 0 0 0]
    ]
Get a list of empty cells that have exactly three neighbours:
    pdl> p (($n == 3) * !$a)
- If a living cell has fewer than 2 or more than 3 neighbours, it dies.
Get a list of cells that have exactly 2 or 3 neighbours:
    pdl> p (($n == 2) + ($n == 3))
    [
     [0 1 1 1 0 0 0 0 0 0]
     [0 0 1 1 1 0 0 0 0 0]
     [0 1 0 1 1 0 0 0 0 0]
     [0 0 0 1 0 0 0 0 0 0]
     [0 0 0 0 0 0 0 0 0 0]
     [0 0 0 0 0 0 0 0 0 0]
     [0 0 0 0 0 0 0 0 0 0]
     [0 0 0 0 0 0 0 0 0 0]
     [0 0 0 0 0 0 0 0 0 0]
     [0 0 0 0 0 0 0 0 0 0]
    ]
Get a list of living cells that have exactly 2 or 3 neighbours:
    pdl> p ((($n == 2) + ($n == 3)) * $a)
Putting it all together, the next generation is:
    pdl> $a = ((($n == 2) + ($n == 3)) * $a) + (($n == 3) * !$a)
    pdl> p $a
    [
     [0 0 1 0 0 0 0 0 0 0]
     [0 0 1 1 0 0 0 0 0 0]
     [0 1 0 1 0 0 0 0 0 0]
     [0 0 0 0 0 0 0 0 0 0]
     [0 0 0 0 0 0 0 0 0 0]
     [0 0 0 0 0 0 0 0 0 0]
     [0 0 0 0 0 0 0 0 0 0]
     [0 0 0 0 0 0 0 0 0 0]
     [0 0 0 0 0 0 0 0 0 0]
     [0 0 0 0 0 0 0 0 0 0]
    ]
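One whole generation can be packaged as a subroutine and checked against the output above. This is a sketch under stated assumptions: life_step is a hypothetical name, and the board is set up with "slice" rather than NiceSlice.

```perl
use strict;
use warnings;
use PDL;

# Hypothetical helper: compute the next generation of a wrapped board.
sub life_step {
    my ($board) = @_;
    # Living neighbours per cell (3x3 box sum minus the cell itself).
    my $n = $board->range(ndcoords($board) - 1, 3, "periodic")
                  ->reorder(2, 3, 0, 1)->sumover->sumover - $board;
    my $survivors = (($n == 2) + ($n == 3)) * $board;   # live cells that stay alive
    my $births    = ($n == 3) * !$board;                # empty cells that come alive
    return $survivors + $births;
}

my $board = zeroes(10, 10);
$board->slice('1:3,1:3') .= pdl([1,1,1], [0,0,1], [0,1,0]);   # glider

my $next = life_step($board);
print $next;
```

Returning a new piddle instead of modifying $board in place keeps the current generation available for comparison or display.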
Bonus feature: Graphics!
If you have PDL::Graphics::TriD installed, you can make a graphical version of the program by just changing three lines:
    #!/usr/bin/perl
    use PDL;
    use PDL::NiceSlice;
    use PDL::Graphics::TriD;

    my $a = zeroes(20,20);

    # Put in a simple glider.
    $a(1:3,1:3) .= pdl ( [1,1,1],
                         [0,0,1],
                         [0,1,0] );

    my $n;
    for (my $i = 0; $i < 100; $i++) {
        # Calculate the number of neighbours per cell.
        $n = $a->range(ndcoords($a)-1,3,"periodic")->reorder(2,3,0,1);
        $n = $n->sumover->sumover - $a;

        # Calculate the next generation.
        $a = ((($n == 2) + ($n == 3))* $a) + (($n==3) * !$a);

        # Display.
        nokeeptwiddling3d();
        imagrgb [$a];
    }
But if we really want to see something interesting, we should make a few more changes:
1) Start with a random collection of 1's and 0's.
2) Make the grid larger.
3) Add a small timeout so we can see the game evolve better.
4) Use a while loop so that the program can run as long as it needs to.
    #!/usr/bin/perl
    use PDL;
    use PDL::NiceSlice;
    use PDL::Graphics::TriD;
    use Time::HiRes qw(usleep);

    my $a = random(100,100);
    $a = ($a < 0.5);

    my $n;
    while (1) {
        # Calculate the number of neighbours per cell.
        $n = $a->range(ndcoords($a)-1,3,"periodic")->reorder(2,3,0,1);
        $n = $n->sumover->sumover - $a;

        # Calculate the next generation.
        $a = ((($n == 2) + ($n == 3))* $a) + (($n==3) * !$a);

        # Display.
        nokeeptwiddling3d();
        imagrgb [$a];

        # Sleep for 0.1 seconds.
        usleep(100000);
    }
CONCLUSION: GENERAL STRATEGY
The general strategy is: Move the dimensions you want to operate on to the start of your piddle's dimension list. Then let PDL thread over the higher dimensions.
Threading is a powerful tool that helps eliminate for-loops and can make your code more concise. Hopefully this tutorial has shown why it is worth getting to grips with threading in PDL.