# Lecture 5 – Parallel Programming Patterns Overview and Map Pattern


1. Parallel Programming Patterns Overview and Map Pattern
   Parallel Computing, CIS 410/510
   Department of Computer and Information Science

2. Outline
   - Parallel programming models
   - Dependencies
   - Structured programming patterns overview
     - Serial / parallel control flow patterns
     - Serial / parallel data management patterns
   - Map pattern
     - Optimizations: sequences of maps, code fusion, cache fusion
     - Related patterns
     - Example: Scaled Vector Addition (SAXPY)

3. Parallel Models 101
   - Sequential model
     - The von Neumann (RAM) model: a single processor connected to memory
   - Parallel model
     - A parallel computer is simply a collection of processors interconnected in some manner to coordinate activities and exchange data
     - Models that can be used as general frameworks for describing and analyzing parallel algorithms
       - Simplicity: description, analysis, architecture independence
       - Implementability: able to be realized, reflects performance
   - Three common parallel models
     - Directed acyclic graphs, shared memory, network

4. Directed Acyclic Graphs (DAG)
   - Captures data flow parallelism
   - Nodes represent operations to be performed
     - Inputs are nodes with no incoming arcs
     - Outputs are nodes with no outgoing arcs
     - Think of nodes as tasks
   - Arcs are paths for the flow of data results
   - The DAG represents the operations of the algorithm and implies precedence constraints on their order; the loop below unrolls into a chain a[0] → a[1] → … → a[99]:

         for (i=1; i<100; i++)
             a[i] = a[i-1] + 100;

5. Shared Memory Model
   - Parallel extension of the RAM model (PRAM): processors P1, P2, …, PN all attached to one shared memory
     - Memory size is infinite
     - Number of processors is unbounded
     - Processors communicate via the memory
     - Every processor accesses any memory location in 1 cycle
     - Synchronous
       - All processors execute the same algorithm synchronously: READ phase, COMPUTE phase, WRITE phase
       - Some subset of the processors can stay idle
     - Asynchronous

6. Network Model
   - G = (N, E): a network of processing nodes (e.g., a grid P11 … PNN)
     - N are processing nodes
     - E are bidirectional communication links
   - Each processor has its own memory
   - No shared memory is available
   - Network operation may be synchronous or asynchronous
   - Requires communication primitives
     - Send(X, i)
     - Receive(Y, j)
   - Captures the message passing model for algorithm design (see the sketch below)
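The Send/Receive primitives map naturally onto message-passing libraries such as MPI. Below is a minimal hedged sketch in C (MPI is one possible realization; the payload, ranks, and tag are illustrative assumptions, not from the slides):

```c
/* Minimal message-passing sketch with MPI: rank 0 sends an integer
 * to rank 1, mirroring Send(X, i) / Receive(Y, j). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int x = 42;                      /* illustrative payload */
    if (rank == 0) {
        MPI_Send(&x, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);   /* Send(X, 1)    */
    } else if (rank == 1) {
        int y;
        MPI_Recv(&y, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);                       /* Receive(Y, 0) */
        printf("rank 1 received %d\n", y);
    }
    MPI_Finalize();
    return 0;
}
```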

7. Parallelism
   - Ability to execute different parts of a computation concurrently on different machines
   - Why do you want parallelism?
     - Shorter running time or handling more work
   - What is being parallelized?
     - Task: instruction, statement, procedure, …
     - Data: data flow, size, replication
     - Parallelism granularity
       - Coarse-grained versus fine-grained
   - Thinking about parallelism
   - Evaluation

8. Why is parallel programming important?
   - Parallel programming has matured
     - Standard programming models
     - Common machine architectures
     - The programmer can focus on the computation and use a suitable programming model for implementation
   - Increased portability between models and architectures
   - Reasonable hope of portability across platforms
   - Problems
     - Performance optimization is still platform-dependent
     - Performance portability is a problem
     - Parallel programming methods are still evolving

9. Parallel Algorithm
   - A recipe to solve a problem "in parallel" on multiple processing elements
   - Standard steps for constructing a parallel algorithm
     - Identify work that can be performed concurrently
     - Partition the concurrent work onto separate processors
     - Properly manage input, output, and intermediate data
     - Coordinate data accesses and work to satisfy dependencies
   - Which of these are hard to do?

10. Parallelism Views
   - Where can we find parallelism?
   - Program (task) view
     - Statement level
       - Between program statements
       - Which statements can be executed at the same time?
     - Block level / loop level / routine level / process level
       - Larger-grained program statements
   - Data view
     - How is data operated on?
     - Where does data reside?
   - Resource view

11. Parallelism, Correctness, and Dependence
   - Parallel execution, from any point of view, is constrained by the sequence of operations that must be performed for a correct result
   - Parallel execution must address control, data, and system dependences
   - A dependence arises when one operation depends on an earlier operation completing and producing a result before the later operation can be performed
   - We extend this notion of dependence to resources, since some operations may depend on certain resources
     - For example, due to where data is located

12. Executing Two Statements in Parallel
   - Want to execute two statements in parallel
   - On one processor:
         Statement 1;
         Statement 2;
   - On two processors:
         Processor 1:       Processor 2:
         Statement 1;       Statement 2;
   - Fundamental (concurrent) execution assumptions
     - Processors execute independently of each other
     - No assumptions are made about the speed of processor execution

13. Sequential Consistency in Parallel Execution
   - Case 1 (in time order):
         Processor 1:       Processor 2:
         statement 1;
                            statement 2;
   - Case 2 (in time order):
         Processor 1:       Processor 2:
                            statement 2;
         statement 1;
   - Sequential consistency
     - Statement executions do not interfere with each other
     - Computation results are the same (independent of order)

14. Independent versus Dependent
   - In other words, the execution of
         statement1;
         statement2;
     must be equivalent to
         statement2;
         statement1;
   - Their order of execution must not matter!
   - If this is true, the statements are independent of each other
   - Two statements are dependent when the order of their execution affects the computation outcome

15. Examples
   - Example 1: statements are independent
         S1: a=1;
         S2: b=1;
   - Example 2: dependent (true/flow dependence)
         S1: a=1;
         S2: b=a;
     The second statement depends on the first. Can you remove the dependency?
   - Example 3: dependent (output dependence)
         S1: a=f(x);
         S2: a=b;
     The second statement depends on the first. Can you remove the dependency? How?
   - Example 4: dependent (anti-dependence)
         S1: a=b;
         S2: b=1;
     The first statement depends on the second. Can you remove the dependency? How? (See the renaming sketch below.)
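The removal questions in Examples 3 and 4 point toward renaming: name dependences disappear when the conflicting storage is duplicated. A minimal sketch, with the helper f and the renamed variables (a1, a2, b_old) as illustrative assumptions:

```c
#include <stdio.h>

int f(int x) { return x * 2; }   /* illustrative stand-in */

int main(void) {
    int x = 3, b = 7;

    /* Example 3, output dependence: both statements wrote 'a'.
     * Renaming the targets removes the dependence. */
    int a1 = f(x);               /* was: a = f(x); */
    int a2 = b;                  /* was: a = b; later uses pick a1 or a2 */

    /* Example 4, anti-dependence: S1 read 'b', S2 wrote 'b'.
     * Copying 'b' first makes the statements independent. */
    int b_old = b;               /* snapshot the value S1 needs */
    int a = b_old;               /* was: a = b; */
    b = 1;                       /* can now run in parallel with the line above */

    printf("%d %d %d %d\n", a1, a2, a, b);
    return 0;
}
```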

16. True Dependence and Anti-Dependence
   - Given statements S1 and S2 executed in order:
         S1;
         S2;
   - S2 has a true (flow) dependence on S1, written S1 δ S2, if and only if S2 reads a value written by S1:
         S1: X = ...
         S2: ... = X
   - S2 has an anti-dependence on S1, written S1 δ⁻¹ S2, if and only if S2 writes a value read by S1:
         S1: ... = X
         S2: X = ...

17. Output Dependence
   - Given statements S1 and S2 executed in order:
         S1;
         S2;
   - S2 has an output dependence on S1, written S1 δ⁰ S2, if and only if S2 writes a variable written by S1:
         S1: X = ...
         S2: X = ...
   - Anti- and output dependences are "name" dependences
     - Are they "true" dependences?
   - How can you get rid of output dependences?
     - Are there cases where you cannot?

18. Statement Dependency Graphs
   - Can use graphs to show dependence relationships
   - Example:
         S1: a=1;
         S2: b=a;
         S3: a=b+1;
         S4: c=a;
     Edges: flow S1→S2, S2→S3, S3→S4; output S1→S3; anti S2→S3
   - S2 δ S3 : S3 is flow-dependent on S2
   - S1 δ⁰ S3 : S3 is output-dependent on S1
   - S2 δ⁻¹ S3 : S3 is anti-dependent on S2

19. When can two statements execute in parallel?
   - Statements S1 and S2 can execute in parallel if and only if there are no dependences between S1 and S2
     - True dependences
     - Anti-dependences
     - Output dependences
   - Some dependences can be removed by modifying the program
     - Rearranging statements
     - Eliminating statements

20. How do you compute dependence?
   - Data dependence relations can be found by comparing the IN and OUT sets of each node
   - The IN and OUT sets of a statement S are defined as:
     - IN(S): the set of memory locations (variables) that may be used (read) in S
     - OUT(S): the set of memory locations (variables) that may be modified (written) by S
   - Note that these sets include all memory locations that may be fetched or modified
   - As such, the sets can be conservatively large

21. IN / OUT Sets and Computing Dependence
   - Assuming there is a path from S1 to S2, intersect the IN and OUT sets to test for data dependence:

         out(S1) ∩ in(S2)  ≠ ∅   ⇒   S1 δ S2     (flow dependence)
         in(S1)  ∩ out(S2) ≠ ∅   ⇒   S1 δ⁻¹ S2   (anti-dependence)
         out(S1) ∩ out(S2) ≠ ∅   ⇒   S1 δ⁰ S2    (output dependence)
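As a concrete illustration (not from the slides), the tests can be mechanized as set intersections. The sketch below models IN/OUT sets as strings of single-letter variable names for the statements S1: a=b; and S2: b=a+1;:

```c
/* Toy sketch of the three dependence tests: IN/OUT sets are strings
 * of single-letter variable names, intersection is a char overlap. */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

static bool intersects(const char *a, const char *b) {
    for (; *a; a++)
        if (strchr(b, *a))      /* any variable in common? */
            return true;
    return false;
}

int main(void) {
    /* S1: a = b;    S2: b = a + 1;   (illustrative statements) */
    const char *in1 = "b", *out1 = "a";
    const char *in2 = "a", *out2 = "b";

    printf("flow   S1 d   S2: %d\n", intersects(out1, in2));  /* {a} -> 1 */
    printf("anti   S1 d-1 S2: %d\n", intersects(in1, out2));  /* {b} -> 1 */
    printf("output S1 d0  S2: %d\n", intersects(out1, out2)); /* {}  -> 0 */
    return 0;
}
```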

22. Loop-Level Parallelism
   - Significant parallelism can be identified within loops:

         for (i=0; i<100; i++)        for (i=0; i<100; i++) {
             S1: a[i] = i;                S1: a[i] = i;
                                          S2: b[i] = 2*i;
                                      }

   - Dependencies? What about i, the loop index?
   - DOALL loop (a.k.a. foreach loop)
     - All iterations are independent of each other
     - All statements can be executed in parallel at the same time
       - Is this really true? (See the sketch below.)
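A DOALL loop is exactly what loop-parallel frameworks expose. A hedged sketch using OpenMP, one common way to realize it (the slides do not prescribe an API):

```c
/* DOALL loop sketch with OpenMP: every iteration is independent,
 * so the runtime may execute them in any order or concurrently. */
#include <omp.h>
#include <stdio.h>

int main(void) {
    int a[100], b[100];

    #pragma omp parallel for
    for (int i = 0; i < 100; i++) {
        a[i] = i;        /* S1: no iteration reads another's a[i] */
        b[i] = 2 * i;    /* S2: likewise independent              */
    }

    printf("%d %d\n", a[99], b[99]);
    return 0;
}
```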

23. Iteration Space
   - Unroll the loop into separate statements / iterations
   - Show dependences between iterations:

         for (i=0; i<100; i++)        for (i=0; i<100; i++) {
             S1: a[i] = i;                S1: a[i] = i;
                                          S2: b[i] = 2*i;
                                      }

     Unrolled, the left loop becomes S1_0, S1_1, …, S1_99 and the right loop adds S2_0, S2_1, …, S2_99, with no dependence arcs between any of the instances

24. Multi-Loop Parallelism
   - Significant parallelism can be identified between loops:

         for (i=0; i<100; i++)
             a[i] = i;
         for (i=0; i<100; i++)
             b[i] = i;

   - Dependencies?
   - How much parallelism is available?
   - Given 4 processors, how much parallelism is possible?
   - What parallelism is achievable with 50 processors? (A fusion sketch follows.)
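Because the two loops touch disjoint arrays, they can be fused into one loop, which is the "code fusion" optimization named in the outline. A minimal sketch, assuming the arrays are as declared on the slide:

```c
/* Code-fusion sketch: two independent maps over disjoint arrays
 * collapse into a single loop, exposing both statements as one map. */
void fused(int a[100], int b[100]) {
    for (int i = 0; i < 100; i++) {
        a[i] = i;   /* body of the first loop  */
        b[i] = i;   /* body of the second loop */
    }
}
```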

25. Loops with Dependencies

         Case 1:                        Case 2:
         for (i=1; i<100; i++)          for (i=5; i<100; i++)
             a[i] = a[i-1] + 100;           a[i-5] = a[i] + 100;

   - Dependencies? What type?
     - Case 1: each a[i] depends on a[i-1], forming one chain a[0] → a[1] → … → a[99]
     - Case 2: iteration i reads a[i], which iteration i+5 later overwrites
   - Is the Case 1 loop parallelizable?
   - Is the Case 2 loop parallelizable? (See the rewrite sketch below.)
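Case 1's flow dependence makes each iteration wait on the previous one, so that loop stays serial. Case 2's anti-dependence is only a name dependence, so copying the values that are read before any writes occur makes the iterations independent. A hedged sketch (the snapshot array is an illustrative device):

```c
#include <string.h>

/* Case 2 rewrite sketch: break the anti-dependence by reading from
 * a snapshot, so no iteration can observe another's write. */
void case2(int a[100]) {
    int old[100];
    memcpy(old, a, sizeof old);       /* snapshot before any writes */
    for (int i = 5; i < 100; i++)     /* now a DOALL loop            */
        a[i - 5] = old[i] + 100;
}
```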

26. Another Loop Example

         for (i=1; i<100; i++)
             a[i] = f(a[i-1]);

   - Dependencies?
     - What type?
   - Loop iterations are not parallelizable
     - Why not?

27. Loop Dependencies
   - A loop-carried dependence is a dependence that is present only if the statements are part of the execution of a loop (i.e., between two statement instances in two different iterations of the loop)
   - Otherwise, it is loop-independent, including between two statement instances in the same loop iteration
   - Loop-carried dependences can prevent loop iteration parallelization
   - A dependence is lexically forward if the source comes before the target, and lexically backward otherwise
     - Unroll the loop to see (a sketch follows)
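A minimal sketch distinguishing the two kinds in one loop (the body and arrays are illustrative assumptions):

```c
/* One loop containing both a loop-independent and a loop-carried
 * dependence between its two statements. */
void example(int a[100], int t[100]) {
    for (int i = 1; i < 100; i++) {
        t[i] = a[i - 1] * 2;  /* S1: reads S2's write from the PREVIOUS
                                 iteration: loop-carried, lexically backward */
        a[i] = t[i] + 1;      /* S2: reads S1's write from THIS iteration:
                                 loop-independent, lexically forward */
    }
}
```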

28. Loop Dependence Example

         for (i=0; i<100; i++)
             a[i+10] = f(a[i]);

   - Dependencies?
     - Between a[0], a[10], a[20], …
     - Between a[1], a[11], a[21], …, and so on for each index class mod 10
   - Some parallel execution is possible
     - How much? (See the sketch below.)
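The dependence distance here is 10, so the iterations fall into 10 independent chains (indices congruent mod 10): chains may run in parallel while each chain remains serial. A hedged OpenMP sketch of that decomposition (again one possible realization):

```c
#include <omp.h>

/* Sketch: the iterations of a[i+10] = f(a[i]) form 10 independent
 * chains. Parallelize ACROSS chains, stay sequential WITHIN each.
 * 'a' must hold at least 110 elements; 'f' is as on the slide. */
void chains(int a[110], int (*f)(int)) {
    #pragma omp parallel for
    for (int c = 0; c < 10; c++)              /* one chain per c         */
        for (int i = c; i < 100; i += 10)     /* serial walk down chain  */
            a[i + 10] = f(a[i]);
}
```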

29. Dependences Between Iterations

         for (i=1; i<100; i++) {
             S1: a[i] = …;
             S2: … = a[i-1];
         }

   - Dependencies?
     - Between a[i] and a[i-1]: S2 in iteration i reads what S1 wrote in iteration i-1
   - Is parallelism possible?
     - Statements can be executed in a "pipeline" manner: S1 of iteration i+1 can run concurrently with S2 of iteration i