14-High-level Parallel Programming Models

1.CSC2/458 Parallel and Distributed Systems
  High-level Parallel Programming Models
  Sreepathi Pai
  March 6, 2018
  URCS

2.Outline
  • Higher Level Parallel Programming
  • OpenMP
  • Cilk

3.Outline
  • Higher Level Parallel Programming
  • OpenMP
  • Cilk

4.Parallel Programming Models
  • What can be executed in parallel?
    • Parallelization
  • Where and when should it be executed?
    • Scheduling
  • Synchronization

5.Low-level Parallel Programming
  • Parallelization
    • Threading
    • Multiprocessing
  • Scheduling
    • Manual (thread pinning, etc.)
  • Synchronization
    • Manual (locks, barriers, etc.)

6.Higher-level Parallel Programming Models
  • Make parallel programming easy
    • for parallel programmers
  • Without "overburdening"
    • Programming language designers
    • Compiler writers
    • Computer architects

7.Building your own parallel programming models
  • Identify parallel patterns
    • Embarrassingly parallel
    • Reductions
    • Producer–Consumer
    • etc.
  • Provide structures for these parallel patterns
    • e.g. parallel for loop – forall
  • Programmers use these structures
  • Execute in parallel
    • Implement structures using lower-level programming models (sketched below)
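
To make the last bullet concrete, here is a minimal sketch of how a forall structure might be layered on top of pthreads. The names (forall, task_t, run_chunk) and the block decomposition are illustrative, not from the slides:

  #include <pthread.h>

  /* One contiguous chunk of iterations handed to one thread. */
  typedef struct {
      void (*body)(void *env, int i);   /* the loop body */
      void *env;                        /* shared state the body needs */
      int begin, end;                   /* iterations [begin, end) */
  } task_t;

  static void *run_chunk(void *arg) {
      task_t *t = arg;
      for (int i = t->begin; i < t->end; i++)
          t->body(t->env, i);
      return NULL;
  }

  /* Run body(env, i) for all i in [0, n) using T threads. */
  void forall(void (*body)(void *, int), void *env, int n, int T) {
      pthread_t tid[T];
      task_t task[T];
      for (int k = 0; k < T; k++) {
          task[k] = (task_t){ body, env, k * n / T, (k + 1) * n / T };
          pthread_create(&tid[k], NULL, run_chunk, &task[k]);
      }
      for (int k = 0; k < T; k++)
          pthread_join(tid[k], NULL);   /* forall returns only after all iterations finish */
  }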

8.Outline
  • Higher Level Parallel Programming
  • OpenMP
  • Cilk

9.OpenMP
  • Aimed at shared memory multithreaded programming
    • Within a single process and single computer node
  • Usually coupled with Message Passing Interface (MPI)
    • MPI is used for parallelism across processes and machines
    • MPI is not required for OpenMP programs; we will discuss it with distributed systems

10.OpenMP programming model
  • C/C++ and Fortran support
    • Must be supported by the compiler
    • gcc/clang both support OpenMP
  • Supports programmer-created threads
    • But you do not need to use them

11.Vector Addition
  void vector_add(int *a, int *b, int *c, int N) {
    for(int i = 0; i < N; i++) {
      c[i] = a[i] + b[i];
    }
  }
How would you parallelize this loop using pthreads?

12.Parallel Programming Pattern in Vector Addition
  • Create T threads
  • Distribute N/T work to each thread
    • N is size of array (or work, or loop iterations, etc.)
  • Pattern
    • Embarrassingly Parallel
    • Data Decomposition
    • Also called "map" pattern
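
As one concrete answer to the pthreads question on slide 11, vector addition drops straight into this pattern; the sketch below reuses the hypothetical forall from the slide 7 sketch (vadd_env and add_body are illustrative names):

  /* Assumes the forall sketch shown earlier is in scope. */
  struct vadd_env { int *a, *b, *c; };

  static void add_body(void *env, int i) {
      struct vadd_env *v = env;
      v->c[i] = v->a[i] + v->b[i];      /* iterations are independent: a "map" */
  }

  void vector_add_pthreads(int *a, int *b, int *c, int N) {
      struct vadd_env v = { a, b, c };
      forall(add_body, &v, N, 4);       /* T = 4 threads, block decomposition */
  }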

13.Vector Addition using OpenMP
  #include <omp.h>
  void vector_add(int *a, int *b, int *c, int N) {
    #pragma omp parallel for
    for(int i = 0; i < N; i++) {
      c[i] = a[i] + b[i];
    } /* implicit barrier after loop */
  }
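
A build note: OpenMP is enabled by a compiler flag, e.g. gcc -fopenmp vecadd.c -o vecadd (clang accepts -fopenmp as well; vecadd.c is a placeholder filename), and the number of threads can be controlled with the OMP_NUM_THREADS environment variable.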

14.Vector Sum
  int vector_sum(int *a, int N) {
    int sum = 0;
    for(int i = 0; i < N; i++) {
      sum += a[i];
    }
    return sum;
  }
How would you parallelize this loop using pthreads?

15.Parallel Programming Pattern in Vector Sum
  • Create T threads
  • Distribute N/T work to each thread
    • N is size of array (or work, or loop iterations, etc.)
  • Wait until each thread computes its sum
  • Compute the sum of these individual sums (recursively and in parallel)
  • Pattern
    • Reduction
    • Requires an associative operator
    • Also called "reduce"
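
A minimal pthreads sketch of this pattern: each thread accumulates a private partial sum, and the main thread combines them after joining (the names and fixed thread count are illustrative, and the final combine here is serial rather than a parallel tree):

  #include <pthread.h>

  #define T 4   /* illustrative thread count */

  struct part { int *a; int begin, end; int sum; };

  static void *partial_sum(void *arg) {
      struct part *p = arg;
      p->sum = 0;
      for (int i = p->begin; i < p->end; i++)
          p->sum += p->a[i];            /* private partial sum: no sharing, no locks */
      return NULL;
  }

  int vector_sum_pthreads(int *a, int N) {
      pthread_t tid[T];
      struct part p[T];
      for (int t = 0; t < T; t++) {
          p[t] = (struct part){ a, t * N / T, (t + 1) * N / T, 0 };
          pthread_create(&tid[t], NULL, partial_sum, &p[t]);
      }
      int sum = 0;
      for (int t = 0; t < T; t++) {
          pthread_join(tid[t], NULL);   /* wait for the thread, then fold in its sum */
          sum += p[t].sum;
      }
      return sum;
  }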

16.Vector Sum using OpenMP?
  int vector_sum(int *a, int N) {
    int sum = 0;
    #pragma omp parallel for
    for(int i = 0; i < N; i++) {
      sum += a[i];
    }
    return sum;
  }
Correct?

17.Vector Sum using OpenMP
  int vector_sum(int *a, int N) {
    int sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for(int i = 0; i < N; i++) {
      sum += a[i];
    } /* OpenMP aggregates sum across all threads */
    return sum;
  }
Other operators are also supported: +, *, -, max, min, &, &&, etc.
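
The same shape works with the other operators. For example, a sketch of a max reduction (vector_max is an illustrative name; N is assumed to be at least 1):

  int vector_max(int *a, int N) {
      int m = a[0];
      #pragma omp parallel for reduction(max:m)
      for (int i = 1; i < N; i++)
          if (a[i] > m)
              m = a[i];   /* each thread tracks a private max; OpenMP combines them */
      return m;
  }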

18.Low-level constructs in OpenMP
  • Thread creation
    • #pragma omp parallel (without for)
  • Critical Sections
    • #pragma omp critical
  • Many others
  • Usually better integrated with C/C++ than pthreads
  • Other "task parallel" constructs too
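
For example, a small sketch combining a bare parallel region with a critical section (the program is illustrative; the pragmas and omp_get_thread_num are standard OpenMP):

  #include <stdio.h>
  #include <omp.h>

  int main(void) {
      int total = 0;
      #pragma omp parallel          /* creates a team of threads; no loop involved */
      {
          int id = omp_get_thread_num();
          #pragma omp critical      /* one thread at a time in this block */
          total += id;
      }                             /* implicit barrier at end of parallel region */
      printf("total = %d\n", total);
      return 0;
  }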

19.Outline
  • Higher Level Parallel Programming
  • OpenMP
  • Cilk

20.Cilk programming model
  • Also C/C++
  • Two keywords:
    • cilk_spawn
    • cilk_sync

21.Fibonacci calculation example
  int fib(int n) {
    if (n < 2) return n;
    int x = fib(n-1);
    int y = fib(n-2);
    return x + y;
  }

22.Fibonacci calculation in Cilk
  #include <cilk/cilk.h>
  int fib(int n) {
    if (n < 2) return n;
    int x = cilk_spawn fib(n-1);
    int y = fib(n-2);
    cilk_sync;
    return x + y;
  }
  • cilk_spawn allows fib(n-1) to execute in parallel with the rest of fib (the continuation)
  • cilk_sync waits for all spawns from the current function to finish
  • All functions contain an implicit cilk_sync at exit
  • cilk_spawn is like fork, cilk_sync is like join
  • However, these are only suggestions; the spawned call may not actually execute in parallel
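
A build note: at the time of these slides, the Cilk Plus keywords came from #include <cilk/cilk.h> and were supported by icc and by GCC 4.9 through 7 (e.g. gcc -fcilkplus -lcilkrts fib.c, where fib.c is a placeholder filename); GCC later removed Cilk Plus support.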

23.Scheduling in Cilk
  • cilk_spawn creates two tasks
    • Actually one task, and one continuation
  • How are these tasks mapped to cores?
  • Tasks can produce other tasks

24.Work stealing in Cilk - I
  • Assume each core has a deque (double-ended queue)
  • Tasks spawned (and their continuations) are added to the deque of a core
    • And pulled off the head of the deque and executed
  • Will this keep all cores busy?

25.Work stealing in Cilk - II
  • Assume a core runs out of tasks
    • It randomly picks another core
    • And pulls a task off the tail of that core's deque and executes it
  • Will this keep all cores busy?
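
A simplified sketch of these two operations, using a mutex-protected doubly-linked deque. Real Cilk uses a carefully optimized, mostly lock-free protocol, and every name here is illustrative; as on the slides, the owner pops at the head while a thief steals from the tail:

  #include <stdlib.h>
  #include <pthread.h>

  typedef struct task { struct task *next, *prev; /* ... the actual work ... */ } task_t;

  typedef struct {
      pthread_mutex_t lock;
      task_t *head, *tail;
  } deque_t;

  /* Owner: pull the next task off the head of its own deque. */
  task_t *pop_head(deque_t *d) {
      pthread_mutex_lock(&d->lock);
      task_t *t = d->head;
      if (t) {
          d->head = t->next;
          if (d->head) d->head->prev = NULL; else d->tail = NULL;
      }
      pthread_mutex_unlock(&d->lock);
      return t;
  }

  /* Thief: steal a task from the tail of a victim's deque. */
  task_t *steal_tail(deque_t *victim) {
      pthread_mutex_lock(&victim->lock);
      task_t *t = victim->tail;
      if (t) {
          victim->tail = t->prev;
          if (victim->tail) victim->tail->next = NULL; else victim->head = NULL;
      }
      pthread_mutex_unlock(&victim->lock);
      return t;
  }

  /* Idle core: pick random victims until a steal succeeds
     (a real scheduler would also detect termination). */
  task_t *find_work(deque_t *deques, int ncores, int self) {
      for (;;) {
          int v = rand() % ncores;
          if (v == self) continue;
          task_t *t = steal_tail(&deques[v]);
          if (t) return t;
      }
  }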

26.Embarrassingly parallel loops in Cilk
  for (int i = 0; i < 8; ++i) {
    cilk_spawn do_work(i);
  }
  cilk_sync;
  • Apparently not a good idea in Cilk
  • Why?

27.Embarrassingly parallel loops in CilkPlus
  cilk_for (int i = 0; i < 8; ++i) {
    do_work(i);
  }
  /* no explicit cilk_sync needed: cilk_for joins all iterations */
  • Unlike the serial chain of spawns on the previous slide, cilk_for recursively splits the iteration range, so the spawning itself happens in parallel

28.Acknowledgements Cilk/Cilkplus material from the Cilkplus tutorial