CSC 2/458: Parallel and Distributed Systems
High-level Parallel Programming Models
Sreepathi Pai
March 6, 2018, URCS

Outline
• Higher Level Parallel Programming
• OpenMP
• Cilk

Parallel Programming Models
• What can be executed in parallel?
  • Parallelization
• Where and when should it be executed?
  • Scheduling
• Synchronization

Low-level Parallel Programming
• Parallelization
  • Threading
  • Multiprocessing
• Scheduling
  • Manual (thread pinning, etc.)
• Synchronization
  • Manual (locks, barriers, etc.)

Higher-level Parallel Programming Models
• Make parallel programming easy
  • for parallel programmers
• Without "overburdening"
  • Programming language designers
  • Compiler writers
  • Computer architects

Building your own parallel programming models
• Identify parallel patterns
  • Embarrassingly parallel
  • Reductions
  • Producer–Consumer
  • etc.
• Provide structures for these parallel patterns
  • e.g. a parallel for loop: forall
• Programmers use these structures
• Execute in parallel
  • Implement the structures using lower-level programming models

OpenMP
• Aimed at shared-memory multithreaded programming
  • Within a single process and a single compute node
• Usually coupled with the Message Passing Interface (MPI)
  • MPI is used for parallelism across processes and machines
  • MPI is not required for OpenMP programs; we will discuss it when we cover distributed systems

OpenMP programming model
• C/C++ and Fortran support
  • Must be supported by the compiler
  • gcc and clang both support OpenMP
• Supports programmer-created threads
  • But you do not need to use them

Vector Addition

    void vector_add(int *a, int *b, int *c, int N) {
        for(int i = 0; i < N; i++) {
            c[i] = a[i] + b[i];
        }
    }

How would you parallelize this loop using pthreads?

Parallel Programming Pattern in Vector Addition
• Create T threads
• Distribute N/T work to each thread
  • N is the size of the array (or work, or loop iterations, etc.)
• Pattern
  • Embarrassingly Parallel
  • Data Decomposition
  • Also called the "map" pattern
• A pthreads version following this recipe is sketched below
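
As a concrete answer to the pthreads question on the previous slide, here is a minimal sketch of this recipe. It is not from the slides: the fixed thread count T, the struct range type, and the helper names are all illustrative choices.

    #include <pthread.h>

    #define T 4   /* number of threads: an arbitrary choice for this sketch */

    struct range { int *a, *b, *c; int lo, hi; };

    static void *add_worker(void *arg) {
        struct range *r = arg;
        for (int i = r->lo; i < r->hi; i++)
            r->c[i] = r->a[i] + r->b[i];
        return NULL;
    }

    void vector_add_pthreads(int *a, int *b, int *c, int N) {
        pthread_t th[T];
        struct range r[T];
        for (int t = 0; t < T; t++) {
            /* give thread t the slice [t*N/T, (t+1)*N/T) */
            r[t] = (struct range){ a, b, c, t * N / T, (t + 1) * N / T };
            pthread_create(&th[t], NULL, add_worker, &r[t]);
        }
        for (int t = 0; t < T; t++)
            pthread_join(th[t], NULL);   /* the "barrier" is explicit here */
    }

Note how much bookkeeping (argument structs, slice arithmetic, joins) the one-line OpenMP pragma on the next slide replaces.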

Vector Addition using OpenMP

    #include <omp.h>

    void vector_add(int *a, int *b, int *c, int N) {
        #pragma omp parallel for
        for(int i = 0; i < N; i++) {
            c[i] = a[i] + b[i];
        }
        /* implicit barrier after loop */
    }
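
A usage detail not on the slide: OpenMP is enabled by a compiler flag, e.g. gcc -fopenmp vector_add.c (clang accepts the same flag). Without the flag the pragma is simply ignored and the loop runs serially; the thread count can be controlled with the OMP_NUM_THREADS environment variable.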

Vector Sum

    void vector_sum(int *a, int N) {
        int sum = 0;
        for(int i = 0; i < N; i++) {
            sum += a[i];
        }
    }

How would you parallelize this loop using pthreads?

Parallel Programming Pattern in Vector Sum
• Create T threads
• Distribute N/T work to each thread
  • N is the size of the array (or work, or loop iterations, etc.)
• Wait until each thread computes its partial sum
• Compute the sum of these individual sums (recursively and in parallel)
• Pattern
  • Reduction
  • Requires an associative operator
  • Also called "reduce"
• A pthreads version is sketched below
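
A minimal pthreads sketch of this reduction pattern (again not from the slides; T, the globals, and the helper names are illustrative). For simplicity the final combine over the T partial sums is done serially here; the slide's "recursively and in parallel" refers to doing that combine as a tree, which matters only when T is large.

    #include <pthread.h>

    #define T 4

    static int *g_a;                 /* shared input array */
    static int g_N;
    static long partial[T];          /* one slot per thread: no sharing */

    static void *sum_worker(void *arg) {
        long tid = (long)arg;
        long s = 0;
        for (int i = tid * g_N / T; i < (tid + 1) * g_N / T; i++)
            s += g_a[i];
        partial[tid] = s;            /* publish this thread's partial sum */
        return NULL;
    }

    long vector_sum_pthreads(int *a, int N) {
        g_a = a; g_N = N;
        pthread_t th[T];
        for (long t = 0; t < T; t++)
            pthread_create(&th[t], NULL, sum_worker, (void *)t);
        long sum = 0;
        for (long t = 0; t < T; t++) {
            pthread_join(th[t], NULL);   /* wait, then combine */
            sum += partial[t];
        }
        return sum;
    }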

Vector Sum using OpenMP?

    void vector_sum(int *a, int N) {
        int sum = 0;
        #pragma omp parallel for
        for(int i = 0; i < N; i++) {
            sum += a[i];
        }
    }

Correct? (No: every thread updates the shared variable sum concurrently, which is a data race.)

Vector Sum using OpenMP

    void vector_sum(int *a, int N) {
        int sum = 0;
        #pragma omp parallel for reduction(+:sum)
        for(int i = 0; i < N; i++) {
            sum += a[i];
        }
        /* OpenMP aggregates sum across all threads */
    }

Other operators are also supported: +, *, -, max, min, &, &&, etc.
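
To connect this back to the reduction pattern: the pragma is essentially shorthand for giving each thread a private accumulator and combining the private copies at the end. A hand-rolled sketch of that expansion (my illustration, not the actual code OpenMP generates):

    #include <omp.h>

    void vector_sum_manual(int *a, int N, int *out) {
        int sum = 0;
        #pragma omp parallel
        {
            int local = 0;           /* private per-thread accumulator */
            #pragma omp for
            for (int i = 0; i < N; i++)
                local += a[i];
            #pragma omp atomic
            sum += local;            /* combine once per thread, not per element */
        }
        *out = sum;
    }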

Low-level constructs in OpenMP
• Thread creation
  • #pragma omp parallel (without for)
• Critical sections
  • #pragma omp critical
• Many others
  • Usually better integrated with C/C++ than pthreads
  • Other "task parallel" constructs too
• A small example combining these appears below
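
The small example promised above (a sketch; the printing is just to make the threads visible):

    #include <omp.h>
    #include <stdio.h>

    int main(void) {
        #pragma omp parallel         /* creates a team of threads, no loop */
        {
            int tid = omp_get_thread_num();
            #pragma omp critical     /* one thread in this block at a time */
            printf("hello from thread %d of %d\n",
                   tid, omp_get_num_threads());
        }
        return 0;
    }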

Cilk programming model
• Also C/C++
• Two keywords:
  • cilk_spawn
  • cilk_sync

Fibonacci calculation example

    int fib(int n) {
        if (n < 2)
            return n;
        int x = fib(n-1);
        int y = fib(n-2);
        return x + y;
    }

Fibonacci calculation in Cilk

    int fib(int n) {
        if (n < 2)
            return n;
        int x = cilk_spawn fib(n-1);
        int y = fib(n-2);
        cilk_sync;
        return x + y;
    }

• cilk_spawn allows fib(n-1) to execute in parallel with the rest of fib
• cilk_sync waits for all spawns from the current function to finish
  • All functions contain an implicit cilk_sync at exit
• cilk_spawn is like fork, cilk_sync is like join
• However, these are suggestions; the spawned call may not actually execute in parallel
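
A usage note not on the slide: these keywords require a Cilk-capable compiler and the <cilk/cilk.h> header. At the time of these slides that meant Intel's compiler or gcc 5 through 7 with -fcilkplus; the actively maintained implementation today is OpenCilk, whose clang uses -fopencilk.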

Scheduling in Cilk
• cilk_spawn creates two tasks
  • Actually, one task and one continuation
• How are these tasks mapped to cores?
• Tasks can produce other tasks
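
To make the task/continuation split concrete, here is the fib example again with annotations (the annotations are mine; the behavior described is standard Cilk work-stealing semantics):

    #include <cilk/cilk.h>

    int fib(int n) {
        if (n < 2)
            return n;
        int x = cilk_spawn fib(n-1);   /* the "task": the spawned call,
                                          executed by the spawning worker */
        /* the "continuation": everything from here to the sync. It is what
           gets pushed on the deque, and what an idle worker may steal; if
           nobody steals it, this worker resumes it after fib(n-1) returns. */
        int y = fib(n-2);
        cilk_sync;                     /* cannot pass until fib(n-1) is done */
        return x + y;
    }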

Work stealing in Cilk - I
• Assume each core has a deque (double-ended queue)
• Tasks spawned (and their continuations) are added to the deque of a core
  • And pulled off the head of the deque and executed
• Will this keep all cores busy?

Work stealing in Cilk - II
• Assume a core runs out of tasks
• It randomly picks another core
  • And pulls a task off the tail of that core's deque and executes it
• Will this keep all cores busy?
• A toy sketch of this deque discipline is shown below
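
Below is a toy, mutex-based sketch of the discipline described on these two slides: owners push and pop work at the head of their own deque, and idle workers steal from the tail of a random victim's deque. All names and sizes are mine, tasks here are plain ints that never spawn children, and real Cilk uses a far more efficient lock-free protocol; this only illustrates the policy.

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define NWORKERS 4
    #define NTASKS   32

    typedef struct {
        int buf[NTASKS];
        int tail, head;               /* tasks live in buf[tail .. head-1] */
        pthread_mutex_t lock;
    } Deque;

    static Deque dq[NWORKERS];
    static atomic_int done;           /* total tasks completed */

    static int pop_head(Deque *d) {   /* owner end (the slide's "head") */
        int t = -1;
        pthread_mutex_lock(&d->lock);
        if (d->head > d->tail) t = d->buf[--d->head];
        pthread_mutex_unlock(&d->lock);
        return t;
    }

    static int steal_tail(Deque *d) { /* thief end (the slide's "tail") */
        int t = -1;
        pthread_mutex_lock(&d->lock);
        if (d->head > d->tail) t = d->buf[d->tail++];
        pthread_mutex_unlock(&d->lock);
        return t;
    }

    static void *worker(void *arg) {
        long id = (long)arg;
        while (atomic_load(&done) < NTASKS) {
            int t = pop_head(&dq[id]);                   /* own work first */
            if (t < 0)
                t = steal_tail(&dq[rand() % NWORKERS]);  /* then steal */
            if (t < 0) continue;                 /* busy-wait: fine for a toy */
            printf("worker %ld ran task %d\n", id, t);
            atomic_fetch_add(&done, 1);
        }
        return NULL;
    }

    int main(void) {
        for (int w = 0; w < NWORKERS; w++)
            pthread_mutex_init(&dq[w].lock, NULL);
        for (int t = 0; t < NTASKS; t++)   /* seed all work on worker 0 */
            dq[0].buf[dq[0].head++] = t;
        pthread_t th[NWORKERS];
        for (long w = 0; w < NWORKERS; w++)
            pthread_create(&th[w], NULL, worker, (void *)w);
        for (int w = 0; w < NWORKERS; w++)
            pthread_join(th[w], NULL);
        return 0;
    }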

Embarrassingly parallel loops in Cilk

    for (int i = 0; i < 8; ++i) {
        cilk_spawn do_work(i);
    }
    cilk_sync;

• Apparently this is not a good idea in Cilk
• Why?

Embarrassingly parallel loops in Cilk Plus

    cilk_for (int i = 0; i < 8; ++i) {
        do_work(i);
    }
    cilk_sync;
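
The answer to the previous slide's question: each cilk_spawn in the plain loop is issued serially by one worker, so creating n tasks takes n serial steps, and a thief that steals the loop's continuation takes over that serial spawning rather than a big chunk of work. cilk_for avoids this by splitting the iteration range divide-and-conquer style, roughly like the following sketch (GRAIN and the recursion shape are my illustration, not the actual generated code):

    #include <cilk/cilk.h>

    #define GRAIN 1        /* serial cutoff; real implementations tune this */

    void do_work(int i);   /* assumed to exist, as on the slide */

    static void loop_dc(int lo, int hi) {
        if (hi - lo <= GRAIN) {
            for (int i = lo; i < hi; ++i)
                do_work(i);
            return;
        }
        int mid = lo + (hi - lo) / 2;
        cilk_spawn loop_dc(lo, mid);   /* one half may run on another core */
        loop_dc(mid, hi);              /* the other half runs here */
        cilk_sync;                     /* both halves done before returning */
    }

A steal now grabs half of the remaining range, so the work spreads across cores in O(log n) steps instead of O(n).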

Acknowledgements
Cilk/Cilk Plus material is from the Cilk Plus tutorial.