

1. Parallel Programming in OpenMP
2009.10.09 JEONGSIK CHOI (chjs@skku.edu) ARCS

2. Agenda
• Introduction
• Directives
• Clauses

3. Introduction

4. What is OpenMP?
It is an API that supports multi-platform shared-memory multiprocessing programming in C, C++, and Fortran on many architectures.

5. Shared Memory Architecture
[Figure: four processors, each with its own cache, connected by a bus or crossbar switch to memory and I/O.]

6. The Components of OpenMP

7. The Components of OpenMP
• Directives
    !$OMP PARALLEL DO
• Runtime library routines
    CALL omp_set_num_threads(128)
• Environment variables
    export OMP_NUM_THREADS=8
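
As a minimal C sketch of how the three components fit together: a directive opens the parallel region, a runtime library routine requests a thread count, and OMP_NUM_THREADS supplies the default when no routine is called. The helper name `team_size` is hypothetical, and the `_OPENMP` guards are an assumption added so the file also builds without OpenMP support.

```c
#ifdef _OPENMP
#include <omp.h>                      /* runtime library routines */
#endif

/* Hypothetical helper: request `requested` threads and report how many
   threads the parallel region actually received. */
int team_size(int requested)
{
    int size = 1;                     /* serial fallback */
#ifdef _OPENMP
    omp_set_num_threads(requested);   /* overrides OMP_NUM_THREADS */
#else
    (void)requested;
#endif
    #pragma omp parallel              /* directive: fork a team */
    {
        #pragma omp single            /* one thread records the team size */
        {
#ifdef _OPENMP
            size = omp_get_num_threads();
#endif
        }
    }
    return size;
}
```

Calling `team_size(2)` should return at most 2; the runtime may provide fewer threads than requested.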

8. OpenMP Programming Model
• Thread-Based
• Fork-Join Model

9. OpenMP Fork-and-Join model

    printf("program begin\n");        /* Serial */
    N = 1000;
    #pragma omp parallel for          /* Parallel */
    for (i=0; i<N; i++)
        A[i] = B[i] + C[i];
    M = 500;                          /* Serial */
    #pragma omp parallel for          /* Parallel */
    for (j=0; j<M; j++)
        p[j] = q[j] - r[j];
    printf("program done\n");         /* Serial */

10. Parallelizing a Simple Loop

Serial version:

    PROGRAM exam
    …
    ialpha = 2
    DO i = 1, 100
      a(i) = a(i) + ialpha * b(i)
    ENDDO
    PRINT *,a
    END

Parallel OpenMP version:

    PROGRAM exam
    …
    ialpha = 2
    !$OMP PARALLEL DO
    DO i = 1, 100
      a(i) = a(i) + ialpha * b(i)
    ENDDO
    !$OMP END PARALLEL DO
    PRINT *,a
    END
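
For readers working in C, the same loop might look like the sketch below. The function name `scale_add` and its signature are hypothetical; only the loop body and the combined directive come from the slide.

```c
/* Hypothetical C counterpart of the Fortran loop
   a(i) = a(i) + ialpha * b(i). */
void scale_add(int n, int ialpha, int *a, const int *b)
{
    int i;
    /* The loop index is made private automatically; a and b are shared.
       Each iteration touches a different element, so no race occurs. */
    #pragma omp parallel for
    for (i = 0; i < n; i++)
        a[i] = a[i] + ialpha * b[i];
}
```

Without an OpenMP compiler flag the pragma is ignored and the loop simply runs serially, producing the same result.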

11. Parallelizing a Simple Loop
• Fork-Join
※ export OMP_NUM_THREADS=4

    ialpha = 2                                        (Master Thread)
    --------------------- Fork ---------------------
    DO i=1, 25    DO i=26, 50   DO i=51, 75   DO i=76, 100
    ...           ...           ...           ...
    (Master)      (Slave)       (Slave)       (Slave)
    --------------------- Join ---------------------
    PRINT *, a                                        (Master Thread)
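
The even split pictured above can be reproduced with plain arithmetic. This is only an illustration of a block partition: the helper `block_range` is hypothetical, and the schedule an OpenMP runtime actually uses when no schedule clause is given is implementation-defined.

```c
/* Hypothetical helper: compute the 1-based, inclusive iteration range
   that thread `tid` would get if n iterations were split into
   contiguous blocks over `nthreads` threads. Leftover iterations go
   to the lowest-numbered threads. */
void block_range(int n, int nthreads, int tid, int *first, int *last)
{
    int base  = n / nthreads;                 /* iterations every thread gets */
    int extra = n % nthreads;                 /* leftover iterations */
    int start = tid * base + (tid < extra ? tid : extra);
    int len   = base + (tid < extra ? 1 : 0);

    *first = start + 1;
    *last  = start + len;
}
```

With n = 100 and nthreads = 4, thread 0 gets 1 to 25 and thread 3 gets 76 to 100, matching the diagram.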

12. Syntax

                 Fortran: f77          Fortran: f90           C
    prefix       !$OMP <directive>     !$OMP <directive>      #pragma omp <directive>
                 C$OMP <directive>
                 *$OMP <directive>
    newline      !$OMP <directive>     !$OMP <directive> &    #pragma omp … \
                 !$OMP&…               …                      …
    selective    !$ …                  !$ …                   #ifdef _OPENMP
    compile      C$ …                                         …
                 *$ …                                         #endif
    start        first column          any                    any
    location
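
A small sketch of selective compilation in C: the `_OPENMP` macro is defined by the compiler only when OpenMP is enabled, and its value is a yyyymm date identifying the supported specification. The function name `openmp_version` is hypothetical.

```c
/* Hypothetical helper: report the OpenMP version date (yyyymm) the
   compiler was invoked with, or 0 when OpenMP is not enabled. */
long openmp_version(void)
{
#ifdef _OPENMP
    return _OPENMP;       /* compiled only with OpenMP enabled */
#else
    return 0;             /* serial build: OpenMP-only code is skipped */
#endif
}
```

The same guard lets a single source file build both as a serial and as a parallel program.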

13. Syntax
• Commonly used directives

    Fortran               C
    !$OMP PARALLEL        #pragma omp parallel
    !$OMP DO              #pragma omp for
    !$OMP PARALLEL DO     #pragma omp parallel for
    !$OMP CRITICAL        #pragma omp critical
    PRIVATE/SHARED        private/shared
    DEFAULT               default
    REDUCTION             reduction
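
To show two of the listed items side by side, here is a hedged sketch that sums 1..n once with a reduction clause and once with a critical directive. Both function names are hypothetical.

```c
/* Sum 1..n using a reduction: each thread accumulates into its own
   private copy of sum, and the copies are combined at the end. */
long sum_reduction(int n)
{
    long sum = 0;
    int i;
    #pragma omp parallel for reduction(+:sum)
    for (i = 1; i <= n; i++)
        sum += i;
    return sum;
}

/* Same sum using a critical section: sum stays shared, and only one
   thread at a time may execute the update. */
long sum_critical(int n)
{
    long sum = 0;
    int i;
    #pragma omp parallel for
    for (i = 1; i <= n; i++) {
        #pragma omp critical
        sum += i;
    }
    return sum;
}
```

Both return the same value; the reduction form usually scales better because threads do not serialize on every update.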

14. Shared Memory Model
• Data can be shared or private
• Shared data is accessible by all threads
• Private data can be accessed only by the thread that owns it
[Figure: five threads (thread1 to thread5), each with its own private memory, arranged around a common shared memory.]

15. Data Scoping

    /* hello_wrong */
    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        int i, a, tid;   /* declared outside the region: shared by default */
        #pragma omp parallel
        {
            tid = omp_get_thread_num();   /* every thread overwrites the same tid */
            for (i=0; i<10000; i++)
                a = i;
            printf("I am = %d, tid = %d\n", omp_get_thread_num(), tid);
        }
        return 0;
    }

Sample output:

    I am = 3, tid = 3
    I am = 0, tid = 1
    I am = 1, tid = 1
    I am = 2, tid = 1

16. Data Scoping

    /* hello_right */
    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        int i, a, tid;
        #pragma omp parallel private(tid)   /* each thread gets its own tid */
        {
            tid = omp_get_thread_num();
            for (i=0; i<10000; i++)         /* i and a are still shared here */
                a = i;
            printf("I am = %d, tid = %d\n", omp_get_thread_num(), tid);
        }
        return 0;
    }

Sample output:

    I am = 3, tid = 3
    I am = 0, tid = 0
    I am = 1, tid = 1
    I am = 2, tid = 2
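
An alternative to the private clause is to declare the variable inside the parallel region, which makes it private automatically. The sketch below assumes a hypothetical helper `record_thread_ids` that stores each thread's ID in its own array slot; the `_OPENMP` guards are added so it also builds serially.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: each thread writes its own ID into ids[tid].
   Returns the number of threads in the team. */
int record_thread_ids(int *ids, int capacity)
{
    int team = 1;                      /* shared: written by one thread only */
    #pragma omp parallel shared(ids, team)
    {
        int tid = 0;                   /* declared in the region: private */
#ifdef _OPENMP
        tid = omp_get_thread_num();
        #pragma omp single
        team = omp_get_num_threads();
#endif
        if (tid < capacity)
            ids[tid] = tid;            /* distinct slot per thread: no race */
    }
    return team;
}
```

Because tid is private, ids[k] always ends up equal to k for every thread k in the team, regardless of how the threads interleave.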

17. Data Scoping

18. Compile & Execute
• Compile
    $ ifort -o hello hello.f -openmp
    $ icc -o hello hello.c -openmp
• Execute
    $ ./hello

19. Directives and Clauses

20. Major Directives
• Parallel Region Constructs
− parallel
• Work-Sharing Constructs
− do/for
− sections
− single
• Combined Parallel Work-Sharing Constructs
− parallel do/for
− parallel sections

21. Parallel Region Construct
• C format

    #pragma omp parallel [clause ...] newline
                         if (scalar_expression)
                         private (list)
                         shared (list)
                         default (shared | none)
                         firstprivate (list)
                         reduction (operator: list)
                         copyin (list)
                         num_threads (integer-expression)
    {
        structured_block
    }
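
As a hedged illustration of two of these clauses, the sketch below forks a team of at most four threads only when the problem size exceeds a threshold; otherwise the region runs with a single thread. The helper `region_team_size` and its parameters are hypothetical.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: return the team size used for a region that is
   parallelized only when n > threshold. */
int region_team_size(int n, int threshold)
{
    int team = 1;
    /* if(...) false  -> the region executes with a team of one thread.
       num_threads(4) -> request four threads when the region is parallel. */
    #pragma omp parallel if(n > threshold) num_threads(4) shared(team)
    {
#ifdef _OPENMP
        #pragma omp single
        team = omp_get_num_threads();
#endif
    }
    return team;
}
```

A call such as `region_team_size(10, 100)` returns 1, while `region_team_size(1000, 100)` returns a value between 1 and 4 depending on what the runtime grants.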

22. Parallel Region Construct
• How Many Threads?
− Use of the omp_set_num_threads() library function
− Setting of the OMP_NUM_THREADS environment variable
• SPMD Style
− The code is duplicated and all threads will execute that code.
• Fork-Join Model
− There is an implied barrier at the end of a parallel region. Only the master thread continues execution past this point.

23. Parallel Region Construct
• Thread ID
− 0 (Master Thread) ~ [# of threads - 1]
− omp_get_thread_num()

    #pragma omp parallel private(myid)
    {
        myid = omp_get_thread_num();
        if (myid == 0)
            do_something();
        else
            do_something(myid);
    }

24. Parallel Region Construct
• Dynamic Threads:
− If supported, the two methods available for enabling dynamic threads are:
  • The omp_set_dynamic() library routine
  • Setting of the OMP_DYNAMIC environment variable to TRUE
• Nested Parallel Regions:
− Use the omp_get_nested() library function to determine if nested parallel regions are enabled.
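
A short sketch of toggling dynamic threads from C, assuming a hypothetical helper `dynamic_after_set`. It reads the setting back with omp_get_dynamic(), since an implementation that does not support dynamic adjustment may leave it disabled.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: enable or disable dynamic adjustment of the
   number of threads, then report the setting actually in effect
   (1 = enabled, 0 = disabled or unsupported). */
int dynamic_after_set(int enable)
{
#ifdef _OPENMP
    omp_set_dynamic(enable);
    return omp_get_dynamic() != 0;
#else
    (void)enable;
    return 0;             /* serial build: nothing to adjust */
#endif
}
```

Disabling always reads back as 0; enabling reads back as 1 only where the runtime supports it.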

25. Work-Sharing Constructs
• do / for
• sections
• single
− A work-sharing construct divides the execution of the enclosed code region among the members of the team that encounter it.
− Work-sharing constructs do not launch new threads.
− There is no implied barrier upon entry to a work-sharing construct; however, there is an implied barrier at the end of a work-sharing construct.

26. Work-Sharing Constructs
[Figure: illustrations of the do / for, sections, and single constructs.]

27. Work-Sharing Constructs: DO / for
• The DO / for directive specifies that the iterations of the loop immediately following it must be executed in parallel by the team.

    #pragma omp for [clause ...] newline
                    schedule (type [,chunk])
                    ordered
                    private (list)
                    firstprivate (list)
                    lastprivate (list)
                    shared (list)
                    reduction (operator: list)
                    collapse (n)
                    nowait
        for_loop
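
The sketch below uses a for directive inside an existing parallel region, with a schedule clause and a lastprivate clause. The function `fill_squares` is hypothetical and assumes n > 0, since a lastprivate variable has no defined value after a loop with zero iterations.

```c
/* Hypothetical helper: store i*i in out[i] for i = 0..n-1 and return
   the index handled by the sequentially last iteration. Assumes n > 0. */
int fill_squares(int n, int *out)
{
    int i;
    int last = -1;
    #pragma omp parallel
    {
        /* Iterations are handed out in chunks of 4 as threads become
           free; lastprivate copies back the value from iteration n-1. */
        #pragma omp for schedule(dynamic, 4) lastprivate(last)
        for (i = 0; i < n; i++) {
            out[i] = i * i;
            last = i;
        }
    }
    return last;
}
```

Whichever thread happens to run the final iteration, the returned value is n - 1.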

28. Work-Sharing Constructs: sections
• The SECTIONS directive is a non-iterative work-sharing construct. It specifies that the enclosed section(s) of code are to be divided among the threads in the team.

    #pragma omp sections [clause ...] newline
                         private (list)
                         firstprivate (list)
                         lastprivate (list)
                         reduction (operator: list)
                         nowait
    {
        #pragma omp section newline
            structured_block
        #pragma omp section newline
            structured_block
    }
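
As a minimal sketch of two independent tasks divided among threads, the example below computes a sum in one section and a maximum in another, using the combined parallel sections form. The function `sum_and_max` is hypothetical and assumes n >= 1.

```c
/* Hypothetical helper: compute the sum and the maximum of v[0..n-1]
   in two independent sections. Assumes n >= 1. */
void sum_and_max(const int *v, int n, long *sum, int *max)
{
    long s = 0;
    int m = v[0];
    #pragma omp parallel sections shared(s, m)
    {
        #pragma omp section
        {
            /* Only this section writes s. */
            for (int i = 0; i < n; i++)
                s += v[i];
        }
        #pragma omp section
        {
            /* Only this section writes m. */
            for (int i = 1; i < n; i++)
                if (v[i] > m)
                    m = v[i];
        }
    }
    *sum = s;
    *max = m;
}
```

Each section is executed once by some thread in the team; with a single thread, the sections simply run one after the other.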

29. Work-Sharing Constructs: single
• The SINGLE directive specifies that the enclosed code is to be executed by only one thread in the team.
• May be useful when dealing with sections of code that are not thread safe (such as I/O)

    #pragma omp single [clause ...] newline
                       private (list)
                       firstprivate (list)
                       nowait
        structured_block
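
A small sketch to make the "only one thread" behaviour visible: a shared counter is incremented inside a single block, so it ends at 1 no matter how many threads are in the team. The function name `count_single_executions` is hypothetical.

```c
/* Hypothetical helper: count how many times the single block runs
   within one parallel region. */
int count_single_executions(void)
{
    int count = 0;
    #pragma omp parallel shared(count)
    {
        /* Exactly one thread executes this block; the others wait at
           the implied barrier at its end. */
        #pragma omp single
        {
            count++;
        }
    }
    return count;
}
```

The function returns 1 for any team size.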