Introduction
This tutorial will take you through the OpenMP directives, starting with the most basic and useful directives.
Execution Model
A program that is written using OpenMP directives begins execution as a single process, called the master thread of execution. The master thread executes sequentially until the first parallel construct is encountered. The PARALLEL / END PARALLEL directive pair constitutes the parallel construct.
When a parallel construct is encountered, the master thread creates a team of threads, and the master becomes the master of the team. The program statements that are enclosed in a parallel construct, including routines called from within the construct, are executed in parallel by each thread in the team.
Upon completion of the parallel construct, the threads in the team synchronize and only the master thread continues execution. Any number of parallel constructs can be specified in a single program. As a result, a program may fork and join many times during execution.
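The fork and join behavior described above can be seen in a small program with two parallel regions. This is only a sketch: the file and program name forkjoin are made up for illustration and are not part of the tutorial examples.

```fortran
!Filename: forkjoin.f90 (hypothetical; not one of the tutorial examples)
!
!Two parallel regions separated by serial code.
PROGRAM FORKJOIN
IMPLICIT NONE
INTEGER OMP_GET_THREAD_NUM
! Only the master thread executes here.
PRINT *, "Serial part before the first region"
!$OMP PARALLEL
! Fork: every thread in the team executes this statement.
PRINT *, "First region, thread", OMP_GET_THREAD_NUM()
!$OMP END PARALLEL
! Join: only the master thread continues.
PRINT *, "Serial part between the regions"
!$OMP PARALLEL
PRINT *, "Second region, thread", OMP_GET_THREAD_NUM()
!$OMP END PARALLEL
END PROGRAM FORKJOIN
```

Each "region" line should appear once per thread, while each "serial" line should appear only once. The order of the lines printed inside a region is not guaranteed.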
The degree of parallelism of an OpenMP code depends on the code, the platform, the hardware configuration, the compiler, and the operating system. In no case are you guaranteed to have each thread running on a separate processor.
The Cray XT4 only allows you to specify a number of threads less than or equal to the number of processor cores on a node. For example, you can execute only 1 or 2 threads on a dual-core franklin node.
OpenMP codes on franklin may be run only on the compute nodes, using a PBS script.
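To check how many threads a run actually gets, the standard OpenMP library routines can be queried at run time. The following is a minimal sketch; the program name QUERY is hypothetical, and the numbers printed depend on the node and on OMP_NUM_THREADS.

```fortran
!Filename: query.f90 (hypothetical; not one of the tutorial examples)
PROGRAM QUERY
IMPLICIT NONE
INTEGER OMP_GET_NUM_PROCS, OMP_GET_MAX_THREADS, OMP_GET_NUM_THREADS
! Processors the OpenMP runtime sees on this node
PRINT *, "Processors available:", OMP_GET_NUM_PROCS()
! Upper bound on the size of the next team (set by OMP_NUM_THREADS)
PRINT *, "Maximum team size:   ", OMP_GET_MAX_THREADS()
!$OMP PARALLEL
!$OMP MASTER
! Size of the team that was actually created; printed by one thread only
PRINT *, "Threads in this team:", OMP_GET_NUM_THREADS()
!$OMP END MASTER
!$OMP END PARALLEL
END PROGRAM QUERY
```

The MASTER directive used here to print only once is described later in this tutorial.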
Examples
Examples using OpenMP can be copied to your $HOME/openmp_examples directory on franklin by using:
% cd $HOME
% mkdir openmp_examples
% module load training
% cp $EXAMPLES/OpenMP/tutorial/* openmp_examples
OpenMP Directive Syntax
OpenMP directives are inserted directly into source code. Free-form Fortran source code directives begin with the sentinel !$OMP. Fixed-form Fortran source code directives begin with the sentinels !$OMP, C$OMP, or *$OMP. In fixed form the sentinel must start in column one; in free form it may be preceded only by white space. Continuation lines are permitted using the same format as the Fortran source code format you are using (free or fixed).
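As an illustration of the free-form rules, here is a small sketch of a directive continued over two lines. The program name SENTINEL is made up for this example.

```fortran
!Filename: sentinel.f90 (hypothetical; not one of the tutorial examples)
PROGRAM SENTINEL
IMPLICIT NONE
INTEGER I, J
I = 1
J = 2
! The first directive line ends with & and the continuation
! line begins with the sentinel again.
!$OMP PARALLEL FIRSTPRIVATE(I) &
!$OMP FIRSTPRIVATE(J)
PRINT *, "I+J =", I + J
!$OMP END PARALLEL
END PROGRAM SENTINEL
```

Because the sentinel starts with !, a compiler invoked without its OpenMP option treats these lines as ordinary comments, and the program runs serially and prints the sum once.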
Following are descriptions of the basic use of OpenMP directives with examples.
PARALLEL Directive
A Parallel Region is a block of code that is to be executed in parallel by a number of threads. Each thread executes the enclosed code separately.
Note that all code within a parallel region is executed by each thread unless other OpenMP directives specify otherwise. For instance, a DO loop that lies within a parallel region will be executed completely (and redundantly) by each thread unless a DO directive is inserted before the loop. A DO or PARALLEL DO directive is necessary if you want the loop to be executed once, with different threads performing different iterations of the loop in parallel.
It is illegal to branch out of a Parallel Region.
!$OMP PARALLEL [clause]
code block
!$OMP END PARALLEL
The [clause] can be one or more of several clauses, including PRIVATE, SHARED, DEFAULT, FIRSTPRIVATE, and REDUCTION.
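The difference between a loop that is executed redundantly and one that is shared among the threads can be made visible by counting iterations. The sketch below is not one of the tutorial examples; the names REDUNDANT, COUNT1 and COUNT2 are chosen only for illustration. It uses the REDUCTION clause and the DO directive, both of which are described later in this tutorial.

```fortran
!Filename: redundant.f90 (hypothetical; not one of the tutorial examples)
PROGRAM REDUNDANT
IMPLICIT NONE
INTEGER, PARAMETER:: N=8
INTEGER I, COUNT1, COUNT2
COUNT1 = 0
COUNT2 = 0
!$OMP PARALLEL PRIVATE(I) REDUCTION(+:COUNT1,COUNT2)
! No DO directive: every thread runs all N iterations.
DO I=1,N
COUNT1 = COUNT1 + 1
END DO
! DO directive: the N iterations are divided among the threads.
!$OMP DO
DO I=1,N
COUNT2 = COUNT2 + 1
END DO
!$OMP END DO
!$OMP END PARALLEL
PRINT *, "Iterations executed without DO:", COUNT1
PRINT *, "Iterations executed with DO:   ", COUNT2
END PROGRAM REDUNDANT
```

With two threads the first count should be 2*N = 16, while the second stays at N = 8.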
Examples
!Filename: parallel.f90
!
!This simply shows that code in a PARALLEL
!region is executed by each thread.
PROGRAM PARALLEL
IMPLICIT NONE
INTEGER I
I=1
!$OMP PARALLEL FIRSTPRIVATE(I)
PRINT *, I
!$OMP END PARALLEL
END PROGRAM PARALLEL
To run on franklin:
> cat parallel.pbs
#PBS -N parallel
#PBS -j oe
#PBS -o parallel.out
#PBS -q interactive
#PBS -S /bin/bash
#PBS -l mppwidth=1
#PBS -l mppnppn=1
#PBS -l mppdepth=2
#PBS -l walltime=00:05:00
#PBS -V
cd $PBS_O_WORKDIR
ftn -o parallel -mp=nonuma -Minfo=mp parallel.f90
export OMP_NUM_THREADS=2
aprun -n 1 -N 1 ./parallel
> qsub parallel.pbs
498022.nid00003
> cat parallel.out
/opt/xt-pe/2.0.44a2/bin/snos64/ftn: INFO: linux target is being used
parallel.f90:
parallel:
12, Parallel region activated
14, Parallel region terminated
1
1
Application 4673065 resources: utime 0, stime 0
The next example shows use of the REDUCTION clause. It shows how the per-thread values of the variables are combined on leaving the parallel region. The value each variable had before the region also takes part in the reduction, so with two threads I = 1 + 0 + 1 = 2, J = 1 * 0 * 1 = 0, and K = MAX(1, 0, 1) = 1. Also note that each thread executes the PRINT statement in the parallel region.
!Filename: reduction.f90
!
!This program shows the use of the REDUCTION clause.
PROGRAM REDUCTION
IMPLICIT NONE
INTEGER tnumber, OMP_GET_THREAD_NUM
INTEGER I,J,K
I=1
J=1
K=1
PRINT *, "Before Parallel Region: I=",I," J=", J," K=",K
PRINT *, ""
!$OMP PARALLEL DEFAULT(PRIVATE) REDUCTION(+:I)&
!$OMP REDUCTION(*:J) REDUCTION(MAX:K)
tnumber=OMP_GET_THREAD_NUM()
I = tnumber
J = tnumber
K = tnumber
PRINT *, "Thread ",tnumber, " I=",I," J=", J," K=",K
!$OMP END PARALLEL
PRINT *, ""
print *, "Operator + * MAX"
PRINT *, "After Parallel Region: I=",I," J=", J," K=",K
END PROGRAM REDUCTION
To run on franklin:
> cat reduction.pbs
#PBS -N reduction
#PBS -j oe
#PBS -o reduction.out
#PBS -q interactive
#PBS -S /bin/bash
#PBS -l mppwidth=1
#PBS -l mppnppn=1
#PBS -l mppdepth=2
#PBS -l walltime=00:05:00
#PBS -V
cd $PBS_O_WORKDIR
ftn -o reduction -mp=nonuma -Minfo=mp reduction.f90
export OMP_NUM_THREADS=2
aprun -n 1 -N 1 ./reduction
> qsub reduction.pbs
498041.nid00003
> cat reduction.out
/opt/xt-pe/2.0.44a2/bin/snos64/ftn: INFO: linux target is being used
reduction.f90:
reduction:
15, Parallel region activated
25, Begin critical section
End critical section
Parallel region terminated
Before Parallel Region: I= 1 J= 1 K= 1
Thread 0 I= 0 J= 0 K=
0
Thread 1 I= 1 J= 1 K=
1
Operator + * MAX
After Parallel Region: I= 2 J= 0 K= 1
Application 4673534 resources: utime 0, stime 0
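To see what REDUCTION(+:I) saves you from writing, the sum can also be formed by hand with a private partial value and a CRITICAL section. This is only a sketch of the idea under that assumption, not a description of how the compiler implements the clause; the names MANUALSUM and IPART are hypothetical.

```fortran
!Filename: manualsum.f90 (hypothetical; not one of the tutorial examples)
PROGRAM MANUALSUM
IMPLICIT NONE
INTEGER OMP_GET_THREAD_NUM
INTEGER I, IPART
I = 1
!$OMP PARALLEL PRIVATE(IPART) SHARED(I)
! Each thread works on its own private partial value ...
IPART = OMP_GET_THREAD_NUM()
! ... and adds it to the shared variable, one thread at a time.
!$OMP CRITICAL
I = I + IPART
!$OMP END CRITICAL
!$OMP END PARALLEL
PRINT *, "After Parallel Region: I=", I
END PROGRAM MANUALSUM
```

With two threads this gives I = 1 + 0 + 1 = 2, the same value the REDUCTION example reports for I.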
DO Directive
The DO directive specifies that the iterations of the immediately following DO loop must be executed in parallel. The DO directive must be enclosed in a parallel region; it creates no threads by itself. The loop that follows cannot be a DO WHILE loop.
!$OMP DO [clause[[,] clause] ...]
do_loop
!$OMP END DO [NOWAIT]
Clauses that can appear on the DO directive include PRIVATE, FIRSTPRIVATE, LASTPRIVATE, REDUCTION, SCHEDULE, and ORDERED.
It is illegal to branch out of a DO loop associated with the DO directive.
Example
!Filename: dodir.f90
!
PROGRAM DODIR
IMPLICIT NONE
INTEGER I,L
INTEGER, PARAMETER:: DIM=16
REAL A(DIM),B(DIM),S
INTEGER nthreads,tnumber
INTEGER OMP_GET_NUM_THREADS,OMP_GET_THREAD_NUM
CALL RANDOM_NUMBER(A)
CALL RANDOM_NUMBER(B)
!$OMP PARALLEL DEFAULT(PRIVATE) SHARED(A,B)
!$OMP DO SCHEDULE(STATIC,2)
DO I=2,DIM
B(I) = ( A(I) - A(I-1) ) / 2.0
nthreads=OMP_GET_NUM_THREADS()
tnumber=OMP_GET_THREAD_NUM()
print *, "Thread",tnumber," of",nthreads," has I=",I
END DO
!$OMP END DO
!$OMP END PARALLEL
S=MAXVAL(B)
L=MAXLOC(B,1)
PRINT *, "Maximum gradient: ",S," at location:",L
END PROGRAM DODIR
Compiling and running on Franklin:
> cat dodir.pbs
#PBS -N dodir
#PBS -j oe
#PBS -o dodir.out
#PBS -q interactive
#PBS -S /bin/bash
#PBS -l mppwidth=1
#PBS -l mppnppn=1
#PBS -l mppdepth=2
#PBS -l walltime=00:05:00
#PBS -V
cd $PBS_O_WORKDIR
ftn -o dodir -mp=nonuma -Minfo=mp dodir.f90
export OMP_NUM_THREADS=2
aprun -n 1 -N 1 ./dodir
> qsub dodir.pbs
500611.nid00003
> cat dodir.out
/opt/xt-pe/2.0.44a2/bin/snos64/ftn: INFO: linux target is being used
dodir.f90:
dodir:
15, Parallel region activated
17, Parallel loop activated; static block-cyclic iteration allocation
24, Barrier
Parallel region terminated
Thread 1 of 2 has I= 4
Thread 1 of 2 has I= 5
Thread 1 of 2 has I= 8
Thread 1 of 2 has I= 9
Thread 1 of 2 has I= 12
Thread 1 of 2 has I= 13
Thread 1 of 2 has I= 16
Thread 0 of 2 has I= 2
Thread 0 of 2 has I= 3
Thread 0 of 2 has I= 6
Thread 0 of 2 has I= 7
Thread 0 of 2 has I= 10
Thread 0 of 2 has I= 11
Thread 0 of 2 has I= 14
Thread 0 of 2 has I= 15
Maximum gradient: 0.6280164 at location: 1
Application 4733417 resources: utime 0, stime 0
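Notice that the loop was divided between the 2 threads in chunks of 2 iterations, as requested with the SCHEDULE(STATIC,2) clause. Also note that if we had not enclosed the DO loop in a DO/END DO directive pair, the loop would not have been split; instead, each of the OMP_NUM_THREADS threads would have executed the whole loop. Finally, B(1) is never assigned in the loop and keeps its initial random value, which is why the maximum is reported at location 1.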
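The assignment of iterations seen in the output can be worked out by hand: with a static schedule and a chunk size, the chunks are handed out to the threads round-robin in thread-number order. The serial sketch below reproduces that arithmetic for the loop bounds, chunk size, and thread count used in dodir.f90. The names CHUNKS, LOWER, UPPER, and OWNER are made up for this illustration.

```fortran
!Filename: chunks.f90 (hypothetical; not one of the tutorial examples)
!
!Serial program: computes which thread owns each iteration
!under SCHEDULE(STATIC,2) with 2 threads.
PROGRAM CHUNKS
IMPLICIT NONE
INTEGER, PARAMETER:: LOWER=2, UPPER=16, CHUNK=2, NTHREADS=2
INTEGER I, OWNER
DO I=LOWER,UPPER
! (I-LOWER)/CHUNK is the chunk index (integer division);
! chunks are dealt out to the threads in round-robin order.
OWNER = MOD( (I-LOWER)/CHUNK, NTHREADS )
PRINT *, "I=",I," is assigned to thread",OWNER
END DO
END PROGRAM CHUNKS
```

The result agrees with the job output above: iterations 2, 3, 6, 7, 10, 11, 14, and 15 go to thread 0, and iterations 4, 5, 8, 9, 12, 13, and 16 go to thread 1.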
PARALLEL DO Directive
The PARALLEL DO directive provides a shortcut form for specifying a parallel region that contains a single DO directive. The semantics are identical to specifying a PARALLEL directive followed by a DO directive.
!$OMP PARALLEL DO [clause[[,] clause] ...]
do_loop
!$OMP END PARALLEL DO
The clause can be any of those associated with the PARALLEL and DO directives described above.
Example
!Filename: pardo.f90
!
PROGRAM PARDO
IMPLICIT NONE
INTEGER I,J
INTEGER, PARAMETER:: DIM1=10000, DIM2=200
REAL A(DIM1),B(DIM2,DIM1),C(DIM2,DIM1)
REAL before, after, elapsed,S
INTEGER nthreads,OMP_GET_NUM_THREADS
CALL RANDOM_NUMBER(A)
call cpu_time(before)
!$OMP PARALLEL DO SCHEDULE(RUNTIME) PRIVATE(I,J) SHARED (A,B,C,nthreads)
DO J=1,DIM2
nthreads = OMP_GET_NUM_THREADS()
DO I=2, DIM1
B(J,I) = ( (A(I)+A(I-1))/2.0 ) / SQRT(A(I))
C(J,I) = SQRT( A(I)*2 ) / ( A(I)-(A(I)/2.0) )
B(J,I) = C(J,I) * ( B(J,I)**2 ) * SIN(A(I))
END DO
END DO
!$OMP END PARALLEL DO
call cpu_time(after)
!Find elapsed CPU time; CPU_TIME already reports seconds
elapsed = after-before
S=MAXVAL(B)
WRITE(6,'("Maximum of B=",1pe8.2," found in ",1pe8.2," &
&seconds using", I2," threads")') S,elapsed,nthreads
END PROGRAM PARDO
Compiling and running on Franklin:
> cat pardo.pbs
#PBS -N pardo
#PBS -j oe
#PBS -o pardo.out
#PBS -q interactive
#PBS -S /bin/bash
#PBS -l mppwidth=1
#PBS -l mppnppn=1
#PBS -l mppdepth=2
#PBS -l walltime=00:05:00
#PBS -V
cd $PBS_O_WORKDIR
ftn -o pardo -mp=nonuma -Minfo=mp pardo.f90
export OMP_NUM_THREADS=2
aprun -n 1 -N 1 ./pardo
> qsub pardo.pbs
500719.nid00003
> cat pardo.out
/opt/xt-pe/2.0.44a2/bin/snos64/ftn: INFO: linux target is being used
pardo.f90:
pardo:
16, Parallel region activated
17, Parallel loop activated; runtime schedule iteration allocation
24, Parallel region terminated
Maximum of B=2.92E+01 found in 1.12E-01 seconds using 2 threads
Application 4734298 resources: utime 0, stime 0
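Note that CPU_TIME reports processor time, which in a threaded run may be accumulated over all threads and so may not reflect the time you actually wait. For wall-clock timing, OpenMP provides the routine OMP_GET_WTIME, which returns a DOUBLE PRECISION number of seconds. The following is a minimal sketch that measures both; the program name WTIME and the loop body are invented for this illustration.

```fortran
!Filename: wtime.f90 (hypothetical; not one of the tutorial examples)
PROGRAM WTIME
IMPLICIT NONE
INTEGER, PARAMETER:: N=1000000
DOUBLE PRECISION OMP_GET_WTIME
DOUBLE PRECISION T0, T1
REAL C0, C1
REAL A(N)
INTEGER I
CALL RANDOM_NUMBER(A)
CALL CPU_TIME(C0)
T0 = OMP_GET_WTIME()
!$OMP PARALLEL DO PRIVATE(I) SHARED(A)
DO I=1,N
A(I) = SQRT(A(I)) * SIN(A(I))
END DO
!$OMP END PARALLEL DO
T1 = OMP_GET_WTIME()
CALL CPU_TIME(C1)
PRINT *, "Wall-clock seconds:", T1-T0
PRINT *, "CPU seconds:       ", C1-C0
! Use the result so the loop cannot be optimized away
PRINT *, "Checksum:", SUM(A)
END PROGRAM WTIME
```

When comparing runs with different values of OMP_NUM_THREADS, the wall-clock figure is usually the one to look at.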
SECTIONS Directive
The SECTIONS directive specifies that the enclosed SECTION blocks are to be divided among the threads in the team. Each section is executed once.
!$OMP SECTIONS [clause[[,] clause] ...]
[!$OMP SECTION]
code block
[!$OMP SECTION
code block]
...
!$OMP END SECTIONS [NOWAIT]
The clause can be one of the following:
- PRIVATE (list)
- FIRSTPRIVATE (list)
- LASTPRIVATE (list)
- REDUCTION ({operator | intrinsic}:list)
- See operator and intrinsic list for Parallel Regions.
Example
!Filename: sections.f90
!
!This shows code that is executed
!in sections.
PROGRAM SECTIONS
IMPLICIT NONE
INTEGER OMP_GET_THREAD_NUM, tnumber
!$OMP PARALLEL
!$OMP SECTIONS PRIVATE(tnumber)
!$OMP SECTION
tnumber=OMP_GET_THREAD_NUM()
PRINT *,"This is section 1 being executed by thread",tnumber
!$OMP SECTION
tnumber=OMP_GET_THREAD_NUM()
PRINT *,"This is section 2 being executed by thread",tnumber
!$OMP SECTION
tnumber=OMP_GET_THREAD_NUM()
PRINT *,"This is section 3 being executed by thread",tnumber
!$OMP SECTION
tnumber=OMP_GET_THREAD_NUM()
PRINT *,"This is section 4 being executed by thread",tnumber
!$OMP END SECTIONS
!$OMP END PARALLEL
END PROGRAM SECTIONS
Compiling and running on franklin:
> cat sections.pbs
#PBS -N sections
#PBS -j oe
#PBS -o sections.out
#PBS -q interactive
#PBS -S /bin/bash
#PBS -l mppwidth=1
#PBS -l mppnppn=1
#PBS -l mppdepth=2
#PBS -l walltime=00:05:00
#PBS -V
cd $PBS_O_WORKDIR
ftn -o sections -mp=nonuma -Minfo=mp sections.f90
export OMP_NUM_THREADS=2
aprun -n 1 -N 1 ./sections
> qsub sections.pbs
500738.nid00003
> cat sections.out
/opt/xt-pe/2.0.44a2/bin/snos64/ftn: INFO: linux target is being used
sections.f90:
sections:
11, Parallel region activated
12, Begin sections
17, New section
20, New section
23, New section
26, End sections
Parallel region terminated
This is section 2 being executed by thread 1
This is section 4 being executed by thread 1
This is section 1 being executed by thread 0
This is section 3 being executed by thread 0
Application 4734431 resources: utime 0, stime 0
SINGLE Directive
The SINGLE directive specifies that the enclosed code is to be executed by only one thread in the team. Threads that are not executing in the SINGLE directive wait at the END SINGLE directive unless NOWAIT is specified. It is illegal to branch out of a SINGLE block.
!$OMP SINGLE [clause[[,] clause] ...]
block
!$OMP END SINGLE [NOWAIT]
The clause can be one of the following:
- PRIVATE (list)
- FIRSTPRIVATE (list)
Example
!Filename: single.f90
!
!This shows use of the SINGLE directive.
!
PROGRAM SINGLE
IMPLICIT NONE
INTEGER, PARAMETER:: N=12
REAL, DIMENSION(N):: A,B,C,D
INTEGER:: I
REAL:: SUMMED
!$OMP PARALLEL SHARED(A,B,C,D) PRIVATE(I)
!***** Reading files fort.10, fort.11, fort.12 in parallel
!$OMP SECTIONS
!$OMP SECTION
READ(10,*) (A(I),I=1,N)
!$OMP SECTION
READ(11,*) (B(I),I=1,N)
!$OMP SECTION
READ(12,*) (C(I),I=1,N)
!$OMP END SECTIONS
!$OMP SINGLE
SUMMED = SUM(A) + SUM(B) + SUM(C)
PRINT *, "Sum of A+B+C=",SUMMED
!$OMP END SINGLE
!$OMP DO SCHEDULE(STATIC,4)
DO I=1,N
D(I) = A(I) + B(I)*C(I)
END DO
!$OMP END DO
!$OMP END PARALLEL
PRINT *, "The values of D are", D
END PROGRAM SINGLE
Compiling and running on franklin. The files named fort.10, fort.11, and fort.12 each contain twelve values of 1.0:
> cat single.pbs
#PBS -N single
#PBS -j oe
#PBS -o single.out
#PBS -q interactive
#PBS -S /bin/bash
#PBS -l mppwidth=1
#PBS -l mppnppn=1
#PBS -l mppdepth=2
#PBS -l walltime=00:05:00
#PBS -V
cd $PBS_O_WORKDIR
ftn -o single -mp=nonuma -Minfo=mp single.f90
export OMP_NUM_THREADS=2
aprun -n 1 -N 1 ./single
> qsub single.pbs
500801.nid00003
> cat single.out
/opt/xt-pe/2.0.44a2/bin/snos64/ftn: INFO: linux target is being used
single.f90:
single:
13, Parallel region activated
17, Begin sections
20, New section
22, New section
24, End sections
26, Begin single section
29, End single section
Barrier
32, Parallel loop activated; static block-cyclic iteration allocation
35, Barrier
Parallel region terminated
Sum of A+B+C= 36.00000
The values of D are 2.000000 2.000000 2.000000
2.000000 2.000000 2.000000 2.000000
2.000000 2.000000 2.000000 2.000000
2.000000
Application 4735339 resources: utime 0, stime 0
MASTER Directive
The code enclosed with MASTER and END MASTER directives is executed only by the master thread of the team. There is no implied barrier either on entry to or exit from the MASTER section. Branching out of a MASTER block is illegal.
!$OMP MASTER
block
!$OMP END MASTER
BARRIER Directive
!$OMP BARRIER
The BARRIER directive synchronizes all threads in a team. When encountered, each thread waits until all of the other threads in that team have reached this point.
Example
!Filename: barrier.f90
!
!This shows use of the BARRIER directive.
!
PROGRAM ABARRIER
IMPLICIT NONE
INTEGER:: L
INTEGER:: nthreads, OMP_GET_NUM_THREADS
INTEGER:: tnumber, OMP_GET_THREAD_NUM
!$OMP PARALLEL SHARED(L) PRIVATE(nthreads,tnumber)
nthreads = OMP_GET_NUM_THREADS()
tnumber = OMP_GET_THREAD_NUM()
!$OMP MASTER
PRINT *, ' Enter a value for L'
READ(5,*) L
!$OMP END MASTER
!$OMP BARRIER
!$OMP CRITICAL
PRINT *, ' My thread number =',tnumber
PRINT *, ' Number of threads =',nthreads
PRINT *, ' Value of L =',L
PRINT *, ''
!$OMP END CRITICAL
!$OMP END PARALLEL
END PROGRAM ABARRIER
Compiling and running on franklin:
> cat barrier.pbs
#PBS -N barrier
#PBS -j oe
#PBS -o barrier.out
#PBS -q interactive
#PBS -S /bin/bash
#PBS -l mppwidth=1
#PBS -l mppnppn=1
#PBS -l mppdepth=2
#PBS -l walltime=00:05:00
#PBS -V
cd $PBS_O_WORKDIR
ftn -o barrier -mp=nonuma -Minfo=mp barrier.f90
export OMP_NUM_THREADS=2
aprun -n 1 -N 1 ./barrier < ninety
> cat ninety
90
> qsub barrier.pbs
500868.nid00003
> cat barrier.out
/opt/xt-pe/2.0.44a2/bin/snos64/ftn: INFO: linux target is being used
barrier.f90:
barrier:
12, Parallel region activated
16, Begin master section
21, End master section
23, Barrier
27, Begin critical section __cs_unspc
30, End critical section __cs_unspc
Parallel region terminated
Enter a value for L
My thread number = 0
Number of threads = 2
Value of L = 90
My thread number = 1
Number of threads = 2
Value of L = 90
Application 4736056 resources: utime 0, stime 0
FLUSH Directive
!$OMP FLUSH [(list)]
The FLUSH directive identifies synchronization points at which the implementation is required to provide a consistent view of memory. The directive must appear at the precise point in the code at which the synchronization is required. The optional list argument is a comma-separated list of the variables to be flushed; naming them avoids flushing all variables.
The FLUSH directive is implied for the following directives:
- BARRIER
- CRITICAL and END CRITICAL
- END DO
- END PARALLEL
- END SECTIONS
- END SINGLE
- ORDERED and END ORDERED
The directive is not implied if NOWAIT is present.
ATOMIC Directive
!$OMP ATOMIC
The ATOMIC directive ensures that a specific memory location is to be updated atomically, rather than exposing it to the possibility of multiple, simultaneous writing threads.
Example
!Filename: density.f90
!
PROGRAM DENSITY
IMPLICIT NONE
INTEGER, PARAMETER:: NBINS=10
INTEGER, PARAMETER:: NPARTICLES=100000
REAL:: XMIN, XMAX, MAXMASS, MINMASS
REAL, DIMENSION(NPARTICLES):: X_LOCATION, PARTICLE_MASS
INTEGER, DIMENSION(NPARTICLES):: BIN
REAL, DIMENSION(NBINS):: GRID_MASS, GRID_DENSITY
INTEGER, DIMENSION(NBINS):: GRID_N
REAL:: DX,DXINV,TOTAL_MASS,CHECK_MASS
INTEGER:: I, CHECK_N, XMAX_LOC(1)
GRID_MASS=0.0
TOTAL_MASS=0.0
GRID_N=0
CHECK_MASS=0.0
CHECK_N=0
! Initialize particle positions and masses
CALL RANDOM_NUMBER(PARTICLE_MASS)
CALL RANDOM_NUMBER(X_LOCATION)
MAXMASS = MAXVAL(PARTICLE_MASS)
MINMASS = MINVAL(PARTICLE_MASS)
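XMAX = MAXVAL(X_LOCATION)
! Index of the rightmost particle; it is placed in the last bin below
XMAX_LOC = MAXLOC(X_LOCATION)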
XMIN = MINVAL(X_LOCATION)
PRINT *, 'MINMASS =',MINMASS,' MAXMASS = ',MAXMASS
PRINT *, 'XMIN =',XMIN,' XMAX = ',XMAX
! Grid Spacing (and inverse)
DX = (XMAX-XMIN) / FLOAT(NBINS)
DXINV = 1/DX
!$OMP PARALLEL DEFAULT(SHARED) PRIVATE(I) REDUCTION(+:TOTAL_MASS)
!$OMP DO
DO I = 1, NPARTICLES
IF (I==XMAX_LOC(1)) THEN
BIN(I) = NBINS
ELSE
BIN(I) = 1 + ( (X_LOCATION(I)-XMIN) * DXINV )
END IF
IF(BIN(I) < 1 .OR. BIN(I) > NBINS) THEN
! Off Grid!
PRINT *, 'ERROR: BIN =',BIN(I),' X =',X_LOCATION(I)
ELSE
!$OMP ATOMIC
GRID_MASS(BIN(I)) = GRID_MASS(BIN(I)) &
+ PARTICLE_MASS(I)
!$OMP ATOMIC
GRID_N(BIN(I)) = GRID_N(BIN(I)) + 1
TOTAL_MASS = TOTAL_MASS + PARTICLE_MASS(I)
END IF
END DO
!$OMP END DO
!$OMP END PARALLEL
DO I=1, NBINS
GRID_DENSITY(I) = GRID_MASS(I) * DXINV
END DO
PRINT *, 'Total Particles =',NPARTICLES
PRINT *, 'Total Mass =',TOTAL_MASS
DO I=1,NBINS
PRINT *, 'DENSITY(',I,' ) =',GRID_DENSITY(I),' &
&MASS(',I,' ) =',GRID_MASS(I)
END DO
! Check for consistency
DO I=1,NBINS
CHECK_MASS = CHECK_MASS + GRID_MASS(I)
CHECK_N = CHECK_N + GRID_N(I)
END DO
PRINT *, 'Particles on Grid =', CHECK_N
PRINT *, 'Total Mass on Grid =', CHECK_MASS
END PROGRAM DENSITY
Without the ATOMIC directives, different threads would try to update the grid mass bins at the same time, causing erroneous results. Compiling and running on franklin:
> cat density.pbs
#PBS -N density
#PBS -j oe
#PBS -o density.out
#PBS -q interactive
#PBS -S /bin/bash
#PBS -l mppwidth=1
#PBS -l mppnppn=1
#PBS -l mppdepth=2
#PBS -l walltime=00:05:00
#PBS -V
cd $PBS_O_WORKDIR
ftn -o density -mp=nonuma -Minfo=mp density.f90
export OMP_NUM_THREADS=2
aprun -n 1 -N 1 ./density
> qsub density.pbs
500899.nid00003
> cat density.out
/opt/xt-pe/2.0.44a2/bin/snos64/ftn: INFO: linux target is being used
density.f90:
density:
40, Parallel region activated
43, Parallel loop activated; static block iteration allocation
55, Begin critical section
End critical section
57, Begin critical section
End critical section
62, Barrier
64, Begin critical section
End critical section
Parallel region terminated
MINMASS = 4.8334509E-06 MAXMASS = 0.9999951
XMIN = 4.3212090E-06 XMAX = 0.9999959
Total Particles = 100000
Total Mass = 50035.98
DENSITY( 1 ) = 50715.23 MASS( 1 ) = 5071.481
DENSITY( 2 ) = 50151.22 MASS( 2 ) = 5015.081
DENSITY( 3 ) = 50154.58 MASS( 3 ) = 5015.417
DENSITY( 4 ) = 49183.93 MASS( 4 ) = 4918.352
DENSITY( 5 ) = 49083.88 MASS( 5 ) = 4908.347
DENSITY( 6 ) = 50813.93 MASS( 6 ) = 5081.351
DENSITY( 7 ) = 51367.36 MASS( 7 ) = 5136.694
DENSITY( 8 ) = 49535.90 MASS( 8 ) = 4953.549
DENSITY( 9 ) = 50296.39 MASS( 9 ) = 5029.598
DENSITY( 10 ) = 49062.40 MASS( 10 ) = 4906.199
Particles on Grid = 100000
Total Mass on Grid = 50036.07
Application 4736271 resources: utime 0, stime 0
Running a second time reveals that floating-point additions of numbers ranging over 5 orders of magnitude (MINMASS to MAXMASS) are not associative: the threads' updates arrive in a different order on each run, so the rounding differs. Note the values of MASS and Total Mass on Grid. You will not get precisely the same results each time, as you would if you had used a single thread.
MINMASS = 4.8334509E-06 MAXMASS = 0.9999951
XMIN = 4.3212090E-06 XMAX = 0.9999959
Total Particles = 100000
Total Mass = 50035.98
DENSITY( 1 ) = 50715.10 MASS( 1 ) = 5071.468
DENSITY( 2 ) = 50151.42 MASS( 2 ) = 5015.101
DENSITY( 3 ) = 50154.55 MASS( 3 ) = 5015.414
DENSITY( 4 ) = 49184.07 MASS( 4 ) = 4918.366
DENSITY( 5 ) = 49083.94 MASS( 5 ) = 4908.353
DENSITY( 6 ) = 50813.96 MASS( 6 ) = 5081.354
DENSITY( 7 ) = 51367.18 MASS( 7 ) = 5136.676
DENSITY( 8 ) = 49535.82 MASS( 8 ) = 4953.542
DENSITY( 9 ) = 50296.48 MASS( 9 ) = 5029.606
DENSITY( 10 ) = 49062.43 MASS( 10 ) = 4906.203
Particles on Grid = 100000
Total Mass on Grid = 50036.08
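The effect of the order of additions can be shown with a small serial sketch. The values below are chosen only to make the rounding obvious and have nothing to do with the particle data; the program name ASSOC is hypothetical.

```fortran
!Filename: assoc.f90 (hypothetical; not one of the tutorial examples)
!
!Serial program: the same three numbers added in two different orders.
PROGRAM ASSOC
IMPLICIT NONE
REAL A, B, C
A = 1.0E8
B = -1.0E8
C = 1.0
! A and B cancel exactly first, so C survives.
PRINT *, "(A+B)+C =", (A+B)+C
! C is too small to change B in single precision, so it is lost.
PRINT *, "A+(B+C) =", A+(B+C)
END PROGRAM ASSOC
```

With IEEE single-precision arithmetic the first sum should come out as 1.0 and the second as 0.0. In the density example the same kind of rounding happens on a much smaller scale each time the threads' updates arrive in a different order.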
As an illustration of how things would go wrong, here's sample output with the ATOMIC directives removed from the code. Note that the particle conservation check fails: only 90036 of the 100000 particles are counted on the grid:
MINMASS = 4.8334509E-06 MAXMASS = 0.9999951
XMIN = 4.3212090E-06 XMAX = 0.9999959
Total Particles = 100000
Total Mass = 50035.98
DENSITY( 1 ) = 45400.19 MASS( 1 ) = 4539.981
DENSITY( 2 ) = 44862.25 MASS( 2 ) = 4486.188
DENSITY( 3 ) = 45124.16 MASS( 3 ) = 4512.378
DENSITY( 4 ) = 44338.48 MASS( 4 ) = 4433.811
DENSITY( 5 ) = 44486.02 MASS( 5 ) = 4448.565
DENSITY( 6 ) = 45709.66 MASS( 6 ) = 4570.928
DENSITY( 7 ) = 46367.48 MASS( 7 ) = 4636.710
DENSITY( 8 ) = 44487.38 MASS( 8 ) = 4448.701
DENSITY( 9 ) = 45177.90 MASS( 9 ) = 4517.752
DENSITY( 10 ) = 44100.41 MASS( 10 ) = 4410.005
Particles on Grid = 90036
Total Mass on Grid = 45005.02
Application 4736632 resources: utime 0, stime 0
ORDERED Directive
The code enclosed with ORDERED and END ORDERED directives is executed in the order in which iterations would be executed in a sequential execution of the loop. The ORDERED directive can only appear in the context of a DO or PARALLEL DO directive. It is illegal to branch into or out of an ORDERED block.
!$OMP ORDERED
block
!$OMP END ORDERED
Example
PROGRAM ORDERED
IMPLICIT NONE
INTEGER, PARAMETER:: N=1000, M=4000
REAL, DIMENSION(N,M):: X
! Y is indexed as Y(J,I) below, so its extents are (M,N)
REAL, DIMENSION(M,N):: Y
REAL, DIMENSION(N):: Z
INTEGER I,J
CALL RANDOM_NUMBER(X)
CALL RANDOM_NUMBER(Y)
Z=0.0
PRINT *, 'The first 20 values of Z are:'
!$OMP PARALLEL DEFAULT(SHARED) PRIVATE(I,J)
!$OMP DO SCHEDULE(DYNAMIC,2) ORDERED
DO I=1,N
DO J=1,M
Z(I) = Z(I) + X(I,J)*Y(J,I)
END DO
!$OMP ORDERED
IF(I<21) THEN
PRINT *, 'Z(',I,') =',Z(I)
END IF
!$OMP END ORDERED
END DO
!$OMP END DO
!$OMP END PARALLEL
END PROGRAM ORDERED
Compiling and running on franklin:
> cat ordered.pbs
#PBS -N ordered
#PBS -j oe
#PBS -o ordered.out
#PBS -q interactive
#PBS -S /bin/bash
#PBS -l mppwidth=1
#PBS -l mppnppn=1
#PBS -l mppdepth=2
#PBS -l walltime=00:05:00
#PBS -V
cd $PBS_O_WORKDIR
ftn -o ordered -mp=nonuma -Minfo=mp ordered.f90
export OMP_NUM_THREADS=2
aprun -n 1 -N 1 ./ordered
> qsub ordered.pbs
500980.nid00003
> cat ordered.out
/opt/xt-pe/2.0.44a2/bin/snos64/ftn: INFO: linux target is being used
ordered.f90:
ordered:
15, Parallel region activated
18, Parallel loop activated; dynamic iteration allocation
31, Barrier
Parallel region terminated
The first 20 values of Z are:
Z( 1 ) = 1028.067
Z( 2 ) = 1015.378
Z( 3 ) = 1010.786
Z( 4 ) = 1003.594
Z( 5 ) = 990.9749
Z( 6 ) = 982.2872
Z( 7 ) = 1021.567
Z( 8 ) = 1019.952
Z( 9 ) = 1011.410
Z( 10 ) = 986.6424
Z( 11 ) = 987.3596
Z( 12 ) = 992.1103
Z( 13 ) = 1001.674
Z( 14 ) = 1000.352
Z( 15 ) = 1021.354
Z( 16 ) = 1009.728
Z( 17 ) = 996.0969
Z( 18 ) = 1005.107
Z( 19 ) = 993.8898
Z( 20 ) = 981.4053
Application 4736903 resources: utime 22, stime 3
Without the ORDERED directive, sample output looks like this:
The first 20 values of Z are:
Z( 3 ) = 1010.786
Z( 1 ) = 1028.067
Z( 2 ) = 1015.378
Z( 5 ) = 990.9749
Z( 6 ) = 982.2872
Z( 7 ) = 1021.567
Z( 8 ) = 1019.952
Z( 9 ) = 1011.410
Z( 10 ) = 986.6424
Z( 11 ) = 987.3596
Z( 12 ) = 992.1103
Z( 13 ) = 1001.674
Z( 14 ) = 1000.352
Z( 15 ) = 1021.354
Z( 16 ) = 1009.728
Z( 17 ) = 996.0969
Z( 18 ) = 1005.107
Z( 19 ) = 993.8898
Z( 20 ) = 981.4053
Z( 4 ) = 1003.594
Application 4736940 resources: utime 24, stime 2
Page last modified: Fri, 21 May 2004 22:21:36 GMT
Page URL: http://www.nersc.gov/nusers/help/tutorials/openmp/print.php
Web contact: webmaster@nersc.gov
Computing questions: consult@nersc.gov

