Historical: Using OpenMP on IBM SP
Seaborg was decommissioned in January 2008.
OpenMP supports multi-platform shared-memory parallel programming in C/C++ and
Fortran on many architectures.
This document describes how to compile and run OpenMP programs
on IBM SP systems.
Basic usage
OpenMP provides an easy method for SMP-style parallelization of
discrete, small sections of code, such as a do loop.
This can be very helpful
for code development and testing.
However, OpenMP has a number of limitations that make it less desirable
than MPI for large-scale computations.
- OpenMP can only be used among the processors of a single node. For
production-scale, multi-node codes, OpenMP threads must be combined
with MPI processes.
- Debugging OpenMP threads with TotalView is complex.
- OpenMP provides many ways to write codes that compile and run
but produce unexpected results, particularly for codes with large granularity
(e.g., calls to subroutines). Local variables in subroutine calls may
be shared among threads (for example, when they are statically allocated);
users must take care that the desired memory association is in effect.
- All of our examples below use the Fortran90 xlf90_r compiler.
If you use the Fortran77 xlf_r compiler, be aware that the default
is -qsave, which may result in unexpected sharing of variables in
subroutines.
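The data-sharing pitfall described in the list above can be sketched in C, where a static local variable plays the role of a SAVEd Fortran local. This is only an illustration under that analogy; the function names square_static, square_auto, and sum_squares are hypothetical and do not come from the original examples.

```c
#include <assert.h>

/* Unsafe pattern, shown for contrast only: a static local occupies one
   shared location, so threads calling this concurrently can overwrite
   each other's value (comparable to a SAVEd Fortran local). */
long square_static(long x)
{
    static long tmp;      /* one copy shared by all callers */
    tmp = x * x;
    return tmp;
}

/* Safe pattern: an automatic local lives on the calling thread's stack. */
long square_auto(long x)
{
    long tmp = x * x;     /* private to each call */
    return tmp;
}

/* Sum of the squares of 0..n-1. The reduction clause gives each thread
   a private partial sum and combines the partial sums after the loop. */
long sum_squares(long n)
{
    long total = 0;
    long i;
    assert(n >= 0);
#pragma omp parallel for reduction(+:total)
    for (i = 0; i < n; i++) {
        total += square_auto(i);
    }
    return total;
}
```

Calling square_static from inside the parallel loop instead of square_auto would compile and run, but the result could vary from run to run, which is the kind of surprise the list warns about.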
OpenMP is available for
- Fortran using the IBM xlf_r compiler,
- C using the IBM xlc_r compiler, or
- C++ using the IBM xlC_r compiler.
To compile and run a Fortran code using OpenMP use:
% xlf90_r -qsmp=omp -o exename filename.f
% ./exename
To compile and run a C code using OpenMP use:
% xlc_r -qsmp=omp -o exename filename.c
% ./exename
To compile and run a C++ code using OpenMP use:
% xlC_r -qsmp=omp -o exename filename.C
% ./exename
Note that the -qsmp=omp option is required for both
the compile step and the link step.
A program built in this way will automatically use a number of threads
equal to the number of processors on the node.
Here's a small example code that prints out the number of threads
created.
! Filename: threads.f
! Compile: xlf90_r -o threads -qsmp=omp threads.f
PROGRAM HELLO
IMPLICIT NONE
INTEGER nthreads, tid, OMP_GET_NUM_THREADS
INTEGER OMP_GET_THREAD_NUM
! Fork a team of threads
!$OMP PARALLEL PRIVATE(nthreads, tid)
! Obtain and print thread id
tid = OMP_GET_THREAD_NUM()
print *, 'Hello World from thread ', tid
! Only master thread does this
IF (tid .EQ. 0) THEN
nthreads = OMP_GET_NUM_THREADS()
print *, 'Number of threads ', nthreads
END IF
! All threads join master thread and disband
!$OMP END PARALLEL
END
The same small example code in C is shown below:
/* Filename: threads.c
Compile: xlc_r -o threads -qsmp=omp threads.c */
#include <stdio.h>
#include <omp.h>
int main ()
{
int nthreads, tid;
/* Fork a team of threads */
#pragma omp parallel private(nthreads, tid)
{
/* Obtain and print thread id */
tid = omp_get_thread_num();
printf("Hello World from thread %d\n", tid);
/* Only master thread does this */
if (tid==0) {
nthreads = omp_get_num_threads();
printf("Number of threads %d\n", nthreads);
}
}
return 0;
}
The same small example code in C++ is shown below. Note that
printf is used for output rather than the stream cout.
This is because printf produces more coherent output for
multiple threads; different parts of the cout streams would
be mixed in the output from the different parallel threads.
// Filename: threads.C
// Compile: xlC_r -o threads -qsmp=omp threads.C
#include <cstdio>
#include <omp.h>
int main ()
{
int nthreads, tid;
// Fork a team of threads
#pragma omp parallel private(nthreads, tid)
{
// Obtain and print thread id
tid = omp_get_thread_num();
printf("Hello World from thread %d\n", tid);
// Only master thread does this
if (tid==0) {
nthreads = omp_get_num_threads();
printf("Number of threads %d\n", nthreads);
}
}
return 0;
}
Compiling and running on the IBM SP is as follows:
% xlf90_r -o threads -qsmp=omp threads.f
** hello === End of Compilation 1 ===
1501-510 Compilation successful for file threads.f.
% ./threads
Hello World from thread 8
Hello World from thread 0
Number of threads 8
Hello World from thread 3
...
% xlc_r -o threads -qsmp=omp threads.c
% ./threads
Hello World from thread 0
Number of threads 8
Hello World from thread 5
...
% xlC_r -o threads -qsmp=omp threads.C
% ./threads
Hello World from thread 0
Number of threads 8
Hello World from thread 8
...
Note that you do not have to use poe for pure OpenMP
codes that are intended to run on a single node.
Changing the number of threads and tasks
You can change the number of threads by setting the
OMP_NUM_THREADS environment variable.
The default is to use the same number of threads as
CPUs available on a node.
For example, to create 8 threads on a single node:
% setenv OMP_NUM_THREADS 8
% ./threads
Hello World from thread 0
Number of threads 8
Hello World from thread 1
Hello World from thread 2
Hello World from thread 3
Hello World from thread 4
Hello World from thread 5
Hello World from thread 6
Hello World from thread 7
The same thing may be accomplished by using poe to request
one task on a single node.
That one task will run OMP_NUM_THREADS threads.
% poe ./threads -nodes 1 -tasks_per_node 1
The environment variable
XLSMPOPTS can be used to control
the behavior of OpenMP threads (including the number of
threads).
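As a rough check on the settings above, a program can count how many threads a parallel region actually received. The sketch below is an illustration only, with hypothetical function names (team_size_default, team_size_requested). It relies on the standard reduction and num_threads clauses, and it returns 1 when compiled without OpenMP support.

```c
#include <assert.h>

/* Number of threads in a default parallel region. This follows
   OMP_NUM_THREADS when that variable is set. */
int team_size_default(void)
{
    int count = 0;
#pragma omp parallel reduction(+:count)
    {
        count += 1;       /* each thread contributes exactly one */
    }
    return count;
}

/* Same count, but requesting a team size for this one region only,
   which takes precedence over the environment setting. The runtime
   may supply fewer threads than requested. */
int team_size_requested(int wanted)
{
    int count = 0;
    assert(wanted >= 1);
#pragma omp parallel num_threads(wanted) reduction(+:count)
    {
        count += 1;
    }
    return count;
}
```

Printing the result of team_size_default() before and after changing OMP_NUM_THREADS is a quick way to confirm that the environment setting is being picked up.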
Running on more than one node
You can use poe to run on more than a single
node. However, the nodes cannot communicate using only OpenMP;
see "Mixing OpenMP and MPI" in the next section.
Set -nodes to the number of nodes,
-tasks_per_node to 1, and
OMP_NUM_THREADS to whatever you wish, or use the
default.
For example, this will run on 2 nodes with the default number
of OMP threads per node:
% unsetenv OMP_NUM_THREADS
% poe ./threads -nodes 2 -tasks_per_node 1
Here is an analogous LoadLeveler script that compiles
and runs the Fortran and C examples above, first with the serial
compilers and then with the parallel compilers:
#@ class = debug
#@ shell = /usr/bin/csh
#@ node = 2
#@ tasks_per_node = 1
#@ network.MPI = csss,not_shared,us
#@ wall_clock_limit = 00:02:00
#@ notification = complete
#@ job_type = parallel
#@ output = $(jobid).$(stepid).out
#@ error = $(jobid).$(stepid).out
#@ environment = COPY_ALL
#@ queue
set echo
xlf90_r -o threads -qsmp=omp threads.f
poe ./threads
xlc_r -o threads -qsmp=omp threads.c
poe ./threads
mpxlf90_r -o threads -qsmp=omp threads.f
./threads
mpcc_r -o threads -qsmp=omp threads.c
./threads
exit
Note that poe is needed in this script when the code
was compiled with a "serial" version of the compiler. Without poe
the code will not run on more than a single node.
However, if a "parallel" version of the compiler, such as
mpxlf90_r or mpcc_r, is used to create the executable
then poe does not need to
be used on the command line.
The use of poe in batch scripts can be confusing, because
LoadLeveler keywords will override poe command line options.
Mixing OpenMP and MPI
OpenMP and MPI can be freely mixed in Fortran, C, and C++ source code.
You must use a "multiprocessor" and "thread-safe" compiler
invocation with the -qsmp=omp option, e.g.,
- mpxlf90_r -qsmp=omp for Fortran,
- mpcc_r -qsmp=omp for C, and
- mpCC_r -qsmp=omp for C++.
Some users have reported cases where this
mixed-mode programming strategy increases
a code's runtime performance.
Here's the same code as above, but with some MPI calls mixed in:
! Filename: hello.f
! Compile: mpxlf90_r -o hello -qsmp=omp hello.f
! Run: poe ./hello -nodes 2 -tasks_per_node 1
PROGRAM HELLO
IMPLICIT NONE
INCLUDE 'mpif.h'
INTEGER nthreads, tid, OMP_GET_NUM_THREADS
INTEGER OMP_GET_THREAD_NUM, myid, ierr, nprocs
CHARACTER*32 buf
call MPI_INIT( ierr )
call MPI_COMM_RANK( MPI_COMM_WORLD, myid, ierr )
call MPI_COMM_SIZE( MPI_COMM_WORLD, nprocs, ierr )
print *, "MPI Process number ", myid, " of ", nprocs, " is alive"
! Fork a team of threads on each MPI task
!$OMP PARALLEL PRIVATE(nthreads, tid)
! Obtain and print thread id
tid = OMP_GET_THREAD_NUM()
! print *, 'Hello World from OMP thread ', tid, 'on process ',myid
! Only master thread does this
IF (tid==0) THEN
nthreads = OMP_GET_NUM_THREADS()
print *, 'Number of OMP threads ', nthreads, 'on process ',myid
END IF
! All threads join master thread and disband
!$OMP END PARALLEL
if (myid==0) buf='an MPI message from process 0'
call MPI_BCAST(buf,32,MPI_CHARACTER,0,MPI_COMM_WORLD,ierr)
if(myid/=0) print *, 'Process ', myid, "got ", buf
call MPI_FINALIZE(ierr)
END
Here is the same OMP/MPI example in C:
/* Filename: hello.c
Compile: mpcc_r -o hello -qsmp=omp hello.c
Run: poe ./hello -nodes 2 -tasks_per_node 1 */
#include <stdio.h>
#include <string.h>
#include "mpi.h"
#include "omp.h"
int main(int argc, char* argv[])
{
int nthreads, tid;
int myid, nprocs;
char buf[32];
MPI_Init(&argc, &argv); /* start MPI */
MPI_Comm_rank(MPI_COMM_WORLD, &myid); /* get my proc id */
MPI_Comm_size(MPI_COMM_WORLD, &nprocs); /* get number of procs */
printf("MPI Process number %d of %d is alive\n", myid, nprocs);
/* Fork a team of threads */
#pragma omp parallel private(nthreads, tid)
{
/* Obtain thread id */
tid = omp_get_thread_num();
/* Only master thread does this */
if (tid==0) {
nthreads = omp_get_num_threads();
printf("Number of threads %d on process %d\n",
nthreads, myid);
}
}
if (myid==0) { strcpy(buf,"an MPI message from process 0"); }
MPI_Bcast(buf,32,MPI_CHAR,0,MPI_COMM_WORLD);
if (myid!=0) {printf("Process %d got %s\n", myid, buf); }
MPI_Finalize(); /* finish MPI */
return 0;
}
Finally, here is the same OMP/MPI example in C++:
// Filename: hello.C
// Compile: mpCC_r -o hello -qsmp=omp hello.C
// Run: poe ./hello -nodes 2 -tasks_per_node 1
#include <cstdio>
#include <cstring>
#include <mpi.h>
#include <omp.h>
int main(int argc, char* argv[])
{
int nthreads, tid;
int myid, nprocs;
char buf[32];
MPI_Init(&argc, &argv); // start MPI
MPI_Comm_rank(MPI_COMM_WORLD, &myid); // get my processor id
MPI_Comm_size(MPI_COMM_WORLD, &nprocs); // get number of procs
printf("MPI Process number %d of %d is alive\n", myid, nprocs);
// Fork a team of threads
#pragma omp parallel private(nthreads, tid)
{
// Obtain thread id
tid = omp_get_thread_num();
// Only master thread does this
if (tid==0) {
nthreads = omp_get_num_threads();
printf("Number of threads %d on process %d\n",
nthreads, myid);
}
}
if (myid==0) { strcpy(buf,"an MPI message from process 0"); }
MPI_Bcast(buf,32,MPI_CHAR,0,MPI_COMM_WORLD);
if (myid!=0) {printf("Process %d got %s\n", myid, buf); }
MPI_Finalize(); // finish MPI
return 0;
}
To compile:
% mpxlf90_r -o hello -qsmp=omp hello.f
** hello === End of Compilation 1 ===
1501-510 Compilation successful for file hello.f.
or
% mpcc_r -o hello -qsmp=omp hello.c
or
% mpCC_r -o hello -qsmp=omp hello.C
and to run on two nodes with 1 MPI process per node and
the default of 16 OpenMP threads per node:
% poe ./hello -nodes 2 -tasks_per_node 1
Here's a LoadLeveler script to run the code on two nodes with
2 total MPI tasks and 16 OMP threads per node:
#@ class = debug
#@ shell = /usr/bin/csh
#@ node = 2
#@ tasks_per_node = 1
#@ network.MPI = csss,not_shared,us
#@ wall_clock_limit = 00:02:00
#@ notification = complete
#@ job_type = parallel
#@ output = $(jobid).$(stepid).out
#@ error = $(jobid).$(stepid).err
#@ environment = COPY_ALL
#@ queue
./hello
exit
After the job completes
this is the standard output file:
MPI process number 1 of 2 is alive
MPI process number 0 of 2 is alive
Number of OMP threads 16 on process 0
Number of OMP threads 16 on process 1
Process 1 got an MPI message from process 0
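The hello examples above only report task and thread counts. As a hedged sketch of how the same layout (MPI tasks across nodes, OpenMP threads within each task) might divide real work, the C fragment below splits a loop among tasks and lets threads share each task's portion. The helper names task_range and task_partial_sum are hypothetical, and the task id and task count are passed as plain integers so that the fragment compiles without MPI.

```c
#include <assert.h>

/* Hypothetical helper: half-open range [*lo, *hi) of the n iterations
   owned by task myid out of nprocs, divided as evenly as possible. */
void task_range(long n, int nprocs, int myid, long *lo, long *hi)
{
    long base, extra;
    assert(nprocs >= 1 && myid >= 0 && myid < nprocs);
    base = n / nprocs;
    extra = n % nprocs;   /* the first 'extra' tasks get one more */
    *lo = myid * base + (myid < extra ? myid : extra);
    *hi = *lo + base + (myid < extra ? 1 : 0);
}

/* One task's share of the sum 1 + 2 + ... + n. OpenMP threads within
   the task split the task's range among themselves. */
long task_partial_sum(long n, int nprocs, int myid)
{
    long lo, hi, i, sum = 0;
    task_range(n, nprocs, myid, &lo, &hi);
#pragma omp parallel for reduction(+:sum)
    for (i = lo; i < hi; i++) {
        sum += i + 1;
    }
    return sum;
}
```

In an actual hybrid code, myid and nprocs would come from MPI_Comm_rank and MPI_Comm_size as in hello.c, and the per-task partial sums would be combined with a call such as MPI_Reduce.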