David Klepacki
Advanced Computing
IBM Research
klepacki@us.ibm.com
Phone: +1-914-945-2628
Fax: +1-914-945-4269
Version 1.2.3
LICENSE TERMS:
The Dynamic Performance Monitoring Interface for OpenMP (DPOMP) Tool is distributed under a nontransferable, nonexclusive, and revocable license. The DPOMP software is provided "AS IS". IBM MAKES NO WARRANTIES, EXPRESSED OR IMPLIED, INCLUDING THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. IBM has no obligation to defend or indemnify against any claim of infringement, including, but not limited to, patents, copyright, trade secret, or intellectual property rights of any kind. IBM is under no obligation to maintain, correct, or otherwise support this software. IBM does not represent that the DPOMP Tool will be made generally available. IBM does not represent that any software made generally available will be similar to or compatible with the DPOMP Tool.
5. Limitations
DPOMP is built on DPCL, IBM's dynamic instrumentation infrastructure, which supports binary instrumentation of FORTRAN, C and C++ programs. The DPOMP tool performs dynamic instrumentation of OpenMP applications: it reads the binary of the application and the binary of a POMP (Performance Monitoring Interface for OpenMP) compliant library, and inserts into the application binary calls to the functions defined in that library. DPOMP requires DPCL version 3.2.6.
The supported compilers are xlf 8.1.1.0 and xlc 6.0.0.0, together with their thread-safe and parallel variants. The following IBM processors and operating systems are supported: Power3 and Power4 with AIX 5L.
For more information on POMP, please refer to http://www.caspur.it/ewomp02/PAPERI/EWOMP02-POMP.pdf
For more information on DPOMP, please refer to http://www.rz.rwth-aachen.de/ewomp03/omptalks/Tuesday/Session6/T16p.pdf
“POMP” is a proposed standard API for performance monitoring of OpenMP applications; the POMP proposal was submitted to the OpenMP Architecture Review Board for incorporation in a future OpenMP standard. This distribution provides two POMP libraries: timer_probe and pomprof_probe. In addition, since DPOMP can instrument a binary with calls to any POMP compliant library, users can build their own monitoring library.
The “timer_probe” library is provided as an example of a tracing probe. It displays all the events that were instrumented (“Handles”) and initializes a timer to zero; then, whenever an event is executed, it prints the event, its handle number and the corresponding time. For mixed-mode (MPI + OpenMP) applications, timer_probe displays this information for each MPI task. In this case, to make the output easier to read, set the environment variable MP_LABELIO to YES so that the output of each task is labeled with the corresponding task ID.
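The tracing behaviour described above can be sketched in C as follows. This is a minimal illustration, not the timer_probe source shipped in doc/examples/sample_probe/: the helpers trace_init, trace_elapsed, trace_format and trace_event are hypothetical names, the sketch assumes a POSIX clock_gettime, and a real probe would call trace_event from inside its POMP functions, passing the handle of the construct being executed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <time.h>

/* Start of the trace (hypothetical helper state, not timer_probe's). */
static struct timespec trace_start;

/* Reset the trace timer to zero; call once before the first event. */
void trace_init(void)
{
    clock_gettime(CLOCK_MONOTONIC, &trace_start);
}

/* Seconds elapsed since trace_init(). */
double trace_elapsed(void)
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    return (double)(now.tv_sec - trace_start.tv_sec)
         + (double)(now.tv_nsec - trace_start.tv_nsec) / 1e9;
}

/* Format one trace line: event name, handle number and time stamp.
   Returns the length of the formatted line, as snprintf does. */
int trace_format(char *buf, size_t len, const char *event, int handle,
                 double seconds)
{
    assert(buf != NULL && event != NULL);
    return snprintf(buf, len, "%s handle=%d t=%.6f", event, handle, seconds);
}

/* Print one event; puts() locks stdout, so lines from different
   threads do not interleave within a line. */
void trace_event(const char *event, int handle)
{
    char line[256];
    trace_format(line, sizeof line, event, handle, trace_elapsed());
    puts(line);
}
```

For example, with the illustrative event name "parallel_enter", trace_format(buf, sizeof buf, "parallel_enter", 3, 0.5) writes "parallel_enter handle=3 t=0.500000". CLOCK_MONOTONIC is chosen so that changes to the system clock do not disturb the trace times.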
The “pomprof_probe” is a POMP library that generates a profile of the OpenMP activity. It generates a set of performance files in XML (one for each MPI task), which are named: pomprof_<mpi_task>_<pid>.viz. These files are used as input to the PeekPerf performance visualizer.
For each OpenMP instrumented construct, the pomprof_probe provides the following metrics:
Summary view (maximum value over all threads):
· Count: the number of times the event was executed
· Exclusive time (Excl. Time): the total time, not counting time spent inside other events
· Inclusive time (Incl. Time): wall-clock time for the event, including other events
· Percentage of total overhead (% Overhead): 100 * (thread time - computation time) / thread time
· Percentage of imbalance (% imbalance): 100 * (comp time - MIN(comp time(i))) / MIN(comp time(i))
· Average computation time (Avg. Comp Time): SUM(comp time(i)) / number of threads
Detailed view:
· Task (Task): MPI task ID
· Thread (thread): OpenMP thread ID
· Time in master (Time in Master): time in the master thread (wall-clock time)
· Thread time (TT: Thread Time): wall-clock time for the execution of each thread
· Computation time (CT: Comp. Time): time spent by each thread in the body of the OpenMP construct
· Percentage of imbalance (% imbalance): 100 * (comp time - MIN(comp time(i))) / MIN(comp time(i))
· Total overhead (TO: TT - CT): thread time - comp time
· Percentage of the total overhead due to barrier (%TO (Barrier)): 100 * barrier time / incl time. The barrier time is measured as the time between the end of the thread's execution and the end of the last thread.
· Percentage of the total overhead due to the run-time library (%TO (RTL)): 100 * (RTL time - barrier time) / incl time
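The per-thread formulas above can be checked with a small C sketch. The thread_stats structure and the function names are hypothetical and do not reflect how pomprof_probe stores its data; times are assumed to be in seconds, and the inclusive time is passed in directly. The %TO (RTL) metric is left out because the RTL time is not defined in terms of the other quantities listed here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-thread measurements for one OpenMP construct. */
typedef struct {
    double thread_time; /* TT: wall-clock time of the thread        */
    double comp_time;   /* CT: time in the body of the construct     */
    double end_time;    /* time stamp at which the thread finished   */
} thread_stats;

/* % Overhead = 100 * (TT - CT) / TT */
double pct_overhead(const thread_stats *t)
{
    assert(t->thread_time > 0.0);
    return 100.0 * (t->thread_time - t->comp_time) / t->thread_time;
}

/* MIN(comp time(i)) over all n threads. */
static double min_comp_time(const thread_stats *s, size_t n)
{
    assert(n > 0);
    double m = s[0].comp_time;
    for (size_t i = 1; i < n; i++)
        if (s[i].comp_time < m)
            m = s[i].comp_time;
    return m;
}

/* % imbalance of thread i = 100 * (CT(i) - MIN(CT)) / MIN(CT) */
double pct_imbalance(const thread_stats *s, size_t n, size_t i)
{
    double m = min_comp_time(s, n);
    assert(i < n && m > 0.0);
    return 100.0 * (s[i].comp_time - m) / m;
}

/* Avg. Comp Time = SUM(CT(i)) / number of threads */
double avg_comp_time(const thread_stats *s, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += s[i].comp_time;
    return sum / (double)n;
}

/* Barrier time of thread i: from its own end to the end of the last thread. */
double barrier_time(const thread_stats *s, size_t n, size_t i)
{
    assert(i < n);
    double last = s[0].end_time;
    for (size_t j = 1; j < n; j++)
        if (s[j].end_time > last)
            last = s[j].end_time;
    return last - s[i].end_time;
}

/* %TO (Barrier) = 100 * barrier time / inclusive time */
double pct_to_barrier(double barrier, double incl_time)
{
    assert(incl_time > 0.0);
    return 100.0 * barrier / incl_time;
}
```

With two threads whose computation times are 3.0 s and 2.0 s, pct_imbalance gives 50 for the slower thread and 0 for the faster one, and avg_comp_time gives 2.5 s. Because the summary view reports the maximum over all threads, its imbalance figure corresponds to the slowest thread.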
The source code of the POMP library “timer_probe” is provided in this distribution (see doc/examples/sample_probe/) as an example for users who want to build their own performance measurement probe. Users may refer to http://www.caspur.it/ewomp02/PAPERI/EWOMP02-POMP.pdf for details on the POMP API and how to build a POMP compliant library. Also, please refer to http://www.rz.rwth-aachen.de/ewomp03/omptalks/Tuesday/Session6/T16p.pdf for more information on DPOMP, its supported features, and the modifications to the POMP proposal that were needed for its implementation.
Users can also define a POMP compliant library that is not complete (i.e., not all the POMP functions are implemented in the library). This lets users implement only the POMP functions that are of interest to them, which limits the amount of instrumentation.
Other POMP libraries were developed by the Central Institute for Applied Mathematics (ZAM) at the Forschungszentrum Jülich, and can be downloaded from http://www.fz-juelich.de/zam/kojak. These libraries are part of the KOJAK toolkit (Kit for Objective Judgment and Knowledge-based Detection of Performance Bottlenecks).
Usage:
dpomp [ flags ] [ lib-probe ] [executable-file [ executable-args ] ]
where the flags are:
-f Name of the file containing the user functions to instrument.
-l Name of the file to which the user functions are written.
-s Write the user functions to standard output.
-h Print this message and exit.
-v Print version information and exit.
“lib-probe” is the name of the POMP compliant library.
The user can specify the path to lib-probe with the environment variable DPOMP_PROBE_PATH; example 5 shows how. If a path is given with lib-probe (e.g., dpomp ../../probes/lib-probe), it takes precedence over the environment variable. If no path is given and the environment variable is not set, dpomp searches for lib-probe in the current directory.
“lib-probe” is not required with the -l and -s options.
Note:
DPOMP requires the environment variable MP_HOSTFILE to be set, even for non-MPI programs.
Examples:
1. Run dpomp with a probe library and no flags.
dpomp lib-probe a.out
dpomp parses the probe library (lib-probe), looking for POMP functions. By default, dpomp instruments all OpenMP constructs for which there is a corresponding POMP function in the probe library. It also instruments all user functions called from the main program provided that there is a definition in the probe library for user functions. In addition, for MPI applications, it instruments all MPI calls in the program.
2. List all user-defined functions in the application on standard output.
dpomp -s a.out
3. Write the list of all user-defined functions in the application to a file.
dpomp -l a.funcnames a.out
4. Instrument all user-defined functions, i.e., both the functions called from the main program and the functions that are not called from it. The file “a.funcnames” must first be created as shown in example 3.
dpomp -f a.funcnames a.out
5. Specify the path to lib-probe with the environment variable DPOMP_PROBE_PATH.
export DPOMP_PROBE_PATH=/a/b/c:/dir:/dir2/lib:.
Case 1:
dpomp /e/f/g/lib-probe a.out
Here /e/f/g/lib-probe is used because lib-probe starts with “/”.
Case 2:
dpomp ./lib-probe a.out
Here ./lib-probe is used because lib-probe starts with “.”.
Case 3:
dpomp lib-probe a.out
Since lib-probe does not start with “.” or “/” and DPOMP_PROBE_PATH is set, dpomp uses the first of /a/b/c/lib-probe, /dir/lib-probe, /dir2/lib/lib-probe and ./lib-probe that exists.
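The lookup rules of the cases above can be expressed as a short C function. This is a sketch of the documented behaviour, not dpomp's own code: resolve_probe and file_exists are hypothetical names, the existence check is passed in as a function pointer so that it can be replaced in tests, and an empty entry in the search path is assumed to mean the current directory.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Default existence check backed by the file system. */
int file_exists(const char *path)
{
    return access(path, F_OK) == 0;
}

/* Hypothetical helper mirroring the documented lookup rules.
   Writes the chosen path to out and returns 1, or returns 0
   if no candidate exists. */
int resolve_probe(const char *probe, const char *search_path,
                  int (*exists)(const char *path),
                  char *out, size_t outlen)
{
    assert(probe != NULL && exists != NULL && out != NULL);

    /* Cases 1 and 2: a name starting with '/' or '.' is used as given. */
    if (probe[0] == '/' || probe[0] == '.') {
        snprintf(out, outlen, "%s", probe);
        return 1;
    }

    /* DPOMP_PROBE_PATH not set (or empty): use the current directory. */
    if (search_path == NULL || search_path[0] == '\0') {
        snprintf(out, outlen, "./%s", probe);
        return exists(out);
    }

    /* Case 3: try each colon-separated directory; the first match wins. */
    const char *dir = search_path;
    for (;;) {
        const char *colon = strchr(dir, ':');
        size_t len = colon ? (size_t)(colon - dir) : strlen(dir);
        if (len == 0) /* assumption: an empty entry means "." */
            snprintf(out, outlen, "./%s", probe);
        else
            snprintf(out, outlen, "%.*s/%s", (int)len, dir, probe);
        if (exists(out))
            return 1;
        if (colon == NULL)
            return 0;
        dir = colon + 1;
    }
}
```

A caller would typically pass the value of the environment variable as the search path, e.g. resolve_probe("lib-probe", getenv("DPOMP_PROBE_PATH"), file_exists, path, sizeof path). As in cases 1 and 2, an explicit path is returned without checking that the file exists.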
For more information, please refer to http://www.rz.rwth-aachen.de/ewomp03/omptalks/wednesday/PET2/derose_dpomp.pdf
By default, dpomp instruments all OpenMP constructs for which there is a corresponding POMP function in the probe library. It also instruments all user functions called from the main program, provided that the probe library defines the POMP functions for user functions. In addition, for MPI applications, it instruments all MPI calls in the program; setting the environment variable DPOMP_NO_MPI_INSTRUMENTATION disables the MPI instrumentation.
DPOMP lets users select the events to monitor in three ways: through command-line flags, through environment variables, and by providing a POMP compliant library that contains only the POMP functions corresponding to the events of interest.
As described in Section 3, by default dpomp instruments user functions called from the main program. To instrument user functions of interest, a file with the function names (e.g., a.funcnames) has to be given as input to dpomp, as shown in example 4 of Section 3.
Users can enable or disable the monitoring of events through environment variables of the form:
POMP_<Group> <Level>
where <Group> (the suffix of the variable name) and <Level> (its value) are defined as follows.
=======================================================================
Group      Constructs                        Level
=======================================================================
PARALLEL   Parallel                          none, EnterExit, BeginEnd
WORKSHARE  Section, Work share, Single       none, EnterExit, BeginEnd
SYNC       Critical, Ordered                 none, EnterExit, BeginEnd
           Barrier                           none, EnterExit
           Master                            none, BeginEnd
           Atomic, Flush                     none, Event
USER       Function                          none, EnterExit
RUNTIME    OpenMP run-time library routines  none, Event, EnterExit
=======================================================================
Each level in the comma-separated lists includes the previous levels. For example, for group “WORKSHARE”, “none” specifies that no work-sharing events are monitored, “EnterExit” specifies that the enter and exit events of these constructs are monitored, and “BeginEnd” specifies that the enter, exit, begin and end events are monitored. Similarly, for group “SYNC”, “Event” specifies that only Atomic and Flush events are instrumented, “EnterExit” additionally instruments the enter and exit of Critical, Ordered and Barrier, and “BeginEnd” instruments all SYNC events.
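The cumulative meaning of the levels can be sketched in C by ordering them in an enumeration, so that an event is monitored when its level does not exceed the level set for its group. The names pomp_level, parse_level, group_level and event_enabled are hypothetical. The sketch assumes that level names are matched exactly as written in the table, and that an unset or unrecognized POMP_<Group> variable means everything is monitored; this mirrors dpomp's default behaviour, but the handling of invalid values is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Levels in increasing order; each level includes all previous ones. */
typedef enum {
    POMP_LEVEL_NONE = 0,
    POMP_LEVEL_EVENT,
    POMP_LEVEL_ENTER_EXIT,
    POMP_LEVEL_BEGIN_END
} pomp_level;

/* Parse a level name (exact match assumed); -1 if unrecognized. */
int parse_level(const char *s)
{
    static const char *const names[] = { "none", "Event", "EnterExit", "BeginEnd" };
    for (int i = 0; i < (int)(sizeof names / sizeof names[0]); i++)
        if (strcmp(s, names[i]) == 0)
            return i;
    return -1;
}

/* Read POMP_<group>; assumption: unset or invalid means BeginEnd. */
pomp_level group_level(const char *group)
{
    assert(group != NULL);
    char var[64];
    snprintf(var, sizeof var, "POMP_%s", group);
    const char *value = getenv(var);
    int level = value != NULL ? parse_level(value) : -1;
    return level < 0 ? POMP_LEVEL_BEGIN_END : (pomp_level)level;
}

/* An event is monitored if its level does not exceed the group setting. */
int event_enabled(const char *group, pomp_level event)
{
    return event <= group_level(group);
}
```

For instance, with POMP_PARALLEL set to EnterExit, event_enabled("PARALLEL", POMP_LEVEL_ENTER_EXIT) is true while event_enabled("PARALLEL", POMP_LEVEL_BEGIN_END) is false.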
Because of DPCL limitations, support for instrumenting 64-bit binaries is limited.
Because of compiler optimizations, a few OpenMP constructs cannot be instrumented:
Combined work-sharing constructs, such as “C$OMP Parallel Do”, are not split into separate “OMP Parallel” and “OMP DO” constructs.
Implicit barriers are not exposed by the compiler.
For binaries compiled with “-O5”, DPCL instrumentation sometimes fails to map the binary back to the source code.
For binaries compiled with optimization level “-O3” or higher, DPCL fails to obtain the correct line number for the end of a construct. In this case, dpomp sets the end line number of the construct to its start line number + 1.
Install dpomp in a directory (usually /bin) that is included in the $PATH environment variable.
Version 1.2.3 (
o Probe_timer renamed to “timer_probe”.
o Probe_timer updated to support “chunks”.
o Fixed a problem that, in some situations, created multiple handles for the same OpenMP construct.
o Updated documentation.
o Pomprof_probe splits the total overhead into three components: barrier time, load imbalance time and run time. The percentage of overhead for each component is now computed as the ratio of the overhead of that component to the total inclusive time.
Version 1.1.5 (
o Initial release