Dynamic Performance Monitor for OpenMP

(DPOMP)

(C) COPYRIGHT International Business Machines Corp. 2004 All Rights Reserved.

 

David Klepacki

Advanced Computing Technology Center

IBM Research

klepacki@us.ibm.com
Phone: +1-914-945-2628
Fax: +1-914-945-4269

Version 1.2.3, January 20, 2004


LICENSE TERMS:

The Dynamic Performance Monitoring Interface for OpenMP (DPOMP) Tool is distributed under a nontransferable, nonexclusive, and revocable license. The DPOMP software is provided "AS IS". IBM MAKES NO WARRANTIES, EXPRESSED OR IMPLIED, INCLUDING THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.  IBM has no obligation to defend or indemnify against any claim of infringement, including, but not limited to, patents, copyright, trade secret, or intellectual property rights of any kind. IBM is under no obligation to maintain, correct, or otherwise support this software. IBM does not represent that the DPOMP Tool will be made generally available. IBM does not represent that any software made generally available will be similar to or compatible with the DPOMP Tool.


Table of Contents

1.      The DPOMP Tool

2.      POMP Compliant Libraries

2.1.   The timer_probe

2.2.   The pomprof_probe

2.3.   Building a POMP library

2.4.   Other POMP Libraries

3.      DPOMP Usage

4.      Instrumentation Control

4.1.   Flags

4.2.   Environment Variables

5.      Limitations

6.      Installation

7.      Release History

 


1. The DPOMP Tool

DPOMP is built on DPCL, IBM’s dynamic instrumentation infrastructure, which supports binary instrumentation of Fortran, C, and C++ programs.  The DPOMP tool was developed for dynamic instrumentation of OpenMP applications: it reads the application binary and the binary of a POMP (Performance Monitoring Interface for OpenMP) compliant library, then inserts into the application binary calls to the functions defined in that library.  DPOMP requires DPCL version 3.2.6.

The supported compilers are xlf 8.1.1.0 and xlc 6.0.0.0, together with their respective thread-safe and parallel versions.  The supported IBM processors are Power3 and Power4, running AIX 5L.

For more information on POMP, please refer to http://www.caspur.it/ewomp02/PAPERI/EWOMP02-POMP.pdf

For more information on DPOMP, please refer to http://www.rz.rwth-aachen.de/ewomp03/omptalks/Tuesday/Session6/T16p.pdf


2. POMP Compliant Libraries

“POMP” is a standard API for performance monitoring of OpenMP applications. The POMP proposal was submitted to the OpenMP Architecture Review Board for incorporation into a future OpenMP standard. This distribution provides two libraries: timer_probe and pomprof_probe. In addition, since DPOMP supports binary instrumentation with any POMP compliant library, users can build their own monitoring library.

2.1 The timer_probe

The “timer_probe” is provided as an example of a tracing probe. It displays all the events that were instrumented (“Handles”) and initializes a timer to zero; whenever an event is executed, it prints the event, its handle number, and the corresponding time. For mixed-mode applications, timer_probe displays this information for each MPI task. In this case, setting the environment variable MPI_LABELIO to YES makes the output easier to follow, because the output from each task is labeled with the corresponding task ID.

2.2 The pomprof_probe

The “pomprof_probe” is a POMP library that generates a profile of the OpenMP activity. It generates a set of performance files in XML (one for each MPI task), which are named: pomprof_<mpi_task>_<pid>.viz. These files are used as input to the PeekPerf performance visualizer.

 

For each OpenMP instrumented construct, the pomprof_probe provides the following metrics:

Summary view: (maximum value from all threads)

·        Count: the number of times the event was executed

·        Exclusive time (Excl. Time): The total time not counting time inside of other events

·        Inclusive time (Incl. Time): Wall clock time for the event (including other events).

·        Percentage of total overhead (% Overhead): 100 * (thread time - computation time) / thread time

·        Percentage of imbalance (% imbalance): 100 * (comp time – MIN(comp time(i))) / MIN(comp time(i))

·        Average computation time (Avg. Comp Time): SUM(comp time(i)) / Number of threads

Detailed View:

·        Task (Task): MPI Task ID

·        Thread (thread): OpenMP thread ID

·        Time in master (Time in Master): Wall clock time in the master thread

·        Thread time (TT: Thread Time): Wall clock time for the execution of each thread

·        Computation time (CT: Comp. Time): Time per thread in the body of the OpenMP construct

·        Percentage of imbalance (% imbalance): 100 * (comp time – MIN(comp time(i))) / MIN(comp time(i))

·        Total overhead (TO: TT - CT): thread time - comp time

·        Percentage of the total overhead due to barrier (%TO (Barrier)): 100 * barrier time / incl time

The barrier time is measured as the time between the end of a thread’s execution and the end of the last thread.

·        Percentage of the total overhead due to the run time library (%TO (RTL)): 100 * (RTL time – barrier time) / incl time
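As a worked illustration of the formulas above, the following Python sketch computes the per-thread overhead and imbalance percentages and the average computation time. The function names and the sample timings are assumptions made for illustration only; this is not the pomprof_probe implementation.

```python
def thread_metrics(thread_time, comp_time):
    """Apply the overhead and imbalance formulas to per-thread timings."""
    min_comp = min(comp_time)
    metrics = []
    for tt, ct in zip(thread_time, comp_time):
        metrics.append({
            "total_overhead": tt - ct,                          # TO = TT - CT
            "pct_overhead": 100 * (tt - ct) / tt,               # % Overhead
            "pct_imbalance": 100 * (ct - min_comp) / min_comp,  # % imbalance
        })
    return metrics


def avg_comp_time(comp_time):
    """Avg. Comp Time: SUM(comp time(i)) / number of threads."""
    return sum(comp_time) / len(comp_time)


# Hypothetical wall clock timings (seconds) for four threads.
thread_time = [10.0, 10.0, 10.0, 10.0]
comp_time = [8.0, 9.0, 8.0, 10.0]

for thread_id, m in enumerate(thread_metrics(thread_time, comp_time)):
    print(thread_id, m["total_overhead"], m["pct_overhead"], m["pct_imbalance"])
print("average computation time:", avg_comp_time(comp_time))
```

With these sample values, the thread with the longest computation time (10.0 s) has no overhead but a 25% imbalance relative to the fastest thread, and the average computation time is 8.75 s.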

2.3 Building a POMP Library

The source code of the POMP library “timer_probe” is provided in this distribution (see doc/examples/sample_probe/). This code is provided as an example for users that want to build their own performance measurement probe. Users may refer to http://www.caspur.it/ewomp02/PAPERI/EWOMP02-POMP.pdf for details on the POMP API and how to build a POMP compliant library. Also, please refer to http://www.rz.rwth-aachen.de/ewomp03/omptalks/Tuesday/Session6/T16p.pdf for more information on DPOMP, its supported features, as well as the modifications to the POMP proposal that were needed for its implementation.

 

Users can also define a POMP compliant library that is incomplete (i.e., one that does not implement all the POMP functions).  This lets users define only the POMP functions that are of interest to them, thus limiting the amount of instrumentation.

 

2.4 Other POMP Libraries

Other POMP libraries were developed by the Central Institute for Applied Mathematics (ZAM) at the Forschungszentrum Jülich, and can be downloaded from http://www.fz-juelich.de/zam/kojak. These libraries are part of the KOJAK toolkit (Kit for Objective Judgment and Knowledge-based Detection of Performance Bottlenecks).

 


3. DPOMP Usage

Usage:

dpomp [ flags ]  [ lib-probe ]  [executable-file  [ executable-args ] ]

Where

            flags:

-f          Name of the file containing the user functions to instrument.

-l          Name of the file to which the list of user functions is written.

-s          Write the list of user functions to standard output.

-h          Print this usage message and exit.

-v          Print version information and exit.

“lib-probe” is the name of the POMP compliant library. 

The user can specify the path to lib-probe with the environment variable DPOMP_PROBE_PATH.  Example 5 describes how the lib-probe path can be specified.  If a path is given with lib-probe (e.g., dpomp ../../probes/lib-probe), it takes precedence over the environment variable.  If no path is given and the environment variable is not set, dpomp searches for lib-probe in the current directory.

“lib-probe” is not required with the -l and -s options.

Note:

DPOMP requires the environment variable MP_HOSTFILE to be set, even for non-MPI programs.
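For users who launch dpomp from a script, the following Python sketch assembles the command line from the usage synopsis above and checks the MP_HOSTFILE requirement first. The helper name dpomp_argv and the sample file names are hypothetical; the sketch only builds the argument list and does not run dpomp.

```python
import os


def dpomp_argv(probe, executable, exe_args=(), env=None):
    """Build the argument list for 'dpomp lib-probe executable-file [args]'."""
    env = os.environ if env is None else env
    # The note above states that MP_HOSTFILE must be set, even for non-MPI programs.
    if not env.get("MP_HOSTFILE"):
        raise RuntimeError("MP_HOSTFILE must be set before running dpomp")
    return ["dpomp", probe, executable, *exe_args]


# Hypothetical probe, executable, argument, and host file names.
argv = dpomp_argv("lib-probe", "a.out", ["input.dat"], env={"MP_HOSTFILE": "host.list"})
print(" ".join(argv))
```

The resulting list can be passed to subprocess.run on a system where dpomp is installed.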

Examples:

1.                  To use dpomp without flags with a probe.

dpomp  lib-probe  a.out

dpomp parses the probe library (lib-probe), looking for POMP functions. By default, dpomp instruments all OpenMP constructs for which there is a corresponding POMP function in the probe library. It also instruments all user functions called from the main program provided that there is a definition in the probe library for user functions. In addition, for MPI applications, it instruments all MPI calls in the program.

2.                  To use dpomp to list all user defined functions in the application to standard out.

dpomp  -s  a.out

3.                  To use dpomp to list all user defined functions in the application to a file.

dpomp  -l  a.funcnames  a.out

4.                  To use dpomp to instrument all user defined functions, i.e., the functions called from the main program as well as the functions called outside of the main program.  The file “a.funcnames” should first be created as shown in example 3.

dpomp  -f  a.funcnames  a.out

5.                  To specify the lib-probe path.

export DPOMP_PROBE_PATH=/a/b/c:/dir:/dir2/lib:.

Case 1: 

dpomp  /e/f/g/lib-probe  a.out

Here /e/f/g/lib-probe is used because lib-probe starts with “/”.

Case 2:

dpomp  ./lib-probe  a.out

Here ./lib-probe is used because lib-probe starts with “.”.

Case 3:

dpomp  lib-probe  a.out

Since lib-probe does not start with “.” or “/” and DPOMP_PROBE_PATH is set, dpomp uses the first of /a/b/c/lib-probe, /dir/lib-probe, /dir2/lib/lib-probe, and ./lib-probe that exists.
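The three cases can be summarized in a short Python sketch of the search order. This is an illustration of the behavior described in this section, not dpomp’s actual code; the function name resolve_probe is hypothetical, and returning None when no directory contains the probe is an assumption, since this document does not say what dpomp does in that situation.

```python
import os


def resolve_probe(name, probe_path=None, exists=os.path.exists):
    """Return the probe file selected by the rules of cases 1 to 3."""
    # Cases 1 and 2: a name starting with "/" or "." is used as given.
    if name.startswith(("/", ".")):
        return name
    # Case 3: search the colon-separated DPOMP_PROBE_PATH in order.
    if probe_path:
        for directory in probe_path.split(":"):
            candidate = os.path.join(directory, name)
            if exists(candidate):
                return candidate
        return None  # assumption: behavior when nothing is found is not documented
    # No path and no DPOMP_PROBE_PATH: look in the current directory.
    return os.path.join(".", name)


# Pretend only these two files exist (hypothetical file system).
present = {"/dir2/lib/lib-probe", "./lib-probe"}
search_path = "/a/b/c:/dir:/dir2/lib:."

print(resolve_probe("/e/f/g/lib-probe", search_path, present.__contains__))
print(resolve_probe("lib-probe", search_path, present.__contains__))
```

Passing the existence check as a parameter keeps the sketch testable without touching the real file system.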

For more information, please refer to http://www.rz.rwth-aachen.de/ewomp03/omptalks/wednesday/PET2/derose_dpomp.pdf


4. Instrumentation Control

By default, dpomp instruments all OpenMP constructs for which there is a corresponding POMP function in the probe library. It also instruments all user functions called from the main program provided that there is a definition in the probe library for user functions. In addition, for MPI applications, it instruments all MPI calls in the program. One can set the environment variable DPOMP_NO_MPI_INSTRUMENTATION to disable the MPI instrumentation.

 

DPOMP allows users to specify the events to monitor through flags, through environment variables, and by defining a POMP compliant library that contains only the POMP functions corresponding to the events of interest.

4.1 Flags

 

As described in Section 3, by default dpomp instruments only the user functions called from the main program. To instrument other user functions of interest, a file with the function names (e.g., a.funcnames) has to be given as input to dpomp with the -f flag, as shown in example 4 of Section 3.

 

4.2 Environment Variables

 

      Users can enable or disable the monitoring of events through the following environment variables:

      POMP_<Group>  <Level>

      where “group” and “level” are defined as follows.

 

=======================================================================================
Group        Constructs                           Level
=======================================================================================
PARALLEL     Parallel                             none, EnterExit, BeginEnd
LOOP         Do/For                               none, EnterExit, Chunks
WORKSHARE    Section, Work share, Single          none, EnterExit, BeginEnd
SYNC         Critical, Ordered                    none, EnterExit, BeginEnd
             Barrier                              none, EnterExit
             Master                               none, BeginEnd
             Atomic, Flush                        none, Event
USER         Function                             none, EnterExit
RUNTIME      OpenMP run-time library routines     none, Event, EnterExit
=======================================================================================

Levels separated by "," include the previous levels.  For example, for group “LOOP”, “none” specifies that no loop events are monitored, “EnterExit” specifies monitoring of the enter and exit of a loop, and “BeginEnd” specifies monitoring of the enter, exit, begin, and end loop events.  Similarly, for group "SYNC", "Event" specifies instrumenting only Atomic and Flush events, "EnterExit" specifies instrumenting the enter and exit of Critical, Barrier, etc., and "BeginEnd" specifies instrumenting all "Sync" calls.
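The inclusion rule can be illustrated with a small Python sketch that lists the levels implied by a chosen level. The level orders are copied from the table; the helper names are hypothetical, and the upper-case spelling of the POMP_<Group> variable name is an assumption. The SYNC group is left out because its constructs do not share a single level list.

```python
# Level order per group, copied from the table above ("none" first).
LEVELS = {
    "PARALLEL": ["none", "EnterExit", "BeginEnd"],
    "LOOP": ["none", "EnterExit", "Chunks"],
    "WORKSHARE": ["none", "EnterExit", "BeginEnd"],
    "USER": ["none", "EnterExit"],
    "RUNTIME": ["none", "Event", "EnterExit"],
}


def monitored_levels(group, level):
    """Return every level implied by 'level': each level includes the previous ones."""
    order = LEVELS[group]
    return order[1:order.index(level) + 1]


def variable_name(group):
    """Assumed spelling of the POMP_<Group> variable: upper-case group name."""
    return "POMP_" + group.upper()


print(variable_name("PARALLEL"), monitored_levels("PARALLEL", "BeginEnd"))
print(variable_name("RUNTIME"), monitored_levels("RUNTIME", "Event"))
print(variable_name("LOOP"), monitored_levels("LOOP", "none"))
```

For example, choosing “BeginEnd” for PARALLEL yields both “EnterExit” and “BeginEnd”, while “none” yields an empty list.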

 


5. Limitations

Due to DPCL limitations with 64-bit binaries, only limited support is provided to instrument 64-bit binaries.

Due to compiler optimizations, a few OpenMP constructs cannot be instrumented.

Combined work-sharing constructs, such as “C$OMP Parallel Do”, are not split into “OMP Parallel” / “OMP DO”.

Implicit barriers are not exposed by the compiler.

DPCL instrumentation of binaries compiled with “-O5” optimization sometimes fails to map constructs to the source code.

DPCL fails to obtain the correct line number of the end of a construct for binaries compiled with optimization levels “-O3” or above. In this case, dpomp sets the end line number of a construct to the start line number + 1.


6. Installation

Install dpomp in a directory (usually /bin).  This directory should be accessible through the $PATH environment variable.


 

7. Release History

 

Version 1.2.3 (01/20/2004)

o       Probe_timer renamed to “timer_probe”.

o       Probe_timer updated to support “chunks”.

o       Fixed the problem of having in some situations multiple handles for the same OpenMP construct.

o       Updated documentation

o       Pomprof_probe splits the total overhead into three components: Barrier time, Load imbalance time, and Run time. The percentage of overhead for each component is now computed as the ratio of the overhead of the component to the total inclusive time.

 

Version 1.1.5 (12/01/2003)

o       Initial Release