National Energy Research Scientific Computing Center
  A DOE Office of Science User Facility
  at Lawrence Berkeley National Laboratory
 

Integrated Performance Monitoring (IPM)

IPM is a portable profiling infrastructure that provides a high-level report on the execution of a parallel job. IPM reports hardware counter data, MPI function timings, and memory usage. It provides a low-overhead means to generate scaling studies or performance data for ERCAP submissions. When you run a job using the IPM module, you will get a performance summary (see below) on stdout as well as a web-accessible summary of all your IPM jobs.

The two main objectives of IPM are ease-of-use and scalability in performance analysis.

Usage

To use IPM, load the ipm module:

s00509> module load ipm

On HPC architectures that support shared libraries, that is all you need to do. Once the module is loaded, you can run as you normally would and get a performance profile once the job has successfully completed. You do not need to relink your code.

For static executables and architectures that do not support shared libraries, a relink is required. Load the ipm module, add the IPM variable set by the module to your link line (written $(IPM) in a Makefile, $IPM at a shell prompt), and run as you normally would. Franklin is currently the only NERSC system that does not support shared libraries.

franklin> module load ipm
franklin> cc -o myapp *.o $IPM
...or...
franklin> ftn -o myapp *.o $IPM

Output and Results

Once the module has been loaded, each parallel code will, upon completion, print a concise report to standard output. In addition, detailed results are available from the Completed Jobs page the day after the job completes.

Sample standard output follows:

##IPMv0.8######################################################################
#
# code   : ./bin/cg.B.32 (completed)
# host   : s10201/006001024C00_AIX        mpi_tasks : 32 on 2 nodes
# start  : 11/30/04/13:27:09              wallclock : 28.984938 sec
# stop   : 11/30/04/13:27:35              %comm     : 28.60
# gbytes : 6.62880e-01 total              gflop/sec : 2.41455e+00 total
#
###############################################################################
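
If you collect these banners for a scaling study, the individual values can be pulled out with standard tools. The following is only a minimal sketch: ipm_field is a hypothetical helper, not part of IPM, and it assumes the banner keeps the "name : value" layout shown in the sample above.

```shell
# Hypothetical helper (not part of IPM): read an IPM banner on standard
# input and print the value that follows "<name> :".
ipm_field() {
    awk -v key="$1" '
        /^#/ {
            # Scan the whitespace-separated tokens for "<name> : <value>".
            for (i = 2; i + 2 <= NF; i++)
                if ($i == key && $(i + 1) == ":") { print $(i + 2); exit }
        }'
}

# The terse report from the sample above, kept in a variable for the demo.
ipm_sample=$(cat <<'EOF'
# code   : ./bin/cg.B.32 (completed)
# host   : s10201/006001024C00_AIX        mpi_tasks : 32 on 2 nodes
# start  : 11/30/04/13:27:09              wallclock : 28.984938 sec
# stop   : 11/30/04/13:27:35              %comm     : 28.60
# gbytes : 6.62880e-01 total              gflop/sec : 2.41455e+00 total
EOF
)

printf '%s\n' "$ipm_sample" | ipm_field wallclock   # prints 28.984938
printf '%s\n' "$ipm_sample" | ipm_field %comm       # prints 28.60
```

In practice you would redirect the saved job output into ipm_field instead of using the embedded sample.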

The fields marked "total" are values aggregated over all the parallel tasks. More detailed reports are possible. For example, a full report looks like:

##IPMv0.8######################################################################
#
# code   : ./bin/cg.B.32 (completed)
# host   : s05601/006035314C00_AIX        mpi_tasks : 32 on 2 nodes
# start  : 11/30/04/14:35:34              wallclock : 29.975184 sec
# stop   : 11/30/04/14:36:00              %comm     : 27.72
# gbytes : 6.65863e-01 total              gflop/sec : 2.33478e+00 total
#
#
#                           [total]         <avg>           min           max
# wallclock                  953.272       29.7897       29.6092       29.9752
# user                        837.25       26.1641         25.71         26.92
# system                        60.6       1.89375          1.52          2.59
# mpi                        264.267       8.25834       7.73025       8.70985
# %comm                                    27.7234       25.8873       29.3705
# gflop/sec                  2.33478     0.0729619      0.072204     0.0745817
# gbytes                    0.665863     0.0208082     0.0195503     0.0237541
# PM_FPU0_CMPL           2.28827e+10   7.15084e+08   7.07373e+08   7.30171e+08
# PM_FPU1_CMPL           1.70657e+10   5.33304e+08   5.28487e+08   5.42882e+08
# PM_FPU_FMA             3.00371e+10    9.3866e+08   9.27762e+08   9.62547e+08
# PM_INST_CMPL           2.78819e+11   8.71309e+09   8.20981e+09   9.21761e+09
# PM_LD_CMPL             1.25478e+11   3.92118e+09   3.74541e+09   4.11658e+09
# PM_ST_CMPL             7.45961e+10   2.33113e+09   2.21164e+09   2.46327e+09
# PM_TLB_MISS            2.45894e+08   7.68418e+06   6.98733e+06   2.05724e+07
# PM_CYC                  3.0575e+11   9.55467e+09   9.36585e+09   9.62227e+09
#
#                            [time]       [calls]        <%mpi>      <%wall>
# MPI_Send                   188.386        639616         71.29        19.76
# MPI_Wait                   69.5032        639616         26.30         7.29
# MPI_Irecv                  6.34936        639616          2.40         0.67
# MPI_Barrier              0.0177442            32          0.01         0.00
# MPI_Reduce              0.00540609            32          0.00         0.00
# MPI_Comm_rank           0.00465156            32          0.00         0.00
# MPI_Comm_size          0.000145341            32          0.00         0.00
###############################################################################
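
The percentage columns in this sample are consistent with simple ratios of the [total] rows: %comm matches total MPI time divided by total wallclock time, <%mpi> matches a call's time divided by total MPI time, and <%wall> matches a call's time divided by total wallclock time. The sketch below recomputes them from a few rows copied from the report above. ipm_ratios is a hypothetical helper, and the ratio definitions are inferred from the sample numbers rather than taken from the IPM documentation.

```shell
# Hypothetical helper (not part of IPM): recompute the percentage columns
# from the [total] wallclock and mpi rows of a full report on standard input.
ipm_ratios() {
    awk '
        $2 == "wallclock" { wall = $3 }   # [total] wallclock summed over tasks
        $2 == "mpi"       { mpi  = $3 }   # [total] time spent in MPI
        $2 ~ /^MPI_/ && mpi > 0 && wall > 0 {
            # call name, share of MPI time, share of wallclock time
            printf "%s %.2f %.2f\n", $2, 100 * $3 / mpi, 100 * $3 / wall
        }
        END { if (wall > 0) printf "%%comm %.2f\n", 100 * mpi / wall }'
}

# Rows copied from the full report above.
ipm_rows=$(cat <<'EOF'
# wallclock                  953.272       29.7897       29.6092       29.9752
# mpi                        264.267       8.25834       7.73025       8.70985
# MPI_Send                   188.386        639616         71.29        19.76
# MPI_Wait                   69.5032        639616         26.30         7.29
EOF
)

printf '%s\n' "$ipm_rows" | ipm_ratios
# prints:
# MPI_Send 71.29 19.76
# MPI_Wait 26.30 7.29
# %comm 27.72
```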

The amount of detail reported is controlled by the options described in the next section.

Options

The interface to IPM is through environment variables and MPI_Pcontrol. The environment variable interface is selected at execute/submit time, while the latter allows for dynamic control of IPM. A description of the supported environment variables is given below. A description of the MPI_Pcontrol interface is included in the main IPM documentation.

Variable     Value             Description
IPM_REPORT   terse (default)   Aggregate wallclock time, memory usage, and flops are reported, along with the percentage of wallclock time spent in MPI calls.
             full              Each HPM counter is reported, as are all of wallclock, user, system, and MPI time. The contribution of each MPI call to the communication time is given.
             none              No report.
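
Because the report level is read from the environment, it can be set for a single run without changing your shell session. The following is a minimal sketch, not part of IPM: with_ipm_report is a hypothetical wrapper that accepts only the three values listed above, and the sh -c command stands in for your usual parallel launch line.

```shell
# Hypothetical wrapper (not part of IPM): check the requested report level,
# then run the given command with IPM_REPORT set in its environment only.
with_ipm_report() {
    level=$1
    shift
    case "$level" in
        terse|full|none) ;;   # the values documented for IPM_REPORT
        *) echo "with_ipm_report: unknown level '$level'" >&2; return 2 ;;
    esac
    env IPM_REPORT="$level" "$@"
}

# Stand-in command that shows what the job would see in its environment.
with_ipm_report full sh -c 'echo "IPM_REPORT=$IPM_REPORT"'   # prints IPM_REPORT=full
```

Setting the variable directly in the shell or batch script before launching the job (export in sh-style shells, setenv in csh-style shells) has the same effect for every subsequent run in that session.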

Features and Futures

Planned items still to be completed:

  • profiling of switch interfaces
  • profiling of serial codes
  • cross profile comparisons

IPM is open source software. If you have suggestions, questions, or bug reports, please direct them to the IPM team.


Page last modified: Thu, 04 Sep 2008 00:10:56 GMT
Page URL: http://www.nersc.gov/nusers/resources/software/tools/ipm.php
Web contact: webmaster@nersc.gov
Computing questions: consult@nersc.gov
