Terascale Optimal PDE Simulations
The Terascale Optimal PDE Simulations (TOPS) ISIC is researching
and developing, and will deploy, a toolkit of open-source solvers
for the nonlinear partial differential equations (PDEs) that
arise in many application areas, including fusion, accelerator
design, global climate change, and the collapse of supernovae.
These solvers aim to reduce computational bottlenecks by
one or more orders of magnitude on terascale computers, enabling
scientific simulation on a scale that was previously impossible.
Figure 3. Memory bandwidth benchmark results.
Asterisks show the model bandwidth computed for each of
the last eight bars (non-standard STREAM kernels).
One of the major TOPS activities in 2002 involved the magnetohydrodynamics
(MHD) code M3D. A
Hypre algebraic multigrid solver was ported into M3D underneath
the existing PETSc interface, and scalability studies were
done on M3D production runs. PETSc itself, a suite of data
structures and routines for solving PDEs, is undergoing performance
tuning and testing on terascale applications.
The Berkeley Benchmarking and Optimization Group (BeBOP)
achieved over 80% of the modeled peak Mflop/s with
performance-tuned implementations of sparse matrix-vector products
and sparse triangular solves on the IBM SP Power 3 processor
nodes at NERSC (Figure 3). The model estimates the best
performance attainable given the limits of a computer’s memory
system. The measured rate corresponds to 15–20% of the
processor’s peak performance, a good gain over previous codes.
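The idea behind a memory-bound performance model can be sketched in a few lines. The Python below is an illustration only, not the BeBOP model itself: it assumes a plain compressed sparse row (CSR) layout, 8-byte values, 4-byte indices, and that every array moves through memory exactly once. The function names and the 1000 MB/s bandwidth figure are hypothetical.

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR format."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        acc = 0.0
        # Nonzeros of row i live in positions row_ptr[i] .. row_ptr[i+1]-1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


def bandwidth_bound_mflops(n_rows, n_cols, nnz, bandwidth_mb_s,
                           value_bytes=8, index_bytes=4):
    """Upper bound on SpMV Mflop/s if memory traffic were the only cost.

    Simplifying assumption: each matrix value, index, and vector entry
    is transferred exactly once (perfect cache reuse of x and y).
    """
    bytes_moved = (nnz * (value_bytes + index_bytes)  # values + column indices
                   + (n_rows + 1) * index_bytes       # row pointers
                   + n_cols * value_bytes             # read x
                   + n_rows * value_bytes)            # write y
    flops = 2 * nnz                                   # one multiply and one add per nonzero
    seconds = bytes_moved / (bandwidth_mb_s * 1e6)
    return flops / seconds / 1e6


# A = [[4, 0, 1], [0, 3, 0], [2, 0, 5]] in CSR form.
values = [4.0, 1.0, 3.0, 2.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
print(csr_spmv(values, col_idx, row_ptr, [1.0, 2.0, 3.0]))  # [7.0, 6.0, 17.0]

# Hypothetical machine: 1000 MB/s sustained memory bandwidth.
bound = bandwidth_bound_mflops(1000, 1000, 10000, 1000.0)
print(f"model bound: {bound:.1f} Mflop/s")
```

Dividing a measured Mflop/s rate by such a bound gives a fraction of modeled peak, comparable in spirit to the "over 80%" figure quoted above. Because sparse kernels perform only about two flops per nonzero loaded, the bound sits far below the processor's arithmetic peak.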
Figure 4. Megaflop rate per processor (cubic grids, nested
dissection).
Amestoy et al. conducted a comprehensive
study and comparison of two state-of-the-art direct solvers
for large sparse systems of linear equations on large-scale
distributed-memory computers. One is a multifrontal solver called
MUMPS; the other is a supernodal solver called SuperLU. The authors
described the main algorithmic features of the two solvers and compared
their performance characteristics with respect to uniprocessor
speed, interprocessor communication, memory requirements,
and scalability (Figure 4). They found that both solvers have
strengths and weaknesses.
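The caption of Figure 4 mentions nested dissection, an ordering of the unknowns that influences how much fill-in, and hence memory, a sparse direct solver needs. The following Python sketch shows the effect on a toy problem. It is not taken from MUMPS or SuperLU. It assumes a small 2D grid with a 5-point stencil (the figure uses cubic grids), a simple recursive bisection as the dissection rule, and symbolic elimination in place of a real factorization. All function names are hypothetical.

```python
def grid_graph(n):
    """Adjacency sets of the 5-point-stencil graph on an n x n grid."""
    adj = {(i, j): set() for i in range(n) for j in range(n)}
    for (i, j) in adj:
        for di, dj in ((1, 0), (0, 1)):
            nb = (i + di, j + dj)
            if nb in adj:
                adj[(i, j)].add(nb)
                adj[nb].add((i, j))
    return adj


def natural_order(n):
    """Row-by-row ordering of the grid points."""
    return [(i, j) for i in range(n) for j in range(n)]


def nested_dissection_order(i0, i1, j0, j1):
    """Order sub-grid [i0, i1) x [j0, j1): both halves first, separator last."""
    h, w = i1 - i0, j1 - j0
    if h <= 0 or w <= 0:
        return []
    if h <= 2 and w <= 2:  # small block: no further splitting
        return [(i, j) for i in range(i0, i1) for j in range(j0, j1)]
    if h >= w:  # cut across the longer side with a separator row
        mid = (i0 + i1) // 2
        sep = [(mid, j) for j in range(j0, j1)]
        return (nested_dissection_order(i0, mid, j0, j1)
                + nested_dissection_order(mid + 1, i1, j0, j1) + sep)
    mid = (j0 + j1) // 2  # otherwise use a separator column
    sep = [(i, mid) for i in range(i0, i1)]
    return (nested_dissection_order(i0, i1, j0, mid)
            + nested_dissection_order(i0, i1, mid + 1, j1) + sep)


def count_fill(adj, order):
    """Eliminate vertices symbolically in `order`; return the number of fill edges."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    fill = 0
    for v in order:
        nbrs = list(adj.pop(v))
        for u in nbrs:
            adj[u].discard(v)
        # Eliminating v makes its remaining neighbors pairwise adjacent.
        for a in range(len(nbrs)):
            for b in range(a + 1, len(nbrs)):
                p, q = nbrs[a], nbrs[b]
                if q not in adj[p]:
                    adj[p].add(q)
                    adj[q].add(p)
                    fill += 1
    return fill


n = 3
adj = grid_graph(n)
print(count_fill(adj, natural_order(n)))                     # 8
print(count_fill(adj, nested_dissection_order(0, n, 0, n)))  # 5
```

Changing `n` lets a reader measure how the two orderings compare on larger grids. Production solvers rely on far more refined ordering and factorization strategies than this sketch.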
INVESTIGATORS
D. Keyes, Old Dominion University; B. Smith and J. Moré, Argonne
National Laboratory; E. G. Ng, Lawrence Berkeley National
Laboratory; R. Falgout, Lawrence Livermore National Laboratory;
J. W. Demmel, University of California, Berkeley; O. Ghattas,
Carnegie Mellon University; O. Widlund, New York University;
S. McCormick, University of Colorado, Boulder; J. Dongarra,
University of Tennessee.
PUBLICATIONS
R. Vuduc, J. W. Demmel, K. A. Yelick, S. Kamil, R. Nishtala,
and B. Lee, “Performance optimizations and bounds for
sparse matrix-vector multiply,” Proc. of the IEEE/ACM
Conference on Supercomputing, 2002.
P. R. Amestoy, I. S. Duff, J.-Y. L’Excellent,
and X. S. Li, “Analysis and comparison of two general
sparse solvers for distributed memory computers,” ACM
Transactions on Mathematical Software 27,
388 (2001).
URL
http://www.tops-scidac.org/