
research news

computer science

Self-Tuning Software

Z. Chen, J. Dongarra, P. Luszczek, and K. Roche, “The LAPACK for Clusters project: An example of self adapting numerical software,” Proc. 37th Hawaii International Conf. on System Sciences (2004). ASCR-MICS, NSF

As computing systems become more powerful and complex, tuning applications for high performance becomes a major challenge. The LAPACK for Clusters project has developed a self-adapting framework for LAPACK, the popular linear algebra library, that automates performance tuning by adapting to the user’s problem and the computational environment in order to extract near-optimal performance. Test results show that self-adaptation can come very close to matching the performance of a custom-tuned application while adding minimal overhead.
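The general idea of self-tuning can be illustrated with a small empirical tuner: run a kernel with several candidate parameter values, time each, and keep the fastest. This is a minimal sketch only, not the LAPACK for Clusters framework. The toy kernel `blocked_sum_of_squares`, the helper `autotune`, and the candidate block sizes are all hypothetical names and values chosen for illustration.

```python
import time


def blocked_sum_of_squares(data, block):
    """Toy kernel (hypothetical): process `data` in chunks of `block` elements."""
    total = 0
    for start in range(0, len(data), block):
        chunk = data[start:start + block]
        total += sum(x * x for x in chunk)
    return total


def autotune(kernel, data, candidates, repeats=3):
    """Time the kernel for each candidate parameter; return (best, timings)."""
    timings = {}
    for param in candidates:
        best = float("inf")
        for _ in range(repeats):
            t0 = time.perf_counter()
            kernel(data, param)
            # Keep the fastest of several runs to reduce timing noise.
            best = min(best, time.perf_counter() - t0)
        timings[param] = best
    best_param = min(timings, key=timings.get)
    return best_param, timings


data = list(range(20000))
candidates = [16, 256, 4096]  # assumed search space, for illustration only
best_block, timings = autotune(blocked_sum_of_squares, data, candidates)
print("selected block size:", best_block)
```

The selected value depends on the machine the sketch runs on, which is the point: the choice is measured in the actual environment rather than fixed in advance.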

Mapping the Network Topology

D. Turner, B. Tong, and M. Sosonkina, “Mapping algorithms to the network topology in a portable manner,” Proc. PARA'04 Workshop on the State-of-the-Art in Scientific Computing (2004). ASCR-MICS

Most scientific applications are unable to take advantage of the changing topology of the system network, which costs them performance and scalability. The NodeMap project is being developed to determine the network topology automatically at runtime and provide an optimal mapping of a 2D or 3D algorithm onto it. In a test of the concept underlying NodeMap, Turner et al. showed that the performance of a classical molecular dynamics code could be greatly enhanced simply by remapping the node arrangement to avoid saturation costs as much as possible. Parallel efficiency improved from 50% to 70% for 10,000,000 atoms on 1024 processors.
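Why remapping can matter is easy to see in a toy model. The sketch below is not NodeMap and does not discover any topology at runtime; it assumes a hypothetical machine in which an 8 x 8 process grid is spread over four nodes of 16 processes each, and it counts how many nearest-neighbor pairs end up on different nodes under two placements. All function names and sizes are illustrative assumptions.

```python
def row_major_node(i, j, cols, ranks_per_node):
    """Default placement: ranks numbered row by row, packed onto nodes in order."""
    return (i * cols + j) // ranks_per_node


def tiled_node(i, j, cols, tile):
    """Alternative placement: each node holds a tile x tile block of the grid."""
    tiles_per_row = cols // tile
    return (i // tile) * tiles_per_row + (j // tile)


def cross_node_links(rows, cols, node_of):
    """Count nearest-neighbor pairs of a 2D grid that sit on different nodes."""
    count = 0
    for i in range(rows):
        for j in range(cols):
            if j + 1 < cols and node_of(i, j) != node_of(i, j + 1):
                count += 1
            if i + 1 < rows and node_of(i, j) != node_of(i + 1, j):
                count += 1
    return count


rows = cols = 8
strips = cross_node_links(rows, cols, lambda i, j: row_major_node(i, j, cols, 16))
tiles = cross_node_links(rows, cols, lambda i, j: tiled_node(i, j, cols, 4))
print(strips, tiles)  # 24 16
```

In this model the row-major placement gives each node a strip of two full rows, so 24 neighbor pairs cross node boundaries, while the square tiles cut that to 16. Fewer off-node pairs means less traffic competing for the shared network links.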

Compiling Irregular Applications

J. Su and K. Yelick, “Array prefetching for irregular array accesses in Titanium,” IPDPS Workshop on Java for Parallel and Distributed Computing, Santa Fe, New Mexico, April 2004. ASCR-MICS, NSF, NDSEG

Compiling irregular applications, such as sparse matrix and unstructured mesh codes, is a challenging problem. In a compiler for Titanium, a dialect of Java designed for high performance computing, Su and Yelick developed an inspector-executor style optimization framework. The inspector iterates through a loop once, collecting information about random accesses to local or remote memory, and the framework then aggregates and schedules the required communication. A novel aspect is the use of a profile-based performance model to automatically select the optimal communication method. This advance allows application programmers to write Titanium code in a straightforward way and still get performance comparable to that of a popular hand-tuned library.
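The inspector-executor pattern can be sketched in a few lines. The example below is written in Python rather than Titanium and is not the Su and Yelick implementation: `RemoteArray` is a hypothetical stand-in for remote memory that simply counts messages, and the profile-based choice of communication method is left out entirely.

```python
class RemoteArray:
    """Hypothetical stand-in for an array in remote memory; counts messages."""

    def __init__(self, values):
        self._values = list(values)
        self.messages = 0

    def get(self, index):
        """Fetch one element: one message per call."""
        self.messages += 1
        return self._values[index]

    def bulk_get(self, indices):
        """Fetch many elements in a single aggregated message."""
        self.messages += 1
        return {i: self._values[i] for i in indices}


def naive_gather_sum(remote, index_list):
    """Straightforward loop: every irregular access is its own message."""
    return sum(remote.get(i) for i in index_list)


def inspector(index_list):
    """Inspector: walk the index stream once and record the unique indices."""
    return sorted(set(index_list))


def executor(remote, index_list, needed):
    """Executor: one aggregated fetch, then the original loop on local copies."""
    local = remote.bulk_get(needed)
    return sum(local[i] for i in index_list)


indices = [5, 42, 5, 7, 42, 99]  # irregular, data-dependent access pattern

remote_a = RemoteArray(10 * k for k in range(100))
naive_total = naive_gather_sum(remote_a, indices)

remote_b = RemoteArray(10 * k for k in range(100))
tuned_total = executor(remote_b, indices, inspector(indices))

print(naive_total, remote_a.messages)  # 2000 6
print(tuned_total, remote_b.messages)  # 2000 1
```

Both versions compute the same result, but the inspector pass lets the executor replace six fine-grained messages with one aggregated transfer. A real framework would also have to decide how to aggregate, which is where a performance model comes in.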