National Energy Research Scientific Computing Center 2004 Annual Report
Cray provides the next major NERSC system
On August 10, 2006, Cray Inc. and the DOE Office of Science announced that Cray had won the contract to install a next-generation supercomputer at NERSC. The systems and multi-year services contract includes delivery of a Cray XT4 supercomputer, with options for future upgrades that would quadruple the size of the system and eventually boost performance to one petaflop/s (1,000 trillion floating point operations per second) and beyond.
A successor to the massively parallel Cray XT3 supercomputer, the XT4 system installed at NERSC will be among the world’s fastest general-purpose systems and will be the largest XT4 system in the world. It will deliver sustained performance of more than 16 trillion calculations per second when running a suite of diverse scientific applications at scale. The system uses thousands of AMD Opteron processors running a tuned, lightweight operating system and interfaced to Cray’s unique SeaStar network.
Cray began building the new supercomputer at its manufacturing facility in late 2006 and delivered it in early 2007 (Figure 1), with completion of the installation and acceptance scheduled for the fall.
Figure 1. NERSC’s Cray XT4 supercomputer, when complete, will deliver sustained performance of at least 16 teraflop/s.
As part of a competitive procurement process (discussed in detail in the next section), the NERSC procurement team evaluated systems from a number of vendors using the Sustained System Performance (SSP) metric. The SSP metric, developed by NERSC, measures sustained performance on a set of codes designed to accurately represent the challenging computing environment at the Center.
“While the theoretical peak speed of supercomputers may be good for bragging rights, it’s not an accurate indicator of how the machine will perform when running actual research codes,” said NERSC Director Horst Simon. “To better gauge how well a system will meet the needs of our 2,500 users, we developed SSP. According to this test, the new system will deliver over 16 teraflop/s on a sustained basis.”
“The Cray proposal was selected because its price/performance was substantially better than other proposals we received, as determined by NERSC’s comprehensive evaluation criteria of more than 40 measures,” said Bill Kramer, General Manager of the NERSC Center.
The XT4 supercomputer at NERSC will consist of almost 20,000 AMD Opteron 2.6-gigahertz processor cores (19,344 compute CPUs), with two cores per socket making up one node. Each node has 4 gigabytes (4 billion bytes) of memory and a dedicated SeaStar connection to the internal network. The full system will consist of over 100 cabinets with 39 terabytes (39 trillion bytes) of aggregate memory capacity. When completely installed, the system will increase NERSC’s sustained computational capability by almost a factor of 10, with an SSP of at least 16.01 teraflop/s (as a reference, Seaborg’s SSP is 0.89 Tflop/s, and Bassi’s SSP is 0.8 Tflop/s). The system will have a bisection bandwidth of 6.3 terabytes per second and 402 terabytes of usable disk.
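The configuration figures above are mutually consistent, as a quick back-of-the-envelope check shows. This minimal sketch uses only the numbers quoted in the paragraph; it assumes decimal units (1 TB = 1,000 GB) and assumes that "almost a factor of 10" compares Franklin's SSP with the combined SSP of Seaborg and Bassi.

```python
# Back-of-the-envelope check of the Franklin (Cray XT4) figures quoted above.
# Assumption: decimal units, 1 TB = 1,000 GB.

COMPUTE_CORES = 19_344    # compute processor cores
CORES_PER_NODE = 2        # dual-core Opteron, one socket per node
GB_PER_NODE = 4           # memory per node, in GB

FRANKLIN_SSP_TF = 16.01   # minimum contracted SSP, Tflop/s
SEABORG_SSP_TF = 0.89
BASSI_SSP_TF = 0.8


def franklin_summary():
    """Return (compute nodes, aggregate memory in TB, SSP ratio)."""
    nodes = COMPUTE_CORES // CORES_PER_NODE
    memory_tb = nodes * GB_PER_NODE / 1000
    # Assumption: the "almost a factor of 10" compares against Seaborg + Bassi combined.
    speedup = FRANKLIN_SSP_TF / (SEABORG_SSP_TF + BASSI_SSP_TF)
    return nodes, memory_tb, speedup


if __name__ == "__main__":
    nodes, memory_tb, speedup = franklin_summary()
    print(f"compute nodes:         {nodes}")
    print(f"aggregate memory:      {memory_tb:.1f} TB")
    print(f"SSP vs. Seaborg+Bassi: {speedup:.1f}x")
```

Run as-is, it reports 9,672 compute nodes, about 38.7 TB of memory (the quoted 39 TB), and an SSP ratio of roughly 9.5.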
In keeping with NERSC’s tradition of naming supercomputers after world-class scientists, the new system will be called “Franklin” in honor of Benjamin Franklin, America’s first scientist. The year 2006 was the 300th anniversary of Franklin’s birth.
“Ben Franklin’s scientific achievements included fundamental advances in electricity, thermodynamics, energy efficiency, material science, geophysics, climate, ocean currents, weather, population growth, medicine and health, and many other areas,” said Kramer. “In the tradition of Franklin, we expect this system to make contributions to science of the same high order.”
BVSS and PERCU: A comprehensive approach to HPC procurement
NERSC’s consistency in deploying reliable and robust high-end computing systems is due in large part to flexible procurement practices based on a process that can be summed up with the acronyms BVSS and PERCU—Best Value Source Selection and Performance, Effectiveness, Reliability, Consistency, and Usability.
Originally developed at Lawrence Livermore National Laboratory (LLNL) for the procurement of ASCI systems, BVSS has been used to procure all the major HPC systems installed at NERSC since the center moved to Berkeley Lab in 1996. The intent of BVSS is to reduce procurement time, reduce costs for technical evaluations, and provide an efficient and cost-effective way of conducting complex procurements to select the most advantageous offer. The flexibility of BVSS allows vendors to propose (and buyers to consider) different solutions than may have been envisioned at the outset, and allows buyers to evaluate and compare features in addition to price, focusing on the strengths and weaknesses of proposals. The end result at NERSC is usually a firm, fixed-price contract with hundreds of criteria that both NERSC and the vendor agree on.
Based on its success at NERSC, BVSS has since been adopted by Pacific Northwest National Laboratory and other organizations. In addition, NERSC invited other supercomputing centers to get a firsthand look at the process by observing the procurement of NERSC-5 (which resulted in the choice of the XT4); the offer drew representatives from several National Science Foundation and Department of Defense facilities.
Within the BVSS framework, NERSC translates scientific requirements into about 50 high-level factors that reflect the attributes computational scientists want in a large system:
- Performance: How fast will a system process their work if everything is perfect?
- Effectiveness: What is the likelihood they can get the system to do their work?
- Reliability: Is the system available to do work, and does it operate correctly all the time?
- Consistency/variability: How often will the system process their work as fast as it can?
- Usability: How easy is it for them to get the system to go as fast as possible?
NERSC uses this PERCU methodology (developed by Bill Kramer as part of his Ph.D. research) to assess systems not just before purchase but throughout their life. PERCU includes the Sustained System Performance (SSP) and Effective System Performance (ESP) metrics, which NERSC uses to assure its client community and stakeholders that the systems will be highly productive and cost effective. SSP provides a quantitative assessment of sustained computer performance over time with a complex workload, while ESP is used to monitor the impact of configuration changes and software upgrades in existing systems. NERSC now has a web site for all the SSP benchmarks, from which other computer centers can download the tests and to which they can report their own results (see http://www.nersc.gov/projects/SDSA/software?benchmark=ssp).
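To make the two metrics more concrete, here is a minimal sketch of how they are commonly described; the exact NERSC formulas are assumed here, not taken from this report. For SSP, each benchmark i yields a per-core rate P_i = f_i / (c_i × t_i), where f_i is a reference operation count, c_i the cores used, and t_i the wall-clock time; SSP is then the system's core count N times the geometric mean of the P_i. For ESP, efficiency is taken as the core-seconds of useful work in a job mix divided by the core-seconds the whole system offered while running it. The names `BenchmarkRun`, `per_core_rate`, `ssp`, and `esp_efficiency` are hypothetical, and the input numbers are made up for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class BenchmarkRun:
    """One application in a sustained-performance suite (hypothetical structure)."""
    name: str
    ref_flops: float   # reference operation count for the problem, in flops
    cores: int         # cores used for the run
    seconds: float     # measured wall-clock time


def per_core_rate(run: BenchmarkRun) -> float:
    """Sustained flop/s per core for one benchmark run."""
    return run.ref_flops / (run.cores * run.seconds)


def ssp(runs: list[BenchmarkRun], system_cores: int) -> float:
    """Assumed SSP form: system core count times the geometric mean of per-core rates."""
    logs = [math.log(per_core_rate(r)) for r in runs]
    return system_cores * math.exp(sum(logs) / len(logs))


def esp_efficiency(jobs: list[tuple[int, float]], system_cores: int, elapsed: float) -> float:
    """Assumed ESP-style efficiency: useful core-seconds / available core-seconds.

    jobs is a list of (cores, runtime_seconds) pairs; elapsed is the wall-clock
    time needed to push the whole job mix through the system.
    """
    useful = sum(cores * runtime for cores, runtime in jobs)
    return useful / (system_cores * elapsed)


if __name__ == "__main__":
    # Made-up inputs for illustration only; these are not NERSC benchmark results.
    suite = [
        BenchmarkRun("app_a", ref_flops=4.0e14, cores=1000, seconds=1000.0),  # 0.4 Gflop/s per core
        BenchmarkRun("app_b", ref_flops=1.6e15, cores=500, seconds=2000.0),   # 1.6 Gflop/s per core
    ]
    print(f"SSP: {ssp(suite, system_cores=19_344) / 1e12:.2f} Tflop/s")
    mix = [(768, 100.0), (256, 200.0), (768, 100.0)]
    print(f"ESP efficiency: {esp_efficiency(mix, system_cores=1024, elapsed=250.0):.2f}")
```

With these inputs the per-core geometric mean is 0.8 Gflop/s, so the sketch prints an SSP of 15.48 Tflop/s and an efficiency of 0.80. A geometric mean is a natural choice here because it keeps any single fast or slow application from dominating the composite figure.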
NERSC General Manager Bill Kramer has shared this procurement expertise by organizing sessions at the two largest HPC conferences, the 2006 International Supercomputer Conference (ISC) held in Dresden, Germany, in June, and SC2006, held in Tampa, Fla., in November. At ISC2006, Kramer and Michael Resch of the Stuttgart Supercomputing Center co-chaired a panel discussion on “Acquisition and Operation of an HPC System.” The session, which drew approximately 50 attendees, also included presentations by representatives of NASA, LLNL, and the National Center for High-Performance Computing (NCHC) in Taiwan. That panel discussion led to a workshop organized by Kramer at the SC2006 conference in Tampa. The goal of this workshop was to serve as a starting point for accumulating and disseminating the shared expertise of the HPC community in assessing and acquiring HPC systems, with the expectation of creating a document of best practices for HPC system procurements.
NERSC Global Filesystem marks first full year in production
NERSC has historically been a leader in providing new systems and services to help users make the best use of the Center’s computing resources. The NERSC Global Filesystem (NGF), which allows users to create and access a single file from any HPC system on the machine room floor, marked its first full year in production in 2006. NGF, which currently provides 70 terabytes (TB) of usable storage for users, ended the year at 88 percent of capacity and is scheduled to be upgraded to 140 TB in 2007. The goal of NGF is to increase scientific productivity by simplifying data management and access.
NGF does this by providing a single unified namespace, so that the same data file can be used on any of NERSC’s computing architectures. The unified namespace makes it easier for users to manage their data across multiple systems. Advantages include:
- Users no longer need to keep track of multiple copies of programs and data or copy data between NERSC systems.
- Storage utilization is more efficient because of decreased fragmentation.
- Computational resource utilization is more efficient because users can more easily run jobs on an appropriate resource.
- NGF provides improved methods of backing up user data.
- NGF improves system security by eliminating the need for collaborators to use “group” or “world” permissions.
While the single unified namespace feature is important due to the heterogeneous computing environment of NERSC, NGF also proved its worth when Seaborg was temporarily taken out of service in mid-2006 due to security concerns. Those users who had asked to have project directories set up for their research were able to easily move their jobs to the other systems. In all, 77 projects have requested to use NGF, and these typically represent large users. For example, just 14 projects account for 70 percent of the storage used in NGF and of these, four projects account for 50 percent of the storage. The largest users are in the field of astrophysics and include the Planck satellite data project, the cosmic microwave background radiation project and the Nearby Supernova Factory. The groups have found that NGF helps them use the best computing system depending on the nature of the task and also provides flexibility for mapping out workflows to different systems.
One of the challenges facing both large and small research groups is consistency of the data being analyzed. NGF helps ensure that all members are using the same data file, rather than inadvertently working on different versions of a file. This also helps users keep track of the location of each file and means that files do not have to be moved from system to system as a job progresses.
NGF provides users with immediate access to data as soon as it is created. This instant availability of data, such as files generated on the visualization server DaVinci, enables users to computationally “steer” jobs running on NERSC systems. Future plans call for possible use of NGF over wide area networks from other computing centers and to have home and scratch file systems globally accessible.
To protect user data, NGF files are backed up to NERSC’s HPSS biweekly, and NGF is scheduled to become fully integrated with HPSS in the future.
As deployed at NERSC, NGF is expected to have a long life of 10 to 15 years or more. It is expected that during this time the file system will change and evolve, just as the systems in the Center that are connected to it will. It is also expected that the user data will have long-term persistence in the file system, ranging from months and years up to the deployed life of the file system, at the discretion of the users.
Integrating NERSC’s storage and file systems
Over the past year, NERSC’s Mass Storage Group, with Jason Hick as its new group leader, has improved the storage system’s reliability, performance, and availability while managing an unprecedented amount of data. The storage systems, in fact, reached 3 petabytes of stored data in November 2006.

The Mass Storage Group participates in the GPFS-HPSS Integration (GHI) collaboration with IBM. The collaboration has focused on designing a new interface that allows HPSS to be more fully integrated with new GPFS software features. In the new version of GHI, HPSS will transparently serve as an additional hierarchy of storage so data will move between GPFS and HPSS as it is needed. The new version also explicitly performs highly parallel backups of GPFS.
Damian Hazen of the Mass Storage Group developed and demonstrated a proof of concept for the new GHI design at SC06 in November. The demonstration was hailed as a great success and served as the foundation for the new software, which is expected to be available in 2007–08.
Once in production, the software will provide a unified global name space to users by allowing users to access their files and data through any GPFS file system, including NGF. As time goes on, the data will move automatically to an HPSS system while leaving a file stub in GPFS. The user can still access the data through GPFS—the software will handle automated data movement between HPSS and GPFS when necessary.
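The stub-and-recall behavior described above is the classic hierarchical storage management pattern. The toy model here is a minimal sketch under loud assumptions: the class and method names (`TwoTierStore`, `migrate_cold`, `read`) and the idle-time migration policy are hypothetical, and none of it reflects the actual GHI, GPFS, or HPSS interfaces. It only shows how a file name can stay visible in the fast tier while its bytes move to the archive and come back on demand.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FileEntry:
    """A file as seen through the disk namespace (hypothetical model)."""
    data: Optional[bytes]   # None means only a stub remains on disk
    last_access: float


@dataclass
class TwoTierStore:
    """Toy hierarchical store: a fast 'disk' tier backed by an archive tier.

    Illustrative only; this is not the GHI, GPFS, or HPSS API.
    """
    disk: dict = field(default_factory=dict)     # path -> FileEntry
    archive: dict = field(default_factory=dict)  # path -> bytes

    def write(self, path: str, data: bytes, now: float) -> None:
        self.disk[path] = FileEntry(data, now)

    def migrate_cold(self, now: float, max_idle: float) -> list[str]:
        """Copy files idle longer than max_idle to the archive, leaving stubs behind."""
        moved = []
        for path, entry in self.disk.items():
            if entry.data is not None and now - entry.last_access > max_idle:
                self.archive[path] = entry.data
                entry.data = None        # stub: the name stays, the bytes move
                moved.append(path)
        return moved

    def read(self, path: str, now: float) -> bytes:
        """Reads look the same whether or not the file was migrated."""
        entry = self.disk[path]
        if entry.data is None:           # transparent recall from the archive
            entry.data = self.archive[path]
        entry.last_access = now
        return entry.data


if __name__ == "__main__":
    store = TwoTierStore()
    store.write("/project/run1.dat", b"results", now=0.0)
    store.write("/project/run2.dat", b"fresh", now=90.0)
    print(store.migrate_cold(now=100.0, max_idle=30.0))   # only run1 is idle long enough
    print(store.read("/project/run1.dat", now=101.0))     # recalled transparently
```

In this sketch the archive copy is kept after a recall, so the data then lives in both tiers; a production system would also need to track whether the disk copy has changed since it was archived.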
“As we prepare to enter the petascale computing era, data is sure to increase, and integration of the storage and file system at NERSC is one approach to easing the data management challenges that users are sure to face,” said Hick.
In 2007, the group is focusing on several projects, such as upgrading to a new version of HPSS, improving storage system bandwidth and capacity, and providing a Globus-based GridFTP server capable of accessing HPSS directly.
The group continues to prepare for the upgrade to HPSS version 6.2, which is expected to occur toward the end of 2007. The upgrade will remove HPSS’s dependence on the legacy Distributed Computing Environment (DCE) software.
Deploying a new tape technology that will expand capacity and handle increased demand is also under way. The new technology will more than double the previous tape capacity and bandwidth capabilities, holding 500 gigabytes of uncompressed data in one cartridge, or nearly 1 terabyte in a compressed format.
The Globus-based GridFTP server is expected to be available after the new version of HPSS is in place. This GridFTP server will provide HPSS access to GridFTP clients. GridFTP is gaining support in the scientific community as the data movement interface of choice. Providing a GridFTP server that accesses HPSS directly will provide a reliable and high-performance wide area network transfer option for HPSS data.
OSF power supply is upgraded
In addition to implementing new approaches to improve the usability and reliability of computing resources, NERSC upgraded its electrical infrastructure in 2006. To accomplish this, a planned power outage was carried out during the week of October 30 at the Oakland Scientific Facility (OSF). The outage allowed the NERSC computer room to be safely upgraded to accommodate a new uninterruptible power supply (UPS) and future computing systems, including Franklin, NERSC’s new Cray supercomputer. Several carefully timed email notices during the previous month had informed all NERSC users about the outage.
The electrical substations in the OSF basement could deliver up to 6 megawatts (MW) of power, but the machine room was using only 2 MW. To power the increased computing capability and cooling requirements of Franklin and future machines, NERSC needs 4 MW.
To meet these needs, Pacific Gas and Electric Company (PG&E) upgraded its connection to the building and connected new 480V feeds between the basement and the machine room to deliver the increased power. The chilled water piping under the machine room floor was also rearranged to improve the air flow, since each of Franklin’s 102 racks will need 2,300 cubic feet of cooled air per minute.
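Multiplying out the per-rack figure gives a sense of the cooling load involved. This short sketch uses only the numbers from the two paragraphs above plus the standard conversion 1 ft³/min ≈ 0.000472 m³/s; it assumes the airflow total covers Franklin's racks alone and ignores every other system in the machine room.

```python
# Rough totals implied by the OSF upgrade figures quoted above.
# Assumption: only Franklin's racks are counted.

RACKS = 102
CFM_PER_RACK = 2300                  # cubic feet of cooled air per minute, per rack
M3_PER_S_PER_CFM = 0.00047194745     # 1 ft^3/min expressed in m^3/s

OLD_MACHINE_ROOM_MW = 2.0            # power in use before the upgrade
REQUIRED_MW = 4.0                    # power needed for Franklin and future machines


def cooling_airflow():
    """Return total airflow for Franklin's racks in (cfm, m^3/s)."""
    total_cfm = RACKS * CFM_PER_RACK
    return total_cfm, total_cfm * M3_PER_S_PER_CFM


if __name__ == "__main__":
    cfm, m3s = cooling_airflow()
    print(f"Franklin airflow: {cfm:,} cfm (about {m3s:.0f} m^3/s)")
    print(f"machine-room power: {OLD_MACHINE_ROOM_MW} MW -> {REQUIRED_MW} MW "
          f"({REQUIRED_MW / OLD_MACHINE_ROOM_MW:.0f}x)")
```

It reports 234,600 cfm (roughly 111 m³/s) and a doubling of machine-room power from 2 MW to 4 MW.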
In February 2007, NERSC completed the power upgrade by installing its first uninterruptible power supply (UPS) to protect critical data in the NERSC Global Filesystem (NGF) and HPSS. With the UPS in operation, if an unscheduled power outage does happen, the UPS will allow a graceful shutdown of NERSC’s critical storage disks and databases. That added margin of safety will benefit NERSC staff and users by increasing reliability and decreasing the amount of time required to recover from power failures.
Meeting security challenges and sharing expertise
NERSC computer security efforts aim to protect NERSC systems and users’ intellectual property from unauthorized access or modification. This is a formidable challenge, considering that NERSC resources are accessed remotely by more than 2,500 scientists via a wide range of non-dedicated networks, and that government agencies and supercomputer centers are attractive targets for hackers. Security measures in an open science environment must strike a delicate balance between permitting legitimate researchers unencumbered access to resources and preventing illegitimate adversaries from compromising those same resources.
The biggest challenge of the year came on October 5, 2006, when NERSC’s security team discovered that a security incident had the potential of compromising some users’ passwords, SSH keys, and Grid certificates. The security response team quickly disabled all the affected accounts, notified the users by email, and took Seaborg offline for several days while they performed a comprehensive security check and took remedial steps to ensure the system was not at risk. NERSC staff worked around the clock and from two continents to restore the system to service in one-third the expected time.
Apparently no actual damage was done to any NERSC systems or user files, but as an extra precaution, all passwords that had been used at NERSC for the previous three years were disabled. Users were emailed instructions on how to change their passwords and regenerate their SSH keys and Grid certificates, and they were advised to do the same on any systems outside of NERSC that may have been exposed to this security threat. NERSC staff generated and implemented a long list of action items for restoration of full service as well as long-term security improvements.
A week later, in an email from Washington to all NERSC staff, Bill Kramer forwarded the compliments of Office of Science officials on a job well done. “While the goal is to avoid such a thing, handling it well when it happens is another mark of excellence,” Kramer wrote.
Computer security is often compared to an arms race, with each side constantly innovating to meet the other side’s challenges. Active participation in the cybersecurity community is one way that NERSC’s security team stays up to date. For example, at the SC06 conference, a full-day tutorial on “Computing Protection in Open HPC Environments” was presented by Steve Lau, formerly of NERSC and now at the University of California, San Francisco; Bill Kramer and Scott Campbell of NERSC; and Brian Tierney of Berkeley Lab’s Distributed Systems Department.
NERSC staff also contributed to a workshop titled “DOE Cybersecurity R&D Challenges for Open Science: Developing a Roadmap and Vision,” held January 24–26, 2007, in Washington, D.C. The goal of the workshop was to identify the research needs and opportunities associated with cybersecurity for open science. Brent Draney, Scott Campbell, and Howard Walter contributed a white paper on “NERSC Cyber Security Challenges that Require DOE Development and Support.” Draney and Kramer participated in the workshop’s deliberations, which were summarized in a report to DOE management.