National Energy Research Scientific Computing Center
  A DOE Office of Science User Facility
  at Lawrence Berkeley National Laboratory

GUPFS Testbed Configuration

The GUPFS project uses a testbed system to investigate and evaluate the component technologies needed for a center-wide shared file system, and to explore the interactions among these components. We have also used the testbed to develop the GUPFS benchmark methodology and the benchmark codes used in the technology evaluations. The testbed continues to be a useful resource for attracting the attention of component technology vendors and for developing relationships with a number of them.

The testbed we used during FY 2003 was the expanded testbed that had been upgraded at the end of FY 2002. This upgrade is detailed in the GUPFS Project FY 2002 report. The testbed was designed and built to provide sufficient hardware and computational resources to support the evaluation of multiple new component technologies, and to provide the underlying SAN fabric and storage resources with sufficient aggregate performance to stress-test existing and emerging shared file system technologies. The design emphasized extensibility in order to accommodate future technology developments. In this regard, the testbed proved to be very effective. A variety of shared file systems were successfully tested, a number of fabric components were integrated and tested throughout the year, and additional storage solutions were brought in and evaluated.

The base configuration of the GUPFS testbed during FY 2003 and the changes made to that configuration throughout the year are presented in the following sections.

1.    FY 2003 Initial Testbed Configuration

The GUPFS FY 2003 testbed system presented a microcosm of a parallel scientific cluster: dedicated computational nodes, special-function service nodes, and a high-speed interconnect for message passing. It used an internal jumbo-frame Gigabit Ethernet as the primary high-speed message passing interconnect. An internal 10/100 Mb/s Fast Ethernet LAN was employed for system management and NFS distribution of the user home file systems. The testbed supplied Fibre Channel as the base SAN fabric, as well as Fibre Channel storage, and a variety of alternative fabrics and bridges between these fabrics.

During FY 2003, the testbed was configured as a Linux parallel scientific cluster, with a management node, a core set of 16 dedicated dual Pentium-4 compute nodes, a set of six special-purpose dual Pentium-4 nodes, and a reserve of five auxiliary dual Pentium-3 compute nodes from the original FY 2002 testbed. The Fibre Channel SAN fabric was expanded extensively with the addition of two 16-port 2 Gb/s FC switches. The Gigabit Ethernet used as the message passing interconnect for parallel jobs was also expanded to support the increased number of nodes and iSCSI testing. A picture of the FY 2003 testbed appears in Figure 1. The FY 2003 testbed configuration is shown in Figure 2.

Figure 1. The FY 2003 testbed, with the NetStorager shown in front.

The following major components were included in the FY 2003 testbed:

System nodes

·       Twenty-two dual Pentium-4 nodes: sixteen in 2U cases and six in 4U cases (described in greater detail later in this section)

·       Six dual Pentium-3 nodes in 4U cases

Fabric

·       Ethernet

o       One 32-port Extreme 7i Gigabit Ethernet switch

o       One 16-port Extreme 5i Gigabit Ethernet switch

o       Two 10/100 Ethernet switches for system management

·       Fibre Channel

o       Two 16-port 2 Gb/s Fibre Channel switches (Brocade SilkWorm 3800 and Qlogic SANbox2-16)

o       One 16-port 1 Gb/s Fibre Channel switch (Brocade SilkWorm 2800)

o       One Cisco SN5428 iSCSI Router fabric bridge to Ethernet

·       Myrinet

o       One Myrinet 2000 8-port switch with eight host interface cards

·       InfiniBand

o       One InfiniCon ISIS InfinIO 7000 1x InfiniBand switch, with eight 1x HCA host adapters, and fabric bridge modules for Fibre Channel and Gigabit Ethernet

Storage

·       An EMC CLARiiON CX600 disk subsystem

·       A Dot Hill 7124 RAID disk subsystem

·       A Silicon Gear Mercury II RAID subsystem

·       A Chaparral A8526 RAID subsystem with attached storage

Figure 2. FY 2003 base GUPFS testbed configuration.


The new Pentium-4 nodes all used the same motherboard and were configured similarly. The only difference among them was the size of the case in which they were installed. Sixteen of the new technology nodes were put in 2U cases to save space, which eliminated the need to buy more than one additional cabinet. Six of the new technology nodes were put in 4U cases to allow standard-height-profile peripheral component interconnect (PCI) cards to be installed for the Myrinet 2000 host interfaces, Intel PRO/1000 T IP iSCSI cards, and early 1x InfiniBand HCAs for the InfiniCon InfinIO 7000 fabric bridge. The need for PCI-X slots on the motherboard to support the high-performance FC, InfiniBand, and Gigabit Ethernet cards dictated the class of motherboards and processors that were acquired, because the only motherboards available with PCI-X buses were relatively high-end server motherboards.

·       All Pentium-4 nodes, regardless of the size of their case, had the same base configuration. This configuration consisted of the same motherboards, dual 2.2 GHz Pentium IV Prestonia Xeon CPUs, 2 GB of DDR memory, 10/100 and Gigabit Ethernet interfaces, 36 GB SCSI disks, and Qlogic 2340 2 Gb/s Fibre Channel HBAs.

·       All Pentium-3 nodes had the same base configuration. This consisted of identical motherboards, dual Pentium III 1 GHz CPUs, 1 GB of memory, 10/100 and Gigabit Ethernet interfaces, and 18 GB SCSI disks. One of the Pentium-3 nodes was configured as a management node and had additional 10/100 and Gigabit Ethernet interfaces. The remaining five Pentium-3 nodes were configured as auxiliary compute nodes with Qlogic 2310 2 Gb/s Fibre Channel HBAs.

 

All Pentium-4 nodes, regardless of the size of their case, were configured with:

·       Supermicro P4DP6 motherboards with six PCI-X slots, two of which are 133 MHz capable

·       Dual 2.2 GHz Pentium IV Prestonia Xeon CPUs

·       2 GB of DDR PC2100 ECC memory

·       Dual onboard Intel PRO/100 Ethernet interfaces

·       Dual onboard U160 Adaptec SCSI controllers

·       Onboard VGA graphics

·       One 36 GB Ultra 160 LVD 10K RPM SCSI disk drive

·       One Qlogic qla2340 133 MHz PCI-X Fibre Channel HBA (low or standard profile)

·       One Intel PRO/1000 XT 133 MHz PCI-X Gigabit Ethernet NIC (low or standard profile)

 

All Pentium-3 nodes shared a common base configuration. The management/interactive node and the computational nodes differed only in that the computational nodes contained a 2-Gb/s Fibre Channel interface card, while the management node contained additional Fast Ethernet cards and an additional Gigabit Ethernet card.

All of the Pentium-3 nodes were installed in 4U rack mount cases and had the following base configuration:

·       Intel Server Board STL2 motherboards, with two 64-bit 66 MHz PCI slots with additional 32-bit PCI slots

·       Dual Pentium III 1 GHz CPUs

·       1 GB of PC133 ECC memory

·       Onboard VGA (video graphics array) graphics

·       Onboard U160 Adaptec SCSI controllers

·       One onboard Intel PRO/100 Ethernet interface

·       One 18 GB Ultra 160 LVD 10K RPM SCSI disk drive

·       One Qlogic qla2200 64-bit Fibre Channel Optical HBA (compute nodes only)

·       One Intel PRO/1000 T 64-bit PCI Gigabit Ethernet NIC (the management node had two)

 

The identical base configuration of all the Pentium-4 nodes, including those intended as special-purpose nodes, allowed them to be used at times as compute nodes for scalability testing. Four of the new nodes were configured as special-purpose nodes, with the same basic hardware and software configuration as the dedicated compute nodes. These four special-purpose nodes were configured to perform the following functions:

·          Code development and benchmark debugging

·          Metadata and lock manager services

·          A dedicated installation target for developing and testing new kickstart configurations

·          A storage server for testing distribution of shared file systems through NFS gateways

Two additional Pentium-4 nodes were initially reserved for transient special-purpose usage, such as running the InfiniCon InfiniBand subnet manager, which initially ran under Windows 2000. When InfiniCon’s subnet manager became able to run under Linux, both of these nodes were reconfigured as general-purpose compute nodes and became usable in evaluations.

The increased scale, many advanced technology components, and flexible and expandable design of the updated GUPFS testbed will enable many interesting and important evaluations to be conducted over the next several years. These evaluations should lead to the selection of the best and most appropriate component technologies for the rollout of a high-performance shared file system during the second phase (FY 2005–2006) of the GUPFS project.

All testbed nodes ran Linux, based on the Red Hat 7.1, 7.2, 7.3, 8.0, or 9.0 distribution, depending on the requirements of the file system being tested. The testbed supported parallel job submission and execution using OpenPBS and used MPICH as the MPI implementation for parallel jobs. Portland Group C, C++, and various flavors of FORTRAN compilers provided the compilation and execution environment for the parallel jobs.

Independent Linux systems on individual nodes were installed automatically through PXE boot and kickstart mechanisms. This allowed multiple, completely different system images to be present on each of the nodes, so that the testbed could be switched quickly among the different software environments needed for different evaluations.
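The source does not describe the actual boot files used on the testbed. As a minimal sketch of how per-node PXE boot with kickstart can select among several system images, the Python below renders a PXELINUX configuration for one node. It relies on two standard behaviors: PXELINUX looks up a per-node file under pxelinux.cfg/ named after the node's IPv4 address in uppercase hexadecimal, and the Red Hat installer accepts a `ks=nfs:<server>:/<path>` boot argument. The image names, paths, and addresses are hypothetical.

```python
import ipaddress

# Hypothetical image catalog: label -> (kernel, initrd, kickstart location).
# None of these names or paths come from the GUPFS report.
IMAGES = {
    "rh73-eval": ("rh73/vmlinuz", "rh73/initrd.img", "nfs:10.0.0.1:/ks/rh73-eval.cfg"),
    "rh90-base": ("rh90/vmlinuz", "rh90/initrd.img", "nfs:10.0.0.1:/ks/rh90-base.cfg"),
}


def pxelinux_cfg_name(ip: str) -> str:
    """Return the uppercase-hex file name PXELINUX looks up for this IPv4 address."""
    return format(int(ipaddress.IPv4Address(ip)), "08X")


def render_entry(label: str, kernel: str, initrd: str, ks: str) -> str:
    """Render a minimal PXELINUX config that boots the installer with a kickstart file."""
    return (
        f"DEFAULT {label}\n"
        f"LABEL {label}\n"
        f"  KERNEL {kernel}\n"
        f"  APPEND initrd={initrd} ks={ks}\n"
    )


def config_for(ip: str, image: str) -> tuple[str, str]:
    """Return (file name under pxelinux.cfg/, file contents) for one node and image."""
    kernel, initrd, ks = IMAGES[image]
    return pxelinux_cfg_name(ip), render_entry(image, kernel, initrd, ks)


if __name__ == "__main__":
    name, text = config_for("10.0.0.17", "rh73-eval")
    print(f"pxelinux.cfg/{name}")
    print(text)
```

Under these assumptions, switching a node to a different environment amounts to rewriting one small file on the boot server and rebooting the node.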

All nodes except the management node were connected to a switched 2 Gb/s Fibre Channel SAN fabric by 2 Gb/s Qlogic 2300 family FC Host Bus Adapters (HBAs). The 2 Gb/s FC fabric was entirely optical. The 1 Gb/s FC copper fabric was retired and replaced with an optical fabric, except as necessary to attach the original 1 Gb/s FC storage to the 1 Gb/s Brocade SilkWorm 2800 switch.

In addition to the testbed nodes, the various storage devices, both permanent and under evaluation, were attached to the same switched FC fabric, as were a number of fabric bridges. These fabric bridges included the Cisco SN5428 Storage Router, which bridged Gigabit Ethernet iSCSI storage traffic from hosts to FC fabric-attached storage devices, and the InfiniCon InfinIO fabric bridge between InfiniBand-attached hosts and FC fabric-attached storage.

1.1     Storage Configuration

The disk storage devices connected to the Fibre Channel SAN fabric were:

·             A dual-controller Dot Hill 7124 RAID subsystem, with an expansion cabinet

·             A dual-controller Silicon Gear Mercury II RAID subsystem

·             A single-controller Chaparral A8526 RAID subsystem with attached storage

·             A dual-controller EMC CLARiiON CX600 RAID subsystem with storage

 

Each of the RAID controllers had two or more Fibre Channel ports for connecting to the switch, and these FC ports could be used simultaneously. All four storage devices supported various RAID configurations and used similar 10,000 RPM 73 GB disk drives. The Dot Hill contained 20 drives, the Silicon Gear 12 drives, the Chaparral 10 drives, and the EMC 30 drives. Total unformatted SAN-attached storage capacity was approximately 5.3 terabytes (TB), with a nominal maximum of 4.2 TB of formatted RAID 5 storage. The Dot Hill and Silicon Gear were both limited to 1 Gb/s FC interfaces, while the Chaparral and EMC supported 2 Gb/s FC interfaces.
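The unformatted capacity figure follows directly from the drive counts and the 73 GB drive size given above. The short Python sketch below reproduces that arithmetic, assuming decimal units (1 TB = 1000 GB). It does not attempt to reproduce the 4.2 TB RAID 5 figure, because the source does not give the RAID group layout.

```python
DRIVE_GB = 73  # capacity of each drive, from the text

# Drive counts per subsystem, from the text.
DRIVES = {
    "Dot Hill 7124": 20,
    "Silicon Gear Mercury II": 12,
    "Chaparral A8526": 10,
    "EMC CLARiiON CX600": 30,
}


def raw_capacity_gb(drives: dict[str, int], drive_gb: int = DRIVE_GB) -> int:
    """Total unformatted capacity in GB across all subsystems."""
    return sum(drives.values()) * drive_gb


if __name__ == "__main__":
    for name, count in DRIVES.items():
        print(f"{name}: {count} drives, {count * DRIVE_GB} GB raw")
    total_gb = raw_capacity_gb(DRIVES)
    print(f"Total: {sum(DRIVES.values())} drives, {total_gb / 1000:.1f} TB raw")
```

The 72 drives give 5256 GB, which rounds to the 5.3 TB quoted in the text.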

The storage configuration was chosen to enable the exploration of the relative performance, reliability, and interoperability of multiple storage vendors’ products. The quantity and character of the storage were dictated by:

·             The technology available at the time of acquisition

·             The desire to be able to achieve maximum performance from each storage controller

·             The desire to be able to explore the Linux support for file systems greater than 2 TB on 32-bit architectures when such support became available

·             Price

1.2     Testbed Configuration Changes during FY 2003

A number of changes in the testbed configuration occurred during FY 2003. These included upgrading the InfiniCon InfinIO switch and HCAs from 1x (2.5 Gb/s) to 4x (10 Gb/s) InfiniBand, exchanging the Myrinet 2000 PCI-based Rev C host interface adapters for higher-performance PCI-X Rev D cards, and connecting the Alvarez cluster 10/100 management network with the GUPFS testbed Gigabit Ethernet fabric. Another modification was the exchange of three of the five Intel iSCSI HBAs for a newer version that could run with more up-to-date Linux kernels, allowing the iSCSI HBAs to be tested in conjunction with the newer file systems.

The Myrinet 2000 host adapter exchange allowed all eight Myrinet 2000 host adapters to be installed and the full capabilities of the testbed Myrinet fabric to be investigated. The previous Rev C adapters were full height and therefore fit only in 4U cases. Because the testbed had only six 4U Pentium-4 nodes, only six of the eight original adapters could be installed. The new Rev D adapters were available in a low-profile form factor that fit in 2U cases, which allowed all eight to be installed. Once all eight Rev D adapters were installed, the GUPFS project proceeded with plans to install the GUPFS 8-port Myrinet switch blade with the Alvarez Myrinet switch, making the eight connected GUPFS nodes part of the Alvarez system. This allowed testing of GPFS 1.3 for Linux using Alvarez compute nodes, with GUPFS nodes serving as high-performance Network Shared Disk (NSD) servers in lieu of Alvarez’s low-performance storage. It also allowed GPFS testing at a large scale (64 or more nodes) in conjunction with 1600 MB/s of storage bandwidth, a combination that neither system could achieve alone. In addition, it gave us an opportunity to begin investigating shared file systems running across multiple systems.
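The source does not say how the 1600 MB/s figure was derived. One reading that is consistent with the configuration described above is eight GUPFS nodes, each with a single 2 Gb/s Fibre Channel HBA carrying a nominal 200 MB/s. The Python sketch below works through that assumed arithmetic and the resulting average share per client at the 64-node scale mentioned in the text. These are nominal link rates, not measured results.

```python
# Assumption: nominal payload rate of one 2 Gb/s Fibre Channel link, in MB/s.
FC_2G_MB_S = 200


def aggregate_mb_s(server_nodes: int, per_link_mb_s: int = FC_2G_MB_S) -> int:
    """Nominal aggregate bandwidth if every server node drives one FC link."""
    return server_nodes * per_link_mb_s


def per_client_share_mb_s(aggregate: float, clients: int) -> float:
    """Average bandwidth available to each client if all clients are active."""
    return aggregate / clients


if __name__ == "__main__":
    total = aggregate_mb_s(8)  # eight GUPFS nodes on the Myrinet blade
    print(f"Nominal aggregate: {total} MB/s")
    print(f"Share per client at 64 nodes: {per_client_share_mb_s(total, 64)} MB/s")
```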

During FY 2003, the GUPFS testbed InfiniBand configuration received several updates. InfiniCon upgraded the HCAs and switch modules to 4x (10 Gb/s), enabling early investigation of storage transfers over 4x InfiniBand. As with the 1x IB HCAs, the initial 4x HCAs were full height and fit only in 4U cases. Because the 4x HCAs required PCI-X slots, and because the testbed had only six 4U Pentium-4 nodes with PCI-X slots, the InfiniBand fabric deployment was limited to six systems, although components were available for eight. One of a number of software upgrades made it possible to run the InfiniCon IB subnet manager under Linux. This allowed the system that had been running the subnet manager under Windows 2000 to be converted back to Linux and used as a compute node in evaluations.

A second InfiniCon hardware upgrade brought in a second generation of 4x HCAs. In addition to providing higher performance, these HCAs were available in a low-profile form factor that fit in 2U cases. This made it possible to install all eight of the new HCAs in 2U Pentium-4 nodes and to fully populate the configuration for the first time. With all eight nodes on the fabric, more meaningful scalability numbers could be obtained, which allowed more direct and clearer comparisons of interconnect and fabric performance.
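The 1x (2.5 Gb/s) and 4x (10 Gb/s) figures quoted in this section are signaling rates. InfiniBand links of this generation use 8b/10b encoding, so the usable data rate is 80% of the signaling rate. The Python sketch below works through that arithmetic for both link widths. It gives nominal link rates only and does not model protocol overhead or the PCI-X bus.

```python
LANE_SIGNAL_GBPS = 2.5  # signaling rate of one InfiniBand lane, as quoted in the text


def ib_rates(lanes: int) -> tuple[float, float]:
    """Return (signaling rate, data rate) in Gb/s for a link with this many lanes."""
    signal = lanes * LANE_SIGNAL_GBPS
    data = signal * 8 / 10  # 8b/10b encoding: 8 data bits per 10 bits on the wire
    return signal, data


def gbps_to_mb_s(gbps: float) -> float:
    """Convert Gb/s to MB/s using decimal units."""
    return gbps * 1000 / 8


if __name__ == "__main__":
    for lanes in (1, 4):
        signal, data = ib_rates(lanes)
        print(f"{lanes}x: {signal} Gb/s signaling, {data} Gb/s data, "
              f"{gbps_to_mb_s(data):.0f} MB/s")
```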


Page last modified: Tue, 22 Jun 2004 22:50:18 GMT
Page URL: http://www.nersc.gov/projects/GUPFS/testbed/GUPFS_testbed03.php