NERSC logo National Energy Research Scientific Computing Center
  A DOE Office of Science User Facility
  at Lawrence Berkeley National Laboratory


Sandial Shadow 14000 Switch Evaluation Report (DRAFT)



1. Overview

This document reports the findings of the GUPFS evaluation of Sandial’s Shadow 14000 switch.


The Global Unified Parallel File System (GUPFS) project is a multiple-phase, five-year NERSC Center project to provide a scalable, high-performance, high-bandwidth, shared file system for all the NERSC production systems. The GUPFS project is intended to evaluate emerging storage, fabric, and file system technologies to determine the best solutions for a center-wide shared file system.


As a center-wide file system, GUPFS will be deployed on all of the NERSC production systems, and different systems have different performance requirements. As GUPFS starts to consolidate storage resources into a NERSC center-wide storage area network (SAN) infrastructure, it needs a storage networking solution that can manage end-to-end network Quality of Service (QoS) requirements.

The objective of the GUPFS QoS requirement is to ensure that access to shared storage resources is available to client systems in the quantities assigned to them by policy. For example, when a large parallel job is delayed in doing I/O because of contention for storage access and bandwidth congestion, large portions of the system may be idled, reducing the throughput of the center's primary computational system.

It is desirable that the QoS policies can be changed dynamically and managed centrally. The QoS policies are intended to be implemented in the fabric used to access the storage and/or in the storage servers/devices, not in the individual client systems.


The purpose of this evaluation is to determine how a technology like Sandial’s Shadow 14000 may be used by GUPFS to provide end-to-end QoS management.


The top-level evaluation objectives include:


  • Functionality and performance validation

  • Switch reliability and data availability

  • Traffic management with policies

2. Test Setup
















The evaluation was conducted on the GUPFS test bed, which consists of a 32-node Linux cluster that can be divided into four 8-node clusters, each with a different interconnect, together with several fabric (FC and iSCSI) switches and a variety of FC storage devices (www.nersc.gov/projects/gupfs/testbed).

Linux Host Configuration

  • Supermicro P4DP6 motherboards with six PCI-X slots, two of which are 133 MHz capable

  • dual 2.2 GHz Pentium IV Prestonia Xeon CPUs

  • 2 GB of DDR PC2100 ECC memory (only 512 MB was used, to avoid the bounce buffer problem)

  • dual on-board Intel PRO/100 Ethernet interfaces

  • dual on-board U160 Adaptec SCSI controllers

  • on-board VGA graphics

  • one 36 GB Ultra 160 LVD 10K RPM SCSI disk drive

  • one Qlogic qla2340 133 MHz PCI-X Fibre Channel HBA (low or standard profile)

  • one Intel PRO/1000 XT 133 MHz PCI-X Gigabit Ethernet NIC (low or standard profile)

  • RedHat 7.3 Linux kernel 2.4.18-10smp


Test Software

  • NERSC PIORAW (MPI-based Parallel I/O Benchmark), to measure port/fabric performance and scalability. The PIORAW benchmark spawns parallel I/O jobs on multiple hosts, and each I/O job performs sequential I/O (reads and writes).

  • lmdd from Lmbench, to measure I/O throughput and scalability


The evaluation was conducted between 12/1/2003 and 2/29/2004.



3. Findings and Results

3.1 Basic Switch Functionality – Performance and FC Port Interoperability

Objective: Baseline functionality and performance test

Observations:

  • The Shadow 14000 switch worked with all the host connections (initiators) in the GUPFS test bed and with all the storage devices (targets) that support point-to-point connections. The switch ports auto-sensed between 1 Gbps and 2 Gbps connections. We were able to read and write data between any target (1 Gbps or 2 Gbps FC devices) and any initiator (2 Gbps only) without any problems.

  • The switch does not support FCAL (loop) devices and therefore does not support the DotHill device. The switch port to which the DotHill device was connected reported a status of ‘isolated’, which was not very informative. Support for FCAL devices is claimed to be available in the next software release.

  • We used both lmdd from Lmbench and NERSC’s PIORAW I/O benchmark to measure the switch performance.

The following two tables show the PIORAW results with two different switches: the Qlogic SANbox2-64 switch and the Sandial Shadow 14000 switch. There is no significant performance difference between the SANbox2-64 results and the Shadow 14000 results.

The tests were run with two file sizes: 128 MB for in-cache runs and 4231 MB for out-of-cache runs. The tests were run with different numbers of hosts and different numbers of I/O processes per host to compare the switch performance under different workloads. In the column ‘Test Configuration (NxP)’, N is the number of hosts and P is the number of I/O processes per host. For example, for the Test Configuration 4x1, the test was run with 4 hosts and only one I/O process per host.

In this evaluation, we only measured the streaming I/O (sequential I/O) performance, which is measured in terms of MB/s. The block size used in the I/O test was 1 MB. We did not measure any random I/O performance.
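As a rough illustration of what a streaming I/O measurement in MB/s involves, the following is a minimal single-process sketch, not PIORAW or lmdd themselves. The function name `sequential_write_read` is hypothetical, and the sketch assumes an ordinary file on a local file system rather than the raw FC devices used in the evaluation, so its read figure is closer to an in-cache run.

```python
import os
import tempfile
import time

BLOCK_SIZE = 1024 * 1024  # 1 MB blocks, matching the block size used in the I/O tests


def sequential_write_read(path, file_size_mb, block_size=BLOCK_SIZE):
    """Write, then read, file_size_mb MB sequentially; return (write_mbs, read_mbs)."""
    block = b"\0" * block_size
    n_blocks = (file_size_mb * 1024 * 1024) // block_size

    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(n_blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # count the time needed to push the data out
    write_secs = time.perf_counter() - start

    start = time.perf_counter()
    bytes_read = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            bytes_read += len(chunk)
    read_secs = time.perf_counter() - start

    megabytes = bytes_read / (1024 * 1024)
    return megabytes / write_secs, megabytes / read_secs


# Small demonstration on a temporary file (8 MB, far smaller than the real tests).
with tempfile.TemporaryDirectory() as tmp:
    w, r = sequential_write_read(os.path.join(tmp, "seq.dat"), file_size_mb=8)
    print(f"write {w:.1f} MB/s, read {r:.1f} MB/s")
```

A parallel benchmark runs many such streams at once (N hosts times P processes) and reports the aggregate rate.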


DDN S2A 8500, 4 LUNs (fsize=128MB), throughput in MB/s

Test Configuration    Qlogic SANbox2-64      Sandial Shadow 14000
(NxP)                 Write      Read        Write      Read

1x1                   132.36     131.40      128.26     128.36
1x4                   193.75     197.38      194.50     197.33
1x8                   194.22     197.36      194.80     197.33

4x1                   520.45     511.52      484.42     478.44

1x1                   132.37     131.44      131.54     129.38
2x1                   262.30     261.85      260.82     261.45
3x1                   384.67     382.08      380.80     381.86
4x1                   519.78     510.98      484.98     480.46

1x4                   193.78     197.38      194.42     197.39
2x4                   387.05     394.42      388.31     394.43
3x4                   580.42     591.72      582.86     591.73
4x4                   750.21     788.91      744.38     788.80


DDN S2A 8500, 4 LUNs (fsize=4231MB), throughput in MB/s

Test Configuration    Qlogic SANbox2-64      Sandial Shadow 14000
(NxP)                 Write      Read        Write      Read

1x1                   131.45     113.48      132.81     113.52
1x4                   193.81      84.07      194.46      85.72
1x8                   194.21      92.06      194.80      89.37

4x1                   522.89     453.69      505.27     427.40

1x1                   133.15     115.31      132.72     116.75
2x1                   263.82     227.56      263.67     224.40
3x1                   393.86     338.84      390.24     333.50
4x1                   520.73     452.22      501.15     431.01

1x4                   194.02      84.64      194.46      84.05
2x4                   386.96     165.41      388.44     164.99
3x4                   580.41     243.42      582.72     243.34
4x4                   742.17     313.22      744.23     315.98
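To make the comparison between the two switches concrete, the sketch below computes the relative difference for a handful of rows copied from the fsize=128MB results. The helper names `parse_config` and `percent_diff` are hypothetical, and only a sample of rows is included. In this sample the 4x1 rows show the largest gap, on the order of 6 to 7 percent, while the other rows differ by less than 1 percent.

```python
def parse_config(config):
    """Split an 'NxP' label into (hosts, io_processes_per_host)."""
    hosts, procs = config.split("x")
    return int(hosts), int(procs)


def percent_diff(baseline, candidate):
    """Signed difference of candidate relative to baseline, in percent."""
    return (candidate - baseline) / baseline * 100.0


# (config, operation) -> (SANbox2-64 MB/s, Shadow 14000 MB/s),
# sample rows copied from the fsize=128MB results.
in_cache = {
    ("1x1", "write"): (132.37, 131.54),
    ("4x1", "write"): (519.78, 484.98),
    ("4x1", "read"): (510.98, 480.46),
    ("4x4", "write"): (750.21, 744.38),
    ("4x4", "read"): (788.91, 788.80),
}

for (config, op), (sanbox, shadow) in in_cache.items():
    hosts, procs = parse_config(config)
    print(
        f"{config} {op:5s}: {percent_diff(sanbox, shadow):+.2f}% "
        f"({shadow / hosts:.1f} MB/s per host on the Shadow 14000)"
    )
```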


3.2 Traffic Management with Policies

Objective: QoS – Quality of Service with Policies. The Shadow 14000 features a hybrid Time Division Multiplex (TDM) managed bandwidth scheduling to deliver dynamic bandwidth allocation, bandwidth on demand and tunable network performance.

Observations:

  • Traffic management with policies works as expected. It was very easy to change the bandwidth allocation for any initiator-target pair.

  • The following table shows the 1x4 PIORAW results using one 2 Gb/s storage device under different policy settings: the baseline (the default), max bandwidth set to 50 MB/s, and max bandwidth set to 100 MB/s.



Policy setting        Write (MB/s)   Read (MB/s)

Baseline (2Gb/s)      191.21         194.82
Max BW=50MB/s          56.19          56.13
Max BW=100MB/s        112.25         112.27

  • To test how Shadow 14000 traffic management works in an environment in which different systems have different bandwidth requirements, we set the following policy for four initiator-target pairs:


Initiator-target pair   Max BW

guscn01 - lun1          default
guscn02 - lun2          150 MB/s
guscn03 - lun3          100 MB/s
guscn04 - lun4          50 MB/s

The above policy defines how the max bandwidth is allocated for each initiator-target pair. The default policy was used for the guscn01-lun1 pair, so its max bandwidth is limited only by what the ports can do (2 Gb/s). For the other three pairs, the allocated bandwidths were reduced to 150 MB/s, 100 MB/s, and 50 MB/s. The following table shows the lmdd results with and without the policy.



lmdd (4x4, fsize=2560 MB, bsize=16 MB), throughput in MB/s

Initiator-target pair   Policy (Max Bandwidth)   No Policy   With Policy

guscn01 - lun1          default                  199.56      197.71
guscn02 - lun2          150 MB/s                 197.75      168.38
guscn03 - lun3          100 MB/s                 196.83      112.89
guscn04 - lun4          50 MB/s                  197.29       56.39
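One way to read the policy results is to compare each measured throughput with the configured cap. The sketch below does only that arithmetic, using the caps from the policy definition (guscn01 default, guscn02 150 MB/s, guscn03 100 MB/s, guscn04 50 MB/s) and the "With Policy" lmdd figures. The dictionary layout and the `cap_ratio` helper are illustrative assumptions, not the switch's policy format. Each capped pair comes out roughly 12 to 13 percent above its configured value, an offset this report does not explain.

```python
# Configured caps in MB/s; None stands for the default policy (no cap).
policy_caps = {
    "guscn01-lun1": None,
    "guscn02-lun2": 150.0,
    "guscn03-lun3": 100.0,
    "guscn04-lun4": 50.0,
}

# lmdd throughput in MB/s measured with the policy in place.
measured = {
    "guscn01-lun1": 197.71,
    "guscn02-lun2": 168.38,
    "guscn03-lun3": 112.89,
    "guscn04-lun4": 56.39,
}


def cap_ratio(cap, throughput):
    """Measured throughput divided by the configured cap, or None if uncapped."""
    return None if cap is None else throughput / cap


ratios = {pair: cap_ratio(policy_caps[pair], measured[pair]) for pair in policy_caps}

for pair, ratio in ratios.items():
    label = "uncapped" if ratio is None else f"{ratio:.3f} x cap"
    print(f"{pair}: {measured[pair]:.2f} MB/s ({label})")
```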


3.3 Switch Interoperability: E-port connectivity and ISL Trunking

Objective: Interoperability test between Shadow 14000 and Qlogic SANbox2-64

Observations:

  • The results are not conclusive. Initially we were able to set up 8 connections between the Shadow 14000 and the SANbox2-64 and to transfer data between initiators and targets. However, after a switch reboot to clear a configuration mistake, we were not able to get the E-ports working correctly again before we ran out of time. The following table shows the baseline performance obtained using the SANbox2-64 switch alone and the performance of the same tests when the two switches were connected via 8 E-ports.


Yotta Yotta GSX 2400, 8 LUNs (fsize=32MB, bsize=1MB), throughput in MB/s

Test Configuration    Qlogic SANbox2-64      E-port: SANbox2-64 - Shadow 14000
(NxP)                 Write      Read        Write      Read

1x1                   122.40     133.35      116.51     129.45
2x1                   239.48     266.07      153.72     170.66
3x1                   358.71     393.36      170.58     298.81
4x1                   477.81     524.60      186.17     350.52
5x1                   567.21     648.94      192.20     493.88
6x1                   636.48     811.88      193.44     534.42
7x1                   745.81     937.49      193.69     660.58
8x1                   857.49    1089.67      194.38     702.14

From these results, it seems that when the two switches were connected through E-ports, write performance was limited to roughly what a single 2 Gb/s FC port can deliver (it levels off near 194 MB/s), whereas read performance exceeded what a single 2 Gb/s FC port can deliver (702.14 MB/s at 8x1). We have no explanation for this asymmetry, and the numbers remain very puzzling.

  • We were told that when multiple ISL links are used, only one link is active between the two switches. That seemed to be the case the second time we connected the two switches. The first time, however, when the switches were connected via 8 ports, all 8 ports seemed to be working (as shown by the previous table).

  • Failover between multiple E-ports worked correctly: when the primary ISL path was disabled manually, the traffic was automatically re-routed to a second path (after some period of time).

  • Trunking between Shadow 14000 and SANbox2-64 did not seem to work. We were not able to get it to work in our environment before we ran out of time.

  • Shadow 14000 supports both hard and soft zoning. We tested soft zoning only. Zoning on Shadow 14000 seemed to work as expected. However, defining zones requires switching between multiple windows: a user has to define a set of zone names in a separate window before switching back to the main window to select the zone members.

  • We did not observe any performance difference when zoning was used.

  • Shadow 14000 supports only one zone set, which is very limiting.

  • Once the Shadow 14000 switch was connected to a Qlogic SANbox2-64 switch, the SANbox2-64 appeared to take over control of the zoning settings. We were not able to change the zoning configuration on the Shadow 14000 switch, even after disabling the connected E-ports.
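Returning to the E-port throughput figures earlier in this section, a quick way to see the write/read asymmetry is to express each result as a number of "link equivalents". The sketch below assumes a nominal payload rate of 200 MB/s per direction for one 2 Gb/s FC link, which is the commonly quoted figure rather than a value measured in this evaluation. The `link_equivalents` helper is hypothetical, and only the 1x1, 2x1, 4x1, and 8x1 rows are included.

```python
NOMINAL_LINK_MBS = 200.0  # assumed payload rate of one 2 Gb/s FC link, per direction

# hosts -> (write MB/s, read MB/s) measured through the 8 E-port connection
eport = {
    1: (116.51, 129.45),
    2: (153.72, 170.66),
    4: (186.17, 350.52),
    8: (194.38, 702.14),
}


def link_equivalents(throughput_mbs, link_mbs=NOMINAL_LINK_MBS):
    """Express a throughput as a multiple of one link's nominal rate."""
    return throughput_mbs / link_mbs


for hosts, (write_mbs, read_mbs) in sorted(eport.items()):
    print(
        f"{hosts}x1: write {link_equivalents(write_mbs):.2f} links, "
        f"read {link_equivalents(read_mbs):.2f} links"
    )
```

Under that assumption, writes never exceed one link's worth of bandwidth, while reads at 8x1 correspond to about three and a half links.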

3.4 RAS - Reliability, Availability, and Serviceability

Objective: Switch Availability

Observations:

  • When multiple paths were available between an initiator and a target and we disabled the active path during an I/O test to simulate a port failure, the connection failed over to the second path after some period of time and the test ran to completion. However, we did not measure how long the failover took, nor did we verify whether any I/O packets were dropped during the failover.

  • During the evaluation period, we rebooted the Shadow 14000 switch to fix a configuration mistake. However, the switch did not come back up after the reboot. There seemed to be a conflict between the two HASMs (HA Switch Modules), both of which wanted to be the primary at boot time, and as a result the switch was not able to come back online. After talking to support, we physically removed one of the HASMs from the chassis and the switch was able to start up without any problem. The second HASM was reinserted into the chassis after the switch was up; the switch stayed up and the second HASM became the standby HASM.
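The failover time was not measured in this evaluation. One possible way to estimate it in a future test is to record a timestamp after each I/O completes (for example with `time.monotonic()`) and look for the longest gap between consecutive completions. The sketch below shows only that post-processing step; the function name `longest_stall` and the sample timestamps are made up for illustration.

```python
def longest_stall(completion_times):
    """Return (gap_seconds, index) for the longest gap between consecutive completions.

    index is the position of the I/O that finished just before the gap.
    """
    if len(completion_times) < 2:
        raise ValueError("need at least two completion timestamps")
    gaps = [
        later - earlier
        for earlier, later in zip(completion_times, completion_times[1:])
    ]
    worst = max(range(len(gaps)), key=gaps.__getitem__)
    return gaps[worst], worst


# Made-up timestamps in seconds: steady 10 ms I/Os with one long pause.
timestamps = [0.00, 0.01, 0.02, 0.03, 4.53, 4.54, 4.55]
gap, index = longest_stall(timestamps)
print(f"longest stall: {gap:.2f} s after I/O #{index}")
```

The longest gap is an upper bound on the failover time, since it also includes the normal service time of the I/O that straddled the path failure.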

3.5 Switch Administration

Objective: Perform basic system administration functions using ShadowView

Observations:

  • The cable management is really nice.

  • The ShadowView interface is very intuitive. However, it is not easy to see and manage individual ports when there are many of them. The interface does not provide a visual mapping between the port IDs and the physical locations of the ports, so it is very difficult to figure out which port belongs to which host or storage device.

  • Initially, ShadowView did not work with the Netscape version installed on our test bed. We upgraded to Mozilla 1.0.2, but ShadowView accepts only Netscape. To work around the problem, we created a netscape symbolic link pointing to mozilla.

  • When the ShadowView web interface came up, the login window was positioned below the bottom edge of the screen. We had to scroll the window down in order to enter the name and the password.

  • The switch ‘host name’ should probably be included in the ‘System View’ screen also.

  • The ‘Name Server Table’ does not show the symbolic node names.


4. Summary

Based on a connection-oriented architecture, the Shadow 14000 offers network visibility and control for the storage network backbone. The Shadow 14000 features hybrid Time Division Multiplex (TDM) managed bandwidth scheduling to deliver dynamic bandwidth allocation, bandwidth on demand, and tunable network performance. The switch supports up to 144 2 Gb/s any-to-any, non-blocking network interfaces and, with system-level redundancy and hot-swappable components, maintains a high degree of availability and fault tolerance. We believe products like the Shadow 14000 switch will be a key technology for GUPFS in satisfying its QoS requirements.



Page last modified: Wed, 23 Jun 2004 20:15:07 GMT
Page URL: http://www.nersc.gov/projects/GUPFS/results/network/sandial/index.php