|
DRAFT
Sandial Shadow 14000 Switch Evaluation Report
1. Overview
This document reports the findings of the evaluation of Sandial’s
Shadow 14000 switch.
The
Global Unified Parallel File System (GUPFS) project is a
multiple-phase, five-year NERSC Center project to provide a scalable,
high-performance, high-bandwidth, shared file system for all the
NERSC production systems. The GUPFS project is intended to evaluate
emerging storage, fabric, and file system technologies to determine
the best solutions for a center-wide shared file system.
As a center-wide file system, GUPFS will be deployed on all of the
NERSC production systems, and different systems have different
performance requirements. As GUPFS starts to consolidate storage
resources into a NERSC center-wide storage area network (SAN)
infrastructure, it needs a storage networking solution that can
manage the end-to-end network Quality of Service (QoS) requirements.
The objective of the GUPFS Quality of Service (QoS) requirement is to
ensure that access to shared storage resources is available to client
systems in the quantities assigned to them by the policies. For
example, when a large parallel job is delayed in doing I/O due to
contention for storage access and bandwidth congestion, large
portions of the system may be idled, reducing the throughput of the
center's primary computational system.
It
is desirable that the quality of service (QoS) policies can be
changed dynamically and be centrally managed. It is intended that the
QoS policies will be implemented in the fabric used to access the
storage and/or in the storage servers/devices, not in the individual
client systems.
The purpose of this evaluation is to determine how a technology like
Sandial’s Shadow 14000 may be used by GUPFS to provide end-to-end
QoS management.
The top-level evaluation objectives include:
Functionality and performance validation
Switch reliability and data availability
Traffic management with policies
2. Test Setup

The evaluation was conducted on the GUPFS test bed, which consists of
a 32-node Linux cluster that can be divided into four 8-node
clusters, each with a different interconnect; several fabric (FC and
iSCSI) switches; and a variety of FC storage devices
(www.nersc.gov/projects/gupfs/testbed).
Linux Host Configuration
Supermicro P4DP6 motherboards with six PCI-X slots, two of which are 133 MHz capable
dual 2.2 GHz Pentium IV Prestonia Xeon CPUs
2 GB of DDR PC2100 ECC memory (512 MB was used to avoid the bounce buffer problem)
dual on-board Intel PRO/100 Ethernet interfaces
dual on-board U160 Adaptec SCSI controllers
on-board VGA graphics
one 36 GB Ultra 160 LVD 10K RPM SCSI disk drive
one Qlogic qla2340 133 MHz PCI-X Fibre Channel HBA (low or standard profile)
one Intel PRO/1000 XT 133 MHz PCI-X Gigabit Ethernet NIC (low or standard profile)
RedHat 7.3 Linux kernel 2.4.18-10smp
Test Software
NERSC PIORAW (MPI-based Parallel I/O Benchmark), to measure port/fabric performance and scalability. The PIORAW benchmark spawns parallel I/O jobs on multiple hosts, and each I/O job performs sequential I/O (reads and writes).
lmdd from lmbench, to measure I/O throughput and scalability
The evaluation was conducted between 12/1/2003 and 2/29/2004.
3. Findings and Results
3.1 Basic Switch Functionality – Performance and FC Port Interoperability
Objective: Baseline functionality and
performance test
Observations:
The Shadow 14000 switch worked with all the host connections
(initiators) within the GUPFS test bed. The switch also worked with
all the storage devices (targets) that support point-to-point
connections. The switch ports auto-sensed 1 Gbps and 2 Gbps
connections. We were able to transfer (read and write) data between
any target (1 Gbps or 2 Gbps FC devices) and any initiator (2 Gbps
only) without any problem.
The switch does not support FC-AL (loop) devices; as a result, it
does not support the DotHill device. The status of the switch port
to which the DotHill device was connected showed only ‘isolated’,
which was not very informative. Support for FC-AL devices is claimed
to be available in the next software release.
We used both lmdd from lmbench and NERSC’s PIORAW I/O benchmark to
measure the switch performance.
The following two tables show the PIORAW results with two different
switches: the Qlogic SANbox2-64 switch and the Sandial Shadow 14000
switch. The two switches performed comparably: most configurations
are within about 3% of each other, and the largest gap is in the 4x1
runs, where the Shadow 14000 was about 3% to 7% lower.
The tests were run with two file sizes: 128 MB for in-cache runs and
4231 MB for out-of-cache runs. The tests were run with different
numbers of hosts and different numbers of I/O processes per host to
compare the switch performance under different workloads. Under the
column ‘Test Configuration (NxP)’, N is the number of hosts and P is
the number of I/O processes per host. For example, for the Test
Configuration 4x1, the test was run with 4 hosts and only one I/O
process per host.
In this evaluation, we measured only streaming (sequential) I/O
performance, reported in MB/s. The block size used in the I/O tests
was 1 MB. We did not measure any random I/O performance.
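To make the measurement concrete, the following is a minimal single-process sketch of a sequential write/read throughput test using 1 MB blocks. It is not PIORAW or lmdd; the function names `sequential_write_read` and `parse_config` are hypothetical, and a small file read back immediately will mostly be served from the page cache, which corresponds to the in-cache case.

```python
import os
import tempfile
import time

BLOCK_SIZE = 1024 * 1024  # 1 MB block size, as used in the tests above


def sequential_write_read(path, file_size_mb):
    """Write, then read, file_size_mb MB sequentially; return (write, read) in MB/s."""
    block = b"\0" * BLOCK_SIZE

    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(file_size_mb):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # include the time needed to push data to the device
    write_mbps = file_size_mb / (time.perf_counter() - start)

    start = time.perf_counter()
    total_bytes = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(BLOCK_SIZE)
            if not chunk:
                break
            total_bytes += len(chunk)
    read_mbps = (total_bytes / BLOCK_SIZE) / (time.perf_counter() - start)
    return write_mbps, read_mbps


def parse_config(label):
    """Split an 'NxP' label into (hosts, io_processes_per_host)."""
    hosts, procs = label.lower().split("x")
    return int(hosts), int(procs)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        write_mbps, read_mbps = sequential_write_read(os.path.join(tmp, "seq.bin"), 8)
        print(f"write {write_mbps:.2f} MB/s, read {read_mbps:.2f} MB/s")
    print(parse_config("4x1"))
```

An NxP run would start P such processes on each of N hosts and sum the per-process rates; the coordination across hosts (MPI in the case of PIORAW) is omitted here.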
DDN S2A 8500, 4 LUNs (fsize=128MB), throughput in MB/s
| Test Configuration (NxP) | Qlogic SANbox2-64 Write | Qlogic SANbox2-64 Read | Sandial Shadow 14000 Write | Sandial Shadow 14000 Read |
| 1x1 | 132.36 | 131.40 | 128.26 | 128.36 |
| 1x4 | 193.75 | 197.38 | 194.50 | 197.33 |
| 1x8 | 194.22 | 197.36 | 194.80 | 197.33 |
| | | | | |
| 4x1 | 520.45 | 511.52 | 484.42 | 478.44 |
| | | | | |
| 1x1 | 132.37 | 131.44 | 131.54 | 129.38 |
| 2x1 | 262.30 | 261.85 | 260.82 | 261.45 |
| 3x1 | 384.67 | 382.08 | 380.80 | 381.86 |
| 4x1 | 519.78 | 510.98 | 484.98 | 480.46 |
| | | | | |
| 1x4 | 193.78 | 197.38 | 194.42 | 197.39 |
| 2x4 | 387.05 | 394.42 | 388.31 | 394.43 |
| 3x4 | 580.42 | 591.72 | 582.86 | 591.73 |
| 4x4 | 750.21 | 788.91 | 744.38 | 788.80 |
DDN S2A 8500, 4 LUNs (fsize=4231MB), throughput in MB/s
| Test Configuration (NxP) | Qlogic SANbox2-64 Write | Qlogic SANbox2-64 Read | Sandial Shadow 14000 Write | Sandial Shadow 14000 Read |
| 1x1 | 131.45 | 113.48 | 132.81 | 113.52 |
| 1x4 | 193.81 | 84.07 | 194.46 | 85.72 |
| 1x8 | 194.21 | 92.06 | 194.80 | 89.37 |
| | | | | |
| 4x1 | 522.89 | 453.69 | 505.27 | 427.40 |
| | | | | |
| 1x1 | 133.15 | 115.31 | 132.72 | 116.75 |
| 2x1 | 263.82 | 227.56 | 263.67 | 224.40 |
| 3x1 | 393.86 | 338.84 | 390.24 | 333.50 |
| 4x1 | 520.73 | 452.22 | 501.15 | 431.01 |
| | | | | |
| 1x4 | 194.02 | 84.64 | 194.46 | 84.05 |
| 2x4 | 386.96 | 165.41 | 388.44 | 164.99 |
| 3x4 | 580.41 | 243.42 | 582.72 | 243.34 |
| 4x4 | 742.17 | 313.22 | 744.23 | 315.98 |
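As a worked check on how close the two switches are, the sketch below computes the percentage difference of the Shadow 14000 figures relative to the SANbox2-64 figures for a few rows of the fsize=128MB table (the first run of each listed configuration). The dictionary layout and the function names are illustrative choices, not part of the original test tooling.

```python
# (write, read) throughput in MB/s, copied from the fsize=128MB table:
# SANbox2-64 first, Shadow 14000 second.
IN_CACHE = {
    "1x1": ((132.36, 131.40), (128.26, 128.36)),
    "1x4": ((193.75, 197.38), (194.50, 197.33)),
    "4x1": ((520.45, 511.52), (484.42, 478.44)),
    "4x4": ((750.21, 788.91), (744.38, 788.80)),
}


def percent_diff(baseline, measured):
    """Difference of measured relative to baseline, in percent."""
    return 100.0 * (measured - baseline) / baseline


def compare(results):
    """Return {config: (write_diff_pct, read_diff_pct)}, rounded to 0.1%."""
    return {
        config: tuple(round(percent_diff(b, m), 1) for b, m in zip(sanbox, shadow))
        for config, (sanbox, shadow) in results.items()
    }


if __name__ == "__main__":
    for config, (write_diff, read_diff) in compare(IN_CACHE).items():
        print(f"{config}: write {write_diff:+.1f}%, read {read_diff:+.1f}%")
```

For these rows the 4x1 configuration shows the largest gap (about 6.9% lower on writes and 6.5% lower on reads for the Shadow 14000), while the 1x4 and 4x4 rows differ by less than 1%.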
3.2 Traffic Management with Policies
Objective: QoS – Quality of Service with Policies. The Shadow 14000
features hybrid Time Division Multiplex (TDM) managed bandwidth
scheduling to deliver dynamic bandwidth allocation, bandwidth on
demand, and tunable network performance.
Observations:
Traffic management with policies works as expected. It was very easy
to change the bandwidth allocation for any initiator-target pair.
The following table shows the 1x4 PIORAW results (in MB/s) using one
2 Gb/s storage device under different policy settings: the baseline
(the default), max bandwidth set to 50 MB/s, and max bandwidth set to
100 MB/s.
| Policy | Write | Read |
| Baseline (2Gb/s) | 191.21 | 194.82 |
| Max BW=50MB/s | 56.19 | 56.13 |
| Max BW=100MB/s | 112.25 | 112.27 |
| Initiator-target pair | Max BW |
| guscn01 - lun1 | default |
| guscn02 - lun2 | 150 MB/s |
| guscn03 - lun3 | 100 MB/s |
| guscn04 - lun4 | 50 MB/s |
The above policy defines how the max bandwidth is allocated for each
initiator-target pair. The default policy was used for the
guscn01-lun1 pair, so its max bandwidth is limited only by what the
ports can do (which is 2 Gb/s). For the other three pairs, the
allocated bandwidths were reduced to 150 MB/s, 100 MB/s, and 50 MB/s.
The following table shows the lmdd results with and without the
policy.
lmdd (4x4, fsize=2560 MB, bsize=16 MB), throughput in MB/s
| Initiator-target pair | Policy (Max Bandwidth) | No Policy | With Policy |
| guscn01 - lun1 | default | 199.56 | 197.71 |
| guscn02 - lun2 | 150 MB/s | 197.75 | 168.38 |
| guscn03 - lun3 | 100 MB/s | 196.83 | 112.89 |
| guscn04 - lun4 | 50 MB/s | 197.29 | 56.39 |
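The relationship between the configured caps and the measured rates can be checked with a few lines of arithmetic. The sketch below pairs each capped initiator-target pair with its configured limit (150, 100, and 50 MB/s, as described in the policy above) and the lmdd throughput measured with the policy applied; the names `POLICY_RESULTS`, `overshoot_percent`, and `summarize` are illustrative.

```python
# (pair, configured cap in MB/s, measured lmdd throughput in MB/s with policy)
POLICY_RESULTS = [
    ("guscn02-lun2", 150.0, 168.38),
    ("guscn03-lun3", 100.0, 112.89),
    ("guscn04-lun4", 50.0, 56.39),
]


def overshoot_percent(cap, measured):
    """How far the measured rate sits above the configured cap, in percent."""
    return 100.0 * (measured / cap - 1.0)


def summarize(results):
    """Return {pair: overshoot in percent, rounded to 0.1}."""
    return {
        pair: round(overshoot_percent(cap, measured), 1)
        for pair, cap, measured in results
    }


if __name__ == "__main__":
    for pair, pct in summarize(POLICY_RESULTS).items():
        print(f"{pair}: {pct:+.1f}% relative to the configured cap")
```

In all three cases the measured rate scales with the cap and sits roughly 12% to 13% above the configured value, the same offset seen in the 50 MB/s and 100 MB/s PIORAW runs. The evaluation does not explain this offset.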
3.3 Switch Interoperability: E-port connectivity and ISL Trunking
Objective: Interoperability test between Shadow 14000 and Qlogic
SANbox2-64
Observations:
The results are not conclusive. Initially we were able to set up 8
connections between the Shadow 14000 and the SANbox2-64 and to
transfer data between initiators and targets. However, after a
switch reboot to clear a configuration mistake, we were not able to
get the E-ports working correctly again before we ran out of time.
The following table shows the baseline performance obtained using
the SANbox2-64 switch only and the performance of the same tests
obtained when the two switches were connected via 8 E-ports.
YottaYotta GSX 2400, 8 LUNs (fsize=32MB, bsize=1MB), throughput in MB/s
| Test Configuration (NxP) | Qlogic SANbox2-64 Write | Qlogic SANbox2-64 Read | E-port SANbox2-64 - Shadow 14000 Write | E-port SANbox2-64 - Shadow 14000 Read |
| 1x1 | 122.40 | 133.35 | 116.51 | 129.45 |
| 2x1 | 239.48 | 266.07 | 153.72 | 170.66 |
| 3x1 | 358.71 | 393.36 | 170.58 | 298.81 |
| 4x1 | 477.81 | 524.60 | 186.17 | 350.52 |
| 5x1 | 567.21 | 648.94 | 192.20 | 493.88 |
| 6x1 | 636.48 | 811.88 | 193.44 | 534.42 |
| 7x1 | 745.81 | 937.49 | 193.69 | 660.58 |
| 8x1 | 857.49 | 1089.67 | 194.38 | 702.14 |
From the results, it seems that when E-ports were used to connect
the two switches, write performance was limited to roughly what a
single 2 Gb/s FC port can deliver (about 194 MB/s), while read
performance (up to 702 MB/s) was well beyond what a single 2 Gb/s FC
port could do. This set of numbers is very puzzling.
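One way to quantify the asymmetry is to express each measured rate as a number of 2 Gb/s links. The sketch below assumes a nominal payload rate of 200 MB/s per link and direction; that figure is an assumption for illustration, consistent with the roughly 195 MB/s single-port rates measured elsewhere in this report. The function and variable names are hypothetical.

```python
import math

LINK_MBPS = 200.0  # assumed nominal payload rate of one 2 Gb/s FC link, per direction

# hosts -> (write, read) in MB/s through the E-port connection, from the table above
EPORT = {
    1: (116.51, 129.45),
    2: (153.72, 170.66),
    3: (170.58, 298.81),
    4: (186.17, 350.52),
    5: (192.20, 493.88),
    6: (193.44, 534.42),
    7: (193.69, 660.58),
    8: (194.38, 702.14),
}


def min_links_needed(mbps, link_mbps=LINK_MBPS):
    """Smallest number of links that could carry the given aggregate rate."""
    return math.ceil(mbps / link_mbps)


if __name__ == "__main__":
    for hosts, (write, read) in EPORT.items():
        print(f"{hosts}x1: write needs >= {min_links_needed(write)} link(s), "
              f"read needs >= {min_links_needed(read)} link(s)")
```

Under this assumption, the 8x1 write rate (194.38 MB/s) fits within a single link, while the 8x1 read rate (702.14 MB/s) would require at least four links carrying traffic at the same time.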
We were told that when multiple ISL links are used, only one link
will be active between two switches. This seemed to be the case when
we connected the two switches together the second time. However, the
first time we connected the two switches together via 8 ports, all 8
ports seemed to be working fine (as shown by the previous table).
Fail-over between multiple E-ports worked correctly: when the
primary ISL path was disabled manually, the traffic was
automatically re-routed to a second path (after some period of
time).
Trunking between Shadow 14000 and
SANbox2-64 did not seem to work. We were not able to get it to work
in our environment before we ran out of time.
Shadow 14000 supports both hard zoning and soft zoning. We tested
only soft zoning. Zoning on the Shadow 14000 seemed to work as
expected. However, defining zones requires switching between
multiple windows. A user has to define a set of zone names first in
a different window before switching back to the main window to
select the zone members.
We did not experience any performance
difference when zoning was used.
Shadow 14000 supports only one zone set, which is very limiting.
Once the Shadow 14000 switch was connected to a Qlogic SANbox2-64
switch, it seemed that the Qlogic SANbox2-64 switch took over
control of the zoning settings. We were not able to change the
zoning configuration on the Shadow 14000 switch, even after we
disabled the connected E-ports.
3.4 RAS - Reliability, Availability, and Serviceability
Objective: Switch Availability
Observations:
With multiple paths available between an initiator and a target, we
disabled an active path during an active I/O test to simulate a port
failure. The connection failed over to the second path after some
period of time, and the test was able to continue to the end.
However, we did not measure the time it took for the connection to
fail over to the second path, nor did we verify whether there were
any I/O packet drops during the fail-over.
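The fail-over time was not measured here, but one simple way to estimate it in a future test would be to record a timestamp after each completed I/O and look for the longest gap between consecutive completions. The sketch below is a hypothetical helper that assumes such per-I/O timestamps are available; neither PIORAW nor lmdd is known from this report to log them.

```python
def longest_stall(completion_times):
    """Return (gap_seconds, index) for the largest gap between consecutive
    I/O completions, where index is the position of the I/O before the gap.

    completion_times: ascending timestamps in seconds, one per completed I/O.
    Returns (0.0, None) when fewer than two timestamps are given.
    """
    if len(completion_times) < 2:
        return 0.0, None
    gaps = [
        (later - earlier, i)
        for i, (earlier, later) in enumerate(
            zip(completion_times, completion_times[1:])
        )
    ]
    return max(gaps)


if __name__ == "__main__":
    # Illustrative timestamps only: steady I/O every 0.5 s with one long pause.
    times = [0.0, 0.5, 1.0, 6.0, 6.5]
    gap, index = longest_stall(times)
    print(f"longest stall: {gap:.1f} s after I/O #{index}")
```

The longest gap is an upper bound on the fail-over time, since it also includes the normal service time of one I/O. Detecting dropped or corrupted I/O would need a separate data-verification pass.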
During the evaluation period, we rebooted the Shadow 14000 switch to
fix a configuration mistake. However, the switch did not come back
up after the reboot. There seemed to be a conflict between the two
HASMs (HA Switch Modules), both of which wanted to be the primary at
boot time. As a result, the switch was not able to come back online.
After talking to support, we physically removed one of the HASMs
from the chassis, and the switch was able to start up without any
problem. The second HASM was inserted into the chassis after the
switch was up; the switch stayed up and the second HASM became the
standby HASM.
3.5 Switch Administration
Objective: Perform basic system administration functions using
ShadowView
Observations:
The cable management is really nice.
The ShadowView interface is very intuitive. However, it is not easy
to see and manage individual ports when there are many of them.
ShadowView does not provide a visual mapping between the ports (port
IDs) and their physical locations, so it is very difficult to figure
out which port belongs to which host or storage device.
Initially, ShadowView did not work with the Netscape version
installed on our testbed. We upgraded to Mozilla 1.0.2, but
ShadowView works only with a browser named Netscape. To work around
the problem, we created a netscape symbolic link pointing to mozilla
so that ShadowView would accept it.
When the ShadowView web interface came up, the login window was
positioned below the bottom of the screen. We had to scroll the
window down in order to enter the name and the password.
The switch ‘host name’ should
probably be included in the ‘System View’ screen also.
‘Name Server Table’ does not show
the symbolic node names.
4. Summary
Based on a connection-oriented architecture, the Shadow 14000 offers
network visibility and control for the storage network backbone. The
Shadow 14000 features hybrid Time Division Multiplex (TDM) managed
bandwidth scheduling to deliver dynamic bandwidth allocation,
bandwidth on demand, and tunable network performance. The switch
supports up to 144 any-to-any, non-blocking 2 Gb/s network
interfaces and, with system-level redundancy and hot-swappable
components, maintains a high degree of availability and fault
tolerance. We believe products like the Shadow 14000 switch will be
a key technology in GUPFS to satisfy the QoS requirements.
|