IBM SP Parallel Scaling - Perspective
In the abstract, scaling is about translations of size or extent. Here's a view from 30,000 feet of the extent in time and level of parallelism of jobs
that are run on seaborg.
In the following graphs the horizontal axis is time (labeled by month and year) and the colored rectangles are parallel jobs, each of which has a defined start and
stop time (its horizontal position and width) and a definite level of parallelism, depicted here as the box's height (# nodes). The boxes are stacked vertically as compactly as the packing allows without overlapping. The color of each box corresponds to some property of the job itself, e.g., the user who ran the job, how much memory was used, or the wait time.
Although the display could use more annotation, it shows reasonably well the changes in job mix, the overall
level of parallelism in use, and changes in
the extent (scale) of the machine itself (see December 2002).
Have fun thinking about this. If you find it useful, or have questions or ideas
about how to improve it, feel free to let me know.
-David Skinner (dskinner@nersc.gov)
hint: use your browser's horizontal scrollbar to navigate the data
Jobs by User
Jobs by Memory Usage per Task (blue = less memory, red = more memory)
Jobs by Wait Time (blue = shorter wait, red = longer wait)
Questions
- What can one learn from this?
- Some trends in job size are notable with regard to scaling. When seaborg
doubled in size, the number of jobs increased correspondingly; over time
the number of jobs then decreased as the parallelism of individual jobs increased. The
wait time for high-concurrency jobs can also be seen to change as
queueing policies changed. Recurring temporal usage patterns can be
identified (see the weekly change in wait time during 08/02-10/02).
Other trends and usage patterns can be eyeballed.
Being able to look at all or most of the data in a sensible way is meant to
give an abstract overview from which quantitative metrics involving reductions
of the data might be considered.
- What does the overall height or space between the boxes mean?
- Neither has a direct meaning, since both depend on the
algorithm used to do the packing, the job mix, and the number of jobs. The goal is not to provide machine utilization data but rather to depict the extent, both temporal and parallel, of jobs on the machine. Interpreting the height as utilization is a common confusion among people new to this sort of display.
- Why are there gaps?
- If there are no jobs during some period of time, either the machine was down or the LoadLeveler data is missing.
- How was this done?
- I wrote a little C++ program, Industrious Box Packer (ibp.cc), that packs boxes in 2D. The graphical output is from PHP's PNG functions.
- Why are the narrow jobs at the bottom and the short jobs at the top?
- The algorithm is roughly to put taller (more parallel) boxes down
first and then try to fit smaller (less parallel) jobs around or in between them.
This provides a reasonably compact packing and nicely stratifies jobs between the capability and capacity realms.
- Who is Piet Mondrian?
- A neoplasticist artist who was fond of rectangles.
- Where did this idea come from?
- Hard to tell for sure, but someone mentioned it might have something to do
with the area where I grew up.
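
For readers who want to experiment with the packing idea described in the answers above (taller, more parallel boxes go down first, and smaller ones are fitted around or between them), here is a minimal sketch in Python. It is not the ibp.cc code, which is not shown on this page. The `Job` fields, the `pack` function, and the "lowest free slot" placement rule are all assumptions made for illustration; the real program may order or place boxes differently.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical record layout; the real job data format is not shown here.
    name: str
    start: float  # start time (left edge of the box)
    stop: float   # stop time (right edge of the box)
    nodes: int    # level of parallelism (height of the box)


def pack(jobs):
    """Greedy sketch: place tall boxes first, each at the lowest free offset.

    Returns a dict mapping job name to the vertical offset of its box.
    """
    placed = []   # list of (job, y) pairs already positioned
    layout = {}
    # Tallest (most parallel) boxes first; ties broken by start time.
    for job in sorted(jobs, key=lambda j: (-j.nodes, j.start)):
        # Vertical intervals already occupied during this job's time span.
        blockers = sorted(
            (y, y + other.nodes)
            for other, y in placed
            if other.start < job.stop and job.start < other.stop
        )
        y = 0
        for lo, hi in blockers:
            if y + job.nodes <= lo:
                break          # the box fits in the gap below this blocker
            y = max(y, hi)     # otherwise move above it and keep looking
        placed.append((job, y))
        layout[job.name] = y
    return layout


# Made-up jobs, purely for illustration.
jobs = [
    Job("c", start=2, stop=6, nodes=8),
    Job("a", start=0, stop=4, nodes=16),
    Job("wide", start=0, stop=10, nodes=64),
    Job("b", start=5, stop=9, nodes=16),
]
layout = pack(jobs)
print(layout)
```

In this toy run the 64-node job lands at the bottom and the 8-node job ends up on top, which mirrors the stratification described above. The total stack height depends on the placement rule and the job mix, so it should not be read as machine utilization.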