HPSS Mass Storage
The High Performance Storage System (HPSS) is a modern, flexible, performance-oriented mass storage system. It has been used at NERSC for archival storage since 1998.
At NERSC, the data in storage doubles almost every year. As of August 16, 2007, we have over 3.5 petabytes of data stored in about 61 million files. HPSS sustains an average transfer rate of more than 100 MB/s, 24 hours a day, with peaks of 450 MB/s.
NERSC has two HPSS systems:
- archive
  - Used for storing user files.
  - Default system used when no argument is given on the HSI or PFTP command on NERSC production machines.
  - Better disk cache performance.
  - Generally delivers higher transfer rates and lower response times.
- hpss ("regent")
  - Used for computer system backups.
  - Initial releases of new storage software are deployed here first.
  - Original NERSC HPSS production system.
Some characteristics of the NERSC HPSS systems:
- Theoretical capacity: 22 Petabytes.
- Buffer (disk) cache: 100 Terabytes.
- Theoretical maximum throughput: 3.2 GB/sec.
Users can access NERSC's HPSS machines through a variety of clients such as hsi, htar, ftp, pftp, and grid clients.
HPSS Accounts
All NERSC users have an HPSS account for each active username on the computational systems. If you have problems accessing your HPSS account, contact the NERSC support office at 1-800-66-NERSC, menu option 2, or 510-486-8612.
Each HPSS account has a storage allocation. You are charged Storage Resource Units (SRUs) for HPSS usage. SRU charges are determined by a formula that takes into account (1) file space used, (2) the number of individual files, and (3) the amount of data transferred to and from HPSS. See HPSS Charging for more information. SRU account balances are available in the NIM web interface.
On May 19, 2003, HPSS quota restrictions went into effect. This means that if a user is out of Storage Resource Units in all their HPSS repositories, that user will be restricted. They will no longer be able to write data to HPSS (although they will continue to be able to read data).
Users can check their HPSS SRU balances by logging into the NERSC Information Management System and looking at the resource "HPSS" in their account usage summary. See also What happens if a repo or user SRU balance is negative?
NERSC HPSS Charging
NERSC uses Storage Resource Units (SRUs) to help manage HPSS storage. The goal is to provide a balanced computing environment with appropriate amounts of storage and adequate bandwidth to keep the compute engines fed with data. Performance and usage tracking allows NERSC to anticipate demand and maintain a responsive storage environment. Storage management also recognizes storage as a distinct resource, in support of an increasing amount of data intensive computing. Finally, storage management and the quota system are intended to encourage efficient usage by the user community.
Since late March 2003, SRUs have been managed by the NERSC Information Management (NIM) system. Prior to that time, SRUs were managed outside of NIM.
Since May 19, 2003, HPSS restrictions have been in effect. This means that if a user is out of Storage Resource Units in all their HPSS repositories, that user will be restricted so that they can no longer write data to HPSS (although they will continue to be able to read data). See: What happens if a repo or user SRU balance is negative?
Users can check their HPSS SRU balances by logging into the NERSC Information Management System and looking at the resource "HPSS" in their account usage summary.
An SRU Calculator is available for estimating SRU usage.
- Calculating a User's Storage Resource Units
- Apportioning User SRUs to Repositories: Project Percents
- User Quotas or Allowed Percents
- User Statuses for HPSS
- What happens if a repo or user SRU balance is negative?
- SRU usage reports
Calculating a User's Storage Resource Units
Effective with the SRU integration with NIM, two changes were made to the SRU calculation, which resulted in lower SRU charges (lower by about 13.9 percent on average).
- A gigabyte is now considered to be 1024^3 bytes rather than the one billion bytes used previously (1,073,741,824 rather than 1,000,000,000).
- SRUs are computed daily rather than monthly.
Three measures of use are included in computing SRUs:
- Number of files stored (files)
- GB of space used in the archive (space)
- GB of I/O transferred (I/O).
The formula used to compute the number of SRUs incurred by a user each day is:
daily user SRUs = 0.0000393 x files + 0.0131147 x space (GB) + 4.0 x I/O (GB)
Yearly usage is the sum of daily usage; the yearly formula is:
yearly user SRUs = 0.01436 x Avg files
+ 4.787 x Avg space (GB)
+ 4.0 x I/O (GB)
For an explanation on how the formula was derived see SRU Formula Coefficients.
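As a worked illustration of the daily formula above, the following minimal Python sketch computes SRUs for a made-up usage pattern. The coefficients are copied from the text; the function names and the sample user are hypothetical and are not part of NIM.

```python
# Coefficients copied from the daily formula in the text.
FILES_COEFF = 0.0000393   # SRUs per file per day
SPACE_COEFF = 0.0131147   # SRUs per GB stored per day
IO_COEFF = 4.0            # SRUs per GB transferred


def daily_srus(files, space_gb, io_gb):
    """SRUs incurred in one day for the given file count, space and I/O."""
    return FILES_COEFF * files + SPACE_COEFF * space_gb + IO_COEFF * io_gb


def yearly_srus(daily_usage):
    """Sum daily SRUs over an iterable of (files, space_gb, io_gb) tuples."""
    return sum(daily_srus(f, s, io) for f, s, io in daily_usage)


# Hypothetical user: 10,000 files and 500 GB held all year, 2 GB moved per day.
one_day = daily_srus(files=10_000, space_gb=500, io_gb=2)
one_year = yearly_srus([(10_000, 500, 2)] * 365)
print(f"daily: {one_day:.3f} SRUs, yearly: {one_year:.1f} SRUs")
```

For constant usage, summing 365 daily values agrees with the yearly formula up to the rounding of its coefficients.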
Apportioning User SRUs to Repositories: Project Percents
DOE's Office of Science awards Storage Resource Units to each NERSC project every year. The SRUs are deposited into the project's HPSS group account; this group account is called the HPSS repository (or repo). Users charge their HPSS SRU usage to the HPSS repos of which they are members.
If a login name belongs to only one HPSS repo all of its usage is charged to that repo. If a login name belongs to multiple repos its daily charge is apportioned among the repos using the project percents for that login name. Default project percents are assigned based on the size of each repo's storage allocation. The user (only the user, not the project managers) can change her or his project percents by selecting Change SRU Proj Pct from the Actions pull-down list in NIM's Main Menu. Users should try to set project percents to reflect their actual use of HPSS for each of the projects of which they are a member.
View "Change Project Percentages"
Note that this is quite different from the way that computational resources are charged.
- On each computational system each job is charged to a specific repository. This is possible because the batch system has accounting hooks that handle charging to project accounts. The HPSS system has no notion of project accounting, only of user accounting. Instead, the user must say "after the fact" how to distribute her or his HPSS usage charges among the HPSS repos he or she is a member of. For a given project the MPP repository and the HPSS repository have the same name.
- On the PDSF users have access to the resources based on the share their experiment contributes to the PDSF. To see the HPSS repo your PDSF experiment uses you can enter the experiment name in the Repository box in NIM's Main Menu and hit Go. In the resulting project / repo display area click on the User Status by Repo tab to see the HPSS repo name. In some cases multiple PDSF experiments (called PDSF repos in NIM) share the same HPSS repo.
If a user changes her or his project percents this change will apply to all days in the month the change is made, but not to days prior to the month in which the change is made.
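The apportioning of a daily charge by project percents, described earlier in this section, can be sketched as follows. This is only an illustration of the arithmetic; the function name and the sample charge are assumptions, not NIM code.

```python
def apportion_charge(daily_charge, project_percents):
    """Split one day's SRU charge among repos according to project percents.

    project_percents maps repo name -> percent; the percents must total 100.
    """
    total = sum(project_percents.values())
    if abs(total - 100) > 1e-9:
        raise ValueError(f"project percents sum to {total}, expected 100")
    return {repo: daily_charge * pct / 100
            for repo, pct in project_percents.items()}


# Hypothetical login charged 15 SRUs for the day, with a 40/60 split.
shares = apportion_charge(15.0, {"r1": 40, "r2": 60})
print(shares)
```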
If a login name is added to a new repo or removed from an existing repo
If a login name is added to a new repo or removed from an existing repo the project percents for that user are adjusted based on the size of the SRU allocations of the repos the login name currently belongs to. However, if the user has previously changed the default project percents the relative ratios of these previously set project percents are respected.
For example: say that user u1 belongs to repos r1 and r2 and has changed the default project percents from 50% for each repo to 40% for r1 and to 60% for r2:
| Login | Repo | Repo Allocation | Proj% |
|---|---|---|---|
| u1 | r1 | 50,000 SRUs | 40 |
| u1 | r2 | 50,000 SRUs | 60 |
Now assume that u1 becomes a new member of repo r3 which has a storage allocation of 100,000 SRUs. The project percents will be adjusted as follows (to preserve the old ratio of 40:60 between r1 and r2 while adding r3 which has the same SRU allocation as r1+r2):
| Login | Repo | Repo Allocation | Proj% |
|---|---|---|---|
| u1 | r1 | 50,000 SRUs | 20 |
| u1 | r2 | 50,000 SRUs | 30 |
| u1 | r3 | 100,000 SRUs | 50 |
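The adjustment in this example can be reproduced with a short Python sketch. The rule used here (the new repo receives a share proportional to its allocation, and the old percents are scaled into the remainder) is inferred from the single example above; NIM's actual algorithm is not documented here, and the function name is hypothetical.

```python
def add_repo(percents, allocations, new_repo, new_allocation):
    """Recompute project percents after a login joins new_repo.

    percents:    current project percents for the existing repos
    allocations: SRU allocations of the existing repos
    """
    total_alloc = sum(allocations.values()) + new_allocation
    # Assumed rule: the new repo's share follows its part of the total allocation.
    new_share = 100 * new_allocation / total_alloc
    # Scale the old percents into what is left, keeping their ratios.
    scale = (100 - new_share) / sum(percents.values())
    adjusted = {repo: pct * scale for repo, pct in percents.items()}
    adjusted[new_repo] = new_share
    return adjusted


result = add_repo({"r1": 40, "r2": 60},
                  {"r1": 50_000, "r2": 50_000},
                  "r3", 100_000)
print(result)
```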
If SRUs are added to or taken from an HPSS repo
If SRUs are added to or taken from an HPSS repo the project percents for the users in that repo are adjusted as needed to reflect the new sizes of each repo's storage allocation unless the user has changed the project percents from their default values (in this case the project percents are not changed).
For example: say that user u2 belongs to repos r1 and r2 and has not changed the default project percents. Repo r2 gets a new infusion of SRUs:
| Login | Repo | Old Repo Alloc | Old Proj% | New Repo Alloc | New Proj% |
|---|---|---|---|---|---|
| u2 | r1 | 50,000 SRUs | 50 | 50,000 SRUs | 25 |
| u2 | r2 | 50,000 SRUs | 50 | 150,000 SRUs | 75 |
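The default percents in this example are consistent with a simple proportional rule, sketched below in Python. That defaults are exactly proportional to allocation size is an assumption based on the text and this example; the function name is hypothetical.

```python
def default_percents(allocations):
    """Default project percents, proportional to each repo's SRU allocation."""
    total = sum(allocations.values())
    return {repo: 100 * alloc / total for repo, alloc in allocations.items()}


before = default_percents({"r1": 50_000, "r2": 50_000})
after = default_percents({"r1": 50_000, "r2": 150_000})
print(before, after)
```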
User Quotas or Allowed Percents
Principal Investigators, PI Proxies and Project Managers can assign Allowed Percents (or user quotas) to each user in their repo. These allowed percents have been operational for MPP and PVP repos for a long time, but have only recently become available for HPSS (with the integration of SRUs in NIM).
The default Allowed Percent is 100% for each user; Project Managers can change these as appropriate.
A user's HPSS allowed and used percentages as well as SRU balances are shown in NIM's Account Usage display:
- % Used: the percentage of the repo's HPSS SRU allocation that the login name has used
- % Allowed: the percentage of the repo's HPSS allocation that the login name is authorized to use (also known as the "user quota")
- Balance: the user's SRU balance for this repo. The login name's balance is computed by subtracting the login's usage within that repo from its "Allowed Percentage" of that repo. If the balance of the repository as a whole is less than the login's computed balance, then the lesser number (the repo's balance) is used instead. This user balance is shared with the other repo members.
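The Balance rule above can be sketched in a few lines of Python. The sketch assumes that the repo's own balance is its allocation minus the total usage of all members; the function name and the sample figures are hypothetical.

```python
def user_balance(repo_allocation, repo_used, allowed_percent, user_used):
    """User SRU balance in one repo, capped by the repo's remaining balance."""
    # The login's share of the allocation, minus what the login has used.
    own_balance = repo_allocation * allowed_percent / 100 - user_used
    # Assumption: repo balance = allocation minus usage by all members.
    repo_balance = repo_allocation - repo_used
    return min(own_balance, repo_balance)


# Repo of 50,000 SRUs with 45,000 already used by all members together.
capped = user_balance(50_000, 45_000, allowed_percent=20, user_used=2_000)
uncapped = user_balance(50_000, 45_000, allowed_percent=10, user_used=4_000)
print(capped, uncapped)
```

In the first call the login's own balance (8,000) exceeds what is left in the repo, so the repo's balance is reported; in the second the login's own balance is the smaller figure.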
View Account Usage Summary
User Statuses for HPSS
Within NIM the term User Status is used to display two sorts of statuses:
- Repository user statuses: In the Account Usage Summary area, NIM displays the User Status for each (login, repo) pair. The Repository User Status (for the login name in that repo) is one of:
  - Active: The user is a member of the repo and has a positive user balance in that repo.
  - Restricted: The user is a member of the repo but has a negative user balance in that repo.
  - Limited: The user is no longer a member of the repo but still has limited access to its resources.
  - Deleted: The user has been removed from this repo.
  - Admin Member: The user is an administrative member (PI, PI Proxy or Project Manager) of the project who doesn't use this resource.
HPSS Repo User Statuses are also displayed in NIM's project / repo display area under the HPSS Usage, User %s tab.
- Machine user statuses (for computational resources or HPSS): In the login info area, under the Logins by Host tab, NIM displays the status for each login name on each machine the user has access to. These Machine User Statuses have the following meaning for HPSS:
  - Active: The login name is in a normal active state with no restrictions. The login name can read and write data.
  - Restricted: The login name is restricted because it has no repo to charge to. The login name can read data from HPSS but cannot write to HPSS.
  - Disabled: The login name has been temporarily disabled.
  - Limited: The login name is restricted because the user is no longer a member of any active repository. The login name can read data from HPSS but cannot write to HPSS. On HPSS a login name remains limited for about one year prior to being deactivated.
  - Deactivated: The login name has been disabled and its HPSS files can be archived to the "crypt". The login name no longer has access to HPSS.
  - Crypt: The login name's files have been moved to the crypt and may be deleted at any time. This has not yet been implemented.
  - Deleted: The login name has been removed from HPSS.
What happens if a repo or user SRU balance is negative?
Accounting information is sent from HPSS to NIM once daily (in the early morning, Pacific Time). At this time actions are taken if a repo or user SRU balance is negative.
If a repo runs out of SRUs all login names associated with it are marked as restricted for that repository (see repository user statuses).
Login names are "HPSS restricted" if all of the repos associated with this login name are restricted (see machine user statuses). HPSS restricted login names are able to read data from HPSS but cannot write any data to HPSS.
Likewise, when a login name goes over its individual "allowed percent" in a given repo, that (login, repo) pair is marked as restricted. The login name is HPSS restricted only if the (login, repo) repository user status is restricted for each repo associated with this login name.
HPSS repos that are negative continue to incur SRU charges every day for each member that has HPSS files or I/O activity. This is because there is a daily charge for files stored within HPSS and for I/O activity. Note that restricted users can still incur I/O charges by reading files. Also, project percents are not adjusted when a repo goes negative. See Calculating a User's SRUs and Apportioning User SRUs to Repos.
Likewise, a user who has gone over her or his allowed percent in a given repo will continue to incur charges in that repo. Project percents are not automatically adjusted when a login name exceeds its allowed percent, although a Project Manager can ask the user to adjust them.
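The restriction rule described in this section (a login is HPSS restricted only when it is restricted in every one of its repos) can be sketched as follows. The function names are hypothetical, and treating a balance of exactly zero as Active is an assumption, since the text only defines positive and negative balances.

```python
def repo_user_status(user_balance):
    """Repository user status for one (login, repo) pair."""
    # Assumption: a zero balance counts as Active; the text leaves this open.
    return "Restricted" if user_balance < 0 else "Active"


def is_hpss_restricted(balances_by_repo):
    """True if the login is restricted in every repo it belongs to."""
    if not balances_by_repo:
        raise ValueError("this sketch covers only logins with at least one repo")
    return all(repo_user_status(balance) == "Restricted"
               for balance in balances_by_repo.values())


# Hypothetical login: over quota in r1 but still positive in r2.
print(is_hpss_restricted({"r1": -10, "r2": 5}))
# Negative in every repo, so writes to HPSS would be blocked.
print(is_hpss_restricted({"r1": -10, "r2": -1}))
```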
SRU Usage Reports
The following SRU Usage Reports are available in NIM:
- Search Daily HPSS User Usage:
Use this query to see the actual number of files and gigabytes a user (login name) has stored in HPSS on a daily basis, I/O transactions to and from HPSS, as well as the amounts these three usage factors (files, space and I/O) contribute to the user's SRU charge.
From the Search & Reports pull-down menu in NIM's Main Menu frame select Use: Daily User HPSS. You will see the following query:
View "Search Daily HPSS Usage"
Note that:
- HPSS usage data is stored by "Begin Date" and "Last Date"; in every usage record the number of files and Gbytes stored remains constant from Begin Date to End Date. You can think of End Date as an approximation of the Job Date used for MPP usage queries.
- By default the user's organization is not displayed in the report (it is displayed if you deselect the Hide? box).
- Search Monthly HPSS User Usage:
Use this query to see the average number of files and gigabytes a user (login name) has stored in HPSS each month, total I/O transactions to and from HPSS by month, as well as the amounts these three usage factors (files, space and I/O) contribute to the user's SRU charge.
- Search Yearly User/Repo Usage:
Use this query to search yearly or year-to-date (for the current fiscal year) usage information. For FY 2003 and later HPSS and MPP usage is available; for FY 2002 HPSS, MPP, and PVP usage is available; for FY 2001 MPP and PVP usage is available. Project Managers can use this query to find users who are within a certain percentage of their user quota (or allowed percent).
From the Search & Reports pull-down menu in NIM's Main Menu frame select Use: Yearly Usr/Repo. You will see the following query (for an HPSS report set the Resource Type to HPSS):
- Search Year to Date Repository Usage:
Use this query to get a summary usage report of all NERSC repositories. From the Search & Reports pull-down menu in NIM's Main Menu frame select Use: Yearly Repo. For an HPSS report set the Resource Type to HPSS. For more information see Year to Date Repository Usage Query and Year to Date Repository Usage Report.
HPSS Passwords
HPSS uses the Distributed Computing Environment (DCE) for user authentication. DCE accounts are currently handled separately from other NERSC accounts.
Your DCE username, which is the same as your NERSC username, is known as your "DCE Principal."
New or Forgotten Passwords
If you are getting a new HPSS account or have forgotten your HPSS/DCE password, contact NERSC Support at 1-800-66-NERSC, option 2, or (510) 486-8612 to have your password reset.
The password you receive from NERSC Account Support is temporary and must be changed before you access HPSS. Follow the procedure described below for changing your password.
Changing Your HPSS Password
You change your HPSS/DCE password by using ssh to connect to the NERSC Authentication Server, "auth.nersc.gov", as shown in the example below.
You will need to know a special login/password pair to log onto the authentication server, "auth.nersc.gov". This information can be obtained by logging onto any NERSC machine and typing the command:
% module help WWW
Note that this special login/password pair is only for initial access to the authentication server and is not to be confused with your HPSS/DCE username and password. To change your password do:
% ssh auth.nersc.gov -l {special login}
Enter Password: {special password}
[auth]: chpass
DCE Principal: your_HPSS_user_name
Enter Password: your_HPSS/DCE_password
New Password: new_HPSS/DCE_passwd
Re-enter new Password: new_HPSS/DCE_passwd
% exit
Note: The above example is similar, but not identical to the procedure used to generate the encrypted identity combo strings needed for FTP access to HPSS; to learn more about that matter, go to the PFTP/FTP Authentication web page.
For more general info on DCE technology, see the DCE web page.
Accessing HPSS
Once you have successfully set up your account and password, you can access HPSS using the HSI and HTAR utilities, NERSC's PFTP utility, clients that use the FTP protocol, and grid-enabled clients.
HPSS cannot be accessed via SSH.
Access from NERSC platforms
HSI, HTAR, and PFTP are available on NERSC platforms. The default HPSS system is archive and need not be specified on the command line. The older regent HPSS system should be specified as "hpss" when using the HSI utility at NERSC. Specifying the full DNS names archive.nersc.gov or hpss.nersc.gov will result in network traffic using slower external networks. HPSS can be accessed interactively and used in batch scripts.
The first time you log into HPSS with HSI, you are prompted for your "DCE Principal" (NERSC username) and password, and an .hsipw file is created in your HOME directory. The next time you access HPSS, HSI will use the information contained in this file to authenticate you to the system, and you will not be prompted for a username/password pair. Occasionally the .hsipw file may become corrupted. If you have trouble connecting to HPSS, try removing this file and reauthenticating.
Access from outside NERSC
The NERSC production HPSS system is known as "archive.nersc.gov". "hpss.nersc.gov" is the "regent" system that is used for system backups, although it does contain some older user data.
archive.nersc.gov can be reached using the hsi and htar utilities and ftp clients. The HSI utility is available for download and use by NERSC users. FTP clients have to use a special process for encrypting username/password pairs.
Accessing HPSS - HSI
HSI is a flexible and powerful interface utility to HPSS. The HSI commands are similar to those in ftp and pftp (e.g., put and mput) and UNIX (e.g., mv, mkdir, rm, cp, cd). HSI also has commands similar to those in CFS. HSI can be used both interactively and in batch scripts.
A related utility, HTAR, is useful for archiving multiple files to HPSS without using the intermediate local file storage that would be needed if one first used the tar utility followed by HSI.
For complete documentation on HSI, see the HSI documentation. There are man pages for HSI on production NERSC computers.
Authentication
NERSC's HPSS systems require DCE authentication. Your DCE Principal (login name) is the same as your NERSC user id. Your DCE password, however, is handled differently. See the HPSS Passwords section of this document.
Your first login to HPSS with HSI will require the use of both your DCE principal (login name) and your DCE password. After this HSI will have established a local credential on the computer from which you connect. Subsequent connections from that system will be automatically authenticated. To force generation of new credentials, use the -L option, as follows:
% hsi -L
Connecting with HSI
From NERSC machines
To connect to the main user system (archive) from a NERSC machine, simply start the HSI utility:
% hsi
This is equivalent to the following:
% hsi archive
From NERSC machines do not specify archive.nersc.gov as this will force HSI to use the slower external network interface.
NERSC's other HPSS system, the original HPSS system at NERSC, is named "hpss". (It seemed like a good idea at the time, but now causes confusion. It is known internally as "regent".) This system is now used for backups, and does not offer the same capacity and performance as "archive." However, it does contain some older user data. To connect to it from within NERSC use the command:
% hsi hpss
From outside NERSC
NERSC's HPSS system can be accessed via HSI from outside the NERSC domain. The discussion above applies except the DNS (internet) names of the machines should be specified as one of:
- archive.nersc.gov (current production)
- hpss.nersc.gov (deprecated for user file storage)
Remember that hpss.nersc.gov is not the current NERSC production machine for users.
HSI binaries for a number of different platforms can be downloaded by NERSC users.
Starting and Using HSI
HSI can accept input several different ways; some examples:
| Input method | Example |
|---|---|
| From a command line: | hsi |
| Single-line execution: | hsi "mkdir foo; cd foo; put data_file" |
| From a command file: | hsi "in command_file" |
HSI can also read from standard input and write to standard output using pipes.
For "get" and "put" operations, HSI uses a special syntax to identify and separate the local and HPSS file names:
- The local file name is always on the left, and the HPSS file name is always on the right.
- A ":" (colon character) always separates the local pathname from the HPSS pathname, and the colon character must be surrounded by whitespace.
Examples:
% put local_file : hpss_file
% get local_file : hpss_file
Recursive operations are allowed for the following commands:
cget, chgrp, chmod, chown, cput
delete, get, ls, mdelete, mget, mput
put, rm, stage, touch
Wildcards are supported.
Frequently Used Commands
HSI's command set is rich, and will look familiar to users of UNIX, FTP, and other storage utilities. A small set of commands will satisfy most user storage needs.
Short List of HSI Commands by Function
A complete list of HSI commands is available in Chapter 8 of the HSI manual.
HPSS File and Directory Commands
| Command | Function |
|---|---|
| cd | Change current directory |
| get, mget | Copy one or more HPSS-resident files to local files |
| cp | Copy a file within HPSS |
| rm, mdelete | Remove one or more files from HPSS |
| ls | List a directory |
| put, mput | Copy one or more local files to HPSS |
| pwd | Print current directory |
| mv | Rename an HPSS file |
| mkdir | Create an HPSS directory |
| rmdir | Delete an HPSS directory |
Local File and Directory Commands
| Command | Function |
|---|---|
| lcd | Change local directory |
| lls | List local directory |
| lmkdir | Make a local directory |
| lpwd | Print current local directory |
| command | Issue shell command |
File Administrative Information
| Command | Function |
|---|---|
| chmod | Change permissions of file or directory |
Miscellaneous HSI commands
| Command | Function |
|---|---|
| help | Display help information |
| quit, exit, end | Terminate HSI |
| in | Read commands from a local file |
| out | Write HSI output to a local file |
| log | Write all HSI commands and responses to a local log file |
| prompt | Toggles HSI prompting for mget, mput, and mdelete |
Accessing HPSS - HTAR
While HSI is a very fast and flexible tool for dealing with large data transfers, it can be slow for transferring a large number of small files or data from a stream buffer. For these cases the HTAR utility performs well. HTAR
- creates tar files directly in HPSS, thereby avoiding the extra copying and creation of a tar file on production machine filesystems
- significantly outperforms piping data, e.g. through hsi, by computing the appropriate class of service for the aggregate size of the files to be transferred
- produces easily readable indices for finding and retrieving data archived in tar files
Connecting with HTAR
Connections with HTAR operate in much the same way as in HSI. See the HSI page for information about connecting to NERSC's HPSS systems. HTAR is available on NERSC computers, and NERSC users can download it for use on their local machines.
The target HPSS system is specified with the -Hserver option, e.g.:
% htar -Hserver=archive.nersc.gov -tvf blah.tar
Using HTAR
For details, see the HTAR man page.
HTAR operates much the same as the unix tar command but with the tarfile archive residing in HPSS storage. Archive creation "-c" puts data into HPSS and archive extraction brings data to your local machine.
Basic Syntax:
The core syntax for HTAR is analogous to unix tar:
htar -{c|K|t|x|X} -f tarfile [directories],[files]
As in the unix tar command the "-c" "-x" and "-t" options respectively function to create, extract, and list tar files. The "-K" option verifies an existing tarfile in HPSS.
One useful feature of HTAR is the creation of index (".idx") files. These files provide a means to find the location of files prior to retrieval of a tarfile. The "-X" option causes htar to build an index file for a specified standard tar file, so it can subsequently be used with htar.
Frequently Used HTAR Invocations
A small set of cases will satisfy most user storage needs.
| Command | Function |
|---|---|
| htar -cvf dirs.tar directory1 directory2 | Create dirs.tar in HPSS containing directory1 and directory2; provide verbose listing of actions while processing. |
| htar -cf files.tar file1 file2 | Create files.tar in HPSS containing file1 and file2. |
| htar -tvf files.tar | List contents of files.tar in HPSS. |
| htar -xvf files.tar | Extract contents of files.tar in HPSS. |
| htar -Xf files.tar | Build index file "files.tar.idx" for tar file files.tar. |
The HTAR man page has more detailed information on this utility.
Accessing HPSS - ftp/pftp
Files can be transferred to and from HPSS via the standard internet protocol ftp and the HPSS pftp utility. There is no sftp (secure ftp) or scp access.
As standard ftp clients only support authentication via the transmission of unencrypted passwords, which NERSC does not permit, special procedures must be used with ftp and pftp. The procedures are described below. The NERSC HPSS ftp daemons also support kerberos ftp clients.
PFTP
PFTP is a variant of ftp which is available on NERSC systems. It is better than ftp for large file transfers (> 100 MB). PFTP has the advantage of being compatible with NERSC "sleepers," which will gracefully suspend connections when HPSS is down or unavailable.
pftp/ftp Authentication
NERSC has developed an ftp access method that does not send your username/password pair over the network in plain text. Your plain text username and password will not work when you use ftp to connect to HPSS.
To be able to use ftp you must generate two text strings which contain information about your account in encrypted form. These strings are then used as your ftp "username" and "password." Each encrypted pair also contains information about the specific subnet from which they were generated. Additional encrypted pairs must be generated for each subnet from which you want to use pftp/ftp to connect to HPSS.
Encrypting your password
In the example that follows, the user's workstation is named "highline".
Step 1
You need to log on to the authentication server, "auth.nersc.gov", to encrypt your username/password. If you don't know the special login/password pair to log on to this server, the information can be obtained by logging into any NERSC system and typing the command:
module help WWW
Note that this special login/password pair is only for initial access to the authentication server and is not to be confused with your DCE/HPSS login and password that you will be encrypting.
Step 2
In a window (xterm) on your workstation, connect via ssh to the NERSC authentication server, "auth.nersc.gov".
highline 10: ssh auth.nersc.gov -l {special login}
auth@mover2.nersc.gov's password: {special password}
<Login notice info removed>
You are in an authentication shell
Type help to list the commands you can run
[auth]:
Now you are in a restricted shell that will accept only a few commands. Among them is "ftppass", which will be used in step 3. You can see the allowed commands via the "help" command:
[auth]: help
The following commands are the only ones recognized:
ftppass ftpproxy chpass help h quit q exit
For abbreviated help on commands type 'help commandname'
The commands: q, quit and exit will all exit auth
[auth]:
Step 3
Use the "ftppass" command to generate an encrypted_string combo of your HPSS username and password; these will be used to access pftp/ftp instead of your usual HPSS login id and password.
[auth]: ftppass
DCE Principal: your_HPSS_username
DCE Password: your_HPSS_password
login [encrypted_string]
password [encrypted_string]
[auth]: exit
Bye
Connection to auth.nersc.gov closed.
The encrypted_strings are those returned in the lines beginning with "login" and "password". These are to be used as your "login" and "password" when connecting to HPSS via ftp.
Proxy Servers
If you are behind a firewall and make pftp/ftp connections through a proxy server you can use the ftpproxy command to connect to auth.nersc.gov from one network and generate keys for another network.
The syntax for a proxy server with address 123.45.56.78 is
[auth]: ftpproxy 123.45.56.78
Replace the IP address above with that of your IP proxy server.
Automatic authentication using a .netrc file
On UNIX hosts you may place your encrypted strings in a .netrc file that resides in your HOME directory. This is a text file with sets of three-line entries, one for each system you wish to access, of the following form:
- The first line specifies the name of the storage system;
- The next two lines are the "login" and "password" lines returned by auth.nersc.gov.
For example
machine archive.nersc.gov
login [encrypted_string]
password [encrypted_string]
Multiple pftp/ftp hosts can be put in the .netrc file, separated by blank lines.
Make sure the UNIX permissions for the ".netrc" file are "600" ("Owner Read-Write"); if they are anything else, the file will not be used by pftp/ftp and the process will not work.
When you have stored your encrypted_strings in your .netrc file, you will not need to type in your username/password combination to gain pftp/ftp access to HPSS.
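The .netrc layout and the permission requirement above can be illustrated with a small Python sketch. The helper names are hypothetical, the demonstration writes to a temporary directory rather than to a real HOME directory, and "[encrypted_string]" stands for the strings returned by ftppass.

```python
import os
import stat
import tempfile


def write_netrc_entry(path, machine, login, password):
    """Append a three-line .netrc entry and make the file owner read-write only."""
    entry = f"machine {machine}\nlogin {login}\npassword {password}\n"
    # Create the file with mode 600 if it does not exist yet.
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
    needs_separator = os.fstat(fd).st_size > 0
    with os.fdopen(fd, "a") as handle:
        if needs_separator:
            handle.write("\n")  # blank line between entries
        handle.write(entry)
    # Tighten permissions in case the file already existed with a looser mode.
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)


def has_safe_permissions(path):
    """True if the file mode is exactly 600."""
    return stat.S_IMODE(os.stat(path).st_mode) == 0o600


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, ".netrc")
    write_netrc_entry(demo_path, "archive.nersc.gov",
                      "[encrypted_string]", "[encrypted_string]")
    print(has_safe_permissions(demo_path))
```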
Using HPSS from Batch Jobs
Once you are set up for automatic authentication (see the sections on HSI, HTAR, and ftp/pftp) you can access HPSS from within batch scripts.
HSI will accept one-line commands on the HSI command line, e.g.:
hsi put filename
HSI, ftp, and pftp read from Standard Input (STDIN) and a list of commands can be placed in a text file (script) and redirected into the given utility, e.g.:
ftp < file_with_ftp_commands
"Here" Documents
Another method uses what are called "Here Documents," in which the commands are embedded in the batch script rather than in a separate file external to the main script. The start of a "here-doc" block in a script is signalled by double angle brackets (<<) followed by an identifying tag. Lines up to the line containing the tag are treated as if they had been typed at the command prompt.
Here is a simple script which performs an ftp file transfer:
pftp -v -i archive <<_EOS
cd my_HPSS_directory
mget data*
quit
_EOS
This example will execute the FTP commands between the "_EOS" strings.
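For workflows driven from Python rather than a shell script, the same idea (feeding a list of commands to a client on standard input) can be sketched with the standard subprocess module. The helper name is hypothetical, and the demonstration uses a stand-in child process so that it runs anywhere; on a NERSC machine the client and flags from the example above would be substituted.

```python
import subprocess
import sys


def run_with_commands(argv, commands):
    """Run argv, feeding the command lines on standard input.

    Returns the exit status of the process.
    """
    script = "\n".join(commands) + "\n"
    completed = subprocess.run(argv, input=script, text=True)
    return completed.returncode


ftp_commands = ["cd my_HPSS_directory", "mget data*", "quit"]

# On a NERSC machine the call would be (same client and flags as in the text):
#   run_with_commands(["pftp", "-v", "-i", "archive"], ftp_commands)
# Here a stand-in child counts the lines it receives and exits with that count.
stand_in = [sys.executable, "-c",
            "import sys; sys.exit(len(sys.stdin.readlines()))"]
print(run_with_commands(stand_in, ftp_commands))
```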
Project Directories
A special project directory can be created in HPSS for groups of researchers who wish to easily share files. The files in this directory will be readable by all members of a repository.
Project directories for group sharing of files will be made available on request. These directories will have the following properties:
- reside at /nersc/projects on the "archive" HPSS system;
- be owned by the PI or designated other;
- have a suitable group attribute (new group if required).
Periodically, all objects in a project directory will be changed to be owned by the owner of that project directory, i.e. the PI or designated other. In terms of accounting, this means that a file may initially be charged to the creator of the file, but eventually will be charged to the owner of the project directory. This solves the potential ownership and accounting problems which may arise as individual researchers leave projects.
To request creation of a Project Directory, the PI or Repository Manager of the requesting repository should send email to consult@nersc.gov with the following information:
- the repository for which it will be set up
- the names and user IDs (login IDs) of the users who will have access to the directory, if access should be limited to a subset of the repository's users (by default, all users of the repository have access)
The NERSC Consultants will notify the requester when the directory is ready for use.
Accessing HPSS - Sleepers
Sleepers are only available using the HSI or PFTP clients from NERSC production machines.
When scheduled maintenance or unexpected events necessitate taking HPSS down, "sleepers" are enabled. This causes all jobs attempting to use HPSS to wait. Usually this causes no problems for these jobs, which resume safely when sleepers are removed. However, users may wish to test for HPSS system availability, and take alternate actions based on this, so a way to detect sleepers is available.
Testing for sleepers can be accomplished by using the "hpss_avail" utility, which is available on all NERSC supercomputers. This utility takes a single argument, which may be "archive", "hpss", or "help"; case is not significant. Any other argument, or none, will result in usage text being returned. The "help" argument will result in more detailed help text. The utility returns its result as its exit status, which the shell makes available in a predefined variable: "$status" in the C shell, "$?" in the Korn shell and other Bourne-type shells. As the examples below show, a value of 0 means the system is up and available. The value may be tested, used in a subsequent shell command, or output. It persists only until the next shell command is executed, at which point it is overwritten by the result of that command. Here are two examples of querying a system and printing a message based on the returned status value. The first uses the C shell and the second the Korn shell.
#!/bin/csh
hpss_avail archive; set READY=$status
if ($READY == 0) then
echo "ARCHIVE up and available"
else
echo "ARCHIVE is unavailable"
endif
#!/usr/bin/ksh
hpss_avail archive
READY=$?
if [ $READY -eq 0 ]; then
echo "ARCHIVE up and available"
else
echo "ARCHIVE is unavailable"
fi
Possible alternative actions to take when sleepers are enabled might include (1) moving files to alternate file systems, such as $HOME; or (2) changing file names to prevent overwriting or name collisions by subsequent file creations.
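As a rough sketch of alternative (2), the function below stores a file when "archive" is available and otherwise renames it locally so that a later run cannot overwrite it. The function name save_or_hold and the ".held.<timestamp>" suffix are assumptions made for this illustration; only hpss_avail and the one-line hsi form come from this page.

```shell
#!/bin/sh
# save_or_hold FILE
# Store FILE in the "archive" system when it is available; otherwise rename
# FILE locally so that a later run cannot overwrite it, and print the new name.
# The function name and the ".held.<timestamp>" suffix are illustrative.
save_or_hold() {
    file=$1
    if hpss_avail archive; then
        hsi archive "put $file"
    else
        held="$file.held.$(date +%Y%m%d%H%M%S)"
        mv "$file" "$held"
        echo "$held"
    fi
}
```

A batch script could call "save_or_hold outfile" in place of a bare hsi command and store the held files in a later job.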
Accessing HPSS - Usage Advice and Examples
This section advises and demonstrates some useful techniques for using HSI and ftp/pftp, including their use in batch scripts.
- Some Advice on Efficient Use of HPSS
- A Complete Batch Script Using PFTP
- A Complete Batch Script Using HSI
Some Advice on Efficient Use of HPSS
Accessing HPSS in Batch Jobs
Each HPSS read request can involve a storage library mounting a tape, which may
take an arbitrary amount of time, depending on how many requests that library is
currently servicing. Doing HPSS reads in a batch job can stall the entire ensemble of
processors dedicated to the job. A better strategy is to read any files needed by
batch jobs in advance of the job's execution; NERSC provides a special batch job
class, named "xfer", for HPSS file transfers. Files in user
$SCRATCH space on NERSC supercomputers will likely persist there for
several weeks, so pre-reading can be done in advance of submitting the batch job that
will use them. Writing files into HPSS generally takes less time than reading them,
since they are written to the HPSS disk cache first and transferred to tape later.
Accessing HPSS in a Single Session
Each invocation of HSI or ftp/pftp constitutes a
separate "session" on the NERSC servers, and each session involves startup and
shutdown overhead. It is more efficient to perform multiple operations in a single
session than to use multiple sessions that each perform a single operation. This
means it is inefficient to call either of these utilities in a scripted loop; it's
better to generate a list of files in a loop, and use that for a set of commands
in a single session. Command files can be used with HSI via the in
command, as documented in the
HSI User Guide.
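A minimal sketch of this approach follows: the loop only builds a command file, and HSI is then started once to process it. The function name build_put_cmds and the file name used in the usage note are hypothetical.

```shell
#!/bin/sh
# build_put_cmds CMDFILE NAME...
# Write one "put NAME" line per file into CMDFILE, replacing any previous
# contents. The function name is illustrative.
build_put_cmds() {
    cmdfile=$1
    shift
    : > "$cmdfile"    # start with an empty command file
    for f in "$@"; do
        printf 'put %s\n' "$f" >> "$cmdfile"
    done
}
```

For example, "build_put_cmds puts.txt restart*" would produce a file that a single session could then process, presumably with something like hsi "in puts.txt"; check the HSI User Guide for the exact form of the in command.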
Ordering Multiple-File Reads
Files and directories located logically close together within HPSS may reside on
different tapes, so multiple-file read commands can incur multiple tape-mount delays.
A useful technique for reading many files in a single session is to first use the
HSI command "ls -P" to produce a list of the required
directories and/or files, and direct the command's output into a file. Sort that
output file on the last two fields in the output lines, i.e., tape position, and
tape identifier, respectively. Perform the sorts to group the file lines by tape ID,
and in ascending positional order for each tape. Edit the file to remove extraneous
lines and fields, and perform the get operations on the desired files
in their sorted ordering. The resulting command input file can be used with the HSI
in command in a single session, and will minimize tape delays and
decrease overall access time.
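The sorting step might be scripted along the following lines. This is a sketch under stated assumptions: it takes the last two whitespace-separated fields of each line to be the tape position and tape identifier, as described above, and it further assumes that file lines begin with the word FILE followed by the path, and that paths contain no spaces. Check these assumptions against real "ls -P" output before relying on it.

```shell
#!/bin/sh
# sort_by_tape LISTING
# Read saved "ls -P" output from LISTING and print "get <path>" lines grouped
# by tape ID, in ascending position order within each tape.
# Assumed line layout: FILE <path> ... <position> <tape-id>
sort_by_tape() {
    awk '$1 == "FILE" { print $NF, $(NF-1), $2 }' "$1" |
        sort -k1,1 -k2,2n |
        awk '{ print "get " $3 }'
}
```

Redirecting the function's output into a file gives a command file that can be passed to the HSI in command.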
Files can be aggregated into collections with the HTAR utility, allowing more efficient access to members of the collection. HTAR writes tar-like archive files directly into HPSS, with a companion index file for each archive. This allows subsequent reads of any subset of an htar archive's contents with only a single tape mount. File sets that were written unaggregated can be re-written with htar after being read. The cost of this rewriting is the extra storage resources used, since the original files are not removed.
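As an illustration, the two wrappers below show the tar-style options HTAR accepts for creating an archive and for extracting selected members. The function names are hypothetical, and the options should be confirmed against the HTAR documentation for the installed version.

```shell
#!/bin/sh
# bundle_dir ARCHIVE DIR
# Write DIR into HPSS as a single htar archive named ARCHIVE.
bundle_dir() {
    htar -cvf "$1" "$2"
}

# fetch_members ARCHIVE MEMBER...
# Extract only the named members from ARCHIVE.
fetch_members() {
    archive=$1
    shift
    htar -xvf "$archive" "$@"
}
```

For example, "bundle_dir run42.tar run42" followed later by "fetch_members run42.tar run42/restart.0001" would retrieve one file without reading the rest of the archive; the names here are made up.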
Example 1. Complete Batch Script Using PFTP
This example shows a batch script with pftp actions in it. In this more complex example, we show the use of both single and multiple-file movement commands, as well as directory change commands. Here, also, we show the "+" character used to bracket a "here document." This example also assumes that you have a ".netrc" file in your home directory with the appropriate encrypted password combination.
#!/bin/csh
# First, retrieve the input data and program source from HPSS.
pftp -i -v archive <<+
cd my_HPSS_directory
mget data*
get source.f
quit
+
./myprog data outfile
# Save the output file in HPSS.
pftp -i -v archive <<+
cd my_HPSS_directory
put outfile
mput restart*
quit
+
exit
Example 2. Complete Batch Script Using HSI
This example shows a script containing HSI actions. It uses HSI commands to accomplish the same actions pftp performs in Example 1, above. Note that in this case, a single-line command is used, so no "here-doc" is needed. This simplifies the script, and demonstrates some of HSI's advantages over pftp or ftp. This script assumes that you have previously logged into HSI interactively at least once to encrypt your username/password.
#!/bin/csh
# First, retrieve the data and program source
# from HPSS.
hsi archive "cd my_HPSS_directory; get data* source.f"
./myprog data outfile
# Save the output file in HPSS.
hsi archive "cd my_HPSS_directory; put outfile; \
put restart*"
exit
Note that in the above, the individual hsi commands are separated by semicolons (;) and the set of commands is enclosed in quotes ("). The semicolons are necessary, and are currently the only allowed command separator. The quotes are required to prevent shell interpretation of wild card characters, and are recommended for general safety in one-liners. Note that suppressing shell interpretation also means the shell itself will not expand wild-card file and directory specifications in one-liners.
Unlike an interactive HSI session, no termination command (e.g. exit, quit, etc.) is needed in a one-liner.
In addition to one-line commands, HSI can also take input command sets from files. For more information on this see the HSI Documentation.
For New HPSS Users
If you are a new user of NERSC's HPSS system for storing your data, there are a few things you need to know. The questions and answers below will guide you through the process of starting to use HPSS.
How do I get an HPSS account?
If you have never used HPSS, call the NERSC Account Support staff at 1-800-66-NERSC, menu option 2, or 1-510-486-8612 to set up an account. Your HPSS user name will be one of your existing NERSC computer user names (if you have one). You will be given a temporary DCE password which you must change before you can successfully use HPSS.
How do I change my initial temporary password?
You need to log into our authentication server and use the chpass command.
How do I access HPSS?
You can access NERSC's HPSS systems from NERSC production platforms and from any machine that supports HSI/pftp/ftp/Grid.
Why don't my username and password work with ftp or pftp?
For security reasons, NERSC no longer supports using clear text names and passwords for pftp or ftp. You can find out how to encrypt your password in pftp/ftp Authentication.
Page last modified: Thu, 23 Jun 2005 22:33:18 GMT
Page URL: http://www.nersc.gov/nusers/systems/hpss/print.php
Web contact: webmaster@nersc.gov
Computing questions: consult@nersc.gov