National Energy Research Scientific Computing Center
  A DOE Office of Science User Facility
  at Lawrence Berkeley National Laboratory
 
htar Command

Purpose
-------
Manipulates HPSS-resident tar-format archives.


Why Use htar?
-------------

htar has been optimized for creation of archive files directly in HPSS, 
without having to go through the intermediate step of first creating 
the archive file on local disk storage, and then copying the archive 
file to HPSS via some other process such as ftp or hsi. The program 
uses multiple threads and a sophisticated buffering scheme in order to 
package member files into in-memory buffers, while making use of the 
high-speed network striping capabilities of HPSS.

In most cases, it will be significantly faster to use htar to create a 
tar file in HPSS than to either create a local tar file and then copy 
it to HPSS, or to use tar piped into ftp (or hsi) to create the tar 
file directly in HPSS.

In addition, htar creates a separate index file, which contains the 
names and locations of all of the member files in the archive (tar) 
file. Individual files and directories in the archive can be randomly 
retrieved without having to read through the archive file. Because the 
index file is usually smaller than the archive file, it is possible that 
the index file may reside in HPSS disk cache even though the archive 
file has been moved offline to tape. Since htar uses the index file for 
listing operations, it may be possible to list the contents of the 
archive file without having to incur the time delays of reading the 
archive file back onto disk cache from tape.

It is also possible to create an index file for a tar file that was not 
originally created by htar or to recreate an index that has been 
unintentionally deleted.

Syntax
------

htar  -{c|t|x|K|X} -f archive_file [-?] [-B] [-E] [-L inputlist]
       [-H opt[:opt...]] [-h] [-M maxfiles] [-m] [-o] 
       [-d debuglevel] [-p] [-v] [-V] [-w] 
       [-I {IndexFile | .suffix}] [-Y [archive_file COS ID][:Index File COS ID]]
       [-S Bufsize] [-T Max Threads] [Filespec | Directory ...]

The htar command manipulates HPSS-resident archives, or archives that reside
on a remote system (subject to the restrictions noted below), by writing files to,
or retrieving files from HPSS.

Htar files written to HPSS are in the POSIX 1003.1 "tar" format, and may be retrieved
from HPSS and read by native "tar" programs.  

The local files used by the htar command are represented by the Filespec 
parameter. If the Filespec parameter refers to a directory, then that directory, 
and, recursively, all files and directories within it, are referenced as well.

Unlike the standard Unix "tar" command, there is no default archive device; 
the "-f archive_file" flag is required.

"Archive" and "Member" files
-----------------------------

Throughout the htar documentation, the term "archive file" is used to refer
to the tar-format file, which is named by the "-f archive_file" command 
line option. The term "member file" is used to refer to individual files 
contained within the archive file.

HTAR Index File
----------------
As part of the process of creating an archive file on HPSS, htar also 
creates an index file, which is a directory of the files contained in 
the archive. The Index File includes the position of member files 
within the archive, so that files and/or directories can be randomly
retrieved from the archive without having to read through it sequentially.
The index file is usually significantly smaller in size than the
archive file, and may often reside in HPSS disk cache even though 
the archive file resides on tape. All htar operations make use 
of an index file.  

It is also possible to create an index file for an archive file that was 
not created by htar, by using the "Build Index" [-X] function (see below).

By default, the index filename is created by adding ".idx" as a suffix
to the Archive name specified by the -f parameter.  A different 
suffix or index filename may be specified by the "-I" option, as
described below.

By default, the Index File is assumed to reside in the same directory
as the Archive File.  This can be changed by specifying a relative
or absolute pathname via the -I option. The Index file's relative pathname 
is relative to the Archive File directory unless an absolute pathname
is specified. 

Use of Absolute Pathnames
-------------------------
Although htar does not restrict the use of absolute pathnames (pathnames 
that begin with a leading "/") when the archive is created, it will remove 
the leading / when files are extracted from the archive. All extracted 
files use pathnames that are relative to the current working directory.

However, when using the "verify" action (-K), absolute pathnames are used unless
the -Hrelpaths ("relative paths") option is specified (see below).
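
The leading-slash rule for extraction can be illustrated with a short Python
sketch. The function name extraction_target is hypothetical, and the sketch
assumes that every leading slash is dropped; it only models the path mapping
and does not call htar.

```python
import posixpath


def extraction_target(member_path, cwd):
    """Model where an extracted member lands relative to the working directory."""
    # Assumption: all leading slashes are removed, so "/usr/tmp/a" becomes "usr/tmp/a".
    relative = member_path.lstrip("/")
    return posixpath.join(cwd, relative)
```

Under these assumptions, a member archived as /usr/tmp/data.txt and extracted
from /home/user/work would be written below /home/user/work/usr/tmp.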

HTAR Consistency File
---------------------
HTAR writes an extra file as the last member file of each Archive,
with a name similar to:

        /usr/tmp/HTAR_CF_CHK_64474_982644481

This file is used to verify the consistency of the Archive File and
the Index File.  Unless the file is explicitly specified, HTAR does not 
extract this file from the Archive when the -x action is selected.
The file is listed, however, when the -t action is selected.

Tar File Restrictions
-----------------------
When specifying path names longer than 100 characters for a file in
POSIX 1003.1 USTAR format, remember that the path name is composed of
a prefix buffer, a / (slash), and a name buffer.

The prefix buffer can be a maximum of 155 bytes and the name buffer
can hold a maximum of 100 bytes. Since some implementations of TAR
require the prefix and name buffers to terminate with a null ('\0')
character, htar enforces the restriction that the effective prefix
buffer length is 154 characters (+ trailing zero byte), and the
name buffer length is 99 bytes (+ trailing zero byte). If the path 
name cannot be split into these two parts by a slash, it cannot be 
archived. This limitation is due to the structure of the tar archive 
headers, and must be maintained for compliance with standards and 
backwards compatibility. In addition, the length of a destination for 
a hard or symbolic link (the 'link name') cannot exceed 100 bytes
(99 characters + zero-byte terminator).
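
The splitting rule above can be checked before archiving with a small Python
sketch. The function name split_ustar_path and the choice to measure lengths
in UTF-8 bytes are assumptions made for illustration; this is not htar's own
code, only a model of the 154/99 limits described in this section.

```python
PREFIX_MAX = 154  # 155-byte prefix buffer minus the trailing zero byte
NAME_MAX = 99     # 100-byte name buffer minus the trailing zero byte


def split_ustar_path(path):
    """Return (prefix, name) if the path fits the limits, otherwise None."""
    if len(path.encode("utf-8")) <= NAME_MAX:
        # Short paths fit entirely in the name buffer.
        return ("", path)
    # Try each slash as the split point; the slash itself is stored in neither buffer.
    for i, ch in enumerate(path):
        if ch != "/":
            continue
        prefix, name = path[:i], path[i + 1:]
        if not prefix or not name:
            continue
        if (len(prefix.encode("utf-8")) <= PREFIX_MAX
                and len(name.encode("utf-8")) <= NAME_MAX):
            return (prefix, name)
    return None
```

A result of None means that no slash divides the path into two parts that
both fit, which is the case in which the file cannot be archived.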

HPSS Default Directories
------------------------

The default directory for the Archive file is the user's HPSS home directory.
An absolute or relative HPSS path can optionally be specified for either
the Archive file or the Index file. By default, the Index file is created
in the same HPSS directory as the Archive file.

Local Temporary Directory
--------------------------

HTAR makes use of the TMPDIR environment variable when creating
temporary files.  If TMPDIR is not set in the environment, then 
"/tmp" is used.

HTAR Command Options
---------------------
Htar arguments can be categorized into three types: file specification
arguments, action arguments and optional arguments.  The file 
specification arguments determine the name of the archive file to
be created in HPSS and the member files to operate on.  The action
flag specifies the operation to be performed by the htar command.  The
optional flags modify htar behavior.

Archive File and File Specification Arguments
---------------------------------------------

-f archive_file   Uses "archive_file" as the name of archive to be read or
written. Note: This is a required parameter for htar, unlike the standard
tar utility, which uses a built-in default name.

Filespec           A file specification has one of the following forms:
                   WildcardPath
                       or
                   Pathname 
                       or
                   Filename  

"WildcardPath" is a path specification that includes standard filename
pattern-matching characters, as specified for the shell that is
being used to invoke htar.  The pattern-matching characters are
expanded by the shell and passed to htar as command line arguments.

Note that using wildcard characters for the -t and -x actions may
not work as expected, because the shell expands the pattern against
existing local files rather than against the member files in the
archive. For example, 

    htar -xf someFile.tar a*
	
will only extract files beginning with "a" in "someFile.tar" 
that also already exist in the current local working directory.
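
One possible workaround is to match the pattern against the member names
themselves and then pass the matching names to htar explicitly. The Python
sketch below assumes the member names have already been obtained (for example
from a -t listing; the listing format is not described here, so parsing it is
left out). The function name select_members is hypothetical.

```python
import fnmatch


def select_members(member_names, pattern):
    """Return the member names that match a shell-style pattern."""
    # Note: unlike shell globbing, fnmatch lets "*" match "/" as well.
    return [name for name in member_names if fnmatch.fnmatchcase(name, pattern)]
```

The selected names can then be given as Filespec arguments to the -x action,
quoted so that the shell does not expand them again.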


Action Flags
-------------
Exactly one of the following action flags must be specified: 

-c      Creates a new HPSS-resident archive file, and writes the local 
files specified by one or more Filespec parameters into the archive. 
Warning: any pre-existing archive file will be overwritten without
prompting. This behavior mimics that of the AIX tar utility.

-t      Lists the member files in the order in which they appear in the HPSS-
resident archive file.   Listable output is written to standard output;
all other output is written to standard error.

-x      Extracts the member files specified by file specification arguments.
If the file specification arguments refer to a directory, the htar command 
recursively extracts that directory and all of its subdirectories from the
archive file. If the Filespec parameter is not specified, htar extracts all of
the files from the archive file. If an archive contains multiple copies of
the same file, the last copy extracted overwrites all previously extracted
copies. If the file being extracted does not already exist on the system, it 
is created. If you have the proper permissions, then the htar command restores 
all files and directories with the same owner and group IDs as they have 
in the HPSS tar file. If you do not have the proper permissions, then 
files and directories are restored with your owner and group IDs. 

-K      Verifies the contents of the archive file, based upon the verification
level options given by the -y parameter, and any applicable -Hverify or
-Hrelpaths options.

-X      Builds a new index file by reading the entire tar file. This operation
is used either to reconstruct an index for tar files whose Index File is
unavailable (e.g., accidentally deleted), or for tar files that were not
originally created by htar. 


Optional Flags
--------------

-?      Displays htar's verbose help.

-B      Displays block numbers as part of the listing (-t option). 
This is normally used only for debugging.

-d debuglevel   Sets debug level (0 - 5) for htar. 0 disables debug output;
1 - 5 enable progressively higher levels of debug output. 5 is the
highest level; anything > 5 is silently mapped to 5.

-E      If present, specifies that a local file should be used for the
file specified by the "-f archive_file" option.  If not specified, then the
archive file will reside in HPSS.  

-H opt[:opt...]  Specifies HPSS-specific options.  Multiple "-H" parameters
may be specified, and multiple colon-separated options may be specified for each 
-H.  Options may be either standalone keywords, or may be of the form "opt=value".
The option string must not contain whitespace characters.  

    Opt may be any of the following:

    nostage  - specifies that HTAR should try to read the archive file directly
        from tape for read operations such as -x (extract), rather than having HPSS
        potentially stage the entire file onto disk cache when it is opened.

        This option can be useful when only a small number of files are being 
        extracted from a large archive. However, misuse of this option can cause 
        HPSS tape drive resource contention, and should normally be used only
        after coordinating with the site's HPSS administrators.

    crc - specifies that HTAR should generate CRC checksums when creating the 
        archive. For extract (-x), specifying this option will cause checksums to
        be regenerated and verified for files that were added to the archive with 
        checksums enabled. For build index (-X), this option will cause the archive 
        to be read, and a checksum to be added to the index. For list (-t) operations, 
        this option will cause the checksum to be listed following the object
        permissions.

    nocrc - specifies that HTAR should not generate CRC checksums 
        when writing to the archive (-c or -X) or regenerate and compare CRCs
        (-x).

    rmlocal  - specifies that HTAR should attempt to remove local files on a
        creation run (-c) after the archive is created and any post-transfer 
        verification has completed without errors.  Only local files that were
        successfully copied to the archive will be removed.  Local directories
        are not affected by this option, only files and symbolic links.
    
    verify=option[,option...] - specifies one or more verification options that
        should be performed following successful creation of the archive (-c),
        or for the "verify" (-K) command.  Multiple options can be specified
        by separating them with a comma, with no whitespace. Options are processed
        from left to right, and, in the case of conflicting options, the last one
        encountered is used without comment.

        Options are as follows:
        
        info        - compares tar header info with the corresponding values in the
                      index
        crc/nocrc   - enables/disables CRC checking of archive files for which a CRC
                      was generated when the file was added to the archive

        compare/nocompare - enables/disables a byte-by-byte comparison of archive member
                      files and their local file counterparts. If -Hrelpaths
                      is not specified, then absolute paths for member files
                      in the archive will also be treated as absolute local
                      paths.

        0           - enables "info" verification
        1           - enables level 0 + "crc" verification
        2           - enables level 1 + "compare" verification
        all         - enables all comparison options (currently, tar header checking,
                      CRC checking, and local file comparisons).

    relpaths  - specifies that HTAR should use relative paths instead of
        absolute paths when comparing the archive and local member files.
        This option was intended to provide a way to compare files with 
        absolute paths on an archive with member file(s) that were created
        with relative paths by a previous "extract" (-x) action.
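
The left-to-right processing of verify= options described above can be
sketched in Python as follows. The function name resolve_verify is
hypothetical, and the sketch assumes that all checks start disabled and that
the numeric levels only switch checks on; neither detail is stated in this
document.

```python
def resolve_verify(option_string):
    """Return the enabled checks for a comma-separated verify= value."""
    checks = {"info": False, "crc": False, "compare": False}
    # Options are applied in order, so a later option silently overrides an earlier one.
    for opt in option_string.split(","):
        if opt in ("info", "0"):
            checks["info"] = True
        elif opt == "1":
            checks["info"] = checks["crc"] = True
        elif opt in ("2", "all"):
            checks["info"] = checks["crc"] = checks["compare"] = True
        elif opt in ("crc", "nocrc"):
            checks["crc"] = (opt == "crc")
        elif opt in ("compare", "nocompare"):
            checks["compare"] = (opt == "compare")
        else:
            raise ValueError("unknown verify option: " + opt)
    return checks
```

For instance, under these assumptions "2,nocompare" leaves header and CRC
checking enabled but turns the byte-by-byte comparison off again.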
    

-h      Forces the htar command to follow symbolic links as if they were
normal files or directories. Normally, the htar command does not follow
symbolic links.

-I index_name   Specifies the index file name or suffix.  If the first
character of the index_name is a period, then index_name is appended
to the Archive name, e.g. "-f the_htar -I .xndx" would create an index
file called "the_htar.xndx".  If the first character is not a period, then
index_name is treated as a relative pathname for the index file (relative
to the Archive file directory) if the pathname does not start with "/",
or an absolute pathname otherwise.

The default directory for the Index file is the same as for the Archive
file.  If a relative Index file pathname is specified, then it is
appended to the directory path for the Archive file.  For example, if
the Archive file resides in HPSS in the directory "projects/prj/" 
and is called files.tar, then an Index file specification of 
"-I projects/prj/files.old.idx" would fail, because htar would look for 
the file in the directory "projects/prj/projects/prj".  The correct 
specification in this case is "-I files.old.idx".
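
The naming rules for the -I option can be summarized in a short Python
sketch. The function name resolve_index_path is hypothetical; the logic
follows the rules stated above (a leading period means a suffix, a leading
"/" means an absolute path, anything else is relative to the Archive file
directory) and does not contact HPSS.

```python
import posixpath


def resolve_index_path(archive_path, index_name=None):
    """Return the index file path implied by -f archive_path and -I index_name."""
    if index_name is None:
        # Default: ".idx" appended to the archive name.
        return archive_path + ".idx"
    if index_name.startswith("."):
        # A leading period marks a suffix that is appended to the archive name.
        return archive_path + index_name
    if index_name.startswith("/"):
        return index_name
    # Relative index paths are rooted in the archive file's directory.
    return posixpath.join(posixpath.dirname(archive_path), index_name)
```

Applied to the example above, "-I projects/prj/files.old.idx" resolves to
projects/prj/projects/prj/files.old.idx, which shows why that specification
fails.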

-L InputList    Writes the files and directories listed in the "InputList"
file to the archive. Directories named in the InputList file are
treated recursively (Note: this was not the case for earlier versions
of HTAR). Note that "home directory" notation ("~") is not expanded for 
pathnames contained in the InputList file, nor are wildcard characters, 
such as "*" and "?".

-M maxfiles      Sets the maximum number of member files that can be contained
in the archive when it is initially created. The default maximum number of
member files, and an absolute maximum number of files, are defined when HTAR is
built. No limit will be enforced if:

- The default maximum number of files was set to a negative value when HTAR
was built, and the -M option is NOT specified, or
- A value less than 0 is specified for the -M option, and the absolute maximum
number of files was also set to a negative value when HTAR was built.

If the value specified for the -M option exceeds the absolute maximum value 
that was defined when HTAR was built, HTAR will issue a warning message, and
use the absolute maximum value.
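
The interaction between the built-in limits and the -M value can be modelled
with the Python sketch below. The function and parameter names are
hypothetical. One case is not covered by the description above (a negative
-M value when the absolute maximum is non-negative); the sketch assumes the
absolute maximum applies there.

```python
def effective_max_files(default_max, absolute_max, requested=None):
    """Return the member-file limit to enforce, or None for no limit."""
    if requested is None:
        # -M not given: a negative built-in default disables the limit.
        return None if default_max < 0 else default_max
    if requested < 0:
        if absolute_max < 0:
            return None
        # Assumption: not specified in the text; fall back to the absolute maximum.
        return absolute_max
    if absolute_max >= 0 and requested > absolute_max:
        # htar issues a warning in this case and uses the absolute maximum.
        return absolute_max
    return requested
```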

-m      Uses the time of extraction as the modification time. The default
is to preserve the modification time of the files. Note that the modification
time of directories is not guaranteed to be preserved, since the operating
system may change the timestamp as the directory contents are changed 
by extracting other files and/or directories.  htar will explicitly
set the timestamp on directories that it extracts from the Archive, but
not on intermediate directories that are created during the process of
extracting files. 

-o      Provides backwards compatibility with older versions (non-AIX)
of the tar command. When this flag is used for reading, it causes
the extracted file to take on the User and Group ID (UID and GID)
of the user running the program, rather than those on the archive.
This is the default behavior for the ordinary user. If htar is being
run as root, use of this option causes files to be owned by root 
rather than the original user.

-p      Restores files to their original modes, ignoring the present
umask. The setuid, setgid, and sticky bit permissions are also restored
if the user has root user authority.

-S bufsize      Specifies the buffer size to use when reading or writing
the HPSS tar file.  The buffer size can be specified as a value, or
as kilobytes by appending any of  "k","K","kb", or "KB" to the value.
It can also be specified as megabytes by appending any of  "m" or "M"
or "mb" or "MB" to the value, for example, 23mb.
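
The suffix rules for -S can be illustrated with a small Python parser. The
function name parse_bufsize is hypothetical, and two details are assumptions
that this document does not state: a plain value is taken to be in bytes, and
kilobytes and megabytes are taken to be multiples of 1024.

```python
import re

KILO_SUFFIXES = {"k", "K", "kb", "KB"}
MEGA_SUFFIXES = {"m", "M", "mb", "MB"}


def parse_bufsize(text):
    """Convert a -S style value such as "23mb" into a byte count."""
    match = re.fullmatch(r"(\d+)([A-Za-z]*)", text)
    if match is None:
        raise ValueError("not a buffer size: " + text)
    value, suffix = int(match.group(1)), match.group(2)
    if suffix == "":
        return value                  # assumption: plain values are bytes
    if suffix in KILO_SUFFIXES:
        return value * 1024           # assumption: 1 KB = 1024 bytes
    if suffix in MEGA_SUFFIXES:
        return value * 1024 * 1024    # assumption: 1 MB = 1024 * 1024 bytes
    raise ValueError("unsupported suffix: " + suffix)
```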

-T Max Threads      Specifies the maximum number of threads to use when 
copying local member files to the Archive file.  The default is defined 
when htar is built; the release value is 15.  The maximum number of threads
actually used is dependent upon the local file sizes, and the size of the
I/O buffers.  A good approximation is usually 

   buffer size/average file size

If the -v or -V option is specified, then the maximum number of local file 
threads used while writing the Archive file to HPSS is displayed when the 
transfer is complete.
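
The approximation above (buffer size divided by average file size) can be
worked through in Python. The function name estimate_threads is hypothetical,
and clamping the result between 1 and the -T maximum is an assumption added
to keep the estimate meaningful; htar's actual scheduling may differ.

```python
def estimate_threads(buffer_size, file_sizes, max_threads=15):
    """Estimate how many local file threads a transfer is likely to use."""
    if not file_sizes:
        return 1
    average = sum(file_sizes) / len(file_sizes)
    if average <= 0:
        # Only empty files: the ratio is undefined, so assume the maximum.
        return max_threads
    estimate = int(buffer_size // average)
    # Assumption: at least one thread, and never more than the -T maximum.
    return max(1, min(max_threads, estimate))
```

With an 8 MB buffer and files averaging 2 MB, the estimate is 4 threads; many
small files push the estimate up to the configured maximum.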

-V      "Slightly verbose" mode. If selected, file transfer progress will
be displayed in interactive mode. This option should normally not be selected 
if verbose (-v) mode is enabled, as the outputs for the two different options
are generated by separate threads, and may be intermixed on the output.

-v      "Verbose" mode. For each file processed, displays a one-character 
operation flag, and lists the name of each file. The flag values displayed 
are:
    "a"  - file was added to the archive 
    "x"  - file was extracted from the archive
    "i"  - index file entry was created (Build Index operation)

-w      Displays the action to be taken, followed by the file name, and
then waits for user confirmation. If the response is affirmative,
the action is performed. If the response is not affirmative, the file
is ignored.

-Y auto | [ArchiveCosID][:IndexCosID] Specifies the HPSS Class of Service 
ID to use when creating a new Archive and/or Index file. If the keyword
"auto" is specified, then the HPSS "hints" mechanism is used to select
the archive COS, based upon file size.  If "-Y cosID" is specified, then 
"cosID" is the numeric COS ID to be used for the Archive File.  
If "-Y :IndexCosID" is specified, then "IndexCosID" is the numeric COS ID 
to be used for the Index File. The default COS ID (or "auto") is a site-specific
option that is defined when HTAR is built. If both COS IDs are specified, the 
entire parameter must be specified as a single string with no embedded spaces, 
e.g. "-Y 40:30". This option may also be specified by the "HTAR_COS" environment 
variable. The environment variable is overridden by the -Y command line option,
if both are used.
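
The accepted forms of the -Y value, and the precedence of the command line
over HTAR_COS, can be sketched in Python. The function names parse_cos and
effective_cos are hypothetical, and the sketch assumes the environment
variable uses the same syntax as the command line option.

```python
def parse_cos(value):
    """Parse "auto", "40", ":30" or "40:30" into a small dictionary."""
    if value == "auto":
        return {"auto": True, "archive": None, "index": None}
    archive_part, _sep, index_part = value.partition(":")
    for part in (archive_part, index_part):
        # COS IDs are numeric and the string must not contain spaces.
        if part and not part.isdigit():
            raise ValueError("invalid COS specification: " + value)
    if not archive_part and not index_part:
        raise ValueError("empty COS specification")
    return {
        "auto": False,
        "archive": int(archive_part) if archive_part else None,
        "index": int(index_part) if index_part else None,
    }


def effective_cos(cli_value, environ):
    """Apply the rule that -Y overrides the HTAR_COS environment variable."""
    raw = cli_value if cli_value is not None else environ.get("HTAR_COS")
    return None if raw is None else parse_cos(raw)
```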

HTAR Memory Restrictions
-------------------------
When writing to an HPSS archive, the htar command uses a temporary file
(normally in /tmp) and maintains in memory a table of files. You'll receive 
an error message if htar cannot create the temporary file, or if there 
is not enough memory available to hold the internal tables.

Authentication
---------------
HTAR uses DCE authentication in order to grant access to HPSS.
When you execute htar, the program will look for a .hsipw file in your home
directory ($HOME) and use it if it is found. If it is not found, you will be
prompted for your DCE login and password.  If this is successful, a .hsipw
file will be created in your home directory and used for subsequent
executions of htar.


HTAR Execution Environment
---------------------------
At NERSC, HTAR is actually a wrapper script, which sets the
proper environment variables and then execs the htar executable.

HTAR makes use of the following HPSS environment variables,
if they are available:

HPSS_SERVER_HOST - contains the server hostname and optional
  port number of the HTAR server.

HPSS_HOSTNAME - contains the hostname or IP address of the
  network interface to which HPSS mover(s) should connect 
  when transferring data.  This is overridden by the file
  specified in the PFTP_CONFIG_FILENAME environment variable.
  The default interface is the one specified by the "hostname" 
  command.  Note that this is often a slow interface, such as
  the control ethernet on an IBM SP2.

HPSS_PATH_ETC - pathname of a local directory containing
  the HPSS network options file

PFTP_CONFIG_FILENAME - pathname of a file containing the list
  of HPSS network interfaces to be used 

HTAR also references the following non-HPSS environment variables:

TMPDIR - used when creating temporary files 
HOME   - used when searching for the network options file 
         (normally only used by HPSS system administrators).

Notes: 
-------
1. The maximum size of a single Member file within the Archive is 
approximately 8 GB, due to restrictions in the format of the
tar header.  HTAR does not impose any restriction on the size
of the Archive File when it is written to HPSS; however, space
quotas or other system restrictions may limit the size of the Archive 
File when it is written to a local file (-E option).

2.  HTAR will optionally write to a local file; however, it will
not write to any file type except "regular files".  In particular, it
is not suitable for writing to magnetic tape.  To write to a magnetic
tape device, use the "tar" or "cpio" utility.
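
The member size limit in note 1 follows from the tar header, whose size field
holds 11 octal digits. The Python sketch below computes that bound and scans
a local directory tree for files that exceed it. The function name
find_oversized is hypothetical, and htar's own cutoff may be slightly lower
than the header's theoretical maximum.

```python
import os

# 11 octal digits in the tar header size field: 8**11 - 1 bytes, just under 8 GiB.
MAX_MEMBER_BYTES = 8 ** 11 - 1


def find_oversized(root, limit=MAX_MEMBER_BYTES):
    """Return the regular files under root that are larger than limit bytes."""
    too_big = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # symbolic links are not measured by their target
            if os.path.getsize(path) > limit:
                too_big.append(path)
    return sorted(too_big)
```

Running such a check before "htar -c" gives a chance to split or exclude
files that would not fit as a single member.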

Exit Status
-----------

This command returns the following exit values:

0       Successful completion.

>0      An error occurred.

Examples
--------

1.      To write the file1 and file2 files to a new archive called
"files.tar" in the current HPSS home directory, enter:

htar -cf files.tar file1 file2

2.      To write the file1 and file2 files to a new archive called "files.tar" 
on a remote FTP server called "cashew.nersc.gov", creating the tar file
in the user's remote FTP home directory, enter:

htar -cf files.tar -F cashew.nersc.gov file1 file2

3.      To extract all files from the project1/src directory in the 
Archive file called proj1.tar, and use the time of extraction as the 
modification time, enter:

htar -xm -f proj1.tar project1/src

4.      To display the names of the files in the out.tar archive file
within the HPSS home directory, enter:

htar -vtf out.tar

Files
-----

/usr/local/bin/htar       Specifies the name of the htar wrapper script.

/usr/local/bin/htar.exe   Contains the htar executable.

/tmp/tar*       Specifies a temporary file.


Related Information
-------------------

For file archivers: the cat command, dd command, pax command.
For HPSS file transfer programs: pftp, hsi


Bugs and Limitations:
---------------------
- There is no way to specify relative Index file pathnames that are
  not rooted in the Archive file directory without specifying an
  absolute path.

- HTAR does not provide the ability to append, update or remove files. 


Page last modified: Sat, 01 Mar 2008 00:57:45 GMT
Page URL: http://www.nersc.gov/nusers/resources/hpss/htarman.php
Web contact: webmaster@nersc.gov
Computing questions: consult@nersc.gov
