Read/Write UNIX TAR and CPIO Format Files
Usage: tar [-acCMtxXyh] [-#!ADFjJLNpPqrRsSTvV] [-fQwWZ-] [-#n]
[-B blksize] [-Hon] [-Hoff] [-b sex] [-bL] [-bB]
[-d dir] [-E endset] [-ff] [-I include] [-m map]
[-O offset] [ tarfile ] [ file1 file2 ... ]
tar is used to read or write a simple archive format popular
for exchanging files between dissimilar machines.
tar normally expects the archive to be in a file specified by
the tarfile operand.
When adding files, the names are in the user's normal file
name space and wildcards can be used in the normal fashion.
When listing or extracting files, the file names that follow
are considered to be case-sensitive in the name space of
what's in the archive and must match the complete path
specified there. Full wildcarding is supported. For example,
tar -x myarchive.tar ".../*.[ch]"
would cause any .c or .h files anywhere in the archive to be
extracted. (The "..." construct matches any number of
directory levels and the "[ ]" construct matches any
character in the enclosed set.) Notice that if wildcards are
used, they should be enclosed in single or double quotes so
the C shell won't try expanding them before tar sees them.
Also, if you want to specify a character that's normally a
wildcard as an ordinary character, you will need to "double-
escape" it. For example, to extract a file named
"mail[2008]", you would need to type:
tar -x myarchive.tar mail^^[2008^^]
to ensure that the escape character (even if it was inside
quotes) is actually passed through the C shell to tar.
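The matching rules described above can be sketched in a few lines of
Python. This is only an illustration of the documented behavior, not
tar's own code: the function names are hypothetical, only the "*",
"[ ]" and "..." constructs are handled, and the ^ escape character
is left out.

```python
import re

def _component_to_regex(part):
    """Translate one path component; wildcards never cross a '/'."""
    out = []
    i = 0
    while i < len(part):
        ch = part[i]
        close = part.find("]", i + 2) if ch == "[" else -1
        if ch == "*":
            out.append("[^/]*")
        elif close != -1:
            body = part[i + 1:close]
            # Keep '-' unescaped so ranges such as [a-z] still work.
            out.append("[" + "".join(c if c == "-" else re.escape(c)
                                     for c in body) + "]")
            i = close
        else:
            out.append(re.escape(ch))
        i += 1
    return "".join(out)

def matches(pattern, archive_path):
    """Case-sensitive match of a pattern against a complete archive path."""
    parts = pattern.split("/")
    pieces = []
    for index, part in enumerate(parts):
        last = index == len(parts) - 1
        if part == "...":
            # "..." stands for zero or more whole directory levels.
            pieces.append("(?:[^/]+/)*")
        else:
            pieces.append(_component_to_regex(part) + ("" if last else "/"))
    return re.fullmatch("".join(pieces), archive_path) is not None
```

With this sketch, matches(".../*.[ch]", "src/lib/foo.c") is true,
while "src/foo.cpp" and "src/FOO.C" are rejected.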
When extracting files, this version of tar incorporates logic
to interactively crunch up a filename in the archive into
something legal on an NT filesystem. If -F is specified, FAT
naming rules are enforced. Otherwise, HPFS or NTFS rules
are assumed, meaning long filenames are assumed to be legal.
Any renamings will be listed in a .map file.
When reading an archive, this version of tar automatically
detects whether it was written in CPIO or TAR format and
what bytesex was used.
tar also incorporates logic to automatically convert between
the \n line endings used in an archive and the \r\n line
endings used under NT unless the file appears to be
binary, based on its content. The environment variables
TARBINARY and TARASCII can also be used to specify sets of
files by name which are to be considered binary or ASCII,
respectively, regardless of content. Each of these variables
may contain a list of wildcards. If a filename or just the
tail of it (i.e., just the name + extension, leaving off the
preceding path) matches one of the wildcards in the list,
that file is considered to be of the specified type. If a
filename matches both lists or if it matches neither list,
the usual test based on file content will be made. Files
that receive line end conversions are highlighted in the
listings produced by tar in the ASCIICONVERT color for easy
review.
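As a rough sketch of the rules above (not tar's actual code), the
name-based decision and the line end expansion might look like this
in Python. The function names are hypothetical, the wildcard lists
are passed in already split because the separator used inside
TARBINARY and TARASCII is not stated here, and Python's fnmatch
stands in for tar's own wildcarding.

```python
import posixpath
import re
from fnmatch import fnmatchcase

def classify_by_name(name, binary_patterns, ascii_patterns):
    """Return "binary", "ascii", or None when the content test decides."""
    tail = posixpath.basename(name)   # name + extension, no leading path

    def hit(patterns):
        return any(fnmatchcase(name, p) or fnmatchcase(tail, p)
                   for p in patterns)

    is_binary, is_ascii = hit(binary_patterns), hit(ascii_patterns)
    if is_binary == is_ascii:
        # Matched both lists or neither: fall back to the content test.
        return None
    return "binary" if is_binary else "ascii"

def expand_line_ends(data):
    """Turn every \\n not already preceded by \\r into \\r\\n."""
    return re.sub(rb"(?<!\r)\n", b"\r\n", data)
```

For example, classify_by_name("src/a.obj", ["*.obj"], ["*.txt"])
gives "binary", and a name that appears in both lists gives None.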
There is no limit on the overall length of an archive except
whatever limit may be imposed by the filesystem if the archive
is written to disk. The filesize limit for individual files
within an archive is determined by the archive format: for
tar archives, the limit is 8.4 million petabytes, essentially
unlimited; for CPIO binary and new portable CPIO archives, the
limit is 4G bytes; for CPIO ASCII archives, the limit is 8G
bytes. (But when using tar for interchange with other
systems, bear in mind that those other systems may impose
their own smaller limits.)
When adding files to an archive, timestamps outside the legal
range (January 1, 1970 to February 7, 2106) for a tar archive
will be truncated to these dates.
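The end date matches what an unsigned 32-bit count of seconds since
January 1, 1970 can hold. Assuming that is where the range comes
from (an inference, not something stated here), the truncation can
be sketched in Python as follows; the names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
TAR_MIN_SECONDS = 0
TAR_MAX_SECONDS = 2**32 - 1       # assumed: unsigned 32-bit seconds

def clamp_timestamp(seconds):
    """Pull an out-of-range timestamp back to the nearest legal value."""
    return max(TAR_MIN_SECONDS, min(TAR_MAX_SECONDS, int(seconds)))

def as_utc(seconds):
    """Convert seconds since the epoch to a UTC datetime."""
    return EPOCH + timedelta(seconds=seconds)
```

Here as_utc(TAR_MAX_SECONDS) falls on February 7, 2106, and any
timestamp before 1970 clamps to 0.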
Tape Drives:
The tarfile can be the tape device, specified by its special
file name, \\.\tape0 (or \\.\tape1, \\.\tape2, etc.,
if you have more than one), or via the -# option.
When reading/writing to a tape, tar rewinds the tape when it
starts up and rewinds again and then ejects when it finishes
unless -N is specified.
Basic Commands:
-a Add files to the end of the archive. If the
archive is on a tape device, this operation may
not be possible, depending on whether your drive
supports repositioning and rewriting the last
physical block on the tape. For example, it
works with DAT drives but not with QIC drives.
If -a does not work with your drive, you'll have
to use -c instead.
-c Create a new archive, truncating any existing
archive to zero bytes before writing to it.
-C Copy entire archive segments (including headers
and any padding) to stdout. After the last
segment, write a trailer to mark the end of the
archive. (If you intend to concatenate archives,
use the -Z option to suppress writing the trailer.)
-M Just build a mapfile for renaming files in the
archive to NT conventions; don't extract
anything.
-t List the contents of the archive. This is the
default.
-x Extract files from the archive. Default is all
files in the archive.
-X Extract everything EXCEPT the specified files from
the archive.
-y Extract the specified files in the archive to
stdout.
-h Help. (This screen.)
Basic Options:
-# Use the default tape device, \\.\tape0.
-#n Use the n-th tape device, where n is a single
decimal digit. For example, -#1 means tar should
use \\.\tape1.
-! Non-interactive. Files are renamed as necessary
for NT conventions. (Particularly useful
with -M when trying to read a new, large archive
file.)
-A The Archive bit is reset for any files or direct-
ories copied to a TAR or CPIO archive file. (When
extracting files, the -A option is ignored and the
Archive bit is always set.)
-B blksize Use the specified blocksize when creating a new
archive. Default is 10240 bytes if supported
by the device. When reading or adding to an
existing archive on tape, tar tries to determine
and use whatever blocksize was used when the
archive was created. How it does that depends on
what release of Windows NT you're running and
whether your drive supports variable blocksizes.
If you're running NT 3.51 or later and variable
blocksizes are supported, this option is ignored
and the actual blocksize is determined directly
using variable blocksize support. Otherwise,
tar first tries this specified blocksize; if that
doesn't work, it tries all the possible multiples
of 512 bytes up to the maximum supported on your
machine.
-D Dim. Don't insert ANSI escape sequences into the
output to highlight anything.
-F FAT filesystem naming when extracting or building
the map file.
-Hon Hardware compression on, if supported. (Default is
to use the current setting for compression.)
-Hoff Hardware compression off.
-j New portable System V CPIO ASCII format.
-J New portable System V CPIO ASCII format with
checksum.
-L Long listing similar to ls -L showing the attri-
butes, timestamp and length of each file in the
archive.
-N No rewind or eject. If the tarfile is on a tape
device, tar normally rewinds the tape at the start
and then rewinds and ejects at the end. This
option turns that off.
-p CPIO format, using binary headers.
-P CPIO format, using ASCII headers.
-q Quiet. tar normally prints the header of each
file as it's extracted (-x) or added (-a or -c) to
the archive. This option turns that off.
-r CarriageReturn/NewLine expansion is turned off.
(Default is normally to convert any \n characters
not preceded by a \r in the archive to \r\n
combinations under NT unless the file
appears to be binary.)
-R CarriageReturn/NewLine expansion is forced ON, even
for files that appear to be binary.
-s Read the archive from stdin when listing the table
of contents or extracting. Write the archive to
stdout when adding files. (Implies non-inter-
active.)
-S Stop if a file is encountered that cannot be
extracted. Normally, a warning message is given
but processing continues.
-T Total the sizes of all selected files.
-v Verbose. Like -L, but also show the offset of each
file from the beginning of the archive and what
archive format and bytesex was used. Also turns
on warnings about line-end conversions being turned
off on binary files.
-V Don't use variable block I/O even if the drive
claims it supports it. Useful as a workaround
if your drive's firmware has a bug.
-- End of options.
Advanced Options:
-b sex Byte sex in the archive: abcd (little-endian),
badc (big-endian), cdab or dcba. Default is to
autosense bytesex in existing archives and to use
abcd for new archives.
-bL Little-Endian bytesex. (An alias for -b abcd.)
-bB Big-Endian bytesex. (An alias for -b badc.)
Note: To write an archive intended to be read
on a RISC or Motorola-based UNIX machine,
use -b badc or -bB (big-endian).
-d dir Default destination drive and directory when
extracting files.
-E endset Offset at which to stop reading the archive file.
-f Fullpath option. Put the full pathname (minus any
disk prefix) specified on the command line into the
archive header when adding. (In this context, the
full path means the full name given on the command
line, not the fully-qualified name starting from
the root directory.) When extracting, use the full
pathname given in the header to determine where the
files will go.
-ff Another variation on the fullpath option that will
put the entire pathname, even including the drive
letter into the tar archive. The resulting name
isn't really legal in a tar file, but it's useful
for doing backups of several drives at once.
-I include Files to be added to or read from the archive are
specified in the include file. If the name of
the include file is given as "-", the names
will be read from stdin. If more than one -I
include file is given, the lists of names they
hold will be concatenated, one after another.
Any files specified on the command line will be
added onto the end.
-m map Specific filename to be used for showing mappings
from names in the archive to names used on
NT. (If -M is specified, but -m isn't
used to specify a name for the mapfile, the
default is to paste a .map extension onto the name
of the tar file; if -s is specified, i.e., the tar
file doesn't have a name, no map file is used
unless -m is given.)
-O offset Offset at which to start reading the archive file.
Given in bytes from beginning of the file.
-Q Very Quiet. tar normally warns of any garbled
sections that it skipped; this turns off those
warnings also.
-w Share all files being copied to the archive for
read/write access by other processes. (Default
is to do that only with files already open by
another process.)
-W Warnings. Show just the files that can't be
extracted to NT because of their file
types.
(Shown in the FOREIGNFILES color.)
-Z Suppress writing the trailer normally written
following the last segment extracted from an
archive with the -C option. (Useful for
concatenating segments extracted from several
separate archives.)
Examples:
1. To list the contents of a tar file on tape, showing the
timestamps and sizes of the files:
tar -L \\.\tape0
2. To extract everything on the tape into the current
directory, again showing timestamps and sizes:
tar -xL \\.\tape0
3. To copy all the *.c files in the current directory to a
new tar tape, overwriting anything that may already be
on the tape, again showing timestamps and sizes:
tar -cL \\.\tape0 *.c
4. Same as (3), but write it in big-endian format, suitable
for a UNIX RISC machine:
tar -cLbB \\.\tape0 *.c
5. Same as (3), but adding files to an existing archive
on the tape rather than overwriting it:
tar -aL \\.\tape0 *.c
Note: Adding to an archive on tape isn't supported by
all types of tape drives. See the comments
regarding the -a operation above.
6. Extract everything on a tar-format floppy into the
current directory:
dskread a: | tar -xsL
7. Write all the *.c files in the current directory to a
tar-format floppy in big-endian format, verifying each
write operation along the way:
tar -csbB *.c | dskwrite -vx a:
TAR Format:
Tar files are organized as a series of 512-byte blocks.
Individual files always start on a block boundary with a
header block followed by the uncompressed data in the file.
At the end of the file are two blocks filled with binary
zeros. The header has the following format, packed with
individual fields byte-aligned:
typedef struct {
char name[100],
mode[8],
userid[8],
groupid[8],
filesize[12],
timestamp[12],
checksum[8],
linkflag,
linkname[100];
union {
char unused_chars[255];
struct {
char magic[6],
version[2],
username[32],
groupname[32],
devmajor[8],
devminor[8],
prefix[155];
} ustar;
} u;
} tar_header;
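For readers working outside C, the same layout can be written as a
Python struct format. This is a sketch for illustration only: the
names TAR_HEADER, FIELD_NAMES, unpack_header and data_blocks are
hypothetical, and the trailing 12 pad bytes are simply the part of
the 255-byte union not covered by the 243-byte ustar members.

```python
import struct

# "=" selects standard sizes with no alignment padding.
TAR_HEADER = struct.Struct(
    "= 100s 8s 8s 8s 12s 12s 8s c 100s"   # name through linkname
    " 6s 2s 32s 32s 8s 8s 155s"           # ustar members
    " 12x"                                # remainder of unused_chars
)

FIELD_NAMES = (
    "name", "mode", "userid", "groupid", "filesize", "timestamp",
    "checksum", "linkflag", "linkname",
    "magic", "version", "username", "groupname",
    "devmajor", "devminor", "prefix",
)

def unpack_header(block):
    """Split one 512-byte header block into its raw byte fields."""
    return dict(zip(FIELD_NAMES, TAR_HEADER.unpack(block)))

def data_blocks(filesize):
    """Number of 512-byte blocks the file data occupies after the header."""
    return (filesize + 511) // 512
```

TAR_HEADER.size comes out to exactly 512, which is a quick check
that the field widths add up to one block.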
Traditionally, everything in a tar header is in ASCII with
nulls and spaces to punctuate the fields and numbers are
always in octal. But eleven octal digits (plus a space) in
the filesize field would only allow a maximum value of 8.59GB,
which is certainly smaller than may be supported on many
modern systems, including Windows. Thus, a popular extension
supported by this tar is to interpret numeric fields as
binary if the high bit is set in the first character.
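A minimal Python sketch of reading such a field follows. The name
parse_numeric is hypothetical, and the binary layout (a big-endian
integer in the remaining bits) is an assumption taken from the
common GNU convention; the text above only says that the high bit
marks the field as binary.

```python
OCTAL_FILESIZE_LIMIT = 8**11      # eleven octal digits: 8,589,934,592

def parse_numeric(field):
    """Decode a tar numeric field given as bytes."""
    if field and field[0] & 0x80:
        # Binary form: drop the flag bit, read the rest as big-endian.
        return int.from_bytes(bytes([field[0] & 0x7F]) + field[1:], "big")
    text = field.strip(b" \0").decode("ascii")
    return int(text, 8) if text else 0
```

For example, parse_numeric(b"00000001750 ") returns 1000, while a
12-byte field starting with 0x80 can carry sizes well beyond the
eleven-digit octal limit.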
The mode, user and group ids aren't meaningful on NT
and are ignored when extracting and just filled in with
read/write for owner, owned by root when adding. The
timestamp is in seconds since Jan 1 00:00:00 GMT 1970. The
checksum is calculated as if that field contained spaces.
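The checksum is the plain sum of the 512 header bytes, each taken
as unsigned. A short Python sketch, with the hypothetical name
tar_checksum and the field offset (148) worked out from the struct
above:

```python
CHECKSUM_OFFSET = 148     # 100 + 8 + 8 + 8 + 12 + 12
CHECKSUM_LENGTH = 8

def tar_checksum(header):
    """Sum the header bytes, counting the checksum field as spaces."""
    if len(header) != 512:
        raise ValueError("a tar header is exactly 512 bytes")
    end = CHECKSUM_OFFSET + CHECKSUM_LENGTH
    return (sum(header[:CHECKSUM_OFFSET])
            + CHECKSUM_LENGTH * ord(" ")
            + sum(header[end:]))
```

An all-zero block therefore sums to 256 (eight spaces), whatever
happens to be stored in the checksum field itself.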
The linkflag tells the file type, reported in the long listing
as one of the following:
- Normal File
D Directory
L Link (not a separate file, just another name
for one that already exists)
S Symbolic Link
C Character Device
B Block Device
F FIFO
Under NT, only the normal files and directories have
any meaning. Directories are normally highlighted. The other
file types are normally reported in bright red but otherwise
ignored.
The last 255 bytes may contain either all binary zeros or
the new "USTAR" trailer, used when the filename is longer
than 100 characters. In USTAR format, the magic field
contains the null-terminated string "ustar", the version
is "00" (without a null) and, if the prefix field is not
null, the actual pathname is formed by concatenating the
prefix + a slash + the name. If the prefix is null, the
name field is used alone.
When writing USTAR format, the username and groupname
are null, the devmajor is 0 and devminor is 1. When
reading USTAR format, all the fields except the prefix
are ignored.
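The pathname rule can be sketched in Python as below. The function
name full_path is hypothetical, and the byte offsets (257 for magic,
345 for prefix) are derived from the field widths in the struct
above.

```python
def full_path(header):
    """Rebuild the pathname from a 512-byte header block."""
    name = header[0:100].split(b"\0", 1)[0].decode("ascii")
    magic = header[257:263]
    prefix = header[345:500].split(b"\0", 1)[0].decode("ascii")
    if magic == b"ustar\0" and prefix:
        # USTAR with a non-null prefix: prefix + slash + name.
        return prefix + "/" + name
    return name
```

A header with prefix "dir/sub" and name "file.txt" yields
"dir/sub/file.txt"; without the "ustar" magic the name is used
alone.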
If the filename is too long even in USTAR format, tar will
use the GNU extension convention of writing a special prefix
consisting of a header marked with a special linkflag
indicating that the data which follows is the full name of
the next file in the archive.
CPIO Format:
If -p is specified, tar will read and write CPIO format files,
using binary headers of the following format:
typedef struct {
short magic, /* Always 0x71c7 == Octal 070707 */
dev; /* Device containing directory
entry for this file. */
ushort inode, /* UNIX inode number. */
mode,
userid,
groupid,
nlink,
rdev; /* Device ID for special files. */
ulong timestamp;
ushort namelen; /* including trailing null. */
ulong filesize;
char name[ namelen rounded to word ];
} cpio_header;
The dev, inode, mode, userid, groupid, nlink and rdev fields
are not meaningful on NT and are ignored when
extracting and filled in with 1, 1, read/write by owner,
0, 0, 1 and 0, respectively, when writing.
If -P is specified, tar will read and write CPIO format files
using the alternate ASCII format headers, where each ushort is
written as a 6-character octal number, each ulong as an 11-
character octal number, and name is null-terminated.
In a CPIO file, data immediately follows the header and is not
padded to a block boundary.
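As an illustration of the ASCII header layout, here is a small
Python sketch. It assumes the two short fields (magic and dev) are
also written as 6-character octal numbers, which gives a 76-byte
fixed part; the names ASCII_FIELDS and parse_cpio_ascii_header are
hypothetical.

```python
ASCII_FIELDS = (
    ("magic", 6), ("dev", 6), ("inode", 6), ("mode", 6),
    ("userid", 6), ("groupid", 6), ("nlink", 6), ("rdev", 6),
    ("timestamp", 11), ("namelen", 6), ("filesize", 11),
)

def parse_cpio_ascii_header(buf):
    """Decode the octal fields and the null-terminated name."""
    header = {}
    pos = 0
    for field, width in ASCII_FIELDS:
        header[field] = int(buf[pos:pos + width].decode("ascii"), 8)
        pos += width
    if header["magic"] != 0o070707:
        raise ValueError("not a CPIO ASCII header")
    raw_name = buf[pos:pos + header["namelen"]]
    header["name"] = raw_name.rstrip(b"\0").decode("ascii")
    # Data follows the name immediately, with no block padding.
    header["data_offset"] = pos + header["namelen"]
    return header
```

For a file named "a.txt" (namelen 6, counting the trailing null),
the data starts at offset 76 + 6 = 82.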
Portable CPIO ASCII Format:
If -j or -J is specified, tar will read and write archives
using headers defined by the portable CPIO ASCII format
introduced with UNIX System V:
typedef char long_hex[8];
typedef struct new_cpio_ascii_header_str {
char magic[6];
long_hex inode,
mode,
userid,
groupid,
nlink,
timestamp,
filesize,
dev_major,
dev_minor,
rdev_major,
rdev_minor,
namelen, /* including trailing null. */
checksum;
char name[ namelen+1 ];
} new_cpio_ascii_header;
The magic field always contains either "070701" (normal) or
"070702" (checksum variation.) The inode, mode, userid,
groupid, nlink, dev and rdev fields are not meaningful on
NT and are ignored when extracting and filled in with
1, read/write by owner, 0, 0, 1, 1, and 0, respectively, when
writing.
The header, including the filename, is padded to the next
DWORD (4-byte) boundary. The data section immediately follows
and is also padded to the next DWORD boundary.
The difference between the -j and -J options is that if -J
is specified, the checksum field is filled in with a simple
32-bit sum of all the bytes in the file, each taken as an
unsigned 8-bit value.
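The padding and checksum rules can be sketched in Python as
follows. The names are hypothetical; the 110-byte fixed header size
is the 6-byte magic plus thirteen 8-character hex fields, and
namelen is taken as stored (including the trailing null).

```python
NEWC_FIELDS = (
    "inode", "mode", "userid", "groupid", "nlink", "timestamp",
    "filesize", "dev_major", "dev_minor", "rdev_major", "rdev_minor",
    "namelen", "checksum",
)
HEADER_SIZE = 6 + 8 * len(NEWC_FIELDS)    # 110 bytes

def pad4(n):
    """Round n up to the next DWORD (4-byte) boundary."""
    return (n + 3) & ~3

def parse_newc_header(buf):
    """Decode the hex fields and locate the start of the data."""
    magic = buf[:6].decode("ascii")
    if magic not in ("070701", "070702"):
        raise ValueError("not a portable CPIO ASCII header")
    header = {"magic": magic}
    for i, field in enumerate(NEWC_FIELDS):
        start = 6 + 8 * i
        header[field] = int(buf[start:start + 8].decode("ascii"), 16)
    header["data_offset"] = pad4(HEADER_SIZE + header["namelen"])
    return header

def file_checksum(data):
    """32-bit sum of all data bytes, each taken as an unsigned value."""
    return sum(data) & 0xFFFFFFFF
```

With a 6-byte name field the header ends at offset 116, which is
already DWORD-aligned, so the data starts there.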
Colors:
You may set your own choices for screen colors using these
environment variables:
Name            Use                              Default
ASCIICONVERT    ASCII files receiving line       Bright Yellow
                end conversion
COLORS          Normal screen colors             <null string>
DIRECTORIES     Directories                      Bright
FOREIGNFILES    Filetypes not supported by NT    Bright Red
READONLYDIRS    Directories marked read-only     same as DIRECTORIES
READONLYFILES   Files marked read-only           same as COLORS
Colors recognized are black, red, green, yellow, blue, magenta
(or red blue), cyan (or blue green) or white. Foreground and
background colors may also be bright, dim or reverse. The
names of the colors and the words bright, dim, reverse and on
may be in either upper or lower or mixed case.
Either or both the foreground and background colors may be
specified; if you don't specify a value, it's considered
transparent and inherits the color underneath it. DIRECTORIES,
FOREIGNFILES and ASCIICONVERT inherit from COLORS. If COLORS
is null, tar uses the current screen colors it finds at startup.
Specifying COLORS=none turns off all use of color.
If the -D (dim) option is specified, all highlighting is
turned off, regardless of the settings for these environment
variables.