http://invisible-island.net/autoconf/
Copyright © 2014–2015,2016 by Thomas E. Dickey


TAR versus Portability

Standards

There is no standard version of the tar program. This may surprise some who, either because it is available “everywhere” or because they have read comments to the contrary, suppose it must be a standard. In POSIX (since 2001), the equivalent of tar is the pax program. As noted in the rationale:

The pax utility was new for the ISO POSIX-2:1993 standard. It represents a peaceful compromise between advocates of the historical tar and cpio utilities.

Arnold Robbins (in Unix in a Nutshell) gives more detail:

pax [options] [patterns]

Portable Archive Exchange program. When members of the first POSIX 1003.2 working group could not standardize on either tar or cpio, they invented this program. (See also cpio and tar.)

GNU/Linux and Mac OS X use almost identical versions of pax, developed by the OpenBSD team, based on the original freely available version by Keith Muller.

I used (and preferred) cpio starting in early 1986, when I wrote sccs_tools. My project had about twenty tape cartridges storing snapshots of the project's sources. The cpio program was used for writing and reading those tapes. I continued to use cpio for my own backups when I started development with Linux early in 1994. Here is a fragment from my backup script from May 1994:

cpio --verbose --reset-access-time --format=ustar -B -o -O $DST

The nice thing about cpio was that it accepts a list of pathnames from its standard input. Sadly, cpio was not prevalent on the systems where I was developing at that time, and I began to rely upon tar. Unlike cpio, tar requires its pathnames to be given on the command-line, limiting its use of standard input/output to the actual data being processed. Aside from doing my backups, I also exchange data with others in tar-files. The compelling reason for using tar is that (unlike cpio) if I provide a tar-file to others, they are likely to have a program to read it.
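The difference in interface can be bridged with a few lines of shell. The following is a minimal sketch rather than anything from the backup script quoted above: the function name tar_from_stdin is made up for illustration, and it assumes that no pathname contains a newline or begins with "-", and that the whole list fits within the system's limit on argument length. With cpio the equivalent needs no helper at all, since the list can be piped straight into cpio -o.

```shell
# Read pathnames (one per line) from standard input and archive them with
# tar, which accepts names only as command-line arguments.
# Usage: tar_from_stdin ARCHIVE < list-of-names
tar_from_stdin() {
    archive=$1
    set --                          # start with an empty argument list
    while IFS= read -r path; do
        set -- "$@" "$path"         # append each name as one argument
    done
    [ $# -gt 0 ] || return 1        # refuse to create an empty archive
    tar -cf "$archive" "$@"
}
```

For example, `find src -name '*.c' -print | tar_from_stdin sources.tar` collects the C sources under a (hypothetical) src directory.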

Unlike cpio, there were several implementations of tar. Others have tabulated differences (I will not summarize those here).

Autoconf

Usually tar is just a “given”, used for distributing and receiving a set of files.

Portability

The lynx web browser uses tar in its menu of file operations. When I first started working with lynx, the program pathname (and options) were compiled-in, using constants. I modified this in stages:

The last step addressed the more common tar (or pax!) variants. Here is the configure check which I wrote:

dnl CF_TAR_OPTIONS version: 1 updated: 2004/01/26 20:58:41
dnl --------------
dnl This is just a list of the most common tar options, allowing for variants
dnl that can operate with the "-" standard input/output option.
AC_DEFUN([CF_TAR_OPTIONS],
[
case ifelse($1,,tar,$1) in
*pax)
        TAR_UP_OPTIONS="-w"
        TAR_DOWN_OPTIONS="-r"
        TAR_PIPE_OPTIONS=""
        TAR_FILE_OPTIONS="-f"
        ;;
*star)
        TAR_UP_OPTIONS="-c -f"
        TAR_DOWN_OPTIONS="-x -U -f"
        TAR_PIPE_OPTIONS="-"
        TAR_FILE_OPTIONS=""
        ;;
*tar)
        # FIXME: some versions of tar require, some don't allow the "-"
        TAR_UP_OPTIONS="-cf"
        TAR_DOWN_OPTIONS="-xf"
        TAR_PIPE_OPTIONS="-"
        TAR_FILE_OPTIONS=""
        ;;
esac
 
AC_SUBST(TAR_UP_OPTIONS)
AC_SUBST(TAR_DOWN_OPTIONS)
AC_SUBST(TAR_FILE_OPTIONS)
AC_SUBST(TAR_PIPE_OPTIONS)
])dnl

It supplements this chunk:

CF_PATH_PROG(TAR,   tar, pax gtar gnutar bsdtar star)
CF_TAR_OPTIONS($TAR)
AC_DEFINE_UNQUOTED(TAR_UP_OPTIONS,   "$TAR_UP_OPTIONS")
AC_DEFINE_UNQUOTED(TAR_DOWN_OPTIONS, "$TAR_DOWN_OPTIONS")
AC_DEFINE_UNQUOTED(TAR_FILE_OPTIONS, "$TAR_FILE_OPTIONS")
AC_DEFINE_UNQUOTED(TAR_PIPE_OPTIONS, "$TAR_PIPE_OPTIONS")

With these parameters of “tar” it was possible to rework some hardcoded command-lines to un-tar files which were downloaded by lynx, e.g., (and simplifying):

gzip -dc filename.tar.gz | $TAR_PATH $TAR_DOWN_OPTIONS $TAR_PIPE_OPTIONS

That worked well enough, but there were a few trouble-spots.

Option syntax

The configure check assumes too much about the option syntax, by basing the available options on the tar program name. It would be possible to improve on this by testing the program against known useful options.
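One way to test the program rather than trust its name is to run a small round trip and see whether the file reappears. The sketch below is an assumption about how such a probe could look, not part of CF_TAR_OPTIONS: the function name tar_accepts_dash is hypothetical, and it relies on mktemp -d, which is widespread but not required by POSIX.

```shell
# Return 0 if the given tar-like program can write an archive to standard
# output and read it back from standard input, using "-" as the file name.
tar_accepts_dash() {
    prog=$1
    command -v "$prog" >/dev/null 2>&1 || return 1
    work=$(mktemp -d) || return 1
    mkdir "$work/in" "$work/out" &&
        echo probe > "$work/in/probe.txt" &&
        (cd "$work/in" && "$prog" -cf - probe.txt) 2>/dev/null |
            (cd "$work/out" && "$prog" -xf -) 2>/dev/null
    # Judge by the result, not by exit codes, which vary between variants.
    status=1
    [ -f "$work/out/probe.txt" ] && status=0
    rm -rf "$work"
    return $status
}
```

A configure macro could call this for each candidate program and set TAR_PIPE_OPTIONS from the outcome; a pax-style program would need an analogous probe using -w and -r.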

File ownership

The check does not concern itself with the ownership of files which are extracted from the tar archive. Lynx disables setuid operation, but could be run by the root user.

A configure script cannot be counted on to run as root, and cannot test whether a tar program requires some special option to preserve file ownership.

SVR4 tar on AIX, HPUX, Solaris documents these options, with some variations.
I omit an unrelated paragraph from the “o” option for brevity:

o
When o is used for reading, it causes the extracted file to take on the user and group IDs of the user running the program rather than those on the tape. This is the default for the ordinary user and can be overridden, to the extent that system protections allow, by using the p function modifier.
p
Cause file to be restored to the original modes and ownerships written on the archive, if possible. This is the default for the superuser, and can be overridden by the o function modifier. If system protections prevent the ordinary user from executing chown(), the error is ignored, and the ownership is set to that of the restoring process (see chown(2)). The set-user-id, set-group-id, and sticky bit information are restored as allowed by the protections defined by chmod() if the chown() operation above succeeds.

The same options were documented in SunOS 4 tar (with fewer words, of course):

o      Suppress information specifying owner and modes  of  directories
       which  tar  normally  places  in  the archive.  Such information
       makes former versions of tar generate an error message like:

              filename/: cannot create

       when they encounter it.

p      Restore the named files to their original  modes,  ignoring  the
       present  umask(2V).   SetUID  and  sticky  information  are also
       extracted if you are the super-user.  This option is only useful
       with the x key letter.

Not all tar programs have made that distinction. In 1997, there was a thread on devel@XFree86.Org with this item:

Date: Sun, 13 Jul 1997 01:24:55 +1000
From: David Dawes <dawes@rf900.physics.usyd.edu.au>
To: devel@XFree86.Org
Subject: Extract utility (was: Re: missing 'p' flag for tar in RELNOTES)

On Fri, Jun 06, 1997 at 10:03:16PM +0200, Matthieu Herrb wrote:
>David Dawes wrote (in a message from Fri 6)
> >
> > Not all versions of tar require the 'p' flag for this.  Gnu tar for
> > example doesn't require this.  Neither does the 'tar' that comes with
> > Solaris 2.5 (in spite of what the man page implies).  Which tar does
> > OpenBSD use?
>
>A modified pax.
>
> > Is using OpenBSD's cpio a better option
> > (if it knows how to extract tar archives)?
>
>it's based on pax too, but it does preserve the file modes on
>extraction, so it's indeed better.
>
> > I'm more and more coming to the conclusion that we should provide an
> > 'extract' binary for each OS that people can use to unpack the .tgz
> > files in a reliable way.  I would currently see this as being say GNU
> > tar, with the --unlink flag that some BSD versions have added included
> > and enabled by default, and modified to use zlib to avoid the need for
> > a separate gzip binary.
>
>Yes. For example OpenBSD's pax based commands can't read some tarballs
>made by GNU tar.

I've done some work on this, and I have something which we can hopefully
use for 3.3.1.  It is gnu tar 1.12, with support added to make use of
zlib so that it is self-contained.  When run as "extract" it sets
the -x, -z and --unlink-first flags, and accepts multiple .tgz files
on the command line.  The -t flag can be used to override -x and list
the contents.  When run under any other name, it behaves like tar.

The code for this is available as utils-1.0.0.tgz in the beta directory.
Can those who build binary distributions please check that it compiles
and works OK.  If there are any problems, let me know.

Building it should only require running 'make' from the utils directory.

>That's the reason for which I didn't contribute back my buid-bindist
>scripts for 3.3. This has forced me to use one of the pax based
>commands. Unfortunatly none of them have the equivalent of the GNU tar
>'--exclude-from' option, so I had to build explicit lists of files to
>include in each tarball.

This binary can be used (under the name gnu-tar) to build the bindists.
In fact, it is probably best to use this one so that compatibility problems
are avoided.

David
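Since the configure script cannot probe ownership behavior, one alternative is to choose the extraction flags at run time. This is only a sketch under stated assumptions: the function name tar_extract_flags is hypothetical, and it presumes a tar whose "o" modifier has the SVR4 meaning quoted above (extracted files take the IDs of the invoking user). SunOS 4 tar, as its manual page shows, gives "o" a different meaning, so this would not be right everywhere.

```shell
# Print tar extraction flags for the given numeric user ID.
# For root (ID 0), add "o" so that extracted files are owned by the
# invoking user instead of whatever owner the archive records.
tar_extract_flags() {
    if [ "$1" -eq 0 ]; then
        echo "-xof"
    else
        echo "-xf"      # ordinary users already get this behavior by default
    fi
}

# Hypothetical use, unpacking from a pipe:
#   gzip -dc download.tar.gz | tar $(tar_extract_flags "$(id -u)") -
```

Whether discarding the recorded ownership is the right policy depends on the application; for a browser unpacking downloaded files it seems the cautious choice.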

Path length

Beyond the parameterization, lynx's extraction of files from an archive is simplistic: it assumes no errors. In practice, that could fail for any of several reasons. But the most interesting one is due to tar-file format differences, e.g., in the way excessively long pathnames are stored.

Although POSIX documented (with pax) a scheme for storing long filenames in 1989, it was not until the mid-1990s that things started to settle. Not everyone got on board at the same time.

For instance, Ant's documentation for the tar task says:

Early versions of tar did not support path lengths greater than 100 characters. Over time several incompatible extensions have been developed until a new POSIX standard was created that added so called PAX extension headers (as the pax utility first introduced them) that among another things addressed file names longer than 100 characters. All modern implementations of tar support PAX extension headers.

Ant's tar support predates the standard with PAX extension headers, it supports different dialects that can be enabled using the longfile attribute. If the longfile attribute is set to fail, any long paths will cause the tar task to fail. If the longfile attribute is set to truncate, any long paths will be truncated to the 100 character maximum length prior to adding to the archive. If the value of the longfile attribute is set to omit then files containing long paths will be omitted from the archive. Either option ensures that the archive can be untarred by any compliant version of tar.

For more detailed information on Ant, see the documentation on The TAR package.
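A quick way to find out whether a source tree would run into the 100-character limit is to list the offending names before creating the archive. The following is a rough sketch: the function name long_tar_paths is made up, names are measured in bytes relative to the top directory (without a leading "./"), and pathnames are assumed not to contain newlines.

```shell
# Print every pathname under directory $1 that is longer than the
# 100-byte name field of the original tar header.
long_tar_paths() {
    (cd "$1" && find . -print) |
        sed -e 's,^\./,,' -e '/^\.$/d' |    # drop "./" prefix and "." itself
        LC_ALL=C awk 'length($0) > 100'     # LC_ALL=C: count bytes, not characters
}
```

An empty result means that even a tar which knows none of the long-name extensions should be able to unpack the archive, at least as far as name length is concerned.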

Tar variants

The interesting tar variants are, of course, those whose behavior I can inspect and compare at different points in time. That equates to saying that I can read the source code.

Legacy (Unix) tar

I have access to a few Unix systems for comparison (AIX 5-7, HPUX 11, Solaris 8-11). Because source is not generally available, there is not much to say.

Illumos (descendant of OpenSolaris) has tar source (and cpio source) in its GitHub repository.
Interestingly enough, it started as 4.3 BSD tar:

/*
 * Copyright (c) 1988, 2010, Oracle and/or its affiliates. All rights reserved.
 * Copyright 2012 Milan Jurik. All rights reserved.
 * Copyright 2015 Joyent, Inc.
 */

/*      Copyright (c) 1983, 1984, 1985, 1986, 1987, 1988, 1989 AT&T */
/*        All Rights Reserved   */

/*      Copyright (c) 1987, 1988 Microsoft Corporation  */
/*        All Rights Reserved   */

/*
 * Portions of this source code were derived from Berkeley 4.3 BSD
 * under license from the Regents of the University of California.
 */

For what it's worth, the cpio source also uses BSD code and has similar copyrights:

/*
 * Copyright (c) 1988, 2010, Oracle and/or its affiliates. All rights reserved.
 * Copyright 2012 Milan Jurik. All rights reserved.
 * Copyright (c) 2012 Gary Mills
 */

/*      Copyright (c) 1983, 1984, 1985, 1986, 1987, 1988, 1989 AT&T */
/*      All Rights Reserved                                     */

/*
 * Portions of this source code were derived from Berkeley 4.3 BSD
 * under license from the Regents of the University of California.
 */

Early BSD tar

The earliest sources I have at hand for tar are

3BSD tar is more interesting than ansitar, because the latter works only for tapes, not files. Also, ansitar uses a different header format.

Some of the BSD source code was reportedly AT&T source code, but that is not apparent because AT&T neglected to mark their sources. In reading the BSD source for tar and its manual page, there is no copyright notice applied until 1986 (for the 4.3BSD source code) and 1990 (for the manual page). That notice is not AT&T's:

/*
 * Copyright (c) 1980 Regents of the University of California.
 * All rights reserved.  The Berkeley software License Agreement
 * specifies the terms and conditions for redistribution.
 */

The successive releases from CSRG are clearly related (1980 through 1990).
A new implementation (part of pax) by Keith Muller was introduced after that (seen in 4.4BSD-Lite):

/*-
 * Copyright (c) 1992 Keith Muller.
 * Copyright (c) 1992, 1993
 *      The Regents of the University of California.  All rights reserved.
 *
 * This code is derived from software contributed to Berkeley by
 * Keith Muller of the University of California, San Diego.

Public Domain tar

Outside the BSD sources, there was another tar implementation. You may find a copy in DECUS as "posixtar" (July 9, 1987):

/*
 * A public domain tar(1) program.
 *
 * Written by John Gilmore, ihnp4!hoptoad!gnu, starting 25 Aug 85.
 *
 * @(#)tar.c 1.21 10/29/86 Public Domain - gnu
 */

This happens to be the same version that John Gilmore posted to mod.sources volume 7 as v07i088: Public-domain TAR program (1986/12/10).
It is likely the version fetched by Stallman as the basis for GNU tar.

A quick check indicates that Gilmore wrote this shortly after leaving Sun:

It is mentioned in the BACKLOG file for GNU tar 1.12:

 1. ....-..-.. John Gilmore: Re: I'm writing a public domain -tar-
 2. 1985-09-14 Richard M. Stallman: I'm writing a public domain -tar-
 3. 1985-12-03 John Gilmore: Re: tar
 4. ....-..-.. David C. Anderson: Re: tar
 5. 1986-10-31 John Gilmore: Re: wanted: a VMS program to write UNIX tar tapes
 6. 1986-12-22 Richard M. Stallman: I got the tar
 7. 1987-02-14 John Gilmore: Re: tar
 8. 1987-12-15 Brian Reid: (none)
 9. 1988-02-03 Jay Fenlason: (none)

Schilling refers to a version obtained from Sun Users Group as being the first that Gilmore published, and also leads the reader to believe that Gilmore did the work as an employee of Sun. For example:

The social background is: Star is maintained by me since 1982. Gtar started as PD-TAR/SUG-TAR from
John Gilmore (a Sun employee) in late 1986 and it was taken by Stallman in 1989. In the early 1990s, the
maintained changed frequently and in that time (1993) I first reported the problem. – schily Sep 5 at 9:47

The mod.sources volume 7 files are older, and there are significant differences:

 Makefile          |  107 +++====
 PORTING           |   45 !
 README            |   59 +!==
 TODO              |   76 ++-==
 buffer.c          |  712 +++++++++++++++++------===============================
 create.c          |  526 +-!======================================
 extract.c         |  407 +++++++-!=======================
 list.c            |  477 +!!================================
 port.c            |  431 ++++++++++++++++++++++!========
 port.h            |   19 =
 sugtar/diffarch.c |  319 ++++++++++++++++++++++++
 sugtar/open3.h    |   45 +++
 tar.1             |  185 +-=============
 tar.c             |  450 +-=================================
 tar.h             |  176 =============
 15 files changed, 1184 insertions(+), 102 deletions(-), 240 modifications(!), 2508 unchanged lines(=)

Likewise, the tie-in to Sun is weaker than stated by Schilling.

Reflecting on it, there are other problems with Schilling's statement. But aside from those I have commented on, there is no independent source of information which can be used to compare against Schilling's account. For each detail where there is another source of information, it differs.

Gilmore made a second posting of pdtar to comp.sources.unix volume 12 v12i068: Public domain TAR (1987/11/29). One of the differences between the two postings was the addition of wildmat.c, which is present in GNU tar 1.09, indicating that this latter posting was used in the development of GNU tar. First, compare against the volume 7 posting:

 Makefile            |  157 ++++++!===
 PORTING             |   57 +!
 README              |   54 !!
 TODO                |   69 +-==
 buffer.c            |  763 +++++++++++++++++++-----!==========================
 create.c            |  594 ++++++-!!!!!=============================
 extract.c           |  454 ++++++++++!!!!================
 list.c              |  507 +++!!!!===========================
 names.c             |  118 =======
 port.c              |  541 +++++++++++++++++++++++++++=========
 port.h              |   29 
 tar.1               |  215 +++!!!=======
 tar.5               |  217 ==============
 tar.c               |  496 ++++-=============================
 tar.h               |  180 ===========
 volume12/diffarch.c |  323 ++++++++++++++++++++++
 volume12/msd_dir.c  |  214 ++++++++++++++
 volume12/msd_dir.h  |   36 ++
 volume12/open3.h    |   50 +++
 volume12/wildmat.c  |  132 ++++++++
 20 files changed, 2028 insertions(+), 115 deletions(-), 428 modifications(!), 2635 unchanged lines(=)

Now, compare against the SUG version:

 Makefile                 |  157 +++!=====
 PORTING                  |   57 !!
 README                   |   59 !!=
 TODO                     |   64 !==
 buffer.c                 |  688 +++!!=========================================
 create.c                 |  599 +++++-!!!!!=============================
 diffarch.c               |  324 =====================
 extract.c                |  451 +++!!!========================
 list.c                   |  509 ++!!=============================
 names.c                  |  118 =======
 open3.h                  |   50 ==
 pdtar-volume12/msd_dir.c |  214 ++++++++++++++
 pdtar-volume12/msd_dir.h |   36 ++
 pdtar-volume12/wildmat.c |  132 +++++++++
 port.c                   |  541 +++++++=============================
 port.h                   |   29 
 tar.1                    |  215 ++!===========
 tar.5                    |  217 ==============
 tar.c                    |  492 +++!=============================
 tar.h                    |  180 ===========
 20 files changed, 872 insertions(+), 41 deletions(-), 343 modifications(!), 3876 unchanged lines(=)

Considering the numbers, it seems that the SUG version is about midway between the Usenet postings for volume 7 and volume 12.

PAX and USTAR

Here, pax is mainly of interest because it implements USTAR (the Unix standard tar format), which modern implementations of tar also provide.

The program itself was the result of a failure to agree on whether tar or cpio was the one to standardize, and as a result we have a program which does either. The newsgroup thread beginning with John S. Quarterman's posting tar vs. cpio to comp.std.unix on June 1, 1987 summarizes the different points of view.
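Whether a given archive uses the USTAR header can be checked by looking at the magic field, which sits at byte offset 257 of the first 512-byte header block. The sketch below uses a hypothetical function name, tar_header_kind, and looks only at the first header; it distinguishes the POSIX magic ("ustar", a NUL, then version "00") from the older GNU variant ("ustar" followed by two spaces and a NUL).

```shell
# Classify a tar archive by the magic field of its first header block.
# Prints "ustar", "gnu" or "unknown".
tar_header_kind() {
    # Dump the 8 bytes at offset 257 as hex, with all whitespace removed.
    magic=$(dd if="$1" bs=1 skip=257 count=8 2>/dev/null |
        od -An -tx1 | tr -d '[:space:]')
    case $magic in
    7573746172003030) echo ustar ;;     # "ustar\0" + version "00"
    7573746172202000) echo gnu ;;       # "ustar  \0"
    *)                echo unknown ;;   # pre-POSIX tar, or not a tar file
    esac
}
```

An archive with pax extended headers still reports "ustar" here, because those extensions are stored in ordinary USTAR header blocks.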

According to Glenn Fowler, the first “public implementation” of pax was written by Mark H. Colburn.
He posted it to comp.sources.unix as “Usenix/IEEE POSIX replacement for TAR and CPIO”
(volume 17, issues 74, 75, 76, 77, 78, and 79, dated February 3, 1989).

The manual pages for pax on some Unix vendors attribute pax to Mark H. Colburn:

but not others:

While there was early discussion (in 1990) for Minix to use Colburn's pax, as of 2015 Minix manual pages list only tar (no pax). This is apparently BSD tar (based on bulk import from NetBSD).

Later implementations include

While there are a few exceptions, e.g., Linux From Scratch which uses Gunnar Ritter's version,
most BSD- and Linux-systems provide the implementation by Keith Muller:

The Austin Group has a credits page where they mention Gunnar Ritter's Heirloom Toolkit.
It also refers to Schilling's pax, although the latter appears to be an error:

Working with the Open Source community

The group includes developers from the Open Source community. As part of acknowledging their valuable input the copyright holders have made several grants relating to use of the documentation in those projects. Some of these are listed: the Linux Man Pages project, the FreeBSD project, the NetBSD operating system, the Cygwin Project, Gunnar Ritter's Heirloom Toolkit and other tools, Joerg Schilling's pax and find, Jens Schweikardt book, and the ISPRAS Linux testing project.

GNU tar

For documentation on features, see the GNU tar manual. The manual's notion of history is in terms of random notes about features.

The mail in early 1988 from Jay Fenlason is a hint to when he began work on GNU tar. His progress was reported in successive GNU bulletins:

The earliest versions of GNU tar do not appear to be online. The earliest which you may find are (modified) versions 1.09 for MSDOS:

The two are the same, except that the FreeDOS files contain some additional DOS-specific files written by Kai Uwe Rommel to support direct disk access for OS/2 and DOS. Those would not have been incorporated into the GNU sources.

The accessible source-archives are not much help in researching its early history:

From the latter, the v1.09.tar.gz file is probably useful for comparisons. Comparing against Gilmore's second posting, you can see that GNU tar had grown somewhat (as well as discarding some pieces, such as the manual page in favor of the “texinfo” file):

 Makefile                     |  247 ++!===
 PORTING                      |   57 -
 README                       |   54 -
 TODO                         |   55 -
 buffer.c                     | 1352 +++++++++++++++++++++!!!!!!!==============
 create.c                     | 1276 ++++++++++++++++++++++!!================
 diffarch.c                   |  721 ++++++++++++-!!=======
 extract.c                    |  747 +++++++++!=============
 getoldopt.c                  |   89 ==
 list.c                       |  726 +++++++!==============
 msd_dir.c                    |  218 ======
 msd_dir.h                    |   41 =
 names.c                      |  135 ===
 open3.h                      |   69 
 paxutils-1.09/COPYING        |  249 +++++++
 paxutils-1.09/ChangeLog      |  636 ++++++++++++++++++++
 paxutils-1.09/getdate.y      |  882 ++++++++++++++++++++++++++++
 paxutils-1.09/getopt.c       |  596 ++++++++++++++++++
 paxutils-1.09/getopt.h       |  102 +++
 paxutils-1.09/getopt1.c      |  160 +++++
 paxutils-1.09/gnu.c          |  605 +++++++++++++++++++
 paxutils-1.09/mangle.c       |  226 +++++++
 paxutils-1.09/rmt.h          |   77 ++
 paxutils-1.09/rtape_lib.c    |  620 +++++++++++++++++++
 paxutils-1.09/rtape_server.c |  226 +++++++
 paxutils-1.09/tar.texinfo    | 1289 ++++++++++++++++++++++++++++++++++++++++
 paxutils-1.09/update.c       |  534 ++++++++++++++++
 paxutils-1.09/version.c      |   90 ++
 port.c                       | 1319 ++++++++++++++++++++++++=================
 port.h                       |   47 
 tar.1                        |  215 ------
 tar.5                        |  217 ------
 tar.c                        | 1225 +++++++++++++++++++++++!!!============
 tar.h                        |  297 +++-======
 wildmat.c                    |  151 ===
 35 files changed, 10391 insertions(+), 671 deletions(-), 786 modifications(!), 3702 unchanged lines(=)

The change-logs for GNU tar are helpful, since only a half-dozen people have done a significant number of commits to its source archives. Using the script which I wrote for counting changelogs, here are the percentages for developers with at least one percent of the total:

Percent Name
2.8 David J MacKenzie
20.8 François Pinard
1.5 Jay Fenlason
2.9 Michael I Bushnell
28.5 Paul Eggert
1.3 Pavel Raiskup
37.3 Sergey Poznyakoff
4.9 “other”
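The counting script itself is not shown here. As a rough illustration of the idea only, the following sketch tallies entry headers in GNU-style ChangeLog files by author and prints each author's share. The function name changelog_shares is made up, it counts entries rather than individual changes, and it recognizes only headers of the form "YYYY-MM-DD  Name  <address>", so older entries with a different date format would be missed.

```shell
# Print the percentage of ChangeLog entries per author, sorted by name.
# Usage: changelog_shares ChangeLog [ChangeLog.1 ...]
changelog_shares() {
    LC_ALL=C awk '
        /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][ \t]/ {
            line = $0
            sub(/^[0-9-]+[ \t]+/, "", line)     # drop the date
            sub(/[ \t]*<.*$/, "", line)         # drop the mail address
            count[line]++
            total++
        }
        END {
            for (name in count)
                printf "%.1f %s\n", 100 * count[name] / total, name
        }
    ' "$@" | LC_ALL=C sort -k2
}
```

Grouping the small contributors under “other”, as in the table above, would take one more pass over the output.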

GNU tar releases were not at uniform intervals, but it is still useful to see how the contributions break down by time:

Version  Date        DJM    FP     JF     MIB    PE     PR     SP
1.28     2014-07-27  -      -      -      -      11.7   5.5    76.6
1.27     2013-10-05  -      -      -      -      30.5   16.2   43.8
1.26     2011-03-12  -      -      -      -      72.1   -      25.6
1.25     2010-11-07  -      -      -      -      46.9   -      53.1
1.24     2010-10-24  -      -      -      -      76.5   -      23.5
1.23     2010-03-10  -      -      -      -      -      -      92.3
1.22     2009-03-09  -      -      -      -      -      -      100.0
1.21     2008-12-27  -      -      -      -      -      -      96.8
1.20     2008-05-05  -      -      -      -      8.1    -      87.8
1.19     2007-10-10  -      -      -      -      -      -      94.4
1.18     2007-06-29  -      -      -      -      6.7    -      56.7
1.17     2007-06-08  -      -      -      -      31.0   -      68.1
1.16     2006-10-21  -      -      -      -      19.0   -      78.6
1.15     2004-12-20  -      -      -      -      10.0   -      88.7
1.14     2004-05-11  -      -      -      -      61.6   -      33.1
1.13     1997-07-08  9.8    61.0   4.7    9.0    11.5   -      -
1.12     1997-04-25  -      100.0  -      -      -      -      -
1.11     1992-09-09  66.9   -      -      32.4   -      -      -
1.10     1991-07-01  10.0   -      25.0   56.0   -      -      -
1.09     1990-10-16  18.6   -      78.0   -      -      -      -
1.08     1990-01-26  34.4   -      29.7   -      -      -      -
1.07     1989-01-26  -      -      100.0  -      -      -      -

Schily tar

Schily tar (sometimes referred to as "star") was first published at the end of April 1997. It had not been published anywhere before that date.

For instance, Schilling commented in comp.unix.solaris 12/9/1996:

In article <5892f4$2...@news.Informatik.Uni-Oldenburg.DE>,

Christian Kuehnke <Christia...@arbi.Informatik.Uni-Oldenburg.DE> wrote:
>
>j...@cs.tu-berlin.de (Joerg Schilling) writes:
>> For a tar implementation that has no known bugs, will read all
>> (currently except HP-UX) tar streams and is the fastest implementation
   ^^^^^^^^^^^^^^^^^^^^^^ if they contain device files

>> at all (faster than ufsdump) look at:
>>
>> ftp://ftp.fokus.gmd.de/pub/unix/star
>>
>> for the rest of the goods.
>
>Nice. But why don't you provide the source?

I always intended to provide star in source.

There are some reasons, why I din't do this up to now:

1) I dont want star to go the same way as gnu tar

You remember...
Gnu tar has been first written in August 1985 by John Gilmore,ihnp4!hoptoad!gnu.
It has been brought to the public at the Sun User Group meeting in december 1987
in San Jose as 'sugtar'. This version was really nice.
The actual version has been ported to death.

For this reason, I want to have star in my hands until I know the line for
portability to other systems is clear.

Star has been first written in 1982 by me. The main growth in functionality
did come in May 1985. Although star has been designed to be very portable,
id did run only on UNOS, SYSVr0-2, SunOS and Solaris. The major porting effort
has been taken in 1994. It now runs on

SunOS, Solaris, HP-UX, IRIX, Linux, DG/UX, AIX

2) Makefile system

In May 1996 I made a makefile sytstem that allows simultaneous compilation
on all supported platforms. This still needs some fine tuning until it may
do the way to the public.

I expect star to be available in souce in January 1997.

Joerg

P.S.        Star has been ported to DG/UX with the help of Data General.
        It will soon be available on Data General systems as a fast backup.

PP.S.         GMD in Birlinghoven currently switches from 2MB/s X.25 to
        34 MB/s ATM. For this reason our ftp server may not be reacheable
        from outside germany until the mid of the next week.

--
EMail:        jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
        j...@cs.tu-berlin.de                  (uni)  If you don't have iso-8859-1
        j...@fokus.gmd.de                  (work) chars my name is
URL:        http://www.fokus.gmd.de/usr/schilling    J"org Schilling

The actual announcement at the end of April 1997 was much longer (and was cross-posted to 27 newsgroups):

Path: euryale.cc.adfa.oz.au!newshost.carno.net.au!harbinger.cc.monash.edu.au!news.mira.net.au!news.netspace.net.au!news.mel.connect.com.au!munnari.OZ.AU!news.Hawaii.Edu!news.caldera.com!news.eli.net!uunet!in1.uu.net!160.45.4.4!fu-berlin.de!cs.tu-berlin.de!js
From: js@cs.tu-berlin.de (Joerg Schilling)
Newsgroups: comp.unix.admin,comp.unix.misc,alt.os.linux,alt.sys.sun,bln.comp.sun,bln.comp.unix,comp.os.linux.development.apps,comp.os.linux.misc,comp.sys.hp.apps,comp.sys.hp.misc,comp.sys.sgi.admin,comp.sys.sgi.apps,comp.sys.sgi.misc,comp.sys.sun.admin,comp.sys.sun.apps,comp.sys.sun.misc,comp.unix.aix,comp.unix.bsd.freebsd.misc,comp.unix.solaris,de.comp.os.linux.misc,de.comp.os.unix,linux.dev.admin,linux.dev.apps,maus.os.linux,maus.os.linux68k,maus.os.unix,uk.comp.os.linux
Subject: STAR (tape archiver) source code released
Date: 30 Apr 1997 10:57:06 GMT
Organization: Technical University of Berlin, Germany
Lines: 108
Distribution: inet
Message-ID: <5k78i2$fht$1@news.cs.tu-berlin.de>
NNTP-Posting-Host: 130.149.25.72
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
Summary: Star is a fast and Posix compliant tape archiver
Xref: euryale.cc.adfa.oz.au comp.unix.admin:57542 comp.unix.misc:29041 alt.os.linux:20710 alt.sys.sun:11011 comp.os.linux.development.apps:32517 comp.os.linux.misc:172775 comp.sys.hp.apps:6817 comp.sys.hp.misc:11233 comp.sys.sgi.admin:46008 comp.sys.sgi.apps:14655 comp.sys.sgi.misc:30303 comp.sys.sun.admin:86045 comp.sys.sun.apps:15307 comp.sys.sun.misc:29517 comp.unix.aix:98941 comp.unix.bsd.freebsd.misc:40018 comp.unix.solaris:105077 de.comp.os.unix:409


Star, the fastest tar archiver for UNIX is now available in source.

Star has many improvements compared to other tar
imlementations (including gnu tar). See below for a short description
of the highlight of star.

Star is located on:

ftp://ftp.fokus.gmd.de/pub/unix/star

Revision history (short)

1982    First version on UNOS (extract only)
1985    Port to UNIX (fully funtional version)
1985    Added pre Posix method of handling special files/devices
1986    First experiments with fifo as external process.
1993    Remote tape access
1993    diff option
1994    Fifo with shared memory integrated into star
1994    Very long filenames and sparse files
1994    Gnutar and Ustar(Posix) handling added
1994    Xstar format (extended Posix) defined and introduced
1995    Ported to many platforms

Supported platforms:

SunOS Solaris Linux HP-UX DG/UX IRIX AIX FreeBSD

Joerg

-------------------------------------------------------------
Star is the fastest known implementation of a tar archiver.
Star is able to make backups with more than 12MB/s if the
disk and tape drive support such a speed. This is more than
double the speed that ufsdump will get.
Ampex got 13.5 MB/s with their new DLT tape drive.
Ufsdump got a maximum speed of about 6MB/s with the same hardware.

Star development started 1982, development is still in progress.
The current version of star is stable and
I never did my backups with other tools than star.

Its main advantages over other tar implementations are:

        fifo                    - keeps the tape streaming.
                                  This gives you faster backups than
                                  you can achieve with ufsdump, if the
                                  size of the filesystem is > 1 GByte.

        pattern matcher         - for a convenient user interface
                                  (see manual page for more details).
                                  To archive/extract a subset of files.

        sophisticated diff      - user tailorable interface for comparing
                                  tar archives against file trees
                                  This is one of the most interesting parts
                                  of the star implementation.

        no namelen limitation   - Pathnames up to 1024 Bytes may be archived.
                                  (The same limitation applies to linknames)
                                  This limit may be expanded in future
                                  without changing the method to record long names.

        deals with all 3 times  - stores/restores all 3 times of a file
                                  (even creation time)
                                  may reset access time after doing backup

        does not clobber files  - more recent copies on disk will not be
                                  clobbered from tape
                                  This may be the main advantage over other
                                  tar implementations. This allows
                                  automatically repairing of corruptions
                                  after a crash & fsck (Check for differences
                                  after doing this with the diff option).

        automatic byte swap     - star automatically detects swapped archives
                                  and transparently reads them the right way

        automatic format detect - star automatically detects several common
                                  archive formats and adopts to them.
                                  Supported archive types are:
                                  Old tar, gnu tar, ansi tar, star.

        fully ansi compatible   - Star is fully ANSI/Posix 1003.1 compatible.
                                  See README.otherbugs for a complete description
                                  of bugs found in other tar implementations.

This is the first source release of star that I put on the net.

Have a look at the manual page, it is included in the distribution.

Author:

Joerg Schilling
Seestr. 110
D-13353 Berlin
Germany

Email:  joerg@schily.isdn.cs.tu-berlin.de, js@cs.tu-berlin.de
        schilling@fokus.gmd.de

Please mail bugs and suggestions to me.
--
EMail:  joerg@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
        js@cs.tu-berlin.de                (uni)  If you don't have iso-8859-1
        jes@fokus.gmd.de                  (work) chars my name is
URL:    http://www.fokus.gmd.de/usr/schilling    J"org Schilling

There are a few points which the reader may not have noticed:

Lists of obscure features (such as "Ustar") get little attention.
Numbers are what get readers' attention.
Here is one (cited in the usual source of misinformation), from Unix Backup and Recovery by W. Curtis Preston, O'Reilly, 1999:

A Really Fast tar Utility: star

The star utility is the fastest known implementation of tar. It has been tested at speeds exceeding 14 MB/s. (This is more than double the speed that dump gets.) star development started in 1982 and is still in progress. star's main advantages over other tar implementations are:

FIFO

This is a “double-buffering” system that keeps the tape streaming. This gives you faster backups than you can achieve with dump, if the size of the filesystem is > 1GB.

Sophisticated diff

It has a user-tailorable interface for comparing tar archives against file trees.

Longer pathname length

You may archive pathnames up to 1024 bytes, as you can with dump.

Does not clobber files

More recent copies on disk will not be clobbered from the backup volume. This may be the main advantage over other tar implementations. This allows automatic repair of a corrupted filesystem. (You can check for differences after doing this with the diff option.)

Automatic byte swap

star automatically detects swapped archives and transparently reads them the right way.

star is available from ftp://ftp.fokus.gmd.de/pub/unix/star.

Both the Schily tar release announcement and Preston's summary are quoted here to make it simpler for the reader to observe how the summary in the book is based on the release announcement. Preston made some adjustments:

The telling point is that Preston did not add a paragraph or two detailing how the performance was measured.

By the way, star (Schily tar) is not mentioned in the revised edition Backup & Recovery: Inexpensive Backup Solutions for Open Systems (2007). Instead, Preston says (page 106):

Use GNU tar if You Can

GNU tar is an extremely popular utility. Beside being able to read an archive written by any other version of tar, it adds a significant level of functionality. Here are some of its most popular advancements:

BSD tar

Finally (perhaps not the last word on the topic), there is bsdtar, built upon libarchive (originally on Tim Kientzle's webpage).

My involvement

Tool fixes

From October 1992 through May 2005, I worked initially to collect useful development tools, for use by myself and other developers. I also provided fixes and feedback (e.g., cproto, mawk, vile). After a few years I became involved in development of these tools, to follow up on the fixes I had made, and became more selective about which ones to take on.

I gauged program quality by compiling candidates with gcc compiler warnings turned on, as well as doing test-builds with Unix compilers. For example, I used this script:

#!/bin/sh
# these are my normal development-options
OPTS="-Wall -Wstrict-prototypes -Wmissing-prototypes -Wshadow -Wconversion"
gcc $OPTS "$@"

That made it simpler:

For instance, that was what I had in mind when I sent mail to Paul Eggert in 1993, suggesting improvements to rcs. The discussion was inconclusive. A few years later, I read his Usenet postings (such as Re: Reverse function for gmtime()? ), with interest.

Still later (probably 1997 or 1998, though I am unable to locate it via Google), I was interested to note an exchange between Eggert and Schilling. Schilling was accusing Eggert of having deliberately implemented long-name support in GNU tar in a way designed to make it incompatible with POSIX. Schilling, of course, phrased his remarks in a more emphatic manner than I report here.

I examined the GNU tar source and read its change-log. According to that (reading it again):

I followed up by downloading a copy of Schilling's program. Of course, I screened it for compiler warnings. It was "in between", which calls for a collaborative effort. However, viewing the episode with Eggert, it was obvious that Schilling was no improvement in comparison to Eric Raymond. Attempting to collaborate with Schilling would be comparable to Sindbad's adopting the Old Man of the Sea for a traveling companion.

So I deleted it.

Ncurses

I had occasion to revisit long filenames with tar for ncurses. Juergen Pfeifer added several filenames for the Ada95 binding which were long. That was because the filenames (like Java filenames versus class names) had to match the package names, which were long.

Despite my qualms, this was not initially a problem with tar. Later, that changed, and since problem reports were not frequent, it took a while to notice and address the problem. Here are a few mail interchanges to illustrate.

I tried untar'ing a file on ClarkNet's Solaris machine:

From florian@suse.de Sat Apr  3 01:19:48 1999
Received: from smtp-gw.vma.verio.net (smtp-gw.vma.verio.net [207.97.20.30])
        by loas.clark.net (8.8.8/8.8.8) with ESMTP id BAA29138
        for <dickey@clark.net>; Sat, 3 Apr 1999 01:19:48 -0500 (EST)
Received: from Cantor.suse.de (Cantor.suse.de [194.112.123.193])
        by smtp-gw.vma.verio.net (8.9.3/8.9.3) with ESMTP id BAA15725
        for <dickey@clark.net>; Sat, 3 Apr 1999 01:20:03 -0500 (EST)
Received: from Galois.suse.de (Galois.suse.de [194.112.123.130])
        by Cantor.suse.de (Postfix) with ESMTP id F084632CE2
        for <dickey@clark.net>; Sat, 03 Apr 1999 08:19:13 +0200 (MEST)
Received: from knorke.saar.de (knorke.suse.de [10.0.0.254])
        by Galois.suse.de (Postfix) with ESMTP id CDE529410
        for <dickey@clark.net>; Sat,  3 Apr 1999 08:19:12 +0200 (MEST)
Received: (from florian@localhost)
        by knorke.saar.de (8.8.8/8.8.8) id IAA08351
        for dickey@clark.net; Sat, 3 Apr 1999 08:19:12 +0200
From: Florian La Roche <florian@suse.de>
Date: Sat, 3 Apr 1999 08:19:12 +0200
To: dickey@clark.net
Subject: Re: progress?
Message-ID: <19990403081912.A8309@knorke.saar.de>
References: <19990403004604.A7286@knorke.saar.de> <199904030226.VAA10981@shell.clark.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.95.4i
In-Reply-To: <199904030226.VAA10981@shell.clark.net>; from dickey@clark.net on Fri, Apr 02, 1999 at 09:26:25PM -0500
Sender: florian@knorke.saar.de

Status: RO
Content-Length: 618
Lines: 19

> close - but there's a problem.  I can see the contents, but I get a directory
> checksum error trying to untar it.  Here's what I get
>
> -rw-------   1 dickey   ipusers  1378639 Apr  2  1999 ncurses-5.0-beta1.tar.gz
>
> sum:
> 60558 2693 ncurses-5.0-beta1.tar.gz
>
> sum -r:
> 31196   2693 ncurses-5.0-beta1.tar.gz

knorke:~/source $ sum -r ncurses-5.0-beta1.tar.gz
31196  1347

I cannot reproduce any problem with that file. I have also tried to
unpack it on the GNU machine and didn't get any error.
Can you try it on a Linux machine? (At least with GNU tar to unpack it?)

Florian La Roche

It worked for Potorti, but neither of us knew what the @LongLink was:

From dickey Fri Jul 30 09:56:17 1999
Subject: Re: File mode specification error on a tar.gz file
To: F.Potorti@cnuce.cnr.it (Francesco Potorti` <F.Potorti@cnuce.cnr.it>)
Date: Fri, 30 Jul 1999 09:56:17 -0400 (EDT)
In-Reply-To: <m11ABia-001i1aC@fly.cnuce.cnr.it>  from "Francesco Potorti` <F.Potorti@cnuce.cnr.it>" at Jul 30, 99 02:24:56 pm
X-Mailer: ELM [version 2.4 PL25]
Content-Type: text
Status: RO
Content-Length: 1067
Lines: 35

>
> emacs 20.4
>
> Download http://www.clark.net/pub/dickey/ncurses/ncurses.tar.gz and put
> it in your current directory.
>
> emacs -q
> M-x auto-compression-mode RET
> C-x d RET
> go to the ncurses.tar.gz line
> RET
> --> unzipping ncurses.tar.gz...done
>     Parsing tar file...done
>     File mode specification error: (wrong-type-argument integerp nil)
>
> The likely reason is that gnu tar 1.12, when run in listing mode over
> that archive, outputs one line like this:
>
> Lr--r--r-- root/root 103 1999-06-15 03:03 ././@LongLink unknown file type `L'

hmm (I have had occasional problems reading those tar files with non-GNU
tar, but not seen any thing that I can pinpoint).  I'll repack with Solaris
tar (which works, afaik).

-- did that, will see if I can identify the bogus 'L' entry.  thanks.

> Even after having read the tar docs I don't understand if that is normal
> or not.  Anyway, if possible, it would be nice for emacs to handle these
> errors.


--
Thomas E. Dickey
dickey@clark.net
http://www.clark.net/pub/dickey

The reason became clear after releasing ncurses 5.2 in 2000, when a few reports arrived from people using Mac OS X and FreeBSD. Starting at that point, I changed my release process to use Solaris tar to create the release tar-balls for ncurses.

Alternatively, I could have used Schily tar. But I chose not to:

For more context, see

Using Solaris tar was only a stopgap fix. Fortunately, it turned out, on investigation, that only the development versions (with 8-digit year/month/day added to the pathname) produced pathnames long enough to pass the 100-character threshold.
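
A check for pathnames past the 100-character threshold can be sketched in a few lines of shell. This is only an illustration, not the script used for the investigation: the function name long_names and its arguments are hypothetical, it counts bytes (by forcing the C locale for awk), and it assumes that no pathname contains a newline.

```shell
#!/bin/sh
# long_names DIR [LIMIT]
# Print "length pathname" for each pathname under DIR which is longer
# than LIMIT bytes (default: 100).  The name and interface are hypothetical.
long_names() {
    dir=$1
    limit=${2:-100}
    # LC_ALL=C makes awk's length() count bytes rather than characters.
    find "$dir" -print |
        LC_ALL=C awk -v limit="$limit" 'length($0) > limit { print length($0), $0 }'
}
```

Run from the directory which contains an unpacked source tree, e.g., "long_names some-source-tree", it lists the pathnames as they would be stored in the archive; no output means that every pathname fits.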

The investigation was part of my check-list for ncurses6. Some of the results are interesting, hence this page.

Comparisons

I began collecting information for this investigation in 2014, creating an outline of this page.

Later, in June 2015, I built each of the GNU tar and Schily tar versions mentioned here using Debian 6 (gcc 4.4.5). I also wrote a test program to verify interoperability of the various tar formats with pathnames of different lengths.

In reviewing the initial results, I found that I should also include multiple versions of BSD tar, to comment on its influence vis-à-vis GNU and Schily tar.

Versions tested

In this study, I acquired for reference these versions of GNU tar:

For research, there is also a Git repository, but it is not very useful:

The earliest published version (1.07) cannot be found. I found it only mentioned as being on ccb.ucfs.edu in December 1989, and on prep.ai.mit.edu since May 1989.

I also obtained these versions of Schily tar from gd.tuwien.ac.at:

1.0, 1.1, 1.2, 1.3, 1.3.1, 1.4, 1.4.1, 1.4.2, 1.4.3, 1.5, 1.5.1

There is no publicly-accessible source repository for Schily tar. There is a Mercurial repository for Schillix, which matches Illumos up to mid-2010 (when OpenSolaris ended, as reported by The Register), but Schily tar is not there. Comparing the two repositories, it is immediately apparent that Illumos is the ongoing reference implementation, because its history continues well past that point, unlike that of Schillix.

For testing other programs, I used the packaged versions (mainly Debian 6–8 and Solaris 10):

Compression support

These tar implementations provide several features. For instance, the newer versions provide options for gzip (and other) compression. While I do use those features, they are not that important because they only simplify the use of compression, but do not enhance it, e.g., by making it faster.

Adding compression support to tar is simpler than it may seem, provided that all it does is run an external program. I did this for diffstat with little effort (initially in 2000, later adding a configure check in 2006, etc). Modifying a program to use compression libraries takes appreciably more effort.

Often, comments are made that the compression is “not really part of tar”, which may or may not be accurate:

Here is a table comparing the command-line support for compression in these tar implementations:

Date     Format    Program      Version  Feature
1995-06  compress  GNU tar      1.11.8   -Z option
1995-06  gzip      GNU tar      1.11.8   -z option
2004-05  bzip2     GNU tar      1.14     -j option
2010-03  xz        GNU tar      1.23     -J option
2008-04  compress  Schily tar   1.5      -Z option
2002-05  gzip      Schily tar   1.4      -z option
2008-04  bzip2     Schily tar   1.5      -j,-bz options
2013-01  xz        Schily tar   1.5.2    -xz option
2010-03  compress  BSD tar      2.8.3    -Z option
2010-03  gzip      BSD tar      2.8.3    -z option
2010-03  bzip2     BSD tar      2.8.3    -j,-y options
2010-03  xz        BSD tar      2.8.3    -J option
2012-05  compress  Solaris tar  5.11     -Z option
2012-05  gzip      Solaris tar  5.11     -z option
2012-05  bzip2     Solaris tar  5.11     -j option
N/A      xz        Solaris tar  N/A      N/A
2010-11  compress  GNU tar      1.25     auto-sense
2004-12  gzip      GNU tar      1.15     auto-sense
2004-12  bzip2     GNU tar      1.15     auto-sense
2010-10  xz        GNU tar      1.24     auto-sense
2002-05  compress  Schily tar   1.4      auto-sense
2002-05  gzip      Schily tar   1.4      auto-sense
2002-05  bzip2     Schily tar   1.4      auto-sense
2013-01  xz        Schily tar   1.5.2    auto-sense
2010-03  compress  BSD tar      2.8.3    auto-sense
2010-03  gzip      BSD tar      2.8.3    auto-sense
2010-03  bzip2     BSD tar      2.8.3    auto-sense
2010-03  xz        BSD tar      2.8.3    auto-sense
2012-05  compress  Solaris tar  5.11     auto-sense
2012-05  gzip      Solaris tar  5.11     auto-sense
2012-05  bzip2     Solaris tar  5.11     auto-sense
N/A      xz        Solaris tar  N/A      auto-sense
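
The "auto-sense" rows refer to recognizing a compressed archive without being told its format. One way to do that is to look at the leading "magic" bytes of the file. The following is a minimal sketch under that assumption; it is not taken from any of the tar implementations listed, and the function name sense_format is hypothetical.

```shell
#!/bin/sh
# sense_format FILE
# Guess the compression format of FILE from its first few bytes.
# Illustration only: the function name is hypothetical, and this is not
# how any particular tar implementation does it.
sense_format() {
    # Dump up to six leading bytes as a single hexadecimal string.
    magic=$(od -A n -t x1 -N 6 "$1" | tr -d ' \n')
    case $magic in
        1f8b*)          echo gzip ;;      # 0x1f 0x8b
        1f9d*)          echo compress ;;  # 0x1f 0x9d
        425a68*)        echo bzip2 ;;     # "BZh"
        fd377a585a00*)  echo xz ;;        # 0xfd "7zXZ" 0x00
        *)              echo unknown ;;
    esac
}
```

A caller could use the result to select the matching decompressor before handing the data to tar.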

There are a few caveats:

To recap, tar compression is interesting to some extent, because it simplifies ad hoc commands involving tar. I do not use it in the archive script, which I use for preparing tarballs to distribute to others. Rather, in the script, tar pipes to gzip (or bzip2).
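
The pipe arrangement needs nothing from tar beyond writing the archive to standard output. Here is a minimal sketch of the idea; it is not the archive script itself, and the function name make_tarball and its arguments are hypothetical.

```shell
#!/bin/sh
# make_tarball DIR OUTPUT
# Archive DIR with tar, compressing through a pipe rather than with
# tar's own compression options.  The name and interface are hypothetical.
make_tarball() {
    dir=$1
    out=$2
    # "-f -" tells tar to write the archive to standard output.
    tar -cf - "$dir" | gzip -9 -c > "$out"
}
```

For example, "make_tarball foo-1.0 foo-1.0.tar.gz" writes the compressed archive, and "gzip -dc foo-1.0.tar.gz | tar -tf -" lists it the same way. Substituting bzip2 for gzip in both commands works without other changes.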