http://invisible-island.net/
Copyright © 1996-2019,2022 by Thomas E. Dickey


DIFFSTAT – make histogram from diff-output

Synopsis

diffstat reads the output of diff and displays a histogram of the insertions, deletions, and modifications per-file. It is useful for reviewing large, complex patch files.
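As a rough illustration of the idea (this is not the diffstat source, and it handles only unified diffs), the per-file counting can be sketched in a few lines of Python. The names `diffstat`, `format_histogram` and `SAMPLE` are hypothetical, chosen for this sketch, and the bar scaling is an assumption rather than the real program's rule.

```python
import re

# Matches "@@ -start[,count] +start[,count] @@"; a missing count means 1.
HUNK = re.compile(r"^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@")


def diffstat(diff_text):
    """Return {file name: [insertions, deletions]} for a unified diff."""
    counts = {}
    current = None
    old_left = new_left = 0  # lines still expected in the current hunk
    for line in diff_text.splitlines():
        if old_left > 0 or new_left > 0:
            if line.startswith("+"):
                counts[current][0] += 1
                new_left -= 1
            elif line.startswith("-"):
                counts[current][1] += 1
                old_left -= 1
            elif not line.startswith("\\"):  # skip "\ No newline ..." notes
                old_left -= 1
                new_left -= 1
        elif line.startswith("+++ "):
            # The new-file header names the file; drop any tab-separated date.
            current = line[4:].split("\t")[0]
            counts.setdefault(current, [0, 0])
        elif current is not None:
            match = HUNK.match(line)
            if match:
                old_left = int(match.group(1) or 1)
                new_left = int(match.group(2) or 1)
    return counts


def format_histogram(counts, width=40):
    """Render one row per file, scaling bars to at most `width` columns."""
    biggest = max((a + d for a, d in counts.values()), default=0)
    scale = min(1.0, width / biggest) if biggest else 1.0
    name_width = max((len(name) for name in counts), default=0)
    rows = []
    for name, (added, deleted) in sorted(counts.items()):
        bar = "+" * round(added * scale) + "-" * round(deleted * scale)
        rows.append(f" {name:<{name_width}} | {added + deleted:4d} {bar}")
    return "\n".join(rows)


SAMPLE = """\
--- a/hello.c\t2020-01-01
+++ b/hello.c\t2020-01-02
@@ -1,3 +1,4 @@
 #include <stdio.h>
-int main() {
+int main(void) {
+    puts("hi");
 }
"""

print(format_histogram(diffstat(SAMPLE)))
```

Tracking the line counts from each hunk header, rather than looking only at the first character of every line, keeps a removed line that happens to begin with "-- " from being mistaken for a file header.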

History

I originally wrote this in 1992, along with an associated utility rcshist, to trace the change history of collections of files. Since then, I've found it most useful for summarizing source patches.

See the changelog for details:

Impact

Initially, I used diff and diffstat in a script named diff-patch. In 1994, I started using makepatch which gave more consistent results.

It was not until early 1996 that there was much attention by others to the tool. At that point, developers on both XFree86 and ncurses mailing lists started using it.

One of those developers (Tony Nugent) pointed it out to Linus Torvalds in July 1996, on linux.dev.kernel. Much later (in 2002), it was documented as part of the process for submitting Linux kernel patches for BitKeeper (BK) in Linux 2.4.20. Linus commented on the process:

Ok, pulled. But _please_ do this the regular way next time. There's even a script to help you do it in linux/Documentation/BK-usage/bk-mak-sum, which does it all for you for BK patches.

(many people end up doing their own thing, you don't have to use that particular script, of course. But the important thing I want is that the _email_ should contain enough information to make a good first pass judgement on what the patch does, and in particular it is important for me to see what a "bk pull" will actually change.)

That's why the "diffstat" is important to me if I do a BK pull – and why I want to see the patches as plaintext if I apply stuff to generic files..

Later, in 2005 Linus wrote git, which has the ability to generate a diffstat. There are some enhancements (git is able to track moves and renames of files).

Dependencies

Of course, I did not write diffstat as an isolated program. Rather, it provides a useful summary of the output of diff. That same output is typically processed by the patch program to apply changes to programs. Early on, this was the predominant method for distributing changes to programs. That was for two reasons:

For both of these reasons, I still provide diffs for the larger programs (in addition to complete sources):

Diff

Just “diff” requires some clarification. Early on, I had these issues in mind:

Other ...

While I used other systems (IBM, Univac) during the 1970s, the first where I recall a file-comparison program were the DEC systems where I developed (RT-11, TENEX or DEC System 10). The Utility Programs page by D A Duce is a useful summary for systems at the time—1976—noting (for DEC System 10):

A utility exists to compare files, and this may be used to print certain special types of file.

According to manuals and other summaries, the PDP-10 program was named “filcom” while RT-11 had “srccom”:

PDP-10's filcom and RT-11's srccom marked chunks of difference with asterisks, but did not number lines. Rather, they showed page numbers, and also used numbers 1 and 2 to refer to the two files.

Here is an example, from the assembly language manual:

.R FILECOM
*LPT:/4L=FILEA,FILEB

        FILE 1) DSK:FILEA              CREATED: 1456 17-JAN-1972
        FILE 2) DSK:FILEB              CREATED: 1456 17-JAN-1972

        1)1     FILE A
        1)      A
        1)      B
        1)      C
        1)      D
        1)      E
        1)      F
        1)      G
        ****
        2)1     FILE B
        2)      A
        2)      B
        2)      C
        2)      G
        **************
        1)1     K
        1)      L
        1)      M
        1)2     N
        ****
        2)1     1
        2)      2
        2)      3
        2)2     N
        **************
        1)2     W
        ****
        2)2     4
        2)      5
        2)      W
        **************

By the time I wrote diffstat in 1992, I had encountered examples such as this (from dead-systems) on VMS, showing line-numbers:

$ differences SYSTARTUP_VMS.COM SYSTARTUP_VMS.TEMPLATE
************
File SYS$COMMON:[SYSMGR]SYSTARTUP_VMS.COM;4
  360   $ @SYS$STARTUP:TCPIP$STARTUP.COM
  361   $!
******
File SYS$COMMON:[SYSMGR]SYSTARTUP_VMS.TEMPLATE;1
  360   $!$ @SYS$STARTUP:TCPIP$STARTUP.COM
  361   $!
************
************
File SYS$COMMON:[SYSMGR]SYSTARTUP_VMS.COM;4
  408   $! Mounting additional disks
  409   $ MOUNT/SYSTEM DUA1 DATA1
  410   $ MOUNT/SYSTEM DUA2 DATA2
  411   $!
  412   $ EXIT
******
File SYS$COMMON:[SYSMGR]SYSTARTUP_VMS.TEMPLATE;1
  408   $ EXIT
************

Number of difference sections found: 2
Number of difference records found: 5

DIFFERENCES /IGNORE=()/MERGED=1-
    SYS$COMMON:[SYSMGR]SYSTARTUP_VMS.COM;4-
    SYS$COMMON:[SYSMGR]SYSTARTUP_VMS.TEMPLATE;1

The file-comparison on VAX/VMS was named “differences” rather than “filcom” largely because it was a redesign, discarding old cruft. Some of that hides quirks: the VMS command interpreter stored only 4 characters to make a unique command. Because it also permitted abbreviations, anything beginning "diff" would run the "differences" command.
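A toy model of that lookup rule, assuming only what the paragraph above states (four significant characters, abbreviations allowed when unambiguous), might look like this. It is not the VMS command interpreter's actual parser; `resolve_verb` and the verb table are hypothetical.

```python
def resolve_verb(typed, verbs, significant=4):
    """Return the single verb matching `typed`, or None if none or several do.

    Only the first `significant` characters of either name are compared, and
    the typed text may be a shorter abbreviation of that stored key.
    """
    key = typed.upper()[:significant]
    if not key:
        return None
    matches = [v for v in verbs if v.upper()[:significant].startswith(key)]
    return matches[0] if len(matches) == 1 else None


VERBS = ["DIFFERENCES", "DIRECTORY", "DELETE"]  # illustrative table only

for word in ("diff", "diffstat", "dif", "di"):
    print(word, "->", resolve_verb(word, VERBS))
```

Under this model both "diff" and "diffstat" resolve to the same verb, while "di" is ambiguous between two entries in the sample table.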

Like the Unix command, the VMS file-comparison utility was designed with the notion that its output could be used to apply changes via another tool such as ed. The VMS equivalent is sumslp (originally slp), which was probably inspired by the Univac slp processor.

The fc program on MS-DOS was influenced by filcom. Infoworld (February 14, 1979) had a short article:

TEXT EDITOR FOR 8080 AND Z-80 DISK SYSTEMS AVAILABLE FROM MICROSOFT

Microsoft has announced the availability of Edit-80, a random access, line-oriented text editor for 8080 and Z-80 systems.

According to the company, Edit-80 features random line access to disk files, providing almost instantaneous access to any record of the file, even if the available memory is smaller than the file being edited.

In addition to the standard line commands to insert, delete, print or replace lines of text, the package offers such features as automatic line renumbering, global find and substitute, multiple page files and the ability to read files without line numbers. There is also an Alter Mode which allows editing portions of individual lines.

The edited files are not written to disk until a write command is given, and the original file is always saved as back-up.

The Edit-80 package includes a utility called Filcom, which compares source or binary files and outputs the differences between them. The editor runs on any 8080 or Z-80 system which uses the CP/M operating system. Price of the package is $120. The manual alone can be purchased for $10. Contact Microsoft, 819 Two Park Central Tower, Albuquerque, NM 87108.

The choice of name is not just a coincidence. Microsoft's 1980 catalog description shows the influence of DEC on their thinking:

EDIT-80 is a random access, line oriented text editor similar to those used on large computers like the DEC PDP-10 and IBM 360.

Roy Allen's A History of the Personal Computer mentions PDP-10s repeatedly.

In MS-DOS, the program was renamed “fc” and is still provided in Windows. It supports Unicode now. Any of the other documented options could have been in the initial version in 1979.

Here is the PDP-10 example using modern “fc”:

Comparing files filea and FILEB
***** filea
C
D
E
F
G
***** FILEB
C
G
*****

***** filea
J
K
L
M
***** FILEB
J
1
2
3
*****

***** filea
V
W
***** FILEB
V
4
5
W
*****

There are still more file comparison programs which I had in mind. For instance

Original diff

Paul Heckel's paper A Technique for Isolating Differences Between Files (April 1978, CACM) described an algorithm which he stated he had used five years earlier. For comparison, Heckel cited filcom because it was a well-known implementation. He mentioned a few other algorithms in his paper (including Hunt and McIlroy An algorithm for differential file comparison, Computer Science Technical Report 41, Bell Telephone Labs, August 1976), but none had been used in a product such as filcom. Hunt and McIlroy's paper cited research results, described a prototype of diff and did not compare it with other implementations.

My interest here is, of course, in the output format used rather than the nuances of the particular algorithm underneath. The original diff paper gave an example of its output which hints at the description in the Unix V6 manual page for diff:

0a1,1          1,1 d0
>w             <w
3,4 c 4,6      4,6 c 3,4
<c             <x
<d             <y
---            <z
>x             ---
>y             >c
>z             >d
6,7 d 7        7a6,7
<f             >f
<g             >g

Oddly enough, the manual page mentions everything about the output format except the use of brackets for the changed-lines.

The 3BSD source-code for /usr/src/cmd/diff.c (dated 1979, corresponding to Unix V7's diff) shows them used as “<” and “>” markers in the change() function. Although the manual page at that point mentioned the brackets, it did not mention the space which the program added after the brackets:

       Diff  tells  what lines must be changed in two files to bring them into
       agreement.  If file1 (file2) is `-', the standard input  is  used.   If
       file1 (file2) is a directory, then a file in that directory whose file-
       name is the same as the file-name of file2 (file1) is used.  The normal
       output contains lines of these forms:

            n1 a n3,n4
            n1,n2 d n3
            n1,n2 c n3,n4

       These lines resemble ed commands to convert file1 into file2.  The num-
       bers after the letters pertain to file2.  In fact,  by  exchanging  `a'
       for  `d'  and reading backward one may ascertain equally how to convert
       file2 into file1.  As in ed, identical pairs where n1 = n2 or n3  =  n4
       are abbreviated as a single number.

       Following  each  of these lines come all the lines that are affected in
       the first file flagged by `<', then all the lines that are affected  in
       the second file flagged by `>'.
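The three command forms in that manual page are regular enough to parse mechanically. The following sketch (the name `parse_change` is hypothetical) reads one such line and reports how many lines it removes from file1 and adds to file2. It accepts the spaced form used in the manual page as well as the compact form without spaces.

```python
import re

# n1[,n2] followed by a, c or d, followed by n3[,n4]; spaces are optional.
CHANGE = re.compile(r"^(\d+)(?:,(\d+))?\s*([acd])\s*(\d+)(?:,(\d+))?$")


def parse_change(line):
    """Return (op, lines_removed, lines_added), or None for other lines."""
    match = CHANGE.match(line.strip())
    if match is None:
        return None
    n1, n2, op, n3, n4 = match.groups()
    n1, n3 = int(n1), int(n3)
    # A single number abbreviates a range whose two ends are equal.
    n2 = int(n2) if n2 is not None else n1
    n4 = int(n4) if n4 is not None else n3
    removed = 0 if op == "a" else n2 - n1 + 1
    added = 0 if op == "d" else n4 - n3 + 1
    return op, removed, added


for text in ("0a1,1", "3,4 c 4,6", "6,7 d 7", "<c"):
    print(repr(text), parse_change(text))
```

Summing the two counts over all command lines gives the insertion and deletion totals that a histogram needs; lines flagged with `<` or `>` do not match the pattern and are ignored.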

That is the oldest version of diff for which I found source code (in the CSRG cd-images from myrnet.com). The comments in the code attributed the algorithm used to Harold Stone, and used backspace-sequences to underline keywords, e.g.,

*       The cleverness lies in routine stone_. This marches
*       through the lines of file0, developing a vector klist
*       of "k-candidates". At step i a k-candidate is a matched
*       pair of lines x,y (x in file0 y in file1) such that
*       there is a common subsequence of lenght k
*       between the first i lines of file0 and the first y
*       lines of file1, but there is no such subsequence for
*       any smaller y. x is the earliest possible mate to y
*       that occurs in such a subsequence.
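The k-candidate idea in that comment can be illustrated with a much smaller relative of the algorithm. The sketch below is not the code from diff.c; it is the threshold-list technique usually credited to Hunt and Szymanski, and it computes only the length of the longest common subsequence. The function name `lcs_length` is hypothetical.

```python
from bisect import bisect_left


def lcs_length(file0, file1):
    """Length of the longest common subsequence of two lists of lines."""
    # Where does each distinct line occur in file1?
    positions = {}
    for y, line in enumerate(file1):
        positions.setdefault(line, []).append(y)

    # thresh[k] is the smallest y such that some common subsequence of
    # length k + 1 ends at line y of file1, given the lines of file0
    # examined so far.
    thresh = []
    for line in file0:
        # Visit matches from the highest y down, so that a single line of
        # file0 cannot be paired with two lines of file1.
        for y in reversed(positions.get(line, [])):
            k = bisect_left(thresh, y)
            if k == len(thresh):
                thresh.append(y)   # found a longer common subsequence
            else:
                thresh[k] = y      # found an earlier end for length k + 1
    return len(thresh)


print(lcs_length(list("ABCDEFG"), list("ABCG")))
```

The sample input is the first chunk of the FILEA/FILEB example: the four lines A, B, C and G are common to both files, and everything else would be reported as a difference.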

Context diff

The 4.0BSD source code (1980) implemented context diff. The manual page described it:

       -c       produces a diff with lines of  context.   The  default  is  to
                present  3  lines of context and may be changed, e.g to 10, by
                -c10.  With -c the output format  is  modified  slightly:  the
                output beginning with identification of the files involved and
                their creation dates and then each change is  separated  by  a
                line  with  a  dozen  *'s.   The  lines removed from file1 are
                marked with `-'; those added to file2 are marked  `+'.   Lines
                which  are  changed  from  one file to the other are marked in
                both files with `!'.
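The markers described there are easy to see with Python's standard difflib module, which produces a context diff in the same style. This is only a demonstration of the format using a different implementation; the sample lines are made up.

```python
import difflib

old = ["a", "b", "c", "d", "e"]
new = ["a", "B", "c", "e", "f"]

# lineterm="" because the sample lines carry no trailing newlines.
lines = list(difflib.context_diff(old, new, "file1", "file2", lineterm=""))
print("\n".join(lines))

# Collect the two-character markers used in the hunk bodies.
markers = {line[:2] for line in lines[3:] if line[:2] in ("! ", "- ", "+ ")}
print(sorted(markers))
```

Here "b" changed to "B" is marked with `!` in both halves of the hunk, the removed "d" with `-`, and the added "f" with `+`.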

It also added the recursive option:

     -r   causes application of diff recursively to common sub-
               directories encountered.

The only attribution in the code was for Harold Stone; the backspace-sequences were removed. Jonathan Gray has a Git repository of CSRG sources. For what it's worth, the initial check-in was by Bill Joy, but that source already implemented context diff (there is no useful source-history to determine who did what).

These changes were incorporated into Unix V8 (1985), as shown in the manual page.

Unified diff

Unified diff made a fourth format to consider (after ed-scripts, normal and context diffs).

Wayne Davison posted unidiff to comp.sources.misc in August 1990:

v14i070: Unified context diff tools

That posting appeared in volume 14, issue 70 (Davison apparently submitted this on August 22, the posting appeared early August 31, and it was archived September 6, 1990). From his announcement:

I've created a new context diff format that combines the old and new hunks into
one unified hunk.  The result?  The unified context diff, or "unidiff."

Posting your patch using a unidiff will usually cut its size down by around
25% (I've seen from 12% to 48%, depending on how many redundant context lines
are removed).  Even if the diffs are generated with only 2 lines of context,
the savings still average around 20%.

Keep in mind that *no information is lost* by the conversion process.  Only
the redundancy of having multiple identical context lines.  [...]

I've included:
   o    a patch to make gnudiff (v1.14) generate a unidiff.
   o    a patch to make patch (patchlevel 12) accept a unidiff.
   o    a versatile program called "unify" that can translate from a context
        diff (new- or old-style) into a unidiff, and from a unidiff into a
        true new-style context diff.
   o    a man page for unify.
   o    a 1.3k bandaid called "unipatch" that translates a unidiff into a
        context diff format that older versions of patch can understand.
        (It outputs a slightly degenerate form of a context diff (no '!'s)
        but it works great with patch.)
   o    a Makefile to get you going quickly.
--
 \  /| / /|\/ /| /(_)     Wayne Davison
(_)/ |/ /\|/ / |/  \      davison@dri.com
   (W   A  Y   N   e)     ...!uunet!drivax!davison

A motivation was to reduce the size of patches (the output of diff as used in the usenet source groups to distribute changes to a program).
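The size difference is easy to reproduce with Python's difflib, which can emit both formats for the same change. This sketch uses made-up input and a different implementation from the tools in Davison's posting, so the exact savings will not match his figures.

```python
import difflib

# Twenty lines, with a single line changed in the middle.
old = [f"line {i}" for i in range(1, 21)]
new = list(old)
new[9] = "line ten"

context = list(difflib.context_diff(old, new, "old", "new", lineterm=""))
unified = list(difflib.unified_diff(old, new, "old", "new", lineterm=""))

print("context diff:", len(context), "lines")
print("unified diff:", len(unified), "lines")
```

A context diff repeats the surrounding context once for the old text and once for the new text, while a unified diff prints it once, which is where the reduction comes from.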

Although few used “unidiff” as such, Davison's patches for GNU diff and Larry Wall's patch program were used:

Patch

Original patch

There were earlier programs named “patch” (for example, DEC's binary-patch program which I used with RT-11 in the mid-1970s). Here we are only concerned with the source-patching program starting with Larry Wall's version in the mid-1980s. He announced more than one version to net.sources:

On October 27, 1986, he announced version 2.0 (calling it “patch kit”) on mod.sources:

Although posted in October, the file check-in dates show September 17, 1986.
I found the posting and the twelve followup patches at

ftp://ftp.sunet.se/pub/usenet/ftp.uu.net/comp.sources.unix/volume7/patch2/

The patches do not appear in the mod.sources or net.sources archives; I have established dates for those as I did for the other sources (by inspection), and reconstructed the different fully-patched versions. Two of the patches (7 and 12) do not apply cleanly:

Patch  Date        Notes
1      1986/10/29  problem with backward search
2      1986/10/29  problem with context diff
3      1986/11/03  fix for 4.3-style context diff
4      1986/11/29  realloc-fix, context-diff fix
5      1986/11/29  detecting diffs in a file
6      1987/01/06  incorrect free
7      1987/01/31  workaround mangled patches
8      1987/02/16  synchronization for short chunks
9      1987/06/04  incorrect free
10     1988/06/03  many fixes...
11     1988/06/03  new Configure script
12     1988/06/22  portability fixes

Unified patch

Besides being used in the usenet sources groups, Larry Wall's program was distributed as part of X11. The file sizes and dates indicate that there were ongoing improvements (data gleaned from the X distribution tarballs):

                         ------------- Diffstat -------------
Release      Date        Added  Removed  Modified  Unchanged  Notes
net.sources  1984/11/09  N/A    N/A      N/A       1668       patch 1.1
net.sources  1984/11/29  217    12       182       1474       patch 1.2
mod.sources  1985/05/09  498    0        20        1600       patch 1.3
X11R1        1987/09/12  199    312      28        1778       patch 1.3, copyright 1984 by Larry Wall
X11R2        1987/12/31  4169   1328     224       453        patch kit 2.0 (patch level 9), copyright 1986 by Larry Wall
X11R3        1988/08/31  1278   203      520       3961       patch kit 2.0 (patch level 12), copyright 1988 by Larry Wall
X11R4        1988/08/30  0      1        2         5841
X11R5        1988/08/30  0      0        0         5842
X11R6        1993/05/28  1396   39       361       4736       added Wayne Davison's changes for unified diff
X11R6.1      1994/09/14  226    9        25        5924       Stephen Gildea added ifdef's for WIN32
X11R6.3      1994/09/14  0      0        0         7273
X11R6.4      1994/09/14  0      0        0         7273
X11R6.5.1    2000/08/21  0      0        20        7257       changed CVS identifier
X11R6.6      2000/08/17  0      0        0         7273

That large change with X11R6 is the point of this section.

Because Larry Wall made no changes to patch after releasing patch 12 to version 2.0, David J. MacKenzie (who worked on GNU diff and patch) incorporated Davison's patch, and worked with Davison to make a followup fix:

Sun Jan 20 20:18:58 1991  David J. MacKenzie  (djm at geech.ai.mit.edu)

        * Makefile.SH (all): Don't make a dummy `all' file.

        * patchlevel.h: PATCHLEVEL 12u3.

        * patch.c (nextarg): New function.
        (get_some_switches): Use it, to prevent dereferencing a null
        pointer if an option that takes an arg is not given one (is last
        on the command line).  From Paul Eggert.

        * pch.c (another_hunk): Fix from Wayne Davison to recognize
        single-line hunks in unified diffs (with a single line number
        instead of a range).

        * inp.c (rev_in_string): Don't use `s' before defining it.  From
        Wayne Davison.

Mon Jan  7 06:25:11 1991  David J. MacKenzie  (djm at geech.ai.mit.edu)

        * patchlevel.h: PATCHLEVEL 12u2.

        * pch.c (intuit_diff_type): Recognize `+++' in diff headers, for
        unified diff format.  From unidiff patch 1.
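The single-line case mentioned in that ChangeLog entry can be sketched as follows. This assumes the hunk-header convention used by current diff implementations, where an omitted count means one line; it is an illustration, not the logic of another_hunk in pch.c, and `parse_hunk_header` is a hypothetical name.

```python
import re

# "@@ -old_start[,old_count] +new_start[,new_count] @@"
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line):
    """Return (old_start, old_count, new_start, new_count) or None."""
    match = HUNK_HEADER.match(line)
    if match is None:
        return None
    old_start, old_count, new_start, new_count = match.groups()
    return (
        int(old_start),
        int(old_count) if old_count is not None else 1,  # bare number: 1 line
        int(new_start),
        int(new_count) if new_count is not None else 1,
    )


print(parse_hunk_header("@@ -81,5 +81,5 @@"))
print(parse_hunk_header("@@ -3 +3 @@"))
```

Testing the optional groups against None, rather than for truthiness, keeps an explicit count of 0 (an empty side of a hunk) distinct from an omitted count.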

Most of the changes (86%) listed in the X11R6 ChangeLog file for patch were from David J. MacKenzie. Wayne Davison's initial set of changes accounts for about 15 percent of that (177 lines added, 8 lines removed). At the same time, MacKenzie was the main developer working on GNU patch: 68% versus Paul Eggert with 26%. As MacKenzie noted in the README file for patch 2.1 (released June 11, 1993):

This version of patch contains modifications made by the Free Software
Foundation, summarized in the file ChangeLog.  Primarily they are to
support the unified context diff format that GNU diff can produce, to
support making GNU Emacs-style backup files, and to support the GNU
conventions for option parsing and configuring and compilation.  They
also include fixes for some bugs.

The FSF is distributing this version of patch independently because as
of this writing, Larry Wall has not released a new version of patch
since mid-1988.  I have heard that he has been too busy working on
other things, like Perl.

I distinguish contributors versus authors based on a 20% threshold. By this rule, patch had two authors: Larry Wall and David J. MacKenzie.

Here is a summary of changes made between Larry Wall's 2.0.12 (version 2.0 with patches 1-12 applied) and the updated version using MacKenzie's work:

 ChangeLog    |  290 +++++++++++
 Configure    | 1476 ++!!======================================================
 EXTERN.h     |   21 
 INTERN.h     |   19 
 MANIFEST     |   27 
 Makefile.SH  |  116 !==
 README       |  111 +-===
 backupfile.c |  338 +++++++++++++
 backupfile.h |   37 +
 common.h     |  199 +======
 config.H     |   33 =
 config.h.SH  |  146 =====
 inp.c        |  364 +=============
 inp.h        |   18 
 malloc.c     |  467 ==================
 patch.c      |  938 ++++!!==============================
 patch.man    |  554 +++!!================
 patchlevel.h |    1 
 pch.c        | 1311 +++++++============================================
 pch.h        |   36 =
 util.c       |  451 +++!=============
 util.h       |   88 ===
 version.c    |   28 
 version.h    |    9 
 19 files changed, 1379 insertions(+), 39 deletions(-), 359 modifications(!), 5301 unchanged lines(=)

The X11R6 version is (except for a small change by Steve Gildea) identical to 2.0.12u9 by MacKenzie and Eggert. I found the corresponding patches and tarballs here:

http://www.nic.funet.fi/index/gnu/funet/historical-funet-gnu-area-from-early-1990s/

Extended ...

Before releasing GNU patch 2.1, David MacKenzie provided patches for the non-GNU version of patch. From the timeline, it appears that patch12u8 and patch12u9 reflected a change in direction, or finishing, since the GNU patch changelog does not mention these, calling them patch12g8 and patch12g9.

Patch                 Date        Notes
patch12u              1990/12/02  Apply unidiff patches
patch12u2             1991/01/07  unidiff patch 1
patch12u3             1991/01/20  includes unidiff fixes
patch12u4             1991/06/27
patch12u5             1991/12/03  includes unidiff fixes
patch12u6             1992/03/06  improve backup option
patch12u7             1992/07/06  improve backup option
patch12u8, patch12g8  1992/09/15  improve RCS/SCCS detection
patch12u9, patch12g9  1993/05/30  Paul Eggert commits start
patch12g10            1993/05/30
patch12g11            1993/05/31
2.1 release           1993/06/10

Most of Eggert's changes past patch12u8 dealt with configuration (using the autoconf-generated script and adjusting the makefile). The README in these tarballs provides an explanation which is omitted in the 2.1 release:

There are two GNU variants of patch: this one, which retains Larry
Wall's interactive Configure script and has patchlevels starting with
`12u'; and another one that has a GNU-style non-interactive configure
script and accepts long-named options, and has patchlevels starting
with `12g'.  Unlike the 12g variant, the 12u variant contains no
copylefted code, for the paranoid.  The two variants are otherwise the
same.  They should be available from the same places.

Here is a comparison of the patch-2.0.12u9 and patch-2.0.12g11 tarballs:

 COPYING      |  339 +++++++++++++
 ChangeLog    |  358 ++-===========
 Configure    | 1475 -----------------------------------------------------------
 EXTERN.h     |   21 
 INSTALL      |  118 ++++
 INTERN.h     |   19 
 MANIFEST     |   27 -
 Makefile.SH  |  116 ----
 Makefile.in  |   88 +++
 NEWS         |   10 
 README       |   99 -!
 alloca.c     |  475 +++++++++++++++++++
 backupfile.c |  407 ++=============
 backupfile.h |   46 =
 common.h     |  201 =======
 config.H     |   33 -
 config.h.SH  |  146 -----
 config.h.in  |   80 +++
 configure    | 1118 ++++++++++++++++++++++++++++++++++++++++++++
 configure.in |   24 
 getopt.c     |  731 +++++++++++++++++++++++++++++
 getopt.h     |  129 +++++
 getopt1.c    |  176 +++++++
 inp.c        |  363 ==============
 inp.h        |   18 
 malloc.c     |  467 ------------------
 patch.c      |  960 +!!==================================
 patch.man    |  572 !=====================
 patchlevel.h |    1 
 pch.c        | 1305 !===================================================
 pch.h        |   36 =
 rename.c     |   51 ++
 util.c       |  462 -=================
 util.h       |   88 ===
 version.c    |   25 =
 version.h    |    9 
 29 files changed, 3554 insertions(+), 2398 deletions(-), 224 modifications(!), 4417 unchanged lines(=)

Both the u and g versions were mentioned in GNU's Bulletin:

In subsequent work (mostly by Paul Eggert), there are three features which I use:

Later versions added other improvements, but 2.5.4 is “good enough.” Here is a summary of changes between patch-2.0.12g11 and patch-2.5.4, ignoring the generated configure script and its utilities config.guess, config.sub:

 AUTHORS                |    9 
 COPYING                |  340 ========
 ChangeLog              | 1918 ++++++++++++++++++++++++++++++++++++++++=======
 EXTERN.h               |   21 
 INSTALL                |  182 +!!!
 INTERN.h               |   19 
 Makefile.in            |  190 ++!
 NEWS                   |  198 ++++
 README                 |   53 !
 aclocal.m4             |  409 ++++++++++
 addext.c               |  105 ++
 alloca.c               |  475 ------------
 ansi2knr.1             |   36 
 ansi2knr.c             |  678 +++++++++++++++++
 argmatch.c             |  306 +++++++
 argmatch.h             |  129 +++
 backupfile.c           |  403 ---!!!!==
 backupfile.h           |   60 
 basename.c             |   55 +
 basename.h             |    9 
 common.h               |  328 +++!!!!
 config.h.in            |   80 --
 config.hin             |  169 ++++
 configure.in           |   59 !
 error.c                |  250 ++++++
 error.h                |   78 +
 getopt.c               | 1076 ++++++++-!!=================
 getopt.h               |  169 +===
 getopt1.c              |  188 ====
 inp.c                  |  483 +++!!!!!!!=
 inp.h                  |   18 
 install-sh             |  251 ++++++
 m4/ccstdc.m4           |   95 ++
 m4/d-ino.m4            |   42 +
 m4/inttypes_h.m4       |   22 
 m4/largefile.m4        |  115 ++
 m4/malloc.m4           |   35 
 m4/protos.m4           |   25 
 m4/realloc.m4          |   35 
 m4/utimbuf.m4          |   40 +
 maketime.c             |  501 ++++++++++++
 maketime.h             |   39 
 malloc.c               |   38 
 memchr.c               |  199 +++++
 mkdir.c                |  108 ++
 mkinstalldirs          |   40 +
 partime.c              |  956 ++++++++++++++++++++++++
 partime.h              |   77 +
 patch.c                | 1388 +++++++++++!!!!!!!!!!!!!!!========
 patch.man              | 1220 ++++++++++++++++--!!!!!!!!=====
 patchlevel.h           |    1 
 pc/chdirsaf.c          |   34 
 pc/djgpp/README        |   19 
 pc/djgpp/config.sed    |   41 +
 pc/djgpp/configure.bat |   27 
 pc/djgpp/configure.sed |   37 
 pch.c                  | 1923 +++++++++++++++-!!!!!!!!!!!!!!!===================
 pch.h                  |   36 
 quotearg.c             |  403 ++++++++++
 quotearg.h             |  109 ++
 quotesys.c             |  125 +++
 quotesys.h             |    9 
 realloc.c              |   44 +
 rename.c               |  113 +
 rmdir.c                |   87 ++
 util.c                 |  996 ++++++++++++++!!!!!!!!!=
 util.h                 |   88 -!
 version.c              |   30 
 version.h              |    9 
 xalloc.h               |   52 +
 xmalloc.c              |  113 ++
 71 files changed, 10938 insertions(+), 947 deletions(-), 3103 modifications(!), 3027 unchanged lines(=)

Given all of that background, you can see that while the X11R6 patch utility is the non-copylefted GNU patch referred to in the README file, the GNU developers continued to improve the copylefted version, leaving the X11R6 version behind.

Adopted ...

GNU patch has more of an impact on developers than the corresponding diff utility, because the patch program is expected to handle the output of diff no matter where it came from.

The modern BSDs adopted (with the usual variations) the non-copylefted GNU patch releases:

Although the BSD patch started from patch12u8, it recognizes many of the GNU patch long options, because the corresponding patch12g8 supported long options (predating any modern BSD). Linux-based systems such as Debian as well as the BSD-derived OSX simply provide GNU patch.

FreeBSD is a special case:

The legacy Unix systems were slower to adopt patch at all, much less the improved version. For instance, Sun did not provide patch in SunOS 4, nor in Solaris 2.4, but only in the later Solaris releases starting around 1995. HP provided patch in HPUX 10.10 (February 1996).

The manual pages for AIX, HPUX and IRIX64 were adapted from Larry Wall's manual page from 2.0 patch 12 (2.0.12). The manual page for Solaris 5 has more substantial changes, omitting the description of -s, -S and -x. It also shows a -i option (perhaps 2-3 lines of source code, using freopen and checking for error), which is also in SUSv2. Because OpenSolaris did not include Sun's version of the patch utility, it is only possible to gauge influence by comparing documentation (whether Sun's patch is a reimplementation or direct reuse is unknown).

POSIX reflects the changes in Unix:

After (sometimes lengthy) discussion, changes are submitted for review based largely upon existing practice. Sometimes the changes are limited to making the descriptions more precise, or providing explanations for the inclusion (or exclusion) of features. Compare the 1997 and 2004 descriptions to see how this applies to them.

Occasionally, new(er) features are added. Toward that end, Paul Eggert reported the omission of unified diff from the standard as a defect in June 2006 on the Austin review mailing list, proposing the changes to make to the document to rectify the problem. The original documentation for unified diff by Davison lacked sufficient detail to be useful; e.g., this:

Index: patch.man
@@ -81,5 +81,5 @@
=.SH DESCRIPTION
=.I Patch
-will take a patch file containing any of the three forms of difference
+will take a patch file containing any of the four forms of difference
=listing produced by the
=.I diff
@@ -102,8 +102,10 @@
=.BR -c ,
=.BR -e ,
+.BR -n ,
=or
-.B -n
+.B -u
=switch.
-Context diffs and normal diffs are applied by the
+Context diffs (old-style, new-style, and unified) and
+normal diffs are applied by the
=.I patch
=program itself, while ed diffs are simply fed to the
@@ -377,4 +379,9 @@
=.sp
=will ignore the first and second of three patches.
+.TP 5
+.B \-u
+forces
+.I patch
+to interpret the patch file as a unified context diff (a unidiff).
=.TP 5
=.B \-v

Eggert supplied the details. Eggert also reported problems in the document relating to empty files, special files, symbolic links (which are not of direct interest to me).

I have been a subscriber to the Austin Group mailing lists (review and general discussion) since June 3, 1999, and when I find the topic interesting, keep a copy.

There does not appear to be a usable mail-archive for the lists; I rely upon my mail-archives when discussing the Austin Group lists. However, some of the working notes are visible, e.g., these from October 2006. The Austin Group board finally adopted unified diff and the matching option in early 2007, for changes in Issue 7, using Eggert's proposed changes. You can read the rationale for diff:

The -u and -U options of GNU diff have been included. Their output format, designed by Wayne Davison, takes up less space than -c and -C format, and in many cases is easier to read. The format's timestamps do not vary by locale, so LC_TIME does not affect it. The format's line numbers are rendered with the %1d format, not %d , because the file format notation rules would allow extra <blank> characters to appear around the numbers.

and also note the changed status of the patch utility:

The patch utility is moved from the User Portability Utilities option to the Base. User Portability Utilities is now an option for interactive utilities.

Although the pace of standardization may seem slow (innovation in 1991, widespread use over the next ten years, proposal for standardization in 2006, approval in 2007, and publication in 2013), you have to remember who pays the bills: manufacturers of turnkey computer systems and systems integrators who are reluctant to have differences among these manufactured systems. New features are most readily accepted if all agree that adding (and documenting) them is trivial, e.g., the -i option added to patch in SUSv2.

To get new features, some manufacturers acknowledge that their users may choose to install programs which the manufacturers did not provide. For instance, in the comp.unix.solaris newsgroup thread Which shell? (December 2005), one of Sun's developers commented:

"Glenn" <eponymousalias@xxxxxxxxx> writes in comp.unix.solaris:
|More generally, I have some trouble believing that Sun is tracking
|current versions of a number of common utilities.  A good example
|is /usr/bin/patch, the version of which is *way* out of date.  There
|I definitely was forced to get my own updated version to make it
|useable.

Sun ships a newer one as "gpatch", which works the way you expect,
instead of the way the standards require - which is why /usr/bin/patch
seems broken.

--
Alan Coopersmith * alanc@xxxxxxxxxxxxxxxxxxxx * Alan.Coopersmith@xxxxxxx

and later in the thread

js@xxxxxxxxxxxxxxx (Joerg Schilling) writes in comp.unix.solaris:
|From my experiences, the program called "patch" is completely unusable
|in several cases on Solaris independelty of the -p option parameter
|you try out.

Right - which is why I just always use the included /usr/bin/gpatch
and ignore /usr/bin/patch.

--
Alan Coopersmith * alanc@xxxxxxxxxxxxxxxxxxxx * Alan.Coopersmith@xxxxxxx

Later (2010), Coopersmith commented on the OpenSolaris developer's list:

I think GNU patch (which is the descendant of the original Larry Wall patch)
is the only sane choice here - it's what /usr/bin/patch points to in OpenSolaris
now, after many many years of customers complaining our patch was incompatible
with many patch files in the wild.

with a followup from Christopher Bergström:

To intentionally open a can of worms here.. GNU patch on osunix was
moved to /usr/bin.. I dont' know if it's POSIX compliant, but I can say
that's the route we went.. I'm happy to revise it if other feature-rich
tools were available. I think despite my own philosophical beliefs GNU
patch is what a lot of people coming to OSUNIX/OpenSolaris would like to
see..

That explains why OpenSolaris did not provide source for patch. It was

  1. unnecessary because it was published in another place, and
  2. a problem due to licensing.

The change was not limited to OpenSolaris. Solaris itself uses GNU patch:

thomas@vbx-solaris11:~$ uname -a
SunOS vbx-solaris11 5.11 11.1 i86pc i386 i86pc
thomas@vbx-solaris11:~$ /usr/bin/patch --version
patch 2.5.9
Copyright (C) 1988 Larry Wall
Copyright (C) 2003 Free Software Foundation, Inc.

This program comes with NO WARRANTY, to the extent permitted by law.
You may redistribute copies of this program
under the terms of the GNU General Public License.
For more information about these matters, see the file named COPYING.

written by Larry Wall and Paul Eggert
thomas@vbx-solaris11:~$

That version was published in mid-2004 according to these comments:

Regress ...

Not all progress is “forward” — some prefer to reinvent the past.

Consider Jörg Schilling's version, whose announcement in April 2011 asserted

-       A new program was added: patch
        This is based on the last patch(1) implementation from Larry Wall
        and in contrary to the GNU fork from the same software, it tries to be
        closer to the POSIX standard requirements.

None of that is true:

Here is a summary of the initial version (perhaps an afternoon's work):

Here is a diffstat of the initial version:

 ChangeLog    |  100 ----
 Configure    | 1425 -----------------------------------------------------------
 EXTERN.h     |   15 
 INTERN.h     |   15 
 MANIFEST     |   24 
 Makedist     |   11 
 Makefile     |   28 +
 Makefile.SH  |  102 ----
 Makefile.man |   18 
 README       |   83 ---
 common.h     |  169 !!!!=
 config.H     |   33 -
 config.h.SH  |  136 -----
 inp.c        |  322 !!!!!!!!!!===
 inp.h        |   18 
 malloc.c     |  467 -------------------
 patch.1      |  558 +++++++++++++++++++++++
 patch.c      |  990 +++++!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!=====
 patch.man    |  472 -------------------
 patchlevel.h |    1 
 pch.c        | 1394 +++-!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!===========
 pch.h        |   36 !
 util.c       |  416 ++!!!!!!!!!!!===
 util.h       |   74 --
 version.c    |   28 -
 version.h    |    9 
 26 files changed, 878 insertions(+), 3025 deletions(-), 2486 modifications(!), 555 unchanged lines(=)

Here is an amended diffstat ignoring (most of) the whitespace changes, as well as the removed files:

 Makefile     |   28 +
 Makefile.man |   18 
 common.h     |  166 !!!!=
 inp.c        |  363 !!===========
 inp.h        |   17 
 patch.1      |  558 +++=================
 patch.c      | 1160 +++++++--!!!!!!===========================
 patchlevel.h |    1 
 pch.c        | 1575 ++-!!!!====================================================
 pch.h        |   36 !
 util.c       |  471 +!!!============
 util.h       |   74 -!
 12 files changed, 453 insertions(+), 175 deletions(-), 636 modifications(!), 3203 unchanged lines(=)

Thereafter, Schilling copied selected features from GNU patch. However, he has not noticed all of the fixes which apply to his version. The next version, 12u6, included a bug-fix for fetchname which is still not present in any of Schilling's updates as of March 2017:

Mon Mar 16 14:10:42 1992  David J. MacKenzie  (djm@wookumz.gnu.ai.mit.edu)

        * patchlevel.h: PATCHLEVEL 12u6.

Sat Mar 14 13:13:29 1992  David J. MacKenzie  (djm at frob.eng.umd.edu)

        ...

        * util.c (fetchname): Test of stat return value was backward.
        From csss@scheme.cs.ubc.ca.

There are other changes, but they are mostly pointless, cosmetic changes. Such is authorship.

Nuisances

Bash-dependency

Ubuntu #209537 introduced a misfeature. Briefly, it checks if a COLUMNS environment variable is set, and uses whatever value atoi decodes to override the default of 80 columns for the report width. My advice was overruled (the bug report offers a disingenuous reason—see this for the context in which the remarks were made).
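A minimal sketch of the behavior described above, not the text of the Ubuntu patch. The names report_width_from_env and DEFAULT_WIDTH are hypothetical, and the absence of any validation is an assumption drawn from the phrase "whatever value atoi decodes":

```c
#include <stdlib.h>

#define DEFAULT_WIDTH 80        /* hypothetical name for the 80-column default */

/*
 * Hypothetical helper illustrating the described behavior:
 * if COLUMNS is set, its decoded value replaces the default width.
 */
int report_width_from_env(void)
{
    const char *value = getenv("COLUMNS");
    int width = DEFAULT_WIDTH;

    if (value != NULL) {
        /* Assumed: no validation, so non-numeric text decodes to 0 */
        width = atoi(value);
    }
    return width;
}
```

With COLUMNS=132 exported this yields 132, but COLUMNS=abc yields 0. COLUMNS is normally a shell variable rather than an exported one, so a program sees it only when the user or a script exports it.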

There is more than one reason why that is not a suitable change:

The change was applied to the Debian package two years later, without discussion, immediately after a change of package maintainers (see Debian #588876). A user pointed out part of the problem with the change in Debian #697696.

In diffstat 1.63, I added a check for whether the output goes to a terminal (versus a pipe or file), and set the default width to match the terminal's width in that case. That eliminated the need for the patches in the Debian package, as seen here.
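One way to make such a check, sketched here with the POSIX isatty call and the widely available TIOCGWINSZ ioctl. This is an illustration only, not the code in diffstat 1.63. The names default_report_width and DEFAULT_WIDTH are hypothetical:

```c
#include <sys/ioctl.h>
#include <termios.h>
#include <unistd.h>

#define DEFAULT_WIDTH 80        /* hypothetical name for the 80-column default */

/*
 * Hypothetical helper: return the terminal's width if fd refers to a
 * terminal which reports a usable size, otherwise the fixed default.
 */
int default_report_width(int fd)
{
    struct winsize ws;

    if (isatty(fd)
        && ioctl(fd, TIOCGWINSZ, &ws) == 0
        && ws.ws_col > 0) {     /* the column count may be reported as 0 */
        return (int) ws.ws_col;
    }
    return DEFAULT_WIDTH;       /* pipe, file, or unknown size */
}
```

Calling default_report_width(STDOUT_FILENO) gives the terminal's width for interactive use, while output redirected to a pipe or file keeps the 80-column default regardless of the environment.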

Licensing

I changed the copyright notice of diffstat to use MIT-X11 licensing at the beginning of 1998 (version 1.26). Before that, I had used the same wording as I did in other works distributed from 1994 onward, e.g., the resizeterm patch. This change was likely prompted by my work to relicense ncurses, but also took into account an old (October 1996) discussion with Joey Hess.

The license is (of course) given in full as a comment at the top of the files which comprise the program. Notwithstanding this, some packagers find it inconvenient to cite the license properly. Here are a few examples:

Documentation


Download

Packages for diffstat

Version control systems

Version control systems which have implemented a diffstat feature include

Some are slower:

A few tools extend one or more of the version control systems, enabling their diffstat features to be used via the tool:

Other Uses

Besides imitating diffstat, there are embedded uses of the original tool:

Other implementations