http://invisible-island.net/
Copyright © 1996-2014,2015 by Thomas E. Dickey


DIFFSTAT – make histogram from diff-output

Synopsis

diffstat reads the output of diff and displays a histogram of the insertions, deletions, and modifications per-file. It is useful for reviewing large, complex patch files.
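
As an illustration only (this is not diffstat's source, which is a C program and also handles other diff formats), the idea can be sketched in Python for the unified-diff case. The names `count_changes` and `histogram` are hypothetical. The sketch assumes well-formed unified diffs, where a change shows up as inserted and deleted lines, and it keeps file names exactly as they appear on the `+++` line.

```python
import re

# Matches a unified-diff hunk header such as "@@ -1,3 +1,4 @@".
HUNK = re.compile(r"^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@")


def count_changes(diff_text):
    """Return {filename: [insertions, deletions]} for a unified diff."""
    counts = {}
    current = None
    old_left = new_left = 0  # lines still expected in the current hunk
    for line in diff_text.splitlines():
        if old_left > 0 or new_left > 0:
            # Inside a hunk: the first character classifies the line.
            if line.startswith("+"):
                counts[current][0] += 1
                new_left -= 1
            elif line.startswith("-"):
                counts[current][1] += 1
                old_left -= 1
            elif line.startswith("\\"):
                pass  # "\ No newline at end of file" marker
            else:
                old_left -= 1  # context line, present in both versions
                new_left -= 1
            continue
        if line.startswith("+++ "):
            # "+++ path<TAB>timestamp" names the new version of the file.
            current = line[4:].split("\t")[0].strip()
            counts.setdefault(current, [0, 0])
        else:
            match = HUNK.match(line)
            if match and current is not None:
                # A missing count in the header means one line.
                old_left = int(match.group(1) or 1)
                new_left = int(match.group(2) or 1)
    return counts


def histogram(counts, width=40):
    """Format one line per file: name, total changes, and a +/- bar."""
    largest = max((ins + dels for ins, dels in counts.values()), default=0)
    scale = min(1.0, width / largest) if largest else 1.0
    name_width = max((len(name) for name in counts), default=0)
    lines = []
    for name, (ins, dels) in counts.items():
        bar = "+" * int(ins * scale + 0.5) + "-" * int(dels * scale + 0.5)
        lines.append(f"{name:<{name_width}} | {ins + dels:4d} {bar}")
    return lines
```

Counting down the line totals from each hunk header, rather than looking only at the first character of every line, keeps an inserted line whose text begins with "++ " from being mistaken for a file header.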

History

I originally wrote this in 1992, along with an associated utility rcshist, to trace the change history of collections of files. Since then, I've found it most useful for summarizing source patches.

See the changelog for details.

Impact

Initially, I used diff and diffstat in a script named diff-patch. In 1994, I started using makepatch which gave more consistent results.

It was not until early 1996 that others paid much attention to the tool. At that point, developers on both the XFree86 and ncurses mailing lists started using it.

One of those developers (Tony Nugent) pointed it out to Linus Torvalds in July 1996, on linux.dev.kernel. Much later (in 2002), it was documented as part of the process for submitting Linux kernel patches for BitKeeper (BK) in Linux 2.4.20. Linus commented on the process:

Ok, pulled. But _please_ do this the regular way next time. There's even a script to help you do it in linux/Documentation/BK-usage/bk-mak-sum, which does it all for you for BK patches.

(many people end up doing their own thing, you don't have to use that particular script, of course. But the important thing I want is that the _email_ should contain enough information to make a good first pass judgement on what the patch does, and in particular it is important for me to see what a "bk pull" will actually change.)

That's why the "diffstat" is important to me if I do a BK pull – and why I want to see the patches as plaintext if I apply stuff to generic files..

Later, in 2005, Linus wrote git, which can generate a diffstat. It adds some enhancements (git is able to track moves and renames of files).

Dependencies

Of course, I did not write diffstat as an isolated program. Rather, it provides a useful summary of the output of diff. That same output is typically processed by the patch program to apply changes to programs. Early on, this was the predominant method for distributing changes to programs. That was for two reasons:

For both of these reasons, I still provide diffs for the larger programs (in addition to complete sources):

Diff

Original diff

Context diff

Unified diff

Patch

Original patch

Besides being used in the usenet sources groups, Larry Wall's program was distributed as part of X11. The file sizes and dates indicate that there were ongoing improvements (data gleaned from the X distribution tarballs):

                         Diffstat
Release      Date        Added  Removed  Modified  Unchanged  Notes
net.sources  1984/11/09    N/A      N/A       N/A       1668  patch 1.1
net.sources  1984/11/29    217       12       182       1474  patch 1.2
mod.sources  1985/05/09    498        0        20       1600  patch 1.3
X11R1        1987/09/12    199      312        28       1778  patch 1.3, copyright 1984 by Larry Wall
X11R2        1987/12/31   4169     1328       224        453  patch kit 2.0 (patch level 9), copyright 1986 by Larry Wall
X11R3        1988/08/31   1278      203       520       3961  patch kit 2.0 (patch level 12), copyright 1988 by Larry Wall
X11R4        1988/08/30      0        1         2       5841
X11R5        1988/08/30      0        0         0       5842
X11R6        1993/05/28   1396       39       361       4736  Wayne Davison added support for unified diff 1990/05/01
X11R6.1      1994/09/14    226        9        25       5924  Stephen Gildea added ifdef's for WIN32
X11R6.3      1994/09/14      0        0         0       7273
X11R6.4      1994/09/14      0        0         0       7273
X11R6.5.1    2000/08/21      0        0        20       6155  changed CVS identifier
X11R6.6      2000/08/17      0        0         0          0

I distinguish contributors versus authors based on a 20% threshold. By this rule, patch had two authors: Larry Wall and Wayne Davison.
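
The text does not say how the 20% is computed. One possible reading, offered here only as an assumption, is the share of added plus modified lines among all lines of the newer version. Under that reading, the X11R6 row (Wayne Davison's changes) comes to about 27% and the X11R6.1 row (Stephen Gildea's changes) to about 4%. The helper names below are hypothetical, and the figures are taken from the table above.

```python
THRESHOLD = 0.20  # the 20% rule from the text


def changed_share(added, modified, unchanged):
    """Share of the newer file made up of added or modified lines.

    Assumption: removed lines are left out, because they do not
    appear in the newer file.
    """
    changed = added + modified
    return changed / (changed + unchanged)


def role(added, modified, unchanged):
    """Classify a set of changes as 'author' or 'contributor'."""
    share = changed_share(added, modified, unchanged)
    return "author" if share >= THRESHOLD else "contributor"


# (added, modified, unchanged) copied from two rows of the table.
rows = {
    "X11R6": (1396, 361, 4736),
    "X11R6.1": (226, 25, 5924),
}

for release, figures in rows.items():
    print(f"{release}: {changed_share(*figures):.1%} -> {role(*figures)}")
```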

Unified patch

Nuisances

Bash-dependency

A change (see Ubuntu #209537) introduced a misfeature. Briefly, it checks whether a COLUMNS environment variable is set, and uses whatever value atoi decodes from it to override the default report width of 80 columns. My advice was overruled (the bug report offers a disingenuous reason; see this for the context in which the remarks were made).
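
The behavior described above can be sketched as follows. This is a Python emulation for illustration, not the patch itself: `atoi` here is a hypothetical helper that mimics the C library function for in-range values, and `report_width` is an assumed name.

```python
import os
import re

DEFAULT_WIDTH = 80  # default report width mentioned in the text


def atoi(text):
    """Mimic C's atoi for in-range values.

    Skips leading whitespace, reads an optional sign and decimal
    digits, ignores anything after them, and yields 0 when no
    digits are found.
    """
    match = re.match(r"[ \t\n\r\f\v]*([+-]?[0-9]+)", text)
    return int(match.group(1)) if match else 0


def report_width(environ=os.environ):
    """Return the report width: COLUMNS if set, otherwise the default."""
    value = environ.get("COLUMNS")
    if value is None:
        return DEFAULT_WIDTH
    # Whatever atoi decodes is used as-is, with no validation.
    return atoi(value)
```

Because the decoded value is used without checking, a setting such as `COLUMNS=wide` yields a width of 0 in this sketch, and `COLUMNS=12abc` yields 12.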

There is more than one reason why that is not a suitable change:

The change was applied to the Debian package two years later, without discussion, immediately after a change of package maintainers (see Debian #588876). A user pointed out part of the problem with the change in Debian #697696, but made no headway with yet another maintainer.

Licensing

I changed the copyright notice of diffstat to use MIT-X11 licensing at the beginning of 1998 (version 1.26). Before that, I had used the same wording as I did in other works distributed from 1994 onward, e.g., the resizeterm patch. This change was likely prompted by my work to relicense ncurses, and also took into account an old (October 1996) discussion with Joey Hess.

The license is (of course) given in full as a comment at the top of the files which comprise the program. Notwithstanding this, some packagers find it inconvenient to cite the license properly. Here are a few examples:

Documentation


Download

Packages for diffstat

Version control systems

Version control systems which have implemented diffstats include:

Some are slower:

A few tools extend one or more of the version control systems, enabling their diffstat features to be used via the tool:

Other Uses

Besides imitating diffstat, there are embedded uses of the original tool:

Other implementations