http://invisible-island.net/
Copyright © 1996-2019,2022 by Thomas E. Dickey
diffstat reads the output of diff and displays a histogram of the insertions, deletions, and modifications per-file. It is useful for reviewing large, complex patch files.
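The following is a minimal Python sketch of that kind of per-file tally. It is not diffstat's code, and it makes several simplifying assumptions: the input is a unified diff, only insertions and deletions are counted, the file name is taken from the "+++" line without stripping any prefix, and the scaling is its own. The function uses the line counts in each hunk header to decide which lines belong to a hunk, so a deleted line that happens to begin with "--" is not mistaken for a file header.

```python
import re

# Unified-diff hunk header; a missing count means 1, e.g. "@@ -5 +5 @@".
HUNK = re.compile(r"^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@")


def count_changes(patch_text):
    """Return {file name: [inserted, deleted]} for a unified diff."""
    stats = {}
    name = None
    old_left = new_left = 0  # lines still expected in the current hunk
    for line in patch_text.splitlines():
        if old_left > 0 or new_left > 0:
            # Inside a hunk the first character classifies the line, so a
            # deleted line that happens to read "--- x" is not a header.
            if line.startswith("+"):
                stats[name][0] += 1
                new_left -= 1
            elif line.startswith("-"):
                stats[name][1] += 1
                old_left -= 1
            elif not line.startswith("\\"):  # ignore "\ No newline ..."
                old_left -= 1  # a context line counts on both sides
                new_left -= 1
        elif line.startswith("+++ "):
            name = line[4:].split("\t")[0]
            stats.setdefault(name, [0, 0])
        elif name is not None:
            m = HUNK.match(line)
            if m:
                old_left = int(m.group(1) or 1)
                new_left = int(m.group(2) or 1)
    return stats


def histogram(stats, width=50):
    """One line per file; bars shrink only if the biggest would not fit."""
    if not stats:
        return []
    biggest = max(ins + dels for ins, dels in stats.values())
    scale = min(1.0, width / biggest) if biggest else 1.0
    namew = max(len(n) for n in stats)
    return [
        f"{n.ljust(namew)} | {ins + dels:4d} "
        + "+" * round(ins * scale) + "-" * round(dels * scale)
        for n, (ins, dels) in stats.items()
    ]


SAMPLE = """\
--- a/hello.c\t2022-01-01
+++ b/hello.c\t2022-01-02
@@ -1,3 +1,4 @@
 #include <stdio.h>
+#include <stdlib.h>
 int main(void) {
-    return 0;
+    return EXIT_SUCCESS;
"""

if __name__ == "__main__":
    print("\n".join(histogram(count_changes(SAMPLE))))
```

Run as a script, it prints `b/hello.c |    3 ++-` for the sample patch. The real program also reads other diff formats, reports modified lines, and has its own options for names and widths.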
I originally wrote this in 1992, along with an associated utility rcshist, to trace the change history of collections of files. Since then, I've found it most useful for summarizing source patches.
See the changelog for details:
Initially, I used diff and diffstat in a script named diff-patch.
In 1994, I started using makepatch, which gave more consistent results.
It was not until early 1996 that the tool attracted much attention from others. At that point, developers on both the XFree86 and ncurses mailing lists started using it.
One of those developers (Tony Nugent) pointed it out to Linus Torvalds in July 1996, on linux.dev.kernel. Much later (in 2002), it was documented as part of the process for submitting Linux kernel patches for BitKeeper (BK) in Linux 2.4.20. Linus commented on the process:
Ok, pulled. But _please_ do this the regular way next time. There's even a script to help you do it in linux/Documentation/BK-usage/bk-mak-sum, which does it all for you for BK patches.
(many people end up doing their own thing, you don't have to use that particular script, of course. But the important thing I want is that the _email_ should contain enough information to make a good first pass judgement on what the patch does, and in particular it is important for me to see what a "bk pull" will actually change.)
That's why the "diffstat" is important to me if I do a BK pull – and why I want to see the patches as plaintext if I apply stuff to generic files..
Later, in 2005, Linus wrote git, which can also generate a diffstat. Git's version has some enhancements (for example, it can track moves and renames of files).
Of course, I did not write diffstat as an isolated program. Rather, it provides a useful summary of the output of diff. That same output is typically processed by the patch program to apply changes to programs. Early on, this was the predominant method for distributing changes to programs. That was for two reasons:
For both of these reasons, I still provide diffs for the larger programs (in addition to complete sources):
Just “diff” requires some clarification. Early on, I had these issues in mind:
diff was not the first file-comparison program which I had encountered. Some of those were close enough that I thought I might extend diffstat to analyze those.
diff programs on different systems gave different results. Some of these differences were later addressed by standardization; however, diffstat has a few quirks to handle some of these variations.
While I used other systems (IBM, Univac) during the 1970s, the first on which I recall a file-comparison program were the DEC systems where I developed (RT-11, TENEX or DEC System 10). The Utility Programs page by D A Duce is a useful summary for systems at the time (1976), noting (for DEC System 10):
A utility exists to compare files, and this may be used to print certain special types of file.
According to manuals and other summaries, the PDP-10 program was named “filcom” while RT-11 had “srccom”:
filcom, including the /u switch which shows change-bars.
PDP-10's filcom and RT-11's srccom marked chunks of difference with asterisks, but did not number lines. Rather, they showed page numbers, and also used numbers 1 and 2 to refer to the two files.
Here is an example, from the assembly language manual:
.R FILECOM
*LPT:/4L=FILEA,FILEB
FILE 1) DSK:FILEA CREATED: 1456 17-JAN-1972
FILE 2) DSK:FILEB CREATED: 1456 17-JAN-1972
1)1 FILE A
1) A
1) B
1) C
1) D
1) E
1) F
1) G
****
2)1 FILE B
2) A
2) B
2) C
2) G
**************
1)1 K
1) L
1) M
1)2 N
****
2)1 1
2) 2
2) 3
2)2 N
**************
1)2 W
****
2)2 4
2) 5
2) W
**************
By the time I wrote diffstat in 1992, I had encountered examples such as this (from dead-systems) on VMS, showing line-numbers:
$ differences SYSTARTUP_VMS.COM SYSTARTUP_VMS.TEMPLATE
************
File SYS$COMMON:[SYSMGR]SYSTARTUP_VMS.COM;4
  360   $ @SYS$STARTUP:TCPIP$STARTUP.COM
  361   $!
******
File SYS$COMMON:[SYSMGR]SYSTARTUP_VMS.TEMPLATE;1
  360   $!$ @SYS$STARTUP:TCPIP$STARTUP.COM
  361   $!
************
************
File SYS$COMMON:[SYSMGR]SYSTARTUP_VMS.COM;4
  408   $! Mounting additional disks
  409   $ MOUNT/SYSTEM DUA1 DATA1
  410   $ MOUNT/SYSTEM DUA2 DATA2
  411   $!
  412   $ EXIT
******
File SYS$COMMON:[SYSMGR]SYSTARTUP_VMS.TEMPLATE;1
  408   $ EXIT
************
Number of difference sections found: 2
Number of difference records found: 5
DIFFERENCES /IGNORE=()/MERGED=1-
    SYS$COMMON:[SYSMGR]SYSTARTUP_VMS.COM;4-
    SYS$COMMON:[SYSMGR]SYSTARTUP_VMS.TEMPLATE;1
The file-comparison program on VAX/VMS was named “differences” rather than “filcom” largely because it was a redesign, discarding old cruft. The name also hides a quirk: the VMS command interpreter used only the first 4 characters to distinguish commands. Because it also permitted abbreviations, anything beginning with "diff" would run the "differences" command.
Like the Unix command, the VMS
file-comparison utility was designed with the notion that its
output could be used to apply changes via another tool such as
ed
. The VMS equivalent is
sumslp
(originally slp), which
was probably inspired by the Univac slp
processor.
The fc
program on MS-DOS was influenced by
filcom
. Infoworld (February
14, 1979) had a short article:
TEXT EDITOR FOR 8080 AND Z-80 DISK SYSTEMS AVAILABLE FROM MICROSOFT
Microsoft has announced the availability of Edit-80, a random access, line-oriented text editor for 8080 and Z-80 systems.
According to the company, Edit-80 features random line access to disk files, providing almost instantaneous access to any record of the file, even if the available memory is smaller than the file being edited.
In addition to the standard line commands to insert, delete, print or replace lines of text, the package offers such features as automatic line renumbering, global find and substitute, multiple page files and the ability to read files without line numbers. There is also an Alter Mode which allows editing portions of individual lines.
The edited files are not written to disk until a write command is given, and the original file is always saved as back-up.
The Edit-80 package includes a utility called Filcom, which compares source or binary files and outputs the differences between them. The editor runs on any 8080 or Z-80 system which uses the CP/M operating system. Price of the package is $120. The manual alone can be purchased for $10. Contact Microsoft, 819 Two Park Central Tower, Albuquerque, NM 87108.
The choice of name is not just a coincidence. Microsoft's 1980 catalog description shows the influence of DEC on their thinking:
EDIT-80 is a random access, line oriented text editor similar to those used on large computers like the DEC PDP-10 and IBM 360.
Roy Allen's A History of the Personal Computer mentions PDP-10s repeatedly.
In MS-DOS, the program was renamed “fc” and is still provided in Windows. It supports Unicode now. Any of the other documented options could have been in the initial version in 1979.
Here is the PDP-10 example using modern “fc”:
Comparing files filea and FILEB
***** filea
C
D
E
F
G
***** FILEB
C
G
*****
***** filea
J
K
L
M
***** FILEB
J
1
2
3
*****
***** filea
V
W
***** FILEB
V
4
5
W
*****
There are still more file comparison programs which I had in mind. For instance
diff does).
filcom).
diff-like program using the algorithm presented in A File Comparison Program, by Webb Miller and Eugene W. Myers, published in Software: Practice and Experience, Vol. 15, No. 11 (November 1985), pp. 1025-1040. GNU diff (written about the same time) also used this, according to its documentation. Our motivation for writing this was the same as for GNU diff: to have an implementation free of the AT&T license. I later added a copy of that code to my utilities library as part of the work I did on rcshist.
Paul Heckel's paper A Technique for Isolating Differences Between Files (April 1978, CACM) described an algorithm which he stated he had used five years earlier. For comparison, Heckel cited filcom because it was a well-known implementation. He mentioned a few other algorithms in his paper (including Hunt and McIlroy, An algorithm for differential file comparison, Computer Science Technical Report 41, Bell Telephone Labs, August 1976), but none had been used in a product such as filcom.
Hunt and McIlroy's paper cited research results and described a prototype of diff, but did not compare it with other implementations.
My interest here is, of course, in the output format used rather than in the nuances of the particular algorithm underneath. The original diff paper gave an example of its output which hints at the description in the Unix V6 manual page for diff:
0a1,1        1,1 d0
>w           <w
3,4 c 4,6    4,6 c 3,4
<c           <x
<d           <y
---          <z
>x           ---
>y           >c
>z           >d
6,7 d 7      7a6,7
<f           >f
<g           >g
Oddly enough, the manual page mentions everything about the output format except the use of brackets for the changed lines.
The 3BSD source code for /usr/src/cmd/diff.c
(dated 1979, corresponding to Unix V7's diff) shows them used
as “<
” and
“>
” markers in the
change()
function. Although the manual page at that
point mentioned the brackets, it did not mention the space which
the program added after the brackets:
Diff tells what lines must be changed in two files to bring them into agreement. If file1 (file2) is `-', the standard input is used. If file1 (file2) is a directory, then a file in that directory whose file-name is the same as the file-name of file2 (file1) is used. The normal output contains lines of these forms:
        n1 a n3,n4
        n1,n2 d n3
        n1,n2 c n3,n4
These lines resemble ed commands to convert file1 into file2. The numbers after the letters pertain to file2. In fact, by exchanging `a' for `d' and reading backward one may ascertain equally how to convert file2 into file1. As in ed, identical pairs where n1 = n2 or n3 = n4 are abbreviated as a single number.
Following each of these lines come all the lines that are affected in the first file flagged by `<', then all the lines that are affected in the second file flagged by `>'.
That is the oldest version of diff
for which I
found source code (in the CSRG cd-images from myrnet.com). The comments in the code
attributed the algorithm used to Harold Stone, and used
backspace-sequences to underline keywords, e.g.,
 * The cleverness lies in routine stone_. This marches
 * through the lines of file0, developing a vector klist
 * of "k-candidates". At step i a k-candidate is a matched
 * pair of lines x,y (x in file0 y in file1) such that
 * there is a common subsequence of lenght k
 * between the first i lines of file0 and the first y
 * lines of file1, but there is no such subsequence for
 * any smaller y. x is the earliest possible mate to y
 * that occurs in such a subsequence.
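The command lines described in the manual page quoted above are easy to recognize mechanically. As a hedged Python sketch (the names parse, fmt_range and reverse are illustrative, not taken from diff's source), the following parses one command, expands abbreviated ranges, and applies the manual's trick of exchanging `a' and `d' to produce the matching file2-to-file1 command. It handles single commands only; reversing a whole script would also require reading it backward, as the manual notes.

```python
import re

# "n1[,n2] letter n3[,n4]"; blanks around the letter are tolerated because
# the early output quoted above printed commands such as "3,4 c 4,6".
COMMAND = re.compile(r"^(\d+)(?:,(\d+))?\s*([acd])\s*(\d+)(?:,(\d+))?$")


def parse(cmd):
    """Return (n1, n2, letter, n3, n4) with abbreviated ranges expanded."""
    m = COMMAND.match(cmd.strip())
    if m is None:
        raise ValueError(f"not a normal-diff command: {cmd!r}")
    n1, n2, letter, n3, n4 = m.groups()
    n1, n3 = int(n1), int(n3)
    n2 = int(n2) if n2 else n1
    n4 = int(n4) if n4 else n3
    return n1, n2, letter, n3, n4


def fmt_range(lo, hi):
    """As in ed, a range whose two ends are equal is written as one number."""
    return str(lo) if lo == hi else f"{lo},{hi}"


def reverse(cmd):
    """Rewrite a file1-to-file2 command as the file2-to-file1 command by
    exchanging 'a' and 'd' and swapping the two line ranges."""
    n1, n2, letter, n3, n4 = parse(cmd)
    letter = {"a": "d", "d": "a", "c": "c"}[letter]
    return fmt_range(n3, n4) + letter + fmt_range(n1, n2)


if __name__ == "__main__":
    for cmd in ("0a1,1", "3,4 c 4,6", "6,7 d 7"):
        print(f"{cmd:10} -> {reverse(cmd)}")
```

Applied to the left column of the original paper's example (0a1,1, 3,4 c 4,6 and 6,7 d 7), it yields 1d0, 4,6c3,4 and 7a6,7, which match the right column apart from the abbreviation of 1,1 to 1.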
The 4.0BSD source code (1980) implemented context diff. The manual page described it:
-c produces a diff with lines of context. The default is to present 3 lines of context and may be changed, e.g to 10, by -c10. With -c the output format is modified slightly: the output beginning with identification of the files involved and their creation dates and then each change is separated by a line with a dozen *'s. The lines removed from file1 are marked with `-'; those added to file2 are marked `+'. Lines which are changed from one file to the other are marked in both files with `!'.
It also added the recursive option:
-r causes application of diff recursively to common subdirectories encountered.
The only attribution in the code was for Harold Stone; the backspace-sequences were removed. Jonathan Gray has a Git repository of CSRG sources. For what it's worth, the initial check-in was by Bill Joy, but that source already implemented context diff (there is no useful source-history to determine who did what).
These changes were incorporated into Unix V8 (1985), as shown in the manual page.
Unified diff made a fourth format to consider (after ed-scripts, normal and context diffs).
Wayne Davison posted unidiff
to
comp.sources.misc in August 1990:
That posting appeared in volume 14, issue 70 (Davison apparently submitted this on August 22, the posting appeared early August 31, and it was archived September 6, 1990). From his announcement:
I've created a new context diff format that combines the old and new hunks into one unified hunk. The result? The unified context diff, or "unidiff." Posting your patch using a unidiff will usually cut its size down by around 25% (I've seen from 12% to 48%, depending on how many redundant context lines are removed). Even if the diffs are generated with only 2 lines of context, the savings still average around 20%. Keep in mind that *no information is lost* by the conversion process. Only the redundancy of having multiple identical context lines. [...] I've included:
o a patch to make gnudiff (v1.14) generate a unidiff.
o a patch to make patch (patchlevel 12) accept a unidiff.
o a versatile program called "unify" that can translate from a context diff (new- or old-style) into a unidiff, and from a unidiff into a true new-style context diff.
o a man page for unify.
o a 1.3k bandaid called "unipatch" that translates a unidiff into a context diff format that older versions of patch can understand. (It outputs a slightly degenerate form of a context diff (no '!'s) but it works great with patch.)
o a Makefile to get you going quickly.
--
\ /| / /|\/ /| /(_)     Wayne Davison
(_)/ |/ /\|/ / |/ \     davison@dri.com
(W A Y N e)             ...!uunet!drivax!davison
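Davison's unify program is not reproduced here. The Python function below is an independent sketch of the central step of such a translation, assuming well-formed new-style context hunks. It merges the "***" (old) and "---" (new) sections of a single hunk into one unified body. Lines marked "- " or "+ " keep their side. A run of "! " lines becomes the old run as deletions followed by the new run as insertions. Context lines are emitted once rather than twice, which is where the size savings come from. Hunk headers and file headers are left out, and the output uses the blank context marker of present-day unified diffs.

```python
def context_to_unified(old, new):
    """Merge the two sections of one new-style context-diff hunk.

    `old` and `new` hold the body lines of the "***" and "---" sections,
    each starting with a two-character marker: "  ", "- ", "+ " or "! ".
    diff omits a section that has no changes, so an empty list is rebuilt
    from the context lines of the other section.
    """
    if not old:
        old = [line for line in new if line.startswith("  ")]
    if not new:
        new = [line for line in old if line.startswith("  ")]
    out, i, j = [], 0, 0
    while i < len(old) or j < len(new):
        o = old[i] if i < len(old) else ""
        n = new[j] if j < len(new) else ""
        if o.startswith("- "):  # line only in the old file
            out.append("-" + o[2:])
            i += 1
        elif n.startswith("+ "):  # line only in the new file
            out.append("+" + n[2:])
            j += 1
        elif o.startswith("! "):  # changed block: old run, then new run
            while i < len(old) and old[i].startswith("! "):
                out.append("-" + old[i][2:])
                i += 1
            while j < len(new) and new[j].startswith("! "):
                out.append("+" + new[j][2:])
                j += 1
        else:  # context line, present in both sections
            out.append(" " + o[2:])
            i += 1
            j += 1
    return out


if __name__ == "__main__":
    print("\n".join(context_to_unified(["  a", "! b", "  c"],
                                       ["  a", "! B", "  c"])))
```

For that example the six body lines of the context hunk shrink to four unified lines: " a", "-b", "+B" and " c".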
A motivation was to reduce the size of patches (the output of diff, as used in the usenet source groups to distribute changes to a program).
Although few used “unidiff
” as such,
Davison's patches for GNU diff and Larry Wall's
patch program were used:
Mon Dec 3 14:23:55 1990  Richard Stallman  (rms at mole.ai.mit.edu)
    * diff.c (longopts, usage): unidiff => unified.
Sun Sep 23 22:49:29 1990  Richard Stallman  (rms at mole.ai.mit.edu)
    * context.c (print_context_script): Handle unidiff_flag.
    (print_context_header): Likewise.
    (print_unidiff_number_range, pr_unidiff_hunk): New functions.
    * diff.c (longopts): Add element for +unidiff.
    (main): Handle +unidiff and -u.
    (usage): Mention them.
There were earlier programs named “patch” (for example, DEC's binary-patch program which I used with RT-11 in the mid-1970s). Here we are only concerned with the source-patching program starting with Larry Wall's version in the mid-1980s. He announced more than one version to net.sources:
On October 27, 1986, he announced version 2.0 (calling it “patch kit”) on mod.sources:
Although posted in October, the file check-in dates show
September 17, 1986.
I found the posting and the twelve followup patches at
ftp://ftp.sunet.se/pub/usenet/ftp.uu.net/comp.sources.unix/volume7/patch2/
The patches do not appear in the mod.sources or net.sources archives; I have established dates for those as I did for the other sources (by inspection), and reconstructed the different fully-patched versions. Two of the patches (7 and 12) do not apply cleanly:
Patch  Date        Notes
1      1986/10/29  problem with backward search
2      1986/10/29  problem with context diff
3      1986/11/03  fix for 4.3-style context diff
4      1986/11/29  realloc-fix, context-diff fix
5      1986/11/29  detecting diffs in a file
6      1987/01/06  incorrect free
7      1987/01/31  workaround mangled patches
8      1987/02/16  synchronization for short chunks
9      1987/06/04  incorrect free
10     1988/06/03  many fixes...
11     1988/06/03  new Configure script
12     1988/06/22  portability fixes
Besides being used in the usenet sources groups, Larry Wall's program was distributed as part of X11. The file sizes and dates indicate that there were ongoing improvements (data gleaned from the X distribution tarballs):
                         --------- Diffstat ---------
Release      Date        Added Removed Modified Unchanged  Notes
net.sources  1984/11/09  N/A   N/A     N/A      1668       patch 1.1
net.sources  1984/11/29  217   12      182      1474       patch 1.2
mod.sources  1985/05/09  498   0       20       1600       patch 1.3
X11R1        1987/09/12  199   312     28       1778       patch 1.3, copyright 1984 by Larry Wall
X11R2        1987/12/31  4169  1328    224      453        patch kit 2.0 (patch level 9), copyright 1986 by Larry Wall
X11R3        1988/08/31  1278  203     520      3961       patch kit 2.0 (patch level 12), copyright 1988 by Larry Wall
X11R4        1988/08/30  0     1       2        5841
X11R5        1988/08/30  0     0       0        5842
X11R6        1993/05/28  1396  39      361      4736       added Wayne Davison's changes for unified diff
X11R6.1      1994/09/14  226   9       25       5924       Stephen Gildea added ifdef's for WIN32
X11R6.3      1994/09/14  0     0       0        7273
X11R6.4      1994/09/14  0     0       0        7273
X11R6.5.1    2000/08/21  0     0       20       7257       changed CVS identifier
X11R6.6      2000/08/17  0     0       0        7273
That large change with X11R6 is the point of this section.
Because Larry Wall made no changes to patch
after
releasing patch 12 to version 2.0, David J. MacKenzie (who worked
on GNU diff and patch) incorporated Davison's patch, and worked
with Davison to make a followup fix:
Sun Jan 20 20:18:58 1991  David J. MacKenzie  (djm at geech.ai.mit.edu)
    * Makefile.SH (all): Don't make a dummy `all' file.
    * patchlevel.h: PATCHLEVEL 12u3.
    * patch.c (nextarg): New function.
    (get_some_switches): Use it, to prevent dereferencing a null
    pointer if an option that takes an arg is not given one (is last
    on the command line). From Paul Eggert.
    * pch.c (another_hunk): Fix from Wayne Davison to recognize
    single-line hunks in unified diffs (with a single line number
    instead of a range).
    * inp.c (rev_in_string): Don't use `s' before defining it. From
    Wayne Davison.
Mon Jan 7 06:25:11 1991  David J. MacKenzie  (djm at geech.ai.mit.edu)
    * patchlevel.h: PATCHLEVEL 12u2.
    * pch.c (intuit_diff_type): Recognize `+++' in diff headers, for
    unified diff format. From unidiff patch 1.
Most of the changes (86%) listed in the X11R6
ChangeLog
file for patch
were from
David J. MacKenzie. Wayne Davison's initial set of changes
accounts for about 15 percent of that (177 lines added, 8 lines
removed). At the same time, MacKenzie was the main developer
working on GNU patch: 68% versus Paul Eggert with 26%. As
MacKenzie noted in the README
file for patch 2.1
(released June 11, 1993):
This version of patch contains modifications made by the Free Software Foundation, summarized in the file ChangeLog. Primarily they are to support the unified context diff format that GNU diff can produce, to support making GNU Emacs-style backup files, and to support the GNU conventions for option parsing and configuring and compilation. They also include fixes for some bugs. The FSF is distributing this version of patch independently because as of this writing, Larry Wall has not released a new version of patch since mid-1988. I have heard that he has been too busy working on other things, like Perl.
I distinguish contributors versus authors based on a 20% threshold. By this rule, patch had two authors: Larry Wall and David J. MacKenzie.
Here is a summary of changes made between Larry Wall's 2.0.12 (version 2.0 with patches 1-12 applied) and the updated version using MacKenzie's work:
ChangeLog | 290 +++++++++++
Configure | 1476 ++!!======================================================
EXTERN.h | 21
INTERN.h | 19
MANIFEST | 27
Makefile.SH | 116 !==
README | 111 +-===
backupfile.c | 338 +++++++++++++
backupfile.h | 37 +
common.h | 199 +======
config.H | 33 =
config.h.SH | 146 =====
inp.c | 364 +=============
inp.h | 18
malloc.c | 467 ==================
patch.c | 938 ++++!!==============================
patch.man | 554 +++!!================
patchlevel.h | 1
pch.c | 1311 +++++++============================================
pch.h | 36 =
util.c | 451 +++!=============
util.h | 88 ===
version.c | 28
version.h | 9
19 files changed, 1379 insertions(+), 39 deletions(-), 359 modifications(!), 5301 unchanged lines(=)
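The histogram above uses four symbols: "+" for insertions, "-" for deletions, "!" for modifications and "=" for unchanged lines, with the bars scaled down so that the largest file fits. A rough Python sketch of that layout follows. The scaling, the use of round(), the number column (taken here as the sum of all four counts) and the rule for counting a file as changed are all assumptions, not diffstat's actual rules.

```python
def bar(ins, dels, mods, same, scale):
    """Histogram bar, using the symbol order seen above: + - ! =."""
    return ("+" * round(ins * scale) + "-" * round(dels * scale)
            + "!" * round(mods * scale) + "=" * round(same * scale))


def report(rows, width=60):
    """rows: list of (name, inserted, deleted, modified, unchanged)."""
    biggest = max((sum(r[1:]) for r in rows), default=0)
    scale = min(1.0, width / biggest) if biggest else 1.0
    namew = max((len(r[0]) for r in rows), default=0)
    lines = []
    for name, ins, dels, mods, same in rows:
        total = ins + dels + mods + same  # assumption: all four are summed
        text = f"{name.ljust(namew)} | {total:4d} "
        text += bar(ins, dels, mods, same, scale)
        lines.append(text.rstrip())
    totals = [sum(r[k] for r in rows) for k in range(1, 5)]
    # Assumption: a file counts as changed when anything but "=" is nonzero.
    changed = sum(1 for r in rows if any(r[1:4]))
    lines.append(f"{changed} files changed, {totals[0]} insertions(+), "
                 f"{totals[1]} deletions(-), {totals[2]} modifications(!), "
                 f"{totals[3]} unchanged lines(=)")
    return lines


if __name__ == "__main__":
    print("\n".join(report([("a.c", 2, 1, 0, 7), ("b.h", 0, 0, 3, 4)])))
```

With those two hypothetical files, the sketch prints "a.c |   10 ++-=======", "b.h |    7 !!!====" and a summary line in the same shape as the one above.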
The X11R6 version is (except for a small change by Steve Gildea) identical to 2.0.12u9 by MacKenzie and Eggert. I found the corresponding patches and tarballs here:
http://www.nic.funet.fi/index/gnu/funet/historical-funet-gnu-area-from-early-1990s/
Before releasing GNU patch 2.1, David MacKenzie provided patches for the non-GNU version of patch. From the timeline, it appears that patch12u8 and patch12u9 marked a change in direction (or a finishing step), since the GNU patch changelog does not mention these, calling them patch12g8 and patch12g9 instead.
Patch                  Date        Notes
patch12u               1990/12/02  Apply unidiff patches
patch12u2              1991/01/07  unidiff patch 1
patch12u3              1991/01/20  includes unidiff fixes
patch12u4              1991/06/27
patch12u5              1991/12/03  includes unidiff fixes
patch12u6              1992/03/06  improve backup option
patch12u7              1992/07/06  improve backup option
patch12u8, patch12g8   1992/09/15  improve RCS/SCCS detection
patch12u9, patch12g9   1993/05/30  Paul Eggert commits start
patch12g10             1993/05/30
patch12g11             1993/05/31
2.1 release            1993/06/10
Most of Eggert's changes past patch12u8 dealt with configuration (using the autoconf-generated script and adjusting the makefile). The README in these tarballs provides an explanation which is omitted in the 2.1 release:
There are two GNU variants of patch: this one, which retains Larry Wall's interactive Configure script and has patchlevels starting with `12u'; and another one that has a GNU-style non-interactive configure script and accepts long-named options, and has patchlevels starting with `12g'. Unlike the 12g variant, the 12u variant contains no copylefted code, for the paranoid. The two variants are otherwise the same. They should be available from the same places.
Here is a comparison of the patch-2.0.12u9 and patch-2.0.12g11 tarballs:
COPYING | 339 +++++++++++++
ChangeLog | 358 ++-===========
Configure | 1475 -----------------------------------------------------------
EXTERN.h | 21
INSTALL | 118 ++++
INTERN.h | 19
MANIFEST | 27 -
Makefile.SH | 116 ----
Makefile.in | 88 +++
NEWS | 10
README | 99 -!
alloca.c | 475 +++++++++++++++++++
backupfile.c | 407 ++=============
backupfile.h | 46 =
common.h | 201 =======
config.H | 33 -
config.h.SH | 146 -----
config.h.in | 80 +++
configure | 1118 ++++++++++++++++++++++++++++++++++++++++++++
configure.in | 24
getopt.c | 731 +++++++++++++++++++++++++++++
getopt.h | 129 +++++
getopt1.c | 176 +++++++
inp.c | 363 ==============
inp.h | 18
malloc.c | 467 ------------------
patch.c | 960 +!!==================================
patch.man | 572 !=====================
patchlevel.h | 1
pch.c | 1305 !===================================================
pch.h | 36 =
rename.c | 51 ++
util.c | 462 -=================
util.h | 88 ===
version.c | 25 =
version.h | 9
29 files changed, 3554 insertions(+), 2398 deletions(-), 224 modifications(!), 4417 unchanged lines(=)
Both the u and g versions were mentioned in GNU's Bulletin:
does not mention patch
, however the initial
patch12u file was (re)constructed June 4, 1991.
The dates inside the other patch12u files correspond
to commit dates from GNU patch's change-log.
diff 1.15, grep/egrep 1.5, fgrep 1.1, and patch 2.0.12u5 The diff and [ef]grep programs are GNU's versions of the Unix programs of the same name. They are much faster than their traditional Unix versions. patch is Larry Wall's program to take diff's output and apply those differences to an original file to generate the patched version.
diff 1.15, grep/egrep 1.5, fgrep 1.1, and patch 2.0.12u6 The diff and [ef]grep programs are GNU's versions of the Unix programs of the same name. They are much faster than the traditional Unix versions. patch is Larry Wall's program to take diff's output and apply those differences to an original file to generate the patched version.
patch 2.0.12g8 patch is our version of Larry Wall's program to take diff's output and apply those differences to an original file to generate the modified version.
mentions patch
, but without mentioning the
version.
In subsequent work (mostly by Paul Eggert), there are three features which I use:
The --dry-run
option (1997).
The feature is useful; the name and the lack of a short option are problematic. Calling it -n would have been preferable, as in the make program.
However, that letter was already taken. Larry Wall's original program provided options for overriding the patch program's detection of the type of patch: -c, -e and -n for context, ed-script and normal diffs, respectively.
In writing this investigation, I am reminded that Poul-Henning Kamp added a -C (--check) option to FreeBSD's patch program in mid-1993.
However, GNU patch
is used in many more places
than FreeBSD's patch
program.
Improved pathname resolution (1999).
This did not happen all at once; Eggert spent a couple of years working on patch. My note in the ncurses FAQ indicates that the initial changes worsened the behavior. But finally, I saw that 2.5.4 in 1999 was an improvement over all of the preceding versions.
My experience here is illustrated by my reply to a user
who was having trouble applying a set of diffs with NetBSD's
patch
program:
On Thu, 29 Dec 2005, Stef Caunter wrote:
> Hi, sorry I am not aware of a mailing list for ncurses.
>
> I built ncurses-5.5 on NetBSD2.0.2 and had to make a couple of small changes.
> I did not use pkgsrc.
>
> in ncurses/base/lib_newterm.c "make" complained about 'value' being
> undeclared on line 118, so I added
>
> int value;
>
> before line 118 to get around that.
I'm not sure about this one - lib_newterm.c was last touched on 26 November, and looks ok there.
> Make stopped later in the compile saying it didn't know how to build
> base/legacy_coding.c, which I noticed was a patch target in some of the
> recent patches (all patches were applied in order from the 5.5 directory, I
> think successfully).
What version of patch are you using? iirc (noticed a month or two ago), the BSDers don't like the notion of using GNU patch, so /usr/bin/patch is some variation that has minimal features copied from it, but has the original's problems resolving pathnames. (At least - unlike Solaris - it can read unified diffs - Solaris's patch is a real antique).
hp-testdrive's copy asserts (it was here that I was testing)
    Patch version 2.0, patch level 12u8
though a look at the cvs here
    http://cvsweb.netbsd.org/bsdweb.cgi/src/usr.bin/patch/
confirms that they're more interested in its license than how well it works. I gave up on that version several years ago:
    http://invisible-island.net/ncurses/ncurses.faq.html#applying_patches
> I copied legacy_coding.c from the top-level directory to ncurses/base and the
> compile succeeded.
sounds familiar. The original patch would (under some situations that I don't recall) ignore the -p option, using its own guess and getting confused by files (such as "Makefile") that could appear in different levels of the patched tree. patch 2.1 was more cautious about guessing, and was usually ok. Since I generate almost all of my patches in the same way, it was easy to see that some were mistreated.
( hmm - you can always report it as a bug, and get the same sort of treatment as before ;-)
Regarding Solaris, I see that Solaris 9 provided unified diffs. At that point in time, I was still using Solaris 8.
Ability to handle long lines of text (1993).
According to the change-log, this was done in 1993—after release 2.1. The next release (2.2) was in 1997. The Git repository has no detailed information going back that far.
However, my updates for the ncurses FAQ show that I
modified the scheme for doing rollup
patches early in 1998 to work around line-length problems
with patch
2.1, by putting "wide" files into a
tar-file and applying patches as such only to files whose
lines were no longer than 1024 characters. At the time, I
noted that I tested the rollup patches with versions 2.1 and
2.5 (and found that the interim versions were not
suitable).
In later releases, I dropped the too-wide files (generated HTML in the Ada95 tree), and that problem became moot.
A few years later,
in 2002, a user reported on the bug-ncurses mailing list
that patch
had dumped core while applying one of
my patches. Paul Eggert sent mail pointing to a newer release
without that problem. On the whole, however, bug reports
implicating GNU patch
are rare.
Later versions added other improvements, but 2.5.4 is “good enough.” Here is a summary of changes between patch-2.0.12g11 and patch-2.5.4, ignoring the generated configure script and its utilities config.guess and config.sub:
AUTHORS | 9
COPYING | 340 ========
ChangeLog | 1918 ++++++++++++++++++++++++++++++++++++++++=======
EXTERN.h | 21
INSTALL | 182 +!!!
INTERN.h | 19
Makefile.in | 190 ++!
NEWS | 198 ++++
README | 53 !
aclocal.m4 | 409 ++++++++++
addext.c | 105 ++
alloca.c | 475 ------------
ansi2knr.1 | 36
ansi2knr.c | 678 +++++++++++++++++
argmatch.c | 306 +++++++
argmatch.h | 129 +++
backupfile.c | 403 ---!!!!==
backupfile.h | 60
basename.c | 55 +
basename.h | 9
common.h | 328 +++!!!!
config.h.in | 80 --
config.hin | 169 ++++
configure.in | 59 !
error.c | 250 ++++++
error.h | 78 +
getopt.c | 1076 ++++++++-!!=================
getopt.h | 169 +===
getopt1.c | 188 ====
inp.c | 483 +++!!!!!!!=
inp.h | 18
install-sh | 251 ++++++
m4/ccstdc.m4 | 95 ++
m4/d-ino.m4 | 42 +
m4/inttypes_h.m4 | 22
m4/largefile.m4 | 115 ++
m4/malloc.m4 | 35
m4/protos.m4 | 25
m4/realloc.m4 | 35
m4/utimbuf.m4 | 40 +
maketime.c | 501 ++++++++++++
maketime.h | 39
malloc.c | 38
memchr.c | 199 +++++
mkdir.c | 108 ++
mkinstalldirs | 40 +
partime.c | 956 ++++++++++++++++++++++++
partime.h | 77 +
patch.c | 1388 +++++++++++!!!!!!!!!!!!!!!========
patch.man | 1220 ++++++++++++++++--!!!!!!!!=====
patchlevel.h | 1
pc/chdirsaf.c | 34
pc/djgpp/README | 19
pc/djgpp/config.sed | 41 +
pc/djgpp/configure.bat | 27
pc/djgpp/configure.sed | 37
pch.c | 1923 +++++++++++++++-!!!!!!!!!!!!!!!===================
pch.h | 36
quotearg.c | 403 ++++++++++
quotearg.h | 109 ++
quotesys.c | 125 +++
quotesys.h | 9
realloc.c | 44 +
rename.c | 113 +
rmdir.c | 87 ++
util.c | 996 ++++++++++++++!!!!!!!!!=
util.h | 88 -!
version.c | 30
version.h | 9
xalloc.h | 52 +
xmalloc.c | 113 ++
71 files changed, 10938 insertions(+), 947 deletions(-), 3103 modifications(!), 3027 unchanged lines(=)
Given all of that background, you can see that while the X11R6
patch
utility is the non-copylefted GNU
patch
referred to in the README file, the
GNU developers continued to improve the copylefted version,
leaving the X11R6 version behind.
GNU patch
has more of an impact on developers
than the corresponding diff
utility, because the
patch
program is expected to handle the output of
diff
no matter where it came from.
The modern BSDs adopted (with the usual variations) the
non-copylefted GNU patch
releases:
The BSD patch
recognizes many of the GNU
patch
long options, although starting from
patch12u8, because the corresponding patch12g8
supported long options (predating any modern BSD). Linux-based
systems such as
Debian as well as the BSD-derived
OSX simply provide GNU patch
.
FreeBSD is a special case:
patch 2.1 (GPL) in June 1993.
patch through FreeBSD release 9.3 (2004).
patch program was copied early in 2013 from DragonFly,
/usr/bin/patch; the other would be renamed to gnupatch or bsdpatch, respectively.
patch was dropped.
-C (a dry-run option) added to FreeBSD in mid-1993, copied by OpenBSD in 1998 (while NetBSD copied GNU in 2005).
-I, -index-first (modified by FreeBSD), -S, and -skip.
The legacy Unix systems were slower to adopt patch at all, much less the improved version. For instance, Sun did not provide patch in SunOS 4, nor in Solaris 2.4, but only in the later Solaris releases starting around 1995. HP provided patch in HPUX 10.10 (February 1996).
The manual pages
for AIX, HPUX and IRIX64 were adapted from Larry Wall's manual
page from 2.0 patch 12 (2.0.12). The manual page for
Solaris 5 has more substantial changes, by omitting the
description of -s
, -S
and
-x
. It also shows a -i
option (perhaps
2-3 lines of source code, using
freopen
and checking for error), which also is
in SUSV2. Because OpenSolaris did not include Sun's version of
the patch
utility, it is only possible to gauge
influence by comparing documentation (whether Sun's
patch
is a reimplementation or direct reuse is
unknown).
POSIX reflects the changes in Unix:
After (sometimes lengthy) discussion, changes are submitted for review based largely upon existing practice. Sometimes the changes are limited to making the descriptions more precise, or providing explanations for the inclusion (or exclusion) of features. Compare the 1997 and 2004 descriptions to see how this applies to them.
Occasionally, new(er) features are added. Toward that end, Paul Eggert reported the omission of unified diff from the standard as a defect in June 2006 on the Austin review mailing list, proposing changes to the document to rectify the problem. The original documentation for unified diff by Davison lacked sufficient detail to be useful; e.g., this:
Index: patch.man
@@ -81,5 +81,5 @@
=.SH DESCRIPTION
=.I Patch
-will take a patch file containing any of the three forms of difference
+will take a patch file containing any of the four forms of difference
=listing produced by the
=.I diff
@@ -102,8 +102,10 @@
=.BR -c ,
=.BR -e ,
+.BR -n ,
=or
-.B -n
+.B -u
=switch.
-Context diffs and normal diffs are applied by the
+Context diffs (old-style, new-style, and unified) and
+normal diffs are applied by the
=.I patch
=program itself, while ed diffs are simply fed to the
@@ -377,4 +379,9 @@
=.sp
=will ignore the first and second of three patches.
+.TP 5
+.B \-u
+forces
+.I patch
+to interpret the patch file as a unified context diff (a unidiff).
=.TP 5
=.B \-v
Eggert supplied the details. He also reported problems in the document relating to empty files, special files, and symbolic links (which are not of direct interest to me).
I have subscribed to the Austin Group mailing lists (review and general discussion) since June 3, 1999, and keep a copy of topics I find interesting.
There does not appear to be a usable mail-archive for the lists; I rely upon my own mail-archives when discussing the Austin Group lists. However, some of the working notes are visible, e.g., these from October 2006. The Austin Group board finally adopted unified diff and the matching option in early 2007, for changes in Issue 7, using Eggert's proposed changes. You can read the rationale for diff:
The -u and -U options of GNU diff have been included. Their output format, designed by Wayne Davison, takes up less space than -c and -C format, and in many cases is easier to read. The format's timestamps do not vary by locale, so LC_TIME does not affect it. The format's line numbers are rendered with the %1d format, not %d, because the file format notation rules would allow extra <blank> characters to appear around the numbers.
and also note the changed status of the patch utility:
The patch utility is moved from the User Portability Utilities option to the Base. User Portability Utilities is now an option for interactive utilities.
Although the pace of standardization may seem slow (innovation in 1991, widespread use over the next ten years, proposal for standardization in 2006, approval in 2007, and publication in 2013), you have to remember who pays the bills: manufacturers of turnkey computer systems and systems integrators who are reluctant to have differences among these manufactured systems. New features are most readily accepted if all agree that adding (and documenting) them is trivial, e.g., the -i option added to patch in SUSv2.
To get new features, some manufacturers acknowledge that their users may choose to install programs which the manufacturers did not provide. For instance, in the comp.unix.solaris newsgroup thread Which shell? (December 2005; alternate link), one of Sun's developers commented:
"Glenn" <eponymousalias@xxxxxxxxx> writes in comp.unix.solaris:
|More generally, I have some trouble believing that Sun is tracking
|current versions of a number of common utilities. A good example
|is /usr/bin/patch, the version of which is *way* out of date. There
|I definitely was forced to get my own updated version to make it
|useable.
Sun ships a newer one as "gpatch", which works the way you expect, instead of the way the standards require - which is why /usr/bin/patch seems broken.
--
Alan Coopersmith * alanc@xxxxxxxxxxxxxxxxxxxx * Alan.Coopersmith@xxxxxxx
and later in the thread
js@xxxxxxxxxxxxxxx (Joerg Schilling) writes in comp.unix.solaris:
|From my experiences, the program called "patch" is completely unusable
|in several cases on Solaris independelty of the -p option parameter
|you try out.
Right - which is why I just always use the included /usr/bin/gpatch and ignore /usr/bin/patch.
--
Alan Coopersmith * alanc@xxxxxxxxxxxxxxxxxxxx * Alan.Coopersmith@xxxxxxx
Later (2010), Coopersmith commented on the OpenSolaris developer's list:
I think GNU patch (which is the descendant of the original Larry Wall patch) is the only sane choice here - it's what /usr/bin/patch points to in OpenSolaris now, after many many years of customers complaining our patch was incompatible with many patch files in the wild.
with a followup from Christopher Bergström:
To intentionally open a can of worms here.. GNU patch on osunix was moved to /usr/bin.. I dont' know if it's POSIX compliant, but I can say that's the route we went.. I'm happy to revise it if other feature-rich tools were available. I think despite my own philosophical beliefs GNU patch is what a lot of people coming to OSUNIX/OpenSolaris would like to see..
That explains why OpenSolaris did not provide source for
patch
. It was
The change was not limited to OpenSolaris: Solaris itself uses GNU patch:
thomas@vbx-solaris11:~$ uname -a
SunOS vbx-solaris11 5.11 11.1 i86pc i386 i86pc
thomas@vbx-solaris11:~$ /usr/bin/patch --version
patch 2.5.9
Copyright (C) 1988 Larry Wall
Copyright (C) 2003 Free Software Foundation, Inc.
This program comes with NO WARRANTY, to the extent permitted by law.
You may redistribute copies of this program
under the terms of the GNU General Public License.
For more information about these matters, see the file named COPYING.
written by Larry Wall and Paul Eggert
thomas@vbx-solaris11:~$
That version was published in mid-2004 according to these comments:
Not all progress is “forward” — some prefer to reinvent the past.
Consider Jörg Schilling's version, whose announcement in April 2011 asserted:
- A new program was added: patch This is based on the last patch(1) implementation from Larry Wall and in contrary to the GNU fork from the same software, it tries to be closer to the POSIX standard requirements.
None of that is true:
Schilling copied 12u5 (calling it 12u10, implying that it is the successor to 12u9), rather than the last implementation from Larry Wall. That ignores a year's work by MacKenzie and others.
The term “fork” might be applicable if there had been an alternative line of development. There was none. GNU initiated the development of both BSD and “GNU” patch programs.
The claim that it "tries to be closer to the POSIX standard" is misleading and unsubstantiated. The statement bears more on the vendors who have been certified (with versions derived from 2.0.12, the last implementation from Larry Wall) than on GNU patch.
Here is a summary of the initial version (perhaps an afternoon's work):
README file.
POSIXLY_CORRECT environment variable, adding checks for this as well as whether the program was run as “opatch” to decide what options are permitted.
long to off_t. This affected 10 lines of code, and is the "large file support" which he cited in the manual page as his specific contribution:
AUTHORS Larry Wall wrote the original version of patch. Wayne Davison added unidiff support. Joerg Schilling added modern portability code, large file support and code to support POSIX compliance. December 3, 1990 PATCH(1)
POSIXLY_CORRECT and stdarg changes.
indent.
#include's to make the program rely upon his library rather than standard C.
Here is a diffstat of the initial version:
ChangeLog | 100 ----
Configure | 1425 -----------------------------------------------------------
EXTERN.h | 15
INTERN.h | 15
MANIFEST | 24
Makedist | 11
Makefile | 28 +
Makefile.SH | 102 ----
Makefile.man | 18
README | 83 ---
common.h | 169 !!!!=
config.H | 33 -
config.h.SH | 136 -----
inp.c | 322 !!!!!!!!!!===
inp.h | 18
malloc.c | 467 -------------------
patch.1 | 558 +++++++++++++++++++++++
patch.c | 990 +++++!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!=====
patch.man | 472 -------------------
patchlevel.h | 1
pch.c | 1394 +++-!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!===========
pch.h | 36 !
util.c | 416 ++!!!!!!!!!!!===
util.h | 74 --
version.c | 28 -
version.h | 9
26 files changed, 878 insertions(+), 3025 deletions(-), 2486 modifications(!), 555 unchanged lines(=)
Here is an amended diffstat ignoring (most of) the whitespace changes, as well as the removed files:
Makefile | 28 +
Makefile.man | 18
common.h | 166 !!!!=
inp.c | 363 !!===========
inp.h | 17
patch.1 | 558 +++=================
patch.c | 1160 +++++++--!!!!!!===========================
patchlevel.h | 1
pch.c | 1575 ++-!!!!====================================================
pch.h | 36 !
util.c | 471 +!!!============
util.h | 74 -!
12 files changed, 453 insertions(+), 175 deletions(-), 636 modifications(!), 3203 unchanged lines(=)
Thereafter, Schilling copied selected features from GNU patch. However, he has not noticed all of the fixes which apply to his version. The next version, 12u6, included a bug-fix for fetchname which is still not present in any of Schilling's updates as of March 2017:
Mon Mar 16 14:10:42 1992  David J. MacKenzie  (djm@wookumz.gnu.ai.mit.edu)
    * patchlevel.h: PATCHLEVEL 12u6.
Sat Mar 14 13:13:29 1992  David J. MacKenzie  (djm at frob.eng.umd.edu)
    ...
    * util.c (fetchname): Test of stat return value was backward.
    From csss@scheme.cs.ubc.ca.
There are other changes, but they are mostly pointless and cosmetic. Such is authorship.
Ubuntu #209537 introduced a misfeature. Briefly, it checks whether a COLUMNS environment variable is set, and uses whatever value atoi decodes to override the default of 80 columns for the report width. My advice was overruled (the bug report offers a disingenuous reason; see this for the context in which the remarks were made).
There is more than one reason why that is not a suitable change:
The change modifies existing behavior—silently.
The change is redundant (the -w option already provides the desired functionality). Assuming that COLUMNS were set reliably to a useful value, one could do
    diffstat -w$COLUMNS
I noted this in my initial response on the topic.
The change does no error-checking. If that variable happens to be set (even to an empty string) then it will use that value.
The patch hardcodes STDOUT as a variable in the main function, rather than using STDOUT_FILENO (POSIX).
The COLUMNS and LINES environment variables are set by only a few applications (resize being one, and bash another). A third does not come to mind (certainly not another shell).
The resize program's environment variables are generally discarded (not applied to the shell); it is useful for making system calls to tell the computer the actual size of the terminal window. On the other hand, bash does set the variables.
In some configurations (Debian), bash sets a shell variable (which is not exported to subprocesses). In others (apparently the case with OpenSuSE), bash exports environment variables. That is before considering the complication of scripts which export the shell variables.
Some programs use these variables. In ncurses, for instance, this is a standard legacy feature, useful for cases where the operating system cannot provide the required information (see use_env, compare with use_tioctl). Otherwise it is a nuisance, because it interferes with programs that are able to obtain the screen size without this crutch (for this reason, xterm sets the variable only on a few very old platforms). A few other programs which do not use ncurses (such as ps, as noted in Novell #793536) can be overridden by COLUMNS.
This behavior of bash's has been seen as a nuisance, e.g., Novell #828877, along with these threads from bug-bash in 2013-01, 2013-07. After reading several reports, e.g., ArchLinux #32821, Debian #628638 (and blogs), a common thread emerges: bash has tied two behaviors together:
As long as the two behaviors (making bash work properly, and telling applications to use that information) are tied together, the feature is going to be a nuisance.
Because there is no relevant standard (for the behavior of shell programs), users with scripts which happen to set the variable would be impacted. Debian, for instance, switched its default /bin/sh from bash to dash years ago. The latter does nothing with COLUMNS, so scripts which work properly with dash would behave differently on a machine where bash was preferred.
The change was applied to the Debian package two years later, without discussion, immediately after a change of package maintainers (see Debian #588876). A user pointed out part of the problem with the change in Debian #697696.
In diffstat 1.63, I added a check if the output is to a terminal (versus a pipe or file), and set the default width to match the terminal's width. That eliminated the need for the patches in the Debian package, as seen here.
I changed the copyright notice of diffstat to use MIT-X11 licensing at the beginning of 1998 (version 1.26). Before that, I had used the same wording as I did in other works distributed from 1994 onward, e.g., the resizeterm patch. This change was likely prompted by my work to relicense ncurses, but also took into account an old (October 1996) discussion with Joey Hess.
The license is (of course) given in full as a comment at the top of the files which comprise the program. Notwithstanding this, some packagers find it inconvenient to cite the license properly. Here are a few examples:
A bug report for Haiku in 2010 commented that the packager had trouble finding the license, and referred to it as “DEC”, apparently unfamiliar with MIT-X11. A followup patch for the package script referred to it as the “diffstat” license.
On the other hand, OpenPkg labeled luit (also MIT-X11) as GPL. In February 2014 I notified them about that (having seen the 20130217 package), as well as about misattribution in their pdcurses package, without receiving a response.
Next (going down the scale), there are instances where the packager labels it “gpl-like” in the license field. That is analogous to a pet-shop owner who puts a sign saying “dog-like” in front of a feline. Some people might object.
SOT Linux did that with diffstat 1.28 in 2003.
Possibly related, Mageia did this with diffstat 1.57 as of 2014. I notified them in February 2014, but received no response. Finally, in July 2015, a different developer reported that the problem was addressed with diffstat 1.60 (when I updated this page on July 26, 2015, the released-packages page still showed 1.59 with “GPL-like”; the correction shows up in the "cauldron" page).
Version control systems which have implemented their own diffstat include
Some are slower:
A few tools extend one or more of the version control systems, enabling their diffstat features to be used via the tool:
Besides imitating diffstat, there are embedded uses of the original tool: