http://invisible-island.net/personal/
Copyright © 2009–2020,2023 by Thomas E. Dickey
This is an overview of the guidelines which I use in maintaining change-logs and similar information for computer programs.
One of the things that the maintainer does (or used to) is to keep the change-log up-to-date. Though I have been developing software for some time, it wasn't until 1992 that a combination of circumstances (declining in-house development opportunities, and the Internet) prompted me to provide fixes for "free" software.
By 1994, I had contributed changes to about 65 programs. In that process, I had of course encountered various personalities. But the worst of those were simply slow to incorporate the changes.
Starting in 1994, I arranged to have the programs which I had been developing for my personal use excluded from my employee agreement. These included ded (the motivation for the resizeterm function), vile and tin as well as related programs. One of the related programs was ncurses.
The case with ncurses was ... different. Rather than a single developer, there were two. And they used a mailing list, unlike most. The nominal maintainer was Zeyd Ben-Halim, who was rather nonresponsive. The result of submitting patches was not good—it seems that they intended to copyright everything for themselves. That's a workable situation if they wrote everything themselves. They did not.
For instance, they incorporated Juergen Pfeifer's libraries in 1995, which greatly increased the size of ncurses. Incorporating them added 11,183 lines of code (pcurses was just under 10,000 lines of code before it became ncurses). In 1.9.7a, Juergen's name appeared in 3 places in those libraries (two pro-forma READMEs and one comment in a makefile noting that optimization did not work properly). Zeyd and Eric's copyright notice appeared in 36 places in the same files.
The NEWS file notes:
* integrated Juergen Pfeifer's forms library.
* integrated Juergen Pfeifer's menu code into the distribution.
I noticed that patches were sent to the mailing list (including my own) and that the NEWS file would include the change, but not mention the contributor. My name appears in the NEWS file twice, as well, for that time period, though—as I pointed out later—I had done about half of the work. Not all of my changes were mentioned, and most of them were unattributed. The casual reader would assume that Eric and Zeyd did almost all of the work.
Zeyd, being the nominal maintainer, appears to have done most of the edits to NEWS. However ESR also sent changes to the mailing list incorporating changes from others without mentioning this in his announcements.
After I stopped sending patches to Eric and Zeyd in April 1996 (and began providing ncurses myself), I resolved to maintain the NEWS file with attribution for each contributor. That's the way we were doing it in vile and tin, for example. Philippe De Muyter suggested that I also note who reported the problem to be fixed. I began doing that a few weeks later, in late April.
Of course you're keeping your project in some type of revision control system. You can extract that information with various tools and render it as a change-log. Any idiot can do that.
Unfortunately, many change-logs are automatically generated, and indeed appear to have been generated by "any idiot".
What is missing in many automatically-generated change-logs is the information which is typically not supplied by developers:
One advantage of automatically-generated change-logs is that it is possible to get the dates on which changes were made. Not all automatically-generated logs show this, but it is a strong possibility.
Whether or not the change-logs are automatically-generated, there is an additional problem if changes are collected and applied by a project maintainer—recording the contributors consistently.
Change-logs should have dates, to establish when a change was made.
Developers who do not supply dates on their changelogs have been known to “fix” problems with a release without noting the fact. Besides that nuisance, developers who omit dates tend to be sloppy about facts in other ways.
There are of course changes by primary contributors.
The patch is usable without rework required.
Often, for conciseness, the "patch by" is left out and only the name of the contributor given. They are equivalent.
As a rule, if I am applying a contributor's patch which (aside from formatting details) works properly, I use the rcs "-w" option to mark that revision as originating from that person. It is rare that patches good enough for this come from completely anonymous developers, so an appropriate string is seldom lacking.
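As an illustration of the check-in described above, here is a minimal sketch that builds (but does not run) an RCS `ci` command line using `-w` to record the contributor as the author of the revision. The helper name, the file name, the login and the log message are all hypothetical. The `-u` and `-m` options are standard `ci` options, chosen here only for the example.

```python
import shlex


def ci_command(path, contributor, message):
    """Build an RCS check-in command that credits a contributor.

    -w sets the author field of the deposited revision,
    -m supplies the log message, and -u keeps an unlocked
    working copy after the check-in.
    """
    return ["ci", "-u", f"-w{contributor}", f"-m{message}", path]


# Hypothetical file, login and message, for illustration only.
argv = ci_command("example.c", "contributor_login", "apply contributed patch")
print(shlex.join(argv))
```

Building the argument list separately from running it makes it easy to review the command before anything is checked in.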
Most patches require rework or adaptation.
The patch requires work, e.g., it is not ifdef'd as required for all optional features.
The patch has some logic flaw, or requires modification to build and work.
Someone explained how to go about fixing the problem, or else they provided a detailed enough report that the solution was apparent to the developer. This may or may not be the same person who reported the problem.
A discussion with someone brought out an idea, but it is unclear who was the source.
Talking to someone prompted me to realize a bug or solution. Without their input, the idea/fix would not have been apparent.
Occasionally their report and discussion is completely incorrect, but the "prompt" was useful. This does not apply to hostile or untruthful contributors of course.
In some cases, someone provides a suggested patch, but if it is unsuitable beyond illustrating the problem which was being discussed, then the changelog may read “prompted by patch...” while the actual implementation is different.
Someone reported the problem, but did not provide the solution. That is, most people would not regard these as contributors, but a source of information which has to be investigated. When computing metrics, I do not count these, nor the closely related "prompted by", etc.
These categories are oriented toward direct communication with the program's maintainers. Accounting for indirect contributions is not as straightforward.
There are a few basic problems to address:
Bug-tracking systems are a major source of indirect contributions.
If all of the report is within the bug-tracking system, and there is no analysis by other people, nor proposed (useful) fixes, then I will cite only the bug-tracking system and its number for the bug.
On the other hand, if there are useful direct contributions toward the solution (reports without analysis are indirect), then I will cite those individuals in addition to the bug-tracking information.
A few files (such as config.guess and config.sub) are maintained by other developers. The changelog for these says "updated", and if the origin is volatile (the config.* scripts are a good example of this) or relatively obscure, says where it was found. Read their changelog for credits.
Bear in mind that I am not a public service.
I get some reports indirectly, via web-searches in various forums. Some of the comments are useful, others partly (because they point out details for an issue). However, it is not uncommon for those to be mixed in with secondhand comments. As is usual with hearsay, much of it is inaccurate, and much of the repetition in public forums is not intended to be constructive commentary.
Still, an occasional comment is useful.
Of course, in this case, I will categorize it as "adapted from", etc., noting that it makes it automatically an indirect contribution rather than a direct contribution.
If the information is from a discussion between different individuals, none of whom appears to be knowledgeable about the issue, I will simply cite the group where the information was given.
People who attempt to use bug-reporting systems as a soapbox fall into this category, of course. For those unfamiliar with the term, this refers to a variety of misbehavior, including:
insisting on raising the criticality of a bug report to attempt to bludgeon the developer into making some proposed change. Because I will not work on a bug report before agreeing on what the problem is, and how important it is, the report is dead right at that moment.
making speeches in the bug-tracking system to the effect that some aspect of the program's design should have been done differently. The speech (might be) ignored, provided that there is a workable patch provided by that individual which addresses the issue without impacting other users.
As a caveat, not all “bug-tracking” systems are equal. Granted, bug-reports are not always welcome. But the bug-tracking system has to be reliable:
The issues-tracking systems provided with github and gitlab (writing this in May, 2019) are not reliable because changes to comments are not visible to others. In some cases, the project maintainers can (and some do) readily delete and modify comments to adjust a story to their advantage.
Anonymous reports are not uncommon. Useful fixes from anonymous people are much less common (see discussion). When considering these, there are several factors to consider:
For the pen names, I cite the actual (or apparent) name.
On occasion I get suggested fixes which neither come from a readily identified person nor fit into the design of whatever program is being discussed. For those, I may adapt the change.
Anonymous or “not” I do not use bug reports containing information from Wikipedia:
In general, we would assume that developers submit their own work. This is not always true.
When reviewing a change, I do take the time to scrutinize it and attempt to determine a proper attribution for the change. It happens that I may notice (or recall, if I am subscribed to a given mailing list) that the change was originally developed by a different individual. In that case, I will amend the description to cite the actual developer. If the code has a comment citing the developer, that suffices, though even that has been a matter of dispute on occasion, when the intermediary insists on sharing the credit.
Individuals who do this repeatedly (there are a few) will either be banned, or subject to scrutiny on every change. In either case, they generally go away and provide their services to a different project. Rather than leave, some of these use the public bug-tracking systems as a forum.
Other forums are problematic. Although the site technically has a policy (which confuses copyright infringement with plagiarism), StackExchange for instance promotes plagiarism, with people copying answers from each other as well as from unspecified sources, just to get points in its “reputation” ranking (some high-ranking individuals have copied their answers from my FAQs or documentation). Still others are known to cheat by various schemes of voting for themselves. The answers are very rarely useful for development, but some questions are essentially bug reports. I cite those according to the other guidelines mentioned above.
Not all of the change-logs are in the same textual format. I wrote a script which handles the most common cases, and have massaged some change-logs to follow the format which it recognizes, to collect information about contributors. Essentially, it reads the text, looking for the markers which I use to denote direct- and indirect-contributions, and gives totals and names for the direct contributions.
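The script itself is not shown here. The following is only a minimal sketch of the idea, under stated assumptions: each attribution appears in parentheses at the end of an entry, "patch by" marks a direct contribution, and "prompted by" and "report by" mark indirect ones. The marker sets, the entry layout and the names in the sample are hypothetical, and the real script recognizes more cases.

```python
import re
from collections import Counter

# Assumed marker phrases; the actual script's markers are not listed in the text.
DIRECT_MARKERS = ("patch by",)
INDIRECT_MARKERS = ("prompted by", "report by")

# Match "(<marker> <name>)" for any of the assumed markers.
_MARKER_RE = re.compile(
    r"\((?P<marker>"
    + "|".join(re.escape(m) for m in DIRECT_MARKERS + INDIRECT_MARKERS)
    + r")\s+(?P<name>[^)]+)\)",
    re.IGNORECASE,
)


def tally(changelog_text):
    """Return (per-name counts of direct contributions, count of indirect ones)."""
    direct = Counter()
    indirect = 0
    for match in _MARKER_RE.finditer(changelog_text):
        marker = match.group("marker").lower()
        # Collapse whitespace so a name wrapped across lines counts once.
        name = " ".join(match.group("name").split())
        if marker in DIRECT_MARKERS:
            direct[name] += 1
        else:
            indirect += 1
    return direct, indirect


# Hypothetical change-log excerpt with made-up names.
SAMPLE = """
20240101
+ fix a typo in the manual page (patch by Alice Example).
+ correct an off-by-one in the resize logic (report by Bob Example).
+ add ifdef for an optional feature (patch by Alice Example).
+ improve a configure check (prompted by Carol Example).
"""

direct, indirect = tally(SAMPLE)
print(dict(direct), indirect)
```

Only direct contributions are tallied by name, which matches the rule above that reports and "prompted by" entries are not counted as contributions when computing metrics.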
For some (lynx and vile) I have not reformatted the older change-logs. In those cases, the dates below correspond to the beginning of the change-logs that I have reformatted.
With vile, I may do this (reformat the logs) sometime, since I have software archives to its beginning in August 1990, and the changelogs identify all contributions.
Lynx is harder, since the changelogs for 2.4 through 2.6 have 5-20 percent of their entries without an identifiable author. Most of the entries in the 2.3 changelog are unattributed. Also, there was no software archive in use until Klaus Weide put one together in 1997, using PRCS.
As in ncurses, an attempt to give statistics for those changelogs would probably be unfair to the contributors whose work was not deemed a major change.
Here is a list (as of May 2010) of the change-logs for which I have useful metrics, noting the percentage for my own contributions, and the number of other contributors (disregarding "external", since there is no active involvement).
Program | Percent | Other | Date |
---|---|---|---|
diffstat | 81 | 12 | June 1994 |
xterm | 83 | 150 | January 1996 |
ncurses | 76 | 176 | April 1996 |
vttest | 96 | 3 | June 1996 |
lynx | 45 | 136 | February 1997 |
vile | 76 | 36 | November 1999 |
dialog | 78 | 64 | December 1997 |
cdk | 85 | 24 | May 1999 |
byacc | 97 | 4 | February 2002 |
luit | 91 | 0 | August 2006 |
mawk | 73 | 6 | September 2008 |
I use rcs2log for a few programs (ded, byacc, autoconf macros, etc.), which did not have a history of other contributors, and/or which are very stable.
The number of changes shown by rcs2log is different from the conventional change-logs:
In practice, there are many minor changes which would be just clutter in a change-log.
Changes which are adapted or otherwise not accepted as is do not use the contributor's name on the check-in.
Of course, contributors keep their own records, which differ in granularity as well. A good change-log is a good compromise which tells the story.
Here is a more complete table, from May 2017, which shows both sets of data where applicable (and excluding programs such as ded which have no other contributors):
Program | Log started | Changes (manually edited) | By me (manually edited) | Percent (manually edited) | Others (manually edited) | Changes (rcs2log) | By me (rcs2log) | Percent (rcs2log) | Others (rcs2log) |
---|---|---|---|---|---|---|---|---|---|
diffstat | 1994/06 | 135 | 101 | 74.8% | 17 | 584 | 566 | 96.9% | 10 |
cproto | 1994/08 | 153 | 139 | 90.8% | 3 | 935 | 889 | 95.1% | 4 |
xterm | 1996/01 | 2271 | 1897 | 83.5% | 185 | 12106 | 11848 | 97.9% | 89 |
ncurses | 1996/04 | 5045 | 3926 | 77.8% | 235 | 19483 | 18365 | 94.3% | 175 |
vttest | 1996/06 | 215 | 203 | 94.4% | 5 | 1338 | 1333 | 99.6% | 3 |
bcpp | 1996/10 | 175 | 163 | 93.1% | 9 | 496 | 495 | 99.8% | 1 |
lynx | 1997/01 | 3565 | 1874 | 52.6% | 135 | 4323 | 4185 | 96.8% | 48 |
dialog | 1997/12 | 830 | 645 | 77.7% | 81 | 4068 | 3974 | 97.7% | 42 |
cdk | 1999/05 | 470 | 372 | 79.1% | 38 | 2334 | 2261 | 96.9% | 24 |
vile | 1999/11 | 2889 | 2717 | 94.0% | 48 | 15140 | 14008 | 92.5% | 51 |
cdk-perl | 2001/01 | 39 | 36 | 92.3% | 2 | 189 | 185 | 97.9% | 3 |
byacc | 2002/02 | N/A | N/A | N/A | N/A | 826 | 737 | 89.2% | 13 |
luit | 2006/08 | 114 | 103 | 90.4% | 2 | 960 | 957 | 99.7% | 3 |
mawk | 2008/09 | 234 | 179 | 76.5% | 10 | 1534 | 1485 | 96.8% | 7 |
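The Percent columns appear to be the "By me" count divided by the total number of changes, rounded to one decimal place. That is an inference from the figures rather than something stated in the text. A small sketch checks it against three rows of the manually edited data (the function name is hypothetical):

```python
def percent_by_me(changes, by_me):
    """Share of changes made by the maintainer, as a percentage to one decimal."""
    return round(100.0 * by_me / changes, 1)


# (total changes, changes by me) from the manually edited columns of the table.
rows = {
    "diffstat": (135, 101),
    "xterm": (2271, 1897),
    "ncurses": (5045, 3926),
}

for program, (changes, by_me) in rows.items():
    print(f"{program}: {percent_by_me(changes, by_me)}%")
```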
There are other ways to measure contributions. Not all of them work as well as inspecting the change-log.
For instance, the Orbiten survey several years ago ignored the change-logs and RCS identifiers in my projects, and credited virtually all of my work to other people. Some of those credited were never contributors. Rather, Orbiten noted the mention of various individuals and organizations in README's and comments, and credited them with the entire work.
Other people have pointed out that Orbiten also did not factor out programs such as libtool, which are bundled with other programs.
Any metric requires inspection and tuning to validate the results. Lacking that step, the metric is worthless.
Unsurprisingly enough, my change-logs cite contributions from people who also maintain change-logs. They do not necessarily reciprocate, e.g., some developers who borrow from my work. I don't work with those people.