http://invisible-island.net/personal/
Copyright © 2016-2020,2022 by Thomas E. Dickey

Git-exports – comments

Background
Problems
Solution
Process
Other stuff

Background

I have been tracking my projects using RCS since the late 1980s. I used SCCS for a few years before then. Developers who use my programs are accustomed to using patches and tarballs to track changes to these files, using their own procedures.

On the other hand, there are other people who would like to casually browse the source history. Not all of them are developers.

While I was working with XFree86, occasionally someone would ask where they could find a source-history for xterm and I would point to XFree86's CVS. I stopped committing to that in mid-2006. That finally became defunct in 2016 (ten years later).

Providing my own CVS (or whatever) web-accessible repository would take away development time and be expensive. On rare occasions, someone else would offer to do some part of this:

in 2009, Jonathan Nieder set up a git repository for mawk on Debian's server, and discussed with me the possibility of getting my changes for mawk into git.
in 2010, Alejandro R. Sedeño set up a cron job which would collect my weekly ncurses patches, checking those into a git repository.

Git is a little more flexible than tar-balls:

given a git-ball, it is relatively simple (compared to CVS) to make a web-accessible project, for browsing purposes.
the same interface provides the ability to get a tar ball.

Problems

Those were helpful, but did not complete the task. For ncurses, Sedeño's repository made my weekly patches web-accessible, starting with 5.6 (in 2006). During a given week, I make several changes, and the weekly patch is their combined result. Since most of the information would be omitted, I disregarded that approach.

Mawk seemed more promising. However, I ran into a few problems:

Initially, I thought that I could replay my changes onto Nieder's initial git repository. I spent a few weeks on that, only to find that git's merge capability was not flexible enough to make a reliable, scriptable procedure.
Nieder had suggested rcs-fast-export as a tool.
That script did not provide a complete solution, as I noted in vile #28487.
My RCS wrappers (like CVS) can be used for lazy commits, i.e., no lock is needed for a file. Unlike CVS, I use the file's actual modification time for the identifiers. On checkout, the wrappers construct the identifiers.
rcs-fast-export.rb (a Ruby script) knows nothing about lazy commits. I modified the script to call my RCS wrappers to do the actual checkouts (which slows things down). If I were to replace those wrappers with additional scripting, another month or so would be needed to get the same resulting identifiers.

In case someone questions why I want the identifiers, remind them that git uses its own identifiers, and if those were removed, git would not be useful.
The Ruby script uses a lot of memory, and was written for Ruby 1.9. Using it with Ruby 2.0 did not work well.
Different versions of git did not interoperate. Really. That's a step backward from RCS, where I can work with my archives on machines across a wide range of operating systems and versions.
The Ruby script writes directly to git's internal data structures. Those are undocumented.
The Ruby script handles only RCS archives with no branches. I use RCS branches in several of my projects.
Each export of my RCS archive using the script generates a new set of hash codes, making it impossible to transfer updates to an older export of the same archive.
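
To illustrate how a re-export can end up with entirely new hash codes: a git commit id is a SHA-1 over the commit's text, which includes the committer timestamp as well as the author timestamp. The sketch below assumes (this is not stated about the Ruby script) that something in that text, such as the committer time, differs between runs. The identity string and timestamps are made up for the example.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """Return the SHA-1 id git assigns to an object of the given kind."""
    header = f"{kind} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# Id of git's empty tree, used so that the example needs no files.
EMPTY_TREE = git_object_id("tree", b"")


def commit_id(message: str, author_time: int, committer_time: int) -> str:
    """Hash a parentless commit of the empty tree (hypothetical identity)."""
    who = "A U Thor <author@example.invalid>"  # placeholder, not a real person
    body = (
        f"tree {EMPTY_TREE}\n"
        f"author {who} {author_time} +0000\n"
        f"committer {who} {committer_time} +0000\n"
        f"\n"
        f"{message}\n"
    ).encode("utf-8")
    return git_object_id("commit", body)


if __name__ == "__main__":
    first = commit_id("snapshot", 1479168000, 1479168000)
    again = commit_id("snapshot", 1479168000, 1479168000)
    later = commit_id("snapshot", 1479168000, 1479254400)
    print("same inputs, same id:      ", first == again)
    print("new committer time, same id:", first == later)
```

Because each commit also records its parent's id, one changed commit changes every id after it. An export that pins all of those fields to values taken from the archive would be repeatable.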

Because of these problems, the Ruby script (while interesting) was not useful. I could not use it on the larger projects, even for one-shot uses such as I did for mawk.

Solution

In May 2016, someone commented that there were several forks from my byacc tar-balls on GitHub, and that this was a problem. Actually, since none of the “forks” had been improved, there was little to discuss: there were no potential changes to merge back. In any case, I would merge into my RCS archives.

But that prompted me to think about just (as in ncurses) constructing a git-ball with the labeled revisions from my RCS archives. As of mid-November 2016, I had (using rcs2log):

18997 commits for ncurses on
2805 days, with
1145 labeled revisions

Sedeño's git repository has about half of those labeled revisions: 48% in mid-November 2016.

I label things when I reach a milestone, whether or not that is when I decide to release a set of changes. By exporting the labeled revisions (rather than a complete archive), the result would still be useful, as well as being practical. But keep in mind that the number of labels is far smaller than the actual number of changes.
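
The ratios behind that remark can be worked out from the mid-November 2016 ncurses figures quoted above. The variable names are only for illustration.

```python
commits = 18997        # RCS commits for ncurses
days = 2805            # distinct days with commits
labels = 1145          # labeled revisions
mirrored_share = 0.48  # share of the labels found in Sedeño's repository

print(f"commits per day with activity: {commits / days:.1f}")
print(f"commits per labeled revision:  {commits / labels:.1f}")
print(f"labels in the mirror:          about {round(labels * mirrored_share)}")
```

That is roughly seven commits per working day and more than sixteen commits behind each label.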

In my revised approach, it is possible to do incremental exports from the archive onto a git-ball. Exporting all of the labels in the ncurses archive takes hours; exporting the latest revision takes a minute or so.
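
An incremental export only has to handle the labels that the git-ball does not already contain. The following is a minimal sketch of that selection step, not the logic of release2git: the `Label` type, the function name, and the sample label names are all hypothetical, and it assumes each exported label is recorded under the same name on the git side.

```python
from datetime import date
from typing import Iterable, List, NamedTuple, Set


class Label(NamedTuple):
    name: str   # RCS symbolic name (hypothetical examples below)
    when: date  # date of the newest revision carrying that label


def pending_labels(labels: Iterable[Label], exported: Set[str]) -> List[Label]:
    """Return the labels not yet exported, oldest first."""
    todo = [label for label in labels if label.name not in exported]
    return sorted(todo, key=lambda label: (label.when, label.name))


if __name__ == "__main__":
    archive = [
        Label("v1_2", date(2016, 11, 12)),
        Label("v1_0", date(2016, 10, 1)),
        Label("v1_1", date(2016, 11, 5)),
    ]
    already_in_git = {"v1_0"}
    for label in pending_labels(archive, already_in_git):
        print(label.name, label.when.isoformat())
```

With everything but the newest label already exported, the list has one entry, which matches the difference between hours and about a minute.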

Process

I wrote a script release2git to do most of the work, and run that from a script r2g which knows how to manage the git-balls which I create and update. In some projects (such as xterm), the MANIFEST file is generated using the manifest script. That and other special cases are handled by release2git.

Not all of my projects had labels. I wrote another script tag-cutoff to label some of those using a cutoff-date. That worked for most, but not for the older vi-like-emacs archives which I received from Paul Fox in 1996. For those, I wrote a custom script (like tag-cutoff) and used it to label the files.

I have generated git-balls for all of the projects which I share with others (as well as a few which I do not). You can see the result here:

https://github.com/ThomasDickey

While doing this, I realized that rcs-fast-export.rb would not be suitable, since using its approach none of the exported revisions would be signed. See the pgf-vile-snapshots repository versus the corresponding fast export for an example of this.

Scripts are a different matter. Those all live in common directories, from which I generate the tar-balls that you can download from my scripts page. I suppose that I could generate git-balls for those as well.

When I release changes to one of my projects, I do this:

label it in RCS,
run r2g to generate an updated git-ball (the old one is renamed, keeping a backup copy), and
run push2github to update GitHub's copy of my git-ball.

The last two steps use different keys, just in case.
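
The "old one is renamed" part of the r2g step could look something like the following. This is only a sketch: the function name, the backup naming scheme, and the timestamp format are assumptions for illustration, not taken from r2g.

```python
import time
from pathlib import Path
from typing import Optional


def set_aside(gitball: Path, stamp: Optional[str] = None) -> Optional[Path]:
    """Rename an existing git-ball so a fresh one can take its place.

    Returns the backup path, or None if there was nothing to rename.
    """
    if not gitball.exists():
        return None
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    backup = gitball.with_name(f"{gitball.name}.{stamp}.bak")
    if backup.exists():
        # Refuse to clobber an earlier backup.
        raise FileExistsError(str(backup))
    gitball.rename(backup)
    return backup


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as top:
        ball = Path(top) / "example-snapshots"  # hypothetical git-ball name
        ball.mkdir()
        print(set_aside(ball, stamp="20161115-000000").name)
```

Renaming rather than deleting means a failed export or push leaves the previous git-ball available.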

Just as a reminder: these are snapshots. If someone wants to make a change, that will be done the same way as other projects, by first integrating into the RCS archives, and then exporting, updating the snapshots.

Other stuff

In a few instances, I work on programs which are only in Git:

esctest. This was written by George Nachman, originally bundled with iTerm2. He created a separate repository after incorporating my changes in April 2018. Nachman's version has been largely neglected after he switched from GitHub to GitLab.

Neither GitHub nor GitLab provides a way to maintain clones or forks from other sites. Some other developers mirror to both sites (as I could, since the repository that I actually use resides on my machines).

Nachman chose not to do that, removing most of the files from the original. I renamed my repository to “esctest2” to avoid damage from an unintended pull from the original.
Xorg (freedesktop.org):
- Xorg libraries which are used by xterm:
  - libX11
  - libXaw
  - libXaw3d
  - libXft
  - libXt
- Xorg applications which are related to those libraries:
  - twm
  - xorgproto
The Xorg files started from a copy of XFree86 in November 2003 (and were updated more than once during 2004 with additional copying from XFree86). All of those revisions were stored in CVS. The Xorg CVS was in turn imported into Git during 2013. Initially, that was hosted on freedesktop.org, where the CVS repository was. As summarized in the wiki, the Xorg developers moved that activity to GitLab in 2018 (but also provide a mirror on GitHub).

Some of my changes, e.g., converting Xt to ANSI C, had been incorporated, while others had not (such as my fixes for rman). While rman was no longer relevant (Xorg started using docbook in 2010), the work on the X libraries was not completed. I made improvements for that starting early in 2019. In a sense, rman was still relevant, since the work by others to convert documentation to docbook was not completed, either.

I cloned the Xorg libXt, and made changes to address both aspects:
- modified libXt to work properly with const (making the FAQ “How do I build XTerm?” obsolete), and
- updated the documentation (fixing numerous errors introduced by the docbook conversion). You can see the result here.

In practice, updating a clone (using Git pull, merge, resolve, push) and applying changes back to the original repository (via a pull request) is just as much work as the process of exporting RCS to Git.