Copyright © 2009-2013,2014 by Thomas E. Dickey
Here is a brief description of some of the more interesting programs that I worked on, long ago. This does not count the programs that I was told to produce—some of those are interesting as well.
The source code is no longer available for these (and as was the practice, was then available only on a limited basis generally with peers). Others have claimed that source-code was just given away in the 1970s and 1980s. On the contrary, all of the vendor-related source code (DEC, Univac, IBM) which I saw was available on a limited basis — non-disclosure. Perhaps others' experience differ, or else they chose to ignore the agreement.
I wrote this in Fortran on an IBM 1620 computer. It was not the first computer program that I wrote, but it was the first interesting one.
This used a 4x4x4 array.
In my initial programming class, we used a 1620 model I, with 12,000 characters of memory. Midway through that year, the college purchased a model II from a neighboring college which purchase an IBM 1130. We continued using the model II through my graduation.
The model II computer had 20,000 characters of memory (making this a low-end system, according to IBM's datasheet). I used a lookup table to store the association between positions and the ways to win, i.e., filling in a line of four positions. The combination of the program to fill the table and the game itself was too large to fit in the available memory. So I split it into two parts:
The first program was generated (about 800 assignments) by a separate program. The game was about 200 statements. So a thousand-line program was too large for that computer.
In context, that meant in the realm of feedback control systems. But first, I needed more experience than I had at that point. So I set out to build software which could be used for constructing block-diagram modules useful for control systems. Like many first attempts, there was a lot of learning involved, both for the technology as well as learning what the conventions were. I did some odd things, before learning how valuable conventional behavior can be:
Perhaps related, at that time I used to go grocery shopping once or twice a week. The cash amounts were small (about $10), and I could walk through the store, adding up the total in my head and come to the register with the correct amount (counting tax). I noticed that I was making errors occasionally, and realized that it was more likely when there was an 8 or a 9 in the result. Those are non-octal, of course.
These were written in assembler on a PDP-11. My department acquired two PDP-11's, funded by research money. Initially they ran CAPS-11 (cassette tapes), but shortly after (early 1975), ran RT-11 (2.5Mb RK05 disks).
I named the cross assembler micro (a twist on macro, the name of DEC's macro assembler), and the simulator simodt (from simulated ODT, DEC's "octal debugging tool"). Combined, they were named simic. Besides making the names like DEC's, but different, I preferred to make the program's names match the names of the executable files. On RT-11 that meant no more than six characters (because filenames were stored in radix-50 format, which packed three characters into 16-bits). My initial choice for the simulation module's name (simula) was unsatisfactory since a peer pointed out that there was a well-known computer language with that name.
The cross assemblers supported macros, conditional assembly, all of the interesting features that DEC's assemblers did. Both were modular, most of the code being in common (about 80% for the simulators and above 95% for the cross-assemblers).
Using this modular approach, I wrote 11 simulators and 14 assemblers. Basically, if I was able to obtain a data sheet for it, an assembler took less than a day. Simulators took 3-4 days to write, since there was less opportunity for reusing functions.
Here is a list of microprocessors from that period for which I have a data sheet or manual:
I did most of this work during summer and fall of 1975. There are a few which have no data sheets because they were not for commercially available microprocessors:
PDP-11's had a 56Kb address space (plus 8Kb for I/O, for 16-bit addresses). That made them cheap research machines. Not all PDP-11's even had 56Kb of core memory (semiconductor memory came later). Programming them mostly in assembler was not unusual.
I redid the macro processing after a while (in 1977 or 1978), when I was working on txt. The initial version stored the parsed data (a series of tokens). However, that proved to be hard to maintain and extend. So I changed this to work purely with character strings. That made it simpler to concatenate names, perform substitutions, etc.
I do not recall what I named this.
Someone in the lab brought in a book with a StarTrek program (think "minesweeper", on a printing terminal, at 30 characters/second). But the program was too large to run on our machine (in 24Kb). The Basic interpreter could load sections of a program at run-time, e.g., like overlays. To exploit this feature (and squeeze the program into roughly half the space), it was necessary to resequence the overlaid chunks so they reused the same line numbers.
I spent a couple of days getting the program to work – probably the first nontrivial Basic program I wrote. Then I spent a half hour or so playing with the StarTrek program. That was not as interesting, so I put StarTrek away.
RT-11 was a single-user system without any frills.
Hierarchical directories came later, in other systems.
RT-11's directory program simply read the directory
blocks and printed their contents. With a half-dozen
people sharing a minicomputer, and a thousand files, it
is hard to keep track. I wrote an improved directory
program dirsrt which made an index to
the directory entries and printed them, sorted by any
combination of name, extension, size, date. It also had a
switch for writing a blank line between groups, e.g.,
DIRSRT /A/X/J would show files sorted
by name within extensions, skipping a line between
As noted, RT-11 had only a flat directory structure. But several researchers shared a few machines, and there was a lot of clutter. I wrote a program which simulated RT-11's directory structure inside a large file.
It was interesting, but did not really solve the problem, since RT-11 did not have a way to create virtual device drivers. I could copy files in/out, and do directory listings; but to use the files, they had to be outside the container.
At the time, an associate commented that the concept was something like interchange-format, a reference to a program which has been obscured by other uses.
Another person said it was like a partitioned dataset.
In either case, I did learn something.
Users of my microprocessor simulators complained that they were missing a useful feature: the ability to do I/O. I designed a simple language to describe ports, data transfer, manipulation of bits (masking, shifting), and called that IDL (for "Interface Description Language").
Some of the bit-manipulation was inspired by the SMS Microcontroller, which addressed memory in bits.
The following year, I gave a short presentation on that at one of the Computer Architecture workshops, though the fact that the interpreter was written in assembly language was a shortcoming.
Larry Tesler happened to be at that meeting, though by then he was interested in things other than pub.
Discussing my presentation with an associate at the time, he commented that there was at that time another “IDL” (a commercial product called “Interactive Data Language” introduced in 1977). The two are unrelated (mine was “Interface Description Language”). On the other hand, my language might have been the first to use the acronym, because I developed it in August 1976 (yes, I am aware of the much later use by Sun in the 1990s—probably unrelated).
There were other system programs – including one which was obsolete within a day after I wrote it. In our lab we had no line printer – just printing terminals. The RT-11 text editor (same syntax as teco, but simpler) could be made to repeat commands to read a buffer, add extra lines, delete those past line 58, print the page, delete the page from memory, etc., to print a listing file. The command to do this was most of a line, and annoying to type in. So I wrote a program to do this. While I was at it, I considered it a waste of paper to print only 58 lines (on a 66-line page), so I made the program smart enough to recognize the header lines on a listing and move them, making the real page size larger, e.g., 62 lines.
The lab manager (Gary Leive) found that solution unappealing, and (having the sources for the system), made the terminal driver recognize form-feeds. Just a small change was needed. I discarded the repaginator.
During much of 1975-1976, I sublet another researcher's PDP-10 account to write manuals for simic. That featured teco, sos – and pub (a text formatter written by Larry Tesler at Stanford AI Lab around 1973). That came to an abrupt end (it was an informal agreement), and I rescued my files.
But it was a year spent learning to appreciate a nice (if not entirely efficient or consistent) tool. So I walked back to my lab and spent 15 hours writing the first thousand lines of txt. While pub had libraries to format for a wide variety of output devices, txt would "only" go to printing terminals. But it was a workable clone of txt, recognizing all of the formatting controls, implementing macros, footnotes, headers, floating sections, table of contents, etc. Five years later, at ten thousand lines (of assembler, of course), I was done with txt.
At the time, the term "word processing" was not used much. I first encountered it a couple of years later, and was confused at first, supposing it to be related to natural language processing. Instead, we referred to it as "text compiling", which was misleading. (I once overheard a comment about that fellow over in the other building writing compilers, and realized it was about me).
In 1976, I started writing a program for my degree which would compute a complexity metric based on a description of a computer's instruction set. Unlike the previous work I'd done as a graduate student, this would be for publication (or review, whatever – I did not know how the process would work at that point). On the minicomputer, I had only three languages to choose from: assembler, Basic or Fortran. The first two were unsuitable for this purpose (perhaps all three, in retrospect). I started writing my program in Fortran.
The minicomputer was small (with more memory, then
32kb), and I found that even a moderate-sized Fortran
program would exceed its memory. This was DEC's RT-11
Fortran, which had a few extensions over Fortran 66, The solution was to split the
program into overlay sections, which would share memory
COMMON-blocks. Soon, I had more than a
dozen overlay sections.
Here, I ran into a new problem: splitting the program
up into pieces like that made it less maintainable. Each
part of the program would have its own copy of the
COMMON-blocks. I spent three days isolating
a problem due to an editing error which made one of the
parts of the program have a different array size.
However, in my day job (I had one by then, since student grants do not last forever), I had encountered a new variety of Fortran, on Univac 1100.
Univac Fortran had many of the features which later
appeared in Fortran 77. Later,
Univac's Fortran 77 compiler was known as ASCII Fortran.
Of special interest were
parameter statements. Fortran 77 added the
At the end of 1976, I started writing a Fortran preprocessor bfor, which had these features:
Parsing Fortran properly is difficult: it uses statements rather than tokens. Blanks in the middle of a statement (other than strings, e.g., for formatted I/O) are ignored. These two lines are equivalent:
DOI=10,20 DO I=10,20
Unless you read up to the comma, the first line looks
like an assignment statement. But it is not; it is the
first statement of a
These two lines are also equivalent:
END E N D
I wrote bfor in assembler, of course, to fit in the small machine.
bfor, txt and simic were large chunks of what I was developing. The end-application written in Fortran was larger than any of these.
Just to try to keep track of what I was developing, I wrote a metrics program which showed a report of what was on the disk. The report was broken down by file-type (one or more suffixes per group), and showed the number of files, total file-size and number of lines. Initially I called it lincnt (a six-character name meaning "line count"). Later I renamed it systat (system statistics), to reflect my objective of managing all of the files on my disk.
Later, when I began programming in C (see spasm), I made a simplified version and reused the name lincnt for that.
At work, the computer center had a Calcomp. plotter. Paper was fed from a long roll; the plotter's pen could go on all directions. But the width of the paper was only a couple of feet.
I used it to plot data for a computer simulation of a three-phase. power system. Divided into three parts (plus margins), that sounds like enough room for ample resolution. However, the Calcomp plotting software's axis function was limited. The axis annotation showed numbers and tic-marks. However, it had no provision for scaling the range shown to make the best use of the space available.
I designed my own axis function which optimized the display of tic-marks and numbers. Each axis would end with a tic-mark. Depending on the range, there might be large tic-marks (1,2,5,10,...) or small tic-marks (1,2,3,4,5,...). The numbers would appear on the large tic-marks. There were also special cases where those rules did not apply, just to make the result look better.
In practice, this was much harder than it sounds, since there is roundoff to consider (5 decimal digits was "good enough") and many special cases. The function was about 300 lines of Fortran, and took many hours to perfect.
My work on the complexity metric hit a snag: I ran into a mysterious problem with the Fortran compiler. There was no high-level debugger; the only useful tool was to write trace files. However, that gave no insight.
DEC's Fortran compiler generated threaded code. I "knew" that from reading the description. But there was no explanation of what that meant, and the compiler listings just showed something that suggested what the compiled code did. However, something was wrong with it.
The previous year, I'd spent one night disassembling a chunk of the macro assembler, to determine how to patch it to allow comments to be printed in mixed-case. That was manually, starting with an octal dump (in a text-file) and splitting, substituting pieces. I did get to my goal, but found it was hard.
Looking at the amount of code to analyze, I decided to
write a disassembler to do the steps I'd done manually.
That took about three weeks to write. It sounds like a
long time, but it paid off in the first session that I
used it. The problem was that the compiler omitted one of
the instructions if it happened to be compiling an
assignment between variables at the beginning of a
Seeing that meant that I had to understand what the threaded code meant, and how it was used. There was a substantial amount of code in the thread interpreter to deal with. Just that part was a few thousand instructions.
My PDP-11 disassembler was interesting because it allowed for the machine's variable-length instruction set, and a mixture of inline data and instructions. That's essentially what the threaded code was.
Most disassemblers that I've seen are not interactive. I could assign symbols to memory addresses, and choose how to display the current location, stepping through a sequence until there was a branch instruction. On a branch (or jump), dis could jump to the branch destination, or fall-through to the next instruction. It also had a go-back command.
Incidentally, the last time I used dis was in 1980, to determine how to apply the form-feed modification to a MINC-11 system, driving a Daisy-wheel printer.
With the Calcomp plotter constantly present, something jogged my mind, and I realized that mathematical formulae (as printed in books) were naturally block-structured. Both the final form and a description of them could be expressed as nested blocks.
I designed and implemented an interpreter (in Fortran...) which could read a description, and made the Calcomp plotter show radicals, integrals, summations, fractions, Greek letters (I made my own font, using graph paper).
All very nice, but a digression from power electronics.
This was a different topic that I got interested in then, and start writing programs to collect.
One of the pitfalls in modifying the plotting portion of the simulation that I worked on, was if I misjudged the size of the plot. If this happened, then the pen would run up against the side of the drum, and (not allowing for the motion which was not successful when it returned, the plot would be drawn in the wrong place. Seeing the original error was hard, since the result was very confusing.
I decided to fix this by changing the Calcomp plotter.
The computer center had the source for the Calcomp library. But it was provided under a non-disclosure agreement, and the person in charge did not permit me to view it.
Talking to Henry Bowlden, he explained that I could use the Univac linker to rename entrypoints in an object file, and use that to intercept calls to the Calcomp library.
I used that approach to write a wrapper for the Calcomp plotting library, to analyze the parameters of the calls which it made to raise, lower, move the pen.
Then, I devised a replacement for the plotting function, which could limit the coordinates of the pen as it moved around. A few months later, one of my associates pointed out that this sort of thing was already done, e.g., showing me a book by Sutherland which had a simpler algorithm.
Meanwhile, my research project was progressing. But I ran into a serious problem: the computer that I was using was beginning to fail. Even "small" computers (then...) required a cool environment, to get rid of waste heat. This was running in a room around 60 degrees Fahrenheit, rather uncomfortable for us humans. Anytime the temperature got comfortable, the machine started to malfunction. Finally, its power supply broke. The department that owned it didn't have funds (and didn't want my help). So I was forced to move.
There was another machine available, in a different department. However, it was not as capable as the one I'd been using. The main deficiency was that it lacked the extended instruction set (e.g., shifting, and floating point operations). The Fortran compiler on the broken computer used those instructions. The Fortran compiler on the working computer was an older revision, with too many bugs to be useful.
I decided to move the better compiler to the working machine by writing a trap emulator for the extended instruction set. Working at my desk at lunchtime, I wrote this in about three weeks. That was slightly over a thousand lines of PDP-11 assembler, and it worked properly when I finished transcribing my notes into the computer. (I did spend some time making a test driver as well).
I spent part of that year working on a project to control a power inverter with an 8086-based single-board computer. This was actually an embedded application: I wrote a monitoring application (displaying on an ADM-3A) to show its status information. The monitor, which ran on the development system, used shared memory to obtain results, polling for keyboard input (to accept commands to show different information). To get reasonably fast updates, I used cursor-addressing.
But the controller was only part of it.
The development system which I used was running ISIS-II, and provided a compiler for PL/M as well as both a native (8080) and cross-assembler. That sounded good at the time.
However (just like RT-11) that's like moving into a house which has a stove and refrigerator and a couple of chairs. If you want it nice, you'll have to add some parts.
I wrote 18 utility programs for that system, of which all but 1-2 were in PL/M. One, for instance, was a program to list (floppy) disk directories sorted by name. The ISIS-II directory utility would only list files in the order they were stored on the disk. With a few dozen files, this was not easy to read the listing. So I wrote a program that would read the directory, sort it, and show the result.
You're thinking: so what?
Well, those computers were slow. It took twenty seconds before my program started showing the directory. The ISIS-II program started almost at once. However, even with that head start, my program finished showing the directory first (something like 35 seconds total, compared to 40 seconds).
Before those PL/M programs, I wrote my longest 8080 assembly-language program: a text-editor. The motivation for this was that the ISIS-II had two choices for a text editor:
At the beginning of that project, my organization was being reorganized, and I had no direction (sic) for almost a month.
On the other hand, I had an idea, from the style of assembly language programming that I was doing in my research project. I'd found a useful way to embed a jumptable inline in the program. It made decoding and parsing much simpler. That worked nicely with the PDP-11, whose registers could be used as index registers. I'd also experimented with making the skeleton of a text editor in my simulators for the 8080. So I wrote a text editor for the 8080, modeling it after the editor for RT-11 (which in turn was modeled after TECO).
The initial version of the editor was 4Kb, which left most of the 40Kb memory free for holding the file. (This was smaller than Intel's dumb editor, which I assumed must be written in PL/M). Like the PDP-11 editor, mine stored a "page" (up to a form-feed). Eventually, the program grew near 8Kb (still much smaller than CREDIT, which used 32Kb).
I improved on the design, by adding the ability to see some of the editor's state. For instance, a "v" command would show a "^" under the current cursor position. Another command would show the contents of an internal buffer (used for copying text from one point to another).
About twelve years later, I became involved in developing vi like emacs, and set about extending that.
The research center where I worked then was much like any other company. They spent money, and were always looking for sources of revenue. One of the hurdles we had on a yearly basis was a quota of patent disclosures for each department. These were essentially material from which the legal department could select and decide which to use for filing patents.
One of my coworkers pointed to some of the features of the programs which I'd written for the controller and suggested those would be suitable grist for this mill. In particular, there was a feature that I'd developed to achieve a performance goal:
I came up with a table-scheme which I said was like a fireman's pole, visualizing someone running to the pole, and sliding down (with some horizontal oscillation). By jumping into the table at a given point (computed with fixed timing cost, using the difference between the desired and actual time as a factor), the program would exit from the table at a precise time.
Of course this was software. For a patent, the disclosure had to recast the description as implemented by a machine. I submitted a description for this. (Later, I found that my coworker had also submitted a disclosure for the same "invention").
Actually, I thought the approach was reasonably obvious, and probably not novel enough for a patent. I did not hear anything more about that.
The reason why this is interesting is that it is essentially the same thing later used in C, called Duff's device. According to some sources, there was prior art. My interrupt handler was an example of that.
txt for my graduate research. At
that time, I was also using the Univac system at the
research center for producing memos and reports. After I
had been there about a year, my manager commented that
the way to get ahead there was to write reports. They
need not be long. In practice, memos would work just as
well. The main difference between the two was that memos
would be up to ten pages (typically 5), while reports
would be longer. I took the hint, and wrote memos and
reports. By the time I left the research center, I had
written 45 memos and reports – more than one a
The completion of my research (and need to publish the results) came around the time that I became involved with DPS. I wrote programs for the Univac system and the PDP-11 (actually MINC-11 at that time) to solve these problems:
docprogram on the Univac to make working drafts (on the line printer). After that (probably 1979) I found there were a few high-quality printing terminals, which I could use to print the final version – except for the cover sheet.
txtwas not complete. I needed a program to make nice printouts (no DECwriter dot-matrix stuff). Nice printouts were available using AJ-832 (Anderson Jacobsen) daisy wheel terminals, using individual sheets, hand-fed.
That was the initial plan: to make a program that would wait for a carriage return, then print a page (down to the next form feed), and wait for a carriage return.
The cover sheet (a requirement for memos and reports) was a special problem, because the document number had to be printed on the very top of the page. When printing from the Univac, I was unable to print on that line because I had to enter a carriage return to initiate the printing. So I usually relied on the department secretary for cover sheets (but typed my own during her absence).
Almost immediately, I found that there were problems
printing underlined text. Like nroff,
printed underlined and bold text using overstriking.
overstriking). On the systems
that I used, however:
Solving the first problem took some investigation. I consulted with Henry Bowlden, who pointed out that the APL interpreter was capable of printing overstruck text wider than 80 columns. But he (having access to the Univac source) discovered that Univac handled this by setting a global variable in the terminal driver rather than making the feature usable by ordinary programs. Ultimately, he suggested a solution: use an escape sequence which would stop output to the AJ-832's just before the operating system sent a carriage return. I wrote a program to do this. That worked, but was not very interesting, since it was useful only for the AJ-832.
The second problem was interesting. By altering the output to reduce the number of changes of direction, I could make the AJ-832 print much faster.
txt could produce multiple
levels of overstriking, corresponding to bold, bolder,
etc. On the PDP-11, I wrote a program
“unover” to collect each line with its
overstrikes into a matrix. The program minimized the
number of switches between forward and backward movement
of the daisy wheel carriage by changing sequences such
The program col (which
seems to have been written later), does something like
col, my program also handled
the multiple levels of overstrikes emitted by
txt. On finding an overstruck segment, it
would recur looking for overstrikes on top of that. (I
have a mental image of the program zig-zagging up and
down hills which is hard to depict in HTML).
While I was the only user for the PDP-11 program, the Univac program was useful to other people. I wrote a memo describing it (most of my memos described useful or innovative programs which I had written). While my memo was still in review, one of the computer center staff wrote a guide to using the program, omitting my role. After some discussion, he sent out a correction to the user guide—as a one-line strip of paper citing me as the author.
At the time, I was working on a project which kept me waiting for output for long periods of time (as long as two or three days). The computing center, to allow themselves to use the program, made it free to use for everyone in the research center (an overhead cost). Given that my project's computing bill was comparable to my salary, this was a good deal.
I investigated DPS, and quickly found that it was much more powerful than Univac's bundled document formatter. Among other nice features, it had associative arrays which I used to make macros to generate a keyword index for the research reports that I was writing. I also wrote macros to generate table contents, list of tables, footnotes, etc. DPS had a feature which was like a queue (or FIFO). I used that to store footnotes until they were needed.
At the time, I was also working on a fairly long report (for a different project).
There was no way to make pictures. My reports had pictures on about 10% of the pages. But I wrote macros which printed the captions on the page. One picture which had two parts, e.g., 2.5a and 2.5b, came out on two successive pages on my first version. This incited a loud argument with an associate who said that it was one picture. I pointed out that it was two parts and that I would change it, but found that simply walking away and changing the macro made the argument go away-—like too many arguments no amount of polite discussion had any effect on my associate.
I finished the report, and moved to a different project. While it was in review, it came back for a minor correction and was given to another person who took it to a secretary to apply scissors and glue to modify the first page. We've come a long way since that point, technically speaking.
Our system used CMS for text-editing and printing, as
a front end for SPF.
SPF had its own text editor. At the time, I
the advantage of
SPFwas that one could learn everything about it in one day;
the disadvantage of SPF was that one could learn everything about it in one day.
XEDITwas interesting because it was not part of SPF. It was less structured, and could be used for ad hoc editing.
XEDIT was programmable. This was
interesting, since the IBM 3270 terminals were awkward to
I wrote some macros to do the shifting:
Editing comments was cumbersome in
We used boxed comments, e.g.,
/************************* * * * This is a comment box * * * *************************/
SPF had a text-formatting command which could wrap long lines. But there were problems with it:
After trying the feature, finding that occasional
errors wiped out hours of work, I went back to
XEDIT to make a macro:
XEDIT had no text formatter. I wrote a
macro to do this, again using the cursor position as a
hint to the macro, telling it the starting line. Later, I
rewrote this into an assembler module using
XEDIT's feature for loading small programs
to process text in a FIFO. Incidentally, CMS (which
supported the FIFO/LIFO feature) was developed before
Unix. You can think of it as Unix pipes, however. But the
LIFO aspect is something extra.
By the time I was done, I had several dozen editing
macros. All of the
XEDIT macros could be
assigned to a function key. But the 3270 terminals had
(only) 12 function keys. I solved that problem by making
a macro to page through the sets of useful macros.
Aside from a handful of modules written in assembler,
all of these macros were written in
When I started writing these, we had both EXEC and
EXEC2 was nicer –
initially because it could hold 255-character words
compared to EXEC's 8-character tokens – and about
the same level as
What made it interesting is that
has only integers and strings. Variables held strings,
and could be used in integer calculations.
Like the curses-based add, this was screen-oriented. But it only displayed the last four lines entered. Previous lines could not be edited, but could be cancelled.
By the way, the calculator was used for adding expenses. It displayed the results not as an integer (the underlying representation), but in dollars and cents. I used EXEC2's string operations for doing the conversion.
I took over maintenance of a set of development
utilities written (or acquired) by an associate who moved
to a different division. Our development environment used
IBM VM/SP CMS for editing, and MVS/TSO for batch
compiles. There were no add-on tools for the front-end
CMS; it had
XEDIT – and assembler.
One of the utility programs was just an executable which the other developer happened to have: prtrcvr (print-receiver). This was originally from one of the IBM user group tapes, and those were in the domain of the system support staff. I wanted to make some improvement to the utility – and the keeper of the tape denied the request.
Recalling the PDP-11 disassembler, I made a hexadecimal (text) dump of the utility and started examining it. Then I wrote some macros for XEDIT to select the text at the current position and (using another new program), lookup and disassemble the file, replacing the hexadecimal strings with assembler mnemonics. Piece by piece, I disassembled the 4Kb of the utility, made my improvements.
I wrote this early in 1983. Someone told me about
superc (super compare) program which had
just been installed on the VM/CMS system. One of the
interesting features of
superc was that it
could be told to ignore specific columns of the files
which were compared. I used that feature to make a
preprocessor for it which parsed CHILL program text into
tokens with line- and column-values, and a postprocessor
which reassembled the tokens into a listing of the
differences between two program sources. That let it
ignore whitespace changes (including split lines).
The current documentation for
super compare states that it can now do word
comparisons, which sounds like free-format. (It also has
parsers for various comment-styles). At the time that I
wrote my program, these features were unavailable in
superc, and were novel to my associates.
In mid-1983, there was an internal corporate symposium. I sent abstracts for this and a few other programs. The reviewer rejected this one, stating that it had been done before. I have not found an earlier free-format compare. There are later, more well-known examples (spiff in 1988 and wdiff in 1992).
As an aside, I did present at the symposium. It was a
discussion of usage monitoring that I had done for a
script which made the
I had modified the script to send a copy of command-lines
and error/success codes to another account, so that I
could collect frequency statistics. While I was
maintaining that account, one of my coworkers was having
trouble remembering to supply the "(" which delimited
options from parameters on the CMS command-line. Seeing
that he was repeating the same command many times, I sent
a message back from the monitoring account telling him to
add the "(". Shortly after, he came by my desk, telling
me "That's some program you have there".
The project lead had done some initial investigation (perhaps motivated by E. Skordalakis' recent paper in IEEE Micro) but had found none which appeared satisfactory.
Given that vague need, and my previous work on assemblers, I proposed to construct a program that would handle whatever this group came up with for an instruction set. The program read a description of the instruction set, and interpreted its input according to the description. Nowadays, one might use a tool such as yacc to do this.
I made descriptions and testcases for a half-dozen computer types. The program worked well enough, but ran about ten times more slowly than I had expected.
Like "freec", there was a given acronym to explain it:
system of programs for assembler generation.
One of my co-workers suggested that "asmgen" would be more apt, but no one had a nicer name.
Another co-worker suggested that rather than attempting to construct a parse-tree for different assembler syntax, that I should just use Lisp syntax. He pointed to someone's dissertation as a starting point, in which the author had used Lisp as an assembly language.
This was the first large program (about 20,000 SLOCs, 4 months of work) which I wrote in C. I started writing it on a VMS system with DEC C version 1.0, then updated to version 2.0. After the program was working, I ported it to the department's BSD VAX system.
The group in which I was working had other interests besides software development; they spent their time writing emails discussing a better "design system" (whether for software or some other topic was never elucidated).
When I heard about that, I discussed it with some of the people who used the printer. It seemed simple (just a few letters). But the available fonts were wrong. The Imagen had only Times-Roman available. The logo used Univers 57, which was noticeably different (a sans-serif font). There were no free fonts available; Univers 57 would have been expensive. However, the Imagen printer could accept a font defined by a text file.
It seemed simple. And my working terminal was the BBN Bitgraph, which was essentially a vt100 with graphical capability. I started by writing a program which could read an Imagen font file, and display its glyphs using curses, e.g., as an array of asterisks and dots. Then I improved it by using the Bitgraph's escape sequences to show the actual character (as bits, of course). Curses does not do graphics, but with some care, it is possible to modify the display in an "unused" area of the screen. After that, I transformed the display-only program into an editor, with straight lines, circle, fill-operation, etc.
Drawing a font with a tool like that is time-consuming. Rather than draw just the characters I needed, I drew the entire font, as Univers 58 (the unitalicized form), and then added a translation feature to the font-editor which would slant the letters.
The task was not complete until I met with the local graphic arts person (the head of the print shop), and got him to agree that the printed logos looked good enough.
deroff, I wrote a filter for
Scribe, to strip its markup. Someone did comment that
Scribe could write to dumb devices, but removing markup
was a different thing, showing text that Scribe might
diredon a BSD 4.2 system. I modified it, and found that it was not only useful (I had been using
fliston IBM VM/CMS for the previous two years), but put the idea into my mind that I could write one too. In the spring of 1984, I was drafted for a new department, focusing on CAD software development. The new department used VMS, Prime and Apollo workstations.
I did most of my work on an Apollo DN300; the VMS and PrimeOS systems were where the larger CAD programs might run.
It was relatively simple to render the lines using the Imagen's graphics, but wasted a lot of paper (since the users wanted to select an area on the screen and print that).
I solved the problem by making a graphics program (on the Apollo workstation) which would display the file to be sent to the printer. It allowed the user to zoom in and out (by factors of two) and pan.
Thinking that the raster display would act something like grayscale, I made an elaborate scaling algorithm based on that assumption. But I found that it was not as legible for scaled-down pictures as just filling in a cell where any pixel in the original was filled in.
Scaling by factors of two sounds unambitious. However, my users were very demanding, and would complain about lines misplaced by a single pixel. Numerical rounding was forbidden. In the layout editor, I devised a scaling algorithm which avoided rounding errors, by restricting the possible viewing sizes. In turn, that provoked a lot of discussion.
There was a simple solution (about 3 days of work, 2700 lines of code). I wrote a preprocessor (pronounced "foo-tran") in Fortran which would perform the include feature. What was interesting about it was that it did not use an extra file for the included text, but commented/uncommented the included text. Since an included "file" might appear in more than one location, futran would check, report and repair inconsistencies. It used the first occurrence of included text as the master.
Oddly, that short program gained more recognition from my coworkers and management than the work I was doing on flist.
Contrasted with the SVr2 development systems we used for our application, it was rather elegant. I used the feature to store frequently-used commands.
I wrote a program that would read a set of key assignments from a data file, and send the appropriate escape sequences to program the function keys and make the status reflect those settings.
Also (because both unshifted and shifted function keys had separate definitions), I stored frequently-used passwords on the shifted keys. We do not do that now (though ssh private keys are not that different). But it was useful at the time.
The disassembler was about a thousand lines of C. What made it interesting was that we had only limited access to the board's memory. Its application programming interface provided a view of only a dozen or so bytes, given a starting address. The starting address had to be word-aligned as well, due to addressing limitations of the models of the 68000 which we used (some were 68010's for instance, others were not).
The original developers were consultants (read as hackers), who were untidy in their habits. Before making serious changes, I preferred to add comments and cleanup the formatting. This phase would supposedly not change the program, but there were no tools which could be used to ensure the absence of change.
The development environment (SVr2) used COFF object
files. Their content would include line-numbering and
other debugging information. I wrote a utility named
ldcmp which read files, and compared the
sections for program and data. Debugging information was
As part of that project, we ported the programs to VMS. One of the problems we encountered was the apparent lack of a sed program. John pointed out that DEC provided an analogous program; I countered by noting that the Gould server had sources would might be adaptible. In studying the sources, I noticed two odd features:
exitbefore the notice. There must have been some subtle problems due to that treatment.
A separate development group had provided graphical widgets, e.g., for line- and bar-charts. Then, as an afterthought, one of their developers had written a widget to display a table of numbers. It was not in the project plan, was not documented, and had not been reviewed.
I needed the table widget for my application. It was about 5,000 lines of C, and needed work to be usable. The developer had used indentation levels of 1 or 2 characters throughout. So it needed some initial cleanup.
I wrote a new version of
ldcmp, for the
a.out format used on our workstations. Using
this tool, I was able to cleanup the widget without
modifying the object code. I have used this approach in
other ways (and on systems with more tolerant object
formats), e.g., in ansification of C
Most of the project used Ada. We developed on VMS (using DEC Ada), and on Sun- and Apollo workstations using Verdix Ada.
Eric Marshall wrote some simple tools (in C of course) to help with productivity. I recall that they did things such as list the package names, list the with'd packages, etc. They were named according to their use, e.g., a.list, to list package names. There were perhaps four of these utilities (he does not recall exactly either), which used the same parser, which he named "Ada PITS":
Ada Programming In The Small
Given that starting point, I extended it by adding five more utility programs and (of course) cleaning up compiler warnings, makefiles, etc.:
This compared Ada source files, ignoring
whitespace. The program simply reported (like the
cmp utility) if the two files were
This counted lines and statements for Ada source files. I made it produce a report much like my C counter.
I wrote this last, not being satisfied with a.compare. This was a more ambitious free-format diff of Ada source files, which attempted (like freec) to show differences ignoring both comments and whitespace.
It did work, but I was not satisfied with the way
the results were shown. Reconstructing the sources
for display in the lex/yacc configuration was much
harder than using
This generated a makefile which could be used to build an Ada program.
Verdix Ada's build system (which most of us used) was slow. The workstation took about 3 minutes to build a program. Most of that time was spent analyzing the source files. On the other hand, bypassing the build system and using the compiler directly took one tenth of the time.
My lex/yacc program took only a few seconds to generate a makefile. Even with the performance advantage, my associates did not approve of the approach, saying that it would not work with generics (though I did that), and that there were some types of programs that it would not work with (I did not find any in our project).
This trimmed comments from Ada source, which is fairly simple to do. I wrote this first.
I maintained Ada PITS for a few years, before it was taken up by another group and made into a deliverable.
At that point, I was doing software prototyping and evaluations to support a variety of actitivies. I used my spare time to start helping external developers with tools which would be useful if they resolved code quality issues (i.e., portability and maintainability). By the time I moved on in 1994, I had evaluated and suggested improvements for about 65 programs. The first large program that I became involved with is the text editor vile. My involvement with cproto and mawk also dates from this period. Other programs that I touched then, but have not since include c2man and par.
Not all of the improvements were successful; the GNU maintainers were (aside from one who incorporated my changes without credit) singularly unreceptive. I may elaborate on those in another venue.
As noted in my page on change-logs, I have become the maintainer for some programs where I was not the original developer. In that process, I have also become the principal developer.
There are other programs to which I have contributed changes. For example, I helped with the sharutils program during 1994-1996.
There are several terms used in this page, for which I have not found a reliable source.