http://invisible-island.net/ncurses/
Copyright © 2011–2013, 2014 by Thomas E. Dickey


Synopsis

tctest is a tool which helps analyze termcap implementations, i.e., the runtime library. It uses the termcap runtime library to retrieve terminal descriptions, which it reports in a standard format. The termcap library may alter or lose information from the original textual terminal descriptions. To understand what tctest is reporting, you need more background than other sources provide.

History

Outside of manual pages, the O'Reilly book termcap & terminfo by Strang et al (1986) is the usual reference for this material. However, it does not present differences among implementations, and covers only the very beginning of the story. I have provided more information.

Termcap evolved in stages:

BSD Termcap

While termcap is said to have been created in 1978, the oldest sources that are still available are part of the BSD releases beginning a year later (with 3BSD, files dated November 1979 to January 1980). This, and several later releases, can be downloaded from the Unix Archive.

3BSD (code dated December 1979)

The initial release of the termcap library uses syntax which would (mostly) work with current implementations. The tgoto function was later revised for these changes:

The 3BSD tarball includes manpages termlib.3 (termcap library) and termcap.5 (termcap format), which describe most of the syntax.

BSD 4.2 (code dated July 1983)

This is the reference version for termcap syntax (noting that features such as the TERMCAP environment variable are not syntax, but configurability).

Terminal data is read from a text file (or from the TERMCAP environment variable) using the tgetent function. It has two parameters:

buffer
this is a user-supplied buffer able to hold at least 1024 bytes (1023 bytes of data plus a terminating null byte). The terminal description will be returned in this buffer.
name
this is the name of the terminal entry to find. In the BSD 4.2 implementation, the name must point to a nonempty string.

Loading the terminal data is subject to both the parameters and environment variables:

One interesting quirk of the termcap library is that tgetent can be called recursively to include a parent description marked at the end of any description. On each recursion, the name parameter is set to the parent's name, which in turn is matched against TERM. In this way, the TERMCAP variable could be used (instead of providing the entry to match the caller's given name) to override any single entry in the chain of recursion.

The basic syntax is as follows:

To allow entering special characters, the termcap library recognizes a limited set of escapes (character sequences beginning with a "\" backslash), which are substituted at runtime in tgetstr:

Source Result
\E escape
\\ backslash
\b backspace
\f form-feed
\n newline
\r return
\t tab

The termcap library also recognizes special characters encoded as octal numbers, again using backslash to denote an escape. For example, the ASCII escape character can be represented in a termcap as "\E" or "\033". Oddly, the BSD 4.2 library accepts \8 and \9 (any decimal digit), though the computation assumes the base is 8.
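
As a rough illustration of this quirk, the sketch below combines up to three digits as base 8 without rejecting "8" or "9". The helper name decode_numeric_escape and the multiply-and-add arithmetic are assumptions made for this example; the BSD 4.2 source is organized differently.

```c
#include <ctype.h>

/*
 * Hypothetical helper (not the BSD source): decode the digits that follow
 * a backslash.  Up to three digits are consumed and combined as base 8,
 * but '8' and '9' are not rejected.  Returns the decoded value and stores
 * the number of characters consumed in *used.
 */
static int decode_numeric_escape(const char *s, int *used)
{
    int value = 0;
    int n = 0;

    while (n < 3 && isdigit((unsigned char)s[n])) {
        value = value * 8 + (s[n] - '0');
        n++;
    }
    *used = n;
    return value;
}
```

With this sketch, "033" decodes to 27 (the ASCII escape character), while a lone "8" or "9" is accepted and yields 8 or 9.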

Escaped characters not in the table are passed through as the character itself. An escaped colon requires special handling to work around a design defect in the tskip function that splits up the runtime data returned by tgetstr: it ignores the backslash character.
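
The defect can be pictured with a field-skipping loop that looks only for ":". This is a minimal sketch; skip_field is a hypothetical stand-in for tskip, not a copy of the BSD code.

```c
/*
 * Hypothetical stand-in for tskip (not the BSD source): advance past the
 * current capability to the start of the next one.  The loop looks only
 * for ':' and pays no attention to a preceding backslash, so "\:" ends
 * the capability early.
 */
static const char *skip_field(const char *p)
{
    while (*p != '\0' && *p != ':')
        p++;
    if (*p == ':')
        p++;
    return p;
}
```

Given the text k9=\E[00\:z:kb=^H: this returns a pointer to "z:kb=^H:", so the part of the string after the escaped colon is mistaken for a separate capability.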

There are also control character substitutions using "^". Those mask (logical AND) the value of the next character to 5 bits, so the result is in the range 0-31.

Conventionally, "^" markers use the uppercase alphabetic characters plus the punctuation characters in the same range (of 32) which map to controls by stripping all but the low 5 bits, i.e., "@", "[", "\", "]", "^" and "_". For instance, there are 41 occurrences of "^_" in the BSD 4.2 termcap.

One pitfall when comparing termcap and terminfo is that DEL (127, represented in terminfo by "^?") is not treated specially by termcap; it must be given as "\177". A "^?" seen by a BSD 4.2 termcap library gives the same result as "^_", i.e., 31.
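
The masking described above fits in one line of C. The helper name control_value is made up for this illustration.

```c
/*
 * Hypothetical helper: the value produced for "^X" when only the low
 * five bits of X are kept, as described for BSD 4.2.
 */
static int control_value(char marker)
{
    return marker & 0x1f;
}
```

Here control_value('[') is 27 (escape), and both control_value('_') and control_value('?') are 31, which is why "^?" cannot stand for DEL in this scheme.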

The termcap library handles more than just special characters. It provides support for cursor-addressing via the tgoto function. Given two numbers and a string, tgoto checks for functions (marked with "%"), and performs those functions using the two numbers. In the BSD 4.2 tarball, a list of these functions is found in comments in the source code:

 * The following escapes are defined for substituting row/column:
 *
 *      %d      as in printf
 *      %2      like %2d
 *      %3      like %3d
 *      %.      gives %c hacking special case characters
 *      %+x     like %c but adding x first
 *
 *      The codes below affect the state but don't use up a value.
 *
 *      %>xy    if value > x add y
 *      %r      reverses row/column
 *      %i      increments row/column (for one origin indexing)
 *      %%      gives %
 *      %B      BCD (2 decimal digits encoded in one byte)
 *      %D      Delta Data (backwards bcd)
 *
 * all other characters are ``self-inserting''.

The comments do not mention "%n", though it is documented in the BSD 4.2 manpage. Strang did mention it, but made an error, stating that the value is exclusive-OR'd with (octal) 01400. The code uses 0140, a single-byte value, and the operation is performed on both row/column values.

Given an unrecognized "%" function, tgoto returns "OOPS".
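
To make the "%" functions concrete, here is a minimal sketch of an expander for a subset of them. It is not the BSD tgoto: the name expand_goto, the argument order (row before column, whereas tgoto takes the column first), the caller-supplied output buffer, and the zero-padding of %2 and %3 are assumptions of this example. The special-case handling for %. is omitted, as are %>xy, %B and %D, which this sketch therefore also reports as "OOPS".

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch (not the BSD source): expand a cursor-addressing
 * string for the given row and column.  Handles %d, %2, %3, %., %+x,
 * %r, %i, %n and %%.  Anything else, or running out of room in the
 * output buffer, yields "OOPS".
 */
static const char *expand_goto(const char *cap, int row, int col,
                               char *out, size_t outlen)
{
    int values[2];
    int which = 0;              /* 0: row is next, 1: column is next */
    size_t used = 0;

    if (outlen == 0)
        return "OOPS";
    values[0] = row;
    values[1] = col;

    while (*cap != '\0') {
        char piece[16];
        int len = 1;

        if (*cap != '%') {
            piece[0] = *cap++;          /* self-inserting character */
        } else {
            int v = values[which];
            int tmp;

            cap++;                      /* skip the '%' */
            switch (*cap++) {
            case 'd':
                len = snprintf(piece, sizeof(piece), "%d", v);
                which = 1 - which;
                break;
            case '2':                   /* zero-padded in this sketch */
                len = snprintf(piece, sizeof(piece), "%02d", v);
                which = 1 - which;
                break;
            case '3':                   /* zero-padded in this sketch */
                len = snprintf(piece, sizeof(piece), "%03d", v);
                which = 1 - which;
                break;
            case '.':
                piece[0] = (char)v;
                which = 1 - which;
                break;
            case '+':
                if (*cap == '\0')
                    return "OOPS";
                piece[0] = (char)(v + *cap++);
                which = 1 - which;
                break;
            case 'r':                   /* swap row and column */
                tmp = values[0];
                values[0] = values[1];
                values[1] = tmp;
                len = 0;
                break;
            case 'i':                   /* one-origin indexing */
                values[0]++;
                values[1]++;
                len = 0;
                break;
            case 'n':                   /* XOR both values with 0140 */
                values[0] ^= 0140;
                values[1] ^= 0140;
                len = 0;
                break;
            case '%':
                piece[0] = '%';
                break;
            default:
                return "OOPS";
            }
        }
        if (len < 0 || used + (size_t)len + 1 > outlen)
            return "OOPS";
        memcpy(out + used, piece, (size_t)len);
        used += (size_t)len;
    }
    out[used] = '\0';
    return out;
}
```

For a VT100-style string such as "\E[%i%d;%dH" (with \E already decoded to the escape character), row 4 and column 9 expand to escape, "[5;10H".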

The same functions could be used for other capabilities than cm (cursor-movement). The BSD 4.2 termcap file uses tgoto functions for several other capabilities (shown with their more familiar terminfo names):

Long terminfo name     Short name   Termcap   Description
parm_insert_line       il           AL        insert #1 lines (P*)
parm_dch               dch          DC        delete #1 characters (P*)
parm_delete_line       dl           DL        delete #1 lines (P*)
parm_ich               ich          IC        insert #1 characters (P*)
parm_left_cursor       cub          LE        move #1 characters to the left (P)
parm_right_cursor      cuf          RI        move #1 characters to the right (P*)
parm_up_cursor         cuu          UP        up #1 lines (P*)
column_address         hpa          ch        horizontal position #1, absolute (P)
cursor_address         cup          cm        move to row #1 columns #2
change_scroll_region   csr          cs        change region to line #1 to line #2 (P)
row_address            vpa          cv        vertical position #1 absolute (P)
to_status_line         tsl          ts        move to status line, column #1

BSD 4.3 (1986)

This made no change to the syntax of termcap.

In Tahoe, the section 3 "termcap" manpage (dated September 1987) provided more ways to find the termcap file.

The handling of the TERMCAP environment variable was modified to take into account a new TERMPATH environment variable:

The section 5 "termcap" manpage (dated November 1985) notes:

termcap was replaced by terminfo in UNIX System V Release 2.0. The transition will be relatively painless if capabilities flagged as "obsolete" are avoided.

That is, termcap capabilities were from that point derived from terminfo, and those that had no counterpart in terminfo were deemed obsolete.

The manpage listed 179 capabilities, marking 25 obsolete. One ("ma") is still a termcap name, but used for a different purpose. The others are gone (except as recognized by ncurses).

Termcap databases generally grow with each release, but the number of obsolete capabilities in use did not decrease immediately just because the manpage documented them as obsolete:

Release          Entries   Total Capabilities   Distinct Capabilities
                           Total    Obsolete    Total    Obsolete
3BSD                  81    1289         137       99           8
BSD4.2               328   10957         954      140          14
Solaris 10           470   15838        1051      195          17
BSD4.3               539   19404        1224      248          17
BSD4.4               552   20152        1256      268          18
termcap 2.0.8        900   49407         521      352           9
termcap 1.3.1       1274   88103         861      374          13
ncurses             1583   95961        1011      399          13

Considering the size, Solaris 10's termcap is likely based on one of the earlier BSD 4.3 releases, with minor updates. It is not directly related to Solaris' terminfo database, which dates from the mid-1990s.

BSD 4.4 (code dated April 1994)

To improve performance, the developers changed storage from a flat text-file to hashed databases. This was new work done by Casey Leedom, called "getcap". Incidentally, the rewrite got rid of most of the problems from BSD 4.2's buffer limit-checks.

The use of TERMCAP and TERMPATH environment variables was unchanged from BSD 4.3.

More important for portability, BSD 4.4 termcap supports multiple "tc=" capabilities in an entry (like terminfo). Unlike terminfo, this termcap implementation does no merging of the capabilities. It simply does a depth-first traversal of the entry, replacing each "tc=" capability with the text from the corresponding entry.
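
The depth-first replacement can be sketched with a small in-memory table. Everything here is hypothetical: the sample entries, the names find_caps and expand_entry, and the depth limit of 8 are inventions for illustration, and the real library reads its data through getcap.

```c
#include <stddef.h>
#include <string.h>

struct entry {
    const char *name;
    const char *caps;   /* colon-separated capabilities */
};

/* Made-up data for illustration only. */
static const struct entry sample_db[] = {
    { "base",  "am:co#80:" },
    { "keys",  "k1=\\EOP:co#132:" },
    { "child", "li#24:tc=base:tc=keys:cl=\\E[H:" },
};

static const char *find_caps(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof(sample_db) / sizeof(sample_db[0]); i++)
        if (strcmp(sample_db[i].name, name) == 0)
            return sample_db[i].caps;
    return NULL;
}

/*
 * Append the capabilities of `name` to the string already in `out`,
 * replacing each "tc=" field with the text of the entry it names.
 * Nothing is merged: duplicates are copied as they are found.
 * Returns 0 on success, -1 on a missing entry, overflow, or when the
 * (arbitrary) nesting limit is exceeded.
 */
static int expand_entry(const char *name, char *out, size_t outlen, int depth)
{
    const char *p = find_caps(name);

    if (p == NULL || depth > 8)
        return -1;
    while (*p != '\0') {
        size_t len = strcspn(p, ":");

        if (len > 3 && strncmp(p, "tc=", 3) == 0) {
            char parent[32];

            if (len - 3 >= sizeof(parent))
                return -1;
            memcpy(parent, p + 3, len - 3);
            parent[len - 3] = '\0';
            if (expand_entry(parent, out, outlen, depth + 1) != 0)
                return -1;
        } else if (len > 0) {
            size_t used = strlen(out);

            if (used + len + 2 > outlen)
                return -1;
            memcpy(out + used, p, len);
            out[used + len] = ':';
            out[used + len + 1] = '\0';
        }
        p += len;
        if (*p == ':')
            p++;
    }
    return 0;
}
```

Starting from an empty buffer, expanding "child" gives li#24:am:co#80:k1=\EOP:co#132:cl=\E[H: in which co appears twice, because the text is spliced in rather than merged.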

None of the entries in the BSD 4.4 termcap file use this feature, however.

It also extended the syntax of termcap, adding some redundant escapes:

Source Result
\B backspace
\C colon
\F form-feed
\N newline
\R return
\T tab
\c colon
\e escape

Only one entry in the BSD 4.4 termcap file uses any of those (two instances of "\L" in tek4113, which happens to match the BSD 4.3 entry in that instance).

It also allows numbers to be hexadecimal or octal, using C-style "0x" or "0" prefixes respectively. Likewise, hexadecimal values are unused in the BSD 4.4 termcap.
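
The same prefix convention is what the standard C function strtol applies when its base argument is 0, so it can serve as a stand-in to show the effect. This is only an analogy; the library has its own parser, and parse_cap_number is a hypothetical name.

```c
#include <stdlib.h>

/*
 * Hypothetical helper: parse a numeric capability value using C-style
 * prefixes.  With base 0, strtol treats "0x" as hexadecimal and a
 * leading "0" as octal; anything else is decimal.
 */
static long parse_cap_number(const char *text)
{
    return strtol(text, NULL, 0);
}
```

Under this convention "80", "0x50" and "0120" all denote 80 columns.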

BSD 4.4 termcap eliminates a quirk of preceding releases. Octal escapes are different, i.e., cgetstr ignores "\8" and "\9", passing through "8" and "9" respectively.

BSD 4.4 termcap discards a "^" control sequence which is followed by a colon or by the end of the entry.

BSD 4.4 termcap also treats "\:" as the end of a capability, making it consistent with BSD 4.2's termcap library (the design defect noted previously). Six entries in BSD 4.4's termcap file used "\:" as data:

One (f100) used it in the sense that cgetstr assumed: it was leftover from a stray edit which deleted a newline, e.g., to save bytes.

Which is correct? That is hard to determine. Neither IBM nor ncurses show status-line features for the ibm3163 entry. The Wyse-60 entries in ncurses are from a different source, and do not use the particular initialization string which was modified. The dg460-ansi entry differs, using kf9=\E[010z versus BSD 4.4's kf9=\E[00\:z. That was due to one of Raymond's changes (likely a guess):

# fixed garbled ":k9=\E[00\:z:" capability -- esr)

VT100.net has a user manual for the terminal, but while the manual promises that the user function keys are documented in Appendix C, the information is not there, either. On the other hand, it has a manual for the 411/461 models which documents "\072" for that key.

Finally, BSD 4.4 extends the way capabilities can be cancelled. Termcap capability names are not (except for special cases such as tc) predefined. That opens up the possibility of having the same name for a boolean capability, as well as a number and a string. BSD 4.4 takes note of this, providing a way to cancel just a numeric capability or a string capability without affecting the (hypothetical) alternate values. The library does this based on whether a "#" or "=" character precedes "@".

Terminfo support for Termcap

According to Strang (1986), the termcap functions were available via the (terminfo-based) curses library. Aside from that, there are few sources which tell when those were added (or whether they were present in the first version of terminfo).

Strang states that Bill Joy wrote the first version of the termcap library, and that Mark Horton wrote the terminfo library. The latter was announced at USENIX in 1982.

There are no copies of the paper online, and no useful documentation on this predates Strang. Pavel Curtis' reimplementation of terminfo (1983) did not provide a termcap interface. Lacking something more concrete, it is uncertain whether the initial release of Horton's terminfo library provided this either. In 1986, Mark Horton stated on Usenet that he had written the version of terminfo used in SVr2, and that SVr3 "had input" from him. In other words, other people (such as Tony Hansen, author of infocmp) were doing the work from that point. Horton also commented:

The SVr2 tic was just a modified version of the termcap file reading code, which also doesn't notice syntax errors. The SVr3 tic is completely redone (it's based on Pavel Curtis's tic) and is fairly fussy about syntax errors. It's also more complete, uses the existing binary database, and is much faster.

and

(For those who are not impressed with tic's error messages, the SVr2 tic, which was frozen for SVr2 in April 1983 along with the rest of curses, is essentially the termcap parser. The SVr3 tic is completely redone, by Pavel Curtis, and it's as fussy as pcc.)

OpenSolaris has sources for captoinfo and infocmp, which give 1984 as the date of creation. Likewise, it has sources for tic, citing Pavel Curtis in 1982.

Strang does not mention color (none of the listed capabilities do color). I conclude that SVr2 did not support color, in spite of subsequent commentary which claims that it supported all of the advanced features such as color, line-drawing, and multiple video attributes.

Strang's focus throughout is on termcap, discussing terminfo as an alternative. For instance, he comments (chapter 15) that BSD 4.3 termcap requires that the "tc=" capability be last in a description, noting that this implies there will be only one. There is no complementary discussion of "use=" (terminfo), or of any limitations on its position and number.

The terminfo implementations at hand (the SVR4's such as Solaris) all have the same approach to supporting termcap:

FreeBSD and its Kindred

Although BSD 4.2 is the reference for syntax, most termcap users rely on the later BSD 4.4 libraries. In turn, those have evolved to either use ncurses directly, or have added features for compatibility with it.

FreeBSD

FreeBSD's CVS history started in May 1994 (some information was lost in converting to SVN).
These sections are of interest:

CVS for src/share/termcap
This is the termcap database.
It has evolved from the BSD 4.4 termcap file, adding/changing items, and is not directly derived from ncurses.
CVS for src/lib/libc/gen
The getcap source code and documentation live here.
CVS for src/contrib/ncurses
As of mid-2011, this is the current source for FreeBSD's termcap library interface. Like src/lib/libncurses, it is modified to use getcap.
src/lib/libncurses (dropped)
Peter Wemm marked this obsolete in August 1999, in favor of src/contrib/ncurses (at that point, a pre-release of ncurses 5.0).
However, that work was not completed until April 2007.
Rong-En Fan consulted with me, and finished it using ncurses 5.6 (which supported hashed databases).
Andrey A Chernov moved it from the FreeBSD ports to the base system in October 1994.
For the next five years, it evolved as a FreeBSD-oriented fork.
src/lib/libtermcap (dropped)
obsolete since November 1999, in favor of src/lib/libncurses.
This used tparm from mytinfo (import in December 1994).
src/lib/libmytinfo (dropped)
obsolete since November 1999, in favor of src/lib/libncurses.
This was Ross Ridge's mytinfo, which provided termcap and terminfo support.
Andrey A Chernov moved it from the FreeBSD ports to the base system in October 1994.
There were some subsequent changes, but more work was done with src/lib/libncurses.
src/lib/libterm (dropped)
obsolete since July 1997, in favor of src/lib/libtermcap.
This was the "BSD 4.4 Lite Lib" source, imported in May 1994.
It uses getcap to retrieve capability information.

FreeBSD getcap was modified to support "^?" as alias for DEL in May 1995.
Its buffer-size is still 1024 (1023 bytes of data plus a terminating null).

The base system ncurses is configured to support only termcap; a port supports terminfo.

NetBSD

NetBSD's CVS history starts in March 1993. These sections are of interest:

CVS for src/share/terminfo
This is the terminfo from ncurses.
CVS for src/share/termcap
obsolete since January 2011 in favor of src/share/terminfo.
Like FreeBSD's termcap, this started from BSD 4.4's termcap file.
But from 1995 to 1997, there were a half-dozen imports from Eric Raymond's version.
All told, the NetBSD CVS reflects about 2/3 as many commits as FreeBSD.
I reviewed this late in 2003 using ncurses' tic, and sent a (170-line) patch to improve it.
CVS for src/lib/libc/gen
The getcap source code and documentation live here.
There are no changes to BSD 4.4 syntax, apparently only performance and portability fixes.
CVS for src/lib/libterminfo
This is a new implementation of terminfo by Roy Marples, starting February 2010.
Like other terminfo implementations, it provides a termcap interface.
Also (like libterm which it replaced), it reads from a hashed database.
CVS for src/lib/libterm
obsolete since February 2010 in favor of src/lib/libterminfo.
Again, this uses getcap to retrieve capability information.
This provided some extensions versus BSD 4.4 libterm:

OpenBSD

OpenBSD moved away from BSD termcap early, using terminfo to provide similar functionality.

CVS for src/share/termcap/termcap.src
This was originally 9.8.3 from Eric Raymond in 1995.
It was made obsolete by termtypes.master in December 1998.
CVS for src/share/terminfo/terminfo.src
This was originally a slightly modified copy of 9.13.8 from Eric Raymond in July 1996.
It was made obsolete by termtypes.master in December 1998.
CVS for src/share/termtypes/termtypes.master
This was originally an import of termtypes.master from Eric Raymond's site in December 1999.
It has been imported periodically from ncurses since January 2000.
CVS for src/lib/libcurses/
This is ncurses 5.7, which provides a termcap calling interface to a terminfo database.
Todd C Miller and other OpenBSD developers added a feature for reading terminfo data from a hashed database similar to the way termcap information is stored in BSD 4.4 termcap. This is in the read_bsd_terminfo.c file. The termcap implementation itself is much the same, aside from using OpenBSD-specific functions for guarding against setuid abuse, etc.
CVS for src/lib/libocurses/
The "old" curses library is derived from BSD 4.4.
The src/lib/libterm termcap interface was merged with it in October 1999.
The OpenBSD developers made further improvements to safeguard against environment variable problems with TERMCAP and HOME.
These affect its behavior, but usually not in a noticeable way.
CVS for src/lib/libterm/
This is the BSD 4.4 libterm, imported from NetBSD in October 1995.
Thorsten Lockert modified tgetent to accept a null pointer for the buffer parameter.
Beyond that, the OpenBSD developers changed the behavior only by fixing possible buffer overflows.
It was merged with src/lib/libocurses in October 1998.
CVS for src/lib/libtermlib/
Begun in mid-1996 by Thorsten Lockert, this library read terminfo data from a hashed database, and provided both terminfo and termcap calling interfaces.
It was moved to the Attic in December 2000.

Legacy Users of BSD 4.2/4.3 Termcap

The mainstream of development left BSD 4.2/4.3 behind around 1990. There are still some legacy users of the old version, just as there are still developers in 2011 using K&R C or the related "legacy C". This section describes a few examples, all derived from 4.2 or 4.3 code.

Ingres Database Terminal Library

The Ingres terminal library is derived from BSD 4.2 code. Comments in the source code indicate that changes started in June 1985, by renaming the termcap file.

While the entrypoints have been renamed, most of the original comments are present without change, even when obsolete. For example

**  Essentially all the work here is scanning and decoding escapes
**  in string capabilities.  We don't use stdio because the editor
**  doesn't, and because living w/o it is not hard.
while the Ingres version uses stdio for reporting errors. Most features (such as escaping) are unchanged. It provides a few extensions:

There are a few odd differences in tgoto; the %i and %2 cases have been moved (an unnecessary change).

OpenSolaris UCB Library

Unlike the other Unix vendors (reduced to HP and IBM), Sun (now Oracle) has long provided a compatibility library based on BSD source. This is from BSD 4.3 rather than the more common BSD 4.4 version. Because it is provided for compatibility and is not actually a supported product, there is no documentation.

OpenSolaris has a few legacy uses of termcap (UCB curses of course), as well as programs in ucbcmd such as tset:

Sources /usr/src/cmd/captoinfo/
This has a customized copy of the UCB termcap.c file. The comment at the top gives the reason (working around buffer size):
/* Copyright (c) 1979 Regents of the University of California   */
/* Modified to:                                                 */
/* 1) remember the name of the first tc= parameter              */
/*      encountered during parsing.                             */
/* 2) handle multiple invocations of tgetent().                 */
/* 3) tskip() is now available outside of the library.          */
/* 4) remember $TERM name for error messages.                   */
/* 5) have a larger buffer.                                     */
/* 6) really fix the bug that 5) got around. This fix by        */
/*              Marion Hakanson, orstcs!hakanson                */

Note also that tskip is made available (for use in the main captoinfo program). It does not improve on the original BSD 4.1 implementation, which lacks a check for the buffer size. Fixing tskip by itself would not make the termcap code safe from buffer overflows; the logic used for "tc=" resolution also has multiple issues. But exporting tskip without providing for buffer-limit checks compounds the problem.

Sources /usr/src/ucbcmd/tset/
This is an example of a program using termcap.
Sources /usr/src/ucblib/libtermcap/
This is Solaris' version of BSD 4.3 termcap. It adds one feature: the tgetent function asks the operating system for the terminal's current size, and sets the li (lines) and co (columns) capabilities in the returned data. Linux termcap 2.0.8 by the way does the same thing (but OpenSolaris has no history before 2005, making it impossible to gauge which implementation had an effect on the other).
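
A minimal sketch of the idea follows, assuming a POSIX-like system that provides the TIOCGWINSZ ioctl. The helper names format_size_caps and current_size_caps are hypothetical, and neither the Solaris nor the Linux library is claimed to be written this way; in particular, how the real libraries splice the values into the returned description is not shown.

```c
#include <stdio.h>
#include <sys/ioctl.h>
#include <termios.h>

/*
 * Hypothetical helper: format li/co capabilities for the given size.
 * Returns the number of characters written, or -1 if they do not fit.
 */
static int format_size_caps(char *out, size_t outlen, int lines, int cols)
{
    int n = snprintf(out, outlen, "li#%d:co#%d:", lines, cols);

    return (n < 0 || (size_t)n >= outlen) ? -1 : n;
}

/*
 * Ask the terminal driver for the window size of fd and format it.
 * Returns -1 when the size is unavailable (for example, fd is not a
 * terminal), so the caller can keep the values from the description.
 */
static int current_size_caps(int fd, char *out, size_t outlen)
{
#ifdef TIOCGWINSZ
    struct winsize ws;

    if (ioctl(fd, TIOCGWINSZ, &ws) == 0 && ws.ws_row > 0 && ws.ws_col > 0)
        return format_size_caps(out, outlen, ws.ws_row, ws.ws_col);
#else
    (void)fd;
    (void)out;
    (void)outlen;
#endif
    return -1;
}
```

A caller would try current_size_caps first and fall back to the li and co values in the terminal description when it returns -1.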

Schilling's "Extended" Termcap Library

Jörg Schilling uses BSD 4.3 termcap with some minor enhancements (see ftp site). It provides support for TERMPATH which was introduced in BSD 4.3, but the implementation is slightly different, to avoid using BSD-specific names.

The extensions (there is no documentation other than the C source) include:

Like all variants before BSD 4.4, it has bugs in the checks for buffer-overflow (including the longstanding problem with tskip). For example, I made these changes to eliminate core-dumps from the library while investigating it with tctest:

--- tgetent.c.orig      2010-10-12 18:10:20.000000000 -0400
+++ tgetent.c   2011-08-04 20:57:40.000000000 -0400
@@ -91,6 +91,7 @@
 EXPORT BOOL    tgetflag        __PR((char *ent));
 EXPORT char    *tgetstr        __PR((char *ent, char **array));
 EXPORT char    *tdecode        __PR((char *ep, char **array));
+LOCAL  char    *mytdecode      __PR((char *base, char *ep, char **array));
 #if    defined(TIOCGSIZE) || defined(TIOCGWINSZ)
 LOCAL  void    tgetsize        __PR((void));
 LOCAL  void    tdeldup         __PR((char *ent));
@@ -351,6 +352,7 @@
                        BOOL    needfree;
                        char    *xtbuf;
                        int     ret;
+                       int     tst;
 
        if (tbuf == NULL)
                return (0);
@@ -404,7 +406,8 @@
                /*
                 * Add nullbyte and 14 bytes for the space needed by tgetsize()
                 */
-               ret = ep - otbuf + strlen(np) + 1 + TSIZE_SPACE;
+               tst = strlen(np);
+               ret = ep - otbuf + tst + 1 + TSIZE_SPACE;
                if (ret >= (unsigned)(tbufsize-1)) {
                        if (tbufmalloc) {
                                tbufsize = ret;
@@ -422,7 +425,8 @@
                                ret = tbufsize - 1 - (ep - otbuf);
                                if (ret < 0)
                                        ret = 0;
-                               np[ret] = '\0';
+                               if (ret < tst)
+                                       np[ret] = '\0';
                        }
                }
                strcpy(ep, np);
@@ -600,7 +604,7 @@
                if (!ep || *ep == '@')
                        return ((char *) NULL);
                if (*ep == '=') {
-                       ep = tdecode(++ep, array);
+                       ep = mytdecode(tbuf, ++ep, array);
                        if (ep == buf) {
                                ep = tmalloc(strlen(ep)+1);
                                if (ep != NULL)
@@ -620,10 +624,11 @@
  * Note that old 'vi' implementations limit the total space for
  * all decoded strings to 256 bytes.
  */
-EXPORT char *
-tdecode(pp, array)
-                       char    *pp;
-                       char    *array[];
+LOCAL char *
+mytdecode(
+                       char    *base,
+                       char    *pp,
+                       char    *array[])
 {
                        int     i;
        register        Uchar   c;
@@ -633,7 +638,7 @@
 
        bp = (Uchar *)array[0];
 
-       for (; (c = *ep++) && c != ':'; *bp++ = c) {
+       for (; ((ep - (Uchar *)base) <= 1023) && (c = *ep++) && c != ':'; *bp++ = c) {
                if (c == '^') {
                        c = *ep++ & 0x1f;
                } else if (c == '\\') {
@@ -662,6 +667,17 @@
        return ((char *)ep);
 }
 
+/*
+ * Workaround to let the various callers work no worse than before...
+ */
+EXPORT char *
+tdecode(
+                       char    *pp,
+                       char    *array[])
+{
+       return mytdecode(pp, pp, array);
+}
+
 #if    defined(TIOCGSIZE) || defined(TIOCGWINSZ)
 
 /*

Incidentally, I noticed this comment by Schilling while researching the two-character termcap quirk (present here as well) for Debian #698299:

        At the same time, hundreds of bugs in the Dickey termcap file
        have been fixed. It seems that Mr. Dickey now uses our termcap
        program to verify the content of the file for correctness.

However, I did not incorporate any aspect of Schilling's test-program into tctest. It was not useful.

GNU Termcap

In discussing GNU termcap, I am considering three versions:

The 2.0.8 and 1.3.1 versions competed for at least ten years. Your system may have either, depending on the packager's preferences and ambition.

The former provides both shared and static libraries for Linux; the latter only provides a static library. The 2.0.8 version also (like Solaris) returns the terminal's size in the data from tgetent, while the 1.3.1 version only mentions in its documentation that an application ought to do this.

The 1.3.1 and 1.3 versions are very similar (aside from updating the termcap file). The documentation describes 1.3; this discussion focuses on the extensions.

GNU termcap

At one point, ncurses had a tparam function (from changes by Eric Raymond in January 1996). But this symbol conflicted with emacs, and Eric removed it.

NCurses

ncurses reads either termcap or terminfo source files, compiling those to terminfo format. It has been more forgiving of differences from BSD 4.3 syntax than some other implementations. For example, I added fixes early in 1998 to fill in missing parts of the terminfo syntax (the \a and ^0 items noted here). Those also affected the termcap parsing. Much later, I added a strict option to tic which suppresses those translations.

As terminfo supports multiple "use=" capabilities (the same as "tc=" capabilities), ncurses also supports multiple "tc=" capabilities.

ncurses recognizes the GNU termcap %m, but none of the other extensions for the simple reason that no substantial termcap source was ever written using the GNU extensions. GNU termcap has always distributed either Eric Raymond's (mostly generated) termcap source, or one wholly or partly derived from ncurses.

My Involvement...

I have used termcap since 1983. At the time, I was more interested in curses than termcap, since curses (poorly documented) was the visible interface used for dired. However, in my lab, I had a BitGraph terminal, and at home an Ann Arbor Ambassador terminal. Both had some VT100-compatibility, but both had interesting extensions that could be used by customizing a termcap entry. Initially, this was for simple things, such as setting the screen size. I was interested in using termcap to support the graphics work that I did with the BitGraph terminal, but on asking advice, was told "termcap doesn't do that sort of thing".

Shortly after, I moved to a different project. I was allowed to retain the Ann Arbor terminal but most of my work was using Apollo workstations, with some tie-ins to VAX/VMS and PrimeOS. None of that involved termcap.

Later (in 1986), I used Wyse-50 terminals in development on an SVr2 system. At the time, I knew only about termcap. The SVr2 system supported terminfo, but I did not modify it. The terminal database's entry for the Wyse50 (probably "wy50") was good enough for vi. It did not mention that the terminal has programmable function keys (and labels). So I wrote a special-purpose (C) program to set up the terminal.

It was not until the mid-1990s that I really got involved in the development of termcap, rather than being a user. That was with ncurses, of course. Even so, until mid-1996 I refrained from doing much with the tools (tic, infocmp) which manipulated terminfo and termcap. At that point, I realized that making ncurses successful required improving all parts of the system.

It helped that I got useful feedback—mostly from various BSD developers. My email shows these for instance:

I have improved ncurses' support for termcap in three areas:

extensible terminfo
Until ncurses 5.0 in 1999, people used to (with some justification) claim that termcap was better than terminfo because one could add whatever capabilities they might want to an entry, without regard to whether it was a standard capability.

I addressed this by making ncurses able to define new capabilities using the terminal description. Standard capabilities are unaffected; new capabilities are optional.

Quoting from my email to Florian La Roche (1999/2/21):

> > I have a couple of minor changes also (I overlooked one item in define_key,
> > and am considering adding a small change to allow us to extend the terminfo
> > format later without causing the existing applications to refuse to recognize
> > the new format).
>
> That sounds like a very good feature to add before a release.
(as long as it doesn't break old programs ;-)

I am considering adding a 5th table to the file format and making the
terminfo reader smart enough to "see" it in what would be unused space
after the existing tables.  Several people have complained that terminfo
cannot be extended; allowing it to store extended capabilities would
alleviate that.

This made the TERMTYPE structure binary-incompatible. It is implicitly used by any low-level application that includes <term.h>. This feature, together with some interface corrections to match the X/Open Curses specification, was the reason why the ncurses release numbering jumped from 4.2 to 5.0 (the release numbering is determined by binary compatibility).

The reason why this change improves ncurses' support for termcap is that there is only one source for terminal descriptions in ncurses. Eric Raymond had three sources, relying on manual fix-ups to get usable termcaps:

master
Some conventional termcap capabilities have no counterpart in terminfo.
terminfo
This was generated from the "master" file using tic. Essentially, tic would omit capabilities not part of standard terminfo.
termcap
Raymond used ncurses' tic program, then followed up with shell scripts and manual edits.

better translation between terminfo and termcap formats
Translating from termcap to terminfo is much simpler than translating from terminfo to termcap, because termcap is much less capable. The main issue is runtime expressions stored in strings. Termcap has a limited repertoire of special functions which can be reimplemented as terminfo expressions. Compatibility of capability names is almost a negligible concern, since termcap names are defined for each terminfo name.

Raymond reused code from Ross Ridge's public domain mytinfo package (comp.sources.unix, volume 26, issue 77, December 1992) for these features:

However, mytinfo did not convert from terminfo to termcap format. This was an area that Raymond started, which I have continued, making mechanically generated termcaps usable in most instances.

The changelog in the termcap 1.3.1 package states that it uses termcap.src regenerated from (Raymond's) 11.0.1 master file. The "regenerated" part was done using ncurses' tic program, to resolve the "tc=" references. The translation also relies on the improvements that I made to tic up to that point (early 2002).

made hashed-databases a portable option
The BSDs implemented hashed databases for termcap starting with BSD 4.4 (in 1994). This stores a copy of each termcap entry's text in the database. At runtime, the termcap library puts the terminal description together, resolving "tc=" capabilities (includes).

OpenBSD added hashed databases for their system version of ncurses in 1999. Again, this stores text — this time for terminfo. It means that the library must contain most of the tic terminfo compiler. One of the features of terminfo in comparison to termcap is that terminfo is compiled and loads into a usable form with less work. Also, in contrast to ncurses, the OpenBSD design uses cap_mkdb to load the entire database at one time rather than providing for incremental loading from various sources.

I added support for hashed databases in 2006. Like other features of ncurses it is reasonably portable (in this case relying upon Berkeley Database), and stores terminal entries in compiled form. Berkeley Database allows records longer than 1024 bytes (unlike ndbm on Solaris for instance).

Issues with the Original BSD Termcap

The original termcap implementation (that of BSD 4.3) has several problems:

Memory Usage

The design of termcap assumed that the calling application would be more efficient by providing a fixed-size buffer to return the data than by using malloc. Although 1023 bytes seemed "big enough" in that era, it has proven too cramped for terminals with multiple function keys. In particular, the widespread PC keyboard with 12 function keys and multiple modifiers, together with more sophisticated applications, has made 1023 seem too small.

But in 1979, a 1023-byte buffer was also fairly large on the small machines that Unix ran on. That may explain why wasted space within that buffer was overlooked. When BSD 4.3 termcap reads data into the buffer, it reads everything. It does not discard the whitespace and extra colons which are not actually part of the terminal description. That reduces the usable part of the 1023 bytes by about 3%.
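
To make that overhead concrete, here is a minimal Python sketch which compacts a continued termcap entry by dropping the backslash-newline continuations, the indentation which follows them, and the empty fields left between colons, then reports how much of the text was formatting. The function name and the sample entry are hypothetical, chosen only for illustration; this is not the BSD 4.4 code, and it handles escaped colons only in the simple "\:" case.

```python
import re

# Hypothetical entry, laid out the way termcap source files usually are.
SAMPLE = (
    "xx|example|hypothetical terminal:\\\n"
    "\t:co#80:li#24:am:\\\n"
    "\t:cl=50\\E[;H\\E[2J:\\\n"
    "\t:cm=5\\E[%i%d;%dH:"
)

def compact_entry(text):
    """Join continuation lines and drop empty fields from a termcap entry."""
    # A backslash at end of line continues the entry; the whitespace which
    # starts the next line is there only to make the file easy to edit.
    joined = re.sub(r"\\\n[ \t]*", "", text.strip())
    # Joining leaves "::" between fields. Collapse runs of colons, but do
    # not start a run at a colon which is escaped as "\:" (simplification).
    return re.sub(r"(?<!\\):{2,}", ":", joined)

compact = compact_entry(SAMPLE)
wasted = len(SAMPLE) - len(compact)
print(compact)
print(f"{wasted} of {len(SAMPLE)} bytes ({wasted / len(SAMPLE):.1%}) were formatting")
```

The sample is far shorter than a real entry, so its percentage is not representative; the point is only that each continuation costs a few bytes which a fixed-size buffer cannot spare.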

Some applications such as xterm (depending on the system) and screen may set the TERMCAP environment variable to exploit another feature of the termcap library: if it is set to something that looks like a termcap description, that is used as the terminal description. If you happen to be using a system which does this, you might have noticed that it is formatted as several lines. For xterm, that would happen with a BSD 4.3 termcap (screen is perverse and does this all the time).

Mark Horton argued against the use of environment variables:

>A termcap database sorted approximately in
>decreasing order of frequency of use should be at least as fast as the
>repeated directory lookups required to descend the terminfo tree -- and
>termcap format is *trivial* to parse.
>
>If speed is what you want, sort /etc/termcap in decreasing order of
>frequency of use. If that's not good enough for you, cram your termcap
>definition in the environment variable TERMCAP and leave terminfo behind
>entirely, when it comes to speed.

I used to think this too.  I was at Berkeley when we decided how to sort
termcap files and put them into the environment.  It helped a lot.

But it turns out that even if you put a termcap in your environment,
it's still too slow.  The termcap algorithm for reading the entry
into a set of capabilities is QUADRATIC on the size of the entry.
This is the nature of the beast - because of tc=, you have to start
from the left for each capability search.  As termcap descriptions got
longer, starting up vi grew slower and slower.  It was taking 1/4 second
of CPU time on a VAX 750 to parse the termcap entry, even when it came
out of the environment.

This was when I decided to move to a compiled format.  Things get much
simpler for the typical user - no need for the whole entry in the
environment anymore, or the hair of tset -s in the .profile/.login.
The ps command was breaking from the huge environment entries that
took the arguments off the top page of memory.  Forks were expensive.
And it took too long to start up vi.  All these problems went away
when terminfo was compiled.

Besides wasting process space, a multi-line TERMCAP variable complicates shell scripts. In contrast, BSD 4.4 discards the unnecessary characters, resulting in a single-line value.

Different implementations use additional workarounds to increase the effective buffer size for a terminal description; no particular scheme is used for all of these.

Error Checking

With some care, it is possible to fit usable terminal descriptions into the 1023-byte limit. The termcap library does some simple checks to keep from writing past its caller's buffer. However, the "tc=" (includes) are a little more complicated than the program can handle, making it possible to chop a capability at the end of the buffer, giving odd results. The termcap library's handling of buffer overflows has other bugs, allowing it to write past the end of the buffer anyway.

Error Reporting

When reporting problems in a termcap entry, the library uses only simple messages, calling write rather than printf. According to comments in the code, the library did not use <stdio.h> because the editor (vi) did not. As a result, termcap error messages do not provide names of too-long entries.

Inheritance

The termcap library implements inheritance by replacing the "tc=" capability at the end of the termcap entry with the included text. (It does discard the name and description of the included entry, but rather than being for efficiency, that is done because of syntax restrictions). A capability could appear in both the original and included entry. The text for both is stored in the same 1023-byte buffer, and the library has to search for the first occurrence. Because of the duplication, the effective buffer size is again reduced, and searches for the first occurrence of a capability are longer than necessary.

Fortunately, termcap buffer sizes are small; the performance issues are not as noticeable as they were in older machines.

Later implementations, e.g., BSD 4.4, support multiple "tc=" capabilities. Again, the inheritance is purely textual. To get efficient storage, a scheme such as that used by terminfo is needed. With terminfo, the capabilities are merged into an array, which eliminates the need for juggling and recopying the entry as "tc=" includes are processed.
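
The difference between the two approaches can be sketched in Python. The first function mimics purely textual inheritance: the "tc=" field is replaced by the fields of the included entry, duplicates remain, and a lookup scans from the left for the first match. The second merges the capabilities into a mapping which keeps only the first definition, roughly in the spirit of what is described above for terminfo. The entries and function names are hypothetical, escaped colons are ignored for brevity, and there is no check for circular includes.

```python
# Hypothetical two-entry database; "child" includes "base" via tc=.
DB = {
    "base": "base|hypothetical base terminal:co#80:li#24:cl=\\E[H\\E[J:",
    "child": "child|hypothetical derived terminal:li#25:tc=base:",
}

def fields(entry):
    """Return capability fields, skipping the names/description field."""
    return [f for f in entry.split(":")[1:] if f]

def expand_textual(name, db):
    """Splice each tc= include into the field list, keeping duplicates."""
    out = []
    for f in fields(db[name]):
        if f.startswith("tc="):
            out.extend(expand_textual(f[3:], db))
        else:
            out.append(f)
    return out

def lookup_first(caps, cap):
    """Linear scan from the left; the first occurrence wins."""
    for f in caps:
        if f[:2] == cap:
            return f[2:]
    return None

def merge(name, db):
    """Merge into a dict so that each capability is stored only once."""
    merged = {}
    for f in expand_textual(name, db):
        merged.setdefault(f[:2], f[2:])
    return merged

textual = expand_textual("child", DB)
merged = merge("child", DB)
print(textual, lookup_first(textual, "li"))
print(merged)
```

Both forms give the same answer for "li", but the textual form still carries the overridden "li#24" from the included entry, which is the duplication that eats into the buffer.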

Escaping Rules

Rather than being designed, it appears that termcap "just grew". The handling of escapes in particular is uneven (see table of escapes). For instance:

Later implementations of the termcap parser resolved some of its problems by first splitting the termcap entry into an array of strings to use consistent boundaries. That helps with "tc=" parsing. However, the original misdesign of tskip is carried forward. Legacy implementations (such as Solaris) are unimproved.
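
As an illustration of splitting first, here is a minimal Python sketch which breaks a one-line entry into fields in a single pass, treating a backslash as protecting the character which follows it so that "\:" does not end a field. Whether "\:" should be honored at all differs among implementations; that policy and the function name are assumptions made for the example, not a description of any particular library.

```python
def split_fields(entry):
    """Split a one-line termcap entry at unescaped colons."""
    out, cur, i = [], [], 0
    while i < len(entry):
        ch = entry[i]
        if ch == "\\" and i + 1 < len(entry):
            # Keep the backslash and the character it protects together.
            cur.append(entry[i:i + 2])
            i += 2
            continue
        if ch == ":":
            out.append("".join(cur))
            cur = []
        else:
            cur.append(ch)
        i += 1
    if cur:
        out.append("".join(cur))
    # Drop empty fields produced by adjacent colons.
    return [f for f in out if f]

entry = r"O0|Octals|test:FD=a\:FE=\333:FE=a\134:is=\EZ:"
parts = split_fields(entry)
print(parts)
# With fixed boundaries, a search for FE cannot match text inside FD.
print(next(f for f in parts if f.startswith("FE=")))
```

Once the boundaries are fixed, every later search sees the same fields, which is the consistency that scanning the raw text in two directions lacks.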

Using tctest, I found that the parsing for escaped colons is incomplete and inconsistent.

For instance, this example:

O0|Octals|test octal-escapes:\
        :F9=a\472:\
        :FA=a\472FB=\333:\
        :FB=a\134:\
        :FC=a\::\
        :FD=a\:FE=\333:\
        :FE=a\134:\
        :FF=a\072:\
        :FG=a\072FH=\333:\
        :FH=a\134:\
        :is=\EZ:

is translated to this:

# alias E0
Octals:\
        :F9=a\072:\
        :FA=a\072FB=\333:\
        :FB=a\\:\
        :FC=a\072:\
        :FD=a\072FE=\333:\
        :FE=\333:\
        :FF=a\072:\
        :FG=a\072FH=\333:\
        :FH=a\\:\
        :is=\EZ:

The "\:" in the definition for FD is translated to an actual colon, and the value returned includes the shadowed FE, contrary to the termcap manpage which says that literal colons must be given as "\072". That is because escapes are checked in forward-scanning, but not in backward scanning.

The mapping of "\472" to "\072" is expected, and it happens to match the treatment of "\:".
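
The arithmetic behind that mapping can be checked with a short Python sketch. It assumes (this is an assumption about the decoder, not something taken from its source) that up to three octal digits are accumulated and the result is then stored in an 8-bit character, so that only the low eight bits survive. The function name is hypothetical.

```python
def decode_octal_escape(digits):
    """Accumulate up to three octal digits, then truncate to one byte."""
    value = 0
    for d in digits[:3]:
        value = value * 8 + int(d, 8)
    return value & 0xFF  # assumed truncation when stored in a char

code = decode_octal_escape("472")
print(oct(code), repr(chr(code)))
```

Under that assumption, octal 472 (decimal 314) loses its ninth bit and becomes octal 072 (decimal 58), which is the code for a colon.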

The sequence "\0" also is mishandled by BSD 4.3 termcap. If one uses that in an entry, it loses track of the actual character position (due to the inconsistent scanning) and acts as if the characters following the misencoded null are part of the capability. If the first of those happens to be the delimiting colon of the capability, it becomes part of the value. In some cases, a garbage character is added for completeness. Not only is that capability value misparsed, but so are others which follow it in the entry. The escapes.tc test case shows this behavior, in the Octals entry.

Oddly enough, the equivalent "^@" is handled as one might expect from the documentation, and thrown away.

Termcap Extensions

Modern (since 1990) implementations of termcap provide extensions.

Rather than rely on documentation (which can be interesting), I have set up test-cases with tctest to verify whether a given implementation reads a particular syntax feature, and how it is returned to a calling application.

NCurses

Because ncurses can read termcap source files, it is technically a termcap implementation. It stores the terminal entries in terminfo format, but at the same time it provides better support for termcap applications than other terminfo-based implementations. Much of that is because of reports from screen's developer Michael Schroeder. For instance

NetBSD

NetBSD termcap (deprecated in 2010 in favor of a native terminfo implementation) provides the BSD 4.4 extensions. They are actually not in the termcap library, but rather are provided by cgetstr (originally May 1993), which is in src/lib/libc/gen:

Source  Result
\B      backspace
\C      colon
\F      form-feed
\N      newline
\R      return
\T      tab
\c      colon
\e      escape

However there is some breakage, making it incompatible with BSD 4.2 termcap (testing NetBSD 5.1):

Source  Result
\b      is eaten
\t      is eaten
\072    is eaten

GNU Termcap

Neither flavor (2.0.8 or 1.3.1) documents the features that are of interest.

Termcap 2.0.8 ignores the termcap entry's lines and columns values, replacing those by the actual screensize in tgetent.

It does not honor the \072 escape. Both \072 and \: are interpreted as a separator.

Like the BSD termcap implementations, it dumps core when processing too-large entries.

Termcap 1.3.1, on the other hand, does not dump core for the examples in tctest.

Termcap 1.3.1's handling of escapes is loosely based on BSD 4.4's extensions:

Source  Result
\A      ^G
\B      \b
\F      \f
\N      \n
\T      \t
\V      ^K
\a      ^G
\e      \E
\v      ^K
\08     (eaten)
\09     \t
\134    (garbage)
\8      \b
\9      \t

Performance

In addition to using tctest to check for syntax issues with different termcap implementations, it is useful (simply because it retrieves all of the terminal descriptions from a source) for comparing performance.

Using different command-line options, tctest can be told to

The measurements reported here are from tctest's "make check", "make check-cap" or "make check-tic" rules. The "check-cap" and "check-tic" makefile rules tell the test script to store each termcap file as a database, either hashed (for the BSDs) and/or file-system (for ncurses). The tests are designed to work on a large terminal database, getting data from a variety of terminal entries. Other types of tests are possible, but not currently of interest in this discussion.

There are several configuration choices for ncurses. It can read a flat file, but that is the least efficient. The comparison with BSD 4.4 hashed databases is the most interesting; data from the older flat file implementations are shown for comparison. To configure ncurses with support for termcap, I used these options:

        --enable-getcap --enable-termcap --enable-bsdpad

Mark Horton's 1986 comment on Usenet says he found parsing $TERMCAP to be slower than reading binary terminfo from a file, simply because of the cost of parsing it, regardless of the storage mechanism. That might be interesting in another discussion; however, distinguishing file access times from disk-caching complicates it.

CPU Time

I have five systems that I can get interesting timing figures for:

Actually I have other systems, but those would duplicate things without adding information. Here are some issues that are relevant to the comparison:

Of course, ncurses is available on each platform, while the others (except for the four variants which I compiled for Debian) are available only on specific platforms. The timing figures are subject to the usual caveats:

All times are in seconds (real time), and are for one of the test-files (the BSD 4.3 termcap file, which is about 167Kb). The test file was chosen because it was the largest one having no multiple "tc=" includes. Also, no entries are too large. That makes the test work with the older termcap implementations.

System       Library            Database    Test tgetent*10  Test standard caps  Test all possible caps
Debian 5.0   ncurses            filesystem             0.72                1.42                   19.29
Debian 5.0   ncurses            hashed-db              0.32                1.40                   19.75
Debian 5.0   BSD 4.2            flat file             18.16                7.80                  106.47
Debian 5.0   BSD 4.3            flat file             12.54                4.15                   50.13
Debian 5.0   schily-2011-06-22  flat file             14.94                5.20                   54.59
Debian 5.0   termcap 2.0.8      flat file             11.42                3.90                   47.48
Debian 5.0   termcap 1.3.1      flat file             13.86                6.20                   85.68
FreeBSD 4.9  ncurses            filesystem             0.63                2.00                   29.91
FreeBSD 4.9  ncurses            hashed-db              0.49                1.98                   29.50
FreeBSD 4.9  termcap            hashed-db              2.11                2.43                   44.71
FreeBSD 8.1  ncurses            filesystem             6.20                2.30                   27.80
FreeBSD 8.1  ncurses            hashed-db              0.37                1.79                   26.38
FreeBSD 8.1  termcap            hashed-db              5.03                3.43                   57.19
NetBSD 5.1   ncurses            filesystem             0.78                1.31                   17.60
NetBSD 5.1   ncurses            hashed-db              0.50                1.29                   17.33
NetBSD 5.1   termcap            hashed-db              0.65                3.43                   52.01
NetBSD 5.1   curses             hashed-db              0.65                3.11                   51.31
OpenBSD 4.9  ncurses            filesystem             5.90                1.70                   64.50
OpenBSD 4.9  ncurses            hashed-db              0.53                1.22                   63.70
OpenBSD 4.9  otermcap           hashed-db              3.84                2.81                  160.26
OpenBSD 4.9  curses             hashed-db              3.84                2.81                  160.24
Solaris 10   ncurses            filesystem             4.11                2.49                   31.01
Solaris 10   ncurses            hashed-db              0.54                2.07                   30.18
Solaris 10   ucblib             flat file             14.53                5.51                   67.87

The table illustrates some of the performance differences within a given platform, showing that hashed databases are more effective on some platforms. Similarly, there are differences between different termcap implementations; some use more efficient methods for retrieving capability information.

Memory Use

The BSD 4.3 termcap library wastes space by not discarding the unnecessary whitespace used to make it simple to edit. Generally this is about 3% of the text returned by tgetent, as illustrated:

Wasted space versus size (BSD 4.3 termcap)

The output from tctest would then waste even more space (if used as a termcap datafile) simply because it uses a separate line for each capability. BSD 4.4 and ncurses are unaffected by the extra whitespace, discarding it as they read the datafile.

Peter Wemm pointed this out to me early on, saying that BSD 4.3 had many known bugs and was slow, and that BSD 4.4 had fixed most of those problems. Keeping that in mind, the memory limitations of BSD 4.3 are not generally an issue, and problems due to "large" termcap entries are mainly a concern to secondary users of ncurses' terminal database.

For instance, I provide a link to a generated termcap source on my ncurses page. The generated termcap matches the general structure of the terminfo source from which it is generated:

Primary users are those who are using ncurses or some other termcap library (such as NetBSD) which can handle that generated termcap file.

Secondary users on the other hand are developers using a different termcap library.

The developers of GNU termcap 1.3.1 used different options of tic to resolve multiple "tc=" capabilities, and to relax the limit on entry size. They noted that entry size is not a problem with their library, and that users who need the data from tgetent should allocate a buffer of at least 2500 bytes.

Other developers may wish to experiment with BSD 4.3 (or equivalent). Using ncurses' tic, it is simple to generate a termcap source which is trimmed down enough for that, e.g.,

  tic -Cr0 terminfo.src >termcap-file

The "-r" option has been part of ncurses for quite a while. I added the "-0" option to tic in 2011, while developing this program.

Of course, working within the 1023-byte limit ensures that some functionality is lost. It was a noticeable problem even with the BSD 4.3 terminal database. The plot below (using gnuplot) shows that the distribution of entry-sizes is bimodal:

Number of entries versus size (BSD 4.3 termcap)

The second peak includes terminals such as the Ann Arbor Ambassador and the Concept terminals, which were widely used improvements over VT100s.

Since BSD 4.3, the terminal database has grown, both in number of entries and the size of the entries.

For example, here is the same BSD 4.3 plot with a line showing the termcap 1.3.1 data.

Number of entries versus size (BSD 4.3 termcap)

It is easy to see that the entry-size distribution has shifted off to the right, and that newer terminals simply have too many features to use effectively within the old limit. Redrawing the same chart with ncurses would be less interesting, since the BSD 4.3 data is still smaller in relation to the current database.


Changes

Documentation

tctest (pdf) (postscript) (plain text)


Download