https://invisible-island.net/
Copyright © 2001-2019,2023 by Thomas E. Dickey
I have been writing programs with C since 1983, when I wrote a
meta-assembler using VAX C, porting it to Apollo and BSD 4.1. In
those days, there was little or no syntax checking in C
compilers. For instance, VAX C did not care if there was an
unnecessary & before an array. It ignored it. That made
learning C a bit harder than necessary - and made program
maintenance and development harder. Later, I discovered lint, which was a Good Thing. I adopted the practice of running lint before starting to debug, because it often found the bug that I had just noticed.
That was before ANSI C, of course, though ANSI C was actually standardized before conforming compilers were widely available. So (like other people) I adopted it piecemeal: using prototypes in header files, adding varargs (before converting to stdarg), and converting some functions to ANSI form. I stopped writing code which did not take advantage of ANSI C's better type-checking around 1990, having spent about a year developing a system written in Ada.
In the mid-1990's I converted the larger programs I was working on to ANSI C (tin, vile, ncurses). My development focus had switched from SunOS 4 with a K&R compiler to Linux or Solaris with gcc or Sun's compiler. Unlike Sun's compiler, gcc could be told to give lots of warning messages which were useful for finding non-ANSI code. Actually some other compilers are better for this purpose, but they cost money, and generally run on only one or two platforms. (I do use them when they're available).
XFree86 is larger than the other programs (about 1.3 million
lines when I started, and before some of the contrib programs
were added). Initially I started looking at ANSIfying X when I got
tired of filtering compiler warnings in my day job's legacy code.
It seemed possible that I could get XFree86 to change their code,
and that could be leveraged into getting X Consortium to adopt
the changes. I made an initial set of changes in the server code,
to test this, but as it happened, that was the final year for the
X Consortium. I put that plan aside. Later, I converted xterm to ANSI C when it was clear that the former custodians (and their successors) were not going to maintain it any longer.
Shortly after, the XFree86 core group changed the compiler warnings used for building to stricter ones which would show problems in the code, as well as non-ANSI stuff. The resulting 8Mb logfile gave me some motivation to reduce its size. It was too large to see the pattern, so I wrote a simple utility (in September 1998) to filter the logfile and make a list of files which produced the most warnings.
Here is some sample output from a build log from Redhat 6.2.
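That utility is not reproduced here; a minimal sketch of the idea, assuming gcc-style "file.c:123: warning: ..." messages on standard input (the structure and variable names are made up for illustration), might look like this:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_FILES 10000

struct entry {
    char name[256];
    long count;
};

/* sort so that files with the most warnings come first */
static int by_count(const void *a, const void *b)
{
    const struct entry *p = (const struct entry *) a;
    const struct entry *q = (const struct entry *) b;
    return (q->count > p->count) - (q->count < p->count);
}

int main(void)
{
    static struct entry table[MAX_FILES];
    size_t used = 0;
    size_t n;
    char line[4096];

    while (fgets(line, sizeof(line), stdin) != NULL) {
        char *colon = strchr(line, ':');
        size_t len;

        if (colon == NULL || strstr(line, "warning:") == NULL)
            continue;                   /* not a warning line */
        len = (size_t) (colon - line);
        if (len == 0 || len >= sizeof(table[0].name))
            continue;
        for (n = 0; n < used; ++n) {    /* find the filename */
            if (table[n].name[len] == '\0'
             && strncmp(table[n].name, line, len) == 0)
                break;
        }
        if (n == used && used < MAX_FILES) {
            memcpy(table[used].name, line, len);
            table[used].name[len] = '\0';
            table[used].count = 0;
            ++used;
        }
        if (n < used)
            ++table[n].count;
    }
    qsort(table, used, sizeof(table[0]), by_count);
    for (n = 0; n < used; ++n)
        printf("%8ld %s\n", table[n].count, table[n].name);
    return 0;
}

Such a filter would be run over the captured make-log, e.g., the output of "make 2>&1 | tee make.log".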
ANSIfying code reduces maintenance effort, and allows me to work on much larger systems than with K&R code. It also extends the lifetime of existing programs.
See this page for discussion of the term itself.
For C programs (the primary meaning), it depends on the application. At least convert all of the functions to use prototypes and the corresponding "new-style" function definition. If the application is a standalone program, then additional features of ANSI C (such as the use of const and new-style variable argument lists) should be incorporated.
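For instance (copy_chars is a made-up name, just to show the two forms side by side):

/* before: old-style (K&R) definition; callers see no parameter types */
int copy_chars(dst, src, len)
char *dst;
char *src;
int len;
{
    while (--len >= 0)
        *dst++ = *src++;
    return 0;
}

/* after: a prototype (normally placed in a header) and the new-style
 * definition, so the compiler can check every call */
int copy_chars(char *dst, char *src, int len);

int copy_chars(char *dst, char *src, int len)
{
    while (--len >= 0)
        *dst++ = *src++;
    return 0;
}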
Maintainers of libraries which must interface with existing applications should be careful to not alter the nature of the interface. In particular, these are problem areas:
when prototyping (converting the function definitions to use prototypes), one must take into account the types of the parameters. Different types take up different amounts of space on the stack. In the absence of prototypes, char and short parameters are treated as if they are first assigned to an int variable, that is, they take up the same amount of space as an integer. This is called argument promotion. Given a prototype, the compiler is free (depending on its design) to use less space on the stack for those parameters. If different callers to a function disagree on those sizes, the program will malfunction (see the sketch after this list).
it is not sufficient to change the prototype only, leaving the function definition alone. Some compilers (including gcc) will (silently) change the parameter sizes.
introducing const into a function's definition changes the functions with which it can be compiled, even if the compiler does not change the stack alignment of the parameter list.
even converting an argument list from varargs.h to stdarg.h is claimed by some people to introduce possible incompatibilities, but I have not found any practical cases which are not due to changes in argument promotion or using const. But it is something to consider.
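Here is a small sketch of the first of those problems (put_byte and the callers are made-up names; this is deliberately broken code, not something from the programs mentioned above):

/* file1.c: a prototype is in scope, so the compiler may pass the
 * argument in only as much space as a char needs */
int put_byte(char c);

void caller_a(void)
{
    put_byte('x');
}

/* file2.c: no prototype is in scope, so 'x' undergoes the default
 * argument promotion and is passed as an int */
int put_byte();

void caller_b(void)
{
    put_byte('x');
}

/* whichever convention put_byte() itself was compiled with, one of
 * these callers can disagree with it about the size (and placement)
 * of the argument on the stack */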
One would think that there are several tools for automatically converting C programs to ANSI prototypes. Perhaps there is a commercial product (I've never encountered one). Noncommercial programs include cproto and protoize. The former attempts to preserve argument promotion (but fails). The latter does not even try.
The tools that I do use are determined by my goal: convert as much of the code as possible without introducing functional changes. For XFree86 libraries, the goal is stricter: no changes to parameter alignment. In turn, the choice of tools determines the process.
I use the compiler to find the places to change and to ensure that there is minimal impact on the interface. Compiling with gcc without the -g (debug) option produces object files which will differ if an editing change modifies parameter alignment. At the same time, most editing changes that do not modify logic or alignment will not change the object code.
Shell scripts (Regress and remake) are useful for automating the compiler checks. A good text editor is needed to carry out the process of following the compiler warnings, doing recompiles, and occasionally undoing a set of changes.
Very briefly, what I do (e.g., on XFree86) is
make a normal build
run a utility to filter the 7-8Mb make-log into a listing of the number of warnings attributed to each file (mostly header files, since they get included more than once).
That gives me a starting point. I look at why the warnings come about, which is usually because they're not prototyped, and decide which file I should try to resolve.
Bear in mind that changes to code which is not ifdef'd for the current platform will not be testable by edit/compile/compare, e.g., the Regress script. A further limitation is that some types may happen to be the same on the current platform, e.g., int/long on a 32-bit machine.
That said, with reasonable care you can convert most of a program to use prototypes without risk of altering the interfaces as used in the K&R original. Use the gcc warnings to find the missing prototypes. Gcc will not find all of them; it lacks a warning corresponding to the one in the ANSI compiler on IRIX, which flags functions defined with K&R syntax for which a prototype is in scope, e.g.,
int foo(void);
int foo() { return 1; }
But the IRIX compiler is not useful for this type of development, because it embeds line-number information into the object files even when debugging is disabled. Hence, deleting a blank line will result in a different object file. So we use gcc. Gcc's useful warnings include:
-Wall -Wstrict-prototypes -Wmissing-prototypes -Wshadow -Wconversion
Because of gcc's blind spot (it does not flag functions defined with K&R syntax), I modify the header files last:
convert all private (static) functions to ANSI form.
move extern declarations to header files (choosing the right one can be a problem, since they are associated with type definitions).
convert the public functions to ANSI form.
finally, convert the header file's declarations to ANSI form.
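A minimal sketch of that order, using made-up names (demo.h, init_tables, load_config):

/* before, in demo.c: old-style definitions; callers repeat
 * "extern int load_config();" for themselves */
static int
init_tables(size)
int size;
{
    return size;
}

int
load_config(name)
char *name;
{
    return init_tables(name != 0);
}

/* after all four steps: the static helper is converted first, the
 * extern declaration has moved into demo.h, and the public function
 * and the header declaration are in ANSI form */

/* demo.h */
extern int load_config(char *name);

/* demo.c */
static int init_tables(int size)
{
    return size;
}

int load_config(char *name)
{
    return init_tables(name != 0);
}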
Comparing object files periodically is needed to guard against unwanted changes. Inevitably, there are other things to fix (uninitialized variables, incorrect printf format, etc.). If I fix one of those, of course, I update the object files used for reference (and if the change is not obvious, snapshot the source files as well).
Some troublesome language features to watch out for:
signed/unsigned comparisons: when comparing or assigning between unsigned and signed values, the signed value is extended (adding bits) to match the precision of the unsigned type. Frequently an integer (signed) is used to represent a length and compared to an unsigned value, e.g., sizeof(foo). If the length happens to be negative, then C will treat it as larger than the size, giving an unexpected result (see the sketch after this list).
const is nice, but do it later, and do not modify the documented interfaces. Otherwise existing programs simply will not compile with several compilers.
varargs may be lurking in the code. Accepted wisdom in some quarters when X was designed was to use something like this:
#define ARGS a1,a2,a3,a4,a5,a6,a7,a8,a9
int foo(ARGS) long ARGS; { ... }
int bar() { foo(1,2,3); }
rather than even use <varargs.h>.
parameters with other than default promotion, e.g., float, which an old-style definition promotes to double.
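A small sketch of the first of these, the signed/unsigned pitfall (the names are made up):

#include <stdio.h>

int main(void)
{
    char foo[16];
    int length = -1;    /* e.g., the result of a failed read() */

    /* sizeof(foo) is unsigned, so length is converted to an unsigned
     * type for the comparison: -1 becomes a huge value, and the test
     * unexpectedly fails */
    if (length < sizeof(foo))
        printf("fits\n");
    else
        printf("does not fit\n");   /* this is what actually prints */
    return 0;
}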
Caveat: this is not something I would do late at night (the whole process requires a clear head).
An argument stack is assumed by most C programmers, though the underlying machine may not support a pushdown stack.
Sorry, there is no perfect tool. The limitation of compiler checks is in conditionally-compiled code which is not exercised on the test-platform, or which happens to pass the test, e.g., some obscure cases of byte-ordering.
Occasionally an editing change does change the object code, presumably due to some internal buffering of gcc.
Parameter alignment in old-style functions is affected by the presence of prototypes. I learned about this the hard way in 1993 by attempting to port some GNU utilities to Apollo SR10.4. Most of the GNU utilities then (and still now) are written in extended C rather than ANSI C. That is, they have function prototypes mixed with old-style function definitions, as in the example below:
int foo(char a);
int foo(a)
char a;
{ }
The Apollo C compiler did the wrong thing with this: it compiled the prototype (and calls against it) with a stack alignment for the char parameter, but compiled the function definition with a stack alignment for the char parameter promoted to an integer. While other contemporary C compilers may have internally done the same thing, on Apollo this malfunctioned because the machine's byte ordering (the Motorola 68000 series) left the parameter at an offset where the function could not find the data. On other architectures such as Intel, this does not happen because of the way the bytes are ordered. Even on SunOS (Sparc), where the bytes are ordered as on the Apollo, the data stored on the stack is apparently aligned to 4-byte integers.