http://invisible-island.net/personal/
Copyright © 2015-2022,2023 by Thomas E. Dickey
As a software developer, I use many tools. Here are some comments about some of the analysis tools which I have encountered.
When I began programming, there were no static (or dynamic) analyzers, no profilers, no debuggers. There was only a compiler (initially in two parts: a precompiler and a compiler). In lieu of a debugger, the IBM 1620 conveniently responded to use of an uninitialized variable by going into an infinite loop. There were no compiler warnings (only errors). This was in fortran, of course.
A little later, as a graduate student, I used the watfiv (fortran) compiler, which had better diagnostics (many more error messages, still no warnings). There were other languages than fortran; I encountered a computer science student who said he was programming in SAIL, and that because of this, his program was automatically well-structured and well-formatted.
Perhaps. I encountered people in 1990 who said the same about Ada.
But my initial exposure to compilers was that they produced error messages, rather than warnings. The first case where I recall discussing compiler warnings was a few years later, regarding Univac's “ASCII FORTRAN” (Fortran 77). I showed Henry Bowlden a program where I complained that the compiler had not treated a goto into an if-then-else construct as an error—or at least warned about it. By then, compilers had evolved to

- provide useful warnings or advice on ways to improve the program, e.g., to make the optimizer's job easier, and
- provide multiple diagnostics for a program (rather than simply give up on the first error, or worse, compound one problem into multiple error messages).
Later languages and tools introduced type-checking. Besides making the resulting programs more reliable, it made the languages easier to learn. Here are a couple of examples from the early 1980s:
Even assembly languages were affected. For instance, the 1970s DEC assemblers used a special syntax for register names (with a percent-sign as part of their definition), and would produce an error if an untyped digit were used where a register was expected. By comparison, IBM assembler (dating from the 1960s) did not make this distinction, and was harder to learn — in 1982.
VMS C 1.0, which I used in 1983 to develop a meta-assembler, had poor diagnostics. It would "do the right thing" if the & symbol were omitted where an address was needed. Again, that made it harder to learn than VMS C 2.0, which had diagnostics.
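A hedged sketch of the kind of mistake involved (my example, not from the meta-assembler): scanf needs an address, and a compiler without the diagnostic accepts the first call silently.

#include <stdio.h>

int main(void)
{
    int value;

    scanf("%d", value);     /* wrong: passes the (uninitialized) value */
    scanf("%d", &value);    /* right: passes the address */
    return 0;
}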
Things went along in this way for some time, with newer compilers providing better (more specific) diagnostics.
In contrast to compiler diagnostics (static analysis), the means for testing programs lagged:
In 1979, I thought I would improve the performance of my macro assembler by changing the way it translated to/from radix-50 (three characters in 16 bits). Finding that the program did not get faster, I wrote a (crude) profiler to see where it was spending time. It was spending 90% of the time in the operating system. About a year later, using the program on a lab computer with floppy disks, I found the answer: the buffer size used for disk reads was too small.
In 1985, I added command record/playback to HCAB (the CAD program which I was developing), to ensure that I could reproduce error conditions. This was more successful, making it easy to isolate problems in the code by replaying over and over again.
I did the same in my directory editor ded, in August 1992, and in lynx in June 2000.
Instrumenting programs to make them testable or to measure their performance takes time and requires running the programs in controlled conditions. But compilers give warnings for free, same result every time.
There is no sure-fire technique for improving programs by static analysis. Rather, there are a variety of techniques which can be learned, based on various tools. All of these evolve over time.
I started using lint in 1986, continuing into the late 1990s. Initially it “just worked” and became the first thing I would do when I found that new changes to a program broke it. lint would usually find the problem.

By itself, lint could only go so far. It came with lint libraries telling the program about the interface of the C runtime (before C was standardized). But if I used a C library not known to lint, it was less useful.

I learned to make lint libraries. Initially (in 1991 or 1992) I did these by hand, converting header files (with comments for the parameter names) into the quasi-prototype form used by lint. The header files for X libraries were large, taking a lot of time.
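Here is a sketch of that quasi-prototype form (modeled on what cproto later generated for ncurses; the exact layout varied between systems): a LINTLIBRARY comment marks the file, and each function is given a dummy body so that lint records its interface.

/* LINTLIBRARY */
#include <curses.h>

int waddch(WINDOW *win, chtype ch)
{
    return (*(int *)0);
}

int wmove(WINDOW *win, int y, int x)
{
    return (*(int *)0);
}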
I noticed cproto early in 1993, and sent Chin Huang my changes to allow it to generate lint library sources. Having those, I could compile the sources and get usable lint libraries. This set of changes appeared in June 1993. I made further changes (including making the enhanced error reporting which works with the different types of yacc), and worked with Chin Huang off and on for the next few years.
I used this feature in ncurses (June 1996) to generate lint library sources for the four component libraries (ncurses, form, menu and panel). However (unlike SunOS), Solaris support for lint was poor. Sun delivered stubs for the lint libraries, and its tools were not capable of producing usable lint libraries. I used other platforms for lint, as long as those were available. Judging by my email, by around 2000 lint was gone.
Even after the tool itself was no longer useful, I kept the
lint library text in ncurses because it is useful as a
documentation aid. Updating the files was not completely
automatic, e.g., changing attr_t
to int
to match the prototypes for the legacy attribute manipulation,
and of course adding the copyright notice. Finally, in 2015 I
wrote a script make-ncurses-llibs
to generate the
sources without requiring manual editing, and used that in
preparing the ncurses6 release.
As an alternative to lint, gcc has some advantages and some disadvantages. The main disadvantage is that it has no way to detect that a function is not used in any of the various modules that comprise a program.
While I preferred lint, I found the compiler warnings from gcc useful. For instance, in a posting to comp.os.linux.misc in September 1995, I remarked:
From dickey Wed Sep 6 07:50:15 1995
Subject: Re: Lint for Linux?
Newsgroups: comp.os.linux.misc
References: <42ia4f$9k2@sifon.cc.mcgill.ca> <42iir4$81a@solaria.cc.gatech.edu>
Organization: Clark Internet Services, Inc., Ellicott City, MD USA
Distribution:
Lines: 28
X-Newsreader: TIN [UNIX 1.3 950824BETA PL0]

Byron A Jeff (byron@cc.gatech.edu) wrote:
: In article <42ia4f$9k2@sifon.cc.mcgill.ca>,
: Marc BRANCHAUD <marcnarc@cs.mcgill.ca> wrote:
: >
: >Hi, all...
: >
: >I'm trying to find a copy of lint to use with Linux. I've scoured
: >sunsite and a few other places, and I can't find one at all. Does it
: >even exist?
:
: I'll give the standard party line:
:
: gcc -Wall
:
: will give you about the same level of warning messages as lint.

Actually, no. There's a number of checks that lint does that gcc
doesn't (read the gcc internal to-do list stuff). When I don't have
lint, I use

	gcc -Wall -Wstrict-prototypes -Wmissing-prototypes -Wshadow -Wconversion

(though -Wconversion has been butchered in the last release, so it's
really only useful with gcc 2.5.8)

--
Thomas E. Dickey
dickey@clark.net
I used those options in a wrapper script, gcc-normal. I also used an improved set of options in a wrapper script, gcc-strict, from a posting by Fergus Henderson. I created both of these scripts in April 1995. The former has grown a little since then, to allow it to ignore warnings from OSX and Solaris header files. Originally it was just this:
#!/bin/sh
# these are my normal development-options
OPTS="-Wall -Wstrict-prototypes -Wmissing-prototypes -Wshadow -Wconversion"
gcc $OPTS "$@"
With a configure script, those scripts can override the default compiler, e.g.,
CC=gcc-normal ./configure
./configure CC=gcc-normal
There were other compilers, of course. The best warnings came
from DEC's compiler for OSF/1 (later Tru64). On other platforms,
lint
was where checking occurred, so the compilers
were neglected.
However, gcc
was available for most of the
platforms on which I was developing.
A wrapper script worked well enough, except when defining quoted values on the command-line. To work around that, I incorporated a --with-warnings or --enable-warnings option into my configure scripts. Until August 1997, I maintained those within the separate programs' aclocal.m4 files, before combining them into an archive using acsplit and acmerge.
Initially I just added checks for the available warnings, and
later (in
2002) added macros to the archive for the options
themselves.
Besides enabling warnings, my checks also tested for gcc support for features which were based on lint. You may have seen this chunk in curses.h:
/*
* GCC (and some other compilers) define '__attribute__'; we're using this
* macro to alert the compiler to flag inconsistencies in printf/scanf-like
* function calls. Just in case '__attribute__' isn't defined, make a dummy.
* Old versions of G++ do not accept it anyway, at least not consistently with
* GCC.
*/
#if !(defined(__GNUC__) || defined(__GNUG__) || defined(__attribute__))
#define __attribute__(p) /* nothing */
#endif
/*
* We cannot define these in ncurses_cfg.h, since they require parameters to be
* passed (that is non-portable). If you happen to be using gcc with warnings
* enabled, define
* GCC_PRINTF
* GCC_SCANF
* to improve checking of calls to printw(), etc.
*/
#ifndef GCC_PRINTFLIKE
#if defined(GCC_PRINTF) && !defined(printf)
#define GCC_PRINTFLIKE(fmt,var) __attribute__((format(printf,fmt,var)))
#else
#define GCC_PRINTFLIKE(fmt,var) /*nothing*/
#endif
#endif
#ifndef GCC_SCANFLIKE
#if defined(GCC_SCANF) && !defined(scanf)
#define GCC_SCANFLIKE(fmt,var) __attribute__((format(scanf,fmt,var)))
#else
#define GCC_SCANFLIKE(fmt,var) /*nothing*/
#endif
#endif
#ifndef GCC_NORETURN
#define GCC_NORETURN /* nothing */
#endif
#ifndef GCC_UNUSED
#define GCC_UNUSED /* nothing */
#endif
I added that to ncurses in July 1996 to get checking comparable to these lint features:
/* PRINTFLIKEn */
	makes lint check the first (n-1) arguments as usual. The n-th
	argument is interpreted as a printf(3) format string that is
	used to check the remaining arguments.

/* SCANFLIKEn */
	makes lint check the first (n-1) arguments as usual. The n-th
	argument is interpreted as a scanf(3) format string that is
	used to check the remaining arguments.

/* NOTREACHED */
	At appropriate points, inhibit complaints about unreachable
	code. (This comment is typically placed just after calls to
	functions like exit(3)).

/* ARGSUSEDn */
	Makes lint check only the first n arguments for usage; a
	missing n is taken to be 0 (this option acts like the -v
	option for the next function).
lint uses comments; gcc provides attributes. But some compilers may provide __attribute__ as a macro, some as a built-in symbol. When I started writing these configure scripts, some compilers defined the __attribute__ symbol as a macro (using #define), while gcc made it a built-in.

Taking that into account, I made macros which would not rely on the macro/symbol difference, and wrote configure checks to ensure that the corresponding attributes were supported. Looking at the source-code for 2.7.2.3, released in July 1997, it seems that gcc by then supported each attribute that I used. But the configure checks are still used.
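For illustration, here is a minimal sketch (the function name is hypothetical, not from ncurses) of how such a macro is applied, assuming GCC_PRINTFLIKE is defined as in the chunk above:

#include <stdio.h>
#include <stdarg.h>

static void my_printw(const char *fmt, ...) GCC_PRINTFLIKE(1,2);

static void
my_printw(const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    vprintf(fmt, ap);           /* a real printw would draw to a window */
    va_end(ap);
}

int main(void)
{
    my_printw("%d\n", 42);      /* checked: format matches the argument */
    my_printw("%d\n", "oops");  /* gcc warns about the mismatch */
    return 0;
}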
Still later (2004 and 2012), the Intel and clang compilers provided useful (and different) warnings. I added checks for these under the existing warning options. Although both try to imitate gcc, using the compiler-specific options gives better results, since neither imitates all of the options which I use.
Conversely, gcc's developers (starting in version 8, apparently) decided to imitate clang. Writing in March 2019, that effort appears to have been largely unsuccessful, producing as many misleading messages as useful ones. For example, all of this came from one line of code in a build of ncurses using gcc 9.0.1:
In file included from ../ncurses/./tinfo/write_entry.c:39:
../ncurses/./tinfo/write_entry.c: In function ‘_nc_write_entry’:
../ncurses/curses.priv.h:865:18: warning: ‘%s’ directive writing up to 4095 bytes into a region of size 4094 [-Wformat-overflow=]
  865 | #define LEAF_FMT "%c"
      |                  ^~~~
../ncurses/./tinfo/write_entry.c:469:7: note: in expansion of macro ‘LEAF_FMT’
  469 |       LEAF_FMT "/%s", ptr[0], ptr);
      |       ^~~~~~~~
../ncurses/./tinfo/write_entry.c:469:18: note: format string is defined here
  469 |       LEAF_FMT "/%s", ptr[0], ptr);
      |                  ^~
In file included from ../ncurses/curses.priv.h:259,
                 from ../ncurses/./tinfo/write_entry.c:39:
../include/nc_string.h:81:46: note: ‘sprintf’ output between 3 and 4098 bytes into a destination of size 4096
   81 | #define _nc_SPRINTF NCURSES_VOID sprintf
../ncurses/./tinfo/write_entry.c:468:2: note: in expansion of macro ‘_nc_SPRINTF’
  468 |  _nc_SPRINTF(linkname, _nc_SLIMIT(sizeof(linkname))
      |  ^~~~~~~~~~~
That is, the message points to a character format as the problem, rather than the (possibly) too-long string format which is appended to the character. At the same time, gcc is not doing enough analysis to determine that the appropriate checks were already made (a failing of clang as well).
I investigated lclint (later renamed to splint) in 1995, but found it too cumbersome and fragile to use because it could not handle the regular header files.
But Coverity is a useful product. It follows a program through several plausible steps, using the conditional checks to infer critical values of variables, to look for inconsistencies. Most of the reports are valid, i.e., about 90%. It will occasionally find serious problems with a program. Like lint, the only way to find those is to fix the minor issues along the way.
My email first mentions its use by FreeBSD in August 2004. Largely due to promotional efforts by the Coverity people, I grew interested enough to ask to have my project scanned in April 2006. A year later, dialog and ncurses were accepted for scanning, and I made several fixes using its reports.

The initial workflow for this tool was based on releases, which did not mesh well with my development process. Coverity was more or less forgotten for a few years until they streamlined the submission procedure, and I got involved again in 2012. With the simpler submission process, I added several of the programs which I maintain.
I have been using clang for both its compiler warnings and the static analysis feature since May 2010.
Clang provides useful checks also, but has some shortcomings relative to Coverity.
On the other hand, it runs on my machines, and can be run many times without running into resource limits.
The FreeBSD developers have replaced gcc with it (except for ports). Clang's developers have made it quasi-compatible with gcc, i.e., it is not only able to handle the non-standard headers but also most of the command-line options. That introduces problems with configure scripts because clang does not return error status for unrecognized options, even in many cases where the option directs gcc to report an error.
However, not all is well with Clang:
On OSX (macOS), this quasi-compatibility is aggravated by Apple's decision to install a copy of clang as /usr/bin/gcc. Interestingly enough, the wrappers for c89 and c99 on OSX are less usable than those provided by other platforms, although Clang's version number is generally higher (e.g., 9.x versus 4.x), because those do not accept the standard command-line options. For instance, macOS clang 11.0.0 c89 breaks when a -D option is not followed by a blank.
Since the standard options do not work reliably for c89/c99, it is probably unsurprising that nonstandard options have problems.
For example, Apple's configuration for c89 and c99 is less than useful, because those use -W to specify 32-bit versus 64-bit applications (not mentioned in POSIX, hence not standard). In Xcode 12.2, Apple pre-set a -Werror option (copied from gcc) to produce an error when encountering an implicit function declaration, for clang and its aliases. That makes the latter unusable in autoconf configure scripts. I worked around that problem by making the configure script transform the alias into an equivalent clang option (see CF_CLANG_COMPILER).

Beyond that, none of Clang's c99 wrappers are useful anyway, because it emits warnings for offsetof for structure members.
For example, with MacOS 10.13.2 (late 2017), I got these results from test-builds of xterm:

[table of warning counts for c89 (clang), gcc 6.4.0, clang 9.0.0, and c99 (clang)]

The difference between 2557 and 2673 is due to a type-error in XQuartz's header file /usr/X11/include/X11/Xpoll.h which has been present for many years. The bigger number is a known, unresolved defect in clang:
The offsetof issue has been reported (see [llvm-bugs] [Bug 10065] Bogus pedantic warning when using offsetof(type, field[n])) and (as of the end of 2017), ignored.
Coming back to this a year later, the bug-database entry for 10065 was unchanged. However, a test-build with clang 10.0.0 on MacOS no longer displayed that defect. The build log shrank by 95%.
The reduction in warnings on MacOS may have been due to a local (MacOS-specific) patch. In 2022 (more than five years later), the same problem exists with clang 13.0.0 on FreeBSD 13.1 (an odd coincidence of versions).
The cppcheck
project started early in 2009; I
first noticed reports for it late in 2009. As I noted on the
xorg mailing list (after reading the cppcheck
source-code):
On Sat, 3 Oct 2009, Martin Ettl wrote:

> Hello friends,
>
> further analysation with the static code analysis tool cppcheck brought
> up another issue. The tool printed the following warning:
>
> .../xfree86/common/xf86AutoConfig.c,337,possible error,Dangerous usage
> of strncat. Tip: the 3rd parameter means maximum number of characters
> to append
>
> Take a look into the code at line 337:
> .....
> char path_name[256];
> .....
> 334	if (strncmp(&(direntry->d_name[len-4]), ".ids", 4) == 0) {
>	    /* We need the full path name to open the file */
>	    strncpy(path_name, PCI_TXT_IDS_PATH, 256);
> 337	    strncat(path_name, "/", 1);
>	    strncat(path_name, direntry->d_name, (256 - strlen(path_name) - 1));
> .....
>
> I is possible (suppose to be the string PCI_TXT_IDS_PATH) is 256
> characters long) that the array path_name is allready filled. Then (lin
> 337) an additional character is appended --> array index might be go out
> of range.

It's possible, but cppcheck isn't that smart. It's "only" warning that
its author disapproves of strncat. cppcheck only notes its presence in
the code, makes no analysis of the parameters.

It's a _checking_ tool, by the way, not to be confused with static
analysis.

(dynamic allocation as an alternative is not necessarily an improvement)
According to the project metrics, it has grown by a factor of six since I first saw it in 2009. The checks have been improved (reducing the false positives) and new checks added. But essentially it is still a style checker. A static analyzer is expected to go beyond that, doing analysis of the data flows — going beyond the diagnostics emitted by an optimizing compiler, which point out unused or uninitialized variables.
I pointed this out in 2015:
cppcheck is essentially only a style-checker (and like other tools which incorporate the developer's notion of "good style", its usefulness depends on various factors).
There are suitable tools for detecting memory leaks (such as valgrind); cppcheck is not one of those. Of course, you will find differing opinions on which are the best tools, and even on what a tool is suitable for, e.g., a blog entry “Valgrind is NOT a leak checker”.

Daniel Marjamäki, a cppcheck developer, did not disagree with my comment in his response. At the time that I wrote those comments, I was mulling over a Coverity report regarding an incorrect sign-extension check in xterm's ResizeScreen for patch #319.
On the other hand, I have found the tool useful. David Binderman reported some warnings from cppcheck 1.73 which I used to improve the style of xterm, in patch #325. Most of the changes were to reduce the scope of variables.
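To show what such a report looks like, here is an illustrative sketch (my own example, not code from xterm) of the style issue cppcheck flags:

extern void do_something(int value);    /* hypothetical work function */

void
count_up(int n)
{
    int i;                  /* cppcheck: scope of 'i' can be reduced */
    if (n > 0) {
        for (i = 0; i < n; ++i) {
            do_something(i);
        }
    }
}

Moving the declaration of i into the inner block satisfies the check and narrows the variable's lifetime.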
Static analysis is only part of the process. When I was first introduced to computer programming, I was told:
Any interesting program has at least
- one loop,
- one I/O statement, and
- one bug.
The situation has not improved; interesting programs can be run in many more ways than a static analyzer can see. Dynamic analysis is used to explore a few ways the program might be run, to look for performance and reliability problems.
I started looking for memory leaks in November 1992 by modifying a function doalloc in my directory editor's library. I had written this in 1986 as a wrapper for malloc and realloc (since pre-ANSI C could not be relied upon for reallocating a null pointer). Because I made all calls to malloc or realloc through this function, I could easily extend it to check for memory leaks. This was back in the days when tools for leak-checking were not common, so I wrote my own.
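The idea can be shown in a minimal sketch (an approximation, not ded's actual code):

#include <stdlib.h>

/* centralize malloc/realloc so that checking can be added in one place;
 * pre-ANSI realloc could not be trusted with a null pointer, so this
 * wrapper chooses the call itself */
char *
doalloc(char *oldp, unsigned amount)
{
    char *newp = (oldp != 0)
	? (char *) realloc(oldp, amount)
	: (char *) malloc(amount);
    if (newp == 0)
	abort();	/* or report the allocation failure */
    return newp;
}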
To do this effectively, the program which is being analyzed must also be modified to free memory which was allocated permanently, i.e., not normally freed while the program is running. This process exposes inconsistencies in the program's use of memory and usually will make it fail. Later (by 1996), I used ElectricFence to trigger these failures in the debugger, where I could get the file- and line-information from a traceback.
Later I encountered other programs using a similar approach: keep track of memory allocated, and report just before exiting. I improved some of those. For instance:
vile — I added functions for freeing “permanent” memory starting in February 1993. Those were all ifdef'd with NO_LEAKS, following my practice with ded.
lynx had compile-time checks for this as part of making a "dbg" flavor. This was one of 41 targets in the original makefile, before I replaced it with a configure script:
# for some reason loc_t isn't defined when compiling for debug on my system.
# needed for NLchar.h
dbg:
cd WWW/Library/osf; $(MAKE) CC="gcc" LYFLAGS="-DDIRED_SUPPORT \
-DLY_FIND_LEAKS"
cd src; $(MAKE) all CC="gcc" MCFLAGS="-O -Wall $(ADDFLAGS) \
-DFANCY_CURSES -DLY_FIND_LEAKS \
-Dloc_t=_LC_locale_t -D_locp=__lc_locale\
-DDIRED_SUPPORT -DOK_TAR -DOK_GZIP -DOK_OVERRIDE \
-DUNIX -I../$(WWWINC) -DEXEC_LINKS \
-DALLOW_USERS_TO_CHANGE_EXEC_WITHIN_OPTIONS $(SITE_DEFS)" \
LIBS="-lcurses -ltermcap \
$(WAISLIB) $(SOCKSLIB) $(SITE_LIBS)" \
WWWLIB="../WWW/Library/osf/libwww.a"
I made the LY_FIND_LEAKS
definition into a
configure option (v2-7-1ac-0.6,
in March 1997) and provided a run-time option to turn it on
(2.8.5dev.13,
in January 2003).
One drawback to the approach I used with doalloc was that it relied upon debug-code which I compiled into the directory editor. It would be nicer to have an add-on library, so that I could just re-link the program to get memory-leak checking (which is fast), rather than re-compile (which is slow).
People posted useful programs to Usenet; I collected useful programs for my projects. I went in search of a memory-leak checker, at the same time that I was extending doalloc.
The first one that I found was dbmalloc in November 1992, written by Conor Cahill, originally posted to comp.sources.misc in volume 32 (September 1992). This was patch-level 14; Cahill had posted a much smaller version in comp.sources.unix volume 22 (July 1990).
While dbmalloc worked, it had a few drawbacks:

- it provided a header file (dbmalloc.h) which redefined the standard string- and memory-functions in terms of its own functions,
- it required linking with its static library (libdbmalloc.a), and
- it worked by extracting object files from the system library (libc.a) and patching those object files to allow dbmalloc to intercept calls to memcpy, memmove and memset.

That aspect of patching the system library was a drawback: it did not always work, and became less viable as shared libraries became more prevalent. Also, it was awkward not being able to turn the leak-checking off in a given executable. I looked for alternatives, and found dmalloc 3.1.0, published in July 1995.
There were later releases (see its homepage), but for this discussion the initial version from 1995-1996 is appropriate.
According to its NEWS file, it was first published 1993/04/06 on comp.sources.unix (but it is not present in the archives). Its ChangeLog states that development began in March 1992. The earliest reliable mention of dmalloc that I have found is a comment in Mail Archives: djgpp/1995/04/05/16:10:59 by Marty Leisner that he had ported it to DJGPP the previous year. dmalloc's changelog comments on that for July 1994. That was probably version 2.1.0, from 1994/5/11.
Like dbmalloc, one must include its header and link with its static library to use it. On the other hand, the two libraries used a different approach toward intercepting function calls to allow diagnosing them: dbmalloc redefined the string- and memory-functions such as memcpy, while dmalloc intercepted only the allocation functions (malloc, realloc, calloc, free, as well as the nonstandard cfree). At the same time, I preferred the reports from dbmalloc. It also seemed to provide better coverage.

I also found dbmalloc to be more robust. Indeed, on revisiting the tools to write this summary, I find that it was easy to get dbmalloc to build and work, but that is not the case for dmalloc. For both, it was necessary to delete their conflicting prototypes for malloc, etc. But dmalloc's configure script had problems of its own with malloc.h, making dmalloc not generate its own file, and with the -traditional option, making dmalloc not define its symbol CONST.

Here is a simple demo to show how a program would be instrumented for both tools:
#include <stdlib.h>
#include <stdio.h>
#ifdef DBMALLOC
#include <dbmalloc.h>
#endif
#ifdef DMALLOC
#include <dmalloc.h>
#endif
int main(void)
{
char *oops = malloc(100);
#ifdef DBMALLOC
malloc_dump(fileno(fopen("dbmalloc.log", "w")));
#endif
exit (oops != 0);
}
Turning on the DBMALLOC
definition for the first
tool, here is the resulting logfile:
************************** Dump of Malloc Chain ****************************
 POINTER     FILE  WHERE        LINE       ALLOC       DATA      HEX DUMP
 TO DATA      ALLOCATED        NUMBER      FUNCT      LENGTH   OF BYTES 1-7
-------- -------------------- ------- -------------- ------- --------------
0200C088 demo.c                    14 malloc(1)          100  01010101010101
0200C178 unknown                      malloc(2)          568  8034ADFB010101
Here is a report generated with dmalloc 3.1.0 (after making fixes for the problems noted):
1: Dmalloc version '3.1.0'. UN-LICENSED copy.
1: dmalloc_logfile 'dmalloc.log': flags = 0x4f41d83, addr = 0
1: starting time = 1471795976
1: free count/bits: 31/7
1: basic-block 4096 bytes, alignment 8 bytes, heap grows up
1: heap: start 0x6d8000, end 0x6db000, size 12288 bytes, checked 1
1: alloc calls: malloc 1, realloc 0, calloc 0, free 0
1: total memory allocated: 112 bytes (1 pnts)
1: max in use at one time: 112 bytes (1 pnts)
1: max alloced with 1 call: 112 bytes
1: max alloc rounding loss: 16 bytes (12%)
1: max memory space wasted: 3952 bytes (96%)
1: final user memory space: basic 0, divided 1, 4080 bytes
1: final admin overhead: basic 1, divided 1, 8208 bytes (66%)
1: not freed: '0x6da008|s1' (100 bytes) from 'demo.c:14'
1: known memory not freed: 1 pointer, 100 bytes
1: ending time = 1471795976, elapsed since start = 0:0:0
Because neither tool did everything, I used both, depending on what I was trying to do.
I first mentioned dmalloc in email in July 1996, and incorporated it and dbmalloc as options in the configure script for ncurses later that year (1996/12/21). Up until then, I would simply edit the makefile for programs because I preferred to not make debugging features part of the build-scripts. I did this for ncurses first, because it uses several makefiles.
Later, I made similar changes to other configure scripts. Finally, in 2006 I combined these options in a configure macro, and gradually added that to each program that I maintain.
Checking for memory leaks is not the only useful thing to do at runtime. Developers would like to know how effective their test cases are, by measuring test-coverage. In 1995, some people at Bell Labs made available a version of ATAC. I made some changes to allow it to work with gcc 2.7.0 (see the iBiblio archive), and was interested in this for a few years.
Here is my original announcement:
From dickey Sun Dec 31 13:10:39 1995
Subject: atac (test-coverage)
Newsgroups: comp.os.linux.announce
Organization: Clark Internet Services, Inc., Ellicott City, MD USA
Summary: atac3.3.13 ported to Linux
Keywords: test coverage
Lines: 16
X-Newsreader: TIN [UNIX 1.3 950824BETA PL0]
Status: RO

ATAC was written by some folks at BellCore, who're working on a newer
version (licensed). I modified the version that they made available
for free so that it'll run on Linux with gcc 2.7.0 (mostly by
accommodating the non-standard features of gcc), and uploaded it to
sunsite.unc.edu (now in Incoming). I've found it very useful for
setting up regression tests for my programs.

ATAC measures how thoroughly a program is tested by a set of tests
using data flow coverage techniques, identifies areas that are not
well tested, identifies overlap among tests, and finds minimal
covering test sets. Atac displays C source code, highlighting code
fragments not covered by test executions.

--
Thomas E. Dickey
dickey@clark.net
I used this for improving the test-cases for cproto and diffstat. I also built ncurses with it, in September 1996. Unlike cproto and diffstat, there is no set of test cases for ncurses which can be run in batch mode. It was interesting to explore the coverage of the ncurses test-programs, but I quickly found that doing this properly would take a little time:
From dickey Wed Sep 25 20:33:03 1996
Subject: test-coverage of ncurses...
To: ncurses-list@netcom.com (Ncurses Mailing List)
Date: Wed, 25 Sep 1996 20:33:03 -0400 (EDT)
X-Mailer: ELM [version 2.4 PL24alpha3]
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
Content-Length: 939
Status: RO

Just for grins (I had it on my list for this month, to gauge
feasibility), I build ncurses with atac (a test coverage tool that I
did some work on last year, so I could use it for analysis of some of
my programs).

If/when I have time, I've got on my list to design a test for
lib_doupdate.c that'll exercise more than the 5-10% of the
paths/conditions that're being tested at present.

Basically atac generates a listing showing the places (and conditions)
in the code that aren't exercised (i.e., it highlights them). The
listings tend to be long, since most paths aren't exercised, and I
didn't spend a lot of time pruning the data down (the shortest I
generated is ~11000 lines). I use a script that converts vile into a
pager (I'll post it if anyone needs it).

I've put the listings I generated in
ftp.clark.net:/pub/dickey/ncurses/atac-output.zip (I'll leave it there
a couple of days).

--
Thomas E. Dickey
dickey@clark.net
But gcc 2.7.2 introduced a problem, as I commented to Jürgen Pfeifer early in 1997:
From dickey Mon Jan 27 20:45:38 1997
Subject: simpler solutions
To: Juergen.Pfeifer@t-online.de (Juergen Pfeifer)
Date: Mon, 27 Jan 1997 20:45:38 -0500 (EST)
X-Mailer: ELM [version 2.4 PL24alpha3]
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
Status: RO
Content-Length: 1615
Lines: 31

I looked closely at the atac vs ({ ... }) conflict, and realized that
this is supposed to be a gcc extension. The one file that's giving me
trouble isn't ifdef'd to turn it off when gcc extensions are to be
suppressed. (And the gcc #define __STRICT_ANSI__ isn't turned on when
I set -ansi).

However, understanding the problem, I can work around it w/o modifying
atac. The ({ ... }) delimits a compound statement, which is a block
that can appear (almost) anywhere an expression can. That'd be rather
difficult to implement in atac, since it completely changes the notion
of flow control. So I'm working around it by modifying <sys/time.h> to
be ANSI-compliant (there's code in the X distribution that works just
fine).

I'm willing to fix bugs in atac, but not to maintain a whole host of
non-standard stuff.
The “compound statement” is a GCC extension called a statement expression. GCC's statement expressions are used in glibc's header files. Most compilers accept some extensions, but this particular one changes the syntax rather than just adding a keyword. An ANSI C compiler cannot handle this extension.
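A minimal example of the extension (my own, not the code from <sys/time.h>):

/* a ({ ... }) block yields the value of its last statement, and may
 * appear where an expression is expected; standard C cannot parse it */
#define square_of(x) ({ int _t = (x); _t * _t; })

int
demo(void)
{
    return square_of(3);    /* gcc accepts this; an ANSI C compiler does not */
}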
ATAC works by using the C preprocessor to add line- and column-markers to every token, and starting with that, adds its own functions for counting the number of times each part of the program is executed.
That sounds simple, but there are a few problems; for example, some of the code being instrumented may use integers (int) interchangeably with pointers (char*). All of that is technical, and could be fixed. However, its incorporation of the GNU C preprocessor makes the resulting license too restrictive to bother with.
In writing this page and looking for the state of the art in 1992 when I modified doalloc, I found a contemporary paper citing a 1992 paper with this bibliographic entry:
R. Hastings and B. Joyce. Fast detection of memory leaks and access errors. In Proceedings of the Winter ’92 USENIX conference, pages 125–136. USENIX Association, 1992.
The actual paper has "Purify" in the title.
We did not have Purify at the Software Productivity Consortium (SPC). It came on the market a little too late to be of interest to the management.
I encountered it after I left SPC, and joined a large development project. The management there was interested, and seeing that I was involved in making several improvements (such as converting to ANSI C), I was asked to evaluate two tools for possible use in 1995:
Insure++ was interesting because it had a nice user interface depicting the problems found and the corresponding stack traces. However, it was very slow on our Solaris servers. Purify performed reasonably well, and Quantify seemed as if it could be useful.
Even Purify was not fast. It relies upon making a modified executable (see patents by Reed Hastings), and for a newly compiled executable that takes time. Even after “purifying” the executable, it uses more memory (because it makes a map of the data to keep track of what was initialized, etc.). Our servers did not have much memory by today's standards. Also, some resource settings for Motif applications were ignored, making the windows lose their background pixmaps.
On the other hand, Purify required less setup than dbmalloc/dmalloc to use (essentially just linking with the purify tool). No special header file was needed. It has functions which can be called, but for casual use, those (like most of the functions in dbmalloc/dmalloc) are unnecessary.
Quantify was not that useful. Our applications spent most of their time in a tiny area of its graphical display and the GUI had no way to select an area for detailed analysis.
Quantify was not the only tool for profiling.
I was familiar with gprof by the time I started working with Eric Raymond and Zeyd Ben-Halim in 1995. After the initial discussion of incorrect handling of background color and (inevitably) compiler warnings, I sent this:
From dickey Sun Apr 23 06:57:14 1995
Subject: profiling comparison (brief)
To: esr@snark.thyrsus.com, esr@locke.ccil.org, zmbenhal@netcom.com (Zeyd M. Ben-Halim)
Date: Sun, 23 Apr 1995 06:57:14 -0400 (EDT)
X-Mailer: ELM [version 2.4 PL24alpha3]
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
Content-Length: 1970
Status: RO

Here's a first cut of the differences between profiling my test case
with 1.8.7 and 1.9. (I added/subtracted in my calculator program to
account for the major differences between the numbers from gprof). As
you'll see, the biggest (but not all) differences were from

	1.36	wnoutrefresh
	 .84	wrenderchar
	 .66	write
	 .38	relative_move
	 .32	wchangesync
	 .33	wladdch

My impression of the other numbers is that you're making fewer I/O
calls, but possibly emitting more characters. Next pass, I'll make the
mods to allow redirection of the screen, so I can get a character
count to verify this hunch.

btw, in one of zeyd's most recent patches, the definition of TERMINFO
got moved, so that the program doesn't make w/o patching the makefile.

-------------------------------------------------------------------------------
+5.56	 5.56	#mcount (profiling 1.8.7)
-8.40	-2.84	#mcount (1.9)
+32.53	29.69	#ncurses (total 1.9)
-25.72	 3.97	#ncurses (total 1.8.7) -- actual difference
+11.94	15.91	#write (1.8.7)
-12.61	 3.30	#write (1.9)
+1.54	 4.84	#IDcTransformLine (1.8.7)
-1.62	 3.22	#IDcTransformLine (1.9)
+1.20	 4.42	#wladdch (1.8.7)
-1.53	 2.89	#wladdch (1.9)
+1.01	 3.90	#_IO_vfprintf (1.8.7)
-.80	 3.10	#_IO_vfprintf (1.9)
+.52	 3.62	#_IO_file_overflow (1.8.7)
-.39	 3.23	#_IO_file_overflow (1.9)
+.41	 3.64	#strcat (1.8.7)
-.61	 3.03	#strcat (1.9)
+.36	 3.39	#tparm (1.8.7)
-.41	 2.98	#tparm (1.9)
+.35	 3.33	#_IO_default_xsputn
-.26	 3.07	#_IO_default_xsputn (1.9)
+.34	 3.41	#strcpy
-.39	 3.02	#strcpy (1.9)
+.30	 3.32	#wnoutrefresh
-1.66	 1.66	#wnoutrefresh (1.9)
-.84	 .82	#wrenderchar (1.9)
-.38	 .44	#relative_move (1.9)
-.32	 .12	#wchangesync (1.9)
+.22	 .34	#baudrate (1.8.7!)
+.18	 .52	#waddnstr
-.25	 .27	#waddnstr (1.9)
-.00	 .27	#baudrate (1.9)

--
Thomas E. Dickey
dickey@clark.net
I began the autoconf-generated configure script shortly after (during May 1995). After some discussion regarding the libraries that should be built, I extended the script to generate makefile rules for profiling. Combining all of the flavors in one build was not my idea (since it makes things unnecessarily complex). But I agreed to make the changes:
### ncurses 1.9.2c -> 1.9.2d
	* revised 'configure' script to produce libraries for normal,
	  debug, profile and shared object models.

### ncurses 1.9.1 -> 1.9.2
	* use 'autoconf' to implement 'configure' script.
	* panels support added
	* tic now checks for excessively long termcap entries when doing
	  translation
	* first cut at eliminating namespace pollution.
The profiling libraries for ncurses are not used often (if something is used, I get bug reports). Debian's package description says they are present in the debugging package for ncurses. But they are not there. The changelog says they have been gone a while:
ncurses (5.2.20020112a-1) unstable; urgency=low

  * New upstream patchlevel.
    - Correct curs_set manual page (Closes: #121548).
    - Correct kbs for Mach terminal types (Closes: #109765).
  * Include a patch to improve clearing colored lines (Closes: #112561).
  * Build even shared library with debugging info; we strip it out
    anyway, but this makes the build directory more useful.
  * Build in separate object directories.
  * Build wide character support in new packages.
  * Change the -dbg packages to include debugging shared libraries in
    /usr/lib/debug; lose the profiling and static debugging libraries;
    ship unstripped libraries in -dev.
  * Don't generate debian/control or debian/shlibs.dummy.
  * Use debhelper in v3 mode.

 -- Daniel Jacobowitz <dan@debian.org>  Wed, 16 Jan 2002 22:20:00 -0500
For a while before that change, profiling had fallen into disuse with Linux because no one was in a hurry to get gprof working after the transition to ELF in the late 1990s. For example, here is part of a message I sent to vile's development list:
From dickey Mon Jul 5 21:52:37 1999
Subject: vile-8.3k.patch.gz
To: vile-code@foxharp.boston.ma.us (Paul Fox's list)
Date: Mon, 5 Jul 1999 21:52:37 -0400 (EDT)
X-Mailer: ELM [version 2.4 PL25]
Content-Type: text
Status: RO
Content-Length: 1950
Lines: 36

vile 8.3k - patch 1999/7/5 - T.Dickey <dickey@clark.net>

Miscellenous fixes. The biggest change is that I've got about 2/3 of
the fixes I had in mind for long lines. This makes the ruler info (and
related cursor positioning) cached so it runs much faster. That's for
nonwrapped lines (I got bogged down in the logic for wrapped lines -
it works, but the caching isn't doing anything useful there). For both
(wrapped/nonwrapped) there are other performance improvements. I used
gcov to find the code I wanted to change.

Has anyone got a gprof that works with Linux ELF format?
and in 2001, I had this to say:
Date: Thu, 5 Jul 2001 14:44:48 -0400 (EDT)
From: "Thomas E. Dickey" <dickey@herndon4.his.com>
To: <os2-unix@eyup.org>
Subject: Re: [UnixOS2] ncurses.build
In-Reply-To: <Pine.GSO.4.21.0107051917500.15084-100000@cdc-ultra1.cdc.informatik.tu-darmstadt.de>
Message-ID: <Pine.BSI.4.33.0107051443020.12415-100000@herndon4.his.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-os2-unix@eyup.org
Precedence: bulk
Reply-To: os2-unix@eyup.org
Status: RO
Content-Length: 639
Lines: 22

On Thu, 5 Jul 2001, Stefan Neis wrote:

> On Thu, 5 Jul 2001, Thomas E. Dickey wrote:
>
> > > 	--with-debug \
> > > 	--with-profile \
> >
> > I don't know if profiling works on OS/2.
>
> Works nicely since IIRC EMX-0.9a, standard -pg switch to gcc, so I
> suppose it'll work for ncurses, too.

"works" can be relative - it's been broken on Linux since the switch
to ELF libraries some years ago (it "works" now only in the sense that
the compile and gprof run, but the numbers are worthless because
there's no timing information).

--
T.E.Dickey <dickey@herndon4.his.com>
http://dickey.his.com
ftp://dickey.his.com
Still, after a while, gprof was useful:
I have used it frequently in improvements for mawk, starting in September 2009 with a discussion with Jim Mellander about the hashing algorithm. Rather than embed rules for profiling as done in ncurses, I added another wrapper script for configuring with the profiling option:
#!/bin/sh
CFLAGS='-pg' cfg-normal "$@"
Doing this for ncurses (to just get useful profiling libraries) is different, because I want to use the profiling version of the C runtime as well. Otherwise timing for strcpy, memset, etc., will be overlooked. Here is a script which I started in 1995, updated last in 2005 to accommodate a changed name for the C runtime profiling library:
#!/bin/sh
CFLAGS='-pg' \
LDFLAGS='-pg' \
LIBS="-lc_p" \
cfg-normal \
--with-profile \
--without-normal \
--without-debug \
"$@"
I began using gcov (the GNU coverage analyzer) in 1999. It was not a replacement for ATAC, for several reasons. The last of those (that it amounts to a different format of gprof) is the way I have used it, e.g., with vile, mawk and other programs. Using gprof to study more than one file is cumbersome.
Along the way, the options I needed for running gcov changed, from
#!/bin/sh
LDFLAGS="--coverage" \
CFLAGS='-fprofile-arcs -ftest-coverage' \
cfg-normal "$@"
to just
#!/bin/sh
CFLAGS='--coverage' cfg-normal $*
Some documentation states that the profile-arcs and test-coverage options are needed.
I began using valgrind in 2002, on more than one program.
Urs Janßen summarized the choices in discussing a bug report:
ccmalloc (<http://www.inf.ethz.ch/personal/biere/projects/ccmalloc/>) and valgrind (<http://developer.kde.org/~sewardj/>) are very usefull (on Linux), MMS (<http://hem1.passagen.se/blizzar/mss/> also look very usefull last time I tested it (~2 years ago).
and
*shrug* as I said before, I haven't used checkergcc for ages. insure looked like a good leak/bounds checker but it very expensive. ccmalloc is a bit noisy but imho the best freeware leak checker. valgrind has an acceptable noise level but misses some usefull informations like exact line numbers. I can't remember how good MMS was (but it must have been better than dmalloc, dbmalloc and electric fence).
Until 2004, I used Slackware for my development machine, and
routinely built things like valgrind
as they became
available. I had a comment:
On Mon, 30 Sep 2002, Martin Klaiber wrote:

> Urs Janßen wrote:
>
> > *shrug* as I said before, I haven't used checkergcc for ages.
>
> Probably a bug in checkergcc. In the docs they say that libc and
> ncurses are supported. I'll contact the developers.

perhaps (but when I last looked at checkergcc a few months ago on
Debian, there was no support for the current glibc, so it was useless).

valgrind is interesting, but I had to hack it to make it run with
Slackware 7.1 (it makes too many assumptions about the header files)
The nice thing about valgrind
was that it
produced reports comparable to purify. And it was free.
Because effective use of valgrind requires the program to be built with the debugging option, I modified the configure scripts to combine the --with-no-leaks or --disable-leaks options as a new option --with-valgrind. I did that first for vile, then ncurses, late in 2006.
Like purify, valgrind gave warnings that could be regarded as false positives. And like purify, valgrind allows those to be suppressed by configuration files. Ruben Thomas contributed a few sample suppression files for ncurses in 2008.

In practice, I do not use those, relying on the “no-leaks” configuration for my testing. Either way, others rarely use the suppression files and do not build no-leaks configurations of ncurses (see Testing for Memory Leaks in the ncurses FAQ).
Valgrind
comes packaged with default suppression
files. For instance, there are 152 items in
default.supp
on my Debian/testing system in 2016.
The one for ncurses has 15.
When I check for leaks in xterm
I have to ignore
about 5000 lines of listing. I could generate a
suppression file using valgrind
(and experimented
with this early in 2006, suppressing 285 items). But once
ignored, even some useful cases can be overlooked.
Late in 2006, I reorganized my autoconf macros for memory-leak checks, so that doalloc, dbmalloc, dmalloc and valgrind were all handled by the same set of macros. Purify also was supported by these macros, but the main ones are --with-no-leaks and --with-valgrind. I kept the existing --disable-leaks option in configure scripts such as ncurses which had provided this for some time.
I started using lcov early in 2014:
LCOV is a graphical front-end for GCC's coverage testing tool gcov. It collects gcov data for multiple source files and creates HTML pages containing the source code annotated with coverage information. It also adds overview pages for easy navigation within the file structure. LCOV supports statement, function and branch coverage measurement.
I mentioned it as a way to organize regression tests while discussing with Tom Shields a large change that he proposed making to byacc (i.e., incorporating the btyacc code).
This may have been the article which I came across while looking for a better test-coverage tool.
Since then, I have used it as well for cproto, diffstat, indent, mawk and xterm.
Early on, in 1995-1996, my goal was to convert all of the
K&R code which I was maintaining to ANSI C, so that I could
use function prototypes, const
and the standard
variable-argument syntax. I took care to avoid changing the
program unnecessarily while making these changes, relying on
changes to object files to alert me
to unintended changes.
Working with Paul Fox and others on vile, I had some concern about being able to build the program on machines where ANSI C was not supplied by the vendor (HPUX and SunOS). I used unproto on these platforms, and for a while compatibility with unproto determined what features of ANSI C I would use. In particular, it could not handle string concatenation in the C preprocessor.
Not all of this went smoothly. I became the maintainer for
vile
rather suddenly after one of my changes made a
variable const
which should not have been.
Making a variable “const
” asks the
compiler and linker to make it read-only. Developers do
this to ensure that a function parameter does not change within
that function.
One unexpected issue with const
is that it is not
possible to retrofit an application such as lynx or ncurses without using casts. That is
because some of the longstanding interfaces use
non-const
values, introducing an inconsistency. But
casts hide type mismatches. While there is (and I use) a compiler
warning to tell me about this, it is possible to introduce errors
due to isolated cases where a “const” variable is
modified.
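An illustrative sketch (hypothetical names, not the actual lynx or ncurses interfaces) of how such a cast satisfies the compiler while discarding information:

#include <stdio.h>

static void
legacy_write(char *text)            /* a pre-const interface */
{
    fputs(text, stdout);
}

static void
show(const char *msg)
{
    legacy_write((char *) msg);     /* compiles silently, discarding const */
}

int main(void)
{
    show("hello\n");
    return 0;
}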
I added a configure option --enable-const to ncurses which changes several of these inconsistent interfaces to use const, noting that this is a difference versus X/Open Curses. In NetBSD curses, the developers changed interfaces to use const (saying that they had gotten agreement from someone at The Open Group), but did not make it configurable. This is not reflected in X/Open Curses Issue 7 (in 2013). Finally with ncurses6 in 2015, I changed the default for this option, to use const.
Besides making C programs (a little) type-safe, using const also improves the time needed to load the program. With Linux, you can see the loader statistics by setting the environment variable LD_DEBUG=statistics. Besides compiler warnings, some of my work with const has been aimed at improving the loader statistics.
Constant/read-only variables are not the only aspect to clean up when doing a conversion to ANSI C. There are variable-length argument lists to consider. Using gcc you can declare that a given function is “printflike” or “scanflike” — making the compiler check calls to the function as if they were printf or scanf. Both vile and lynx had interesting quirks to work around.
When I first started working on vile, its internal messages were formatted with lsprintf (a function that was much like sprintf, but different):

- it implemented a subset of the formatting codes of sprintf (a subset is not a problem),
- it had formatting codes not found in sprintf, e.g., for padding (a problem) text to move it against the right margin, and
- it did not simply use sprintf, because (some of the code dated to the late 1980s) there were conceivably low-end systems without an ANSI C runtime.

Because of the nonstandard formatting features provided by lsprintf, it was not possible to use the gcc compiler warnings to look for parameter mismatches. When I realized this, I started by making a patch for gcc which would implement the extra features. But I decided it was a bad idea, not because the features were non-standard (gcc assumes the GNU C library, which has a few non-standard features), but because it meant I would have to patch the compiler — repeatedly.
Instead, I modified lsprintf to use standard formatting codes and revised some calls to lsprintf to avoid those features which had no counterpart in standard C. Most of the changes for lsprintf dealt with the lower-level dofmt function. I did that rapidly in 1999:

+ modify dofmt() to make its repertoire closer to printf's: remove 'D'
  format code, and change 'p' to 'Q', 'u' to 'f'.

but the process of revising calls took some time, e.g., in 2004 and 2005:

+ modify dofmt() to handle "%*.*s" and "%.*s" formats, removed "%*S"
  format.
+ fixes to build clean(er) with Intel compiler:
  + adjust ifdef's for fallback prototype for open() in case it is
    really open64().
+ add a "%u" format type to dofmt(), use it to display file-sizes.
+ define DIRENT to struct dirent64 for the case where _FILE_OFFSET_BITS
  is 64 and _LP64 is not defined. Use a configure script check to
  ensure the type exists.
+ add/use function bpadc() to replace "%P" and "%Q" formats in dofmt().
+ add/use function format_int() to replace "%r" format in dofmt().
Of course, GNU C is not the only runtime with
printf
or scanf
extensions (vendor Unix
systems have some longstanding quirks in this area which are not
in standard C). But in practice the only extensions which I
use are those which provide better diagnostics without
interfering with portability.
My work on lynx in 1998 included
changes to reduce the possibility of buffer-overflows. At that
point, lynx
used its own functions for allocating
and copying memory for strings, but used sprintf
for
formatting strings.
The latter was a problem because not all of the buffer-sizes
were checked, as reported by Jeffrey Honig in
May 1998. The suggested “Fix” was not a solution,
because snprintf
requires the same
information for its correct use as the (presumably correct)
checks already in place.
Rather than imitate snprintf
with its emphasis on
repeating the programmer's assumptions about fixed buffer sizes,
a more appropriate solution should take into account that many of
the formatting operations in lynx
used string
copying and concatenation to avoid using
sprintf
. If there were a portable function which
could format into a dynamically-allocated string, that would
solve the problem as well as making lynx
easier to
maintain.
There were (and are, as of 2016) no suitable standard functions for this purpose. Standardization lags. Originally written by Chris Torek in the early 1990s, snprintf was incorporated in NetBSD, OpenBSD, FreeBSD, and adapted by Linux and Solaris developers. Finally it was standardized in POSIX (issue 5 was published in 1997). At the time, that meant that newer platforms would have the function, but older ones (such as SunOS) would not. See for example the page by Martinec.

As you can see by referring to the page by Martinec, if snprintf had been a good-enough solution for Lynx, and I had chosen to not write one specially for porting applications, I could have waited a year or two and gotten someone else to write it.
While not suitable for use by a portable program, the
asprintf
function in the GNU C library was
interesting because it showed that a version of
sprintf
was possible with a dynamically allocated
output buffer. Solaris developers were slower here (finally
appearing in
Solaris 11 twelve years later):
A number of new routines are included in the Oracle Solaris C library to improve familiarity with Linux and BSD operating systems and help reduce the time and cost associated with porting applications to Oracle Solaris 11 Express. Examples of new routines include asprintf(), vsprintf(), getline(), strdupa() and strndup().
It has not been standardized (as of 2016).
Again, asprintf did not do what I needed in lynx.

Taking all of that into account, I added two functions, HTSprintf and HTSprintf0, later that year. The latter creates a new string, the former appends to a string. Lynx uses both: about 20 calls to HTSprintf0, and about 70 to HTSprintf. The two are closely related: HTSprintf0 is a special case of HTSprintf, since it sets the destination to an empty buffer and calls the latter function. From the counts, you can see that lynx uses the more general form most of the time.
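The underlying idea can be sketched using standard C99 functions (a hypothetical name, not lynx's implementation, which also supports appending and predates reliable snprintf): measure the needed length with vsnprintf, allocate, then format for real.

#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

char *
sprintf_alloc(const char *fmt, ...)
{
    va_list ap;
    int need;
    char *result = 0;

    va_start(ap, fmt);
    need = vsnprintf(NULL, 0, fmt, ap); /* C99: length, excluding the NUL */
    va_end(ap);

    if (need >= 0 && (result = malloc((size_t) need + 1)) != 0) {
	va_start(ap, fmt);
	vsnprintf(result, (size_t) need + 1, fmt, ap);
	va_end(ap);
    }
    return result;
}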
In my initial version, I made this “portable” (not
relying on any non-standard function). If lynx
is
built on a machine which has vasprintf
, it will use
that function. One reason is that the NLS support (message files)
may use an obscure feature for plurals:
/*
* If vasprintf() is not available, this works - but does not implement
* the POSIX '$' formatting character which may be used in some of the
* ".po" files.
*/
Rather than double the amount of work, I chose to use vasprintf, which is available with Linux and the BSDs. On other platforms, lynx uses the easily ported code.
Compiler warnings help show when the format cannot handle a
given data type (and the resulting printout will be incorrect).
Compiler warnings come into play in lynx
with a
couple of troublesome data types:
off_t

This data type is associated with file size, because the lseek function uses an offset with this type when positioning a file descriptor within a file. The original lynx developers decided that it was the same as long, and passed the filesize around — and printed it — with a format for long. Times changed, and machines got bigger: off_t can have more bits than long. C has no predefined format for printing off_t, but does for long. A cast is needed, as well as definitions for the printing format.

One reason why it is so complicated is that even when off_t and long happen to be the same size, gcc has a preference (made known via compiler warnings) for the “right” type to use. The configure script can easily determine the size of these types, but getting the compiler to tell which are the preferred names for a given size is harder.
/*
* Printing/scanning-formats for "off_t", as well as cast needed to fit.
*/
#if defined(HAVE_LONG_LONG) && defined(HAVE_INTTYPES_H) && defined(SIZEOF_OFF_T)
#if (SIZEOF_OFF_T == 8) && defined(PRId64)
#define PRI_off_t PRId64
#define SCN_off_t SCNd64
#define CAST_off_t(n) (int64_t)(n)
#elif (SIZEOF_OFF_T == 4) && defined(PRId32)
#define PRI_off_t PRId32
#define SCN_off_t SCNd32
#if (SIZEOF_INT == 4)
#define CAST_off_t(n) (int)(n)
#elif (SIZEOF_LONG == 4)
#define CAST_off_t(n) (long)(n)
#else
#define CAST_off_t(n) (int32_t)(n)
#endif
#endif
#endif
#ifndef PRI_off_t
#if defined(HAVE_LONG_LONG) && (SIZEOF_OFF_T > SIZEOF_LONG)
#define PRI_off_t "lld"
#define SCN_off_t "lld"
#define CAST_off_t(n) (long long)(n)
#else
#define PRI_off_t "ld"
#define SCN_off_t "ld"
#define CAST_off_t(n) (long)(n)
#endif
#endif
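A minimal usage sketch (the function name is mine): since PRI_off_t is a string literal, it concatenates into the format, and CAST_off_t keeps the argument consistent with that format. The same pattern applies to the time_t macros below.

#include <stdio.h>
#include <sys/types.h>

void
show_size(off_t filesize)
{
    printf("size: %" PRI_off_t " bytes\n", CAST_off_t(filesize));
}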
time_t

Since it is used far more often, one would suppose that time_t would be less of a problem. Again, I solved this with a thicket of ifdef's. But on some platforms, gcc decides that time_t is really an int and warns when I use a %ld (long) format:
/*
* Printing-format for "time_t", as well as cast needed to fit.
*/
#if defined(HAVE_LONG_LONG) && defined(HAVE_INTTYPES_H) && defined(SIZEOF_TIME_T)
#if (SIZEOF_TIME_T == 8) && defined(PRId64)
#define PRI_time_t PRId64
#define SCN_time_t SCNd64
#define CAST_time_t(n) (int64_t)(n)
#elif (SIZEOF_TIME_T == 4) && defined(PRId32)
#define PRI_time_t PRId32
#define SCN_time_t SCNd32
#if (SIZEOF_INT == 4)
#define CAST_time_t(n) (int)(n)
#elif (SIZEOF_LONG == 4)
#define CAST_time_t(n) (long)(n)
#else
#define CAST_time_t(n) (int32_t)(n)
#endif
#endif
#endif
#ifndef PRI_time_t
#if defined(HAVE_LONG_LONG) && (SIZEOF_TIME_T > SIZEOF_LONG)
#define PRI_time_t "lld"
#define SCN_time_t "lld"
#define CAST_time_t(n) (long long)(n)
#else
#define PRI_time_t "ld"
#define SCN_time_t "ld"
#define CAST_time_t(n) (long)(n)
#endif
#endif
Generated code should compile with as few warnings as (or fewer than) normal source-code. For both byacc and reflex I have done this. That is not true of bison and “new” flex.
I use lex/flex for most of the syntax filters in vile. Occasionally someone wants that to support “new” flex. In a recent (July 2016) episode I made some build-fixes to make that work. But as I reported in Debian #832973, I preferred not to use that tool because it added more than 25,000 lines of warnings to my build-logs.
I started writing build-scripts when I started working on
programs that took more than a minute or two to compile. For
example, in 1996 I wrote this build-x
script:
#!/bin/sh
WD=`pwd`
LEAF=`basename $WD`
if [ $LEAF = xc ];then
head -1 programs/Xserver/hw/xfree86/CHANGELOG |sed -e 's/^/** /' >make.out
cat >>make.out <<EOF
** tree: `pwd`
** host: `partition`
EOF
run nice make-out World
else
echo '** You must be in the xc-directory'
fi
Later, I found that it helped to construct build-scripts which knew about the specific compilers available on different machines — so that I could verify that my programs built correctly with each compiler. I collected logs from these builds, starting in 1997 (both ncurses and lynx). However, these collections were not systematic; I did not at first store the logs in a source repository to allow comparison, but settled for a record of the “last good build” to use in trouble-shooting the configure scripts.
There were a few exceptions: as part of my release process for
lynx
I kept the logfile from a test-build on the
server at ISC which hosted lynx. I started that in December
1997.
But for the other programs: my versioned archives for ncurses build-logs start in July 2006. Other programs followed, as well as more elaborate (and systematic) build-scripts. For an overview of those, see my discussion of sample build-scripts. As of September 2016, I have 218 scripts for building programs, in various configurations. I collect the logs from these, and compare against the previous build — and look for new problems such as compiler warnings.
I track build-logs to avoid introducing problems for others who build my programs (either packagers or individual developers). None of my machines run continuously, and a build-server would make little sense (because I have to test on many platforms), so a set of scripts provides a workable solution.
Packagers give the most immediate feedback when there is a problem building ncurses or xterm. They typically use a particular set of machines, with build-servers. Packaging and systematic builds go together.
In a few cases, others contributed scripts for building my programs and creating packages, but this was not done systematically. Also (until around the time that I got involved in packaging), Linux packagers did not as a rule provide source repositories for their packaging scripts from which one could get useful information about the reasons for package changes. The BSD ports, on the other hand, provide historical information, but are strongly dependent on the structure within which a port is built.
I began packaging all of my programs in 2010 when I started using virtual machines for the bulk of my development. Most of these packages use either "deb" (Debian and derived distributions) or "rpm" (Red Hat, OpenSUSE and others). I wrote (and am still writing) a set of scripts to manage this. Rather than adding onto an existing upstream source and adding version information to that source, my packaging scripts work within the existing structure and use my existing versioning scheme.
During this process, I finally got around to dropping
support for K&R compilers in the configure script checks
for the C compiler. The unproto
program was of
course long unused.
For each program XXX:

bump-XXX uses a program-specific naming convention for the working directory to determine the target version, and updates version information in the program's files to match the target. It also makes/updates an entry in the changelog file for my Debian test-package.

clean-XXX uses the makefile rule for cleaning a working directory, removing all generated files.

release-XXX cleans a working checked-out directory for a given package XXX, verifies that administrative stuff (such as copyright dates) is in order, and builds/signs a tarball suitable for upload or test-builds. These scripts also extract the change list for later use in email.

build-all-XXX does platform-specific test builds, and makes packages.

upload-XXX uses the files prepared by release-XXX, putting those on my ftp site. It prepares the email using text previously prepared, along with a list of files on the ftp site with their URLs.

While developing a new set of changes, I “release” several updates for packaging and make test-builds, comparing the build-logs to the previous release to eliminate new compiler warnings. The build scripts actually invoke several scripts depending on the availability of packaging tools and cross-compilers. Each produces a log file.
As of September 2016, I have 77 release scripts, including 22 which generate documentation for my website. I have not written scripts for everything: I have 14 programs in my to-do list for scripting.
Paid work, of course.
When I first used lint in 1986, I was working with another developer. I pointed out that including <string.h> in a program would give the correct return-type for strcpy, making it unnecessary to use a cast:
char *p = (char *)strcpy(target, source);
He refused to make the change, giving as his reason:
My supervisor on another job told me to do it this way.
At the time (1985-1987), we were developing networking applications for M68K-based computers. The C compiler for those machines used one set of registers for data, and another set for addressing. Without a suitable declaration:
char *strcpy();
the compiler would assume (cast or no cast) that strcpy returned an integer. The cast would cause the resulting data register to be used as an address.
I found lint useful anyway. It would tell me about cases where I made a typo, using fprintf where I meant printf, e.g., this error:
fprintf("Hello World!\n");
The alternative (without lint) was a core dump.
At the time I wrote the gcc-normal and gcc-strict scripts, I found a use for gcc solely for its warnings. This helped me to improve the code quality of a fairly large system which had been written in K&R C, retrofitted to some POSIX features, and ported to many platforms. I used gcc warnings to flag places where the code needed work. I kept track of my progress by keeping the build-logs in the source repository, and measuring the daily rate of change using c_count and diffstat in a cron job. Because gcc came with no warranty, we did not use it for the end product.
As part of the process, I made the cron job send everyone on the development team a daily report of the diffstat.
That was ... not popular. I stopped the email. But I kept on with
the compiler warnings, converting the programs to ANSI C.
Seeing that a few developers made a majority of the changes, I discussed the compiler warnings with those people. Some liked the idea. One developer, however, got up immediately and left the office when I sat down next to him. When I caught up with him, he had not calmed down, saying:
I know what you're trying to do, and I think it's good. But I just can't stand it.
So not everyone liked fixing compiler warnings. This developer was fairly productive, but liked to work alone in the evenings when others were not around. One morning I was chatting with another developer when I happened to notice a hole in the wall, perhaps 8-10 inches in diameter. I remarked that I hadn't seen that before. The person I was talking to remarked that (the other developer) had done it. "How…", I began. He replied that (the other) wore boots. Enough said.
During the last year or so that I was at the Software Productivity Consortium, I spent some time reviewing and suggesting improvements to programs that I found on the Internet. One of those was gcc. In its bootstrap, it compiled a program named “enquire” which it used to determine the sizes of various datatypes. That in particular had a lot of compiler warnings (because the code ignored the difference between signed and unsigned values), but other parts of gcc needed work as well.
I sent a patch for gcc (which fixed about 5,000 warnings) to gcc's developers (probably early 1993). Richard Stallman responded, a little oddly I thought: he asked what I wanted them to do with the patch. I replied that I wanted them to use it to improve the program. I heard no more. If they did incorporate any of the fixes, there was no record of that in later versions.
I kept that in mind, and sent no more fixes to gcc's developers.
Expanding a little on my remarks here: quite a while ago, Richard Stallman sent a message to a mailing list explaining that others found warnings useful, but that he did not. Likely that had some influence on the default compiler warnings, as well as the choice of options which comprise -Wall:
Date: Thu, 2 Sep 1999 23:19:27 -0400
Message-Id: <gnusenet199909030319.XAA08701@psilocin.gnu.org>
From: Richard Stallman <rms@gnu.org>
To: gnu-prog@gnu.org
Subject: On using -Wall in GCC
Reply-to: rms@gnu.org
Resent-From: info-gnu-prog-request@gnu.org
Status: RO
Content-Length: 841
Lines: 17

I'd like to remind all GNU developers that the GNU Project does not urge or recommend using the GCC -Wall option.

When I implemented the -Wall option, I implemented every warning that anyone asked for (if it was possible). I implemented warnings that seemed useful, and warnings that seemed silly, deliberately without judging them, to produce a feature which is at the upper limit of strictness.

If you want such strict criteria for your programs, then -Wall is for you. But changing code to avoid them is a lot of work. If you don't feel inclined to do that work, please don't let anyone else pressure you into using -Wall.

If people say they would like to use it, you don't have to listen. They're asking you to do a lot of work. If you don't feel it is useful, you don't have to do it. I never use -Wall myself.
In gcc-2.7.2.3's ChangeLog.4 file, the -Wstrict-prototypes option (though present in gcc 1.42 in January 1992) is first mentioned as being distinct from -Wall:
Thu Nov 21 15:34:27 1991  Michael Meissner  (meissner at osf.org)

	* gcc.texinfo (warning options): Make the documentation agree with the code, -Wstrict-prototypes and -Wmissing-prototypes are not turned on via -Wall; -Wnoparenthesis is now spelled -Wno-parenthesis.
	(option header): Mention that -W options take the no- prefix as well as -f options.
Also, in documentation (gcc.info-3), it appeared in the section begun by this paragraph:
The remaining `-W...' options are not implied by `-Wall' because they warn about constructions that we consider reasonable to use, on occasion, in clean programs.
The option itself was documented like this:
`-Wstrict-prototypes'
     Warn if a function is declared or defined without specifying the argument types. (An old-style function definition is permitted without a warning if preceded by a declaration which specifies the argument types.)
The reason for the option being separate is easy to understand, given the context: this was only a few years after C had been standardized, and few programs had been converted to ANSI C. gcc had other options to help with this, e.g., -Wtraditional.
Developers of complex tools have to keep compatibility in mind. Moving options between categories is guaranteed to break some people's build scripts. For instance, gcc also has
`-Werror' Make all warnings into errors.
which some people use regularly. Needlessly turning on warnings that developers had earlier chosen not to use, and stopping the compile as a result, is not a way to maintain compatibility.
For more context on -Wall versus -Wstrict-prototypes, it helps to read the entire section rather than selectively pick out text. The last paragraph in current documentation for -Wall, for instance, points out that -Wall is not comprehensive, and that ultimately the reason for inclusion was a matter of judgement (as in the original documentation):
Note that some warning flags are not implied by -Wall. Some of them warn about constructions that users generally do not consider questionable, but which occasionally you might wish to check for; others warn about constructions that are necessary or hard to avoid in some cases, and there is no simple way to modify the code to suppress the warning. Some of them are enabled by -Wextra but many of them must be enabled individually.
As for whose judgement that was – it would be the original developers of gcc around 1990.
Around the same time (1993/1994), I sent Mike Brennan
suggested changes for mawk. Some of
those were prompted by lint
warnings. Those he
rejected as unnecessary. For example, one of the diffs would have
begun like this:
--- execute.c.orig 1996-02-01 00:05:42.000000000 -0500
+++ execute.c 2016-08-11 18:30:56.726609195 -0400
@@ -219,7 +219,7 @@
}
while (1)
- switch (cdp++->op)
+ switch ((cdp++)->op)
{
/* HALT only used by the disassemble now ; this remains
@@ -234,13 +234,13 @@
case _PUSHC:
inc_sp() ;
- cellcpy(sp, cdp++->ptr) ;
+ cellcpy(sp, (cdp++)->ptr) ;
break ;
case _PUSHD:
inc_sp() ;
sp->type = C_DOUBLE ;
- sp->dval = *(double *) cdp++->ptr ;
+ sp->dval = *(double *) (cdp++)->ptr ;
break ;
case _PUSHS:
Interestingly enough, gcc has nothing to say about that. Compiling execute.c with gcc-normal gives me 40 warnings, and with gcc-strict 84 warnings. So there is something to be said, even without lint.
After Brennan released mawk 1.3.3 in November 1996, there was no maintainer except for packagers until I adopted it in September 2009. I noticed this because one of the packagers made an inappropriate change involving byacc and mawk. Debian had accumulated 8 patches; I incorporated those and set about making the normal sort of improvements: compiler warnings, portability fixes and bug-reports.
One of the bug-reports dealt with gsub
(Debian
#158481).
Brennan had written this to recur each time it made a
change. I made an initial fix to avoid the recursion in December 2010, but it was
slow. Returning to this after some time, I was in the middle of
making a plan to deal with this in August 2014 when I received
mail from Brennan.
It was always my intention to return to mawk and fix some mistakes. Never intended to wait 15+ years, but now I am going to do it. I hope you want to cooperate with me. If so, let's figure out how we both work on mawk. If not, let's figure out how to separate.
I recalled his attitude toward lint
and compiler
warnings, but agreed, saying:
ok. I have some changes past the last release/snapshot (working off/on, since there are other programs...). At the moment I was mulling over how to measure performance with/without -Wi for https://code.google.com/p/original-mawk/issues/detail?id=12 so... I'll put that aside for the moment, and see about adding your patch, resolving any rejects and then making a snapshot available for discussion. (I'm not currently working in the area you mentioned - was working on some simple stuff while thinking how to revise gsub - intending to make a public release once _that_ is done).
That lasted 5 weeks, ending because we were not in agreement regarding compiler warnings. Here is one of my replies:
| What requires changes? Compiles -Wall without a peep. Are you adding

gcc -Wall is sub-minimal, actually. To see minimal warnings, use the configure script's "--enable-warnings" option. For development work (as opposed to warnings which some packager might consider using), I use the gcc-normal and gcc-stricter scripts here:

http://invisible-island.net/scripts/readme.html#build_scripts
Also, Mike regarded the no-leaks code as unnecessary, proposing a change to remove it. His parting message in September was:
I've decided to work on something else. Please discard the code I sent you on 20140908. Your gsub3 works just as well as my gsub, so use you own code not mine. It will be easier to maintain code you wrote yourself. Also, remove me from 2014 in the version display.
In August 2016, Mike Brennan posted to comp.lang.awk his announcement of a beta for mawk 2.0, stating in the README file:
In my absence, there have been other developers that produced mawk 1.3.4-xxx. I started from 1.3.3 and there is no code from the 1.3.4 developers in this mawk, because their work either did not address my concerns or inadequately addressed my concerns or, in some cases, was wrong. I did look at the bug reports and fixed those that applied to 1.3.3. I did switch to the FNV-1a hash function as suggested in a bug report.
From the discussion above, the reader can see that the "in some cases, was wrong" refers to compiler warnings and checking for memory leaks. Because Brennan expressed no other concerns during five weeks, likely the entire sentence is focused on that one issue. For the record:

the 2.0 beta compiles using gcc-normal, but fails to compile using gcc-strict (due to an inadequate configure script);

my mawk compiles using gcc-normal, and gives 16 warnings using gcc-strict (due to an unavoidable use of pointers to functions).

The Usenet thread X11, Xt, Xm and purify on comp.windows.x in late 1998 illustrates the differences of opinion between developers and vendors.
Not all warnings are beneficial. When they are overdone, they are detrimental. Consider snprintf and its followups strlcpy and strlcat.
My first encounter with these, in 2000, was to rename the strlcpy function which I had written in 1990, to avoid conflict with the latter (one of several cases where BSD header files included non-standard functions without any ifdef's to avoid namespace pollution).
My function lowercases its argument. The OpenBSD function attempts to remedy the ills of the world by offering a better version of strncpy. Conveniently enough, there is a strncat variant. There are a few drawbacks to using the standard strncpy and strncat:
If the actual (null-terminated) source string is shorter than specified, strncpy appends null characters to fill the destination to the specified length.

If the actual source string is longer than specified, strncpy does not supply a terminating null character on the destination.

While strncat is not documented with either of these problems, there is the possibility that users might expect it to be consistent with strncpy by using the total size of the destination buffer rather than the unused size of the destination.
Those are mentioned in the paper strlcpy and strlcat - consistent, safe, string copy and concatenation by Miller and de Raadt (USENIX 99). But none of that was an original observation by the OpenBSD developers. Just from my own experience, while strcpy, etc., are used in the source code (usually correctly), dynamically allocated strings are the way to go:
For my directory editor ded, I wrote stralloc in 1987 to replace cases where one might use strdup (reducing memory by allocating one copy), and dyn_string in 1992 to allocate strings.

In the text editor vile (vi like emacs), the strmalloc function (equivalent to strdup) dates from 1991. It has a replacement for strncpy (fixing the string-terminator issue) dating from 1994. However, starting with my changes for TBUFF in 1993, vile has generally used dynamically allocated strings except for special cases such as buffer-names.

The web browser lynx has used dynamically allocated strings since 1995, with some holdouts for fixed buffers related to terminal width and filename length. My work on HTSprintf starting in 1998 reduced those special cases. However — read the thread “lynx-dev Lynx buffer mismanagement” — to be reminded of the difference between work and criticism, constructive or otherwise.
Quoting from the paper by Miller and de Raadt,

We plan to replace occurrences of strncpy() and strncat() with strlcpy() and strlcat() in OpenBSD where it is sensible to do so.
Lynx was in OpenBSD base from March 1998 until July 2014 (16 years). OpenBSD developers did not modify any of those calls in lynx. Rather, they played a minor role in reporting bugs and making small fixes. For instance, comparing the last labeled release (2.8.7rel.2) with OPENBSD_5_5_BASE shows 21 files differing. But 15 of those files are unrelated to fixes: the differences amount to overlooked deleted files, removal of the help-message from the configure script, OpenBSD-specific URLs, etc. Here is a diffstat for the files with fixes:
WWW/Library/Implementation/HTNews.c | 2 -
WWW/Library/Implementation/dtd_util.c | 2 -
configure.in | 7 ++++-
src/HTML.c | 42 +---------------------------------
src/LYJump.c | 2 -
src/makefile.in | 6 ++--
6 files changed, 13 insertions(+), 48 deletions(-)
That is about 50 lines changed in a program with 180 thousand lines of C code. Finally, lynx was removed from base to allow more frequent updates via OpenBSD ports (see mailing list thread).
According to the CVS history, those functions were added in 1998, with substantial changes going into 2001. Later (July 2003), after relicensing the replacements, these standard functions were modified to force the linker to warn about their use.
The same was done later when incorporating these functions (though on OpenBSD the wide-character functions are little used).
On the other hand, there are no analogously improved versions of scanf or sscanf, etc., and there is no warning about their use. The developers have been inconsistent.
I left ncurses out of the list. The paper mentioned ncurses, so some discussion is needed. What I notice chiefly about the paper is that it gives no interesting numbers. The benchmarks for single functions are (as the paper admits) contrived to show an advantage to strlcpy.
Single functions are not interesting because they do not show whole-program performance.
The paper actually mentions one case, stating that changing a single function was enough to improve performance of tic by a factor of four. That refers to this item from 1999/03/06:
+ recode functions in name_match.c to avoid use of strncpy, which caused a 4-fold slowdown in tic (cf: 980530).
However, in the linked item, Miller's patch replaced two occurrences of strcpy by strncpy (a change which by itself would have slowed the program down). The "4-fold" is probably a quote from someone's email, by the way.
Rather than focus on speedup of single functions, useful measurements are based on sets of operations, e.g., using tctest or vttest in combination with profiling tools.
Finally, the paper gives no numbers regarding the number of programs analyzed, or the number of serious problems found relative to the total number of uses. Without numbers, that whole aspect of the paper falls down: there are no measurements, and nothing to compare with other research.
Todd Miller provided several fixes for ncurses, but none of those dealt with a buffer overflow. There was one report of a buffer overflow — on the bug-ncurses mailing list in October 2000, and on freebsd-security a week later (after I had made fixes in 2000/10/07). I addressed that by doing what I had done with my directory editor (see above). Incidentally, there was a different report earlier that year (mentioned in the FreeBSD discussion by the person who had reported it), against 1.8.6, which was then about 5 years out of date, having been released in October 1994.
The changes in OpenBSD in 2003 pointed out that ncurses still used strcpy and strcat, e.g., to copy to a newly allocated chunk of memory with the correct size.
For example, linking the ncurses C++ demo produces these messages:

linking demo
../lib/libncurses++.a(cursslk.o)(.text+0xde): In function `Soft_Label_Key_Set::Soft_Label_Key::operator=(char*)':
../c++/cursslk.cc:46: warning: strcpy() is almost always misused, please use strlcpy()
../lib/libncurses.so.6.0: warning: strcat() is almost always misused, please use strlcat()
../obj_s/demo.o(.text+0xbf): In function `TestApplication::init_labels(Soft_Label_Key_Set&) const':
./cursesf.h:64: warning: sprintf() is often misused, please use snprintf()
Like lynx, ncurses was then — and still is — part of OpenBSD base. In that case, Todd Miller globally substituted to use strlcpy, etc., in 2003 (in preparation for the change to warn about strcpy, etc.). As of September 2016, OpenBSD CVS has ncurses 5.7 (released 2008/11/02).
Early in 2012 (after ncurses 5.9), I added a configure option to make it simpler and less error-prone if OpenBSD updates ncurses in the future. It is called “--enable-string-hacks”, and uses macros to switch between strcpy and strlcpy, etc. I did this to demonstrate that the checks either way were equivalent.
The strlcpy and related functions are only as good as the buffer-size which is passed to them. In real programs, the buffer-size may be passed through several levels of function calls. Or it may simply be assumed, via a symbol definition. Those cases would require a static analyzer to verify the calls.
On the other hand, there are cases that a simple checker should be able to handle. Consider this test program:
#include <stdio.h>
#include <string.h>
#include <bsd/string.h>

static int
copyit(char *argument)
{
    char buffer[20];

    /* deliberate error: the size passed to strlcpy exceeds the buffer */
    strlcpy(buffer, argument, sizeof(buffer) + 40);
    return (int) strlen(buffer);
}

int
main(int argc, char *argv[])
{
    int n;

    for (n = 0; n < argc; ++n) {
        copyit(argv[n]);
    }
    return 0;
}
Running the program and passing an argument longer than 20 characters will make it overflow the buffer. I checked for warnings about this program using these tools in October 2016.
Coverity of course is remote; the other two were on my Debian/testing system.
Oddly enough, none of them reported a problem with the test program. Just to check, I ran it with valgrind, and got the expected result: a core dump and a nice log:

==26109== Jump to the invalid address stated on the next line
==26109==    at 0x787878: ???
==26109==    by 0x3FFFF: ???
==26109==    by 0xFFF0005E7: ???
==26109==    by 0x20592F147: ???
==26109==    by 0x40061F: ??? (in /tmp/foo)
==26109==  Address 0x787878 is not stack'd, malloc'd or (recently) free'd
==26109==
==26109==
==26109== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==26109==  Bad permissions for mapped region at address 0x787878
==26109==    at 0x787878: ???
==26109==    by 0x3FFFF: ???
==26109==    by 0xFFF0005E7: ???
==26109==    by 0x20592F147: ???
==26109==    by 0x40061F: ??? (in /tmp/foo)
That is, the behavior of these functions is reasonably well-known, but none of the tool developers thought it important to verify the parameters for them.