(This document will hopefully in the future include X-Windows & printers)
The displays on most terminals and monitors are produced by raster scanning, in which an electron beam travels horizontally across the screen, "painting" a line of hundreds of "dots" or pixels. The beam then moves down slightly (about one pixel width) and paints the next line, and so on. Bit-mapped font software is intended for this type of display. The pixels of a font character are mapped to the region of the screen where the character is to appear. A single scan line across the whole screen may pass through 80 or so characters, so to know which pixels (dots) in that scan line to turn on, the bit-maps (pixel layouts) of many characters must be inspected. This may be done by the video card in a PC or by the equivalent electronics in a "dumb" terminal. For a graphics display, which shows pictures as well as text, this is often done by the main part of the computer (the CPU chip), with the CPU creating the screen image and the video card storing it in its memory.
Another, rather rare, electronic scanning method is "vector graphics". Here the electron beam is "smart" and traces out patterns on the screen much as you would with a pen, moving not just horizontally but in any direction. As a result, a diagonal line has no jags (as it does in raster graphics). While vector graphics is in this respect more advanced than raster graphics, it is difficult to use on color terminals/monitors, since the color dots on the inside of a color picture tube form an inherent raster. Vector graphics uses the concept of lines rather than pixels, so bit-mapped font software is of no use for it. However, some raster graphics terminals can emulate a vector graphics terminal by displaying vector graphics code on a raster screen. The result is not a true vector display, and slanted lines are likely to appear jagged.
A bit map is simply a matrix whose elements are either 0 or 1. Each element represents a pixel (a location, or dot, on a sheet of paper or on a screen). A pixel may be "on" or "off": 1 means the pixel is "on" and 0 means it is "off". If the pixel is "on", the dot is normally "black"; if it's off, the dot is not printed. Substitute for "black" the color of ink you are using or the foreground color selected on a video monitor or terminal. An "off" pixel will be the background color, which is the color of the paper if you are printing. A number that is either 0 or 1 can be represented by a single bit, hence the name "bit-map".
For graphics, a pixel may have other attributes (such as color), which require many bits. But for most fonts a pixel requires only one bit. We could, however, represent each bit by the ASCII character "0" or "1", where each character uses up 8 bits in the computer. Using 8 bits to represent one pixel takes 8 times the memory actually necessary. Memory is becoming so cheap that such overuse is, in some cases (if the character doesn't use too many pixels), of little significance. The coding rules used here for Wyse and VT220 terminals came from the Wyse 99GT Programmer's Guide (c. 1989), chapters 7 and 8.
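To make the 8-to-1 memory cost concrete, here is a minimal sketch that stores the same small bit map two ways: as ASCII "0"/"1" characters and as packed bits. It assumes rows exactly 8 pixels wide and treats the leftmost pixel as the high-order bit; the function names are hypothetical.

```python
# A tiny 3-row bit map, 8 pixels wide: 1 = pixel on, 0 = pixel off.
bitmap = [
    [0, 0, 0, 1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [0, 1, 1, 1, 1, 1, 1, 0],
]

def as_ascii_text(bm):
    """One ASCII character ('0' or '1') per pixel: 8 bits of storage per pixel."""
    return "".join("".join(str(p) for p in row) for row in bm).encode("ascii")

def as_packed_bits(bm):
    """One bit per pixel, 8 pixels per byte (assumes rows exactly 8 wide,
    leftmost pixel = high-order bit)."""
    out = bytearray()
    for row in bm:
        value = 0
        for pixel in row:
            value = (value << 1) | pixel
        out.append(value)
    return bytes(out)

text = as_ascii_text(bitmap)
packed = as_packed_bits(bitmap)
print(len(text), len(packed))  # 24 3
```

The ratio is exactly 8 because each pixel costs one 8-bit character in the text form and one bit in the packed form.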
In addition to the pixel map (bitmap), other information may accompany each character, such as its width (in the case of proportionally spaced characters). Such information may make up a character header (called a "descriptor" for HP printers). In rare cases no such header is needed, but even then some kind of separator mark may be inserted between the bitmaps of successive characters to keep them apart.
In addition to a possible header for each character bitmap, there is usually a font header placed at the start of the soft-font. It may tell the terminal or printer such things as:
1. A name for the font.
2. How many font characters are about to be sent.
3. The ASCII character number of the first character to be sent.
4. Which bank to store the font in.
5. What to do with the font that was formerly stored there.
In most cases there are both headers for each character and a single header for the entire soft-font, but sometimes one of these is omitted. If enough information is provided in each character's header, there is no need for a header for the entire font. For terminal fonts, VT220 has a font header but no character headers, while Wyse has character headers but no font header. X-Windows has both. Nomenclature varies: in one case the first couple of bytes of the header are themselves called a header, while the full header may be called something else.
These headers, together with the encoded bit-maps of the characters, constitute the soft-font. Soft-font is downloaded to the terminal or printer over the same wire on which character codes are sent for displaying or printing characters. The soft-font (or each segment of it) must therefore be "escaped" or otherwise marked so that it will not be mistaken for characters to be printed or displayed. This is usually done by an escape sequence, which starts with the escape (ESC) character (hex 1B). The code just after the ESC may tell the device that what follows is soft-font code, or may even give the number of bytes of soft-font to follow. Unless the number of bytes is specified in advance, some kind of "end" character or escape sequence must be sent to mark the end of the soft-font.
How does one represent a bit-map of a character in soft-font? One must know this in order to understand (or write) a program that creates soft-font. Unfortunately for the font programmer, there are many different ways to represent a bit-map. One way is to draw it within a rectangle (cell) on a page, with a character such as * (or 1) representing "on" pixels and a space (or 0) representing "off" pixels. Since the * character is stored as an ASCII byte in computer memory, one could simply put this character into the soft-font to represent an "on" pixel, and likewise a space for an "off" pixel. This makes poor use of memory (and disk storage), since each ASCII character uses 8 bits.
The most efficient way to represent a simple pixel is just by a bit. Most printer fonts do just this but fonts for ANSI/ASCII terminals don't do it this way. Just how they do it will be explained shortly.
It's sometimes desirable to edit a soft-font file with an ordinary editor, in order to change (or even create) the font header, or for other purposes such as checking the format. Unfortunately, if the soft-font is represented in the most efficient and simple way (each bit in the map is represented by a bit in the computer), then the soft-font file is a binary file that may contain every possible byte value. Most editors and word processors can't handle this very well (if at all). Thus there is a tradeoff between storage efficiency and editability. How can one make a soft-font file easy to edit?
The pixels of a bitmap may be grouped into bytes (usually of 8 pixels, or bits, each). One way to represent each such byte (an integer between 0 and 255) is by a 3-digit ASCII decimal number (such as 179). This takes 3 bytes to represent one byte. A better way is to represent the byte by two ASCII hexadecimal digits (such as B3 for 179). This works since a byte ranges from 00 to FF hex. This method is used both for Wyse terminal fonts and for X-Windows fonts. Although it's simple, it uses only 16 of the printable characters: 0, 1, 2, ..., 9, A, B, ..., F.
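The two text encodings just described can be sketched in a few lines. This is an illustration only, with hypothetical function names; it writes hex digits in upper case, as in the examples in this document.

```python
def encode_decimal(data: bytes) -> str:
    """Three ASCII decimal digits per byte, e.g. 179 -> '179' (3 bytes per byte)."""
    return "".join(f"{b:03d}" for b in data)

def encode_hex(data: bytes) -> str:
    """Two ASCII hexadecimal digits per byte, e.g. 179 -> 'B3' (2 bytes per byte)."""
    return "".join(f"{b:02X}" for b in data)

def decode_hex(text: str) -> bytes:
    """Inverse of encode_hex; bytes.fromhex accepts upper- or lower-case digits."""
    return bytes.fromhex(text)

data = bytes([0, 179, 255])
print(encode_decimal(data), encode_hex(data))  # 000179255 00B3FF
```

Either form survives an ordinary text editor, which is the point; hex simply costs one byte less per encoded byte.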
DEC's VT220 terminals came out (in the mid 1980s) with a more efficient (but more complex) sixel method, which was widely emulated (especially by Wyse). It simply uses 6-bit bytes (called "sixels") instead of 8-bit bytes. (The word "sixel" is short for "six pixels".) Since there are only 64 six-bit numbers, they can readily be mapped to printable characters. Rather than devise a new code mapping the numbers 0-63 to printable characters, the existing ASCII code is used. Since the first 33 ASCII characters (hex 00 to 20, the control characters and the space) don't print, one could simply add 33 to each 6-bit number to make it printable. The results are then 7-bit numbers (which occupy 8 bits in memory). However, adding 63 (hex 3F) also works, and this is exactly what the VT sixel encoding does. 3F is the largest number one may add and still get only printable ASCII characters, since 3F + 3F = 7E, while 3F + 40 = 7F is DEL. Thus, roughly speaking, the encoding uses the upper half (hex 40 to 7F) of the 7-bit ASCII range, except that since DEL (7F) doesn't print, it actually runs from hex 3F to 7E (one less). So to convert a sixel to printable ASCII, add hex 40 and subtract one or, what is the same thing, add 3F.
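A minimal sketch of the sixel-to-character mapping described above: add 3F to go from a 6-bit value to a printable character, subtract 3F to go back. The function names are hypothetical.

```python
SIXEL_OFFSET = 0x3F  # 63: the largest offset that keeps 0..63 within printable ASCII

def sixel_to_char(value: int) -> str:
    """Map a 6-bit value (0..63) to a printable ASCII character ('?' .. '~')."""
    if not 0 <= value <= 63:
        raise ValueError("a sixel holds only 6 bits (0..63)")
    return chr(value + SIXEL_OFFSET)

def char_to_sixel(ch: str) -> int:
    """Inverse mapping: printable character back to its 6-bit value."""
    value = ord(ch) - SIXEL_OFFSET
    if not 0 <= value <= 63:
        raise ValueError(f"{ch!r} is not a sixel character")
    return value

print("".join(sixel_to_char(v) for v in (0x38, 0x02, 0x04)))  # wAC
```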
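We have mentioned three significant ways of encoding bytes:
1. Binary: each byte of the bit-map is stored as is (one bit per pixel), which is compact but hard to edit.
2. Hexadecimal: each byte is written as two ASCII hex digits (used by Wyse and X-Windows fonts).
3. Sixel: pixels are grouped into 6-bit bytes, and 3F is added to each to make it a printable ASCII character (used by VT220).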
Just knowing these three basic methods of encoding pixels as bytes doesn't give one much of a clue about how to create soft-font, since one must also know how to scan a character matrix of pixels. A character on most terminals, monitors, or printers is just a bunch of pixels (or bits) in a character matrix. How do we partition this matrix into bytes? Which byte is to be sent (downloaded) first to a device (a printer or terminal)? Which byte is to be sent next, and so on? The scanning method answers these questions.
There are many ways to scan a character matrix: by rows, by columns, or by some combination of rows and columns. Scanning by rows starts with one row (say the top row), partitions this row into bytes, and reads the resulting bytes; it then repeats this for the next row, and so on. However (for 8-bit bytes), the number of pixels in a row may not be an exact multiple of 8, so it is often necessary to zero-fill the last partial byte to make it full length. Should we zero-fill the low-order bits or the high-order ones? In scanning rows, should we scan from right to left or from left to right? Should the first bit (pixel) scanned be considered high-order or low-order?
A method of scanning that is neither strictly by rows nor by columns works, for example, as follows. We start scanning the first row but read only its first byte. Then we scan the next row, again taking only the first byte, and do the same for the third row, and so on. After doing this for all rows, we go back to the first row and read its second byte, continuing until the second bytes of every row have been read. Then we repeat for the third byte, etc., until all the bytes in the character matrix have been read.
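The questions above about bit order and zero-fill can be made concrete with a small row partitioner. This is a minimal sketch, not any particular device's rule: the parameter `first_pixel_high` (a hypothetical name) chooses whether the first scanned pixel is the high-order or low-order bit, and the zero-fill always lands in the positions after the last real pixel.

```python
def row_to_bytes(row, first_pixel_high=True, bits_per_byte=8):
    """Partition one row of pixels (0/1, scanned left to right) into bytes.

    first_pixel_high=True: the first pixel of each group is the high-order bit,
    so the zero-fill of a partial last byte ends up in the low-order bits.
    first_pixel_high=False: the first pixel is the low-order bit, so the
    zero-fill ends up in the high-order bits.
    """
    result = []
    for start in range(0, len(row), bits_per_byte):
        group = list(row[start:start + bits_per_byte])
        group += [0] * (bits_per_byte - len(group))  # zero-fill the partial byte
        value = 0
        for i, pixel in enumerate(group):
            if first_pixel_high:
                value |= pixel << (bits_per_byte - 1 - i)
            else:
                value |= pixel << i
        result.append(value)
    return result

row = [1, 0, 1, 1, 0, 0, 1, 1, 1, 1]  # 10 pixels: not a multiple of 8
print([hex(b) for b in row_to_bytes(row)])  # ['0xb3', '0xc0']
```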
This is something like drawing a character matrix on square-ruled graph paper and scanning with a toy car one byte (say 8 pixels or squares) wide. The car starts from the top of the page and rolls down the left hand "strip" of the paper, running over 8 pixels at a time. Thus while the scanning is by rows reading in one bit at a time, it may also be viewed as scanning down "wide columns" reading in 8 pixels (a byte) at a time.
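The "toy car" scan can be sketched as follows: the matrix is read in byte-wide vertical strips, leftmost strip first, and within each strip row by row from the top. The bit order inside each byte (leftmost pixel as high-order bit, low-order zero-fill at the right edge) is an assumption for illustration, since the text leaves it open.

```python
def scan_by_strips(matrix, strip_width=8):
    """Read a character matrix (rows of 0/1) one byte-wide strip at a time.

    Within each strip, rows are read top to bottom; each row segment becomes
    one byte with its leftmost pixel as the high-order bit (an assumption).
    Narrow segments at the right edge are zero-filled in the low-order bits.
    """
    height = len(matrix)
    width = len(matrix[0]) if height else 0
    out = []
    for left in range(0, width, strip_width):   # leftmost strip first
        for r in range(height):                  # roll down the strip
            segment = matrix[r][left:left + strip_width]
            value = 0
            for i, pixel in enumerate(segment):
                value |= pixel << (strip_width - 1 - i)
            out.append(value)
    return out

matrix = [
    [1] * 8 + [1, 0],
    [0] * 8 + [0, 1],
]
print([hex(b) for b in scan_by_strips(matrix)])  # ['0xff', '0x0', '0x80', '0x40']
```

Scanning the same matrix strictly by rows would instead give the order FF, 80, 00, 40: the bytes are identical, only their sequence differs.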
What voltages are used? For a conventional (RS-232) serial port, a 1 bit is about -8 volts and a 0 bit is about +8 volts, though the exact voltages may vary. A received voltage between about +2 and +25 volts may be deemed a 0 bit. A modem converts these digital pulses into an analog phase-amplitude modulated signal. The exact details, including the possible addition of start, stop, and parity bits by the serial port and the possible stripping of these bits by the modem, are beyond the scope of this document.
Another question is how to send a byte to a device. It's often done over a serial line, which means a single wire over which one bit is sent at a time. Which bit of a byte is sent first? The ANSI standard for ASCII characters is that the low-order (least significant) bit is sent first. The serial port hardware should do this automatically when converting from the computer's parallel bus to serial. Since the serial port doesn't know the difference between an ASCII character and any other kind of byte (such as part of a binary code), it sends the low-order bit first for all bytes. Internal modems normally do the same.
While the low-order bit of each byte is always sent first, there is still the question of how to send an integer consisting of 2 or 4 bytes. Intel-based machines store (and send) the low-order byte first. This is called little-endian order, meaning "little end first". Motorola, SPARC, and PowerPC based machines typically do the opposite and are big-endian. This has no effect on a bit-map sent as a sequence of bytes (and not as integers). However, if integers are part of the headers for a font (fonts for dumb terminals don't use them), then they need to be compatible with the machines (including printers) they are used on.
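A minimal sketch of the least-significant-bit-first rule: it lists the 8 data bits of one byte in the order a standard serial port puts them on the wire. Start, stop, and parity bits are deliberately left out, and the function name is hypothetical.

```python
def serial_bit_order(byte: int) -> list:
    """Data bits of one byte in transmission order: least significant bit first.
    Start, stop and parity bits are not included."""
    return [(byte >> i) & 1 for i in range(8)]

print(serial_bit_order(ord("A")))  # [1, 0, 0, 0, 0, 0, 1, 0]
```

Because the rule applies to every byte, the same order holds whether the byte is an ASCII letter or a piece of a bit-map.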
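The byte-order issue for multi-byte header integers can be shown with Python's standard `struct` module, where `<` requests little-endian and `>` big-endian packing. The value 0x1234 stands in for a hypothetical 2-byte header field.

```python
import struct

value = 0x1234  # a 2-byte integer, e.g. a hypothetical width field in a font header

little = struct.pack("<H", value)  # little-endian (Intel): low-order byte first
big = struct.pack(">H", value)     # big-endian (Motorola, SPARC): high-order byte first

print(little.hex(), big.hex())     # 3412 1234

# Reading bytes back with the wrong byte order yields a different number.
assert struct.unpack("<H", big)[0] == 0x3412
```

A plain sequence of bit-map bytes needs no such conversion; only fields that are interpreted as integers do.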
The VT220 terminal font is generated by the type of scanning described previously, which is neither by rows nor by columns. It may be thought of as scanning by a toy car (six pixels wide) that runs from left to right (just as one reads a page). It reads the top strip (a wide "row" 6 pixels high) by moving from left to right across the character matrix. Then it moves back to the left edge of the matrix, jumps down 6 pixels, and scans from left to right again, like a human reading 6 lines in one sweep. If the character matrix is 18 pixels high, it scans the entire matrix in 3 sweeps. In each sixel (a column 6 pixels tall within a strip), the top pixel is the low-order bit. If the bottom (last) strip is less than 6 pixels high, each byte in this strip is zero-filled with high-order zeros to make full-sized 6-bit bytes (sixels). These sixels are then converted to printable 7-bit bytes (stored as 8-bit bytes) using the scheme described previously. Since the encoding uses only 64 of the printable ASCII characters, many other characters are left over to punctuate the results.
Here is an example of soft-font code for a Russian character 12 pixels high by 7 pixels wide: wACCcQw/NCA@??N; The first 7 characters, wACCcQw, represent the seven 6-bit bytes from the scan of the top strip of the character. The first byte is "w" (hex 77), which after subtracting 3F gives a sixel of hex 38 = 111000 (binary). This vertical sixel has 3 on-pixels in its lower 3 positions (the low-order bits, 000, are at the top). The punctuation marks are "/", which separates the scans of the wide rows, and ";", which marks the end of a character. Before sending a stream of such soft-font code to a VT220 terminal, one must send a complicated header of many bytes. This is described in term_notes.
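The VT220 scan can be sketched as an encoder for one character. This is an illustration under stated assumptions, not DEC's code: the bitmap is given as equal-length strings with "*" for on pixels, and the font header that must precede the characters is not produced. The function name is hypothetical.

```python
SIXEL_OFFSET = 0x3F

def vt220_encode_char(rows):
    """Encode one character bitmap (equal-length strings, '*' = on pixel)
    as VT220-style sixel soft-font code: strips separated by '/', ended by ';'.

    Each strip is 6 pixel rows high and is read column by column, left to right.
    Within a column the top pixel is the low-order bit; a short last strip is
    padded with high-order zero bits.
    """
    height, width = len(rows), len(rows[0])
    strips = []
    for top in range(0, height, 6):          # one sweep per 6-pixel-high strip
        chars = []
        for col in range(width):             # left to right
            value = 0
            for bit in range(6):              # top pixel = bit 0
                r = top + bit
                if r < height and rows[r][col] == "*":
                    value |= 1 << bit
            chars.append(chr(value + SIXEL_OFFSET))
        strips.append("".join(chars))
    return "/".join(strips) + ";"

letter_a = [".......", "...*...", "..*.*..", ".*...*.", "*.....*", "*******",
            "*.....*", "*.....*", "*.....*", "*.....*", ".......", "......."]
print(vt220_encode_char(letter_a))  # ogcacgo/N?????N;
```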
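Decoding the example by hand is tedious, so here is a minimal decoder sketch (hypothetical name) that reverses the steps just described: split on "/", drop the final ";", subtract 3F, and place bit 0 of each sixel at the top of its strip. The cell height must be supplied, since the code itself does not carry it.

```python
SIXEL_OFFSET = 0x3F

def vt220_decode_char(code, height):
    """Turn VT220 sixel code for one character back into rows of
    '*' (on) and '.' (off). 'height' is the cell height in pixels."""
    strips = code.rstrip(";").split("/")
    width = max(len(s) for s in strips)
    rows = [["."] * width for _ in range(height)]
    for strip_no, strip in enumerate(strips):
        for col, ch in enumerate(strip):
            value = ord(ch) - SIXEL_OFFSET
            for bit in range(6):              # bit 0 is the top pixel of the strip
                r = strip_no * 6 + bit
                if r < height and value & (1 << bit):
                    rows[r][col] = "*"
    return ["".join(r) for r in rows]

for line in vt220_decode_char("wACCcQw/NCA@??N;", height=12):
    print(line)
```

Printed this way, the on-pixels of the example appear to form the Cyrillic letter Й, with the breve in the top strip.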
The Wyse encoding simply scans one row at a time, starting with the first row and going from left to right, just like reading a page. The bit at the left margin is the high-order one. If the width is less than 8 pixels, the zero-fill is done on the low-order bits. Here is a sample soft-font encoding for a character: ESCcA134003E42424242828282FE820000^Y
ESC is the escape control character and ^Y is control-Y. The encoding for the character starts with the 00 just after A134 and ends with the final 00 before the ^Y. Each 00 means a zero byte for that row (all pixels off). In contrast to the VT220 encoding, each character has its own header. Here the header is ESCcA134, which means the character is ASCII 34 (hex), to be put into bank 1. ESCc says that a character encoding is to follow. The ^Y is needed only on early model Wyse terminals but seems to do no harm in other cases.
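The Wyse row scan can be sketched as follows, under stated assumptions: cells at most 8 pixels wide, rows given as strings with "*" for on pixels, and a header that simply copies the literal "ESC c A" prefix seen in the example above (the role of the "A" is not explained in the text, and the bank is assumed to be a single digit). Both function names are hypothetical.

```python
def wyse_encode_rows(rows):
    """One byte per row, leftmost pixel = high-order bit, rows narrower than
    8 pixels zero-filled in the low-order bits, each byte as two hex digits."""
    out = []
    for row in rows:
        if len(row) > 8:
            raise ValueError("this sketch handles cells at most 8 pixels wide")
        value = 0
        for i, pixel in enumerate(row):
            if pixel == "*":
                value |= 1 << (7 - i)
        out.append(f"{value:02X}")
    return "".join(out)

def wyse_soft_char(rows, bank, char_code):
    """Wrap the row data as in the example: ESC c A, bank digit,
    character code as two hex digits, the data, then ^Y (hex 19)."""
    return f"\x1bcA{bank}{char_code:02X}" + wyse_encode_rows(rows) + "\x19"

sample = [".......", "..*****", ".*....*", ".*....*", ".*....*", ".*....*",
          "*.....*", "*.....*", "*.....*", "*******", "*.....*", ".......", "......."]
print(wyse_encode_rows(sample))  # 003E42424242828282FE820000
```

The sample bitmap is simply the example's hex data drawn out row by row, so encoding it reproduces that data.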
The BDF format (for X-Windows) is allegedly the same as above (Wyse) but the headers are different.
This is also called the cell size. There are usually two sizes for the same character and it can be confusing if you don't understand it. The larger size is the advertised size which includes rows and columns that are almost always left blank to provide for spacing between characters. Since one doesn't normally (if ever) use them, they are not included in the cells that are encoded nor in the cells that are used in a "pattern file" for my BitFontEdit software. Sometimes one must include a row or column in the pattern file (and soft-font code) which is required to contain no on-pixels.
Using the BitFontEdit program one creates a "pattern file" using any editor or word processor. Here is an example of what the encoding of an "A" would look like in such a pattern file. If you have read the rest of this document, it will be easy to follow this. In BitFontEdit, the pattern of *'s and spaces is stored in an array of character matrices called a band[][][].
Note: 7 is the width in pixels, 12 is the height in pixels. It might be displayed within a larger 10x13 cell on a terminal. But in this program the 7x12 region is also called a "cell". The pattern of *'s inside this cell does not usually go all the way to the top or bottom of the cell but it is still called a 7x12 pattern (or cell). Note that in matrix algebra notation it is a 12x7 matrix.
A character pattern is shown below (in two different formats). Such a pattern is also known as a character matrix or a dot matrix. If you think of the *'s as 1's and the background as 0's, it is also a bitmap. In BitFontEdit, one format uses dots for the background; the other uses spaces for the background, with vertical bars separating characters. Several such matrices in a row (on a "page") form a "band" of character matrices. BitFontEdit automatically determines which format you have used.
In BitFontEdit, the fill_band() function scans a band of several such characters and puts the pixels (including background pixels but excluding separators such as |) into the band[][][] array of characters: char band[Char_no][row][col]. Below, (11,6) means row 11, column 6, etc. The index origin is 0, but index origin 1 is used in BitFontEdit error messages.
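Since BitFontEdit's own fill_band() is not shown here, the following is only a hypothetical Python stand-in that fills a nested list indexed as band[char_no][row][col]. Assumptions: in the bar format, each run of text between two "|" characters is one character's row; in the dot format, characters on a line are separated by whitespace; and both kinds of background are normalized to ".".

```python
def fill_band(lines):
    """Hypothetical sketch: parse a band of character patterns into
    band[char_no][row][col], holding '*' (on) or '.' (off).

    Bar format (detected by any '|'): text between '|' separators is one
    character's row, spaces are background. Dot format: '.' is background and
    characters are assumed to be separated by whitespace.
    """
    bar_format = any("|" in line for line in lines)
    band = []
    for line in lines:
        if bar_format:
            cells = line.split("|")[1:-1]  # drop text before the first and after the last '|'
        else:
            cells = line.split()
        for char_no, cell in enumerate(cells):
            if char_no == len(band):
                band.append([])
            band[char_no].append(["*" if c == "*" else "." for c in cell])
    return band

bar = ["| * |** |",
       "|* *| **|"]
dots = [".*. **.",
        "*.* .**"]
print(fill_band(bar) == fill_band(dots))  # True
```

Indices here start at 0; a real tool reporting errors with origin 1 would add 1 when printing positions.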
For VT220 terminals:
The character pattern shown below is encoded as two sequences of 7 ASCII bytes, with a slash / separating the two sequences. The six-bit bytes (sixels) are read from "half-columns" with low order pixels at the top. Note: Add 3F to each byte before outputting to the soft-font file. For taller cells, a column may be split up into 3 sixels. The bottom sixels have their high-order pixels padded with 0's (if needed).
Bar format (spaces for background):
|       |
|   *   |
|  * *  |
| *   * |
|*     *|
|*******|
|*     *|
|*     *|
|*     *|
|*     *|
|       |
|       |
Dot format (the (row,col) coordinates of the corner pixels are shown beside the first and last rows):
(0,0)  .......  (0,6)
       ...*...
       ..*.*..
       .*...*.
       *.....*
       *******
       *.....*
       *.....*
       *.....*
       *.....*
       .......
(11,0) .......  (11,6)
For VT220:
The 1st byte has pixels (5,0)-(0,0); (0,0) is the low-order bit.
The 2nd byte has pixels (5,1)-(0,1).
The 3rd byte has pixels (5,2)-(0,2).
...
The 7th byte has pixels (5,6)-(0,6).
Add a / for separation.
The 8th byte has pixels (11,0)-(6,0); (6,0) is the low-order bit.
The 9th byte has pixels (11,1)-(6,1).
...
The 14th byte has pixels (11,6)-(6,6).
Add ; to end the character definition.
For Wyse terminals:
The 1st byte has pixels (0,0)-(0,6); (0,0) is the high-order bit.
The 2nd byte has pixels (1,0)-(1,6).
The 3rd byte has pixels (2,0)-(2,6).
...
The 12th byte has pixels (11,0)-(11,6).
Note: Represent each byte as two hex digits (2 ASCII bytes, such as D3) before outputting it to a soft-font file.