http://invisible-island.net/luit/
Copyright © 2013 by Thomas E. Dickey


Chroboczek's demo

On Chroboczek's webpage, he shows luit being used to run the grep command in a succession of locale settings which do not use UTF-8. Luit translates the input/output of the grep process, converting its messages to UTF-8. Here is a reconstructed image, showing my UTF-8 locale and the results for each message:

Example showing multiple encodings
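
The conversion step in that demo can be illustrated with a minimal Python sketch. It is not luit's implementation (luit filters a pseudo-terminal stream); it only shows bytes in a legacy encoding being re-encoded as UTF-8. The helper name `to_utf8` and the sample messages are assumptions made up for this illustration, not taken from the demo.

```python
def to_utf8(raw: bytes, legacy_encoding: str) -> bytes:
    """Decode bytes from a legacy encoding and re-encode them as UTF-8."""
    return raw.decode(legacy_encoding).encode("utf-8")


# Hypothetical messages, each paired with a legacy encoding that can represent it.
SAMPLES = [
    ("iso8859_1", "Fichier non trouvé"),
    ("iso8859_2", "Žádný soubor"),
    ("koi8_r", "Нет такого файла"),
]

for encoding, message in SAMPLES:
    legacy_bytes = message.encode(encoding)       # what a non-UTF-8 program would emit
    utf8_bytes = to_utf8(legacy_bytes, encoding)  # what a UTF-8 terminal needs to receive
    print(f"{encoding}: {legacy_bytes!r} -> {utf8_bytes.decode('utf-8')}")
```

The same text produces different byte sequences in each legacy encoding, which is why the converter has to be told which encoding the application is using.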

DEC terminal character sets

At the time Chroboczek began writing luit there were several people interested in adding UTF-8 support to xterm. Most were uninterested in providing support for legacy applications.

I was interested in that support, of course. Not all character encodings have a mapping to UTF-8, and not all applications produce UTF-8.

Related to terminal character-set support, there is the example of DEC's character sets. In the figure below, most of those odd-looking backwards-"?" marks correspond to DEC characters for which there was no corresponding Unicode code point.

DEC Character Sets
DEC Special Character Set
DEC Technical Character Set

ISO-8859-1 and others

DEC's terminals are the best-known because of the VT100. Mainstream character encoding, however, follows a different family: ISO-8859-x. Here are screenshots for the available encodings.

I used the same script for each, simply writing the byte corresponding to each hexadecimal value so that luit renders the character. Note that 0x80-0x9f are "n/a" (not available) since the ISO encoding reserves these for C1 controls.
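
A script of this kind might look like the following sketch. It is an assumption-laden stand-in, not the script used for the screenshots: it decodes each byte with Python's built-in codecs instead of sending raw bytes through luit, and the function name `render_table` is hypothetical. The C1 range 0x80-0x9f is marked "n/a" explicitly, because Python's ISO-8859 codecs would otherwise return control characters for it.

```python
def render_table(encoding: str) -> list:
    """Return one text row per 16 codes for the range 0x80-0xff."""
    rows = []
    for high in range(0x80, 0x100, 0x10):
        cells = []
        for code in range(high, high + 0x10):
            if 0x80 <= code <= 0x9F:
                # Reserved for C1 controls in the ISO-8859 encodings.
                cells.append("n/a")
                continue
            try:
                cells.append(bytes([code]).decode(encoding))
            except UnicodeDecodeError:
                # The encoding leaves this code unassigned.
                cells.append("n/a")
        rows.append(f"{high:02x}: " + " ".join(cells))
    return rows


for row in render_table("iso8859_1"):
    print(row)
```

Calling `render_table` with another codec name, such as `"iso8859_7"`, gives the corresponding table for that encoding.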

ISO-8859-1

Example of ISO-8859-1 encoding

ISO-8859-2

Example of ISO-8859-2 encoding

ISO-8859-3

Example of ISO-8859-3 encoding

ISO-8859-4

Example of ISO-8859-4 encoding

ISO-8859-5

Example of ISO-8859-5 encoding

ISO-8859-6

Example of ISO-8859-6 encoding

ISO-8859-7

Example of ISO-8859-7 encoding

ISO-8859-8

Example of ISO-8859-8 encoding

ISO-8859-9

Example of ISO-8859-9 encoding

ISO-8859-10

Example of ISO-8859-10 encoding

ISO-8859-11

Example of ISO-8859-11 encoding

ISO-8859-12

There is no ISO-8859-12

ISO-8859-13

Example of ISO-8859-13 encoding

ISO-8859-14

Example of ISO-8859-14 encoding

ISO-8859-15

Example of ISO-8859-15 encoding

ISO-8859-16

Example of ISO-8859-16 encoding

IBM/Microsoft code pages

CP 437

Example of CP 437 encoding

CP 850

Example of CP 850 encoding

CP 852

Example of CP 852 encoding

CP 866

Example of CP 866 encoding

CP 1250

Example of CP 1250 encoding

CP 1251

Example of CP 1251 encoding

CP 1252

Example of CP 1252 encoding

CP 1255

Example of CP 1255 encoding

Related code pages

KOI8-E

Example of KOI8-E encoding

KOI8-R

Example of KOI8-R encoding

KOI8-RU

Example of KOI8-RU encoding

KOI8-U

Example of KOI8-U encoding

ISO-2022 character sets

TCVN

Example of TCVN encoding

EUC-JP and JIS X

The ISO-8859-x character sets are a simple example of ISO-2022. There are others, such as the EUC (Extended Unix Code) character sets. The International Register of Coded Character Sets and IANA's Character Sets are good places to start reading.

EUC-JP is made up of multiple parts using the JIS X character sets. The first part (G0) is ASCII (and so there is no need for a screenshot). The other parts are mapped in luit to G1/G2/G3 and use two bytes per code, which does not work with my simple script:
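
The reason a one-byte-at-a-time script fails here can be seen in a short Python sketch. It uses Python's `euc_jp` codec as a stand-in for luit, and the helper `try_decode` is a hypothetical name introduced for this illustration. A lone byte from the upper half is only the start of a code; a complete pair (or a single-shift prefix plus a byte) is needed before a character comes out.

```python
def try_decode(raw: bytes, encoding: str = "euc_jp"):
    """Return the decoded text, or None if the bytes are not a complete code."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# A single upper-half byte is an incomplete two-byte code.
print(try_decode(b"\xa4"))

# The complete two-byte code from JIS X 0208 decodes to HIRAGANA LETTER A.
print(try_decode(b"\xa4\xa2"))

# The single-shift byte 0x8e followed by one byte selects half-width katakana.
print(try_decode(b"\x8e\xb1"))
```

A script that covers these sets would have to iterate over byte pairs rather than over the 128 single-byte values.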

Other character sets

Luit also supports some non-ISO-2022 encodings. Here are some sample screenshots, again for codes 128-255.

BIG5 HKSCS (unifont)

Example of BIG5 HKSCS encoding (display with unifont)

BIG5 HKSCS

Example of BIG5-HKSCS encoding

GB18030 (unifont)

Example of GB18030 encoding (display with unifont)

GB18030

Example of GB18030 encoding

GB2312

This encoding uses two bytes, and the simple script shows nothing:

Example of GB2312 encoding
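
The empty result can be checked with a small Python sketch, again using Python's codecs as an assumed stand-in for luit; `decodable_single_bytes` is a hypothetical helper. It counts how many of the codes 128-255 decode on their own, which is all of them for a single-byte encoding and none for GB2312.

```python
def decodable_single_bytes(encoding: str, first: int = 0x80, last: int = 0xFF) -> list:
    """List the codes in [first, last] that decode as a complete character by themselves."""
    ok = []
    for code in range(first, last + 1):
        try:
            bytes([code]).decode(encoding)
        except UnicodeDecodeError:
            continue
        ok.append(code)
    return ok


print(len(decodable_single_bytes("iso8859_1")))  # every code decodes on its own
print(len(decodable_single_bytes("gb2312")))     # no code decodes on its own

# A complete two-byte code does decode.
print(b"\xb0\xa1".decode("gb2312"))
```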

GBK (unifont)

Example of GBK encoding (display with unifont)

GBK

Example of GBK encoding

SJIS

Example of SJIS encoding