luit - Locale and ISO 2022 support for Unicode terminals

http://invisible-island.net/luit/
Copyright © 2013-2017,2021 by Thomas E. Dickey

Chroboczek's demo
DEC terminal character sets
ISO-8859-1 and others
- ISO-8859-1
- ISO-8859-2
- ISO-8859-3
- ISO-8859-4
- ISO-8859-5
- ISO-8859-6
- ISO-8859-7
- ISO-8859-8
- ISO-8859-9
- ISO-8859-10
- ISO-8859-11
- ISO-8859-12
- ISO-8859-13
- ISO-8859-14
- ISO-8859-15
- ISO-8859-16
IBM/Microsoft code pages
- CP 437
- CP 850
- CP 852
- CP 866
- CP 1250
- CP 1251
- CP 1252
- CP 1255
Related code pages
- KOI8-E
- KOI8-R
- KOI8-RU
- KOI8-U
ISO-2022 character sets
- TCVN
- EUC-JP and JIS X
Other character sets
- APL2
- BIG5 HKSCS
- GB18030
- GB2312
- GBK
- SJIS

Chroboczek's demo

On his webpage, Chroboczek shows luit being used to run the grep command in a succession of locale settings which do not use UTF-8. Luit translates the input and output of the grep process, so that its messages are encoded in UTF-8. Here is a reconstructed image, showing my UTF-8 locale and the results for each message:

DEC terminal character sets

At the time Chroboczek began writing luit, there were several people interested in adding UTF-8 support to xterm. Most were uninterested in providing support for legacy applications.

I was interested, of course. Not all character encodings have a mapping to UTF-8, and not all applications produce UTF-8.

Related to terminal character-set support, there is the example of DEC's VT100 and VT220 terminals. In the figure below, most of the odd-looking backwards "?" marks correspond to DEC characters for which there was no corresponding Unicode code point.

DEC Character Sets

ISO-8859-1 and others

DEC's terminals are the best known because of the VT100. The mainstream of character encoding is different, however: ISO-8859-x. Here are screenshots for the available encodings.

I used the same script for each, simply writing the byte corresponding to each hexadecimal value so that luit renders the character. Note that 0x80-0x9f are shown as "n/a" (not available) because the ISO encodings reserve these for C1 controls.
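The script itself is not reproduced on this page. As a rough illustration of the idea, here is a minimal Python sketch which builds the same kind of table. It is an assumption-laden stand-in: it uses Python's own codecs (for example "iso8859_15") to do the conversion rather than luit, and the names cell and table are hypothetical.

```python
# Illustrative only: Python's codecs stand in for luit's conversion.
C1_RANGE = range(0x80, 0xA0)  # reserved for C1 controls in ISO-8859-x


def cell(byte_value, encoding):
    """Return the text to display for one byte in the given encoding."""
    if byte_value in C1_RANGE:
        return "n/a"
    try:
        return bytes([byte_value]).decode(encoding)
    except UnicodeDecodeError:
        # The encoding leaves this position unassigned.
        return "n/a"


def table(encoding, first=0x80, last=0xFF):
    """Build rows of (hex label, rendered text) pairs, 16 bytes per row."""
    rows = []
    for start in range(first, last + 1, 16):
        end = min(start + 16, last + 1)
        rows.append([(f"{b:02x}", cell(b, encoding)) for b in range(start, end)])
    return rows


if __name__ == "__main__":
    for row in table("iso8859_15"):
        print("  ".join(f"{label}:{text}" for label, text in row))
```

The C1 range is hard-coded as "n/a" because that is how the ISO-8859-x screenshots treat it; the IBM/Microsoft code pages assign printable characters there, so the check would have to be dropped for those.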

ISO-8859-1

ISO-8859-2

ISO-8859-3

ISO-8859-4

ISO-8859-5

ISO-8859-6

ISO-8859-7

ISO-8859-8

ISO-8859-9

ISO-8859-10

ISO-8859-11

ISO-8859-12

There is no ISO-8859-12.

ISO-8859-13

ISO-8859-14

ISO-8859-15

ISO-8859-16

IBM/Microsoft code pages

CP 437

CP 850

CP 852

CP 866

CP 1250

CP 1251

CP 1252

CP 1255

Related code pages

KOI8-E

KOI8-R

KOI8-RU

KOI8-U

ISO-2022 character sets

TCVN

EUC-JP and JIS X

The ISO-8859-x character sets are a simple example of ISO-2022. There are others, such as the EUC (Extended Unix Code) character sets. The International Register of Coded Character Sets and IANA's Character Sets are good places to start reading.

EUC-JP is made up of multiple parts using the JIS X character sets. The first part (G0) is ASCII (and so there is no need for a screenshot). The other parts are mapped in luit to G1/G2/G3 and use two bytes per code, which does not work with my simple script:

G1 = jisx0208.1983-0
G2 = jisx0201.1976-0
G3 = jisx0212.1990-0
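To make the multi-byte layout concrete, here is a minimal Python sketch which splits an EUC-JP byte string into characters and labels each with the set listed above. It relies on the standard EUC-JP framing (0x8E introduces a G2 byte, 0x8F introduces a G3 pair, and 0xA1-0xFE starts a G1 pair); it does not show how luit handles this internally, and the name split_euc_jp is hypothetical.

```python
# Illustrative only: classifies EUC-JP bytes by lead byte, not luit's code.
SS2 = 0x8E  # single shift 2: the next byte comes from G2 (JIS X 0201)
SS3 = 0x8F  # single shift 3: the next two bytes come from G3 (JIS X 0212)


def split_euc_jp(data):
    """Split EUC-JP bytes into (set name, raw bytes) pairs, one per character."""
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            name, size = "G0", 1          # ASCII
        elif lead == SS2:
            name, size = "G2", 2          # SS2 + one byte
        elif lead == SS3:
            name, size = "G3", 3          # SS3 + two bytes
        elif 0xA1 <= lead <= 0xFE:
            name, size = "G1", 2          # two bytes from JIS X 0208
        else:
            raise ValueError(f"unexpected lead byte 0x{lead:02x} at offset {i}")
        if i + size > len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        out.append((name, data[i:i + size]))
        i += size
    return out


if __name__ == "__main__":
    sample = b"A\xa4\xa2\x8e\xb1"  # ASCII, hiragana A, half-width katakana A
    for name, raw in split_euc_jp(sample):
        print(name, raw.hex(), raw.decode("euc_jp", errors="replace"))
```

A table script that writes one byte per cell never produces the second byte of a pair, which is why the single-byte approach used for the ISO-8859-x screenshots does not carry over.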