.TH TCS 1
.SH NAME
tcs \- translate character sets
.SH SYNOPSIS
.B tcs
[
.B -slcv
]
[
.B -f
.I ics
]
[
.B -t
.I ocs
]
[
.I file ...
]
.SH DESCRIPTION
.I Tcs
interprets the named
.I file(s)
(standard input by default) as a stream of characters from the
.I ics
character set or format, converts them to runes,
and then converts them into a stream of characters from the
.I ocs
character set or format on the standard output.
The default value for
.I ics
and
.I ocs
is
.BR utf ,
the
.SM UTF
encoding described in
.IR utf (6).
The
.B -l
option lists the character sets known to
.IR tcs .
Processing continues in the face of conversion errors (the
.B -s
option prevents reporting of these errors).
The
.B -c
option forces the output to contain only correctly converted characters;
otherwise,
.B Runeerror
(0xFFFD)
characters will be substituted for
.SM UTF
encoding errors and unknown characters.
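.PP
The two-stage conversion described above can be sketched in Python.
This is an illustration only, not the
.I tcs
implementation: the function
.B translate
and its parameters are hypothetical, and Python codec names stand in for
.I tcs
character set names.
.EX
```python
# Sketch of the two-stage conversion: bytes -> code points ("runes") -> bytes.
# Python codec names stand in for tcs character set names (an assumption).

RUNEERROR = chr(0xFFFD)  # the replacement character named in the text


def translate(data, ics="utf-8", ocs="utf-8", only_correct=False):
    # Stage 1: interpret the input bytes in the source character set.
    # errors="replace" substitutes U+FFFD for bytes that cannot be decoded.
    runes = data.decode(ics, errors="replace")
    if only_correct:
        # Approximates -c: keep only correctly converted characters.
        # (This also drops any U+FFFD that was genuinely in the input.)
        runes = runes.replace(RUNEERROR, "")
    # Stage 2: re-encode the code points in the target character set.
    # Strict encoding: raises if a code point has no form in ocs.
    return runes.encode(ocs)


latin1 = bytes([0x63, 0x61, 0x66, 0xE9])  # "cafe" with e-acute, ISO 8859-1
print(translate(latin1, ics="latin-1").hex())  # 636166c3a9

bad = bytes([0x61, 0xFF, 0x62])  # 0xFF is never valid in UTF-8
print(translate(bad).hex())  # 61efbfbd62
print(translate(bad, only_correct=True).hex())  # 6162
```
.EE
With
.B only_correct
set, the sketch drops the replacement character rather than emitting it,
which approximates the effect described for
.BR -c .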
.PP
The
.B -v
option generates various diagnostic and summary information on standard error,
or makes the
.B -l
output more verbose.
.PP
.I Tcs
recognizes an ever-changing list of character sets.
In particular, it supports a variety of Russian and Japanese encodings.
Some of the supported encodings are
.TF jis-kanji
.TP
.B utf
The Plan 9
.SM UTF
encoding, known by ISO as UTF-8
.TP
.B utf1
The deprecated original
.SM UTF
encoding from ISO 10646
.TP
.B ascii
7-bit ASCII
.TP
.B 8859-1
Latin-1 (Western European)
.TP
.B 8859-2
Latin-2 (Czech .. Slovak)
.TP
.B 8859-3
Latin-3 (Dutch .. Turkish)
.TP
.B 8859-4
Latin-4 (Scandinavian)
.TP
.B 8859-5
Part 5 (Cyrillic)
.TP
.B 8859-6
Part 6 (Arabic)
.TP
.B 8859-7
Part 7 (Greek)
.TP
.B 8859-8
Part 8 (Hebrew)
.TP
.B 8859-9
Latin-5 (Finnish .. Portuguese)
.TP
.B html
Unicode as encoded by HTML
.TP
.B koi8
KOI-8 (GOST 19769-74)
.TP
.B jis-kanji
ISO 2022-JP
.TP
.B ujis
EUC-JX: JIS 0208
.TP
.B ms-kanji
Microsoft, or Shift-JIS
.TP
.B jis
(valid only with
.BR -f )
guesses among ISO 2022-JP, EUC, and Shift-JIS
.TP
.B gb
Chinese national standard (GB2312-80)
.TP
.B big5
Big 5 (HKU version)
.TP
.B unicode
Unicode Standard 1.0
.TP
.B tis
Thai character set plus
.SM ASCII
(TIS 620-1986)
.TP
.B msdos
IBM PC: CP 437
.TP
.B atari
Atari-ST character set
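.PP
One way to picture the guessing done by the
.B jis
entry is to try each candidate encoding strictly and keep the first that
decodes the whole input.
The Python sketch below shows that idea under stated assumptions: the function
.B guess_japanese
is hypothetical, the codec names are Python's, and the heuristic is not
necessarily the one
.I tcs
uses.
.EX
```python
# Hypothetical guessing heuristic: try each candidate codec strictly, in
# order, and keep the first that decodes the whole input without error.
# This is not necessarily how tcs decides.

CANDIDATES = ("iso2022_jp", "euc_jp", "shift_jis")  # Python codec names


def guess_japanese(data):
    for codec in CANDIDATES:
        try:
            return codec, data.decode(codec)  # strict decoding
        except UnicodeDecodeError:
            continue
    # Nothing matched: report no codec and fall back to lossy ASCII.
    return None, data.decode("ascii", errors="replace")


# Three kanji (U+65E5 U+672C U+8A9E), built without literal non-ASCII text.
sample = chr(0x65E5) + chr(0x672C) + chr(0x8A9E)
for codec in CANDIDATES:
    guessed, text = guess_japanese(sample.encode(codec))
    print(codec, guessed, text == sample)
```
.EE
Order matters in a scheme like this: ISO 2022-JP is purely 7-bit, so it is
tried first, and short or ambiguous inputs may still be misidentified.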
.SH EXAMPLES
.TP
.B tcs -f 8859-1
Convert 8859-1 (Latin-1) characters into
.SM UTF
format.
.TP
.B tcs -s -f jis
Convert characters encoded in one of several JIS encodings (ISO 2022-JP, EUC, or Shift-JIS) into
.SM UTF
format.
Unknown Kanji will be converted into
.B 0xFFFD
characters.
.TP
.B tcs -t html
Convert UTF into character set-independent HTML.
.TP
.B tcs -lv
Print an up-to-date list of the supported character sets.
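.PP
The idea behind the
.B html
conversion, representing characters outside
.SM ASCII
in a form that does not depend on the character set, can be sketched in Python
with numeric character references.
The function
.B to_html
is hypothetical, and the exact output of
.I tcs
may differ, for instance in which kind of reference it emits.
.EX
```python
# Sketch: write non-ASCII code points as numeric character references.
# The name to_html is hypothetical; tcs output may differ in form.


def to_html(text):
    # xmlcharrefreplace turns each non-ASCII character into &#N; form.
    # Markup characters such as & and < are left untouched here.
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


print(to_html("caf" + chr(0xE9)))  # caf&#233;
```
.EE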
.SH SOURCE
.B /sys/src/cmd/tcs
.SH SEE ALSO
.IR ascii (1),
.IR rune (2),
.IR utf (6).