.TH TCS 1
.SH NAME
tcs \- translate character sets
.SH SYNOPSIS
.B tcs
[
.B -slcv
]
[
.B -f
.I ics
]
[
.B -t
.I ocs
]
[
.I file ...
]
.SH DESCRIPTION
.I Tcs
interprets the named
.I file(s)
(standard input default) as a stream of characters from the
.I ics
character set or format, converts them to runes,
and then converts them into a stream of characters from the
.I ocs
character set or format on the standard output.
The default value for
.I ics
and
.I ocs
is
.BR utf ,
the
.SM UTF
encoding described in
.IR utf (6).
The
.B -l
option lists the character sets known to
.IR tcs .
Processing continues in the face of conversion errors (the
.B -s
option prevents reporting of these errors).
The
.B -c
option forces the output to contain only correctly converted characters;
otherwise,
.B Runeerror
(0xFFFD)
characters will be substituted for
.SM UTF
encoding errors and unknown characters.
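.PP
The two-stage model described above (input bytes to runes, then runes to output bytes) can be sketched in Python.
This is only an illustration under stated assumptions, not the tcs implementation:
the function translate, its parameters and the constant RUNEERROR are hypothetical names,
and Python codec names stand in for the ics and ocs arguments.
.EX
```python
# Illustrative sketch only; translate() and RUNEERROR are hypothetical names.
RUNEERROR = chr(0xFFFD)

def translate(data, ics="utf-8", ocs="utf-8", clean=False):
    # Stage 1: decode the input encoding into runes (code points).
    # Undecodable bytes become U+FFFD, much like Runeerror.
    runes = data.decode(ics, errors="replace")
    if clean:
        # Rough analogue of -c: drop anything that was not converted correctly.
        # This sketch cannot tell a genuine U+FFFD in the input from a substitute.
        runes = runes.replace(RUNEERROR, "")
    # Stage 2: encode the runes in the output encoding.
    # Without clean, Python substitutes "?" for runes a non-UTF codec cannot encode.
    return runes.encode(ocs, errors="ignore" if clean else "replace")

# The four bytes below spell "cafe" with an e-acute in Latin-1.
latin1 = bytes([0x63, 0x61, 0x66, 0xE9])
print(translate(latin1, ics="latin-1"))
```
.EE
Here the clean argument plays the role that
.B -c
plays for
.IR tcs ;
the sketch has no counterpart to the error reporting that
.B -s
suppresses.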
.PP
The
.B -v
option generates various diagnostic and summary information on standard error,
or makes the
.B -l
output more verbose.
.PP
.I Tcs
recognizes an ever-changing list of character sets.
In particular, it supports a variety of Russian and Japanese encodings.
Some of the supported encodings are
.TF jis-kanji
.TP
.B utf
The Plan 9
.SM UTF
encoding, known by ISO as UTF-8
.TP
.B utf1
The deprecated original
.SM UTF
encoding from ISO 10646
.TP
.B ascii
7-bit ASCII
.TP
.B 8859-1
Latin-1 (Western European)
.TP
.B 8859-2
Latin-2 (Czech .. Slovak)
.TP
.B 8859-3
Latin-3 (Dutch .. Turkish)
.TP
.B 8859-4
Latin-4 (Scandinavian)
.TP
.B 8859-5
Part 5 (Cyrillic)
.TP
.B 8859-6
Part 6 (Arabic)
.TP
.B 8859-7
Part 7 (Greek)
.TP
.B 8859-8
Part 8 (Hebrew)
.TP
.B 8859-9
Latin-5 (Finnish .. Portuguese)
.TP
.B html
Unicode as encoded by HTML
.TP
.B koi8
KOI-8 (GOST 19769-74)
.TP
.B jis-kanji
ISO 2022-JP
.TP
.B ujis
EUC-JX: JIS 0208
.TP
.B ms-kanji
Microsoft, or Shift-JIS
.TP
.B jis
(from only) guesses between ISO 2022-JP, EUC or Shift-JIS
.TP
.B gb
Chinese national standard (GB2312-80)
.TP
.B big5
Big 5 (HKU version)
.TP
.B unicode
Unicode Standard 1.0
.TP
.B tis
Thai character set plus
.SM ASCII
(TIS 620-1986)
.TP
.B msdos
IBM PC: CP 437
.TP
.B atari
Atari-ST character set
.SH EXAMPLES
.TP
.B tcs -f 8859-1
Convert 8859-1 (Latin-1) characters into
.SM UTF
format.
.TP
.B tcs -s -f jis
Convert characters encoded in one of several Japanese encodings
(ISO 2022-JP, EUC or Shift-JIS) into
.SM UTF
format.
Unknown Kanji will be converted into
.B 0xFFFD
characters.
.TP
.B tcs -t html
Convert UTF into character set-independent HTML.
.TP
.B tcs -lv
Print an up-to-date list of the supported character sets.
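.PP
To give a feel for what character set-independent HTML means in the
.B "tcs -t html"
example, the Python sketch below rewrites every non-ASCII character as a numeric character reference.
It is an assumption-laden illustration: the function to_html is a hypothetical name,
and the exact form of the references that
.I tcs
itself emits is not specified on this page and may differ.
.EX
```python
# Illustrative sketch only; to_html() is a hypothetical helper, not part of tcs.
def to_html(utf_bytes):
    # Decode the UTF-8 input into code points; bad sequences become U+FFFD.
    text = utf_bytes.decode("utf-8", errors="replace")
    # Emit pure ASCII, replacing each non-ASCII character with a
    # decimal numeric character reference of the form &#NNN;
    return text.encode("ascii", errors="xmlcharrefreplace")

# The bytes 0xC3 0xA9 are the UTF-8 encoding of e-acute (U+00E9).
print(to_html(bytes([0x63, 0x61, 0x66, 0xC3, 0xA9])))
```
.EE
The result contains only ASCII bytes, so it survives transport through any ASCII-compatible character set.
The sketch does not escape markup characters such as the ampersand or the less-than sign.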
.SH SOURCE
.B /sys/src/cmd/tcs
.SH SEE ALSO
.IR ascii (1),
.IR rune (2),
.IR utf (6).