html2ms, tcs, mothra, uhtml: treat ' as special entity, add uhtml(1)
parent
6d6880cec9
commit
13304b7b96
5 changed files with 95 additions and 29 deletions
46
sys/man/1/uhtml
Normal file
@@ -0,0 +1,46 @@
.TH UHTML 1
.SH NAME
uhtml \- convert foreign character set HTML file to unicode
.SH SYNOPSIS
.B uhtml
[
.B -p
] [
.B -c
.I charset
] [
.I file
]
.SH DESCRIPTION
HTML comes in various character set encodings
and has special forms to encode characters. To
make it easier to process html, uhtml is used
to normalize it to a unicode-only form.
.LP
Uhtml detects the character set of the html input
.I file
and calls
.IR tcs (1)
to convert it to utf, replacing html-entity forms
with their unicode character representations except for
.BR lt ,
.BR gt ,
.BR amp ,
.BR quot
and
.BR apos .
The converted html is written to
standard output. If no
.I file
is given, input is read from standard input. If the
.B -p
option is given, the detected character set is printed and
the program exits without conversion.
If character set detection fails, the default (utf)
is assumed. This default can be changed with the
.B -c
option.
.SH SOURCE
.B /sys/src/cmd/uhtml.c
.SH SEE ALSO
.IR tcs (1)
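
The entity handling described in the man page above can be illustrated with a small Python sketch. It is not taken from uhtml.c: the function name replace_entities, the regular expression, and the choice to leave numeric references to the five special characters untouched are all assumptions made for illustration. Named entities are looked up in Python's html.entities.name2codepoint table, and unknown names are passed through unchanged.

```python
import re
from html.entities import name2codepoint

# Entity names the man page says are left alone.
SPECIAL_NAMES = {"lt", "gt", "amp", "quot", "apos"}
# The characters those names stand for.
SPECIAL_CHARS = set("<>&\"'")

# Matches &name;, &#123; and &#x7B; style references.
ENTITY = re.compile(r"&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);")


def replace_entities(text):
    """Replace entity forms with their characters, except the special five."""

    def sub(match):
        body = match.group(1)
        if body.startswith("#"):
            digits = body[1:]
            try:
                if digits[0] in "xX":
                    code = int(digits[1:], 16)
                else:
                    code = int(digits)
                ch = chr(code)
            except (ValueError, OverflowError):
                # Out-of-range code point: keep the reference as written.
                return match.group(0)
            # Assumption of this sketch: a numeric reference to one of the
            # special characters stays escaped so the markup is not altered.
            return match.group(0) if ch in SPECIAL_CHARS else ch
        if body in SPECIAL_NAMES or body not in name2codepoint:
            # Special or unknown names are passed through untouched.
            return match.group(0)
        return chr(name2codepoint[body])

    return ENTITY.sub(sub, text)


if __name__ == "__main__":
    print(replace_entities("caf&eacute; &amp; &apos;t&#233;&apos; &lt;b&gt;"))
```

Keeping lt, gt, amp, quot and apos escaped means the output can still be parsed as html: expanding them would turn text content into tag or attribute delimiters.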