cinap_lenrek
6974a1ecb6
uhtml: don't trust charset=utf-8 attribute, verify.
when the charset is explicitly specified as utf-8, ignore it
for now. we'll assume utf-8 when all bytes have been properly
utf-8 encoded.
2016-03-13 23:47:24 +01:00
cinap_lenrek
3d1e12363d
uhtml: check if document is valid utf8 even with charset specified
often, documents specify charsets but are really utf-8 encoded.
we now try to decode as utf-8 and only if that fails assume
the charset specified in the document.
2015-05-28 16:37:55 +02:00
cinap_lenrek
1f850cbab1
uhtml: honor default charset -c when not found in document
2013-07-14 16:44:16 +02:00
cinap_lenrek
f99007281d
uhtml: fix wrong open error handling (fd 0 != fd 1) (thanks BurnZeZ)
2013-06-21 02:37:10 +02:00
cinap_lenrek
3932153299
mothra: handle misplaced trailing quotes
2012-08-15 13:15:34 +02:00
cinap_lenrek
55ddbff77d
fix strchr \0 bugs
2012-07-19 23:34:37 +02:00
cinap_lenrek
4c2c62ee96
uhtml: use first match
2012-07-16 05:32:16 +02:00
cinap_lenrek
f4480d1517
mothra/uhtml: properly handle quoting in tags
2012-06-24 08:36:42 +02:00
cinap_lenrek
4eaea14f76
uhtml: fix -c override
2012-02-20 20:54:42 +01:00
cinap_lenrek
0a9ae3758c
uhtml: scan tags only, fix cat fallback, usage, cleanup
2012-02-20 20:48:48 +01:00
cinap_lenrek
51c7856350
uhtml: assume latin1 if not valid utf8
2011-10-05 04:47:53 +02:00
cinap_lenrek
13304b7b96
html2ms, tcs, mothra, uhtml: treat ' as special entity, add uhtml(1)
2011-09-24 17:06:45 +02:00
cinap_lenrek
94646a4287
html2ms: table support
2011-09-21 14:17:27 +02:00
cinap_lenrek
6c91d99ce2
uhtml: remove trailing utf BOM marker, html2ms: fix underline handling and escaping
2011-09-20 04:14:29 +02:00
cinap_lenrek
a31e4f61a4
uhtml: add html to unicode converter, used by mothra and page/html2ms
2011-09-20 00:38:28 +02:00