plan9fox/sys/man/2/encode
Ori Bernstein e934530ee4 libc: add encode(2) variants for custom alphabets
There are a number of alphabets in common use for base32
and base64 encoding, such as url-safe encodings.

This adds support for passing a function to encode into
arbitrary alphabets.
2021-07-03 20:03:17 +00:00

.TH ENCODE 2
.SH NAME
dec64, enc64, dec32, enc32, dec16, enc16, \
dec64x, enc64x, dec32x, enc32x, \
dec64chr, enc64chr, dec32chr, enc32chr, dec16chr, enc16chr, \
encodefmt \- encoding byte arrays as strings
.SH SYNOPSIS
.B #include <u.h>
.br
.B #include <libc.h>
.PP
.B
int dec64(uchar *out, int lim, char *in, int n)
.PP
.B
int dec64x(uchar *out, int lim, char *in, int n, int (*map)(int))
.PP
.B
int enc64(char *out, int lim, uchar *in, int n)
.PP
.B
int enc64x(char *out, int lim, uchar *in, int n, int (*map)(int))
.PP
.B
int dec32(uchar *out, int lim, char *in, int n)
.PP
.B
int dec32x(uchar *out, int lim, char *in, int n, int (*map)(int))
.PP
.B
int enc32(char *out, int lim, uchar *in, int n)
.PP
.B
int enc32x(char *out, int lim, uchar *in, int n, int (*map)(int))
.PP
.B
int dec16(uchar *out, int lim, char *in, int n)
.PP
.B
int enc16(char *out, int lim, uchar *in, int n)
.PP
.B
int dec64chr(int c)
.PP
.B
int enc64chr(int o)
.PP
.B
int dec32chr(int c)
.PP
.B
int enc32chr(int o)
.PP
.B
int dec16chr(int c)
.PP
.B
int enc16chr(int o)
.PP
.B
int encodefmt(Fmt*)
.SH DESCRIPTION
The functions described here handle encoding and decoding of
bytes to printable ASCII strings as specified by RFC4648.
.PP
.IR Enc16 ,
.I enc32
and
.I enc64
create null-terminated strings. They return the length of the
encoded string (excluding the null) or -1 if the encoding fails.
The encoding fails if
.IR lim ,
the length of the output buffer (including null), is too small.
.PP
.IR Dec16 ,
.I dec32
and
.I dec64
return the number of bytes decoded or -1 if the decoding fails.
The decoding fails if the output buffer is not large enough or,
for base 32, if the input buffer length is not a multiple
of 8.
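.PP
As a rough guide to choosing
.IR lim ,
the sketch below computes the buffer size each encoder needs.
It assumes padded RFC4648 output, so that every 3 input bytes become
4 base 64 characters and every 5 input bytes become 8 base 32 characters.
The helper names are hypothetical and not part of libc.
.EX
```c
/*
 * Hypothetical helpers, not part of libc: output buffer size in
 * bytes, including the terminating null, for n input bytes.
 * Assumes padded RFC4648 output.
 */
int
enc16len(int n)
{
	return 2*n + 1;		/* two hex digits per byte */
}

int
enc32len(int n)
{
	return 8*((n + 4)/5) + 1;	/* 8 characters per 5-byte group */
}

int
enc64len(int n)
{
	return 4*((n + 2)/3) + 1;	/* 4 characters per 3-byte group */
}
```
.EE
Under these assumptions a 15 byte array needs 31, 25 and 21 bytes
for base 16, base 32 and base 64 respectively.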
.PP
.IR Dec16chr ,
.I dec32chr
and
.I dec64chr
return the value for a symbol of the alphabet or -1 when the
symbol is not in the alphabet.
.PP
.IR Enc16chr ,
.I enc32chr
and
.I enc64chr
return the symbol of the alphabet for a given value.
If the value is out of range, zero is returned.
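.PP
The contract of these functions can be illustrated with a portable
stand-in for the base 16 pair.
The names
.I hexdec
and
.I hexenc
are hypothetical; accepting both cases when decoding and producing
upper case when encoding are assumptions of this sketch, not
statements about the libc source.
.EX
```c
/* Hypothetical stand-in for dec16chr: value of a hex digit, or -1. */
int
hexdec(int c)
{
	if(c >= '0' && c <= '9')
		return c - '0';
	if(c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	if(c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	return -1;	/* not in the alphabet */
}

/* Hypothetical stand-in for enc16chr: hex digit for a value, or 0. */
int
hexenc(int o)
{
	static const char tab[] = "0123456789ABCDEF";

	if(o < 0 || o >= 16)
		return 0;	/* out of range */
	return tab[o];
}
```
.EE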
.PP
The
.I enc64x
and
.I enc32x
variants behave like
.I enc64
and
.IR enc32 ,
except that they take a function,
.IR map ,
that maps an index in the alphabet to
the encoded character.
For example, in the following 32-character alphabet,
.EX
.I ABCDEFGHIJKLMNOPQRSTUVWXYZ234567
.EE
the
.I map
function would map the value
.I 3
to the character
.IR D .
The
.I dec64x
and
.I dec32x
variants behave like
.I dec64
and
.IR dec32 ,
except that the function passed
maps a character of the alphabet to its index within
the alphabet.
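.PP
As an illustration, a pair of mapping functions for the url-safe
base 64 alphabet of RFC4648 section 5 might look like the sketch below.
The names
.I urlenc64chr
and
.I urldec64chr
are hypothetical.
Returning zero or -1 for out-of-range input mirrors the
behaviour described for
.I enc64chr
and
.I dec64chr
and is an assumption about what the
.I x
variants expect.
The sketch is written as portable C; on Plan 9 the declaration of
.I strchr
comes from
.B <libc.h>
instead.
.EX
```c
#include <string.h>

/* Url-safe base 64 alphabet from RFC4648 section 5. */
static const char urltab[] =
	"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
	"abcdefghijklmnopqrstuvwxyz"
	"0123456789-_";

/* Hypothetical map for enc64x: index 0..63 to character, 0 if out of range. */
int
urlenc64chr(int o)
{
	if(o < 0 || o >= 64)
		return 0;
	return urltab[o];
}

/* Hypothetical map for dec64x: character to index, -1 if not in the alphabet. */
int
urldec64chr(int c)
{
	const char *p;

	/* reject 0 so that strchr cannot match the terminating null */
	if(c <= 0 || c > 127)
		return -1;
	p = strchr(urltab, c);
	if(p == NULL)
		return -1;
	return (int)(p - urltab);
}
```
.EE
Such a function would be passed as the final argument; the
variables here are placeholders:
.EX
n = enc64x(out, sizeof out, in, len, urlenc64chr);
.EE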
.PP
.I Encodefmt
can be used with
.IR fmtinstall (2)
and
.IR print (2)
to print encoded representations of byte arrays.
The verbs are
.TP
.B H
base 16 (i.e. hexadecimal). The default encoding is
in upper case. The
.B l
flag forces lower case.
.TP
.B <
base 32. The default is upper case, same as
.BR H .
.TP
.B [
base 64 (same as MIME).
.PD
.PP
The length of the array is specified as the precision,
.IR f2 .
For example, to display a 15-byte array as hex:
.EX
char x[15];
fmtinstall('H', encodefmt);
print("%.*H\\n", sizeof x, x);
.EE
.SH SOURCE
.B /sys/src/libc/port/u16.c
.br
.B /sys/src/libc/port/u32.c
.br
.B /sys/src/libc/port/u64.c
.br
.B /sys/src/libc/port/encodefmt.c
.SH HISTORY
In Jan 2018, base 32 encoding was changed from a non-standard
alphabet to the standard RFC4648 alphabet.
.TP
old:
.B "23456789abcdefghijkmnpqrstuvwxyz"
.TP
new:
.B "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"