Add assembler versions for aes_encrypt/aes_decrypt and the key
setup using AES-NI instruction set. This makes aes_encrypt and
aes_decrypt into function pointers which get initialized by
the first call to setupAESstate().
Note that the expanded round key words are *NOT* stored in big
endian order as with the portable implementation. For that reason
the AESstate.ekey and AESstate.dkey fields have been changed to
void* forcing an error when someone is accessing the roundkey
words. One offender was aesXCBmac, which doesnt appear to be
used and the code looks horrible so it has been deleted.
The AES-NI implementation is for amd64 only as it requires the
kernel to save/restore the FPU state across syscalls and
pagefaults.
the previous implementation was not portable at all, assuming
little endian in gf_mulx() and that one can cast unaligned
pointers to ulong in xor128(). also the error code is likely
to be ignored, so better abort() when the length is not a
multiple of the AES block size.
we also pass in full AESstate structures now instead of
the expanded key longs, so that we do not need to hardcode
the number of rounds. this allows each indiviaul keys to
be bigger than 128 bit.
the QLp structure used to occupy 24 bytes on amd64.
with some rearranging the fields we can get it to 16 bytes,
saving 8K in the data section for the 1024 preallocated
structs in the ql arena.
the rest of the changes are of cosmetic nature:
- getqlp() zeros the next pointer, so there is no need to set
it when queueing the entry.
- always explicitely compare pointers to nil.
- delete unused code from ape's qlock.c
initThumbprints() now takes an application tag argument
so x509 and ssh can coexist.
the thumbprint entries can now hold both sha1 and sha256
hashes. okThumbprint() now takes a len argument for the
hash length used.
the new function okCertificate() hashes the certificate
with both and checks for any matches.
on failure, okCertificate() returns 0 and sets error string.
we also check for include loops now in thumbfiles, limiting
the number of includes to 8.
drawterm, factotum, secstore and the auth commands
all had ther own implementation of readcons. we
want to have one common function for this to avoid
the duplication, so putting that in libauthsrv.
introduce PASSWDLEN which makes the use more explicit
than ANAMELEN.
assigning the expression value to a temporary variable in
BPSHORT() and BPLONG() saves arround 2K of text in rio on
arm and arround 1K on amd64.
loadchar(): use the passed in "h" as the char index instead
of recomputing it from c-f->cache. dont recompute wid.
cachechars(): do cache lookup and find oldest entry in a
single loop pass.