doing 4 quarterround's in parallel using 128-bit
vector registers. for second round shuffle the columns and
then shuffle back.
code is rather obvious. only trick here is for the first
quaterround PSHUFLW/PSHUFHW is used to swap the halfwords
for the <<<16 rotation.
Add assembler versions for aes_encrypt/aes_decrypt and the key
setup using AES-NI instruction set. This makes aes_encrypt and
aes_decrypt into function pointers which get initialized by
the first call to setupAESstate().
Note that the expanded round key words are *NOT* stored in big
endian order as with the portable implementation. For that reason
the AESstate.ekey and AESstate.dkey fields have been changed to
void* forcing an error when someone is accessing the roundkey
words. One offender was aesXCBmac, which doesnt appear to be
used and the code looks horrible so it has been deleted.
The AES-NI implementation is for amd64 only as it requires the
kernel to save/restore the FPU state across syscalls and
pagefaults.
initThumbprints() now takes an application tag argument
so x509 and ssh can coexist.
the thumbprint entries can now hold both sha1 and sha256
hashes. okThumbprint() now takes a len argument for the
hash length used.
the new function okCertificate() hashes the certificate
with both and checks for any matches.
on failure, okCertificate() returns 0 and sets error string.
we also check for include loops now in thumbfiles, limiting
the number of includes to 8.