doing 4 quarterround's in parallel using 128-bit
vector registers. for second round shuffle the columns and
then shuffle back.
code is rather obvious. only trick here is for the first
quaterround PSHUFLW/PSHUFHW is used to swap the halfwords
for the <<<16 rotation.
Add assembler versions for aes_encrypt/aes_decrypt and the key
setup using AES-NI instruction set. This makes aes_encrypt and
aes_decrypt into function pointers which get initialized by
the first call to setupAESstate().
Note that the expanded round key words are *NOT* stored in big
endian order as with the portable implementation. For that reason
the AESstate.ekey and AESstate.dkey fields have been changed to
void* forcing an error when someone is accessing the roundkey
words. One offender was aesXCBmac, which doesnt appear to be
used and the code looks horrible so it has been deleted.
The AES-NI implementation is for amd64 only as it requires the
kernel to save/restore the FPU state across syscalls and
pagefaults.
the QLp structure used to occupy 24 bytes on amd64.
with some rearranging the fields we can get it to 16 bytes,
saving 8K in the data section for the 1024 preallocated
structs in the ql arena.
the rest of the changes are of cosmetic nature:
- getqlp() zeros the next pointer, so there is no need to set
it when queueing the entry.
- always explicitely compare pointers to nil.
- delete unused code from ape's qlock.c
initThumbprints() now takes an application tag argument
so x509 and ssh can coexist.
the thumbprint entries can now hold both sha1 and sha256
hashes. okThumbprint() now takes a len argument for the
hash length used.
the new function okCertificate() hashes the certificate
with both and checks for any matches.
on failure, okCertificate() returns 0 and sets error string.
we also check for include loops now in thumbfiles, limiting
the number of includes to 8.
theres a bug is in sclose() where it doesnt check if wp is beyond
the buffer. also wp was not updated after realloc().
bug was reported by porlock on 9fans:
Plan 9's implementation of the standard C functions snprintf and
vsnprintf have a buffer overrun bug.
If the buffer length equals the output length (without the terminating
null), then one too many characters is written to the buffer.
For example,
snprintf(buf, 4, "ABCD");
will write 5 characters to buf.
when _syserrno() fails to map a plan9 error string to
a unix error number, we copy the plan9 error string
to the per process error buffer "plan9err" and set
errno = EPLAN9.
when strerror() is called with EPLAN9, it returns
a pointer to the plan9err buffer.
amd64 passes first argument in RARG (BP) register
which has the be preserved duing _profin() and
_profout() calls. to handle this we introduce
_saveret() and _savearg(). _saveret() returns
AX, _savearg() returns RARG (BP). for archs other
and amd64, _saveret() and _savearg() are the
same function, doing nothing.
restoing works with dummy function:
uintptr
_restore(uintptr, uintptr ret)
{
return ret;
}
...
ret = _saveret();
arg = _savearg();
...
return _restore(arg, ret);
as we pass arg as the first argument, RARG (BP) is
restored.
in ape's vfprintf we don't check if the file we're writing is actually a string buffer, resulting in a return of -1, when we should actually return the number of bytes that would be written.
semaphore locks have much higher overhead than initially presented
in the "Semaphores in Plan9" paper. until the reason for it has been
found out i will revert the changes.
instead of trying to resize the segment (which will not work when
the kernel picks the address as it will allocate right before
the base of the topmost segment), we create the mux segment with the
maximum size needed (arround 1.4MB) for OPEN_MAX filedescriptors.
buf slots will be reused and slots get demand paged once used.
*alen has to be initialized to the size of the buffer
by the caller, and we are supposed to put the real
size of the address in there, but not copy more than
the original *alen value (truncate).
store errno on the private process stack so its always per process
and not just per memory space. errno itself becomes a macro
dereferencing int *_errnoloc; which is initialized from main9.s
pointing to the private stack location.
various fixes in programs that just imported errno variable with
"extern int errno;" instead of including <errno.h>.
in ape close(), do the real filedescriptor _CLOSE() *after* we cleared
the _fdinfo[] slot because once closed, we dont own the slot anymore and
another process doing open() can trash the slot. make sure open() retuns
fd < OPEN_MAX.
double check in _startbuf() holding mux->lock if the fd is already buffered
preveting running double copyprocs on a fd.
dont zero the mux->rwant/ewant bitmaps at the end of select() as we do not
hold the mix->lock.
in _closebuf() kill copyproc while holding the mux->lock to make sure the
copyproc isnt holding it at the time it is killed. run kill() multiple times
to make sure the proc is gone.
python uses processes sharing memory. it requires at least fopen() to
be called by multiple threads at once so we introduce _IO_newfile()
which allocates the FILE structure slot under a lock.
add PASS_MAX to limits.h for ape, and make getpass respect it. also increase the size of
the maximum passwords (we use long ones at work). Needed for native port of SVN (in progress).
_grpmems() was broken tokenizing group list in place.
we have to copy it to status buffer before tokenizing.
dynamically alloc path for test file to check write
permission on directory and add pid to the name to
prevent races.
use _OPEN instead of ape open to read /dev/ppid in
getppid().
use mode enums instead of numeric constants for _OPEN()
and _CREATE().
remove envname length limitation in _envsetup()
by using allocated buffer and use /env instead of #e
use /proc and getpid() instead of #p and #c in
readprocfdinit()
fix buffer overflow in execlp(), check if name
of failed exec starts with / . or is \0
make sure not to close our own filedescriptors
for FD_CLOEXEC in execve(), fix wrong length
check for flushing buffer to /env/_fdinfo.
fix error handling cases. copy the enviroment
before decoding \1 to \0 because the strings in
environ[] array might not be writable.
remove bogus close if we fail to open ppid file
in getppid() and use /dev/ppid instead of #c/ppid