introducing the PFPU structue which allows the machine specific
code some flexibility on how to handle the FPU process state.
for example, in the pc and pc64 kernel, the FPsave structure is
arround 512 bytes. with avx512, it could grow up to 2K. instead
of embedding that into the Proc strucutre, it is more effective
to allocate it on first use of the fpu, as most processes do not
use simd or floating point in the first place. also, the FPsave
structure has special 16 byte alignment constraint, which further
favours dynamic allocation.
this gets rid of the memmoves in pc/pc64 kernels for the aligment.
there is also devproc, which is now checking if the fpsave area
is actually valid before reading it, avoiding debuggers to see
garbage data.
the Notsave structure is gone now, as it was not used on any
machine.
this change is in preparation for amd64. the systab calling
convention was also changed to return uintptr (as segattach
returns a pointer) and the arguments are now passed as
va_list which handles amd64 arguments properly (all arguments
are passed in 64bit quantities on the stack, tho the upper
part will not be initialized when the element is smaller
than 8 bytes).
this is partial. xalloc needs to be converted in the future.
this simplifies the arm ports and keeps all the stuff in one place
instead of spreading it thru notify(), trap() and syscall() functions
and prevents useless fp state copying for kernel procs.
also make sure to save fp in notify while still splhi().