this change is in preparation for amd64. the systab calling
convention was also changed to return uintptr (as segattach
returns a pointer) and the arguments are now passed as
va_list which handles amd64 arguments properly (all arguments
are passed in 64bit quantities on the stack, tho the upper
part will not be initialized when the element is smaller
than 8 bytes).
this is partial. xalloc needs to be converted in the future.
validaddr looks up the segments for an address range
and checks the flags and if the address range lies
within bounds on the segments.
as we'r going to lookup the segment in the syssem*
syscalls anyway, we can do the checks ourselfs avoiding
the double segment array lookups.
the implication of this tho is that now a semaphore cannot
span multiple segments. but this would be highly unusual
given that segments are page aligned.
when we fail to fork resources for the child due to resource
exhaustion, make the half forked child process call pexit()
to free the resources that where allocated and error out.
the variables elem and file0 and commited are explicitely
set to avoid that they get freed in ther waserror() handlers.
but it turns out the compiler optimizes this out as he
thinks the variables arent used any further. (the compiler
is not aware of the waserror() / longjmp() semantics).
rearrange the code to account for this. instead of using
a local variable to check for point of no return (commited),
we use up->seg[SSEG] to figure it out.
for file0 and elem, we just rearrange the code. elem can be
checked in the error handler if it was already assigned to
up->text, and file0 is just free()'d after the poperror().
remove silly busy loop in sysrendez. it is not needed.
dequeueproc() will make sure that the process has come to
rest.
we now always use the new FXSAVE format in FPsave structure and fpregs
file, converting back and forth in fpx87save() and fpx87restore().
document that fprestore() is a destructive operation now.
change fp register definition in libmach and adapt fpr() acid funciton.
avoid unneccesary copy of fpstate and fpsave in sysfork(). functions
including syscalls do not preserve the fp registers and copying fpstate
from the current process would mean we had to fpsave(&up->fpsave); first.
simply not doing it, new process starts in FPinit state.
the kernel uses fixed area (TSTKTOP, TSTKSIZ) of the address
space to temporarily map the new stack segment for exec. for
386 and arm, this area was right below the stack segment which
has the problem that the program can map arbitrary segments
there (even readonly).
alpha and ppc dont have this problem as they map the temporary
exec stack *above* the user reachable stack segement and segattach
prevents one from mapping anything above or overlaping the stack.
lots of arch code assumes USTKTOP being the end of userspace
address space and changing this to TSTKTOP would work, but results
in lots of hard to test changes.
instead, we'r going to map the temporary stack programmatically
finding a hole in the address space where to map it. we also lift
the size limitation for arguments and allow arguments to fill
the whole new stack segement.
the TSTKTOP and TSTKSIZ are not used anymore so they where removed.
references:
http://9fans.net/archive/2013/03/203http://9fans.net/archive/2013/03/202http://9fans.net/archive/2013/03/197http://9fans.net/archive/2013/03/195http://9fans.net/archive/2013/03/181
the syscallno check in syscallfmt() was wrong. the unsigned
syscall number was cast to an signed integer. so negative
values would pass the check provoking bad memory access from
kernel. the check also has an off by one. one has to check
syscallno >= nsyscalls instead of syscallno > nsyscalls.
access to the p->syscalltrace string was not protected
from modification in devproc. you could awake the process
and cause it to free the string giving an opportunity for
the kernel to access bad memory. or someone could kill the
process (pexit would just free it).
now the string is protected by the usual p->debug qlock. we
also keep the string arround until it is overwritten again
or the process exists. this has the nice side effect that
one can inspect it after the process crashed.
another problem was that our validaddr() would error() instead
of pexiting the current process. the code was changed to only
access up->s.args after it was validated and copied instead of
accessing the user stack directly. this also prevents a sneaky
multithreaded process from chaning the arguments under us.
in case our validaddr() errors, we cannot assume valid user
stack after the waserror() if block. use up->s.arg[0] for the
noted() call to avoid bad access.
newproc() didnt zero parentpid and kproc() didnt set it, so
kprocs ended up with random parent pid. this is harmless as
kprocs have no up->parent but it gives confusing results in
pstree(1).
now we zero parentpid in newproc(), and set it in sysrfork()
unless RFNOWAIT has been set.
attachimage()'s approach to handling newseg() error is flawed:
a) the the image is on the hash table, but ref is still 0, and
there is no segment/pages attached to it so nobody is going to
reclaim / putimage() it -> leak
b) calling pexit() would deadlock us because exec has acquired
up->seglock when calling attachimage(), so this would just deadlock.
the fix does the following:
attachimage() will putimage() and nexterror() if newseg() fails
instead of pexit(). this is less surprising.
exec now keeps the condition variable commit which is set once
we are commited / reached the point of no return and check this
variable in the highest waserror() handler and pexit() us there.
this way we have released up all the locks and pexit() will
cleanup.
note: this bug shouldnt us hit in with the current newseg()
implementation as it uses smalloc() which would wait to
satisfy the allocation instead of erroring.
in devproc status read handler the p->status, p->text and p->user
could overflow the local statbuf buffer as they where copied into
it with code like: memmove(statbuf+someoff, p->text, strlen(p->text)).
now using readstr() which will truncate if the string is too long.
make strncpy() usage consistent, make sure results are always null
terminated.