the syscallno check in syscallfmt() was wrong. the unsigned
syscall number was cast to an signed integer. so negative
values would pass the check provoking bad memory access from
kernel. the check also has an off by one. one has to check
syscallno >= nsyscalls instead of syscallno > nsyscalls.
access to the p->syscalltrace string was not protected
from modification in devproc. you could awake the process
and cause it to free the string giving an opportunity for
the kernel to access bad memory. or someone could kill the
process (pexit would just free it).
now the string is protected by the usual p->debug qlock. we
also keep the string arround until it is overwritten again
or the process exists. this has the nice side effect that
one can inspect it after the process crashed.
another problem was that our validaddr() would error() instead
of pexiting the current process. the code was changed to only
access up->s.args after it was validated and copied instead of
accessing the user stack directly. this also prevents a sneaky
multithreaded process from chaning the arguments under us.
in case our validaddr() errors, we cannot assume valid user
stack after the waserror() if block. use up->s.arg[0] for the
noted() call to avoid bad access.
newproc() didnt zero parentpid and kproc() didnt set it, so
kprocs ended up with random parent pid. this is harmless as
kprocs have no up->parent but it gives confusing results in
pstree(1).
now we zero parentpid in newproc(), and set it in sysrfork()
unless RFNOWAIT has been set.
assuming that this check tried to prevent the hostowner
from killing init, it is silly because init would just
handle the note.
with kbdfs, we actually want to send interrupt note to
the initial process group so instead of working arround
this with rfork(RFNOTEG|RFNAMEG), we remove the check.
avoid double entries in the cache for copen() and properly handle
locking so we wont just give up if we cant lock the Mntcache entry,
but drop the cache lock, qlock the Mntcache entry, and then recheck
the cache.
general cleanup (cdev -> ccache, use eqchantdqid())
the lock order of page.Lock -> palloc.hashlock was
violated in cachedel() which is called from the
pager. change the code to do it in the right oder
to prevent deadlock.
change lookpage to retry on false hit. i assume that
a false hit means:
a) we'r low on memory -> cached page got uncached/reused
b) duppage() got called on the page, meaning theres another
cached copy in the image now.
paging in is expensive compared to the hashtable lookup, so
i think retrying is better.
cleanup fixfault, adding comments.
swaped pages use a 8bit refcount where as the Page uses a 16bit one.
this might be exploited with having a process having a single page
swaped out and then forking 255 times to make the swap map refcount
overflow and panic the kernel.
this condition is probably very rare. so instead of doubling the
size of the swap map, we add a single 32bit refcount swapalloc.xref
which will keep the combined refcount of all swap map entries who
exceeded 255 references.
zero swapimage.c in setswapchan() after closing it as the stat() call
below might error leaving a dangeling pointer.
attachimage()'s approach to handling newseg() error is flawed:
a) the the image is on the hash table, but ref is still 0, and
there is no segment/pages attached to it so nobody is going to
reclaim / putimage() it -> leak
b) calling pexit() would deadlock us because exec has acquired
up->seglock when calling attachimage(), so this would just deadlock.
the fix does the following:
attachimage() will putimage() and nexterror() if newseg() fails
instead of pexit(). this is less surprising.
exec now keeps the condition variable commit which is set once
we are commited / reached the point of no return and check this
variable in the highest waserror() handler and pexit() us there.
this way we have released up all the locks and pexit() will
cleanup.
note: this bug shouldnt us hit in with the current newseg()
implementation as it uses smalloc() which would wait to
satisfy the allocation instead of erroring.
kstrcpy() did not null terminate for < 4 byte buffers. fixed,
but i dont think there is any case where this can happen in
practice.
always set malloctag in kstrdup(), cleanup.
always use ERRMAX bounded kstrcpy() to set up->errstr, q->err
and note[]->msg. paranoia.
instead of silently truncating interface name in netifinit(),
panic the kernel if interface name is too long as this case
is clearly a mistake.
panic kernel when filename is too long for addbootfile() in
devroot. this might happen if your kernel configuration is
messed up.
in devproc status read handler the p->status, p->text and p->user
could overflow the local statbuf buffer as they where copied into
it with code like: memmove(statbuf+someoff, p->text, strlen(p->text)).
now using readstr() which will truncate if the string is too long.
make strncpy() usage consistent, make sure results are always null
terminated.
we have to acquire p->seglock before we lock the individual
segments of the process and lock them. if we dont then pexit()
might free the segments before we can lock them causing the
"qunlock called with qlock not held, from ..." prints.
always make sure that there are child processes we can wait for
before sleeping.
put pwait() sleep into a loop and recheck. this is not strictly
neccesary but prevents accidents if there are spurious wakeups
or a bug.
once we set q->done = 1 in mountmux, the sleeper might return freeing q
so the wakeup might access invalid memory. we change the embedded Rendez
structure in the Mntrpc into a pointer to the sleeping procs up->sleep
rendez so the rendez is always going to be valid even if the rpc has been
freed.
the call to mntstats was moved before we set q->done also to prevent
accessing invalid memory.
the limit for overqueueing was too small for stuff like fcp
on a fileserver connected with a standard 32K limit pipe like
ramfs.
fcp usesd 8K*16procs > 32K*2
the biggest queue limit used in the kernel is 256K making
the maximum queue bloat 2.5MB or 320K for standard pipes.
that should be big enougth to never happen in practice
unless there is a bug which we like to catch before we
exhaust all kernel memory.