get a bit more verbose about process image exhaustion and make
imagreclaim() try to get at least one image on the freelist.
use rsrcwait() to notify the state, and call freebroken() in
case imagereclaim() couldnt free any images.
the variables elem and file0 and commited are explicitely
set to avoid that they get freed in ther waserror() handlers.
but it turns out the compiler optimizes this out as he
thinks the variables arent used any further. (the compiler
is not aware of the waserror() / longjmp() semantics).
rearrange the code to account for this. instead of using
a local variable to check for point of no return (commited),
we use up->seg[SSEG] to figure it out.
for file0 and elem, we just rearrange the code. elem can be
checked in the error handler if it was already assigned to
up->text, and file0 is just free()'d after the poperror().
remove silly busy loop in sysrendez. it is not needed.
dequeueproc() will make sure that the process has come to
rest.
we now always use the new FXSAVE format in FPsave structure and fpregs
file, converting back and forth in fpx87save() and fpx87restore().
document that fprestore() is a destructive operation now.
change fp register definition in libmach and adapt fpr() acid funciton.
avoid unneccesary copy of fpstate and fpsave in sysfork(). functions
including syscalls do not preserve the fp registers and copying fpstate
from the current process would mean we had to fpsave(&up->fpsave); first.
simply not doing it, new process starts in FPinit state.
use resrcwait() when waiting for memory to become available. randomize
the sleep time and properly restore old process status in case tsleep()
gets interrupted.
filesystems do not handle i/o errors well (cwfs will abandon the blocks),
and temporary exhaustion of kernel memory (because of too many i/o's in
parallel) causes read and write on the partition to fail.
i think it is better to wait for the memory to become available in
this case. the single allocation is at max SDmaxio bytes, which makes
it likely to become available. if we havnt even enought fo that, then
rebooting the machine would be the best option. (aux/reboot)
the fist problem is that qopen() might return nil and that kstrdup() will
sleep, so we should try to avoid holding the mntalloc lock. so we move
the kstrdup() and qopen() calls before the Mnt allocation, and properly
recover the memory if we fail later.
the second problem was that we error(Eshort) after we already created the Mnt
when returnlen < sizeof(f.version). this check has to happen *before* we
even attempt to allocate the Mnt structures. note that we only copy the
version string once everything is in the clear, so the semantics of the
user buffer not being modified in case of error is not changed.
a little cleanup in muxclose(), getting rid of mntptfree()...
1 the config string was grabbed Aoehsz too far into the packet due to using the wrong pointer to start.
2 never accept a response with tag Tmgmt or Tfree.
3 defend against "malicious" responses; ones with a response Aoehdr.type != request Aoehdr.type. this previously could
cause the initiator to crash.
4 vendor commands were improperly filtered out.
the software cursor starts flickering and reacts bumby if a process
spends most of its time with drawlock acquired because the timer interrupt
thats supposed to redraw the cursor fails to acquire the lock at the time
the timer fires.
instead of trying to draw the cursor on the screen from a timer interrupt
30 times per second, devmouse now creates a process calling cursoron() and
cursoroff() when the cursor needs to be redrawn. this allows the swcursor
to schedule a redraw while holding the drawlock in swcursoravoid() and
cursoron()/cursoroff() are now able to wait for a qlock (drawlock) because
they get called from process context.
the overall responsiveness is also improved with this change as the cursor
redraw rate isnt limited to 30 times a second anymore.
from ehci spec:
The buffer pointer list in the qTD is long enough to support a maximum
transfer size of 20K bytes. This case occurs when all five buffer pointers
are used and the first offset is zero. A qTD handles a 16Kbyte buffer
with any starting buffer alignment.
the kernel uses fixed area (TSTKTOP, TSTKSIZ) of the address
space to temporarily map the new stack segment for exec. for
386 and arm, this area was right below the stack segment which
has the problem that the program can map arbitrary segments
there (even readonly).
alpha and ppc dont have this problem as they map the temporary
exec stack *above* the user reachable stack segement and segattach
prevents one from mapping anything above or overlaping the stack.
lots of arch code assumes USTKTOP being the end of userspace
address space and changing this to TSTKTOP would work, but results
in lots of hard to test changes.
instead, we'r going to map the temporary stack programmatically
finding a hole in the address space where to map it. we also lift
the size limitation for arguments and allow arguments to fill
the whole new stack segement.
the TSTKTOP and TSTKSIZ are not used anymore so they where removed.
references:
http://9fans.net/archive/2013/03/203http://9fans.net/archive/2013/03/202http://9fans.net/archive/2013/03/197http://9fans.net/archive/2013/03/195http://9fans.net/archive/2013/03/181
the kernel would go into endless loop when stating "stats" and "ifstats"
files and the network interface having no connections, or otherwise return
wrong stat info.
some controls are inverted. we reflect this by specifying
negative range in the volume table now and let genaudiovolread()
and genaudiovolwrite() do the conversion.
access to non standard serial port COM3 at i/o port 0x200 causes
kernel panic on some machines (Toshiba Sattelite 1415-S115). also,
some machines have gameport at 0x200.
i readded uartisa to the pcf and pccpuf kernel configurations so
one can use plan9.ini to add non standard uarts like:
uart2=type=isa port=0x200 irq=5
some devices freeze up with inqiry allocation length
other than 36 bytes. as we do not really care about
the vendor specific part of the inquiry, lets only do
36 byte inquiry for now.
the unit inquiry data might change in case the drive got pulled
with ahci. so keep track if we locked the ctl in a local stack
variable instead of relying on that the inquiry data stays the
same.
the syscallno check in syscallfmt() was wrong. the unsigned
syscall number was cast to an signed integer. so negative
values would pass the check provoking bad memory access from
kernel. the check also has an off by one. one has to check
syscallno >= nsyscalls instead of syscallno > nsyscalls.
access to the p->syscalltrace string was not protected
from modification in devproc. you could awake the process
and cause it to free the string giving an opportunity for
the kernel to access bad memory. or someone could kill the
process (pexit would just free it).
now the string is protected by the usual p->debug qlock. we
also keep the string arround until it is overwritten again
or the process exists. this has the nice side effect that
one can inspect it after the process crashed.
another problem was that our validaddr() would error() instead
of pexiting the current process. the code was changed to only
access up->s.args after it was validated and copied instead of
accessing the user stack directly. this also prevents a sneaky
multithreaded process from chaning the arguments under us.
in case our validaddr() errors, we cannot assume valid user
stack after the waserror() if block. use up->s.arg[0] for the
noted() call to avoid bad access.
newproc() didnt zero parentpid and kproc() didnt set it, so
kprocs ended up with random parent pid. this is harmless as
kprocs have no up->parent but it gives confusing results in
pstree(1).
now we zero parentpid in newproc(), and set it in sysrfork()
unless RFNOWAIT has been set.
assuming that this check tried to prevent the hostowner
from killing init, it is silly because init would just
handle the note.
with kbdfs, we actually want to send interrupt note to
the initial process group so instead of working arround
this with rfork(RFNOTEG|RFNAMEG), we remove the check.
avoid double entries in the cache for copen() and properly handle
locking so we wont just give up if we cant lock the Mntcache entry,
but drop the cache lock, qlock the Mntcache entry, and then recheck
the cache.
general cleanup (cdev -> ccache, use eqchantdqid())
the lock order of page.Lock -> palloc.hashlock was
violated in cachedel() which is called from the
pager. change the code to do it in the right oder
to prevent deadlock.
change lookpage to retry on false hit. i assume that
a false hit means:
a) we'r low on memory -> cached page got uncached/reused
b) duppage() got called on the page, meaning theres another
cached copy in the image now.
paging in is expensive compared to the hashtable lookup, so
i think retrying is better.
cleanup fixfault, adding comments.
swaped pages use a 8bit refcount where as the Page uses a 16bit one.
this might be exploited with having a process having a single page
swaped out and then forking 255 times to make the swap map refcount
overflow and panic the kernel.
this condition is probably very rare. so instead of doubling the
size of the swap map, we add a single 32bit refcount swapalloc.xref
which will keep the combined refcount of all swap map entries who
exceeded 255 references.
zero swapimage.c in setswapchan() after closing it as the stat() call
below might error leaving a dangeling pointer.
attachimage()'s approach to handling newseg() error is flawed:
a) the the image is on the hash table, but ref is still 0, and
there is no segment/pages attached to it so nobody is going to
reclaim / putimage() it -> leak
b) calling pexit() would deadlock us because exec has acquired
up->seglock when calling attachimage(), so this would just deadlock.
the fix does the following:
attachimage() will putimage() and nexterror() if newseg() fails
instead of pexit(). this is less surprising.
exec now keeps the condition variable commit which is set once
we are commited / reached the point of no return and check this
variable in the highest waserror() handler and pexit() us there.
this way we have released up all the locks and pexit() will
cleanup.
note: this bug shouldnt us hit in with the current newseg()
implementation as it uses smalloc() which would wait to
satisfy the allocation instead of erroring.
kstrcpy() did not null terminate for < 4 byte buffers. fixed,
but i dont think there is any case where this can happen in
practice.
always set malloctag in kstrdup(), cleanup.
always use ERRMAX bounded kstrcpy() to set up->errstr, q->err
and note[]->msg. paranoia.
instead of silently truncating interface name in netifinit(),
panic the kernel if interface name is too long as this case
is clearly a mistake.
panic kernel when filename is too long for addbootfile() in
devroot. this might happen if your kernel configuration is
messed up.
in devproc status read handler the p->status, p->text and p->user
could overflow the local statbuf buffer as they where copied into
it with code like: memmove(statbuf+someoff, p->text, strlen(p->text)).
now using readstr() which will truncate if the string is too long.
make strncpy() usage consistent, make sure results are always null
terminated.
we have to acquire p->seglock before we lock the individual
segments of the process and lock them. if we dont then pexit()
might free the segments before we can lock them causing the
"qunlock called with qlock not held, from ..." prints.
always make sure that there are child processes we can wait for
before sleeping.
put pwait() sleep into a loop and recheck. this is not strictly
neccesary but prevents accidents if there are spurious wakeups
or a bug.
once we set q->done = 1 in mountmux, the sleeper might return freeing q
so the wakeup might access invalid memory. we change the embedded Rendez
structure in the Mntrpc into a pointer to the sleeping procs up->sleep
rendez so the rendez is always going to be valid even if the rpc has been
freed.
the call to mntstats was moved before we set q->done also to prevent
accessing invalid memory.
the limit for overqueueing was too small for stuff like fcp
on a fileserver connected with a standard 32K limit pipe like
ramfs.
fcp usesd 8K*16procs > 32K*2
the biggest queue limit used in the kernel is 256K making
the maximum queue bloat 2.5MB or 320K for standard pipes.
that should be big enougth to never happen in practice
unless there is a bug which we like to catch before we
exhaust all kernel memory.