adjust to new aes_xts routines.
allow optional offset in the 4th argument where the encrypted
sectors start instead of hardcoding the 64K header area for
cryptsetup.
avoid allocating temporary buffer for cryptio() reads, we can
just decrypt in place there.
use sdmalloc() to allocate the temporary buffer for cryptio()
writes so that devsd wont need to allocate and copy in case
it didnt like our alignment.
do not duplicate the error reporting code, just use io()
that is what it is for.
allow 2*256 bit keys in addition to 2*128 bit keys.
there isnt much of a point in keep maintaining separate
kernel configurations for terminal and cpu kernels as
the role can be switched with service=cpu boot parameter.
to make stuff cosistent, we will just have one "pc" kernel
and one "pc64" kernel configuration now.
we used to tweak arround in the ICH registers for all intel controllers,
which is wrong, as the 200 series has different magic registes. see
the datasheet:
https://www.intel.com/content/www/us/en/chipsets/200-series-chipset-pch-datasheet-vol-2.html
this caused the clocks to be disabled for the 6th port causing a full
machine lockup touching the 6th port registers.
the next problem was that aiju's bios disabled the unused ports somehow
but didnt clear ther PI bits, so that they would stay in Sbist status even
after a port reset. so the port would get stuck in the Dportreset state
forever. the fix for this was to use a one second timeout for the
port reset procedure.
with xhci, bandwidth allocations are handled by the controller
and there are various speed settings possible that currently
not exposed in the Udev. so just keep usbload() as it is for
usb2 and keep ep->load as zero for superspeed.
more conservative approach: only one transaction in flight
per endpoint (except iso). also serialize controller commands.
no driver currently uses this and i doubt it is usefull.
create constants for common TRB flags and remove bogus 1<<16
flag on TR_NORMAL.
while endpoints != 0 are opend after the device descriptor has been
parsed and the endpoint properties like maxpkt have been set, the
control endpoint is opend with a guessed maxpkt value. once the first
8 bytes of the descriptor have been read by usbd, maxpkt gets set and
we need to reevaluate the control endpoint 0 context to update the value.
instead of guessing where the controllers dequeue pointer went,
stop the endpoint and then explicitely set te dequeue pointer to
the next write td position. that way we do not need to fix the cycle
bit in the td's and dont need to rely on if the controller
advanced the dequeue pointer after a stall or not.
add ctx and slot back pointers to ring.
add support for edp, dp and hdmi on haswell and haswell ult.
vga, dvi and specific configurations like ulx are unimplemented.
remaining issue: edp link training always fails (time out).
the problem is that segio doesnt check segment attributes
and it can't really in case of SG_FAULT which can be
inherited from pseg and toggle at any time.
so instead of returning -1 from fault into the fault$cputype
handler which then panics when fault happend kernel mode,
we jump into segio's waserror() block just like in the
demand load i/o error case (faulterror()).
make sure the loop terminates and doesnt get stuck at
name == aname. avoid memrchr() as it conflicts with
libc on unix (drawterm). declare namelenerror() as
static.
calculate alloc flag before waserror(), as compilers like
gcc will not notice the value changing later because
setjump() restores the old value due to callee-saves.
change is applies here to make it easier to merge with
drawterm.
thanks to aiju for debugging this; used to cause drawterm
memory leak until compiled with gcc -O0.
turns out on real hardware, the front falls off if we write
the completion queue doorbell registers without consuming
an entry. so only write the register when we have processed
something.
timerdel() did not make sure that the timer function
is not active (on another cpu). just acquiering the
Timer lock in the timer function only blocks the caller
of timerdel()/timeradd() but not the other way arround
(on a multiprocessor).
this changes the timer code to track activity of
the timer function, having timerdel() wait until
the timer has finished executing.
basic NVMe controller driver, reads and writes work.
"namespaces" show up as logical units.
uses pin/msi interrupts (no msi-x support yet).
one submission queue per cpu, shared completion queue.
no recovery from fatal controller errors.
only tested in qemu (no hardware available).
commiting this so it can be found by someone who has
hardware.
on thinkpad x1v4, the PCMP structure resides in upper reserved memory
pa=0xd7f49000 - while system memory ends at 0x0ffff000; so we have to
vmap() it instead of KADDR().
the RSD structure for ACPI might reside in low memory, so we sould
KADDR() in that case.
devmouse controls the screen blanking timeout, so move the
code there avoiding cross calls between modules. the only
function that needs to be provided is blankscreen(), which
gets called with drawlock locked.
the blank timeout is set thru /dev/mousectl now, so kernels
without devvga can set it.
blanking now only happens while /dev/mouse is read. so this
avoids accidentally blanking the screen on cpu servers that
do not have a mouse to unblank it.
- add some milisecond timestamps to the status change debug printing
- flush the packets in the queue on deassoc to avoid processing old pae
packets on next association.
- make roaming timeout shorter (60 -> 20 seconds)
- automatically timeout and restart wpa/pae blocked state
- fix printing race when essid gets changed underneath seprint
from openbsd driver, it seems the Centrino Advanced-N 6030 and 6235
cards share the same device revision as the 6205 (Type6005). Also
changing the device revision field from 4 to 5 bits.
- drivers enable short preamble and sort timeslot depending
on the ap beacon capinfo field (bss->cap)
- wifi sets short preamble bit in capinfo on association request
- wifi sets short timeslot bit when ap advertized it in beacon
- no need to splhi() in timerset, always called with
interrupts off.
- make timerset always update the period (next == 0)
- remove period update in fastticks(), simplify
delta calculation.
introducing new ctrunc() function that invalidates any caches
for the passed in chan, invoked when handling wstat with a
specified file length or on file creation/truncation.
test program to reproduce the problem:
#include <u.h>
#include <libc.h>
#include <libsec.h>
void
main(int argc, char *argv[])
{
int fd;
Dir *d, nd;
fd = create("xxx", ORDWR, 0666);
write(fd, "1234", 4);
d = dirstat("xxx");
assert(d->length == 4);
nulldir(&nd);
nd.length = 0;
dirwstat("xxx", &nd);
d = dirstat("xxx");
assert(d->length == 0);
fd = open("xxx", OREAD);
assert(read(fd, (void*)&d, 4) == 0);
}
given that the igfx driver doesnt provide any acceleration functions
and drawing is usually faster with double buffering as it eleminates
reads over the pci bus, enable softscreen by default.
on 386 kernel, each processor has its own pdb where the primary
pdb for kernel mappings is on cpu0 and other cpu's lazily pull
pdb entries from cpu0 when they fault in vmapsync().
so we have to edit the table tables in the pdb of cpu0 and not
the current processor.
on some modern machines like the x250, the bios arranges the mtrr's
and the framebuffer membar in a way that doesnt allow us to mark
the framebuffer pages as write combining, leading to slow graphics.
since the pentium III, the processor interprets the page table bit
combinations of the WT, CD and bit7 bits as an index into the
page attribute table (PAT).
to not change the semantics of the WT and CD bits, we preserve
the bit patterns 0-3 and use the last entry 7 for write combining.
(done in mmuinit() for each core).
the new patwc() function takes virtual address range and changes
the page table marking the range as write combining. no attempt
is made on invalidating tlb's. doesnt matter in our case as the
following mtrr() call in screen.c does it for us.
the assumption of only one producer ((abs)moustratrack()) is not true
for external mouse events from /dev/mousein, so protect the mouse state
and queue with ilock().
get rid of mousecreate(), just use devcreate().
reset cursor when all instances of /dev/mouse and /dev/cursor got closed,
instead of also considering /dev/mousectl. the reason is that kbdfs keeps
the mousectl file open. so exiting a program that has the cursor changed
will properly reset the cursor to arrow.
don't access user buffer while holding cursor spinlock! the memory access
can fault. theres also no lock needed there, we'r just copying *from* the
cursor memory.
fix use of strtol(), p will always be set, check for end of string.
keep pointer coordinates onscreen (off by one).
make lastms() function to get the last millisecond delta of last
call for resynchronization.
fix msg[3] buffer overflow in m5mouseputc().
get rid of mouseshifted logic, it is not used.
remove bl2mem(), it is broken. a fault while copying to memory
yields a partially freed block list. it can be simply replaced
by readblist() and freeblist(), which we also use for qcopy()
now.
remove mem2bl(), and handle putting back remainer from a short
read internally (splitblock()) avoiding the releasing and re-
acquiering of the ilock.
always attempt to free blocks outside of the ilock.
have qaddlist() return the number of bytes enqueued, which
avoids walking the block list twice.
remove unneeded waserror() block, loopoput is alled from
loopbackbwrite only so we will always get called with a
*single* block, so the concatblock() is not needed.
the convention for Dev.bwrite() is that it accepts a *single* block,
and not a block chain. so we never have concatblock here.
to keep stuff consistent, we also guarantee thet Medium.bwrite()
will get a *single* block passed as well, as the callers are
few in number.
to avoid copying in padblock() when adding cryptographics macs to a block
in devtls/devssl/esp we reserve 16 extra bytes to the allocation.
remove qio ixsummary() function and add acid function qiostats() to
/sys/lib/acid/kernel
simplify iallocb(), remove iallocsummary() statitics.
given that devmnt will almost always write into a pipe
or a network connection, which supports te bwrite routine,
we can avoid the memory copy that would have been done by
devbwrite(). this also means the i/o buffer for writes
will get freed sooner without having to wait for the 9p
rpc to get a response, saving memory.
theres one case where we have to keep the rpc arround and
that is when we write to a cached file, as we want to update
the cache with the data that was written, but the user buffer
cannot be trusted to stay the same during the rpc.
devfs:
- fix memory leak in devfs leaking the aes key
- allocate aes-xts cipher state in secure memory
- actually check if the hexkey got fully parsed
cryptsetup:
- get rid of stupid "type YES" prompt
- use genrandom() to generate salts and keys
- rewrite cryptsetup to use common pbkdf2 and readcons routines
- fix alot of error handling and simplify the code
- move cryptsetup command to disk/cryptsetup
- update cryptsetup(8) manual page
get rid of _INI and _REG method calls, this is not full acpi environment
anyway and all we really want todo at kernel boot time is figuring out
the interrupt routing. aux/acpi can try to enable more stuff if it needs
to later when battery status desired.
dont snoop memory space regions in amlmapio(), this is just wrong as
amlmapio() is *lazily* mapping regions as they are accessed, so the
range table would never be really complete. instead, we provide generic
access to the physical address space, excluding kernel and user memory
with acpimem file.
we can encrypt the 256 bit chacha key on each invocation
making it hard to reconstruct previous outputs of the
generator given the current state (backtracking resiatance).
the kernels custom rand() and nrand() functions where not working
as specified in rand(2). now we just use libc's rand() and nrand()
functions but provide a custom lrand() impelmenting the xoroshiro128+
algorithm as proposed by aiju.
we now access the user buffer in randomread() outside of the lock,
only copying and advancing the chacha state under the lock. this
means we can use randomread() within the fault handling path now
without fearing deadlock. this also allows multiple readers to
generate random numbers in parallel.
we might wake up on a different cpu after the sleep so
delta from machX->ticks - machY->ticks can become negative
giving spurious timeouts. to avoid this always use the
same mach 0 tick counter for the delta.
the manpage states that capabilities time out after a minute,
so we add ticks field into the Caphash struct and record the
time when the capability was inserted. freeing old capabilities
is handled in trimcaps(), which makes room for one extra cap
and frees timed out ones.
we also limit the capuse write size to less than 1024 bytes to
prevent denial of service as we have to copy the user buffer.
(memory exhaustion).
we have to check the from user *before* attempting to remove
the capability! the wrong user shouldnt be able to change any
state. this fixes the memory leak of the caphash.
do the hash comparsion with tsmemcmp(), avoiding timing
side channels.
allocate the capabilities in secret memory pool to prevent
debugger access.
The kernel needs to keep cryptographic keys and cipher states
confidential. secalloc() allocates memory from the secret pool
which is protected from debuggers reading the memory thru devproc.
secfree() releases the memory, overriding the data with garbage.
the first time rtl8169link is called (from rtl8169pnp), the link isn't up, so
setting edev->mbps based on Phystatus register is skipped. edev->mbps is then
still set at the default 100, and that ends up being what devether uses.
this is why some rtl8169 cards are misprinted as 100Mbps in kmesg.
later, after rtl8169link is called again from rtl8169interrupt, the link is up
and edev->mbps is set to the correct value (as shown by e.g. /net/ether0/stats).
so instead, set speed regardless of link status.
the arm compiler can lift long->vlong casts on multiplcation
and convert 64x64->64 multiplication into a 32x32->64 one
with optional 64 bit accumulate.
This patch is only an adaptation for 9front of the patch located in
http://www.9legacy.org/9legacy/patch/pc-ether82563-i210.diff. The
major difference is that this patch ignores errors in checksum of
eeprom, because in my system the checksum was wrong. After 3 months,
I didn't have problems, and I think the patch can be used. although
it has some things that need to be fixed. If the link is inactive
when the system boots then it will remain inactive forever.
- return distinct error message when attempting to create Globalseg with physseg name
- copy directory name to up->genbuf so it stays valid after we unlock(&glogalseglock)
- cleanup wstat() handling, allow changing uid
- make sure global segment size is below SEGMAXSIZE
- move isoverlap() check from globalsegattach() into segattach()
- remove Proc* argument from globalsegattach(), segattach() and isoverlap()
- make Physseg.attr and segattach attr parameter an int for consistency
to figure out what network connection a particular tls
conversation refers to, we add the path of the underlying
we send the encrypted tls traffic over in the status file,
example:
term% grep -n '^Chan:' '#a'/tls/*/status
#a/tls/0/status:7: Chan: /net/tcp/6/data
#a/tls/1/status:7: Chan: /net/tcp/0/data
/n/bugs/open/multicasts_and_udp_buffers
http://bugs.9front.org/open/multicasts_and_udp_buffers/readmemichal@Lnet.pl
I have ported my small MPEG-TS analisis tool to Plan9.
To allow this application working I had to fix a bug in the kernel IPv4 code and increase UDP input buffer.
Bug is related to listening for IPv4 multicast traffic. There is no problem if you listen for only one group or multiple groups with different UDP ports. This works:
Write to UDP ctl:
anounce PORT
addmulti INTERFACE_ADDR MULTICAST_ADDR
headers
and you can read packets from data file.
You need to set headers option because otherwise every UDP packet for MULTICAST_ADDR!PORT is treat as separate connection. This is a bug and should be fixed too, but I didn't tried it.
There is a problem when you need to receive packets for multiple multicast groups. Usually the same destination port is used by multiple streams and above sequence of commands fails for second group because the port is the same.
Simple and probably non-intrusive fix is adding "|| ipismulticast(addr)" to if statement at /sys/src/9/ip/devip.c:861 line:
if(ipforme(c->p->f, addr) || ipismulticast(addr))
This fixes the problem and now you can use the following sequence to listen for multiple multicast groups even if they all have the same destination port:
announce MULTICAST_ADDR!PORT
addmulti INTERFACE_ADDR MULTICAST_ADDR
headers
After that my application started working but signals packet drops at >2 Mb/s input rate. The same is reported by kernel netlog. Increase capacity of UDP connection input queue fixes this problem /sys/src/9/ip/udp.c:153
c->rq = qopen(512*1024, Qmsg, 0, 0);
--
Michał Derkacz
access to the axi segment hangs the machine when the fpga
is not programmed yet. to prevent access, we introduce a
new SG_FAULT flag, that when set on the Segment.type or
Physseg.attr, causes the fault handler to immidiately
return with an error (as if the segment would not be mapped).
during programming, we temporarily set the SG_FAULT flag
on the axi physseg, flush all processes tlb's that have
the segment mapped and when programming is done, we clear
the flag again.
tsleep() used to cancel the timer with:
if(up->tt != nil)
timerdel(up);
which still can result in twakeup() to fire after tsleep()
returns (because we set Timer.tt to nil *before* we call the tfn).
in most cases, this is not an issue as the Rendez*
usually is just &up->sleep, but when it is dynamically allocated
or on the stack like in tsemacquire(), twakeup() will call
wakeup() on a potentially garbage Rendez structure!
to fix the race, we execute the wakup() with the Timer lock
held, and set p->trend to nil only after we called wakeup().
that way, the timerdel(); which unconditionally locks the Timer;
can act as a proper barrier and use up->trend == nil as the
condition if the timer has already fired.
for queue like non-seekable files, it is impossible to implement an
exportfs because one has to run the kernels devtab read() and write()
in separate processes, and that makes it impossible to maintain 9p message
order as the scheduler can come in and randomly schedule one process before
another.
so as soon as we have a transition from 9p -> syscalls, we'r screwed.
i currently see just two possibilities:
- introduce special file type like QTSEQ with strictly ordered i/o semantics
- fix all fileservers and exportfs to only do one outstanding i/o to QTSEQ files
which means maintaining a queue per fid
this doesnt propagate. so exporting slow 9p mount again will be limited
again by latency of the inner mount.
other option:
- return offset in Rread, so client can bring responses back into order. this
requires changing all fileservers and drivers to maintain such an per fid offset
and change the protocol to include it in the response, and also pass it to userspace
(new syscalls or pass it in TOS)
this only works for read pipelining, write is still screwed.
both options suck.
--
cinap
this code was if(0) for a long time due to wrong parentesis,
fixed parentesis cause print spam on some machines making them
unusage (kenji okomoto). removing the check alltogether.
theres a bootstrap problem:
when /bin/init is run, it processes /lib/namespace where we might want to
mount or bind resources to /n or /mnt. but mntgen was run later in
cpurc/termrc so these mounts would be ignored.
we already have mntgen in bootfs, so we can provide these mountpoints early.
i keep the termrc/cpurc mntgens where they are, but ignore the error
prints. this way old kernels will continue to work.
apparently, this causes some quadcore ramnode vm to hang on boot,
even tho all cores successfully started up and are operational.
i suspect some side effect from timersinit()... this would also
mean *notsc= would break it (syncclock() would continue)...
its unclear.
i'm reverting this for now until the problem is better understood.
when testing in qemu, launching each ap became slower and slower
because all the ap's where spinning in syncclock() waiting for
cpu0 to update its mach0->tscticks, which happens only much later
after all cpu's have been started up.
now we wait for each cpu to do its timer callibration and
manually update our tscticks while we wait and each cpu will
not spin but halt while waiting for active.thunderbirdsarego.
this reduces the system load and noise for timer callibration
and makes the mp startup linear with regard to the number of
cores.
introduce cpushutdown() function that does the common
operation of initiating shutdown, returning once all
cpu's got the message and are about to shutdown. this
avoids duplicated code which isnt really machine specific.
automatic reboot on panic only when *debug= is not set
and the machine is a cpu server or has no display,
otherwise just hang.
to solve the usb device enumeration race on boot, usbd creates /env/usbbusy
on startup and once all devices have been enumerated and readers have consumed
all the events, we remove the file so nusbrc/bootrc can continue. this makes
sure all the usb devices that where plugged in on boot are made available.
when opening a /env file ORCLOSE, and the process exits, envgrp() would
return nil can crash in envremove() because procexit will have set up->egrp
to nil before calling closefgrp().
the solution is to capture the environment on open, keeping a reference in
Chan.aux, so it doesnt matter on what process the close happens and a
env chan will always refer to its original environment group.
instead of checking addr+len >= addr, check len >= -addr so
that addr == 0 is never valid for len > 0 even if we decide
to have memory at the zero page so theres never any chance
user can pass in "nil" pointers.
put up some signs where we fall thru the switch cases in
fixfault()
sha256 is only defined for TLS1.2, however, technically, theres
no reason not to use it in TLS1.0/TLS1.1. the choice is up to
tlshand and pushtls, not the kernel.
introduce wificfg() function to convert ether->opt[] strings
to wifictl messages, which needs quoting for the value. so
etherX=type=iwl essid='something with spaces' works.
- fix missing runlock(ifc) when ifcid != a->ifcid in rxmitsols() (thanks erik quanstro)
- don't leak packets when transfering blocks from arp entry hold list to droplist
- free rest of droplist when bwrite() errors in arpenter(), remove useless checks (ifc != nil)
- free arp entry hold list from cleanarpent()
- consistent use of nil for pointers
for incoming connection, we used s->laddr to lookup the interface
for the incoming call, but this does not work when the announce
address is tcp!*!123, then s->laddr is all zeros "::". instead,
use the incoming destination address for interface mtu lookup.
thanks mycroftix for troubleshooting!
instead of ordering the source mount list, order the new destination
list which has the advantage that we do not need to wlock the source
namespace, so copying can be done in parallel and we do not need the
copy forward pointer in the Mount structure.
the Mhead back pointer in the Mount strcture was unused, removed.
there was a race between cunmount() and walk() on Mhead.from as Mhead.from was
unconditionally freed when we cunmount(), but findmount might have already
returned the Mhead in walk(). we have to ensure that Mhead.from is not freed
before the Mhead itself (now done in putmhead() once the reference count of the
Mhead drops to zero).
the Mhead struct contained two unused locks, removing.
no need to hold Pgrp.ns lock in closegrp() as nobody can get to it (refcount
droped to zero).
avoid cclose() and freemount() while holding Mhead.lock or Pgrp.ns locks as
it might block on a hung up fileserver.
remove the debug prints...
cleanup: use nil for pointers, remove redundant nil checks before putmhead().
we have to validaddr() and vmemchr() all argv[] elements a second
time when we copy to the new stack to deal with the fact that another
process can come in and modify the memory of the process doing the
exec. so the argv[] strings could have changed and increased in
length. we just make sure the data being copied will fit into the
new stack and error when we would overflow.
also make sure to free the ESEG in case the copy pass errors.
argv[] strings get copied to the new processes stack segment, which
has a maximum size of USTKSIZE, so limit the size of the strings to
that and check early for overflow.
this moves the name validation out of segattach() to syssegattach()
to make sure the segment name cannot be changed by the user while
segattach looks at it.
when executing a script, we did advance argp0 unconditionally
to replace argv[0] with the script name. this fails when
argv[] is empty, then we'd advance argp0 past the nil terminator.
the alternative would be to *not* advance if *argp0 == nil, but that
would require another validaddr() check for a case that is unlikely
to have been anticipated in most programs being invoked as
libc's ARGBEGIN macro assumes argv[0] being non-nil as it also
unconditionally advances the argv pointer.
to keep us sane, we now reject an empty argv[]. on entry, we
verify that argv[] is valid for at least two elements:
- the program name argv[0], has to be non-nil
- the first potential nil terminator in argv[1]
when argv[0] == nil, we throw Ebadarg "bad arg in system call"
the psaux driver is not used in any kernel configuration and theres
no userspace mouse daemon. i8042auxcmds() is wrong as access
to the user buffer can fault and we are holding an ilocks.
little cleanups in devkbd.
on vmware, loading a new kernel sometimes reboots when
wiggling the mouse. disabling keyboard and mouse on
shutdown fixes the issue.
make sure ps2 mouse is disabled on init, will get re-enabled
in i8042auxenable().
keyboard isnt special anymore, we can just use the devreset
entry point in the device to do the keyboard initialization,
so kbdinit()/kbdenable() are not needed anymore.
the keyboard stops sending interrupts when its fifo gets full,
which can happen on boot when keys get mashed while interrupts
are still disabled. to work arround this, call the keyboard
interrupt handler when kbd.q is starved before blocking.
add bootscreenconf(VGAscr *) function, that is called whenever
the framebuffer configuration is changed by devvga. that way, we
can pass the current setting of the framebuffer to the new
kernel when using /dev/reboot.
we already export mntauth() and mntversion(), so why not stop
being sneaky and just export mntattach() so bindmount() and
devshr can just call it directly with proper arguments being
checked.
we can also avoid handling #M attach specially in namec()
by having the devmnt's attach function do error(Enoattach).
to avoid double caching, attachimage() and setswapchan() clear
the CCACHE flag on the channel but this keeps the read ahread
state of the cache arround (until the chan gets closed), so also
call cclunk() to detach the mcp and free the read ahead state.
avoid the call to cread() when CCACHE flag is clear.
use the actual iounit returned from Ropen/Rcreate to chunk reads and writes
instead of c->mux->msize-IOHDRSZ.
dont preallocate the rpc buffers to msize, most 9p requests are rather small
(except Twrite of course). so we allocate the buffer on demand in mountio()
with some rounding to avoid frequent reallocations.
avoid malloc()/free() while holding mntalloc lock.
this changes devmnt adding mntrahread() function and some helpers
for it to do pipelined sequential read ahead for the mount cache.
basically, cread() calls mntrahread() with Mntrah structure and it
figures out if we where reading sequentially and if thats the case
issues reads of c->iounit size in advance.
the read ahead state (Mntrah) is kept in the mount cache so we can
handle (read ahead) cache invalidation in the presence of writes.
as the Fgrp can be shared with other processes, we have to
recheck the fd index after locking the Fgrp in fdclose()
to make sure not to read beyond the bounds of the fd array.
using the user buffer has a race where the user can modify
the buffer from another process before it is copied into the cache.
this allows poisoning the cache for every file where the user
has read access.
instead, we update the cache from kernel memory.
Wnode gets two new counters: txcount and txerror
and actrate pointer that will be between minrate
and maxrate.
driver should use actrate instead of maxrate for
transmission when it can provide error feedback.
when a driver detects a transmission failed, it calls
wifitxfail() with the original packet. wifitxfail() then
reduces wn->actrate.
every 256th packet, we optimistically increase wn->actrate
before transmitting.
- reject files smaller or equal to two bytes, they are bogus
- fix out of bounds access in shargs() when n <= 2
- only copy the bytes read into line buffer
- use nil for pointers instead of 0
imagereclaim(), pagereclaim():
- move imagereclaim() and pagereclaim() declarations to portfns.h
- consistently use ulong type for page counts
- name number of pages to free "pages" instead of "min"
- check for pages == 0 on entry
freepages():
- move pagechaindone() call to wakeup newpage() consumers inside
palloc critical section.
putimage():
- use long type for refcount
- reduce delay for channel hop to 200ms
- use 1000ms timeout for auth response (dont hop channels while we wait)
- bunny hop sequence is mathematically prooven
addresses va's of 0 and -BY2PG cause trouble with some memmove()/memset()
implementations and possibly other code because of the nil pointer
and end pointers wrapping to zero.
unlock()/iunlock():
we need to place the coherence() *before* "l->key = 0", so that any
stores that where done while holding the lock become observable
*before* other processors see the lock released.
cas()/tas():
place memory barrier before successfull return to prevent reordering.
making sure to close the dot in every kproc appears repetitive,
so instead stop inheriting the dot in kproc() as this is usually
never what you wanted in the first place.
give kernel processes and local disk file servers (procs
having noswap flag set) a clear advantage for page allocation
under starved condition by giving them ther own wait queue so
they get readied as soon as pages become available.
yoann padioleaus report on 9fans:
> I think I’ve found a bug in the network stack.
> in 9/ip/ip.h there is
> struct Ipht
> {
> Lock;
> Iphash *tab[Nipht];
> };
>
> where Night is 521,
>
> but then in 9/ip/ipaux.c there is
>
> ulong
> iphash(uchar *sa, ushort sp, uchar *da, ushort dp)
> {
> return ((sa[IPaddrlen-1]<<24) ^ (sp << 16) ^ (da[IPaddrlen-1]<<8) ^ dp ) % Nhash;
> }
>
> where Nhash is just 64,
prevent double sleep():
callers to sleep() need to be serialized as there can only
be one process sleeping at a time. plrlock and plwlock do
this.
wait for dma to complete in plwrite():
we have to wait for the dma to complete before touching
plbuf again.
maintain COPEN flag in archopen()/archclose():
when open fails because it was in use, clear the COPEN
flag, so archclose() wont screw stuff up.
David du Colombier wrote:
> The slowness issue only appears on the loopback, because
> it provides a 16384 MTU.
>
> There is an old bug in the Plan 9 TCP stack, were the TCP
> MSS doesn't take account the MTU for incoming connections.
>
> I originally fixed this issue in January 2015 for the Plan 9
> port on Google Compute Engine. On GCE, there is an unusual
> 1460 MTU.
>
> The Plan 9 TCP stack defines a default 1460 MSS corresponding
> to a 1500 MTU. Then, the MSS is fixed according to the MTU
> for outgoing connections, but not incoming connections.
>
> On GCE, this issue leads to IP fragmentation, but GCE didn't
> handle IP fragmentation properly, so the connections
> were dropped.
>
> On the loopback medium, I suppose this is the opposite issue.
> Since the TCP stack didn't fix the MSS in the incoming
> connection, the programs sent multiple small 1500 bytes
> IP packets instead of large 16384 IP packets, but I don't
> know why it leads to such a slowdown.
the intend of posting a note to the faulting process is to
interrupt the syscall to give the note handler a chance
to handle it. kernel processes however, have no note handlers
and all the postnote() does is set up->notepending which will
make the next attempt to sleep raise an Eintr[] error. this
is harmless, but usually not what we want.
there's no need to waste space for a error buffer in the Segio
structure, as the segmentio kproc will be waiting for the next
command after an error and will not overwite it until we issue
another command.
devproc's procctlmemio() did not handle physical segment
types correctly, as it assumed it can just kmap() the page
in question and write to it. physical segments however
need to be mapped uncached but kmap() will always map
cached as it assumes normal memory. on some machines with
aliasing memory with different cache attributes
leads to undefined behaviour!
we borrow the code from devsegment and provide a generic
segio() function to read and write user segments which
handles all the cases without using kmap by just spawning
a kproc that attaches the segment that needs to be read
from or written to. fault() will setup the right mmu
attributes for us. it will also properly flush pages for
segments that maintain instruction cache when written.
however, tlb's have to be flushed separately.
segio() is used for devsegment and devproc now, which
also allows for simplification of fixfault() as there is no
special error handling case anymore as fixfault() is now
called from faulting process *only*.
reads from /proc/$pid/mem can now span multiple pages.
code like "return g->dlen;" is wrong as we do not hold the
qlock of the global segment. another process could come in
and override g->dlen making us return the wrong byte count.
avoid copying when we already got a kernel address (kernel memory
is the same on processes) which is the case with bread()/bwrite().
this is the same optimization that devsd does.
also avoid allocating/freeing and copying while holding the qlock.
when we copy to/from user memory, we might fault preventing
others from accessing the segment while fault handling is in
progress.
walking the freelist for every page is too slow. as we
are freeing a range, we can do a single pass unlinking all
pages in our range and at the end, check if all pages
where freed, if not put the pages that we did free back
and retry, otherwise we'r done.
fixed segments are continuous in physical memory but
allocated in user pages. unlike shared segments, they
are not allocated on demand but the pages are allocated
on creation time (devsegment). fixed segments are
never swapped out, segfreed or resized and can only be
destroyed as a whole.
the physical base address can be discovered by userspace
reading the ctl file in devsegment.
this avoids listing the upper half of 64-bit membars
in Pcidev.mem[] array avoiding potential confusion
in drivers.
we also check if the upper half is programmed to zero
by bios and otherwise zap the entry in Pcidev.mem[]
and print a warning.
qemu puts multiboot data after the end of the kernel image, so
to be able to KADDR() that memory early, we extend the initial
identity mapping by 16K. right now we just got lucky with
the pc kernel as it rounds the map to 4MB pages.
when we switch to graphics mode, we do not want graphical arcs console
to print on the screen anymore as it assumes 8bit color mode and just
messes up the screen on kernel prints.
GEVector() saves the exception return PC in Ureg.r27 which needs
to be preserved.
there should be no reason for the user to change the status
register from noted() eigther, so we now just use setregisters()
in noted() to restore previous general purpose registers. this
means that CU1 will always be off after noted() because notify()
has disabled the FPU on entry and set fpstatus to FPinactive
if it was on. once user starts using FPU again, it will trap and
restore fpu registers.
touching transmit descriptors while dma is running causes the
front to fall off. new approach keeps a counter of free
descriptors in the Ring structure that is incremented
by txintr() when transmit completed.
txintr() will clean descriptors once dma has stopped and
restart dma when there are more descrtors in the chain.
this provides basic console support using the ARC bios routines
theu uartarcs driver. and has native seeq ethernet driver which
was written by reading the 2ed devseq driver as i have no
documentation on the hardware. mmu and trap code is based on the
routerboard kernel.
bootmkfile will now looks for the following proto files in order
and pick the first one it finds to build the bootfs.paq file:
1) $CONF.boofs.proto (config specific)
2) bootfs.proto (kernel specific)
3) $BOOTDIR/bootfs.proto (default generic)
the mount cache uses Page.va to store cached range offset and
limit, but mips kernel uses cache index bits from Page.va to
maintain page coloring. Page.va was not initialized by auxpage().
this change removes auxpage() which was primarily used only
by the mount cache and use newpage() with cache file offset
page as va so we will get a page of the right color.
mount cache keeps the index bits intact by only using the top
and buttom PGSHIFT bits of Page.va for the range offset/limit.
when we are skipping a process because we could not acquire
its segment lock, dont call reclaim() again (which is pointless
as we didnt pageout any pages), instead try the next process.
the Pte.last pointer is inclusive, so don't miss the last page
in pageout().
when building bootfs in d770 mode directory, the other permissions
in bootfs paq are masked off which results in boot to fail. theres
no point in checking group/other permissions on boot, so just disable
permissin checking in paqfs with the -a flag.
mcountseg(), mfreeseg():
use Pte.first/last pointers when possible and avoid constructs
like s->map[i]->pages[j].
freepte():
do not zero entries in freepte(), the segment is going away and
here is no point in zeroing page pointers. hoist common code at
the top avoiding duplication.
segpage(), fixfault():
avoid load after store for Pte** pointer.
fixfault():
return -1 in default case to avoid the "used but not set" warning
for mmuphys and get rid of the useless initialization.
syssegflush():
due to len being unsigned, the pe = PGROUND(pe) can make "chunk"
bigger than len causing a overflow. rewrite the function and deal
with page alignment and errors at the beginning.
syssegflush(), segpage(), fixfault(), putseg(), relocateseg(),
mcountseg(), mfreeseg():
keep naming consistent.
the "to" address can overflow in syssegfree() causing wrong
number of pages to be passed to mfreeseg(). with the current
implementation of mfreeseg() however, this doesnt cause any
data corruption but was just freeing an unexpected number of
pages.
this change checks for this condition in syssegfree() and
errors out instead. also mfreeseg() was changed to take
ulong argument for number of pages instead of int to keep
it consistent with other routines that work with page counts.
sdbio() tests if it can pass the buffer pointer directly to
the driver when it is already in kernel memory. we also need
to check if the buffer is properly aligned but alignment
requirement is handled in system specific sdmalloc() and
was not known to devsd.
to solve this, we *always* page align sd buffers and get rid
of the system specific sdmalloc() macro (was only used in bcm
kernel).
ignore physical segments in mcountseg() and mfreeseg(). physical
segments are not backed by user pages, and doing putpage() on
physical segment pages in mfreeseg() is an error.
do now allow physical segemnts to be resized. the segment size
is only checked in segattach() to be within the physical segment!
ignore physical segments in portcountpagerefs() as pagenumber()
does not work on the malloced page structures of a physical segment.
get rid of Physseg.pgalloc() and Physseg.pgfree() indirection as
this was never used and if theres a need to do more efficient
allocation, it should be done in a portable way.