on thinkpad x1v4, the PCMP structure resides in upper reserved memory
pa=0xd7f49000 - while system memory ends at 0x0ffff000; so we have to
vmap() it instead of KADDR().
the RSD structure for ACPI might reside in low memory, so we sould
KADDR() in that case.
on some modern machines like the x250, the bios arranges the mtrr's
and the framebuffer membar in a way that doesnt allow us to mark
the framebuffer pages as write combining, leading to slow graphics.
since the pentium III, the processor interprets the page table bit
combinations of the WT, CD and bit7 bits as an index into the
page attribute table (PAT).
to not change the semantics of the WT and CD bits, we preserve
the bit patterns 0-3 and use the last entry 7 for write combining.
(done in mmuinit() for each core).
the new patwc() function takes virtual address range and changes
the page table marking the range as write combining. no attempt
is made on invalidating tlb's. doesnt matter in our case as the
following mtrr() call in screen.c does it for us.
apparently, this causes some quadcore ramnode vm to hang on boot,
even tho all cores successfully started up and are operational.
i suspect some side effect from timersinit()... this would also
mean *notsc= would break it (syncclock() would continue)...
its unclear.
i'm reverting this for now until the problem is better understood.
when testing in qemu, launching each ap became slower and slower
because all the ap's where spinning in syncclock() waiting for
cpu0 to update its mach0->tscticks, which happens only much later
after all cpu's have been started up.
now we wait for each cpu to do its timer callibration and
manually update our tscticks while we wait and each cpu will
not spin but halt while waiting for active.thunderbirdsarego.
this reduces the system load and noise for timer callibration
and makes the mp startup linear with regard to the number of
cores.
introduce cpushutdown() function that does the common
operation of initiating shutdown, returning once all
cpu's got the message and are about to shutdown. this
avoids duplicated code which isnt really machine specific.
automatic reboot on panic only when *debug= is not set
and the machine is a cpu server or has no display,
otherwise just hang.
the psaux driver is not used in any kernel configuration and theres
no userspace mouse daemon. i8042auxcmds() is wrong as access
to the user buffer can fault and we are holding an ilocks.
little cleanups in devkbd.
on vmware, loading a new kernel sometimes reboots when
wiggling the mouse. disabling keyboard and mouse on
shutdown fixes the issue.
make sure ps2 mouse is disabled on init, will get re-enabled
in i8042auxenable().
keyboard isnt special anymore, we can just use the devreset
entry point in the device to do the keyboard initialization,
so kbdinit()/kbdenable() are not needed anymore.
Wnode gets two new counters: txcount and txerror
and actrate pointer that will be between minrate
and maxrate.
driver should use actrate instead of maxrate for
transmission when it can provide error feedback.
when a driver detects a transmission failed, it calls
wifitxfail() with the original packet. wifitxfail() then
reduces wn->actrate.
every 256th packet, we optimistically increase wn->actrate
before transmitting.
qemu puts multiboot data after the end of the kernel image, so
to be able to KADDR() that memory early, we extend the initial
identity mapping by 16K. right now we just got lucky with
the pc kernel as it rounds the map to 4MB pages.
the following hooks have been added to the ehci Ctlr
structore to handle cache coherency (on arm):
void* (*tdalloc)(ulong,int,ulong);
void* (*dmaalloc)(ulong);
void (*dmafree)(void*);
void (*dmaflush)(int,void*,ulong);
tdalloc() is used to allocate descriptors and the periodic
frame schedule array. on arm, this needs to return uncached
memory. tdalloc()ed memory is never freed.
dmaalloc()/dmafree() is used for io buffers. this can return
cached memory when when hardware maintains cache coherency (pc)
or dmaflush() is provided to flush/invalidate the cache (zynq),
otherwise needs to return uncached memory.
dmaflush() is used to flush/invalidate the cache. the first
argument tells us if we need to flush (non zero) or
invalidate (zero).
uncached.h is gone now. this change makes the handling explicit.
there are no kernels currently that do page coloring,
so the only use of cachectl[] is flushing the icache
(on arm and ppc).
on pc64, cachectl consumes 32 bytes in each page resulting
in over 200 megabytes of overhead for 32gb of ram with 4K
pages.
this change removes cachectl[] and adds txtflush ulong
that is set to ~0 by pio() to instruct putmmu() to flush
the icache.
intrdisable() will always be able to unregister the interrupt
now, so there is no reason to have it return an error value.
all drivers except uart8250 already assumed it to never fail
and theres no need to maintain that complexity.
mpshutdown() used to call acpireset() making it impossible to build
a kernel without archacpi. now, mpshutdown() is a helper function
that only shuts down the application processors that gets used from
mpreset() and acpireset().
the generic machine reset code in exported by devarch's archreset()
function that is called by mpreset() and from acpireset() as a fallback.
so the code duplication that was in mpshutdown() is avoided.
there is no use for "bootdisk" variable parametrization
of /boot/boot and no point for the boot section with its
boot methods in the kernel configuration anymore. so
mkboot and boot$CONF.out are gone.
move the rules for bootfs.paq creation in 9/boot/bootmkfile.
location of bootfs.proto is now in 9/boot/bootfs.proto.
our /boot/boot target is now just "boot".
we add new function convmemsize() that returns the size of
*usable* conventional memory that does some sanity checking
and reserves the last KB below the top of memory pointer.
this avoids lowraminit() overriding potential bios tables
and sigsearch() going off the rails looking for tables
at above 640K.
traditionally, the pc kernel mapped the first 8MB of physical
address space. when the kernel size grows beyond that memory mapping,
it will crash on boot and theres no checking in the build process
making sure it fits.
with the pc64 kernel, it is not hard to always map the whole
kernel memory image from KZERO to end[], so that the kernel will
always fit into the initial mapping.
x230 booted in efi only (no csp) mode hangs
when traditional i8042reset() keyboard reset
is tried.
so we try acpireset() first which discoveres
and writes the acpi reset register.
to make it possible to mark the bootscreen framebuffer
as write combining in early initialization, mtrr() is
changed not not to error() but to return an error string.
as bootscreen() is used before multiprocessor initialization,
we have to synchronize the mtrr's for every processor as
it comes online. for this, a new mtrrsync() function is
provided that is called from cpuidentify() if mtrr support
is indicated.
the boot processor runs mtrrsync() which snarfs the
registers. later, mtrrsync() is run again from the
application processors which apply the values from the
boot processor.
checkmtrr() from mp.c was removed as its task is also
done by mtrrsync() now.
rampage() cannot be used after meminit(), so test for
conf.mem[0].npage != 0 and use xalloc()/mallocalign()
instead. this allows us to use vmap() early before
mmuinit() which is needed for bootscreeninit() and
acpi.
to get memory for page tables, pc64 needs a lowraminit().
with EFI, the RSDT pointer is passed in *acpi= parameter
from the efi loader. as the RSDT is ususally at the end of
the physical address space (and not to be found in
bios areas), we cannot KMAP() it so we need to vmap().
_sl reported crash:
stats 593: suicide: sys: trap: fault write addr=0xffffffff8258d1b0 pc=0x204cc7
; acid 593
/proc/593/text:amd64 plan 9 executable
/sys/lib/acid/port
/sys/lib/acid/amd64
acid: lstk()
notejmp(ret=0x1,j=0x40ac90)+0x13 /sys/src/libc/amd64/notejmp.c:10
alarmed(a=0xffffffff8258d1b0,s=0x7ffffeffea58)+0x3f /sys/src/cmd/stats.c:718
notifier+0x3e /sys/src/libc/port/atnotify.c:15
acid:
note how a in alarmed is a kernel address!
the first Ureg* argument is passed to the note handler in the
RARG (BX) register, which was not loaded when returning to
userspace from syscall() thru forkret(). fix by returning thru
noteret() from syscall().
the 6c compiler reserves R14 and R15 for extern register variables,
which is used by the kernel to hold the m and up pointers. until
now, the meaning of R14 and R15 was undefined for userspace and
extern register would not work as the kernel trashes R14 and R15
on syscalls. with this change, user extern registers R14 and R15
are zeroed on exec and otherwise preserved across syscalls. so
userspace *could* use them for per process variables like the
kernel does.
use Ureg.bp (RARG) for syscall number instead of Ureg.ax. this is
less confusing and mirrors the amd64 calling convention.
as with the Block refcount changes, _xinc() and _xdec() arent
used anymore, so remove them.
architecure can still define ainc()/adec() when it needs them.
the palloc.pages array takes arround 5% of the upages which
gives us:
16GB = ~0.8GB
32GB = ~1.6GB
64GB = ~3.2GB
we only have 2GB of address space above KZERO so this will not
work for long.
instead, pageinit() was altered to accept a preallocated memory
in palloc.pages. and preallocpages() in pc64/main.c allocates the
in upages memory, mapping it in the VMAP area (which has 512GB).
the drawback is that we cannot poke at Page structures now from
/proc/n/mem as the VMAP area is not accessible from it.
as we do system reset and reboot only from boot processor cpu0 now,
theres no need for active.rebooting conditional variable.
mpshutdown() will unconditionally park application processors and
and cpu0 boots the new kernel or calls mpshutdown() causing system
reset.
number of bank slots in Conf.mem[4] was too small
for kenjis machine, set it to maximum 16 (the
size of the RAM map in pc64/memory.c).
also increasing the UPA memory map to 64. the
e820 map on my x200s has 31 entries and many
holes. this gets rid of the "mapfree: ... losing"
messages on boot.
the rule that was used to copy header files from ../pc
accidently overwrote dat.h when ../pc/dat.h was updated
because it matched on all *.h files that was also found
in ../pc directory. change to exact match on $PCHEADERS
to prevent this.
do not store Block* pointer in packet descriptor, assumed
pointer would fit in a long. we use pointer table now to
record the Block* pointer and store index instead.
the comment about Physseg.size being in pages is wrong,
change type to uintptr and correct the comment.
change the length parameter of segattach() and isoverlap()
to uintptr as well. segments can grow over 4GB in pc64 now
and globalsegattach() in devsegment calculates len argument
of isoverlap() by s->top - s->bot. note that the syscall
still takes 32bit ulong argument for the length!
check for integer overflow in segattach(), make sure segment
goes not beyond USTKTOP.
change PTEMAPMEM constant to uvlong as it is used to calculate
SEGMAXSIZE.
modifying the kernel pdp (CPU0PDP) hangs vmware. so
we initialize the pdp with KZERO and KZERO+1GB map
in l.s and never change it. (except when removing
the zero double map which seems to work).
VMAP has its own pdp now allowing to map 512GB of
physical address space. this simplifies the code
a bit and gives nice virtual addresses.
on intel processors, a general protection exception is fired if a non-canonical address is loaded into PC during SYSRET. this will cause the kernel to panic.
see http://www.kb.cert.org/vuls/id/649219 and the intel software developer manual for more information.
kmapindex has to be per process, not per mach, as the process
can be switched to another processor while the mapping is
established.
to bootstrap the first process, we have to temporarily set up
so the kmap MMU's can be attached to the process. previously
we assumed that the first two pages for the initial process
where below 2GB and could be accessed with KADDR() directly.
with 16GB machine, all the 2GB above KZERO are dedicated to
the kernel so the user pages returned by newpage() need to
be mapped.
we have to keep kmap page tables in ther own list
because user tables are subject to (virtual) tlb flushing.
we never free kmap page tables except in mmurelease()
where we just link the kmap mmu list in front of the user
mmus and call mmufree() which will free all the mmu's
of the process.