The new pci code is moved to port/pci.[hc] and shared by
all ports.
Each port has its own PCI controller implementation,
providing the pcicfgrw*() functions for low level pci
config space access. The locking for pcicfgrw*() is now
done by the caller (only port/pci.c).
Device drivers now need to include "../port/pci.h" in
addition to "io.h".
The new code now checks bridge windows and membars,
while enumerating the bus, giving the pc driver a chance
to re-assign them. This is needed because some UEFI
implementations fail to assign the bars for some devices,
so we need to do it outselfs. (See pcireservemem()).
While working on this, it was discovered that the pci
code assimed the smallest I/O bar size is 16 (pcibarsize()),
which is wrong. I/O bars can be as small as 4 bytes.
Bit 1 in an I/O bar is also reserved and should be masked off,
making the port mask: port = bar & ~3;
we have to disable interrupts during mmuwalk() of user pages
as we can get preempted during mmu walk and the original
m->pml4 might become one of a different process.
the page attribute table was initialized in mmuinit(), which is
too late for bootscreen(). So now we check for PAT support and
insert the write-combine entry early in cpuidentify().
this might have been the cause of some slow EFI framebuffers on
machines with overlapping or insufficient MTRR entries.
The swcursor used a 32x32 image for saving/restoring
screen contents for no reason.
Add a doflush argument to swcursorhide(), so that
disabling software cursor with a double buffered
softscreen is properly hidden. The doflush parameter
should be set to 0 in all other cases as swcursordraw()
will flushes both (current and previours) locations.
Make sure swcursorinit() and swcursorhide() clear the
visibility flag, even when gscreen is nil.
Remove the cursor locking and just do everything within
the drawlock. All cursor functions such as curson(),
cursoff() and setcursor() will be called drawlock
locked. This also means &cursor can be read.
Fix devmouse cursor reads and writes. We now have the
global cursor variable that is only modified under
the drawlock. So copy under drawlock.
Move the pc software cursor implementation into vgasoft
driver, so screen.c does not need to handle it as
a special case.
Remove unused functions such as drawhasclients().
most pc's are multiprocessors these days, that use apic or
msi interrupts, then the irq does not matter anymore. and
uefi does not even assign irq to pci devices anymore. if
we have a problem enabling an interrupt, we will print.
This replaces the memory map code for both pc and pc64
kernels with a unified implementation using the new
portable memory map code.
The main motivation is to be robust against broken
e820 memory maps by the bios and delay the Conf.mem[]
allocation after archinit(), so mp and acpi tables
can be reserved and excluded from user memory.
There are a few changes:
new memreserve() function has been added for archinit()
to reserve bios and acpi tables.
upareserve() has been replaced by upaalloc(), which now
has an address argument.
umbrwmalloc() and umbmalloc() have been replaced by
umballoc().
both upaalloc() and umballoc() return physical addresses
or -1 on error. the physical address -1 is now used as
a sentinel value instead of 0 when dealing with physical
addresses.
archmp and archacpi now always use vmap() to access
the bios tables and reserve the ranges. more overflow
checks have been added.
ramscan() has been rewritten using vmap().
to handle the population of kernel memory, pc and pc64
now have pmap() and punmap() functions to do permanent
mappings.
replace machine specific userinit() by a portable
implemntation that uses kproc() to create the first
process. the initcode text is mapped using kmap(),
so there is no need for machine specific tmpmap()
functions.
initcode stack preparation should be done in init0()
where the stack is mapped and can be accessed directly.
replacing the machine specific userinit() allows some
big simplifications as sysrfork() and kproc() are now
the only callers of newproc() and we can avoid initializing
fields that we know are being initialized by these
callers.
rename autogenerated init.h and reboot.h headers.
the initcode[] and rebootcode[] blobs are now in *.i
files and hex generation was moved to portmkfile. the
machine specific mkfile only needs to specify how to
build rebootcode.out and initcode.out.
when a process does an exec syscall, procsetup() is called and
we have to disable the debug watchpoint registers. just clearing
p->dr is not enougth as we are not going thru a procsave() and
procrestore() cycle which would disable and reload the saved
debug registers.
instead of clearing debug registers in procfork(), we should
clear the saved debug registers before a process goes to die
(pexit() calls sched() with up->state = Moribund) as the Proc
structure can get reused for kernel processes (kproc) which
never call procfork() and would therefore have debug registers
loaded.
pexit() and pprint() can get called outside of a syscall
(from procctl()) with a process that is in active note
handling and require floating point in the kernel on amd64
for aesni (devtls).
the idea is to catch bugs and make kernel exploitation
harder by mapping the kernel text section readonly
and everything else no-execute.
l.s maps the KZERO address space using 2MB pages so
to get the 4K granularity for the text section we use
the new ptesplit() function to split that mapping up.
we need to set EFER no-execute enable bit early
in apbootstrap so secondary application processors
will understand the NX bit in our shared kernel page
tables. also APBOOTSTRAP needs to be kept executable.
rebootjump() needs to mark REBOOTADDR page executable.
fault() now has an additional pc argument that is
used to detect fault on a non-executable segment.
that is, we check on read fault if the segment
has the SG_NOEXEC attribute and the program counter
is within faulting page.
a portable SG_NOEXEC segment attribute was added to allow
non-executable (physical) segments. which will set the
PTENOEXEC bits for putmmu().
in the future, this can be used to make non-executable
stack / bss segments.
the SG_DEVICE attribute was added to distinguish between
mmio regions and uncached memory. only matterns on arm64.
on arm, theres the issue that PTEUNCACHED would have
no bits set when using the hardware bit definitions.
this is the reason bcm, kw, teg2 and omap kernels use
arteficial PTE constants. on zynq, the XN bit was used
as a hack to give PTEUNCACHED a non-zero value and when
the bit is clear then cache attributes where added to
the pte.
to fix this, PTECACHED constant was added.
the portable mmu code in fault.c will now explicitely set
PTECACHED bits for cached memory and PTEUNCACHED for
uncached memory. that way the hardware bit definitions
can be used everywhere.
preallocate 2% of user pages for page tables and MMU structures
and keep them mapped in the VMAP range. this leaves more space
in the KZERO window and avoids running out of kernel memory on
machines with large amounts of memory.
the temporary stack segment used to be at a fixed address above or
below the user stack. these days, the temp stack is mapped dynamically
by sysexec so TSTKTOP is obsolete.
instead of having application processors spin in mpshutdown()
with mmu on, and be subject to reboot() overriding kernel text
and modifying page tables, park the application processors in
rebootcode idle loop with the mmu off.
pcienable() puts a device in fully powered on state
and does some missing initialization that UEFI might
have skipped such as I/O and Memory requests being
disabled.
pcidisable() is ment to shutdown the device, but
currently just disables dma to prevent accidents.
nobody passes us the "RSD PTR " address when doing multiboot/kexec
on UEFI systems. so we search for it manually in the ACPI reserved
area as indicated in the e820 memory map.
this driver makes regions of physical memory accessible as a disk.
to use it, ramdiskinit() has to be called before confinit(), so
that conf.mem[] banks can be reserved. currently, only pc and pc64
kernel use it, but otherwise the implementation is portable.
ramdisks are not zeroed when allocated, so that the contents are
preserved across warm reboots.
to not waste memory, physical segments do not allocate Page structures
or populate the segment pte's anymore. theres also a new SG_CHACHED
attribute.
fpurestore() unconditionally changed fpstate to FPinactive when
the kernel used the FPU. but in the FPinit case, the registers are
not saved by mathemu(), resulting in all zero initialized registers
being loaded once userspace uses the FPU so the process would have
wrong MXCR value.
the index overflow check was wrong with using shifted value.
the only architecture dependence of devether was enabling interrupts,
which is now done at the end of the driver's reset() function now.
the wifi stack and dummy ethersink also go to port/.
do the IRQ2->IRQ9 hack for pc kernels in intrenabale(), so not
every caller of intrenable() has to be aware of it.
flushing tlb once the index wraps arround is not enougth
as in use pte's can be speculatively loaded. so instead
use invlpg() and explicitely invalidate the tlb of the
page mapped.
this fixes wired mount cache corruption for reads approaching
2MB which is the size of the KMAP window.
invlpg() was broken, using wrong operand type.