The new pci code is moved to port/pci.[hc] and shared by
all ports.
Each port has its own PCI controller implementation,
providing the pcicfgrw*() functions for low level pci
config space access. The locking for pcicfgrw*() is now
done by the caller (only port/pci.c).
Device drivers now need to include "../port/pci.h" in
addition to "io.h".
The new code now checks bridge windows and membars,
while enumerating the bus, giving the pc driver a chance
to re-assign them. This is needed because some UEFI
implementations fail to assign the bars for some devices,
so we need to do it outselfs. (See pcireservemem()).
While working on this, it was discovered that the pci
code assimed the smallest I/O bar size is 16 (pcibarsize()),
which is wrong. I/O bars can be as small as 4 bytes.
Bit 1 in an I/O bar is also reserved and should be masked off,
making the port mask: port = bar & ~3;
widen and move the KMAP window to a new address so we can
handle the 8GB of physical memory of the new raspberry pi4.
the new memory map on pi4 uses the following 4 banks:
0x000000000 0x03e600000
0x040000000 0x0fc000000 <- soc.dramsize (only < 4GB)
0x100000000 0x180000000
0x180000000 0x200000000
On the 8GB variant of the raspberry pi 4,
the eeprom chip for the xhci controller is missing and
instead loaded from sdram (by the gpu firmware).
for this, the gpu firmware needs to be notified of
the xhci controllers pci bus address (after reset)
that was assigned by our pci enumeration code.
spectacular bug. cmpswap() had a sign extension bug
using sign extending MOV to load the old compare
value and LDXRW using zero extension while the CMP
instruction compared 64 bit registers.
this caused cmpswap with negative old value always
to fail.
interestingly, libc's version of this function was
fine.
replace machine specific userinit() by a portable
implemntation that uses kproc() to create the first
process. the initcode text is mapped using kmap(),
so there is no need for machine specific tmpmap()
functions.
initcode stack preparation should be done in init0()
where the stack is mapped and can be accessed directly.
replacing the machine specific userinit() allows some
big simplifications as sysrfork() and kproc() are now
the only callers of newproc() and we can avoid initializing
fields that we know are being initialized by these
callers.
rename autogenerated init.h and reboot.h headers.
the initcode[] and rebootcode[] blobs are now in *.i
files and hex generation was moved to portmkfile. the
machine specific mkfile only needs to specify how to
build rebootcode.out and initcode.out.
there was a small window between modifying mmutop and switching the
asid where the core could bring in the new entries under the old asid
into the tlb due to speculation / prefetching.
this change moves the entering of the page tables into mmutop after
setttbr() to prevent this scenario.
due to us switching to the resereved asid 0 on procsave()->putasid(),
the only asid that could have potentially been poisoned would be asid 0
which does not have any user mappings. so this did not show any noticable
effect.
fault() now has an additional pc argument that is
used to detect fault on a non-executable segment.
that is, we check on read fault if the segment
has the SG_NOEXEC attribute and the program counter
is within faulting page.
a portable SG_NOEXEC segment attribute was added to allow
non-executable (physical) segments. which will set the
PTENOEXEC bits for putmmu().
in the future, this can be used to make non-executable
stack / bss segments.
the SG_DEVICE attribute was added to distinguish between
mmio regions and uncached memory. only matterns on arm64.
on arm, theres the issue that PTEUNCACHED would have
no bits set when using the hardware bit definitions.
this is the reason bcm, kw, teg2 and omap kernels use
arteficial PTE constants. on zynq, the XN bit was used
as a hack to give PTEUNCACHED a non-zero value and when
the bit is clear then cache attributes where added to
the pte.
to fix this, PTECACHED constant was added.
the portable mmu code in fault.c will now explicitely set
PTECACHED bits for cached memory and PTEUNCACHED for
uncached memory. that way the hardware bit definitions
can be used everywhere.
on the 2GB and 4GB raspberry pi 4 variants, there are two
memory regions for ram:
[0x00000000..0x3e600000)
[0x40000000..0xfc000000)
the framebuffer is somewhere at the end of the first
GB of memory.
to handle these, we append the region base and limit
of the second region to *maxmem= like:
*maxmem=0x3e600000 0x40000000 0xfc000000
the mmu code has been changed to have non-existing
ram unmapped and mmukmap() now uses small 64K pages
instead of 512GB pages to avoid aliasing (framebuffer).
the VIRTPCI mapping has been removed as we now have
a proper vmap() implementation which assigns vritual
addresses automatically.
this adds a 4GB KMAP window into the kernel address space
so we can access all physical ram on raspberry pi 4 for
user pages.
note that kernel memory above KZERO is still limited
to 1GB because of DMA restrictions.
the raspberry pi 4 has a new interrupt controller and
pci support, so get rid of intrenable() macro and
properly make intrenable function with tbdf argument.
the raspberry pi4 firmware refuses to enable the GIC interrup controller
for arm64 when the .img file is not a multiple of 4 bytes. yes, this
is insane and nowhere documented.
the new raspberry pi 4 firmware for arm64 seems to have
broken atag support. so we now parse the device tree
structure to get the bootargs and memory configuration.
when the exclusive monitor is cleared, a event is generated
which we can use to wake up idlehands. that way we do not
need to wait for the next timer interrupt until a cpu takes
work from the run queue.