To avoid a MAXMACH limit of 32 and make
txtflush into an array for the bitmap.
Provide portable macros for testing and clearing
the bits: needtxtflush(), donetxtflush().
On pc/pc64, define inittxtflush()/settxtflush()
as no-op macros, avoiding the storage overhead of
the txtflush array alltogether.
SSL is implemented by devssl. It's extremely
obsolete by now, and is not used anywhere but
cpu, import, and oexportfs.
This change strips out the devssl bits, but
does not (yet) remove the code from libsec.
This makes vmap()/vunmap() take a vlong size argument,
and change the type of Pci.mem[].size to vlong as well.
Even if vmap() wont support large mappings, it is nice to
get the original unruncated value for error checking.
pc64 needs a bigger VMAP window, as system76 pangolin
puts the framebuffer at a physical address > 512GB.
The new interface uses pci capability structures to locate the
registers in a rather fine granular way making it more complicated
as they can be located anywhere in any pci bar at any offset.
As far as i can see, qemu (6.0.50) never uses i/o bars in
non-legacy mode, so only mmio is implemented for now.
The previous virtio drivers implemented the legacy interface only
which uses i/o ports for all register accesses. This is still
the preferred method (and also qemu default) as it is easier to
emulate and most likely faster.
However, some vps providers like vultr force the legacy interface
to disabled with qemu -device option "disable-legacy=on" resulting
on a system without a disk and ethernet.
Remove unused fields and factor common fields into a
new PMach struct in port/portdat.h.
The fields machno, splpc and proc are not moved to
PMach as they are part of the known offsets from
assembly (l.s).
We can take advantage of the fact that xinit() allocates
kernel memory from conf.mem[] banks always at the beginning
of a bank, so the separate palloc.mem[] array can be eleminated
as we can calculate the amount of non-kernel memory like:
upages = cm->npage - (PGROUND(cm->klimit - cm->kbase)/BY2PG)
for the number of reserved kernel pages,
we provide the new function: ulong nkpages(Confmem*)
This eleminates the error case of running out of slots in
the array and avoids wasting memory in ports that have simple
memory configurations (compared to pc/pc64).
This adds the new function pointer PCArch.clockinit(),
which is a timer dependent initialization routine.
It also takes over the job of guesscpuhz(). This way, the
architecture ident code can switch between different
timers (i8253, HPET and XEN timer).
we might as well handle the per process cycle
counter in the portable part instead of duplicating the code
in every arch and have inconsistent implementations.
we now have a portable kenter() and kexit() function,
that is ment to be used in trap/syscall from user,
which updates the counters.
some kernels missed initializing Mach.cyclefreq.
the tulip driver is used in microsofts hypver-v
as the legacy ethernet adapter for pxe booting.
to make the driver work on pc64, we need to
store the Block* pointers in a separate array
instead of stuffing them into buffer address 2
of the hardware descriptor.
also, enable the driver in the pc64 kernel.
This implements proper intrdisable() support for all
interrupt controllers.
For enable, (*arch->intrassign)(Vctl*) fills in the
Vctl.enable and Vctl.disable pointers with the
appropriate routines and returns the assigned
vector number.
Once the Vctl struct has been linked to its vector
chain, Vctl.enable(Vctl*, shared) gets called with a
flag if the vector has been already enabled (shared).
This order is important here as enabling the interrupt
on the controller before we have linked the chain can
cause spurious interrupts, expecially on mp system
where the interrupt can target a different cpu than
the caller of intrenable().
The intrdisable() case is the other way around.
We first disable the interrupt on the controller
and after that unlink the Vctl from the chain.
On a multiprocessor, the xfree() of the Vctl struct
is delayed to avoid freeing it while it is still
in use by another cpu.
The xen port now also uses pc/irq.c which has been
made generic enougth to handle xen's irq scheme.
Also, archgeneric is now a separate file to avoid
pulling in dependencies from the 8259 interrupt
controller code.
It appears that our IDT overlaps with the data structures
passed from grub in multiboot load.
So defer setup of the interrupt table after the multiboot
parameters have been processed.
loading the interrupt vector table early allows
us to handle traps during bootup before mmuinit()
which gives better diagnostics for debugging.
we also can handle general protection fault on
rdmsr() and wrmsr() which helps during
cpuidentify() and archinit() when probing for
cpu features.
With some newer UEFI firmware, not all pci bars get
programmed and we have to assign them ourselfs.
This was already done for memory bars. This change
adds the same for i/o port space, by providing a
ioreservewin() function which can be used to allocate
port space within the parent pci-pci bridge window.
Also, the pci code now allocates the pci config
space i/o ports 0xCF8/0xCFC so userspace needs to
use devpnp to access pci config space now. (see
latest realemu change).
Also, this moves the ioalloc()/iofree() code out
of devarch into port/iomap.c as it can be shared
with the ppc mtx kernel.
The new pci code is moved to port/pci.[hc] and shared by
all ports.
Each port has its own PCI controller implementation,
providing the pcicfgrw*() functions for low level pci
config space access. The locking for pcicfgrw*() is now
done by the caller (only port/pci.c).
Device drivers now need to include "../port/pci.h" in
addition to "io.h".
The new code now checks bridge windows and membars,
while enumerating the bus, giving the pc driver a chance
to re-assign them. This is needed because some UEFI
implementations fail to assign the bars for some devices,
so we need to do it outselfs. (See pcireservemem()).
While working on this, it was discovered that the pci
code assimed the smallest I/O bar size is 16 (pcibarsize()),
which is wrong. I/O bars can be as small as 4 bytes.
Bit 1 in an I/O bar is also reserved and should be masked off,
making the port mask: port = bar & ~3;
we have to disable interrupts during mmuwalk() of user pages
as we can get preempted during mmu walk and the original
m->pml4 might become one of a different process.
the page attribute table was initialized in mmuinit(), which is
too late for bootscreen(). So now we check for PAT support and
insert the write-combine entry early in cpuidentify().
this might have been the cause of some slow EFI framebuffers on
machines with overlapping or insufficient MTRR entries.
The swcursor used a 32x32 image for saving/restoring
screen contents for no reason.
Add a doflush argument to swcursorhide(), so that
disabling software cursor with a double buffered
softscreen is properly hidden. The doflush parameter
should be set to 0 in all other cases as swcursordraw()
will flushes both (current and previours) locations.
Make sure swcursorinit() and swcursorhide() clear the
visibility flag, even when gscreen is nil.
Remove the cursor locking and just do everything within
the drawlock. All cursor functions such as curson(),
cursoff() and setcursor() will be called drawlock
locked. This also means &cursor can be read.
Fix devmouse cursor reads and writes. We now have the
global cursor variable that is only modified under
the drawlock. So copy under drawlock.
Move the pc software cursor implementation into vgasoft
driver, so screen.c does not need to handle it as
a special case.
Remove unused functions such as drawhasclients().
most pc's are multiprocessors these days, that use apic or
msi interrupts, then the irq does not matter anymore. and
uefi does not even assign irq to pci devices anymore. if
we have a problem enabling an interrupt, we will print.
This replaces the memory map code for both pc and pc64
kernels with a unified implementation using the new
portable memory map code.
The main motivation is to be robust against broken
e820 memory maps by the bios and delay the Conf.mem[]
allocation after archinit(), so mp and acpi tables
can be reserved and excluded from user memory.
There are a few changes:
new memreserve() function has been added for archinit()
to reserve bios and acpi tables.
upareserve() has been replaced by upaalloc(), which now
has an address argument.
umbrwmalloc() and umbmalloc() have been replaced by
umballoc().
both upaalloc() and umballoc() return physical addresses
or -1 on error. the physical address -1 is now used as
a sentinel value instead of 0 when dealing with physical
addresses.
archmp and archacpi now always use vmap() to access
the bios tables and reserve the ranges. more overflow
checks have been added.
ramscan() has been rewritten using vmap().
to handle the population of kernel memory, pc and pc64
now have pmap() and punmap() functions to do permanent
mappings.
replace machine specific userinit() by a portable
implemntation that uses kproc() to create the first
process. the initcode text is mapped using kmap(),
so there is no need for machine specific tmpmap()
functions.
initcode stack preparation should be done in init0()
where the stack is mapped and can be accessed directly.
replacing the machine specific userinit() allows some
big simplifications as sysrfork() and kproc() are now
the only callers of newproc() and we can avoid initializing
fields that we know are being initialized by these
callers.
rename autogenerated init.h and reboot.h headers.
the initcode[] and rebootcode[] blobs are now in *.i
files and hex generation was moved to portmkfile. the
machine specific mkfile only needs to specify how to
build rebootcode.out and initcode.out.
when a process does an exec syscall, procsetup() is called and
we have to disable the debug watchpoint registers. just clearing
p->dr is not enougth as we are not going thru a procsave() and
procrestore() cycle which would disable and reload the saved
debug registers.
instead of clearing debug registers in procfork(), we should
clear the saved debug registers before a process goes to die
(pexit() calls sched() with up->state = Moribund) as the Proc
structure can get reused for kernel processes (kproc) which
never call procfork() and would therefore have debug registers
loaded.
pexit() and pprint() can get called outside of a syscall
(from procctl()) with a process that is in active note
handling and require floating point in the kernel on amd64
for aesni (devtls).
the idea is to catch bugs and make kernel exploitation
harder by mapping the kernel text section readonly
and everything else no-execute.
l.s maps the KZERO address space using 2MB pages so
to get the 4K granularity for the text section we use
the new ptesplit() function to split that mapping up.
we need to set EFER no-execute enable bit early
in apbootstrap so secondary application processors
will understand the NX bit in our shared kernel page
tables. also APBOOTSTRAP needs to be kept executable.
rebootjump() needs to mark REBOOTADDR page executable.
fault() now has an additional pc argument that is
used to detect fault on a non-executable segment.
that is, we check on read fault if the segment
has the SG_NOEXEC attribute and the program counter
is within faulting page.