according to the following linux change, BCM2711 uses a different
method for changing pullup/down mode:
abcfd09286 (diff-cf078559c38543ac72c5db99323e236d)
gpiomeminit() was broken, using virtual address for the gpio physseg
instead of the physical one.
cleanup the code, avoid repetition by declaring static u32int *regs
variable. make local variable names consistent.
the raspberry pi 4 has a new interrupt controller and
pci support, so get rid of intrenable() macro and
properly make intrenable function with tbdf argument.
the raspberry pi4 firmware refuses to enable the GIC interrup controller
for arm64 when the .img file is not a multiple of 4 bytes. yes, this
is insane and nowhere documented.
the new raspberry pi 4 firmware for arm64 seems to have
broken atag support. so we now parse the device tree
structure to get the bootargs and memory configuration.
Ori Bernstein had Sunrise Point-H USB 3.0 xHCI Controller that would mysteriously
crash on the 5th ENABLESLOT command. This was reproducable by even just allocating
slots in a loop right after init.
It turns out, the 1.2 spec extended the Max Scratchpad Buffers in HCSPARAMS2 so our
driver would not allocate enougth scratchpad buffers and controller firmware would
crash once it went beyond our allocated scratchpad buffer array.
This change also fixes:
- ignore bits 16:31 in PAGESIZE register
- preserve bits 10:31 in the CONFIG register
- handle ADDESSDEV command failure (so it can be retried)
preallocate 2% of user pages for page tables and MMU structures
and keep them mapped in the VMAP range. this leaves more space
in the KZERO window and avoids running out of kernel memory on
machines with large amounts of memory.
always clean AND invalidate caches before dma read,
never just invalidate as the buffer might not be
aligned to cache lines...
we have to invalidate caches again *AFTER* the dma
read has completed. the processor can bring in data
speculatively into the cache while the dma in in
flight.
we override atag memory on reboot, so preserve
the memsize learned from atag as *maxmem plan9
variable. the global memsize variable is not
needed anymore.
avoid trashing the following atag when zero
terminating the cmdline string.
zero memory after plan9.ini variables.
the Ipselftab is designed to not require locking on read
operation. locking the selftab in ipselftabread() risks
deadlock when accessing the user buffer creates a fault.
remove unused fields from the Ipself struct.
initialize the rate limits when the device gets
bound, not when it is created. so that the
rate limtis get reset to default when the ifc
is reused.
adjust the burst delay when the mtu is changed.
this is to make sure that we allow at least one
full sized packet burst.
make a local copy of ifc->m before doing nil
check as it can change under us when we do
not have the ifc locked.
specify Ebound[] and Eunbound[] error strings
and use them consistently.
remove references to the unused Conv.car qlock.
ipifcregisterproxy() is called with the proxy
ifc wlock'd, which means we cannot acquire the
rwlock of the interfaces that will proxy for us
because it is allowed to rlock() multiple ifc's
in any order. to get arround this, we use canrlock()
and skip the interface when we cannot acquire the
lock.
the ifc should get wlock'd only when we are about
to modify the ifc or its lifc chain. that is when
adding or removing addresses. wlock is not required
when we addresses to the selfcache, which has its
own qlock.
mark reader process pointers with (void*)-1 to mean
not started yet. this avoids the race condition when
media unbind happens before the kproc has set its
Proc* pointer. then we would not post the note and
the reader would continue running after unbind.
etherbind can be simplified by reading the #lX/addr
file to get the mac address, avoiding the temporary
buffer.
when the exclusive monitor is cleared, a event is generated
which we can use to wake up idlehands. that way we do not
need to wait for the next timer interrupt until a cpu takes
work from the run queue.
we did not interpret the $rootdir and $rootspec environment
variables right. $rootdir is what gets bound to / (usually /root)
and $rootspec is the mountspec of /root.
i'v been seeing the error condition described above in the
Slowbulkin comment. so i'm enabling the work arround which
seems to fix the lockup.
in the split transaction case where we want to start the
transaction at frame start, acquire the ctlr lock *before*
checking if we are in the right frame number. so the start
will happen atomically. checking the software ctlr->sofchan
instead of checking the interrupt mask register seems to
be quicker.
setting the haint mask bit for the chan under ctlr lock
in chanio() instead of chanwait() avoids needing to acquire
the ctlr lock twice.
mask wakechan bits with busychan bitmap in interrupt handlers
so we will not try to wake up released chans by accident.
sleep() and tsleep() might get interrupted so we have to
release the split qlock in the split transaction case and
in all cases, make sure to halt the channel before release.
add some common debug functions to dump channel and controller
registers.
we have to ensure that all stores saving the process state
have completed before setting up->mach = nil in the scheduler.
otherwise, another cpu could observe up->mach == nil while
the stores such as the processes p->sched label have not finnished.
attached is a patch to fix receive in the 8169 chip on my thinkpad
A485. i'm not sure why, but the same thing was done in 3d56a0fc4645
for Macv45.
nick
some control transactions can confuse the xhci controller so
much that it even fails to respond to command abort or STOPEP
control command. with no way for us to abort the transaction
but a full controller reset.
we give the controller 5 seconds to abort our initial
transaction and if that fails we wake the recover process
to reset the controller.
thanks mischief for testing.
the temporary stack segment used to be at a fixed address above or
below the user stack. these days, the temp stack is mapped dynamically
by sysexec so TSTKTOP is obsolete.
between being commited to a machno and having acquired the lock, the
scheduler could come in an schedule us on a different processor. the
solution is to have dtmachlock() take a special -1 argument to mean
"current mach" and return the actual mach number after the lock has
been acquired and interrupts being disabled.
using ~IP_DF mask to select offset and "more fragments" bits
includes the evil bit 15. so instead define a constant IP_FO
for the fragment offset bits and use (IP_MF|IP_FO). that way
the evil bit gets ignored and doesnt cause any useless calls
to ipreassemble().
tested on a t43 with igfx and a 1600x1200 t43p screen
what works: lvds, blanking
what doesn't: hwgc (not visible), snarfing edid
untested: vga
based on realemu traces.
unfraglen() had the side effect that it would always copy the
nexthdr field from the fragment header to the previous nexthdr
field. this is fine when we reassemble packets but breaks
fragments that we want to just forward unchanged.
given that we now keep the block size consistent with the
ip packet size, the variable header part of the ip packet
is just: BLEN(bp) - fp->flen == fp->hlen.
fix bug in ip6reassemble() in the non-fragmented case:
reload ih after ip header was moved before writing ih->ploadlen.
use concatbloc() instead of pullupblock().
some protocols assume that Ip4hdr.length[] and Ip6hdr.ploadlen[]
are valid and not out of range within the block but this has
not been verified. also, the ipv4 and ipv6 headers can have variable
length options, which was not considered in the fragmentation and
reassembly code.
to make this sane, ipiput4() and ipiput6() now verify that everything
is in range and trims to block to the expected size before it does
any further processing. now blocklen() and Ip4hdr.length[] are conistent.
ipoput4() and ipoput6() are simpler now, as they can rely on
blocklen() only, not having a special routing case.
ip fragmentation reassembly has to consider that fragments could
arrive with different ip header options, so we store the header+option
size in new Ipfrag.hlen field.
unfraglen() has to make sure not to run past the buffer, and hadle
the case when it encounters multiple fragment headers.
all screen implementations use a Memimage* internally
for the framebuffer, so we can return a shared reference
to its Memdata structure in attachscreen() instead of
a framebuffer data pointer.
this eleminates the softscreen == 0xa110c hack as we
always use shared Memdata* now.
Under the normal close sequence, when we receive a FIN|ACK, we enter
TIME-WAIT and respond to that LAST-ACK with an ACK. Our TCP stack would
send an ACK in response to *any* ACK, which included FIN|ACK but also
included regular ACKs. (Or PSH|ACKs, which is what we were actually
getting/sending).
That was more ACKs than is necessary and results in an endless ACK storm
if we were under the simultaneous close sequence. In that scenario,
both sides of a connection are in TIME-WAIT. Both sides receive
FIN|ACK, and both respond with an ACK. Then both sides receive *those*
ACKs, and respond again. This continues until the TIME-WAIT wait period
elapses and each side's TCP timers (in the Plan 9 / Akaros case) shut
down.
The fix for this is to only respond to a FIN|ACK when we are in TIME-WAIT.
always start the pager kproc in swapinit(), simplifying kickpager().
allow zero conf.nswap and conf.nswppo. avoid allocating the reference
map and iolist arrays in that case.
use ulong for ioptr and iolist indices.
don't panic when writing pages out to the swapfile fails. just
requeue the page in the io transaction list so we will try
again next time executeio() is run or just free the page when
the swap reference was dropped.
remove unused pagersummary() function.
the FCA registers 0x28, 0x2C have been reassigned to
to FEXTNVM on i217, i218 and i219 so add Fnofca flag
and avoid writing the registers.
make link detection more robust on i217 by delaying the
phy status read after link status change by 150ms. we'd
otherwise get a "phy wedged" (power saving state?) and
not update the link status until the next link change.
the max packet size is configured in 1K increments on these chips,
which can result in the card receiving a 10K packet but the
driver having only allocated 9.5K of buffer. this actually caued
pool corruption with i210, i217, i218, i219, i350.
for 82598 and x550, we explicitely round rbsz to avoid similar bugs
in the future, even tho the Rbsz constant was already a multiple of
1K and is not affected by the bug.
segclock() has to be called from hzclock(), otherwise
only processes running on cpu0 would catche the interrupt
and the time delta would be wrong.
lock the segment when allocating Seg->profile as
profile ctl might be issued from multiple processes.
Proc->debug qlock is not sufficient.
Seg->profile can never be freed or reallocated once
set as the timer interrupt accesses it without any
locking.
instead of having application processors spin in mpshutdown()
with mmu on, and be subject to reboot() overriding kernel text
and modifying page tables, park the application processors in
rebootcode idle loop with the mmu off.
linux will send small, unpadded arp packets which may arrive over
wifi, so allow small packets into the bridge and pad any packets that
are too small when going out.
coproc.c generated the instrucitons anew each time,
requiering a i+d cache flush for each operation.
instead, we can speed this up like this:
given that the coprocessor registers are per cpu, we can
assume that interrupts have already been disabled by
the caller to prevent a process switch to another cpu.
we cache the instructions generated in a static append
only buffer and maintain separate end pointers for each
cpu.
the cache flushes only need to be done when new
operations have been added to the buffer.
reference: https://github.com/raspberrypi/firmware/issues/542
procsave(Proc* p)
{
uvlong t;
cycles(&t);
p->pcycles += t;
// TODO: save and restore VFPv3 FP state once 5[cal] know the new registers.
fpuprocsave(p);
/*
* Prevent the following scenario:
* pX sleeps on cpuA, leaving its page tables in mmul1
* pX wakes up on cpuB, and exits, freeing its page tables
* pY on cpuB allocates a freed page table page and overwrites with data
* cpuA takes an interrupt, and is now running with bad page tables
* In theory this shouldn't hurt because only user address space tables
* are affected, and mmuswitch will clear mmul1 before a user process is
* dispatched. But empirically it correlates with weird problems, eg
* resetting of the core clock at 0x4000001C which confuses local timers.
*/
if(conf.nmach > 1)
mmuswitch(nil);
}
- clean dcache before turning off caches and mmu (rebootcode.s)
- use WFE and inter-core mailboxes for cpu startup (rebootcode.s)
- disable SMP during dcache invalidation before enabling caches and mmu (in armv7.s)
- synchronize rebootcode installation
- handle the 1MB identity map in mmu.c (mmuinit1())
- do not overlap CONFADDR with rebootcode, the non boot
processors are parked there.
- make REBOOTADDR physical address
- disable local clock on interrupt to prevent accidents when reenabling
- always regitster local clock interrupt handler, even for cpu0
- simplify microdelay()
- don't mess with watchdog
when clering smi enable bits in the legacy control/status register,
preserve the reserved bits. clear the RW1C bits.
linux code claims intel xhci controller needs a 1ms delay before
accessing any register after reset.
pcienable() puts a device in fully powered on state
and does some missing initialization that UEFI might
have skipped such as I/O and Memory requests being
disabled.
pcidisable() is ment to shutdown the device, but
currently just disables dma to prevent accidents.
on Samsung ATIV Smart PC Pro XE00T1C-A01CL, the EHCI handoff
causes the system to freeze in UEFI mode as soon as we assert
the os semaphore bit.
until a general solution is found, provide these parameters to
disable the handoff for now as it seems to otherwise work fine.
when a prefix is added with the onlink flag clear, packets
towards that prefix needs to be send to the default gateway
so we omit adding the interface route.
when the on-link flag gets changed to 1 later, we add the
interface route.
the on-link flag is sticky, so theres no way to clear it back
to zero except removing and re-adding the prefix.
Once a second rebalance() is called on cpu0 to adjust priorities,
so cpu-bound processes won't lock others out. However it was only
adjusting processes which were running on cpu0. This was observed
to lead to livelock, eg when a higher-priority process spin-waits
for a lock held by a lower priority one.
syntax: reboot!bootfile[!method...]
this echos bootfile to /dev/reboot, causing bootfile kernel
to be started.
when method is given, we first connect to the filesystem and
set bootargs so that bootfile can be loaded from the target
network or local fileserver.
note, when no bootfile is given, this causes the kernel to
reboot to bios.
the end condition port < offset+n could never become
false when offset truncated to 32 bit signed port is
negative. change the condition variables to unsigned
int.
msr's are not byte addressible, so advance reads by
one instead of 8.
sending multicast was broken when ipconfig assigned the 0
address for dhcp as they would wrongly classified as Runi.
this could happen when we do slaac and dhcp in parallel,
breaking the sending of router solicitations.
now handle the supported rates element properly, only
providing the intersecting set of rates that the bss
advertises and what the driver supports, putting the
basic rates first.
also avoid using usupported rates.
nobody passes us the "RSD PTR " address when doing multiboot/kexec
on UEFI systems. so we search for it manually in the ACPI reserved
area as indicated in the e820 memory map.
we did not apply the special case to store 0xFFFF (-0)
in the checksum field when the checksum calculation
returned zero. we survived this for v4 as RFC768 states:
> If the computed checksum is zero, it is transmitted as
> all ones (the equivalent in one's complement arithmetic).
>
> An all zero transmitted checksum value means that the
> transmitter generated no checksum (for debuging or for
> higher level protocols that don't care).
for ipv6 however, the checksum is not optional and receivers
would drop packets with a zero checksum.
during dhcp, ipconfig assigns the null address :: which makes
ipforme() return Runi for any destination, which can trigger
arp resolution when we attempt to reply. so have v4local()
skip the null address and have sendarp() check the return
status of v4local(), avoing the spurious arp requests.
closeconv() calls ipifcremmulti() like:
while((mp = cv->multi) != nil)
ipifcremmulti(cv, mp->ma, mp->ia);
so we have to defer freeing the entry after doing:
if((lifc = iplocalonifc(ifc, ia)) != nil)
remselfcache(f, ifc, lifc, ma);
which accesses the otherwise free'd ia and ma arguments.
on HZ 100 systems like pc and pc64, the minium sleep time
was 10ms, which is quite high. the cap isnt really needed
as arch specific timerset() enforces its own limit, but on
a higher resolution.
background:
from Charles Forsyth:
I haven't really got an opinion on it. The 10ms interval was first used on
machines that were much slower.
I thought someone did set HZ to a bigger value, partly to support better
in-kernel timing. I haven't done it because I never had a need for it.
If I were doing (say) protocol implementation in user mode, I'd certainly
reconsider. Sleep itself forces at best ms granularity,
and for some applications that's too big.
initial mail from qwx raising the issue:
> Hello,
>
> I found out recently that sleep(2)'s resolution on 386 and 9front's amd64
> kernel is 10 ms rather than 1 ms. The reason is that on those kernels,
> HZ is set to 100 rather than say 1000. In syssleep, we get 1 tich every
> 10 ms.
>
> What is unclear is why.
>
> To paraphrase cinap_lenrek's answer to my question:
>
> In syssleep:
> if(ms < TK2MS(1))
> ms = TK2MS(1);
> tsleep(&up->sleep, return0, 0, ms);
>
> "TK2MS(1)" can be replaced with just "1", and the arch specific
> timerset() routine would do its own capping of the period if it's too
> small for the timer resolution, and make better decisions based on what
> the minimum timer period should be given the latency overhead of the
> given arch's interrupt handling and performance characteristics.
>
> Alternatively, HZ could be raised to 500 or 1000.
>
> It seems it's just trying to prevent excessive context switches and
> interrupts, but it seems somewhat arbitrary. A ton of syscalls can be
> done in 1 ms, and it's the lowest we can go without changing the unit.
>
>
> What do you think?
>
> Thanks in advance,
>
> qwx
devdir internally replicates the qid in ther perm stat field
already and the practice of explicitely passing just causing
confusion when done inconsistently.
this driver makes regions of physical memory accessible as a disk.
to use it, ramdiskinit() has to be called before confinit(), so
that conf.mem[] banks can be reserved. currently, only pc and pc64
kernel use it, but otherwise the implementation is portable.
ramdisks are not zeroed when allocated, so that the contents are
preserved across warm reboots.
to not waste memory, physical segments do not allocate Page structures
or populate the segment pte's anymore. theres also a new SG_CHACHED
attribute.
fpurestore() unconditionally changed fpstate to FPinactive when
the kernel used the FPU. but in the FPinit case, the registers are
not saved by mathemu(), resulting in all zero initialized registers
being loaded once userspace uses the FPU so the process would have
wrong MXCR value.
the index overflow check was wrong with using shifted value.
there appears to be confusion about the refresh flag of arpenter().
when we get an arp reply, it makes more sense to just refresh
waiting/existing entries instead creating a new one as we do not
know if we are going to communicate with the remote host in the future.
when we see an arp request for ourselfs however, we want to always
enter the senders address into the arp cache as it is likely the sender
attempts to communicate with us and with the arp entry, we can reply
immidiately.
reject senders from multicast/broadcast mac addresses. thats just silly.
we can get rid of the multicast/broadcast ip checks in ethermedium and
do it in arpenter() instead, checking the route type for the target to
see if its a non unicast target.
enforce strict separation of interface's arp entries by passing a
rlock'd ifc explicitely to arpenter, which we compare against the route
target interface. this makes sure arp/ndp replies only affect entries for
the receiving interface.
handle neighbor solicitation retransmission in nbsendsol() only. that is,
both ethermedium and the rxmitproc just call nbsendsol() which maintains
the timers and counters and handles the rotation on the re-transmission
chain.
no need to rlock ifc in targetttype() as we are called from icmpiput6(),
which the ifc rlocked.
for icmpadvise, the lport, destination *AND* source have to match.
a connection gets a packet when the packets destination matches the source
*OR* the packets source matches the destination.
v4lookup() and v6lookup() do not acquire the routelock, so it is
possible to hit routes that are on the freelist. to detect these,
we set ref to 0 and check for this case, avoiding overriding the ifc.
re-evaluate routes when the ifcid on the route hint doesnt match.
rfc4861 7.2.2:
If the source address of the packet prompting the solicitation is the
same as one of the addresses assigned to the outgoing interface, that
address SHOULD be placed in the IP Source Address of the outgoing
solicitation.
this change adds ndbsendsol() which handles the source address selection
and also handles the arp table locking; avoiding access to the arp entry
after the arp table is unlocked.
cleanups:
- use ipmove() instead of memmove().
- useless extern qualifiers
ipv4local() and ipv6local() now take remote address argument,
returning the closest local address to the source. this
implements the standartized source address selection rules
instead of just returning the first local v4 or v6 address.
the source address selection was broken for esp, rudp an udp,
blindly assuming ifc->lifc->local being a valid v4 address.
use ipv6local() instead.
the v6 routing code used to lookup source address route to
decide to drop the packet instead of checking the interface
on the destination route.
factor out the route hint from Conv and put it in Routehint
structure. avoiding stack bloat in v4 routing. implement the
same trick for v6 avoiding second route lookup in ipoput6.
fix memory leak in icmpv6 router solicitation handling.
remove old unfinished handling of multiple v6 routers. should
implement source specific routes instead.
avoid duplication, use common convipvers() function.
use isv4() instead of memcmp v4prefix.
everything was broken. strting with hsinit not even chaining
the itd's into a ring. followed by broken buffer pointer pages.
finally, the interrupt handler's read transaction length
calculation was completely bugged, using the *FRAME* index
to access descriptors csw[] fields and not reseting tdi->ndata
thru the loop.
minor stuff:
iso->data needs to be freed with ctlr->dmafree()
put ival in iso->ival so ctl message cannot override the endpoints
pollival and screw up deallocation.
we allow devether to create ethernet cards on attach. this is useull
for virtual cards like the sink driver, so we can create a sink
by simply: bind -a '#l2:sink ea=112233445566' /net
the detach routine was never called, so remove it from the few drivers
that attempted to implement it.
the only architecture dependence of devether was enabling interrupts,
which is now done at the end of the driver's reset() function now.
the wifi stack and dummy ethersink also go to port/.
do the IRQ2->IRQ9 hack for pc kernels in intrenabale(), so not
every caller of intrenable() has to be aware of it.
the td index "x" was incremented twice, once in for loop
and in the body expression. so r->rp only got updated
every second completion. this is wrong, but harmless.
flushing tlb once the index wraps arround is not enougth
as in use pte's can be speculatively loaded. so instead
use invlpg() and explicitely invalidate the tlb of the
page mapped.
this fixes wired mount cache corruption for reads approaching
2MB which is the size of the KMAP window.
invlpg() was broken, using wrong operand type.
windows 7 just drops the default router when it tries to
probe for router reachability but gets a neighbor avertisement
from the router with the router bit clear.
so set the R-flag when sendra is active, which implies that
we are a router.
the driver doesnt implement multicast filter, but just turns
on promiscuous mode when a multicast address is added. but this
breaks when one actually enables and then disables promiscuous
mode with say, running snoopy.
we have to keep promisc mode active as long as multicast table
is not empty.
broadcast traffic was received back on the wire causing
duplicate address detection to break with dmat proy as
the rewritten broadcasts where observable.
the fix is to just ignore packets from ourselfs received
from the air. devether already handles loopback.
when kernel memory is exhausted, rtl8169replenish() can fail
to plant more receive descriptors and rtl8169receive() would
run over the receive tail and crash on the nil ctlr->rb[x].
rtl8169receive() is called on "Receive Descriptor Unavailable"
and "Packet Underrun" so we will try to replenish descriptors
in the beginning first in case memory was exhausted and memory
is available again and make sure not to run over the tail.
in case we continue to send traffic (like ping) with the ap gone,
the sending would keep updating bss->lastseen which prevents the
timeout to happen to switch bss.
- remove arbitrary limits on screen size, just check with badrect()
- post resize when physgscreenr is changed (actualsize ctl command)
- preserve physgscreenr across softscreen flag toggle
- honor panning flag on resize
- fix nil dereference in panning ctl command when scr->gscreen == nil
- use clipr when drawing vga plan 9 console (vgascreenwin())
to capture bios and bootloader messages, convert the contents
on the screen to kmesg.
on machines without legacy cga, the cga registers read out as
0xFF, resuting in out of bounds cgapos. so set cgapos to 0 in
that case.