From richard:
A couple of patches applied yesterday should make debugging on ARM a
bit more reliable. Using db or acid on ARM, you may have noticed that
a program being debugged would sometimes execute through a breakpoint
without stopping, or run away while being single stepped. It turns out,
as often happens, that one symptom had two separate causes. For details:
/n/sources/patch/applied/5db-condcode/readme
/n/sources/patch/applied/arm-bkpt-cond/readme
To take advantage of the patches, rebuild libmach.a, then acid and db.
On machines with a kw kernel (sheevaplug et al), you'll also want to
rebuild /arm/9plug; otherwise breakpoints will stop working at all.
The new 9plug will, however, still work with the old libmach; and
the bcm and teg2 kernels are already compatible with the new libmach.
the heuristic that limits kernel memory on a cpu server to
a fixed amount (64MB + size for page tables) makes using devdraw
impractical.
if *imagemaxmb= is specified, we can assume that the draw device
will be used so we want to get a reasonable amount (30% default)
of kernel memory.
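
a minimal sketch of the sizing rule described above, not the kernel's actual code. the function name, its parameters, the 4K page size and the way the share is computed are assumptions made for illustration; only the 64MB cap, the page table term and the 30% default come from the text.

```c
enum {
	MB	= 1024*1024,
	PGSZ	= 4096,		/* assumed page size */
	KPCT	= 30,		/* default kernel share in percent */
};

/*
 * hypothetical helper: pages handed to the kernel on a cpu server.
 * npage is the total number of pages, ptbytes the space needed for
 * page tables, haveimagemaxmb is nonzero when *imagemaxmb= was given.
 */
unsigned long
kernelpages(unsigned long npage, unsigned long ptbytes, int haveimagemaxmb)
{
	unsigned long kpages, cap;

	kpages = (npage*KPCT)/100;
	if(haveimagemaxmb)
		return kpages;	/* draw device expected: keep the full share */

	/* otherwise apply the fixed limit: 64MB + size for page tables */
	cap = (64UL*MB + ptbytes)/PGSZ;
	if(kpages > cap)
		kpages = cap;
	return kpages;
}
```

with 4GB of memory (1048576 pages) and an assumed 64MB for page tables, the fixed limit leaves the kernel 32768 pages (128MB), while specifying *imagemaxmb= keeps the full 30% share.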
link status not working on 82567 was due to wrong phy number
used. instead of hardcoding the phy numbers, probe the phys
by reading id1 and id2 registers (code stolen from ethermii).
on the 82567, reading any phy register just gives 0 back.
however, the card works just fine and no action is required
to (re-)start auto negotiation. so we add maclproc() which just
reads the speed setting and link status from the mac status
register instead of reading the phy registers.
we've probably seen this symptom on other cards (link: 0) like
82566. we should test if we can make link status work on
these cards as well by just using the maclproc().
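
the probing described above could look roughly like the sketch below. it is not the code taken from ethermii: the accessor type, the function names and the fake buses are made up for this example. registers 2 and 3 are the standard mii phy identifier registers, and treating all zeros or all ones as "no phy" is an assumption.

```c
enum {
	Nphy	= 32,	/* an mii management bus addresses at most 32 phys */
	Phyidr1	= 2,	/* standard mii register 2: phy identifier 1 */
	Phyidr2	= 3,	/* standard mii register 3: phy identifier 2 */
};

/* hypothetical accessor: returns the 16 bit register value, or -1 on error */
typedef int (*Miird)(void *ctlr, int phyno, int reg);

/*
 * return the number of the first phy that answers with a plausible
 * id and store the combined id in *idp; return -1 if none answers.
 */
int
phyprobe(void *ctlr, Miird rd, unsigned long *idp)
{
	int no, id1, id2;

	for(no = 0; no < Nphy; no++){
		id1 = rd(ctlr, no, Phyidr1);
		id2 = rd(ctlr, no, Phyidr2);
		if(id1 < 0 || id2 < 0)
			continue;	/* read failed */
		if((id1 == 0 && id2 == 0) || (id1 == 0xFFFF && id2 == 0xFFFF))
			continue;	/* nothing there */
		*idp = (unsigned long)id1<<16 | (unsigned long)id2;
		return no;
	}
	return -1;
}

/* fake bus for demonstration: a single phy at address 2 with a made-up id */
int
fakerd(void *ctlr, int phyno, int reg)
{
	(void)ctlr;
	if(phyno != 2)
		return 0xFFFF;
	switch(reg){
	case Phyidr1:
		return 0x0141;
	case Phyidr2:
		return 0x0CB0;
	}
	return 0;
}

/* fake bus that reads 0 everywhere, as the text reports for the 82567 */
int
zerord(void *ctlr, int phyno, int reg)
{
	(void)ctlr;
	(void)phyno;
	(void)reg;
	return 0;
}
```

with fakerd the probe returns phy number 2; with zerord it returns -1, which is the case where a driver would fall back to the mac status register as maclproc() does.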
rx pool exhaustion causes the system to deadlock when netbooted.
queue management (etheroq) should already make sure the system
can keep up with the data by throwing away buffers.
icansleep() violates the lock ordering due to the following cases:
rbfree(): ilock(Rbpool.Lock) -> wakeup(): splhi(), lock(Rbpool.Rendez)
sleep(): splhi(), lock(Rbpool.Rendez) -> icansleep(): ilock(Rbpool.Lock)
erik fixed this by moving the wakeup() out of the ilock() in rbfree(),
but i think it is an error to try to acquire an ilock in sleep's wait
condition function in general.
so this is what we do:
in the icansleep() function, we check for the *real* event we care about;
that is, if there's a buffer available in the Rbpool. this is to handle
the case when rbfree() makes a buffer available *before* it sees us
setting p->starve = 1.
p->starve is now just used to gate rbfree() from calling wakeup() as
an optimization.
this might cause spurious wakeups but they are not a problem. missed
wakeups are the thing we have to prevent.
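
the scheme above can be modelled in a few lines. this is a single-threaded sketch without any locking, not the driver code: the struct layouts, the nwakeup counter (which stands in for wakeup()) and the wouldsleep() helper are assumptions made for illustration.

```c
#include <stddef.h>

typedef struct Rb Rb;
struct Rb {
	Rb	*next;
};

typedef struct Rbpool Rbpool;
struct Rbpool {
	Rb	*head;		/* free receive buffers */
	int	starve;		/* a consumer is waiting, or about to */
	int	nwakeup;	/* stands in for calls to wakeup() */
};

/* wait condition for sleep(): tests the real event and takes no lock */
int
icansleep(void *v)
{
	Rbpool *p = v;

	return p->head != NULL;
}

/* producer: put a buffer back; wake the consumer only if it starves */
void
rbfree(Rbpool *p, Rb *b)
{
	b->next = p->head;
	p->head = b;
	if(p->starve){
		p->starve = 0;
		p->nwakeup++;	/* the driver would call wakeup() here */
	}
}

/*
 * consumer: announce that we starve, then test the condition.
 * returns 1 where the driver would block in sleep(), 0 otherwise.
 */
int
wouldsleep(Rbpool *p)
{
	p->starve = 1;
	return !icansleep(p);
}
```

if rbfree() runs before the consumer sets starve, no wakeup is issued, but wouldsleep() still returns 0 because the condition sees the buffer. starve stays set in that case, so the next rbfree() issues a wakeup nobody is waiting for, which is the harmless spurious wakeup mentioned above.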
this patch consists of two bits of work submitted as one
patch.
the first bit fixed a "pacing" problem, where a tcp connection
rate-limited by the reading process would experience 10%
of the expected throughput, and could even get into live
lock. it was noticed at the time of this initial work that
the stack often sent tiny grams. some good bits from nix'
original tcp were merged in. the test program
/n/sources/contrib/quanstro/tcptest.c
will verify that under most conditions, a reader-paced connection
now gets the expected throughput. expected arguments
would be
tcptest -s1 -n 5000 -l
the second bit is a first step in preparing tcp to handle
modest (1-2MB) bandwidth-delay products. the strategy
was to completely implement NewReno. the testing network
was a 7/35/70ms by 100Mbit wan emulator with 0/.05/.1% loss.
here are the performance comparisons from the changes after
the first round "old" to the submitted patch "new". the
smallest improvement was 80%, the largest was 11x.
loss%   rtt(ms)   old     new
0.10    7         4.40    7.85
0.10    35        0.88    1.79
0.10    70        0.47    0.84
0.05    7         4.80    9.38
0.05    35        1.00    2.02
0.05    70        0.52    1.77
0.01    7         5.33    11.87
0.01    35        1.14    10.97
0.01    70        0.54    4.75
0.00    7         4.49    11.92
0.00    35        1.04    11.35
0.00    70        0.58    10.56
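
for scale, the bandwidth-delay product of the testing network can be worked out as link rate times round trip time. the helper below is only an arithmetic sketch; its name is made up, and it assumes that 100Mbit means 100,000,000 bits per second.

```c
/*
 * bandwidth-delay product in bytes: the amount of data that has to
 * be in flight to keep a path of the given rate and round trip time full.
 */
unsigned long long
bdpbytes(unsigned long long bitspersec, unsigned int rttms)
{
	return bitspersec * rttms / 1000 / 8;
}
```

bdpbytes(100000000, 70) comes to 875000 bytes, so the longest path of the wan emulator sits just below the 1-2MB range named as the goal.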
since the diff is not very easy to read, i wrote a small
paper detailing the changes
http://www.quanstro.net/plan9/tcp/tcp.pdf
- erik
the driver should work for standard sdhc
(see http://www.sdcard.org/) controllers,
but matches only the ricoh controller,
as it is the only one i have for testing.
it could happen that we unblanked while vesaproc was
currently blanking (when manually blanking using vgactl
for example). the wakeup of the unblank is lost.
disabled LAPIC entries overwrote the bootstrap processor
apic causing the machine to panic with: "no bootstrap processor".
(problem with lenovo X230)
just ignore entries that are disabled or collide with
entries already found. (should not happen)
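
the rule can be sketched as below. this is not the kernel's table code: the Lapic struct, the flag bit, the table size and the function name are all assumptions made for illustration.

```c
#include <stddef.h>

enum {
	Maxapicno	= 254,		/* assumed highest usable apic number */
	Fenabled	= 1<<0,		/* assumed flag bit: processor is usable */
};

typedef struct Lapic Lapic;
struct Lapic {
	int	apicno;
	int	flags;
};

/*
 * record entry a in tab, which has Maxapicno+1 slots.
 * returns 1 if the entry was recorded, 0 if it was ignored.
 */
int
addlapic(Lapic **tab, Lapic *a)
{
	if(a->apicno < 0 || a->apicno > Maxapicno)
		return 0;
	if((a->flags & Fenabled) == 0)
		return 0;	/* disabled entry */
	if(tab[a->apicno] != NULL)
		return 0;	/* collides with an entry already found */
	tab[a->apicno] = a;
	return 1;
}
```

a disabled entry that carries the same apic number as the bootstrap processor is now dropped instead of replacing it in the table.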
the issues with the previous tsc change were not related to the tsc
but were problems with timesync using an old frequency file. a
patch to fix timesync was committed, so we reintroduce *notsc=.
aux/wpa needs to reset its replay counter on deassociation to
properly restart key negotiation. we signal this with a zero
length read on the connections filtering for eapol protocol.
allow the driver to associate the node with a new aid right after
we receive the association response, not just when we transmit
a packet. the latter usually does not happen, as eapol is initiated by
the access point, so there are no transmit calls. we just call
transmit from the wifiproc with a nil block to introduce the node.
the splhi() and apictimerlock in the Mach aren't necessary, as
portclock always holds the ilock of the per mach timer queue
when calling timerset().
as fastticks() and the portclock timers are all handled on a
per processor basis, i think it should be theoretically possible
for the lapics to run at different frequencies. so we measure
the lapic frequency for each individual lapic and keep them in
a per processor Apictimer structure instead of assuming them
to be the same.
loading the divider before programming one shot mode *sometimes*
gives the wrong frequency. (X200s got 192MHz vs. 266MHz, after
5 boot attempts)
also reload the divider after programming periodic mode. (from
http://wiki.osdev.org/APIC_timer)
we previously used tsc only on the cpu kernel. now that
we use it on the terminal kernel too, there might be some
surprises ahead.
so make it possible to disable tsc for machines where
the tsc rate is not kept constant across cores or is
dynamically adjusted by power management.