icansleep() violates the lock ordering due to the following cases:
rbfree(): ilock(Rbpool.Lock) -> wakeup(): spli(), lock(Rbpool.Rendez)
sleep(): splhi(), lock(Rbpool.Rendez) -> icansleep(): ilock(Rbpool.Lock)
erik fixed this moving the wakeup() out of the ilock() in rbfree(),
but i think it is an error to try acquiering a ilock in sleeps wait
condition function in general.
so this is what we do:
in the icansleep() function, we check for the *real* event we care about;
that is, if theres a buffer available in the Rbpool. this is to handle
the case when rbfree() makes a buffer available *before* it sees us
setting p->starve = 1.
p->starve is now just used to gate rbfree() from calling wakeup() as
an optimization.
this might cause spurious wakeups but they are not a problem. missed
wakeups is the thing we have to prevent.
this patch consists of two bits of work submitted as one
patch.
the first bit fixed a "pacing" problem, where a tcp connection
rate-limited by the reading process would experience 10%
of the expected throughput, and could even get into live
lock. it was noticed at the time of this initial work that
the stack often sent tiny grams. some good bits from nix'
original tcp were merged in. the test program
/n/sources/contrib/quanstro/tcptest.c
will verify that under most conditions, a reader-paced connection
now gets the expected throughput. expected arguments
would be
tcptest -s1 -n 5000 -l
the second bit is a first step in preparing tcp to handle
modest (1-2MB) bandwidth-delay products. the strategy
was to completely implement NewReno. the testing network
was a 7/35/70ms by 100Mbit wan emulator with 0/.05/.1% loss.
here are the performance comparisons from the changes after
the first round "old" to the submitted patch "new". the
smallest improvement was 80%, the largest was 11x.
loss% rtt old new
0.10 7 4.40 7.85
0.10 35 0.88 1.79
0.10 70 0.47 0.84
0.05 7 4.80 9.38
0.05 35 1.00 2.02
0.05 70 0.52 1.77
0.01 7 5.33 11.87
0.01 35 1.14 10.97
0.01 70 0.54 4.75
0.00 7 4.49 11.92
0.00 35 1.04 11.35
0.00 70 0.58 10.56
since the diff is not very easy to read, i wrote a small
paper detailing the changes
http://www.quanstro.net/plan9/tcp/tcp.pdf
- erik
the driver should work for standard sdhc
(see http://www.sdcard.org/) controllers,
but matches for the ricoh controller only
as it was the only one i have for testing.
it could happen that we unblanked while vesaproc was
currently blanking (when manually blanking using vgactl
for example). the wakeup of the unblank is lost.
disabled LAPIC entries overwrote the bootstrap processor
apic causing the machine panic with: "no bootstrap processor".
(problem with lenovo X230)
just ignore entries that are disabled or collide with
entries already found. (should not happen)
the issues with the previous tsc change where not related to the tsc
but where problems with timesync using an old frequency file. a
patch to fix timesync was commited, so so we reintroduce the *notsc=
again.
aux/wpa needs to reset its reply counter on deassociation to
properly restart key negotiation. we signal this with a zero
length read on the connections filtering for eapol protocol.
allow the driver to associate the node with a new aid right after
we receive the association response, not just when we transmit
a packet which usualy does not happen as eapol is initiated by
the access point so there are no transmit calls. we just call
transmit from the wifiproc with a nil block to introduce the node.
the splhi() and apictimerlock in the Mach isnt neccesary, as
portclock always holds the ilock of the per mach timer queue
when calling timerset().
as fastticks() and the portclock timers are all handled on a
per processor basis, i think it should be theoretically possible
for the lapics to run at different frequencies. so we measure
the lapic frequency for each individual lapic and keep them in
a per processor Apictimer structure instead of assuming them
to be the same.
loading the divider before programming one shot mode *sometimes*
gives the wrong frequency. (X200s got 192Mhz vs. 266Mhz, after
5 boot attempts)
also reload the divider after programming periodic mode. (from
http://wiki.osdev.org/APIC_timer)
we previously used tsc only on cpu kernel. now that
we use it on terminal kernel too, there might be some
surprises ahead.
so make it possible to disable tsc for machines where
the tsc rate is not kept constant across cores or is
dynamically adjusted by power management.
- same problem with wstat, if we error then owner has been already updated...
- avoid smalloc while holding qlock in wstat, do it before
- pikeshedd style...
- wstat would half update the Srv data structure if name was too long
- srvname() walked the linked srv list without holding srv qlock
- dont access sp->chan while not holding srv qlock in srvopen()
- dont modify sp->owner while not holding srv qlock in srvcreate()
- avoid smalloc() allocations while holding srv qlock
- style pikeshedding, sorry
on usb ethernet, it can happen that we read truncated packets smaller
than the ethernet header size. this produces a warning in pullupblock()
later like: "pullup negative length packet, called from 0xf0199e46"