/n/bugs/open/multicasts_and_udp_buffers
http://bugs.9front.org/open/multicasts_and_udp_buffers/readmemichal@Lnet.pl
I have ported my small MPEG-TS analisis tool to Plan9.
To allow this application working I had to fix a bug in the kernel IPv4 code and increase UDP input buffer.
Bug is related to listening for IPv4 multicast traffic. There is no problem if you listen for only one group or multiple groups with different UDP ports. This works:
Write to UDP ctl:
anounce PORT
addmulti INTERFACE_ADDR MULTICAST_ADDR
headers
and you can read packets from data file.
You need to set headers option because otherwise every UDP packet for MULTICAST_ADDR!PORT is treat as separate connection. This is a bug and should be fixed too, but I didn't tried it.
There is a problem when you need to receive packets for multiple multicast groups. Usually the same destination port is used by multiple streams and above sequence of commands fails for second group because the port is the same.
Simple and probably non-intrusive fix is adding "|| ipismulticast(addr)" to if statement at /sys/src/9/ip/devip.c:861 line:
if(ipforme(c->p->f, addr) || ipismulticast(addr))
This fixes the problem and now you can use the following sequence to listen for multiple multicast groups even if they all have the same destination port:
announce MULTICAST_ADDR!PORT
addmulti INTERFACE_ADDR MULTICAST_ADDR
headers
After that my application started working but signals packet drops at >2 Mb/s input rate. The same is reported by kernel netlog. Increase capacity of UDP connection input queue fixes this problem /sys/src/9/ip/udp.c:153
c->rq = qopen(512*1024, Qmsg, 0, 0);
--
Michał Derkacz
- fix missing runlock(ifc) when ifcid != a->ifcid in rxmitsols() (thanks erik quanstro)
- don't leak packets when transfering blocks from arp entry hold list to droplist
- free rest of droplist when bwrite() errors in arpenter(), remove useless checks (ifc != nil)
- free arp entry hold list from cleanarpent()
- consistent use of nil for pointers
for incoming connection, we used s->laddr to lookup the interface
for the incoming call, but this does not work when the announce
address is tcp!*!123, then s->laddr is all zeros "::". instead,
use the incoming destination address for interface mtu lookup.
thanks mycroftix for troubleshooting!
yoann padioleaus report on 9fans:
> I think I’ve found a bug in the network stack.
> in 9/ip/ip.h there is
> struct Ipht
> {
> Lock;
> Iphash *tab[Nipht];
> };
>
> where Night is 521,
>
> but then in 9/ip/ipaux.c there is
>
> ulong
> iphash(uchar *sa, ushort sp, uchar *da, ushort dp)
> {
> return ((sa[IPaddrlen-1]<<24) ^ (sp << 16) ^ (da[IPaddrlen-1]<<8) ^ dp ) % Nhash;
> }
>
> where Nhash is just 64,
David du Colombier wrote:
> The slowness issue only appears on the loopback, because
> it provides a 16384 MTU.
>
> There is an old bug in the Plan 9 TCP stack, were the TCP
> MSS doesn't take account the MTU for incoming connections.
>
> I originally fixed this issue in January 2015 for the Plan 9
> port on Google Compute Engine. On GCE, there is an unusual
> 1460 MTU.
>
> The Plan 9 TCP stack defines a default 1460 MSS corresponding
> to a 1500 MTU. Then, the MSS is fixed according to the MTU
> for outgoing connections, but not incoming connections.
>
> On GCE, this issue leads to IP fragmentation, but GCE didn't
> handle IP fragmentation properly, so the connections
> were dropped.
>
> On the loopback medium, I suppose this is the opposite issue.
> Since the TCP stack didn't fix the MSS in the incoming
> connection, the programs sent multiple small 1500 bytes
> IP packets instead of large 16384 IP packets, but I don't
> know why it leads to such a slowdown.
other operating systems always set the "don't fragment" bit
in ther outgoing ipv4 packets causing us to unnecesarily
call ip4reassemble() looking for a fragment reassembly queue.
the change excludes the "don't fragment" bit from the test
so we now call ip4reassemble() only when the "more fragmens"
bit is set or a fragment offset other than zero is given.
this optimization was discovered from akaros.
devip can only handle Maskconv+1 conversations per
protocol depending on how many bits it uses in the
qid to encode the conversation number.
we check this when the protocol gets registered.
if we do not do this, the kernel will mysteriously
panic when the conversaion numbers collide which
took some time to debug.
after running ip/ipconfig -6, we are unable to ping our
own link-local address and the arp daemon sends out useless
neighbor solicitation requests to itself. this change
adds an arp entry for our ipv6 address. however, this
must not be done for tentative interface configuration.
allocate the Iplifc structure on the stack instead.
i assuming that it was allocated on heap in fear of
causing stack oveflow. on 386, this adds arround
88 bytes on the stack but it doesnt seem to cause
any trouble. (checked with poolcheck after ctl write)
ipifcunbind() could error out from ipifcremlifc() and Medium.unbind()
*after* decrementing ifc->conv->inuse! move the decrement after
calling these functions.
make ipifcremlifc() never raise error but return error string.
the only places where it could error is when it calls into
medium functions like Medium.remroute() and Medium.remmulti().
Ignore these errors as they could happen when the ethernet driver
crashed (think imported ethernet device or usb ethernet
in userspace), so we will be able to unbind.
add waserror() handlers as neccesary to deal with errors from
Medium.addmulti(), Medium.areg() and arpenter() to properly
unlock the data structures.
this patch consists of two bits of work submitted as one
patch.
the first bit fixed a "pacing" problem, where a tcp connection
rate-limited by the reading process would experience 10%
of the expected throughput, and could even get into live
lock. it was noticed at the time of this initial work that
the stack often sent tiny grams. some good bits from nix'
original tcp were merged in. the test program
/n/sources/contrib/quanstro/tcptest.c
will verify that under most conditions, a reader-paced connection
now gets the expected throughput. expected arguments
would be
tcptest -s1 -n 5000 -l
the second bit is a first step in preparing tcp to handle
modest (1-2MB) bandwidth-delay products. the strategy
was to completely implement NewReno. the testing network
was a 7/35/70ms by 100Mbit wan emulator with 0/.05/.1% loss.
here are the performance comparisons from the changes after
the first round "old" to the submitted patch "new". the
smallest improvement was 80%, the largest was 11x.
loss% rtt old new
0.10 7 4.40 7.85
0.10 35 0.88 1.79
0.10 70 0.47 0.84
0.05 7 4.80 9.38
0.05 35 1.00 2.02
0.05 70 0.52 1.77
0.01 7 5.33 11.87
0.01 35 1.14 10.97
0.01 70 0.54 4.75
0.00 7 4.49 11.92
0.00 35 1.04 11.35
0.00 70 0.58 10.56
since the diff is not very easy to read, i wrote a small
paper detailing the changes
http://www.quanstro.net/plan9/tcp/tcp.pdf
- erik
on usb ethernet, it can happen that we read truncated packets smaller
than the ethernet header size. this produces a warning in pullupblock()
later like: "pullup negative length packet, called from 0xf0199e46"
Fsprotoclone() is not supposed to raise error, but return nil.
ipopen() seemed to assume otherwise as it setup error label
before calling Fsprotoclone(). fix ipopen(), make Fsprotoclone()
return nil instead of raising error.
Fsprotocone():
qopen() and qbypass() can fail and return nil, so make sure
the connection was not partially created by checking if read
and write queues have been setup by the protocol create hanler.
on error, free any resources of the partial connection and
error out.
netlogopen(): check malloc() error.