plan9fox

Author	SHA1	Message	Date
cinap_lenrek	ca313087c1	ip(3): use flags instead of tag for 8 column route add/remove This avoids ipconfig having to explicitly specify the tag when we want to set route type, as the tag can be provided implicitly through the "tag" command.	2022-03-14 18:45:27 +00:00
cinap_lenrek	6e4a1fda8c	devip: allow setting the "trans" flag on a logical interface This makes the interface route have the "t"-flag, which causes packets routed to the interface to get source translated.	2022-03-13 17:16:54 +00:00
cinap_lenrek	d2a7d88662	devip: implement network address translation routes This adds a new route "t"-flag that enables network address translation, replacing the source address (and local port) of a forwarded packet to one of the outgoing interface. The state for a translation is kept in a new Translation structure, which contains two Iphash entries, so it can be inserted into the per protocol 4-tuple hash table, requiring no extra lookups. Translations have a low overhead (~200 bytes on amd64), so we can have many of them. They get reused after 5 minutes of inactivity or when the per protocol limit of 1000 entries is reached (then the one with longest inactivity is reused). The protocol needs to export a "forward" function that is responsible for modifying the forwarded packet, and then handle translations in its input function for iphash hits with Iphash.trans != 0. This patch also fixes a few minor things found during development: - Include the Iphash in the Conv structure, avoiding extra malloc - Fix ttl exceeded check (ttl < 1 -> ttl <= 1) - Router should not reply with ttl exceeded for multicast flows - Extra checks for icmp advice to avoid protocol confusions.	2022-03-12 20:53:17 +00:00
cinap_lenrek	7289f371a0	devip: dont hold ifc wlock during medium bind/unbind Wlock()'ing the ifc causes a deadlock with Medium bind/unbind as the routine can walk /net, while ndb/dns or ndb/cs are currently blocked enumerating /net/ipifc/*. The fix is to have a fake medium, called "unbound", that is set temporarily during the call of Medium bind and unbind. That way, the interface rwlock can be released while bind/unbind is in progress. The ipifcunbind() routine will refuse to unbind a ifc that is currently assigned to the "unbound" medium, preventing any accidents.	2022-02-16 22:31:31 +00:00
cinap_lenrek	b51d7ca3ba	devip: improve tcp error handling for ipoput The ipoput4() and ipoput6() functions can raise an error(), which means before calling sndrst() or limbo() (from tcpiput()), we have to get rid of our blist by calling freeblist(bp). Make sure to set the Block pointer to nil after freeing in ipiput() to avoid accidents. Fix wrong panic string in sndsynack, and make any sending functions like sndrst(), sndsynack() and tcpsendka() return the value of ipoput(), so we can distinguish "no route" error. Add a Enoroute[] string constant. Both htontcp4() and htontcp6() can never return nil, as they will allocate new or resize the existing block. Remove the misleading error handling code that assumes that it can fail. Unlock proto on error in limborexmit() which can be raised from sndsynack() -> ipoput() -> error(). Make sndsynack() pass a Routehint pointer to ipoput*() as it already did the route lookup, so we dont have to do it twice.	2021-10-11 15:55:46 +00:00
cinap_lenrek	ad1ab7089d	devip: add comment to ip.h explaining Routehint struct	2021-10-11 12:16:21 +00:00
cinap_lenrek	365e63b36a	devip: properly rlock() the routelock for v4lookup() and v6lookup() i'm not confident about mutating the route tree pointers and have concurrent readers walking the pointer chains. given that most route lookups are bypassed now for non-routing case and we are not building a high performance router here, lets play it safe.	2021-10-10 14:27:08 +00:00
cinap_lenrek	e687d25478	devip: use top bit (type) | subnet-id for V6H() route hash macro theres no structure in the lower 32 bits of an ipv6 address. use the top bit to distinguish special stuff like multicast and link-local addresses, and use the 16-bit subnet-id bits for the rest.	2021-10-10 14:22:14 +00:00
cinap_lenrek	1a6324970d	devip: cache arp entry in Routehint Instead of having to do an arp hash table lookup for each outgoing ip packet, forward the Routehint pointer to the medium's bwrite() function and let it cache the arp entry pointer. This avoids route and arp hash table lookups for tcp, il and connection oriented udp. It also allows us to avoid multiple route and arp table lookups for the retransmits once an arp/neighbour solicitation response arrives.	2021-10-09 18:26:16 +00:00
cinap_lenrek	6ebb8b9e35	devip: use better hashipa() macro, use RWlock for arp cache	2021-10-03 15:58:58 +00:00
cinap_lenrek	d43d79bda4	devip: implement ipv4 arp timeout with icmp host unreachable notification The IPv4 ARP cache used to indefinitely buffer packets in the Arpent hold list. This is bad in case of a router, because it opens a 1 second (retransmit time) window to leak all the to be forwarded packets. This change makes the ipv4 arp code path similar to the IPv6 neighbour solicitation path, using the retransmit process to time out old entries (after 3 arp retransmits => 3 seconds). A new function arpcontinue() has been added that unifies the point when we schedule the (ipv6 sol retransmit) / (ipv4 arp timeout) and reduce the hold queue to the last packet and unlock the cache. As a bonus, we also now send a icmp host unreachable notification for the dropped packets.	2021-09-26 18:43:29 +00:00
cinap_lenrek	5474646164	devip: implement ipv6 support in ipmux packet filter Added a ver= field to the filter to distinguish the ip version. By default, a filter is parsed as ipv6, and after parsing proto, src and dst fields are converted to ipv4. When no ver= field is specified, a ip version filter is implicitly added and both protocols are parsed. This change also gets rid of the fast compare types as the field might not be aligned correctly in the packet. This also fixes the ifc= filter, as we have to check any local address.	2020-06-07 16:56:01 +02:00
cinap_lenrek	85ffa283f6	devip: fix parseipmask() prototype in ip.h	2020-06-07 16:45:55 +02:00
cinap_lenrek	e46000f076	devip: pick less surprising interface address in header for incoming UDP packets We used to just return the first address of the incoming interface regardless of if the address matches the source ip type and scope. This change tries to find the best interface address that will match the source ip so it can be used as a source address when replying to the packet.	2020-06-06 23:46:01 +02:00
cinap_lenrek	27fc79b04b	devip: fix ifc recursive rlock() deadlock ipiput4() and ipiput6() are called with the incoming interface rlocked while ipoput4() and ipoput6() also rlock() the outgoing interface once a route has been found. it is common that the incoming and outgoing interfaces are the same, which leads to recursive rlocking(). the deadlock happens when a reader holds the rlock for the incoming interface, then ip/ipconfig tries to add a new address, trying to wlock the interface. as there are still active readers on the ifc, ip/ipconfig process gets queued on the interface RWlock. now the reader finds the outgoing route which has the same interface as the incoming packet and tries to rlock the ifc again. but now theres a writer queued, so we also go to sleep waiting for ourselves to release the lock. the solution is to never wait for the outgoing interface rlock, but instead use non-queueing canrlock() and if it cannot be acquired, discard the packet.	2020-05-10 22:51:40 +02:00
cinap_lenrek	f12744b5db	devip: fix packet loss when interface is wlocked to prevent deadlock on media unbind (which is called with the interface wlock()'ed), the medias reader processes that unbind was waiting for used to discard packets when the interface could not be rlocked. this has the unfortunate side effect that when we change addresses on a interface that packets are getting lost. this is problematic for the processing of ipv6 router advertisements when multiple RA's are getting received in quick succession. this change removes that packet dropping behaviour and instead changes the unbind process to avoid the deadlock by wunlock()ing the interface temporarily while waiting for the reader processes to finish. the interface media is also changed to the nullmedium before unlocking (see the comment).	2020-01-05 18:20:47 +01:00
cinap_lenrek	b638c7753d	devip: use the routing table for local source ip address selection when making outgoing connections, the source ip was selected by just iterating from the first to the last interface and trying each local address until a route was found. the result was kind of hard to predict as it depends on the interface order. this change replaces the algorithm with the route lookup algorithm that we already have which takes more specific destination and source prefixes into account. so the order of interfaces does not matter anymore.	2019-11-10 19:50:46 +01:00
cinap_lenrek	5993760e14	devip: fix permission checking permission checking had the "other" and "owner" bits swapped plus incoming connections were always owned by "network" instead of the owner of the listening connection. also, ipwstat() was not effective as the uid strings were not parsed. this fixes the permission checks for data/ctl/err file and makes incoming connections inherit the owner from the listening connection. we also allow ipwstat() to change ownership to the commonuser() or anyone if we are eve. we might have to add additional restrictions for none at a later point...	2019-09-21 23:28:37 +02:00
cinap_lenrek	197ff3ac2f	devip: if the server does not support TCP ws option, disable window scaling (thanks joe9) if the server responds without a window scale option in its syn-ack, disable window scaling altogether as both sides need to understand the option.	2019-05-22 22:20:31 +02:00
cinap_lenrek	157d7ebdbd	devip: do not lock selftab in ipselftabread(), remove unused fields from Ipself the Ipselftab is designed to not require locking on read operation. locking the selftab in ipselftabread() risks deadlock when accessing the user buffer creates a fault. remove unused fields from the Ipself struct.	2019-05-12 01:20:21 +02:00
cinap_lenrek	333c320204	devip: reset speed and delay on bind, adjust burst on mtu change, ifc->m nil check, consistent error strings initialize the rate limits when the device gets bound, not when it is created. so that the rate limits get reset to default when the ifc is reused. adjust the burst delay when the mtu is changed. this is to make sure that we allow at least one full sized packet burst. make a local copy of ifc->m before doing nil check as it can change under us when we do not have the ifc locked. specify Ebound[] and Eunbound[] error strings and use them consistently.	2019-05-11 17:22:33 +02:00
cinap_lenrek	7186be0424	devip: make sure ifc is bound in add6 ctl command	2019-05-11 14:54:10 +02:00
cinap_lenrek	3a0d5f41a8	devip: remove unused c->car qlock, avoid potential deadlock in ipifcregisterproxy() remove references to the unused Conv.car qlock. ipifcregisterproxy() is called with the proxy ifc wlock'd, which means we cannot acquire the rwlock of the interfaces that will proxy for us because it is allowed to rlock() multiple ifc's in any order. to get around this, we use canrlock() and skip the interface when we cannot acquire the lock. the ifc should get wlock'd only when we are about to modify the ifc or its lifc chain. that is when adding or removing addresses. wlock is not required when we add addresses to the selfcache, which has its own qlock.	2019-05-11 14:01:26 +02:00
cinap_lenrek	a25819c43a	devip: avoid media bind/unbind kproc reader startup race, simplify etherbind mark reader process pointers with (void)-1 to mean not started yet. this avoids the race condition when media unbind happens before the kproc has set its Proc pointer. then we would not post the note and the reader would continue running after unbind. etherbind can be simplified by reading the #lX/addr file to get the mac address, avoiding the temporary buffer.	2019-05-11 07:22:34 +02:00
cinap_lenrek	83c7a727e0	devip: reject bad numeric ports (such as 9fs -> 9)	2019-04-14 03:22:05 +02:00
cinap_lenrek	4b8f7a2110	devip: ignore the evil bit in fragment info field using ~IP_DF mask to select offset and "more fragments" bits includes the evil bit 15. so instead define a constant IP_FO for the fragment offset bits and use (IP_MF|IP_FO). that way the evil bit gets ignored and doesnt cause any useless calls to ipreassemble().	2019-03-07 22:39:50 +01:00
cinap_lenrek	4885c75526	devip: ignore icmp advise about laggard fragments icmp has to advise protocols about the first fragment only. all other fragments should be ignored.	2019-03-07 01:25:11 +01:00
cinap_lenrek	57284d07ca	devip: ignore reserved fragment offset bits	2019-03-04 12:07:40 +01:00
cinap_lenrek	e2d310e623	devip: handle packet too big advise for icmp6, remove fragment header	2019-03-04 03:13:29 +01:00
cinap_lenrek	2af6b08960	devip: use common code in icmp for handling advise	2019-03-04 03:09:39 +01:00
cinap_lenrek	827020f686	devip: zero fragment offset after reassembly, remove tos magic, cleanup	2019-03-04 03:08:27 +01:00
cinap_lenrek	a1fceabd5b	devip: fix fragment forwarding unfraglen() had the side effect that it would always copy the nexthdr field from the fragment header to the previous nexthdr field. this is fine when we reassemble packets but breaks fragments that we want to just forward unchanged.	2019-03-04 03:05:30 +01:00
cinap_lenrek	fa97c3dd10	devip: simplify ip reassembly functions, getting rid of Ipfrag.hlen given that we now keep the block size consistent with the ip packet size, the variable header part of the ip packet is just: BLEN(bp) - fp->flen == fp->hlen. fix bug in ip6reassemble() in the non-fragmented case: reload ih after ip header was moved before writing ih->ploadlen. use concatblock() instead of pullupblock().	2019-03-03 18:56:18 +01:00
cinap_lenrek	a859f05837	devip: fix block list handling for icmp/icmp6, use proper MinAdvise for icmp6	2019-03-03 09:01:23 +01:00
cinap_lenrek	5b972a9aea	devip: fix ip fragmentation handling issues with header options some protocols assume that Ip4hdr.length[] and Ip6hdr.ploadlen[] are valid and not out of range within the block but this has not been verified. also, the ipv4 and ipv6 headers can have variable length options, which was not considered in the fragmentation and reassembly code. to make this sane, ipiput4() and ipiput6() now verify that everything is in range and trim the block to the expected size before doing any further processing. now blocklen() and Ip4hdr.length[] are consistent. ipoput4() and ipoput6() are simpler now, as they can rely on blocklen() only, not having a special routing case. ip fragmentation reassembly has to consider that fragments could arrive with different ip header options, so we store the header+option size in new Ipfrag.hlen field. unfraglen() has to make sure not to run past the buffer, and handle the case when it encounters multiple fragment headers.	2019-03-03 05:25:00 +01:00
cinap_lenrek	06912e53e4	devip: remove unused eipconvtet.c and ptclbsum.c files	2019-02-13 17:42:20 +01:00
cinap_lenrek	57ed5cc3f0	devip: ipv6 loopback ::1 has link-local scope	2019-02-13 08:46:49 +01:00
cinap_lenrek	7102a23245	devip: use parseipandmask() for ipifc and route control message parsing	2019-02-11 23:43:14 +01:00
cinap_lenrek	8152e9d075	devip: tcp: Don't respond to FIN-less ACKs during TIME-WAIT (thanks Barret Rhoden) Under the normal close sequence, when we receive a FIN|ACK, we enter TIME-WAIT and respond to that LAST-ACK with an ACK. Our TCP stack would send an ACK in response to any ACK, which included FIN|ACK but also included regular ACKs. (Or PSH|ACKs, which is what we were actually getting/sending). That was more ACKs than is necessary and results in an endless ACK storm if we were under the simultaneous close sequence. In that scenario, both sides of a connection are in TIME-WAIT. Both sides receive FIN|ACK, and both respond with an ACK. Then both sides receive those ACKs, and respond again. This continues until the TIME-WAIT wait period elapses and each side's TCP timers (in the Plan 9 / Akaros case) shut down. The fix for this is to only respond to a FIN|ACK when we are in TIME-WAIT.	2019-01-27 22:12:50 +01:00
cinap_lenrek	099da8cb82	devip: fix arpread, dont return partial entries	2018-11-28 12:41:18 +01:00
cinap_lenrek	196da4ec6f	devip: fix swapped tcp snd.scale and recv.scale in tcpstate() format (thanks joe9)	2018-11-18 04:14:41 +01:00
cinap_lenrek	b56450471f	devip: remove unused QLock from udp and icmpv6 control blocks (thanks brho)	2018-10-03 00:47:34 +02:00
cinap_lenrek	02b867f01e	devip: only add interface route for "on-link" prefixes when a prefix is added with the onlink flag clear, packets towards that prefix need to be sent to the default gateway so we omit adding the interface route. when the on-link flag gets changed to 1 later, we add the interface route. the on-link flag is sticky, so theres no way to clear it back to zero except removing and re-adding the prefix.	2018-09-28 18:13:01 +02:00
cinap_lenrek	94333ce6a6	devip, ipconfig: avoid overflow on lifetime checks	2018-09-23 22:07:56 +02:00
cinap_lenrek	70c6bd0397	devip: valid and preferred life-time should be unsigned, add remove6 ctl command	2018-09-23 19:09:48 +02:00
cinap_lenrek	4a92a8f6b2	devip: fix default parameter calculation for router life-time router life time is in seconds, while max ra interval is in milliseconds!	2018-09-23 19:08:16 +02:00
cinap_lenrek	259ce5e3de	devip: make updating ra6 router parameters atomic when we fail to parse and validate the command, no update should take place.	2018-09-23 17:24:59 +02:00
cinap_lenrek	11d1947814	arp: interface address only specifies the interface, not the source address for route lookup	2018-08-30 21:17:54 +02:00
cinap_lenrek	5c945a0b48	devip: fix router adv/sol options validation (options padded to 8 bytes)	2018-08-27 20:58:48 +02:00
cinap_lenrek	e49f7fc1f7	devip: fix multicastarp() when ipconfig assigned the 0 address sending multicast was broken when ipconfig assigned the 0 address for dhcp as they would be wrongly classified as Runi. this could happen when we do slaac and dhcp in parallel, breaking the sending of router solicitations.	2018-08-11 16:18:12 +02:00
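
The mask change described in commit 4b8f7a2110 can be illustrated with a small sketch. It is not taken from the devip source: the constant values follow the IPv4 header layout in RFC 791, the names IP_EVIL, isfrag_old() and isfrag_new() are hypothetical, and only IP_DF, IP_MF and IP_FO are named in the commit message.

```c
#include <stdint.h>

/* Bit layout of the 16-bit IPv4 flags/fragment-offset field (RFC 791). */
enum {
	IP_EVIL	= 0x8000,	/* reserved bit 15; hypothetical name for this sketch */
	IP_DF	= 0x4000,	/* don't fragment */
	IP_MF	= 0x2000,	/* more fragments */
	IP_FO	= 0x1fff,	/* fragment offset, in 8-byte units */
};

/* Old-style test: everything except DF, so the reserved bit is included. */
static int
isfrag_old(uint16_t frag)
{
	return (frag & (uint16_t)~IP_DF) != 0;
}

/* New-style test: only "more fragments" and the offset bits count. */
static int
isfrag_new(uint16_t frag)
{
	return (frag & (IP_MF|IP_FO)) != 0;
}
```

With these definitions, a packet that carries only the reserved bit is treated as a fragment by the old test and as an unfragmented packet by the new one, which is the case where the commit says a needless call to ipreassemble() is avoided.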

137 commits