txt and caa rr strings might contain binary control characters
such as newlines and double quotes which mess up the output
in ndb(6) format.
so handle them as binary blobs internally and escape special
characters as \DDD where D is a octal digit when printing.
txtrr() will unescape them when reading into internal
binary representation.
remove the undocumented nullrr ndb attribute parsing code.
introduce our own RR* format %P for pretty
printing and call %R format internally,
then use it to print the rest of the line
after the tab, prefixed with the padded
output.
have todo multiple fmtprint() calls for idnname()
as the buffer is shared.
do not idnname() rp->os and rp->cpu, these are symbols.
always quote txt= records.
- enforce same behaviour as cachedb server in dblookup():
- force Taaaa record type on ipv6= attributes, regardless of value
- return Taaaa records for ip= attributes containing ipv6 values
- return Ta records only for ip= attributes containing ipv4 values
- for compatibility, bring back support for txtrr= type, but handle consistently
---
To: 9front@9front.org
Date: Sun, 07 Feb 2021 14:56:39 +0100
From: kvik@a-b.xyz
Subject: Re: [9front] transient dns errors cause smtp failure
Reply-To: 9front@9front.org
I think I found a reason for DNS failing on known good domains.
/sys/src/cmd/ndb/dns.h:156,157
/* tune; was 60*1000; keep it short */
Maxreqtm= 8*1000, /* max. ms to process a request */
So, 8 seconds is how much the resolver will bother with a request it
has been handed, before dropping it on the floor with little
explanation.
It seems quite possible that this is too short a timeout on a machine
during a spam queue run, which predictably stresses the compute and
network resources.
In turn, negative response caching might explain why a particular
unlucky domain would basically stop receiveing any mail for a while.
I'm dying to know if bumping this limit would clear up the queue of
such DNS errors.
---
[narrator: it did.]
On 12/18/20, Jacob Moody wrote:
> Hello,
>
> I recently ran in to some issues with pointing an unbound server towards a
> 9front dns server as its upstream.
> The parsing seemed to fail when ndb/dns received a DNSKEY RR from it's own
> upstream source on behalf of unbound.
> This patch catches and stores the DNSKEY from the upstream server to prevent
> this.
I have the problem that i need to delegate a subdomain
to another name server that is confused about its own zone
(and its own name) returning unusable ns records.
With this, one can make up a nameserver entry in ndb that
is authoritative and owned by us for that nameserver,
and then put it in the soa=delegated ns entry.
This promotes the ns record in the soa=delegated to
Authoritative, which avoids overriding the ns rr's from
the confused server for the delegated zone.
The de-duplication of txt, nullrr, cert, key and sig records
reduced all records to a single one.
Also, dblookup1() missed the txt record case and did not return
a unique list of rr's.
Now we consider these records unique if their value is different.
The new txtequiv() function does that for TXT records, which is
a bit tricky as it needs to take different segmentation into account.
version(5) says:
If the server does not understand the client's version
string, it should respond with an Rversion message (not
Rerror) with the version string the 7 characters
``unknown''.
Pre-lib9p file servers -- all except cwfs(4) -- do return Rerror.
lib9p(2) follows the above spec, although ignoring the next part
concerning comparison after period-stripping. It assumes an
Fcall.version starting with "9P" is correctly formed and returns
the only supported version of the protocol, which seems alright.
This patch brings pre-lib9p servers in accordance with the spec.
The mount() and bind() syscalls return -1 on error,
and the mountid sequence number on success.
The manpage states that the mountid sequence number
is a positive integer, but the kernels implementation
currently uses a unsigned 32-bit integer and does not
guarantee that the mountid will not become negative.
Most code just cares about the error, so test for
the -1 error value only.
kvik writes:
dnsquery(8) prints the interactive prompt on stdout together with
query results, making scripted usage unnecessarily difficult.
A straightforward solution is prompting on stderr instead: as
practiced by rc(1), among many others -- promptly taking care of
the issue:
; echo 9front.org mx | ndb/dnsquery >[2]/dev/null
the target has to be encoded as a domain name (the individual
name components as separate labels followed by . (empty) label),
not as a literal string.
to disable compression, pass nil dictionary to pname().
avoid returning ip addresses that cannot be reached due
to lack of a compatible ip address. this means when here
is no ipv4 address configured, we wont return ipv4 addresses
and would not query dns for an A record.
likewise, when here is no ipv6 address configured then
we wont query dns for an AAAA record.
ipv6 lookups can still be disabled with the -4 flag just
as before.
the dnsquery() library function should not start mouting /srv/dns on
its own. this problem arrises only for ndb/cs as it is started before
ndb/dns.
the issue with mounting /srv/dns before /net is when ndb/cs attempts
to read the list of interfaces, accessing /net/ipifc, which triggers
a rpc to ndb/dns as it is ontop of the mount. this can yield a deadlock
when ndb/dns blocks its 9p loop waiting for requests to complete on
a refresh and the requests are stuck waiting for ndb/cs to translate
a dial string for announce().
dblookup() used to only return the first matching entry. in
case of ipv6, we want all entries returned to get both v4
and v6 addresses... and these might not neccesarily be in
the same entry (see /lib/ndb/common). note also this makes
it behave the same as in cachedb mode which reads in the
whole database.
we do not know if v4 or v6 routing works, so the simplest
is just to query v4 and v6 nameservers in parallel. this is
done by changing serveraddrs() to return one address type,
and we make sure to get at least one v4 and one v6 address
each round.
get rid of the weigthed timeout code... there where too many
assumptions. instead, we give a round 500ms timeout (or 1 second
in patient mode) and honor the maximum query time.
we have to maintain the ->line chain for ndbreorder() to work, so add
a little helper: ndbline() which replicates the ->entry chain and links
the last tuple to the first; makeing the whole list into a single line.