On Thu, Oct 27, 2011 at 04:37:20PM +1030, Brett Lymn wrote:
>
> OK, but, the 2.7 stable 6 machines that work well share the same parents
> as the 3.1.15 machines - they even talk to the same DNS servers.
>
I had a bit of a dig at this on the weekend and can confirm that the
problem is DNS related: it is a combination of a broken DNS server and
the way squid does lookups. It looks like the new directive in 3.1.16
would help in this case.
What looks to be happening is that squid never tries to look up the A
record. The remote server just times out on the AAAA lookup, and that
takes so long that the timeout clobbers the DNS request in the queue. I
see this in a tcpdump:
192.168.3.3.65473 > 192.231.203.132.domain: [udp sum ok] 10060+ AAAA? www.my.commbank.com.au. (40)
19:38:25.132968 IP (tos 0x0, ttl 64, id 0, offset 0, flags [none], proto UDP (17), length 68)
192.168.3.3.65472 > 192.231.203.3.domain: [udp sum ok] 10060+ AAAA? www.my.commbank.com.au. (40)
19:38:30.154854 IP (tos 0x0, ttl 64, id 0, offset 0, flags [none], proto UDP (17), length 68)
192.168.3.3.65473 > 192.231.203.132.domain: [udp sum ok] 10060+ AAAA? www.my.commbank.com.au. (40)
19:38:31.177449 IP (tos 0x0, ttl 64, id 0, offset 0, flags [none], proto UDP (17), length 68)
192.168.3.3.65472 > 192.231.203.3.domain: [udp sum ok] 10060+ AAAA? www.my.commbank.com.au. (40)
19:38:36.197481 IP (tos 0x0, ttl 64, id 0, offset 0, flags [none], proto UDP (17), length 68)
192.168.3.3.65473 > 192.231.203.132.domain: [udp sum ok] 10060+ AAAA? www.my.commbank.com.au. (40)
19:38:37.217890 IP (tos 0x0, ttl 64, id 0, offset 0, flags [none], proto UDP (17), length 68)
192.168.3.3.65472 > 192.231.203.3.domain: [udp sum ok] 10060+ AAAA? www.my.commbank.com.au. (40)
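For anyone wanting to check their own capture, here is a minimal Python sketch that tallies the DNS query types per server from tcpdump text like the above. The helper name and the regular expression are my own assumptions, tuned only to the line format shown in this capture, so other tcpdump versions or options may need a different pattern.

```python
import re
from collections import Counter

# Matches query lines such as:
#   192.168.3.3.65473 > 192.231.203.132.domain: [udp sum ok] 10060+ AAAA? name. (40)
# Assumption: the line format is exactly what appears in the capture above.
QUERY_RE = re.compile(
    r">\s+(?P<server>[\d.]+)\.domain:.*?\s(?P<qid>\d+)\+\s+"
    r"(?P<qtype>[A-Z]+)\?\s+(?P<name>\S+)"
)

def tally_queries(lines):
    """Count DNS queries by (server, query type). Hypothetical helper."""
    counts = Counter()
    for line in lines:
        match = QUERY_RE.search(line)
        if match:  # timestamped IP header lines do not match and are skipped
            counts[(match.group("server"), match.group("qtype"))] += 1
    return counts

sample = [
    "19:38:25.132968 IP (tos 0x0, ttl 64, id 0, offset 0, flags [none], proto UDP (17), length 68)",
    "192.168.3.3.65472 > 192.231.203.3.domain: [udp sum ok] 10060+ AAAA? www.my.commbank.com.au. (40)",
    "19:38:30.154854 IP (tos 0x0, ttl 64, id 0, offset 0, flags [none], proto UDP (17), length 68)",
    "192.168.3.3.65473 > 192.231.203.132.domain: [udp sum ok] 10060+ AAAA? www.my.commbank.com.au. (40)",
]

counts = tally_queries(sample)
for (server, qtype), n in sorted(counts.items()):
    print(server, qtype, n)
```

If the tally never shows an A entry for the name, as in this trace, squid is only ever asking for the AAAA record.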
And this in the cache.log with debug_options 78,3:
2011/10/30 19:22:57.089| idnsRead: starting with FD 11
2011/10/30 19:22:57.089| idnsRead: FD 11: received 40 bytes from 192.231.203.3:53
2011/10/30 19:22:57.089| idnsGrokReply: ID 0xcbe0, -2 answers
2011/10/30 19:22:57.089| idnsGrokReply: error Server Failure: The name server was unable to process this query. (2)
2011/10/30 19:22:57.089| idnsGrokReply: Query result: SERV_FAIL
2011/10/30 19:23:58.160| idnsCheckQueue: ID 0x54bftimeout
2011/10/30 19:24:58.996| idnsCheckQueue: ID 0x54bftimeout
2011/10/30 19:24:58.996| idnsCheckQueue: ID 54bf: giving up after 4 tries and 121.91 seconds
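The gap between the two idnsCheckQueue timeouts can be worked out from the log timestamps. A small Python sketch follows; the helper name is hypothetical, and the timestamp format is assumed to be exactly the one shown in the cache.log lines above.

```python
from datetime import datetime

def log_time(line):
    """Parse the timestamp that precedes the '|' in a cache.log line."""
    stamp = line.split("|", 1)[0]
    return datetime.strptime(stamp, "%Y/%m/%d %H:%M:%S.%f")

# The two timeout lines copied from the log above.
timeouts = [
    "2011/10/30 19:23:58.160| idnsCheckQueue: ID 0x54bftimeout",
    "2011/10/30 19:24:58.996| idnsCheckQueue: ID 0x54bftimeout",
]

gap = (log_time(timeouts[1]) - log_time(timeouts[0])).total_seconds()
print(f"{gap:.3f} seconds between timeouts")
```

That comes out at roughly a minute between the two timeouts shown, which fits with the request being abandoned at about the two minute mark.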
In the code I can see that the A record is supposed to be tried after a
SERV_FAIL has happened a few times, but in this case the retries take so
long that the DNS request gets killed out of the queue before that part
of the code is ever executed.
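To illustrate the race described above, here is a toy model in Python. It is not squid's code: the function name, the per-try wait, the number of tries before the A fallback and the queue lifetime are all made-up parameters, chosen only so the numbers land near the "4 tries and 121.91 seconds" in the log.

```python
def simulate_lookup(wait_per_try, tries_before_a_fallback, queue_lifetime):
    """Toy model of an AAAA lookup that must fail several times before an
    A lookup is attempted. Returns (outcome, tries_made, elapsed_seconds)."""
    elapsed = 0.0
    for attempt in range(1, tries_before_a_fallback + 1):
        elapsed += wait_per_try          # every AAAA try ends in a timeout
        if elapsed >= queue_lifetime:    # request is dropped from the queue
            return ("gave up", attempt, elapsed)
    # All AAAA tries failed quickly enough, so the A lookup is reached.
    return ("fallback to A", tries_before_a_fallback, elapsed)

# Slow failures: the queue lifetime runs out before the fallback is reached.
slow = simulate_lookup(wait_per_try=30.0, tries_before_a_fallback=5,
                       queue_lifetime=120.0)

# Fast failures: the fallback to an A lookup happens well inside the lifetime.
fast = simulate_lookup(wait_per_try=5.0, tries_before_a_fallback=5,
                       queue_lifetime=120.0)

print(slow)
print(fast)
```

With the slow settings the model gives up on the fourth try without ever asking for the A record, which is the behaviour I think I am seeing.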
What I eventually did at home was rebuild squid with --disable-ipv6
(actually, it would be nice if this were a config directive rather than
a compile-time option...). Once I had done this, the comm bank site was
reasonably usable, since the AAAA lookups were not being tried at all.
-- Brett Lymn
Received on Mon Oct 31 2011 - 01:01:06 MDT