Henrik Nordstrom wrote:
> > The counts above also reinforce my feeling that RTT estimates should
> > influence routing even for requests that get ICP timeouts from all peers
> > (assuming that's not because all peers are down... :-) - the FIRST_UP_PARENT
> > choice was cam0.sites but that parent is clearly being avoided by the
> > ICP-based routing and it seems to have problems at present - continual
> > stream of TCP connection failed (with occasional succeeded), etc. Being
> > first in the list does not mean it's a good (or even reasonable) choice.
> 
> Attached is a patch designed to select the parent with lowest
> statistical RTT when ICP times out.
I've now looked at the cases where I still got TIMEOUT_FIRST_UP_PARENT, and 
they appear to be due to the chosen FIRST_PARENT_MISS peer rejecting Squid's 
connection (reason unknown; it was the fastest parent for most requests :-),
resulting in Squid dropping back to FIRST_UP_PARENT.
That prompts two followup questions:
(1) Why is it logged as TIMEOUT_FIRST_UP_PARENT? That's misleading, since 
it wasn't chosen due to ICP timeout... (I'm not sure what it should be 
logged as instead, though; FIRST_UP_PARENT would tend to imply it was the 
first-choice route, when it wasn't; perhaps TIMEOUT_ has the most 
appropriate meaning.) The more general question, I suppose, with Squid 
retrying failed retrievals automatically, is whether the log is intended to 
distinguish requests that were routed in a particular way because a 
preferred retrieval option failed, or whether they're intended to be logged 
as if the routing that was actually used had been the first choice.
(2) Is there a check somewhere to avoid the same parent being chosen as 
FIRST_PARENT_MISS peer and also as the fallback FIRST_UP_PARENT peer? Not a 
major problem, I suppose, unless a *lot* of requests were routed in that way 
and the peer was unresponsive, so that the requests would take twice as long 
to fail (assuming the problem affected all requests; it's possible the first 
would fail and the second succeed, if the peer status was fluctuating 
rapidly, e.g. close to running out of file descriptors for peer sockets).
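
To make (2) concrete, here is a minimal sketch in Python of the kind of check
I have in mind. It is not Squid's actual code: the Peer type, the "up" flag,
the first_up_parent() helper and the peer names are all hypothetical, made up
purely for illustration. The idea is just that the fallback skips any parent
that has already been tried (and failed) for the same request.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class Peer:
    name: str   # hypothetical peer name
    up: bool    # hypothetical "peer is believed alive" flag


def first_up_parent(parents: Iterable[Peer],
                    already_tried: Set[str]) -> Optional[Peer]:
    """Return the first parent in configured order that is up and has
    not already been tried for this request, or None if none is left."""
    for peer in parents:
        if peer.up and peer.name not in already_tried:
            return peer
    return None


# Hypothetical configuration: two parents, both believed to be up.
parents = [Peer("parent-a", True), Peer("parent-b", True)]

# parent-a was picked first (e.g. as FIRST_PARENT_MISS) and rejected the
# connection, so the fallback should not hand the request back to it.
fallback = first_up_parent(parents, already_tried={"parent-a"})
```

Without the already_tried set, the fallback above would return parent-a
again, which is exactly the doubled failure time I'm wondering about.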
 
                                John
-- 
University of Cambridge WWW manager account (usually John Line)
Send general WWW-related enquiries to webmaster@ucs.cam.ac.uk

Received on Tue Jul 29 2003 - 13:15:58 MDT