Henrik Nordstrom wrote:
> 
> There is a small design error in my patch, making it perform slightly
> different than I described.
> 
> 	if (parent_timeout > sibling_timeout)
> 
> should read
> 
> 	if (parent_exprep)
> 
> Updated patch attached.
> 
> Which one is the correct I don't know. Should Squid bother waiting for
> slow siblings when the parents are much faster (at least twice as fast)?
I'd guess probably not... 
I've been trying the earlier version of the patch over the last few hours, 
and it looks like it works. After it had been running for a while, counts 
for the last 2000 access log entries mentioning PARENT or SIBLING in the 
status (omitting others) I saw 
 312 FIRST_PARENT_MISS/cam1.sites.wwwcache.ja.net
 932 FIRST_PARENT_MISS/cam2.sites.wwwcache.ja.net
  86 FIRST_UP_PARENT/cam0.sites.wwwcache.ja.net
  14 PARENT_HIT/cam0.sites.wwwcache.ja.net
   7 PARENT_HIT/cam1.sites.wwwcache.ja.net
  89 PARENT_HIT/cam2.sites.wwwcache.ja.net
   2 SIBLING_HIT/wwwcache.damtp.cam.ac.uk
 196 TIMEOUT_FIRST_PARENT_MISS/cam1.sites.wwwcache.ja.net
 329 TIMEOUT_FIRST_PARENT_MISS/cam2.sites.wwwcache.ja.net
  33 TIMEOUT_FIRST_UP_PARENT/cam0.sites.wwwcache.ja.net
and the ICP ping timeouts mentioned in debugging output were much more 
reasonable for the parent RTTs. Initially, I thought it must be getting it 
wrong as the logged timeouts seemed far too low, but checking with cachemgr 
I found that the actual peer RTTs were much lower than when I was testing 
last night - in fact, over a couple of hours this morning the peer RTTs 
increased by a factor of ten as load (on our cache, the parents, and the 
net) picked up, which servers to emphasise the importance of dynamic 
adjustment of the timeout.
The counts above also reinforce my feeling that RTT estimates should 
influence routing even for requests that get ICP timeouts from all peers 
(assuming that's not because all peers are down... :-) - the FIRST_UP_PARENT 
choice was cam0.sites but that parent is clearly being avoided by the 
ICP-based routing and it seems to have problems at present - continual 
stream of TCP connection failed (with occasional succeeded), etc. Being 
first in the list does not mean it's a good (or even reasonable) choice.
A few other miscellaneous points that arose in this morning's testing:
(1) The first couple of times I attempted to start the recompiled Squid 2.2
with the patch, startup failed with an "Arithmetic Exception" message 
appearing in the middle of the startup script's output. It's not clear where 
that came from, though presumably from squid rather than something else (but 
no way to be sure). I couldn't see any core files to help pin it down. At 
that point, I had some extra debugging enabled -
debug_options 11,9                                       
debug_options 15,9                            
debug_options 17,9
debug_options 44,9         
in addition to my normal ALL,1 , and as I'd become suspicious that the 
debuyg_options could have odd side-effects (see below!), I tried again with 
those removed. That worked, but subsequently the Arithmetic Exception did 
not recur even when I reinstated the debug_options and reconfigured, or 
later when I restarted the server with those options in the config file at 
startup. Very odd.
(2) I've mentioned before that adding an extra debug option sometimes seem 
to cause all cache.log output to cease, and it was happening again, though 
not entirely the same as I'd noticed before. Previously, I'd found that 
adding an extra option in addition to those mentioned above would cause all 
cache.log output to cease. What I saw today was that with just those options 
in the config, normal Squid startup messages etc., were omitted completely
(though I'm sure that's not always happened with earlier testing) *but*
the debugging output (ICP ping details etc.) were logged as normal. Some 
shutdown messages still got logged, but not all - e.g. when the cache wasn't 
handling live load, it just reported
CPU Usage: 364.210 seconds
Maximum Resident Size: 0 KB
Page faults with physical i/o: 4175
without any of the messages about saving metadata, etc. 
(3) I've noticed that when running with live load (but not when essentially 
idle), Squid 2.2 often (but maybe not always) produces a list of open FDs at 
shutdown, like
1999/05/08 13:29:58|   Finished.  Wrote 384017 entries.
1999/05/08 13:29:58|   Took 2 seconds (192008.5 entries/sec).
CPU Usage: 605.030 seconds
Maximum Resident Size: 0 KB
Page faults with physical i/o: 4747
1999/05/08 13:29:58| Open FD               6 /opt/squid-logs-v2/useragent.log
1999/05/08 13:29:58| Open FD WRITING      39 /opt/squid-logs-v2/access.log
1999/05/08 13:29:58| Open FD WRITING      41 /opt/squid-logs-v2/store.log
1999/05/08 13:29:58| Open FD WRITING      43 squid -> unlinkd
1999/05/08 13:29:58| Open FD WRITING      44 /opt/squid-cache-v2/swap.state
1999/05/08 13:29:58| Open FD WRITING      91 /opt/squid-cache-v2/03/45/00034556
1999/05/08 13:29:58| Open FD             106 HTTP Request
1999/05/08 13:29:58| Open FD WRITING     178 /opt/squid-cache-v2/03/44/000344D1
1999/05/08 13:29:58| Open FD WRITING     240 /opt/squid-cache-v2/03/45/00034529
1999/05/08 13:29:58| Squid Cache (Version 2.2.STABLE2): Exiting normally.
I don't see that with Squid 2.1 (and don't remember it from any earlier 
version) - is it just reporting something that was ignored before, purely 
for information, or is it an indicator that there's a problem or bug?
My guess (without checking the source code) is that something's not allowing 
enough time for a clean shutdown. After getting messages like the above, I 
tend to see comments like 
1999/05/08 11:50:15| storeSwapInFileOpened: /opt/squid-cache-v2/00/00/000000C9: 
Size mismatch: 11884(fstat) != 1229(object)
subsequently, implying that writing out objects from memory really was 
interrupted - though I cannot tell if that was because it really was taking 
an unreasonable amount of time, or because something was impatient and 
didn't allow a reasonable amount of time for cleaning up.
                                John
-- University of Cambridge WWW manager account (usually John Line) Send general WWW-related enquiries to webmaster@ucs.cam.ac.ukReceived on Tue Jul 29 2003 - 13:15:58 MDT
This archive was generated by hypermail pre-2.1.9 : Tue Dec 09 2003 - 16:12:07 MST