I am running squid on over a thousand computers that are filtering data
coming out of one of the particle collision detectors on the Large
Hadron Collider. There are two origin servers, and the application
layer is designed to try the second server if the local squid returns a
5xx HTTP code (server error). I recently found that before squid 2.7
this failover could never happen, because squid would just return stale
data if the origin server was down (more precisely, I've been testing
with the server machine up but the listener process down, so squid gets
'connection refused').

In squid 2.7STABLE4, if squid.conf has 'max_stale 0', or if the origin
server sends 'Cache-Control: must-revalidate', then squid returns a 504
Gateway Timeout error. Unfortunately, this error does not get cached,
so the request gets forwarded upstream every time, no matter what
negative_ttl is set to. These squids are configured in a hierarchy
where each one feeds 4 others so that the load gets spread out, but
because the error is not cached at all, if the primary origin server is
down the squids near the top of the hierarchy get hammered with
hundreds of requests for the dead server before every request that
succeeds from the second server.
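For concreteness, here is a minimal sketch of the relevant squid.conf
settings on each node (max_stale, negative_ttl, and cache_peer are real
squid 2.7 directives, but the peer hostname and the values shown are
just illustrative):

  # Never serve stale objects; return an error instead
  max_stale 0

  # How long I would expect error responses like the 504 to be cached
  # as negative hits -- in practice the 504 is not cached at all
  negative_ttl 5 minutes

  # One of the upstream squids in the hierarchy; each squid feeds
  # 4 others below it
  cache_peer squid-parent.example.org parent 3128 3130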
Any suggestions? Is the fact that negative_ttl doesn't work with
max_stale a bug, a missing feature, or an unfortunate interpretation of
the HTTP 1.1 spec?
By the way, I had hoped that sending a 'Cache-Control: max-stale=0'
request header would work the same as squid.conf's 'max_stale 0', but I
never see an error come back when the origin server is down; squid
returns stale data instead. I wonder whether that's intentional, a
bug, or a missing feature. I also note that the HTTP 1.1 spec (RFC
2616) says that a Warning 110 (Response is stale) header MUST be
attached whenever stale data is returned, and I'm not seeing those
either.
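To make that concrete (the hostname and path here are made up): when
the origin is down and the cached copy is stale, a request like

  GET http://origin.example.org/calibration.db HTTP/1.1
  Cache-Control: max-stale=0

comes back as a 200 with the stale body, when I expected either a 504,
or at the very least something like

  HTTP/1.1 200 OK
  Warning: 110 squid/2.7.STABLE4 "Response is stale"

with the Warning header attached as the spec requires.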
- Dave