I found out a little more by reading the source code, inspecting the
generated headers, and setting a few breakpoints. The squid closest to
the origin server that is down (the one at the top of the cache_peer
parent hierarchy) never attempts to store the negative result. Worse,
it sets an Expires: header that is equal to the current time. Squids
further down the hierarchy do call storeNegativeCache(), but the
expiration time they see is already in the past, so caching the error
is of no use.
Together, these make it seem that squid is far from being able to
support effective application-level failover from one origin server
to another.
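
To illustrate why an Expires: header equal to the current time defeats
negative caching further down the hierarchy, here is a minimal Python
sketch. It is not squid's code: remaining_lifetime() and the header
values are hypothetical, and the only assumption is that a child cache
judges the error's lifetime by its Expires: header alone.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime


def remaining_lifetime(headers, now):
    """Seconds a cache could still serve this reply, judged only by Expires."""
    expires = parsedate_to_datetime(headers["Expires"])
    return max(0.0, (expires - now).total_seconds())


# Hypothetical 504 as generated by the top-level squid: Expires equals Date.
generated = datetime(2008, 9, 30, 22, 13, 10, tzinfo=timezone.utc)
error_headers = {
    "Date": format_datetime(generated, usegmt=True),
    "Expires": format_datetime(generated, usegmt=True),
}

# A child squid receives the reply a moment later (50 ms is an assumption).
received = generated + timedelta(milliseconds=50)
lifetime_at_child = remaining_lifetime(error_headers, received)

# For comparison: the same error with a hypothetical 60-second lifetime.
cacheable_headers = dict(
    error_headers,
    Expires=format_datetime(generated + timedelta(seconds=60), usegmt=True),
)
lifetime_if_honored = remaining_lifetime(cacheable_headers, received)
```

In the first case the lifetime is zero by the time the reply reaches
the child, so storing it cannot save any later request. In the second
case the child would have almost the full 60 seconds to answer from
its negative cache.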
- Dave
On Tue, Sep 30, 2008 at 10:32:43AM -0500, Dave Dykstra wrote:
> Do any of the squid experts have any answers for this?
>
> - Dave
>
> On Thu, Sep 25, 2008 at 02:04:09PM -0500, Dave Dykstra wrote:
> > I am running squid on over a thousand computers that are filtering data
> > coming out of one of the particle collision detectors on the Large
> > Hadron Collider. There are two origin servers, and the application
> > layer is designed to try the second server if the local squid returns a
> > 5xx HTTP code (server error). I just recently found that before squid
> > 2.7 this could never happen because squid would just return stale data
> > if the origin server was down (more precisely, I've been testing with
> > the server up but the listener process down so it gets 'connection
> > refused'). In squid 2.7STABLE4, if squid.conf has 'max_stale 0' or if
> > the origin server sends 'Cache-Control: must-revalidate' then squid will
> > send a 504 Gateway Timeout error. Unfortunately, this timeout error
> > does not get cached, and the request gets sent upstream every time no matter what
> > negative_ttl is set to. These squids are configured in a hierarchy
> > where each feeds 4 others so loading gets spread out, but the fact that
> > the error is not cached at all means that if the primary origin server
> > is down, the squids near the top of the hierarchy will get hammered with
> > hundreds of requests for the server that's down before every request
> > that succeeds from the second server.
> >
> > Any suggestions? Is the fact that negative_ttl doesn't work with
> > max_stale a bug, a missing feature, or an unfortunate interpretation of
> > the HTTP 1.1 spec?
> >
> > By the way, I had hoped that 'Cache-Control: max-stale=0' would work the
> > same as squid.conf's 'max_stale 0' but I never see an error come back
> > when the origin server is down; it returns stale data instead. I wonder
> > if that's intentional, a bug, or a missing feature. I also note that
> > the HTTP 1.1 spec says that there MUST be a Warning 110 (Response is
> > stale) header attached if stale data is returned and I'm not seeing
> > those.
> >
> > - Dave
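
The application-layer failover described in the quoted message (try
the second origin server when the local squid returns a 5xx code)
could look roughly like the following Python sketch. The function
names, server URLs, and proxy address are hypothetical and are not
taken from the actual application.

```python
import urllib.error
import urllib.request


def fetch_via_proxy(url, proxy):
    """Fetch url through the local squid; raises HTTPError on 4xx/5xx."""
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy})
    )
    with opener.open(url, timeout=10) as response:
        return response.read()


def fetch_with_failover(path, servers, proxy, fetch=fetch_via_proxy):
    """Try each origin server in order, moving on only when squid answers 5xx."""
    last_error = None
    for server in servers:
        try:
            return fetch(server + path, proxy)
        except urllib.error.HTTPError as err:
            if not 500 <= err.code <= 599:
                raise  # a 4xx reply is not a reason to fail over
            last_error = err
    if last_error is None:
        raise ValueError("no origin servers given")
    raise last_error  # every server answered with a 5xx


# Hypothetical usage:
# data = fetch_with_failover(
#     "/data/run1",
#     ["http://origin1.example.org", "http://origin2.example.org"],
#     "http://localhost:3128",
# )
```

With a client like this, every request first asks for the primary
server. If the 5xx reply is not cached anywhere in the hierarchy, each
of those first attempts travels all the way to the top-level squids
before the second server is tried.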
Received on Tue Sep 30 2008 - 22:13:10 MDT