Mark Nottingham wrote:
> I've got a user who's running a pair of peered accelerators, using both 
> stale-while-revalidate and max_stale.
> 
> Occasionally, they see extremely old content being served; e.g., if CC: 
> max-age is 60s, they might see something go by which is 1000-3000 
> seconds old (but still within the max_stale window).
> 
> The pattern that appears to trigger this is when a resource with an 
> in-cache 200 response starts returning 404s; when this happens, Squid 
> will start returning TCP_NEGATIVE_HIT/200's. E.g. (traffic driven by 
> squidclient),
> 
> 1221713703.815      0 127.0.0.1 TCP_STALE_HIT/200 5234 GET http://server1//5012904 - NONE/- application/json
> 1221713703.979    164 0.0.0.0 TCP_ASYNC_MISS/404 193 GET http://server1/5012904 - FIRST_UP_PARENT/back-end-server1 text/plain
> 1221713711.431      0 127.0.0.1 TCP_NEGATIVE_HIT/200 5234 GET http://server1/5012904 - NONE/- application/json
> 1221713720.978      0 127.0.0.1 TCP_NEGATIVE_HIT/200 5234 GET http://server1/5012904 - NONE/- application/json
> 1221713723.483      0 127.0.0.1 TCP_NEGATIVE_HIT/200 5234 GET http://server1/5012904 - NONE/- application/json
> 
> As you can see, stale-while-revalidate kicks in, and the async refresh 
> brings back a 404, but that doesn't get stored properly.
> 
> Looking at the code, I *think* the culprit is storeNegativeCache(), 
> which will, assuming that max_stale is set (either in squid.conf or 
> response headers), block the new response from updating the cache -- no 
> matter what its status code is.
> 
> It makes sense to do this for 5xx status codes, because they're often 
> transient, and reflect server-side problems. It doesn't make as much 
> sense to do this for 4xx status codes, which reflect client-side issues. 
> In those cases, you always want to update the cache with the most recent 
> response (and potentially negative cache it, if the server is silly 
> enough to not put a freshness lifetime on it).
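> 
> To make that concrete, the policy I have in mind is roughly the sketch 
> below (purely illustrative; the helper name is made up and isn't part 
> of the Squid source):
> 
>     /*
>      * Decide whether a failed revalidation reply should replace the
>      * stale entry already in the cache, based on status class alone:
>      * 5xx is treated as a transient server-side problem (keep serving
>      * the stale copy), anything else -- notably 4xx -- gets stored.
>      */
>     static int
>     replyShouldReplaceStaleEntry(int status)
>     {
>         if (status >= 500 && status < 600)
>             return 0;   /* transient server error: keep the stale entry */
>         return 1;       /* e.g. 404: the resource really changed; store it */
>     }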
> 
> The interesting thing, BTW, is that this only happens when collapsed 
> forwarding is on, because this check in httpReplyProcessHeader:
> 
>     if (neighbors_do_private_keys && !Config.onoff.collapsed_forwarding)
>         httpMaybeRemovePublic(entry, reply);
> 
> otherwise masks the behaviour.
> 
> Thoughts? I'm not 100% on this diagnosis, as the use of peering and 
> stale-while-revalidate makes things considerably more complex, but I've 
> had pretty good luck reproducing it... I'm happy to attempt a fix, but 
> want input on what approach people prefer. Left to my own devices, 
> I'd add another condition to this check in storeNegativeCache():
> 
>     if (oe && !EBIT_TEST(oe->flags, KEY_PRIVATE) &&
>         !EBIT_TEST(oe->flags, ENTRY_REVALIDATE))
> 
> to limit it to 5xx responses.
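> 
> Roughly, the extended check would look something like this (a sketch 
> only; "status" here stands for the status code of the incoming reply, 
> and I haven't verified the exact accessor against storeNegativeCache() 
> itself):
> 
>     /* Only let the existing public, non-revalidating entry mask the
>      * new reply when that reply is a 5xx; a 4xx should be stored. */
>     if (oe && !EBIT_TEST(oe->flags, KEY_PRIVATE) &&
>         !EBIT_TEST(oe->flags, ENTRY_REVALIDATE) &&
>         status >= 500 && status < 600)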
> 
I'd agree with you based on that analysis. Can you add a bugzilla entry 
with a patch that does it?
Amos
-- 
Please use Squid 2.7.STABLE4 or 3.0.STABLE9