David J N Begley wrote:
> > > - I thought we had fairly aggressive caching practices; for example:
> > >
> > > #hierarchy_stoplist cgi-bin ? -- (ie., default, nothing uncommented)
> > > #no_cache -- nothing
> > > refresh_pattern . 1440 200% 43200
> >
> > While this is perfectly legal thing to do, you should know the
> > implications of doing so..
>
> Uh-huh .. it's an old hang-over config that doesn't "appear" to have broken
> anything or upset anyone so it hasn't been updated (as I said, "fairly
> aggressive caching practices" so by rights if there was a problem we should
> have tripped it with this site).

I've been a little concerned about some of the refresh patterns going
around on this list recently. The implications of the above are as
follows:
- No document will have its freshness checked for the first 24 minutes
in cache (min 1440, reading the value as seconds). For documents that
are modified rapidly this could cause a staleness problem. Apparently
it has not been an issue, since nobody has complained, so that's good.
- Every document in the cache longer than 12 hours will generate an
If-Modified-Since (IMS) request (max 43200, reading the value as
seconds). What we and others have found is that most documents are
rarely modified. A recent paper by Douglis, Feldmann, Krishnamurthy,
and Mogul explores this in depth (sorry, I don't have it with me so I
can't report what the mean and median ages were, but if memory serves
they were weeks to months, not hours to days). The effect of this max
is then to cause IMS requests for unmodified documents more often than
necessary. While these requests are small in bandwidth, they contribute
directly to user latency. Given the observed modification patterns, a
larger max would make sense.
- The tradeoff for not checking freshness is more stale documents.
The Squid default for percent is 20 (versus 200 above). This means a
document is checked for freshness once its time in cache reaches 20
percent of the age it had (time since last modification) when it was
retrieved. So if a document was 2.5 days old when it was first
retrieved, its freshness will be checked again in 12 hours (with the 20%
Squid default). This is the same as the max in the refresh pattern
above, but note that if the document was 2.5 months old (fairly typical
I should think, Bala chip in if you're listening :-) it would only be
checked after 360 hours (15 days) in cache.
- The Alex protocol is based upon the assumption that a document
modified recently is more likely to be modified soon than a document
modified a long time ago. This is based upon a lot of prior work
(decades of it in fact - pre-web if you can remember back that far!
:-), and seems to behave quite well; in other words, it matches human
behavior patterns. The percent factor says how long, relative to its
age, you'll keep a document before checking whether it's been modified.
200% says keep it for twice its age before checking, which seems kind
of long to me. Instead, I think the percent parameter should be smaller
(20%, 50%), and max should be increased to prevent needless IMS
requests. The Squid default for max is, I believe, 3 weeks; but
documents won't spend more than 20% of their age in the cache without
refreshing. The interval between checks then grows geometrically with
the document's age, which is pretty attractive for network traffic.
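
To make the arithmetic in the points above concrete, here is a rough
sketch in Python of the rule as described in this message: the time
before a freshness check is percent of the document's age at fetch,
held between min and max. It is a simplified model, not Squid's code:
the function names are made up for illustration, Expires headers and
other overrides are ignored, min is assumed to be 0, a month is taken
as 30 days, and the 3-week max is only the value recalled above. The
sketch works in hours and does not care about the unit; the numbers in
this message read min and max as seconds, whereas the comments in
squid.conf describe them in minutes, so check the unit before reusing
any figure.

```python
def time_until_check(age_at_fetch, percent, min_ttl, max_ttl):
    """Time a document may sit in cache before a freshness check.

    Simplified model: percent of the document's age at fetch, clamped
    to the range [min_ttl, max_ttl]. All arguments share one time unit.
    """
    lm_ttl = age_at_fetch * percent / 100.0
    return max(min_ttl, min(lm_ttl, max_ttl))


def check_intervals(age_at_fetch, percent, checks):
    """Intervals between successive checks for a never-modified document.

    min and max are left out so that only the percent rule applies.
    """
    intervals = []
    age = age_at_fetch
    for _ in range(checks):
        interval = age * percent / 100.0
        intervals.append(interval)
        age += interval  # the document is that much older at the next check
    return intervals


HOURS_PER_DAY = 24
THREE_WEEKS = 21 * HOURS_PER_DAY  # max as recalled in the message (assumption)

# Document 2.5 days old at fetch, 20%: checked after 12 hours.
print(time_until_check(2.5 * HOURS_PER_DAY, 20, 0, THREE_WEEKS))  # 12.0
# Document 2.5 months (75 days) old at fetch, 20%: checked after 360 hours.
print(time_until_check(75 * HOURS_PER_DAY, 20, 0, THREE_WEEKS))   # 360.0
# Same 2.5-day-old document at 200% with a 12-hour max: the max wins.
print(time_until_check(2.5 * HOURS_PER_DAY, 200, 0, 12))          # 12
# Successive check intervals at 20%: each is 1.2 times the previous one.
print(check_intervals(2.5 * HOURS_PER_DAY, 20, 3))
```

With percent at 20 and a generous max, each interval is 1.2 times the
one before it, so an unchanging document is checked less and less
often, while a low max flattens every document to the same interval.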
The interaction of refresh pattern (which drives TTL in cache),
staleness, and network traffic is fairly complicated. One thing I've
been interested in lately is the relationship between the Squid
administrator's desire to reduce bandwidth demand and the users' and
content publishers' desire for lower latency and less staleness. How
much of an issue has this been for the ISPs and other cache
administrators out there? Have others tried such "fairly aggressive
caching practices" and run into problems? I'd be interested in your
feedback.
Thanks,
-- jad --
John Dilley
HP Laboratories
Received on Fri Dec 11 1998 - 18:04:54 MST