On Mon, 31 May 1999, Alex Rousskov wrote:
> Terence Kelly wrote:
>
> > Recent research suggests
> > that policies such as GreedyDual-Size and variants of LFU
> > offer substantial advantages over LRU-based algorithms.
>
> I may not be up-to-date, but all the improvements on LRU I have seen so
> far fail when applied to real cache sizes. Clearly, having the caching
> policy as a module would benefit a few researchers, but I have yet to see
> solid proof that a stupid LRU-TH can be outperformed by a "traditional"
> smart algorithm. If you have a pointer to a corresponding article (one
> that uses cache sizes of more than 300% of high-volume daily traffic),
> please send it to me.
I'm glad you've noticed that in too many Web caching research papers,
trace-driven simulations use very small traces and consequently must
simulate implausibly small caches. I've seen papers that simulate
PalmPilot-sized caches. Good engineers aren't impressed with
meaningless exercises like this. I'm aware of exactly one
widely-cited paper that deals with reasonable cache sizes: Rizzo &
Vicisano's "Replacement Policies for a Proxy Cache". They use the
DEC trace (22M requests before filtering) and simulate up to 10GB.
In my WCW99 presentation I showed data on cache sizes up to 8GB on
two-week NLANR request streams of 5-7 million requests. Now I'm
working with six 28-day NLANR request streams, each of which contains
between 10 and 35 million requests. I'm simulating caches up to
32GB. My data show that LFU with an aging mechanism offers
substantially higher byte hit rates (BHR) than LRU; for instance, on
our largest trace, an 8GB LRU cache gives you a 51% BHR whereas LFU
gives you 55%. I'll have a paper ready for submission in a few weeks.
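
To make concrete the kind of policy I mean by "LFU with an aging
mechanism", here is a rough simulator-style sketch in Python. The
periodic halving of reference counts is only one common aging variant,
and all of the names (LFUAgingCache, access, age_interval, ...) are my
own illustrative choices, not anything taken from Squid or from my
actual simulator. Replaying a trace is then just a loop over
(url, size) records, reading off the byte hit rate at the end.

# Sketch: LFU replacement with a simple aging step (periodic halving of
# reference counts).  Illustrative only; names and parameter choices are
# hypothetical, not taken from Squid or from my simulator.
class LFUAgingCache:
    def __init__(self, capacity_bytes, age_interval=10000):
        self.capacity = capacity_bytes
        self.used = 0
        self.freq = {}                 # url -> reference count
        self.size = {}                 # url -> object size in bytes
        self.requests = 0
        self.age_interval = age_interval
        self.hit_bytes = 0
        self.total_bytes = 0

    def access(self, url, nbytes):
        """Feed one trace record (url, size); returns True on a hit."""
        self.requests += 1
        self.total_bytes += nbytes
        if self.requests % self.age_interval == 0:
            self._age()
        if url in self.freq:                       # hit
            self.freq[url] += 1
            self.hit_bytes += nbytes
            return True
        while self.used + nbytes > self.capacity and self.freq:
            self._evict()
        if nbytes <= self.capacity:                # miss: admit the object
            self.freq[url] = 1
            self.size[url] = nbytes
            self.used += nbytes
        return False

    def _evict(self):
        victim = min(self.freq, key=self.freq.get)    # least frequently used
        self.used -= self.size.pop(victim)
        del self.freq[victim]

    def _age(self):
        # Halve all counts so that long-dead one-hit wonders eventually
        # lose out to currently popular objects.
        for url in self.freq:
            self.freq[url] //= 2

    def byte_hit_rate(self):
        return self.hit_bytes / self.total_bytes if self.total_bytes else 0.0
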
In the more interesting & general case where you want to optimize
an arbitrary objective function (e.g., minimize the sum, over all cache
misses, of a per-document retrieval cost), my data show that GreedyDual-Size
and weighted LFU with aging both outperform cost-insensitive
algorithms such as LRU and LFU (not surprising).
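
For the cost-sensitive case, here is an equally rough sketch of
GreedyDual-Size in the style of Cao & Irani: an object gets priority
H = L + cost/size when it is fetched or hit, the object with the
smallest H is evicted, and the global inflation value L is raised to
that minimum. The cost function is whatever per-document retrieval
cost you want to minimize. Again, the names and structure below are
illustrative assumptions of mine, not Squid code and not a description
of my simulator internals.

import heapq

class GreedyDualSizeCache:
    def __init__(self, capacity_bytes, cost_fn):
        self.capacity = capacity_bytes
        self.cost_fn = cost_fn      # cost_fn(url, nbytes) -> retrieval cost
        self.used = 0
        self.L = 0.0                # inflation value
        self.H = {}                 # url -> current priority
        self.size = {}              # url -> size in bytes
        self.heap = []              # (H, url) entries, possibly stale

    def access(self, url, nbytes):
        nbytes = max(nbytes, 1)     # guard against zero-length trace records
        if url in self.H:                                        # hit
            self.H[url] = self.L + self.cost_fn(url, nbytes) / nbytes
            heapq.heappush(self.heap, (self.H[url], url))
            return True
        while self.used + nbytes > self.capacity and self.H:
            self._evict()
        if nbytes <= self.capacity:                              # miss: admit
            self.H[url] = self.L + self.cost_fn(url, nbytes) / nbytes
            self.size[url] = nbytes
            self.used += nbytes
            heapq.heappush(self.heap, (self.H[url], url))
        return False

    def _evict(self):
        # Pop until we find an up-to-date entry (lazy deletion of stale ones).
        while True:
            h, url = heapq.heappop(self.heap)
            if url in self.H and self.H[url] == h:
                break
        self.L = h                   # inflate L to the evicted priority
        self.used -= self.size.pop(url)
        del self.H[url]

Setting cost_fn to return 1 for every document, for example, gives the
hit-rate-oriented GDS(1) variant described by Cao & Irani.
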
Again, I'm glad you're skeptical about the Web caching literature.
We in the academic research community often poorly serve
practitioners like the Squid developers. You can help us make our
work more useful by laughing loudly when we publish meaningless
results. And by making the modifications to Squid that I mentioned
in my original message. :)
While we're on the subject...
How big are current & foreseeable Squid caches, i.e., how much
content can they store?
How many Squid caches are currently in operation? Where are they?
Are the Squid developers interested in replacement policies
well-suited to small (RAM-sized) caches? Would you consider using
two different removal policies, one to manage the Squid "Hot Object"
in-core store and one to manage the overall cache?
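
To illustrate the last question: a toy arrangement might put a small
plain-LRU "hot object" store in RAM in front of a large disk store
that runs a frequency- or cost-aware policy such as one of the
sketches above. The wiring below is purely hypothetical; it is not a
description of how Squid manages its hot object store today.

from collections import OrderedDict

class TwoTierCache:
    def __init__(self, ram_bytes, disk_cache):
        self.ram_bytes = ram_bytes
        self.ram_used = 0
        self.hot = OrderedDict()     # url -> size, kept in LRU order
        self.disk = disk_cache       # any object with access(url, nbytes)

    def access(self, url, nbytes):
        if url in self.hot:                  # served from the in-core store
            self.hot.move_to_end(url)
            return True
        hit = self.disk.access(url, nbytes)  # disk tier applies its own policy
        # Promote into the hot store, evicting the least recently used objects.
        while self.ram_used + nbytes > self.ram_bytes and self.hot:
            _, old_size = self.hot.popitem(last=False)
            self.ram_used -= old_size
        if nbytes <= self.ram_bytes:
            self.hot[url] = nbytes
            self.ram_used += nbytes
        return hit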