Hi. I've been working on adding prefetching to squid3. It works by
analyzing HTML and looking for the tags whose targets a graphical browser
can be expected to request.
So far, it seems to just barely work. What works is checking the
content-type of the document, avoiding encoded (gzip'ed) documents,
analyzing the HTML using libxml2 in "tag soup" mode, resolving the full
URL from relative references, and fetching the files into the cache. (I
would, of course, appreciate code reviews of the branch before I diverge
too far!)
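
The steps listed above can be illustrated with a minimal sketch. It is written in Python rather than Squid's C++, and it uses the standard library's lenient html.parser as a stand-in for libxml2's "tag soup" mode. The names PREFETCH_ATTRS, PrefetchScanner and find_prefetchables, the tag list, and the lowercase-keyed header dictionary are all assumptions made for illustration; none of them come from the branch.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag

# Tag/attribute pairs a graphical browser typically fetches automatically.
# This list is an illustrative assumption, not the branch's actual list.
PREFETCH_ATTRS = {
    ("img", "src"),
    ("script", "src"),
    ("link", "href"),
    ("iframe", "src"),
}


class PrefetchScanner(HTMLParser):
    """Collects absolute URLs of embedded resources from possibly malformed HTML."""

    def __init__(self, base_url):
        super().__init__(convert_charrefs=True)
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value and (tag, name) in PREFETCH_ATTRS:
                # Resolve relative references and drop any #fragment.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                if absolute not in self.urls:
                    self.urls.append(absolute)


def find_prefetchables(base_url, headers, body):
    """Return URLs to prefetch, or [] if the response should not be analyzed.

    `headers` is assumed to be a dict with lowercase header names.
    """
    content_type = headers.get("content-type", "").split(";")[0].strip().lower()
    if content_type != "text/html":
        return []
    encoding = headers.get("content-encoding", "identity").strip().lower()
    if encoding != "identity":
        return []  # skip gzip'ed or otherwise encoded bodies
    scanner = PrefetchScanner(base_url)
    scanner.feed(body)
    scanner.close()
    return scanner.urls
```

Links in anchor tags are deliberately ignored here, since a browser does not fetch them until the user clicks. A real implementation would also want to look at the rel attribute of link tags rather than fetching every href.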
However, I've run into a few problems.
To prefetch a page, we call clientBeginRequest. I've already had to
extend this interface a little. The main problem is that
it will open up a new socket for each call. On a page with 100
prefetchables, it will open 100 TCP connections to the remote server.
That's not nice. I need a way to re-use a connection for multiple
requests. How should I do this? I'd like clientBeginRequest to be smart
enough to handle this behind the scenes.
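
To make the connection re-use idea concrete, here is a rough Python sketch of one possible approach: group the prefetch URLs by origin, then issue all requests for an origin over a single persistent connection. It says nothing about how clientBeginRequest works internally; group_by_origin and prefetch_all are hypothetical names, and the sketch fetches sequentially, with no pipelining or limit on the number of origins.

```python
import http.client
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_origin(urls):
    """Group URLs by (scheme, host, port) so each origin can share one connection."""
    groups = defaultdict(list)
    for url in urls:
        parts = urlsplit(url)
        port = parts.port or (443 if parts.scheme == "https" else 80)
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        groups[(parts.scheme, parts.hostname, port)].append(path)
    return dict(groups)


def prefetch_all(urls):
    """Fetch every URL, opening one persistent connection per origin."""
    results = {}
    for (scheme, host, port), paths in group_by_origin(urls).items():
        conn_cls = (http.client.HTTPSConnection if scheme == "https"
                    else http.client.HTTPConnection)
        conn = conn_cls(host, port, timeout=10)
        try:
            for path in paths:
                conn.request("GET", path)
                response = conn.getresponse()
                # The body must be fully read before the connection is reused.
                body = response.read()
                results[(scheme, host, port, path)] = (response.status, len(body))
        finally:
            conn.close()
    return results
```

With this grouping, a page with 100 prefetchables on one server costs one TCP connection instead of 100, provided the server keeps the connection alive.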
Occasionally I see duplicate prefetches. I think what's going on here is
that the object is uncacheable, so it never lands in the store and the
next reference to it triggers another fetch. The only way I can think of solving this
is by adding an "uncacheable" entry type to the store -- but that just
seems wrong, conceptually. On a related note, maybe we could terminate a
prefetch as soon as we receive the headers and notice that it's
uncacheable. Currently, we download the whole thing and just discard it
(after analyzing it for more prefetchables if it's HTML).
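
One alternative to a new store entry type would be a small side table of URLs recently seen to be uncacheable, consulted before each prefetch. The Python sketch below is only an illustration of that idea, not a proposal for Squid's store: looks_uncacheable and UncacheableMemo are hypothetical names, the header check covers only the no-store and private directives (real HTTP cacheability rules are considerably more involved), and the five-minute lifetime is an arbitrary assumption.

```python
import time


def looks_uncacheable(headers):
    """Rough header check; `headers` is assumed to use lowercase names.

    Only Cache-Control: no-store and private are considered here.
    """
    cache_control = headers.get("cache-control", "").lower()
    directives = {d.strip().split("=")[0]
                  for d in cache_control.split(",") if d.strip()}
    return bool(directives & {"no-store", "private"})


class UncacheableMemo:
    """Remembers URLs recently found uncacheable so they are not prefetched again."""

    def __init__(self, ttl_seconds=300, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._expires = {}  # url -> time after which the entry is forgotten

    def mark(self, url):
        self._expires[url] = self.clock() + self.ttl

    def should_skip(self, url):
        expiry = self._expires.get(url)
        if expiry is None:
            return False
        if self.clock() >= expiry:
            del self._expires[url]
            return False
        return True
```

The same header check could drive early termination: once the response headers of a prefetch arrive and looks_uncacheable returns true, the URL is marked and the transfer is aborted instead of downloading the body. That would give up analyzing uncacheable HTML for further prefetchables, which may or may not be an acceptable trade.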
Finally, does anyone have suggestions for how to test for performance
improvement due to prefetching?
Thanks,
Nick Lewycky
Received on Thu May 12 2005 - 07:36:26 MDT