(cross-posted to squid-users for more minds looking at the problem)
How does one get around the discrepancy caused by IP-based versus name-based  
cache object storage when working in a hierarchy of mixed transparent and  
proxy-based caches?
As a prime example, if I have a transparent cache implemented at my site,
that means that every theoretical object in my cache "looks"
like this after a transparently-delivered request (forgive my word wrapping):
--------
856211777.474   2205 206.131.27.68 TCP_MISS/200 1415 GET  
http://206.79.203.152/news.html - DIRECT/206.79.203.152 text/html
--------
Now, if someone on my network has their browser "correctly" configured to  
use my cache as a proxy, they will create a request that looks like this:
--------
856211822.733   5362 206.205.169.42 TCP_MISS/200 1415 GET  
http://www.softwareforum.org/news.html - DIRECT/www.softwareforum.org  
text/html
--------
So I have two different "objects" that contain the exact same data, but  
their pointers are going to be different, and they'll take up disk space  
twice in my cache.
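
To make the duplication concrete, here is a minimal sketch. It assumes the cache indexes objects by a digest of the request method and URL; the `cache_key` helper is hypothetical and is not taken from Squid's source. The two URLs are the ones from the log lines above.

```python
import hashlib


def cache_key(method: str, url: str) -> str:
    """Hypothetical store key: a digest of the request method and URL."""
    return hashlib.sha256(f"{method} {url}".encode("ascii")).hexdigest()


# Same document, requested transparently (by IP) and via the proxy (by name).
transparent = cache_key("GET", "http://206.79.203.152/news.html")
proxied = cache_key("GET", "http://www.softwareforum.org/news.html")

print(transparent == proxied)  # False: two keys, so two stored copies
```

Any key derived purely from the URL text behaves this way, whatever the digest function is.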
  I know how I can get this to work in a "broken" fashion - by putting an  
"intercept" routine in my squid cache in front of proxy requests and turning  
the name into a number, I can store things consistently by number within the
cache.  However, that may be seen as a sub-optimal solution. (Discussions of  
RFC compliance with transparent proxy servers will be left out of this  
message, though that war may force itself into any replies...)
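
A rough sketch of what such an "intercept" might do, assuming it runs before the cache lookup and simply rewrites the host part of the URL. The `url_to_ip_form` helper and the stub resolver table are hypothetical; a real deployment would pass a live resolver such as `socket.gethostbyname`.

```python
import socket
from urllib.parse import urlsplit, urlunsplit


def url_to_ip_form(url, resolve=socket.gethostbyname):
    """Rewrite the host in `url` to the address returned by `resolve`.

    Any userinfo in the URL is dropped; an explicit port is preserved.
    """
    parts = urlsplit(url)
    if parts.hostname is None:
        return url
    ip = resolve(parts.hostname)
    netloc = ip if parts.port is None else f"{ip}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path,
                       parts.query, parts.fragment))


# Stub resolver so the example runs without DNS (mapping taken from the logs).
STUB_DNS = {"www.softwareforum.org": "206.79.203.152"}

print(url_to_ip_form("http://www.softwareforum.org/news.html",
                     resolve=STUB_DNS.__getitem__))
# prints http://206.79.203.152/news.html
```

One caveat with this approach: a name that resolves to several addresses can still end up stored under more than one key, depending on which address the resolver hands back.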
I _could_ theoretically do an inverse lookup on the IP address, check to see  
if it has a comparable forward (eg: is it correctly in-addr'ed?) and then  
store based on the resultant name if successful.  However, this also is  
sub-optimal, since many web servers do not have inverses correctly  
provisioned (eg: look at www.netscape.com's broken inverses as a glaring case  
in point.)
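
The inverse-then-forward check could look roughly like the sketch below. The `confirmed_name` helper and the stub lookups are hypothetical; the defaults use the standard `socket.gethostbyaddr` and `socket.gethostbyname_ex` calls, which both return a `(hostname, aliases, addresses)` tuple.

```python
import socket


def confirmed_name(ip, reverse=socket.gethostbyaddr,
                   forward=socket.gethostbyname_ex):
    """Return the PTR name for `ip` only if that name resolves back to `ip`.

    Returns None when the inverse is missing or does not match the forward.
    """
    try:
        name, _aliases, _addrs = reverse(ip)
        _canonical, _aliases, addrs = forward(name)
    except OSError:  # socket.herror and socket.gaierror are subclasses
        return None
    return name if ip in addrs else None


# Stub lookups so the example runs without DNS.
def good_reverse(ip):
    return ("www.softwareforum.org", [], [ip])


def good_forward(name):
    return (name, [], ["206.79.203.152"])


def missing_reverse(ip):
    raise socket.herror(1, "Unknown host")


print(confirmed_name("206.79.203.152", good_reverse, good_forward))
# prints www.softwareforum.org
print(confirmed_name("206.79.203.152", missing_reverse, good_forward))
# prints None
```

When the check fails, the cache would have to fall back to storing by IP address, which is exactly the case that keeps this approach sub-optimal.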
Further complicating the matter, and this is the real heart of my question:  
How do you communicate with a cache hierarchy of mixed-method caches?   
Duplication of objects will be the end result unless someone has a magic  
bullet that solves these problems.  The "ugly" magic bullet is to turn  
everything into IP addresses and then hope that your peer/parent caches have  
a goodly population of objects that is IP address based.  If not, there will  
be significant duplication of objects if they are stored solely on object
name, and your transparency will disastrously detract from the overall
effectiveness of your cache in a hierarchy.
JT
Received on Tue Feb 17 1998 - 16:05:15 MST