A while ago there was a talk about storing the original request URL in the swap file metadata.
It struck me again now while I was testing something.
The code of:
http://bazaar.launchpad.net/~squid/squid/trunk/view/head:/src/StoreMetaURL.cc#L39
(25 lines of code):
##start
bool
StoreMetaURL::checkConsistency(StoreEntry *e) const
{
    assert(getType() == STORE_META_URL);

    /* This debug line and the early return below disable the rest of
     * the check (presumably a local change made while testing). */
    debugs(20, DBG_IMPORTANT, "storeClientReadHeader: URL checkConsistency wasn't used");
    return true;

    if (!e->mem_obj->original_url)
        return true;

    if (strcasecmp(e->mem_obj->original_url, (char *)value)) {
        debugs(20, DBG_IMPORTANT, "storeClientReadHeader: URL mismatch");
        debugs(20, DBG_IMPORTANT, "\t{" << (char *)value << "} != {" <<
               e->mem_obj->original_url << "}");
        return false;
    }

    return true;
}
##end
This code is responsible for checking the consistency of a cached file/object URL against the currently requested URL.
It is used from store_client.cc, and was moved from there in newer revisions.
The old revision 4338 states that the purpose of this code is:
"Check the meta data and make sure we got the right object."
The problem is that the check only runs while a file is being fetched from UFS (which is what I tested); objects served from RAM are never checked.
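For reference, the check is only reachable from the disk swap-in path; roughly (paraphrased from memory, the exact store_client.cc code varies by revision), storeClientReadHeader() walks the meta TLV list that was unpacked from the on-disk swap header:

##start
/* Paraphrased sketch of the loop in storeClientReadHeader() in
 * store_client.cc. tlv_list comes from unpacking the on-disk swap
 * header, so this can only run when an object is swapped in from
 * disk, never for a pure RAM hit. */
int swap_object_ok = 1;

for (StoreMeta *t = tlv_list; t && swap_object_ok; t = t->next) {
    if (!t->checkConsistency(e)) {
        swap_object_ok = 0;   /* metadata mismatch: reject the swap-in */
        break;
    }
}
##end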
The result is that when the store_url_rewrite feature is used, the check reports an inconsistency between the request URL and the object in the cache_dir (naturally, since the URL recorded in the metadata no longer matches the client's request URL).
Disabling this check would make my life easier with store_url, taking it from "not working" to "working".
So I have a couple of options for how to "fix" the issue:
1. Disable this check entirely.
2. Disable this check only for store_url_rewritten requests (a sketch follows this list).
3. Add the store_url meta object into the cache file and use it to identify the expected URL.
4. Add an on/off switch to disable this check.
5. Others?
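As a minimal sketch of option 2 (note: "store_url" below is a hypothetical MemObject member that would be set when store_url_rewrite was applied; the real field name and plumbing would still have to be worked out):

##start
bool
StoreMetaURL::checkConsistency(StoreEntry *e) const
{
    assert(getType() == STORE_META_URL);

    /* Hypothetical guard for option 2: skip the comparison when the
     * request went through store_url_rewrite, since the URL recorded
     * in the swap metadata will (correctly) differ from the client's
     * request URL. "store_url" is an assumed member, not existing code. */
    if (e->mem_obj->store_url)
        return true;

    if (!e->mem_obj->original_url)
        return true;

    if (strcasecmp(e->mem_obj->original_url, (char *)value)) {
        debugs(20, DBG_IMPORTANT, "storeClientReadHeader: URL mismatch");
        debugs(20, DBG_IMPORTANT, "\t{" << (char *)value << "} != {" <<
               e->mem_obj->original_url << "}");
        return false;
    }

    return true;
}
##end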
After a short talk with Alex I sat down and did some calculations about MD5 collision risks.
The index hash is an MD5 computed over the string "method byte + URL".
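For reference, this is roughly how the public store key is derived (paraphrased from memory from src/store_key_md5.cc; the exact code may differ between revisions):

##start
#include "md5.h"    /* Squid's SquidMD5* wrappers */

/* Paraphrase of storeKeyPublic(): the 128-bit store key is
 * MD5(method byte || URL). */
const cache_key *
storeKeyPublic(const char *url, const HttpRequestMethod &method)
{
    static cache_key digest[SQUID_MD5_DIGEST_LENGTH];
    unsigned char m = (unsigned char) method.id();
    SquidMD5_CTX M;

    SquidMD5Init(&M);
    SquidMD5Update(&M, &m, sizeof(m));                       /* the method byte */
    SquidMD5Update(&M, (unsigned char *) url, strlen(url));  /* the URL */
    SquidMD5Final(digest, &M);

    return digest;
}
##end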
For most caches that I know of, the probability of a collision is very low considering the number of objects and URLs.
Yes, we are talking about very many objects, so it is possible, but it's not only the URL hash: other unknowns, like request and response headers, would also have to line up, which takes this whole calculation a bit far from reality to hit, pushing the odds from roughly 1 in 2^64 to more than 1 in 2^124.
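To put a rough number on that (my own illustrative figures, not measurements): for b-bit keys, the birthday bound puts the probability of at least one collision among n objects at about n^2 / 2^(b+1), so even a billion cached objects against 128-bit MD5 keys gives odds around 2^-69:

##start
#include <cmath>
#include <cstdio>

/* Back-of-the-envelope birthday bound for 128-bit MD5 store keys:
 * P(collision among n keys) ~= n^2 / 2^129. The cache sizes below
 * are illustrative assumptions. */
int main()
{
    const double bits = 128.0;                   /* MD5 digest size */
    const double objects[] = { 1e6, 1e9, 1e12 }; /* assumed cache sizes */

    for (double n : objects) {
        /* log2(p) = 2*log2(n) - (bits + 1) */
        const double log2p = 2.0 * std::log2(n) - (bits + 1.0);
        std::printf("n = %.0e objects -> P(collision) ~= 2^%.1f\n", n, log2p);
    }

    return 0;
}
##end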
It seems to me that it will take quite a while before I see a hash collision (I never have).
What do you think?
Have you seen a real-world collision scenario?
Eliezer