Re: Store_url_rewrite for squid 3+ from Alex Rousskov on 2012-09-07 (squid-dev)

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Fri, 07 Sep 2012 16:36:07 -0600

On 09/07/2012 01:32 PM, Eliezer Croitoru wrote:
> On 09/07/2012 05:10 PM, Alex Rousskov wrote:
>
> I am not sure what "option" you are referring to in the above. The
> Store::get(key) API I have described is not optional -- it is the
> primary way of detecting a hit.
>
> +1 to use the api.
> are we talking about the "Store::Root().get" ?

That is where it starts, but other Store kids implement the get() method
as well IIRC. From the caller perspective, you should also look at
storeGetPublic*() and similar functions.

> The URL rewriting must happen before or during Store key calculation.
>
> The problem is that if I am rewriting the url, unlike
> url_rewrite\redirect the connection should be initiated based on the
> original url and not the rewritten one.

Sure, but that is not a problem you need to worry about much. Since you
are not going to alter the original URL, the connection initiation code
will not be affected by your changes (except for some minor adjustments
if some data members get renamed or Store::get() API-related calls get
adjusted as discussed below).

> What I do not know yet is what and where and based on what the request
> is done.

For this project, you do not need to know much about connection
initiation code. You need to make a copy of the original URL, rewrite
that copy using the new helper, and put it in a new HttpRequest data
member (for example). This should probably happen (or be initiated from)
client_side.cc or client_side_request.cc.

You then need to make sure that every time Store calculates a store key,
the new member (if any) is used instead of the original URL.

Since the current code is using the original request URL where you want
the rewritten/store URL to be used, I recommend _renaming_ all members
that keep or extract the original/request URL (e.g., MemObject::url).
This will force you to go through the relevant callers and make sure
that the right URL is used, depending on the caller context.

Once you are done, we can review whether the vast majority of changes in
the diff are a simple renaming of "url" to "requestUrl". If yes, you can
revert that renaming change. If there are many cases where old "url"
became "storeUrl" and many cases where it became "requestUrl" then the
change should stay.

Please note that the above is just a sketch. I am not suggesting any
specific new names, and I am using existing class names only as an example.

> so we need to store the original url and the store_url at least for the
> period of time the request is being served.

Yes, we need to keep both URLs around during the transaction lifetime.
When I was talking about storing a URL, I was only talking about Store
(i.e., storing something in the cache).

>(answer about storing or not the store_url in meta)

That is a separate question. I cannot answer it, unfortunately. I do not
know why Squid2 stored either of those URLs. I know Squid3 does not
store the original/request URL.

> url_rewrite api gets the http requests as the background to any action
> so it cannot be used for store_url since any change to the request will
> lead to the change we are trying to not do exactly.
>
> mem_object contains:
> {...
> char *url;
> HttpRequest *request;
> ...}
> and the url is being used to make the hash but I still dont know what is
> being taken from the mem_object to do the request.

HttpStateData::buildRequestPrefix() builds request headers. It may use
mem_object::url. I do not think you should change that code (apart from
renaming and Store::Get() API changes discussed elsewhere).

You should only be concerned with calls to storeGetPublic*() and such.
Make sure the callers do not use mem_object::url without checking
whether there is a "rewritten" store URL as well. You will probably want
to change profile of most of those calls so that the caller does not
need to worry about picking the right URL to use (the decision would
happen inside storeGetPublic() and friends).

>> Why do we need to store it?
> I do not know why we need to store it and i think that the only things
> that should be saved is the full request full reply and some data\meta
> the being will be used by the replacement policy.
>
> since I dont know how everything works and about all the api in the code
> after reviewing the 2.7 code related to store_url it's seems like the
> approach wasn't that bad using the meta data.

How is that metadata used in Squid2? What is it used for? Nobody is
suggesting that Squid2 does something wrong here, of course. We just
need to understand _what_ it does so that we can either mimic what it
does in Squid3 or do something better.

> since couple hours ago I noticed that it was done in order to make
> things possible and not really to think in a more wide angle about the
> effect of it.

What things does it make possible?

I hope we will not have to commit some code without understanding its
effects.

> the main problem I think the STORE_META_STOREURL was maybe used for is
> if and while rebuilding the cache_dir data to be able not pass the url
> to the store_url helper.
> from this point I think it's costs space but when squid rebuilds the
> cache_dir I still dont now if it's good to pass url into the helper.

IIRC, when Squid rebuilds in-memory index from swap.state or from the
directory directly, it does not need URLs. It needs keys which are
stored in swap.state and in directory entries.

> And for the question that pops out: if a cache_dir swap.data get's
> corrupted, what the fate of the cache in the squid3.head ?

If a corrupted swap.state entry is detected, it is removed/ignored. If
the whole swap.state is missing or its header is corrupted, then we
rebuild the in-memory index from disk instead. Please note that not all
cache_dir types use swap.state.

HTH,

Alex.
Received on Fri Sep 07 2012 - 22:36:30 MDT

This archive was generated by hypermail 2.2.0 : Sun Sep 09 2012 - 12:00:05 MDT