Following some IRC chat, I thought I'd start a discussion on a  
possible improvement of refresh_pattern in Squid3.
The starting point for this discussion is the fact that  
refresh_pattern is a source of confusion for many users, even expert  
admins. It's not obvious what it does, how to achieve certain things,  
or under what circumstances different bits of it apply or don't apply.
Currently refresh_pattern means different things depending on how the
response freshness was calculated: by an explicit header set by
the origin server (Cache-Control, Expires), by invoking the
Last-Modified algorithm (if the response had a Last-Modified header), or
not at all, because neither method could be applied.
It's quite complicated. I don't know what the right answer is.
Here is an idea though:
We could separate the configuration out into "standard" and "HTTP  
violating" parts. Let us define "standard" as the two mechanisms that  
are most semantically transparent:
1. Explicit expiration set by server (Cache-Control, Expires)
2. Heuristic expiration based on Last-Modified
And let's define "HTTP violating" as anything that either overrides  
these, or anything that enforces cacheability in the absence of any  
of these headers.
What configuration options do we need for each of these two categories?
For the "standard" configuration:
We don't need any options for the explicit expiry mechanism, as  
it's... explicit :)
However, we do need a couple of global options for the Last-Modified  
factor algorithm:
      TAG: refresh_lastmod_factor (percent)
      Default: 20
      TAG: refresh_lastmod_max (minutes)
      Default: 10080
These, then, are the only refresh options I propose for a
non-HTTP-violating setup.
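To make the two proposed globals concrete, here is a minimal sketch of how a Last-Modified heuristic could use them. It is Python for illustration only, not Squid code; the constant and function names are hypothetical, and it assumes the factor is applied to the interval between the response's Date and its Last-Modified, with the result capped at the maximum.

```python
from datetime import datetime, timedelta

# Hypothetical constants mirroring the proposed directives (not existing Squid options).
REFRESH_LASTMOD_FACTOR = 20     # percent
REFRESH_LASTMOD_MAX = 10080     # minutes (7 days)


def heuristic_lifetime(date, last_modified,
                       factor_percent=REFRESH_LASTMOD_FACTOR,
                       max_minutes=REFRESH_LASTMOD_MAX):
    """Return a heuristic freshness lifetime as a timedelta.

    Assumption: lifetime = factor * (Date - Last-Modified), capped at the max.
    """
    age_at_response = date - last_modified
    if age_at_response < timedelta(0):
        # A Last-Modified in the future gives no usable basis for a heuristic.
        return timedelta(0)
    lifetime = age_at_response * factor_percent / 100
    return min(lifetime, timedelta(minutes=max_minutes))
```

With the defaults, an object last modified 10 days before it was fetched would be treated as fresh for 2 days, and anything older than 35 days would hit the 7-day cap.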
Now for the "HTTP violating" overrides, which are more complicated.
Defaults are set first:
        
      TAG: refresh_override_default options
      Default: none
These can be refined by regex:
      TAG: refresh_override_match [-i] pattern options
      Default: none
where options can be any of:
      min=xxx
           minimum amount of time this object will be considered fresh
      max=xxx
           maximum amount of time this object will be considered fresh
      ignore-reload=on|off
           ignore all client headers that prevent serving a cached response
      reload-into-ims=on|off
           client reload is downgraded from unconditional to conditional GET
      ignore-no-cache=on|off
           ignore all server headers that prevent caching a response
      ignore-no-store=on|off
           ignore "Cache-Control: no-store" server header
      ignore-private=on|off
           ignore "Cache-Control: private" server header
      ignore-auth=on|off
           cache authorized responses, even if server didn't specify "Cache-Control: public"
      refresh-ims=on|off
           always pass client IMS requests through to the origin, even if we think our copy is fresh
For example:
      refresh_override_default     max=4320 reload-into-ims=on
      refresh_override_match     http://host/     ignore-reload=on ignore-no-cache=on ignore-no-store=on
      refresh_override_match     /path/     reload-into-ims=off
      refresh_override_match     \.jpe?g$     min=1440
      refresh_override_match     \.css$     max=60
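As a rough illustration of how the example lines above might combine for a given URL, the sketch below starts from the default options and layers on the options of each matching line. It is Python for illustration, not Squid code. The proposal does not say whether only one match applies or all of them do, so the "every matching line applies, in configuration order, later lines winning" rule is an assumption, as are the function names.

```python
import re


def parse_options(tokens):
    """Turn ['max=4320', 'reload-into-ims=on'] into a dict of strings."""
    return dict(tok.split("=", 1) for tok in tokens)


# The refresh_override_default line from the example.
DEFAULT = parse_options(["max=4320", "reload-into-ims=on"])

# The refresh_override_match lines from the example, in configuration order.
MATCHES = [
    (re.compile(r"http://host/"),
     parse_options(["ignore-reload=on", "ignore-no-cache=on", "ignore-no-store=on"])),
    (re.compile(r"/path/"), parse_options(["reload-into-ims=off"])),
    (re.compile(r"\.jpe?g$"), parse_options(["min=1440"])),
    (re.compile(r"\.css$"), parse_options(["max=60"])),
]


def effective_options(url):
    """Start from the defaults, then apply every matching line in order.

    Assumption: all matching lines apply and later lines override earlier ones.
    """
    options = dict(DEFAULT)
    for pattern, overrides in MATCHES:
        if pattern.search(url):
            options.update(overrides)
    return options
```

Under that assumption, http://other/style.css would end up with max=60 and reload-into-ims=on: the default still applies, and only the max is refined.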
Main differences in usage:
1. The overrides would always apply, regardless of how the expiration
time was arrived at - whether by explicit headers or last-modified
algorithm heuristics. Currently the Min, Max and Percent settings
each apply only in specific circumstances, e.g. Max and Percent
only apply when the L-M algorithm is used, and Min only applies in the
absence of L-M, Expires and CC max-age.
2. The refresh_override_default would always apply (although its  
options may be overridden by those of a refresh_override_match).  
Currently the default refresh_pattern only applies if no patterns  
match the request, meaning you can't ever override default behaviour,  
you can only fall back to it.
3. There is no way of setting the Last-Modified factor percentage by  
regex! This is perhaps a big problem, and it could be added as an  
option. But then it would be the only non-HTTP-violating directive  
possible in the option... and so would spoil it slightly.
4. No need for global counterparts of refresh_pattern directives,  
e.g. refresh_all_ims and reload_into_ims.
5. Frequently used override options could be stated once in the default
instead of being repeated on every subsequent line.
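Point 1 above can be sketched as a single clamp applied after the lifetime has been computed, whichever way it was computed. This is Python for illustration, not Squid code. The function name is hypothetical, options are assumed to be a dict of strings with values in minutes, and letting min win when min is greater than max is an arbitrary choice the proposal does not address.

```python
def apply_overrides(lifetime_minutes, options):
    """Clamp a computed freshness lifetime (in minutes) to the override min/max.

    lifetime_minutes may come from explicit headers or from the Last-Modified
    heuristic; None means neither mechanism produced a value.
    """
    if lifetime_minutes is None:
        lifetime_minutes = 0  # nothing to go on: only a min can make it fresh
    if "max" in options:
        lifetime_minutes = min(lifetime_minutes, int(options["max"]))
    if "min" in options:
        # Applied last, so min wins if the two conflict (an assumption).
        lifetime_minutes = max(lifetime_minutes, int(options["min"]))
    return lifetime_minutes
```

For instance, a response with an explicit 30-minute lifetime that matches a min=1440 line would be held for 1440 minutes, which is exactly the kind of HTTP-violating behaviour these options are meant to make explicit.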
This may be completely the wrong way of looking at it, or it may be  
just going too far. A smaller, but still helpful, step might be to  
introduce a refresh_pattern_default whose values would be inherited  
by any subsequent refresh_pattern match.
Any help or input into this would be very welcome indeed.
Doug
On 1 Jun 2006, at 20:06, Doug Dixon wrote:
> Hi
>
> I'm fixing bug 1202 (it's a simple fix) and am cleaning up  
> refresh.cc at the same time.
>
> I'd like to review the various refresh_pattern options, as some of  
> them are mutually exclusive in practice (although you can configure  
> all of them) and it's not clear from the documentation what they  
> all mean. They're quite hard to understand and use correctly.
>
>
> 1. reload-into-ims
>
> The following is legal:
>
> refresh_pattern     html$       5     20%     60      ignore-reload  
> reload-into-ims
>
> but reload-into-ims will not have any effect. You could argue that  
> this is obvious, but I think it should be caught at parse time.
>
> 2. As an aside - but I want to mention it here - we need to make it  
> clearer that if an object does specify an expiry time, the Min,  
> Percent and Max values in refresh_pattern will be completely  
> ignored, but the options won't be. I'll change cf.data.pre accordingly.
>
> 3. override-expire
>
> 		override-expire enforces min age even if the server
> 		sent a Expires: header. Doing this VIOLATES the HTTP
> 		standard.  Enabling this feature could make you liable
> 		for problems which it causes.
>
> If you do want to modify the behaviour of blindly obeying the  
> server's explicit expiry time, you can - to an extent.
>
> The override-expire option enforces the Min time in cache, even if  
> the origin stated it should expire before then.
> But it ignores the Max time (surprising!), and the L-M factor (more  
> expected - not obvious what this would do anyway)
>
> It's not very intuitive. I think we should probably make this  
> option enforce the Max time as well. Possibly even ignore the  
> explicit expiry of the object altogether and fall back to
> last-modified factor??
>
> It could be a naming thing... override-expire doesn't really say  
> what it does. enforce-min might be better. But then you've already  
> stated a min and might expect it to be already enforced.
>
> 4. override-lastmod
>
> 		override-lastmod enforces min age even on objects
> 		that were modified recently.
>
> The Min time isn't enforced even when the last-modified factor  
> algorithm does kick in. If the object was only just modified and  
> the L-M factor algorithm results in a figure lower than the Min, it  
> will be considered fresh for less than the configured Min.
>
> This isn't what I would expect. I know that the override-lastmod  
> exists to let you do this, but it's really non-intuitive. I think  
> the Min should always be enforced if we're using L-M factor  
> algorithm, and that we should therefore lose the override-lastmod  
> option. Can't see the point in the default (null) behaviour of Min  
> otherwise.
>
>
> Thoughts?
>
> Doug
>
Received on Thu Jun 15 2006 - 05:33:13 MDT