Michael.Schroepl@telekurs.de wrote:
> But mod_gzip is making its decisions based on information
> that Squid cannot ever have a clue about.
> Several of these are not HTTP headers at all, but Apache-internal
> information, or they are HTTP response headers, not request
> headers (Content-Length, Content-Type, ...).
Content-Length, Content-Type etc. are things Squid does not need to care about
at all in this context. What Squid needs to care about is how mod_gzip
responds to different requests for the exact same URL. If the object changes,
new rules may obviously apply, but that is beside the purpose of Vary (for such
changes, Expires and Cache-Control: max-age= are the proper mechanisms for
controlling caching).
What you need to care about are the rules for THIS object's content at a
specific URL, based on the request headers or other external input. Static
rules based on the actual response object do not need to be mentioned, and
neither do "random" rules that depend on internal server state independent of
the user, unless you really want to (see below). A threshold rule saying that
all responses above a certain size may be compressed is a typical static rule
that does not need to be mentioned: for the same object, the rule will always
trigger in the same manner.
If your server has dynamic rules that might give different responses for the
exact same request and URL with no change in the content, then you should
include a "Vary: *" header to indicate that special content negotiation rules
apply that cannot be expressed in terms of HTTP, and that the server must
therefore always be queried about which response entity is the correct one for
this user. I don't think this really applies to mod_gzip. In such a case you
really SHOULD support ETag and If-None-Match, or else caching in shared caches
is rather pointless, as the cached content can then never be reused.
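To illustrate why ETag and If-None-Match matter with "Vary: *", here is a minimal sketch, assuming a cache that lists the ETags of every variant it holds in If-None-Match, and an origin server that answers 304 naming the variant its own rules pick. The function names are hypothetical, and the server's choice is passed in directly so the whole exchange runs in one process.

```python
from typing import Dict, Optional, Tuple

def server_revalidate(if_none_match: str, chosen_etag: str) -> Tuple[int, str]:
    """Origin side (hypothetical): chosen_etag is the variant the server's own
    non-HTTP rules pick for this request. Answer 304 if the cache holds it."""
    # Simplification: split on commas, assuming no commas inside entity tags.
    listed = {tag.strip() for tag in if_none_match.split(",") if tag.strip()}
    status = 304 if chosen_etag in listed else 200
    return status, chosen_etag

def cache_serve(variants: Dict[str, bytes], chosen_etag: str) -> Optional[bytes]:
    """Cache side (hypothetical): list every stored variant in If-None-Match
    and reuse the one the server names in its 304 reply."""
    if_none_match = ", ".join(variants)
    status, etag = server_revalidate(if_none_match, chosen_etag)
    if status == 304:
        return variants[etag]  # stored copy reused, no body transferred
    return None  # 200: a full body would be fetched and stored instead
```

Without the ETag round trip, every "Vary: *" reply would have to be fetched in full again, which is why shared caching becomes pointless.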
The minimum requirement for Vary is to tell caches which users may receive
this kind of reply. For mod_gzip the minimal requirement is that compressed
content must never be sent to user-agents not supporting compression, and this
can easily be expressed in terms of Vary (see below).
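As a rough illustration of how a cache can honour that requirement, here is a sketch, assuming the cache keys each stored variant on the URL plus the values of the request headers named in Vary. This is a simplified model, not Squid's actual code; `variant_key` is a hypothetical name, and header values are compared verbatim with no normalisation.

```python
from typing import Dict, Optional, Tuple

def variant_key(url: str, vary: Optional[str],
                request_headers: Dict[str, str]) -> Optional[Tuple]:
    """Build the key under which a cache could store one variant of url.
    Returns None for "Vary: *", which no request headers can satisfy."""
    if vary is None:
        return (url,)  # no negotiation: a single object shared by everyone
    names = sorted({n.strip().lower() for n in vary.split(",") if n.strip()})
    if "*" in names:
        return None  # must ask the origin server every time
    # Header names are case-insensitive; values are compared verbatim here.
    lowered = {k.lower(): v for k, v in request_headers.items()}
    return (url,) + tuple((n, lowered.get(n)) for n in names)
```

With "Vary: Accept-Encoding", a request carrying "Accept-Encoding: gzip" and one without it get different keys, so a compressed copy can never be handed to a client that did not ask for it, while User-Agent differences are ignored.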
> mod_gzip will only serve one of two possible formats: compressed
> and uncompressed.
> Data will never be compressed if the client didn't send
> "Accept-Encoding: gzip"; but there may be _many_ cases where the
> client asks for compressed data and will still get uncompressed content.
>
> Am I right to think that "Vary: Accept-Encoding" for the compressed
> content and no "Vary:" header at all for the uncompressed content
> will be the best choice in this case? This is what the two published
> patches are doing.
I would suggest:
Alternative 1:  (default)
"Vary: Accept-Encoding" if the reply is such that it might be compressed.
"Vary: Accept-Encoding, User-Agent" if you also want to use the User-Agent
header to determine whether compression might be applied. (This is optional;
when it is not enabled, default to uncompressed replies for requests without
Accept-Encoding.)
This applies to both compressed and uncompressed replies. If the reply is such
that mod_gzip might compress it for certain browsers/users, then you should
include a Vary header.
If the reply is such that mod_gzip would never compress it no matter who
requested it, then no Vary header should be included. The same applies if the
configuration is such that mod_gzip would always compress the reply no matter
who requested it.
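The Alternative 1 rule can be summarised as a small decision function. This is only a sketch: the three-way `Policy` classification and the `vary_alt1` name are hypothetical and not part of mod_gzip's configuration language.

```python
from enum import Enum
from typing import Optional

class Policy(Enum):
    NEVER = "never"            # never compressed, whoever asks
    ALWAYS = "always"          # always compressed, whoever asks
    NEGOTIATED = "negotiated"  # compressed for some clients only

def vary_alt1(policy: Policy, uses_user_agent: bool = False) -> Optional[str]:
    """Vary value under Alternative 1; None means send no Vary header.
    The result is the same whether or not this particular reply is compressed."""
    if policy is not Policy.NEGOTIATED:
        return None
    if uses_user_agent:
        return "Accept-Encoding, User-Agent"
    return "Accept-Encoding"
```

The function deliberately takes no "is this reply compressed?" argument: under Alternative 1 the compressed and the uncompressed reply for a negotiated URL carry the same Vary header.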
Alternative 2:  (optional, not the default configuration)
"Vary: Accept-Encoding" on any compressed replies, and no Vary: header on 
uncompressed replies.
Alternative 1 is the "correct" one, telling caches exactly what to do and
providing an optimal hit ratio if the HTTP server and cache are capable of
ETag and If-None-Match.
Alternative 2 is a best-effort tradeoff for caches that know about Vary but
are not capable of making use of it. With this alternative such caches will
hopefully not cache any compressed results, but will still cache uncompressed
replies that can be shared by all users. Once an uncompressed reply has been
seen by the cache, it will be sent in response to future requests (until it
expires).
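The Alternative 2 behaviour can be simulated with a toy cache, assuming it refuses to store any reply that carries Vary (as described below for Squid-2.4 and earlier) and ignoring expiry. `NoVaryCache`, `vary_alt2` and the origin callback are hypothetical names used only for this sketch.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical origin: (url, client accepts gzip) -> (body, was it compressed)
Origin = Callable[[str, bool], Tuple[bytes, bool]]

def vary_alt2(compressed: bool) -> Optional[str]:
    """Alternative 2: Vary only on compressed replies."""
    return "Accept-Encoding" if compressed else None

class NoVaryCache:
    """Toy cache that knows Vary but cannot store variants, so it never
    stores a reply carrying Vary. Expiry is not modelled."""

    def __init__(self, origin: Origin) -> None:
        self.origin = origin
        self.store: Dict[str, bytes] = {}

    def get(self, url: str, accepts_gzip: bool) -> bytes:
        if url in self.store:
            return self.store[url]  # shared copy, served to everyone
        body, compressed = self.origin(url, accepts_gzip)
        if vary_alt2(compressed) is None:
            self.store[url] = body  # only uncompressed replies get stored
        return body
```

For example, with an origin that compresses only for gzip-capable clients, a first gzip request is passed through uncached, a following plain request is stored, and from then on even gzip-capable clients receive the shared uncompressed copy: correct, if not optimal.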
> This is the reason why there should be a discussion about which
> mod_gzip/Apache behaviour each Squid version would like most.
Isn't that what we are having right now?
Squid does not like to give out incorrect data to its users, and therefore
wants servers to indicate via the Vary header whenever server-side content
negotiation is taking place. This applies to all Squid versions.
Giving out incorrect data is much worse than not being able to cache. If a
cache administrator gets irritated at not being able to cache, then the
correct point of approach is Squid, not mod_gzip bending the HTTP rules. You
are welcome to redirect any mod_gzip flames caused by Squid not caching Vary
objects to me <hno@squid-cache.org> if you like.
> There have been examples in the past where "mod_gzip_item_exclude
> reqheader" has been used to detect proxy servers that are known
> to unconditionally store compressed content ...
Squid DOES NOT unconditionally cache compressed content unless told to, and
never has (not in the Squid-2.X series anyhow, i.e. during the last 4 years).
Neither does it compress/uncompress any content-encodings (disallowed for
proxies by RFC2616). Squid-2.4 and earlier unconditionally do NOT cache
content carrying a Vary header, which server-negotiated compression SHOULD
have.
> Or resulting in the mod_gzip configuration adding the proxy to a
> non-compression blacklist, denying compression for all requests
> coming from this direction - if the proxy tells who it is.
Sorry, I do not see the point of this in the discussion. Squid is doing the
best it can. mod_gzip has intentionally chosen to tell Squid and other caches
to do the wrong thing, so why should mod_gzip users then blacklist Squid and
other caches rather than give them correct information?
> Another possibility that I experienced myself: A proxy that is
> filtering out "Accept-Encoding" headers from forwarded requests,
> so as to be sure it may cache each and every response.
> This one must even be a Squid 2.4, if I read my HTTP header
> traces correctly ...
Probably an administrator who has enabled somewhat too aggressive request
anonymization, choosing not to reveal to your server what kind of browser the
user is using. Definitely not done in a default configuration.
What you should do in such a case is fall back to the failsafe approach and
send back an uncompressed reply. Do not build "whitelists" of browsers known
to send Accept-Encoding: gzip; for such browsers you should rely on
Accept-Encoding exclusively. You do not know why the "Accept-Encoding" header
has been excluded.
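The failsafe rule can be sketched as a check that looks only at Accept-Encoding and never at User-Agent, treating a missing or stripped header as "send uncompressed". The `accepts_gzip` helper is hypothetical; its qvalue handling follows the usual HTTP Accept-Encoding rules in simplified form (a malformed q counts as 0, and "x-gzip" is treated as gzip).

```python
from typing import Dict, Optional

def accepts_gzip(accept_encoding: Optional[str]) -> bool:
    """Decide from Accept-Encoding alone whether a gzip reply is acceptable."""
    if not accept_encoding:
        return False  # header missing or stripped: fail safe, send uncompressed
    q_by_coding: Dict[str, float] = {}
    for item in accept_encoding.split(","):
        parts = [p.strip() for p in item.split(";")]
        coding = parts[0].lower()
        if not coding:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # malformed qvalue: treat as not acceptable
        q_by_coding[coding] = q
    for coding in ("gzip", "x-gzip"):
        if coding in q_by_coding:
            return q_by_coding[coding] > 0
    return q_by_coding.get("*", 0.0) > 0  # wildcard covers unlisted codings
```

Because no browser list is consulted, a request whose Accept-Encoding was removed by an anonymizing proxy simply gets the uncompressed reply.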
Note: Squid-2.X always sends a Via: header in the request unless intentionally
disabled by the cache administrator for privacy reasons. Not that I
personally think this is something you should make use of in mod_gzip, but
you asked.
Regards
Henrik
Received on Mon Aug 26 2002 - 13:10:00 MDT