On 7/02/2012 9:40 p.m., Henrik Nordström wrote:
> Tue 2012-02-07 at 14:01 +1300, Amos Jeffries wrote:
>> We have a long history of questions and bugs mentioning negative
>> numbers in the byte hit ratio.
>>
>> I've always thought it was a bug we had not tracked down, but the FAQ
>> says it is correct.
>> http://wiki.squid-cache.org/SquidFaq/InnerWorkings#Why_do_I_see_negative_byte_hit_ratio.3F
> Yes.. it's based on the difference between traffic squid<-servers and
> clients<-squid. This can be negative (more traffic squid<-servers than
> clients<-squid) in some situations.
>
> - retried requests
> - range retrieval being processed by Squid
> - continued download after client disconnects (quick_abort_...)
The wiki also mentions cache digests, but the code already compensates for those:
"
    /*
     * This ugly hack is here to prevent the user from seeing a
     * negative byte hit ratio. When we fetch a cache digest from
     * a neighbor, it gets treated like a cache miss because the
     * object is consumed internally. Thus, we subtract cache
     * digest bytes out before calculating the byte hit ratio.
     */
    cd = CountHist[0].cd.kbytes_recv.kb -
         CountHist[minutes].cd.kbytes_recv.kb;
"
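To make the effect of that subtraction concrete, here is a minimal sketch in C. The struct, field, and function names are made up for illustration and are not the real Squid counters; the clamp at zero and the zero-traffic return value are my assumptions too. Only the idea of taking digest bytes out of the server-side total comes from the comment above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cumulative traffic counters in KB (illustrative names only). */
struct traffic_snapshot {
    uint64_t client_kb_out;   /* sent from the proxy to clients */
    uint64_t server_kb_in;    /* received by the proxy from servers and peers */
    uint64_t digest_kb_recv;  /* cache digest data fetched from neighbours */
};

/*
 * Byte ratio in percent for the interval between two snapshots, with
 * digest traffic subtracted from the server side before the division.
 */
double byte_ratio_pct(const struct traffic_snapshot *now,
                      const struct traffic_snapshot *then)
{
    /* Counters are cumulative, so they must never run backwards. */
    assert(now->client_kb_out >= then->client_kb_out);
    assert(now->server_kb_in >= then->server_kb_in);
    assert(now->digest_kb_recv >= then->digest_kb_recv);

    uint64_t c = now->client_kb_out - then->client_kb_out;
    uint64_t s = now->server_kb_in - then->server_kb_in;
    uint64_t cd = now->digest_kb_recv - then->digest_kb_recv;

    /* Digest fetches are consumed internally; remove them from the
     * server side. Clamping at zero is an assumption of this sketch. */
    s = (cd < s) ? s - cd : 0;

    if (c == 0)
        return 0.0; /* assumption: report 0 when no client traffic was seen */

    /* Computed in floating point, so the result can go negative. */
    return 100.0 * ((double)c - (double)s) / (double)c;
}
```

With 1000 KB sent to clients and 1200 KB received, of which 300 KB was digest data, this gives 10%. Without the subtraction the same counters would give -20%. So digests should not be what pushes the reported number negative.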
>> I've discussed this with a professional statistician I work with and
>> she agrees the algorithm is not calculating hit ratio as per our
>> definition of what a HIT is. What it does seem to be calculating is a
>> net traffic GAIN ratio.
> Yes.
>
>> What I propose is to make the numbers reported as HIT ratios use the same
>> algorithm, the current request ratio one, and to add alongside this a
>> record for Gain/Loss Ratio as output by this byte calculation.
> Why is it interesting to calculate a nicer but very inaccurate number?
Which one is inaccurate?
  "Hits as % of traffic sent" with calculation of (net traffic / client_bytes)
or
  "Net traffic gain/loss" with calculation of (net traffic / client_bytes)
or
  "Hits as % of client traffic" with calculation of (sum_hits / client_bytes)
One guess which one we have today ...
> To hide that the proxy cache may actually cause higher bandwidth usage
> than not having the proxy cache?
This is where the mistake rears its head. The excess server-side traffic
is not related to HITs, but to normal proxy behaviour. The HITs may in
fact be pulling that negative figure up from some other, larger
negative.
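A small worked example may help here. This is only a sketch in C with hypothetical function names. Note that hit_bytes is a counter we do not have today, and the "no cache" figure assumes every HIT byte would otherwise have cost exactly one byte from a server.

```c
#include <assert.h>
#include <stdint.h>

/* Net traffic gain/loss in percent: (client - server) / client.
 * Negative when more was fetched from servers than sent to clients. */
double net_gain_pct(uint64_t client_bytes, uint64_t server_bytes)
{
    if (client_bytes == 0)
        return 0.0; /* assumption: report 0 when there was no client traffic */
    return 100.0 * ((double)client_bytes - (double)server_bytes)
           / (double)client_bytes;
}

/* HIT bytes as a percentage of client traffic: hit_bytes / client_bytes.
 * hit_bytes is a hypothetical counter of bytes served from cache. */
double hit_bytes_pct(uint64_t hit_bytes, uint64_t client_bytes)
{
    assert(hit_bytes <= client_bytes);
    if (client_bytes == 0)
        return 0.0;
    return 100.0 * (double)hit_bytes / (double)client_bytes;
}
```

Say clients received 1000 KB, of which 200 KB were HITs, while retries and continued downloads brought server traffic to 1500 KB. Then net_gain_pct(1000, 1500) is -50%, yet hit_bytes_pct(200, 1000) is 20%. Had those 200 KB also been fetched from servers, the gain would be net_gain_pct(1000, 1700), or -70%. The HITs improved the figure; they did not cause the negative.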
This is why I am more in favour of adding gain ratio alongside the hit
ratios or just changing the descriptive text. The negative is not lost
but explained.
Making HIT % use the same calculation as request ratio would mean adding
HIT traffic byte counters which don't exist now.
>
> I would argue that the request hit ratio calculation is the broken one
> from a statistical point of view.
The byte ratio calculation is simply that: a byte ratio, with no
relevance to HIT or MISS.
Traffic we classify as MISS is included in the divisor for the existing
byte algorithm.
If it were actually (client_traffic - server_traffic) / hit_bytes or
hit_bytes / (client_traffic - server_traffic), that would be an accurate
HIT bytes algorithm.
Instead we currently have (client_traffic - server_traffic) /
client_traffic, which is the gain score for net traffic.
We get asked about "bandwidth gain" often, so I think it would be useful to
have something in the report using the term "gain".
Amos
Received on Tue Feb 07 2012 - 12:00:41 MST