[PATCH] icap_oldest_service_failure option

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Thu, 18 Feb 2010 15:28:13 -0700

Added icap_oldest_service_failure option to forget old ICAP errors.

A busy or remote ICAP server may produce a steady but shallow stream of
errors. Any ICAP server may become nearly unusable in a short period of
time, producing a burst of errors. To avoid disabling a generally usable
service, it is important to distinguish these two cases. Just counting
the number of errors and suspending the service after
icap_service_failure_limit is reached often either suspends the service
in both cases or never suspends it at all, depending on the option
value.

One way to distinguish a large burst of errors from a steady but shallow
error stream is to forget about old errors. The added
icap_oldest_service_failure option instructs Squid to ignore errors that
are "too old" to be counted as a part of a burst.

Another way to look at this feature is to say that the combination of
the old icap_service_failure_limit and the new
icap_oldest_service_failure limits the ICAP error _rate_. For example:
   # suspend service usage after 10 failures in 5 seconds:
   icap_service_failure_limit 10
   icap_oldest_service_failure 5 seconds

Squid does not remember every transaction error that occurred within the
allowed "oldest error" time period. That would result in a precise
but too expensive implementation, especially during error bursts on a
busy server. Instead, Squid divides the period into ten slots, counts the
number of errors that occurred in each slot, and forgets the oldest
slot(s) as needed. Thus, the algorithm has about 90% precision as far as
timing of the failures is concerned. That 90% precision ought to be good
enough for any deployment.
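
The slot-based counting described above can be sketched roughly as
follows. This is an illustration only, not the code in the patch: the
class name SlottedFailureCounter, its methods, and the use of plain
floating-point seconds for time are all assumptions made for the
example. Only the ten-slot split and the idea of forgetting the oldest
slots come from the description.

```python
class SlottedFailureCounter:
    """Approximate count of failures seen within the last `period` seconds.

    Hypothetical sketch: names and structure are illustrative and are not
    taken from the Squid sources.
    """

    SLOTS = 10  # the description divides the period into ten slots

    def __init__(self, period):
        self.slot_len = period / self.SLOTS
        self.counts = [0] * self.SLOTS  # ring buffer of per-slot counts
        self.newest = None  # absolute index of the newest slot in use

    def _advance(self, now):
        """Forget slots that have become too old as of time `now`."""
        current = int(now // self.slot_len)
        if self.newest is None:
            self.newest = current
            return
        # Clear every slot entered since the last update; after a long
        # quiet period this wipes the whole ring.
        steps = min(current - self.newest, self.SLOTS)
        for i in range(1, steps + 1):
            self.counts[(self.newest + i) % self.SLOTS] = 0
        if current > self.newest:
            self.newest = current

    def record(self, now):
        """Count one failure at time `now`; return the remembered total."""
        self._advance(now)
        self.counts[self.newest % self.SLOTS] += 1
        return sum(self.counts)

    def should_suspend(self, now, limit):
        """True if the remembered failures have reached `limit`."""
        self._advance(now)
        return sum(self.counts) >= limit
```

With a 5-second period, each slot in this sketch covers half a second,
and the remembered window spans nine full slots plus whatever part of
the current slot has elapsed, that is, between 4.5 and 5 seconds. That
gap is where the roughly 90% timing precision comes from. One failure
per second never accumulates more than five remembered failures, while
ten failures within half a second reach a limit of 10 immediately.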

The patch is for Squid v3.1+ but we will port it to trunk if approved.

Received on Thu Feb 18 2010 - 22:28:29 MST
