Re: [squid-users] squid3.0.25 hoggeing the CPU, serving little from Amos Jeffries on 2010-08-11 (squid-users)

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Thu, 12 Aug 2010 01:15:28 +1200

Ralf Hildebrandt wrote:
> * Amos Jeffries <squid3_at_treenet.co.nz>:
>> Ralf Hildebrandt wrote:
>>> 3.0.STABLE25 is showing the following behaviour during normal operation:
>> Hi Ralf,
>> Thank you for all this, but I'm wondering why you are putting so
>> much work into 3.0?
>
> 3.1 sucks even more? See my other bug reports! That stuff is crashing
> all over the place. Need to get some stability here :)

>
>> I ask because the major performance gains are aimed at 3.2. It could
>> do with this type of analysis as part of the polish up.
>
> I COULD run 3.2 if you like.

>
>> Okay, add to that a sudden extreme loss of known clients. And a
>> sudden 'instant' drop in memory usage before the growth.
>>
>> This looks to me like the usual culprit:
>> A Squid crash followed by a dirty rebuild of a large cache's index.
>
> Could be
>
>> The behaviour in such a situation is complete non-response on the
>> ports for a short period (extreme service times for existing clients,
>> they simply get no further traffic and time out).
>
> Yes, but that should only be a problem for the clients.
>
>> Followed by a period of heavy reads as the entire cache_dir gets
>> scanned file-by-file for meta data to build the index. Some of which
>> will fail as the un-closed files from the previous instance are found.
>> Accompanied by heavy writes as the swap.state journal gets rebuilt
>> from each of those meta-data reads.
>>
>> Under heavy client load this extra disk IO can lead to delays
>> processing other actions and slower new client service times.
>
> OK, but for such a long time?

Yes, depending on the cache size. IIRC some people have reported it
taking a dozen minutes or more for GB+ caches.

If I'm reading those graph scales right, the time scale was ~20 minutes
before the server maxed out into overload?

>
>> Potentially a huge backlog of buffered in-transit data waiting to be
>> stored in the cache. Which can't be written to until the index is
>> loaded properly.
>>
>> This latter can be alleviated by a sufficiently large in-memory
>> cache, though older versions did not permit that space to be used
>> until after the rebuild either.
>>
>>> proxy 10426 88.5 17.3 380636 357040 ? R 10:42 110:54 /usr/sbin/squid3 -NsYC
>>>
>>> % strace -c -p 10426
>> Over how long a time was the strace taken? just that 1.6 seconds or
>> something longer?
>
> That trace is from about 15 seconds.
>

Ah so 99% unknown operations inside Squid.

The 24K reads + 24K writes equate to (@4KB pages) roughly 6.25MB/s of
IO each way. Assuming that the network traffic takes up a portion up to
~1/2, that's still a lot of disk IO to sustain.
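
As a rough check on that estimate, here is the arithmetic as a small sketch. It assumes ~24K calls in each direction, that every call moves one full 4KB page, and the ~15 second trace window mentioned above; none of those were measured precisely, so treat the result as an order-of-magnitude figure.

```python
# Back-of-envelope check; all three inputs are rough figures from this thread.
CALLS = 24_000          # ~24K read() calls (and about as many write() calls)
BYTES_PER_CALL = 4096   # assumption: every call moves one full 4KB page
TRACE_SECONDS = 15      # strace ran for about 15 seconds


def throughput(calls=CALLS, bytes_per_call=BYTES_PER_CALL, seconds=TRACE_SECONDS):
    """Return (MiB per second, megabits per second) for one direction."""
    bytes_per_second = calls * bytes_per_call / seconds
    return bytes_per_second / 2**20, bytes_per_second * 8 / 1e6


mib_s, mbit_s = throughput()
print(f"{mib_s:.2f} MiB/s, about {mbit_s:.1f} Mbit/s each way")
```

With these inputs the figure is in megabytes per second, which is roughly 52 megabits per second in each direction.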

Does your cache.log show any indication of a crash leading up to all this?
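
One way to look for that is to scan cache.log for store rebuild notices. The sketch below is only an illustration: it assumes Squid announces each rebuild with a line of the form "Rebuilding storage in <dir> (CLEAN)" or "(DIRTY)", so check the pattern against your own log before relying on it. The function names are made up for this example.

```python
import re

# Assumed pattern: a rebuild notice such as
#   "Rebuilding storage in /path/to/cache (DIRTY)"
# Verify this against the wording in your own cache.log.
REBUILD_NOTICE = re.compile(r"Rebuilding storage in (\S+) \((CLEAN|DIRTY)\)")


def find_rebuilds(lines):
    """Yield (line_number, cache_dir, state) for each rebuild notice."""
    for number, line in enumerate(lines, start=1):
        match = REBUILD_NOTICE.search(line)
        if match:
            yield number, match.group(1), match.group(2)


def dirty_rebuilds(lines):
    """Return only the rebuilds that started from a dirty index."""
    return [hit for hit in find_rebuilds(lines) if hit[2] == "DIRTY"]
```

Passing an open file object for your cache.log to dirty_rebuilds() gives the line numbers to look at; whatever was logged just before each one is where a crash would show up.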

Amos

-- 
Please be using
   Current Stable Squid 2.7.STABLE9 or 3.1.6
   Beta testers wanted for 3.2.0.1

Received on Wed Aug 11 2010 - 13:15:36 MDT
