Re: Sudden unexplained crash, strange behavior from Niall Doherty on 1998-07-28 (squid-users)

From: Niall Doherty <ndoherty@dont-contact.us>
Date: Tue, 28 Jul 1998 17:16:50 +0100

Hi,

(no help - just letting you know you're not alone :-)

Michael Pelletier wrote:
>
> I recently upgraded to 1.1.22 on my BSD/OS 3.1 system, and after about
> 12 days of uptime, it died with the following error:
>
> 1998/07/28 09:16:02| www.koelner-dom.de(195.14.230.74) marked bad
> 1998/07/28 09:16:02| ERR_CONNECT_FAIL: http://www.koelner-dom.de/
> FATAL: Received Segment Violation...dying.
> 1998/07/28 09:16:25| storeWriteCleanLog: Starting...

I've been running Squid for over 1 year now - up until last Friday
I had never ever ever seen it crash. Then on Friday 1.1.21 (which
I've been running since it was available) crashed twice - with
the same Segment Violation message. It wrote out its log file
though each time. I don't have it restarted automatically coz
I'd prefer to check out what went wrong (I never thought I'd have
to :-) All our users use a proxy.pac or MissionControl setup so
everyone is switched to another server automatically - no downtime
in other words.

I couldn't see anything strange at all. I ran truss on it for a
while but it hasn't crashed since *puzzle* I have plenty of CPU
cycles to spare - lots of RAM spare and plenty of swap - no resource
was missing...

Maybe they're related (even though mine is 1.1.21 on Solaris 2.5.1 ?)

I also got no core dumps :-(
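
One thing that might be worth ruling out (a guess on my part, nothing
confirmed for either of our boxes): if the soft core-size limit the
process inherits is 0, no core file gets written at all. Below is a
minimal Python sketch that prints the core and data-segment limits of
the shell it is run from. The helper names describe_limit and
core_dumps_disabled are made up for illustration, and the data-segment
line is only there because of the maxdsiz problem Michael mentions.

```python
import resource


def describe_limit(name, which):
    """Return a one-line description of the soft/hard limit for `which`."""
    soft, hard = resource.getrlimit(which)

    def fmt(value):
        # RLIM_INFINITY is the sentinel the OS uses for "no limit".
        return "unlimited" if value == resource.RLIM_INFINITY else str(value)

    return f"{name}: soft={fmt(soft)} hard={fmt(hard)}"


def core_dumps_disabled():
    """True when the soft core-size limit is 0, i.e. no core file is written."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_CORE)
    return soft == 0


if __name__ == "__main__":
    print(describe_limit("core file size (bytes)", resource.RLIMIT_CORE))
    print(describe_limit("data segment size (bytes)", resource.RLIMIT_DATA))
    if core_dumps_disabled():
        print("soft core limit is 0: a crash would leave no core file")
```

It only reports the limits of whatever environment launches it, so it
would need to be run from the same account and startup script that
starts Squid to tell you anything about the Squid process itself.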

So - if you find out anything useful let me know !

Cheers,
Niall

> This is a switch from the previous crashes -- usually it would just
> run into the maxdsiz limit and die due to an xmalloc() failure.
> Turning off memory pools seems to have helped with that particular
> problem.
>
> It spent the next several minutes writing the log, but stopped at
> 192512 lines out of 276459, but continued to chew CPU time for another
> half hour -- it seemed to still be handling requests - there was only
> a couple of calls from the users, though nothing was appearing in
> access.log...
>
> 1998/07/28 09:16:51| 192512 lines written so far.
>
> Half an hour later, I sent it a TERM signal, and it said this:
>
> 1998/07/28 09:16:51| Preparing for shutdown after 825909 connections
> 1998/07/28 09:16:51| Waiting 30 seconds for active connections to finish
>
> And then a couple of minutes after this, I sent it *another* TERM
> signal, and it said this:
>
> 1998/07/28 09:48:39| Pinger exiting.
>
> Then RunCache kicked in and started another instance:
>
> 1998/07/28 09:48:49| Starting Squid Cache version 1.1.22 for i386-pc-bsdi3.1...
> 1998/07/28 09:48:49| With 13196 file descriptors available
> 1998/07/28 09:48:49| Performing DNS Tests...
> 1998/07/28 09:48:49| Successful DNS name lookup tests...
> 1998/07/28 09:48:49| Started 3 'dnsserver' processes
>
> and it proceeded to start up normally. There doesn't seem to be a
> core file anywhere - I looked in /usr/local/squid/*, my cache
> directory, and the log directory.
>
> Any ideas why it might have crashed? Any ideas why it didn't complete
> the log write correctly (there was plenty of disk space)? Thanks for
> any suggestions you might have.
>
> -Mike Pelletier.

-- 
Niall Doherty          | mailto:ndoherty@eei.ericsson.se
Systems Engineer       | http://www.ericsson.ie
                       |
Voice: +353 1 207 7506 | Ericsson Systems Expertise Ltd.,
Fax:   +353 1 207 7115 | Beech Hill, Clonskeagh, Dublin 4, Ireland.
----I-N-T-E-R-N-A-L----|-------------I-N-T-E-R-N-A-L-------------------
                       | Home Page: http://www.eei.ericsson.se/~eeindy/
ECN: 830 7506          | Cache pgs: http://admin.eei.ericsson.se/Cache/

Received on Tue Jul 28 1998 - 09:17:41 MDT
