It seems the threads lib has gone nuts on context switching.
Perhaps your threads lib uses signals to unblock all threads at once, hoping
that they will block again soon. They indeed do so, pretty fast, which makes
the lib unblock all of them again, just to find the next runner?
cond_timed_wait(), if I recall, may be interrupted by a signal or fork, and
perhaps it then calls gettime() and blocks again.
From your trace, comm_select() has returned from select() about a zillion
times with 0 files ready. EINTR? comm.c loves to call gettimeofday almost
every time it gets control.
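
To illustrate the point about gettimeofday, here is a minimal sketch of a
select() pass that reads the clock exactly once per wakeup and treats EINTR
as "nothing ready". This is not squid's comm.c code; the names cached_now,
clock_reads, refresh_cached_time(), current_time() and loop_pass() are all
made up for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical cached clock, refreshed once per event-loop pass. */
struct timeval cached_now;
unsigned long clock_reads;      /* how often we really asked the kernel */

void refresh_cached_time(void)
{
    int rc = gettimeofday(&cached_now, NULL);
    assert(rc == 0);
    (void)rc;                   /* keep quiet when built with NDEBUG */
    clock_reads++;
}

/* Everything else in the same pass reads the cached value. */
const struct timeval *current_time(void)
{
    return &cached_now;
}

/*
 * One pass of a select()-based loop. Returns the number of ready
 * descriptors, 0 on timeout or EINTR, and -1 on any other error.
 */
int loop_pass(int nfds, fd_set *readfds, long timeout_ms)
{
    struct timeval tv;
    int n;

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    n = select(nfds, readfds, NULL, NULL, &tv);
    refresh_cached_time();      /* exactly one clock read per wakeup */

    if (n < 0 && errno == EINTR)
        return 0;               /* interrupted by a signal: nothing ready */
    return n;
}
```

With something like this, a burst of spurious wakeups still costs one
gettimeofday per return from select(), but no more than that, however many
places in the loop want to know the time.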
> System call Count Total time Time per call % total
> gettimeofday 4254 20.0968 0.004724 100.00000
>Total time elapsed: 20.09676
It's about 200/sec (hmm, pretty slow per call, isn't it?), so if your
system clock ticks at 100/sec and the threads lib does 2 gettimes() per spin,
you're definitely out of luck. Anyway, calling gettimeofday too often is a
bad idea overall. Each call to it is an unavoidable context switch
(which might, but might not, switch back to the same thread).
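
For the record, the arithmetic behind the "about 200/sec" figure, as a small
sketch. The helper names calls_per_second() and ms_per_call() are made up
here; the inputs are the count and times from the quoted trace.

```c
#include <assert.h>

/* Average call rate from a trace summary: count / elapsed seconds. */
double calls_per_second(long count, double elapsed_sec)
{
    assert(elapsed_sec > 0.0);  /* a zero-length trace has no rate */
    return (double)count / elapsed_sec;
}

/* Average cost of one call, in milliseconds. */
double ms_per_call(long count, double total_sec)
{
    assert(count > 0);
    return 1000.0 * total_sec / (double)count;
}
```

Feeding in 4254 calls over 20.09676 seconds gives a rate a little above
211 calls/sec, and 20.0968 seconds spread over 4254 calls is roughly 4.7 ms
per call, which matches the "Time per call" column.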
Some thread implementations hack around select() to schedule threads, and this
might interfere somehow with squid (e.g. each return from select() unblocks
all threads, which then look around and block again in select(), just burning
CPU, but I'm not sure here).
I can just imagine what your box would look like when servicing about 1000
concurrent sessions ;)
In general, I think you are using the wrong thread lib. Find a better one, or
drop the idea of async_io on your box.
> > > Now this squid is completely idle... any idea why it's going moggy about with
> > > the CPU? My machine has no idle CPU when I run it, so I guess things aren't
> > > wonderful..
> >
> > Check to see if it's the main thread burning CPU, or one of the child
> > threads...
>
> Each of the child threads..
>
> 3:24am up 23 days, 9:08h, 21 users, load average: 33.07, 16.04, 6.37
> 126 processes: 91 sleeping, 35 running, 0 zombie, 0 stopped
> CPU states: 26.6% user, 71.4% system, 0.0% nice, 3.7% idle
> Mem: 30312K av, 29632K used, 680K free, 14152K shrd, 968K buff
> Swap: 80604K av, 47784K used, 32820K free 4016K cached
>
> USER PID %CPU %MEM NI VSZ RSS SHRD TT STAT TIME COMMAND
> squid 28884 4.1 2.5 0 2660 768 400 q0 R 0:04 (squid)
> squid 28889 4.1 0.8 0 2100 248 152 q0 R 0:04 squid
...
...
> squid 28896 3.1 0.8 0 2100 248 152 q0 R 0:04 squid
> root 7855 2.8 35.7 0 32068 10832 668 ? R 519:20 X :0 -bpp 16
> squid@newt:/usr/local/squid/bin > time ./squid
> real 3m10.029s
> user 0m1.320s
> sys 0m3.080s
> huh?
AFAIK, many systems do not account for context-switch, page-fault, and some
other low-level (io?) times. So you end up looking at strange figures: the CPU
is idle 4% of the time, sys+user times are only about 2% of wall-clock, and
the system in general seems busy while actually doing pretty much nothing.
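
If you want to check the same thing from inside a process, here is a rough
sketch that compares accounted CPU time with wall-clock time. getrusage() is
the standard call; cpu_seconds() and cpu_share() are names made up for this
example, and what the kernel actually charges to user/sys time varies from
system to system.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>
#include <sys/time.h>

static double tv_to_sec(struct timeval tv)
{
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* User + system CPU seconds charged to this process so far. */
double cpu_seconds(void)
{
    struct rusage ru;
    int rc = getrusage(RUSAGE_SELF, &ru);
    assert(rc == 0);
    (void)rc;                   /* keep quiet when built with NDEBUG */
    return tv_to_sec(ru.ru_utime) + tv_to_sec(ru.ru_stime);
}

/* Fraction of a wall-clock interval that was accounted as CPU time. */
double cpu_share(double user_sec, double sys_sec, double real_sec)
{
    assert(real_sec > 0.0);
    return (user_sec + sys_sec) / real_sec;
}
```

With the numbers from the quoted time output (user 1.320 s, sys 3.080 s,
real 190.029 s) the accounted share comes out a bit over 2% of wall-clock,
on a box that top reports as almost never idle.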
----------------------------------------------------------------------
Andres Kroonmaa mail: andre@online.ee
Network Manager
Organization: MicroLink Online Tel: 6308 909
Tallinn, Sakala 19 Pho: +372 6308 909
Estonia, EE0001 http://www.online.ee Fax: +372 6308 901
----------------------------------------------------------------------
Received on Tue Jul 29 2003 - 13:15:46 MDT