Re: squid-3 update from Andres Kroonmaa on 2002-10-29 (squid-dev)

From: Andres Kroonmaa <andre@dont-contact.us>
Date: Tue, 29 Oct 2002 09:14:10 +0200

On 28 Oct 2002, at 2:20, Adrian Chadd <adrian@squid-cache.org> wrote:

> The interesting point is the call graphs and the per-function CPU
> utilisation - there's .. well, lots of them. In what could be said as
> the understatement of the year on this list - I believe that the current
> squid codebase (squid-2 _and_ squid-3) does, well, far too much.
>
> The commercial cache vendors out there will snigger. :)
>
> Perhaps we're doing the same thing multiple times on a request
> (like parsing). Perhaps we could look at removing some of the
> steps (eg, using an async version of writev() to write out headers
> instead of packing them into a buffer first).
>
> I have the call graphs available from gprof if anyone is interested.

gprof? isn't it showing you complete beans? thats the whole reason
I wrote profiling branch.

> I'd then like to propose that, instead of throwing in any features,
> we take one good, hard, intense look at the codebase and decide
> what we can do to improve efficiency. I think the two suggestions
> above are a good start. We'll also stop thrashing the poor CPU
> bus if we try to limit the copies that we do.
> (On this note, does anyone know of any way to extract how far we're
> pushing the CPU FSB under FreeBSD/Linux?)

After quick search i found this:
http://www.cs.cornell.edu/courses/cs612/2002sp/assignments/ps1/pmc/pmc.html
You need some soft to play with Pentium performance monitoring counters.
needs kernel support.

Not very intuitive. On Solaris I have such util bundled, and I
don't know how to make sense of what I get. My 2xcpu box with 100%
utilisation shows about 2% or 10% of memory utilised, depending on
view.

eg. 133M fsb and 2x800M P3's
cpustat -c pic0=bus_drdy_clocks,pic1=data_mem_refs,noedge -t 1 | cpcProc
   time cpu event tsc pic0 pic1 p0/p1 % p0/tsc % p1/tsc %
  1.007 0 tick 797363927 1536268 60013808 2.56 % 0.19 % 7.53 %
  1.009 1 tick 802545686 2034812 84851972 2.40 % 0.25 % 10.57 %
  2.007 1 tick 794021456 2384058 94134518 2.53 % 0.30 % 11.86 %
  2.007 0 tick 796039904 1355098 78829770 1.72 % 0.17 % 9.90 %
  3.007 1 tick 795907880 2323467 85958727 2.70 % 0.29 % 10.80 %
  3.007 0 tick 795753418 1396527 62043236 2.25 % 0.18 % 7.80 %
  4.007 1 tick 795956600 2565066 103873431 2.47 % 0.32 % 13.05 %
  4.007 0 tick 795868550 1326913 63386159 2.09 % 0.17 % 7.96 %

I'm not too clueful about P3 counters, what I see is confusing to me.
133M/800M=16%, thus max mem throughput is 16%. My box shares fsb with
2 cpus, thus 16% is max for both summed. bus_drdy_clocks and data_mem_refs
events seemed most suitable, but numbers output don't make much sense.

cpustat -c pic0=bus_req_outstanding,pic1=data_mem_refs,noedge -t 1 | cpcProc
   time cpu event tsc pic0 pic1 p0/p1 % p0/tsc % p1/tsc %
  1.007 1 tick 797165511 195300365 67600653 288.90 % 24.50 % 8.48 %
  1.008 0 tick 797479456 161301828 61597426 261.86 % 20.23 % 7.72 %
  5.007 0 tick 739712417 154360624 44934261 343.53 % 20.87 % 6.07 %
  5.011 1 tick 797789328 209513569 89151843 235.01 % 26.26 % 11.17 %
  6.007 0 tick 795635920 188706220 56499275 334.00 % 23.72 % 7.10 %
  6.008 1 tick 793670400 243688292 90985430 267.83 % 30.70 % 11.46 %
  7.008 0 tick 796595318 186260144 74984972 248.40 % 23.38 % 9.41 %
  7.008 1 tick 795710128 197228017 80779923 244.15 % 24.79 % 10.15 %
  8.007 1 tick 795219784 200534262 96565851 207.67 % 25.22 % 12.14 %
  8.007 0 tick 795226594 184785299 46685458 395.81 % 23.24 % 5.87 %

I'd speculate that above says: there are 2-3 fsb requests outstanding per
data_mem_refs (p0/p1), which is roughly 25% of cputime for each cpu, and
that there is roughly 16-20% cputime spent on all data references, both
cache hits and misses.

chuckle, this is so funny. There is this huge array of counters in P3, and
every time I try to use them, I find them useless..
bus_req is all fsb reqs, including code fetches and io/dma wait time,
data_mem_refs is only data, and not fsb. neither says how much time fsb
is actually busy.
There is also this funny counter: counter of ticks while cpu is NOT halted.
its at about 50% on my box. So, its either that we are waiting for io, or
cpu is stalled for whatever other reasons.

------------------------------------
Andres Kroonmaa <andre@online.ee>
CTO, Microlink Online
Tel: 6501 731, Fax: 6501 725
Pärnu mnt. 158, Tallinn,
11317 Estonia
Received on Tue Oct 29 2002 - 00:23:01 MST

This archive was generated by hypermail pre-2.1.9 : Tue Dec 09 2003 - 16:17:01 MST