Hi
Dancer: I know you don't like kernel 2.2, but give it a try. At least
then you can narrow things down, ok?
> Hmm. Thanks for your words. The target filesystem in this case is EXT2,
> and there are six cache-disks. The entire disk-subsystem (including
> cache) never exceeds 400-600 blocks of throughput per second (about half
> in, and half out, at peak)... Physically, the disks spend most of their
> operational time each second idle.
>
> Whatever is causing the problem is throttling us to the tune of maybe 40
> requests/second (or more). Despite not being (apparently) disk bound,
> I'm tempted to try async-io and see what happens.
Try it... trust me.. async-io is a major performance win.
> If you're not using async I/O, try this for kicks. Put gettimeofday()
> calls around open/close/read/write/unlink calls on disk objects and
> record the latencies of these calls. I suspect you'll find that open
> dominates the others.
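(If you do want to try the code-hack approach, a minimal sketch of that
kind of instrumentation might look like the fragment below. This is
purely illustrative; the function name and log format are made up, not
anything from the Squid source.)

/* Illustrative only: time one open() with gettimeofday(); you would
 * wrap close/read/write/unlink the same way and record the latencies. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/time.h>

static int timed_open(const char *path, int flags)
{
    struct timeval start, end;
    double ms;
    int fd;

    gettimeofday(&start, NULL);
    fd = open(path, flags);
    gettimeofday(&end, NULL);

    ms = (end.tv_sec - start.tv_sec) * 1000.0
       + (end.tv_usec - start.tv_usec) / 1000.0;
    fprintf(stderr, "open(%s) = %d took %.3f ms\n", path, fd, ms);
    return fd;
}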
Michael O'Reilly (remember him? :) once gave me a script that works on
'strace -T' output and gives reasonable per-system-call stats. It prints
the following values for an 'strace -o' output file:
System call Count Total time Time per call % total
Here it is: it's called "elapcount.pl":
#!/usr/bin/perl
# Written by Michael O'Reilly: (C) him
# Summarise an 'strace -T' output file: for each system call, add up
# the elapsed times reported in the trailing <...> field.
while (<>) {
    # Match lines like: open("/foo", O_RDONLY) = 12 <0.084212>
    # (system call names may contain underscores, e.g. _llseek).
    # Anything else (signals, unfinished calls) is just echoed through.
    /^([a-z0-9_]*)\(.*<([^>]*)>$/ or (print($_), next);
    $syscall = $1; $diff = $2;
    $map{$syscall} += $diff;     # total elapsed time per system call
    $count{$syscall}++;          # number of calls per system call
    $total += $diff;             # grand total across all system calls
}
# One line per system call, sorted by total elapsed time.
foreach $i (sort { $map{$a} <=> $map{$b} } keys %map) {
    printf "%20s %5d %10.4f %10.6f %2.5f\n",
        $i, $count{$i}, $map{$i}, $map{$i} / $count{$i},
        $map{$i} / $total * 100;
}
print "Total time elapsed: $total\n";
-------------
To use it, attach strace to the Squid process with
"strace -T -o /tmp/strace-out -p <squid-pid>", wait a few minutes (or
just seconds, if you don't want to slow things down significantly), and
then run "./elapcount.pl /tmp/strace-out".
This needs no code hacks, and it ensures that you find _all_ the
system calls that are giving you hassles.
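For reference, the lines the script parses look something like the
following (paths and timings made up, just to show the format: with
"-T", strace appends the elapsed time in angle brackets to each line,
which is what the regular expression above picks out):

open("/cache1/00/05/0000052A", O_RDONLY) = 14 <0.082731>
read(14, "HTTP/1.0 200 OK\r\n"..., 4096)  = 4096 <0.000214>
close(14)                                = 0 <0.000031>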
Note: "strace -c" gives you the _cpu time_ that a function call
takes. If the function call is waiting for disks to seek (which won't
show up in your vmstat output), you end up measuring nothing... make
sure that you use "-T". When I first fiddled with this my opens and
closes cost next to nothing... in fact, Squid was waiting for them 85%
of the time.
If, however, strace -c gives you _big_ values for open, then it's
probably spending its time searching memory, not waiting for disks.
Surely that time would show up as kernel (system) time in "vmstat",
though? It looks like not a lot of it does.
The other possibility (this one seems less likely to me than a "linear
search in kernel space of some field"):
In Michael's words:
-----
More spindles won't do it. Because the squid you're running is a
single process, you're guaranteed that only one disk request can be
outstanding at a time. So if you have N spindles, N-1 will be idle at
any one time.
-----
So: you are only hitting one disk at a time (the kernel can't return
"ok, it's open" right away and then come back later with "oh, sorry:
there are no more inodes on that drive"; each open has to complete
before Squid can move on). If the kernel is waiting for that disk to
seek (from wherever it last operated on something), you will still get
slow responses from the disks even though they look mostly idle. Say
the heads are sitting in the middle of each drive after every write.
If you end up seeking on a different disk all the time (to balance the
spread of objects, or because you are using RAID), then even if each
seek only takes a few milliseconds, you will spend most of your time
waiting for disk-io, with very little actual disk-io behind each seek.
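To make that concrete, here is a rough sketch (purely illustrative C,
not Squid's actual async-io code; all the names are made up) of the
difference async-io makes: the blocking open() is handed to a worker
thread, so the main process keeps going and several disks can have an
operation outstanding at once.

/* Illustrative sketch only: push a blocking open() onto a worker
 * thread so the main loop never stalls waiting for a seek. */
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct open_job {
    char path[256];
    void (*callback)(int fd);  /* invoked (here, from the worker thread)
                                * once the open has completed; a real
                                * implementation would queue the result
                                * back to the main loop instead */
};

static void *open_worker(void *arg)
{
    struct open_job *job = arg;
    int fd = open(job->path, O_RDONLY);  /* may block on a disk seek */

    job->callback(fd);
    free(job);
    return NULL;
}

/* Queue an open without blocking the caller. */
static void open_async(const char *path, void (*cb)(int fd))
{
    struct open_job *job = malloc(sizeof(*job));
    pthread_t tid;

    if (job == NULL)
        return;
    snprintf(job->path, sizeof(job->path), "%s", path);
    job->callback = cb;
    pthread_create(&tid, NULL, open_worker, job);
    pthread_detach(tid);  /* a real implementation would use a small
                           * pool of threads rather than one per open */
}

With something along these lines (or Squid's real async-io), a seek on
one spindle no longer holds up requests to the other five.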
I think there is an iostat-type tool for Linux somewhere. I believe it
only works on kernel 2.2, though. No idea where to find it, but I think
it was written by Stephen Tweedie.
Oskar