On 27 Apr 99, at 10:41, Kevin Littlejohn <darius@connect.com.au> wrote:
> There's a better way to measure this stuff - on Linux, strace will give you
> information about how long system calls are taking. A '-c' flag will add them
> up, you can use '-p' to attach to an existing process - leave it running for,
> like, 30 seconds or so, then <ctrl>-c it, it'll detach and spit out times
> for the main system calls. Very useful.
Right, it's the fastest way to get a clue.
> And I agree, by the way, if you're not using async-io, your open calls will
> be killing you.
Yes, but normally the open call (if it is the bottleneck) would make the
disks busy. Async-io could mask the real problem here.
> >>> Gideon Glass wrote
> >
> > Hmm. Thanks for your words. The target filesystem in this case is EXT2,
> > and there are six cache-disks. The entire disk-subsystem (including
> > cache) never exceeds 400-600 blocks of throughput per second (about half
> > in, and half out, at peak)...Physically, the disks spend most of their
> > operational time each second idle.
It seems to me that there is something deeper and more serious here. Any
modern disk subsystem should handle over 1000 blocks/sec without any problems.
But you say that the disks are apparently idle...
System call times could go up, but not so much that requests would
complete in 50 secs instead of a few.
Remember that by using 6 disks for squid as separate swapdisks you impose
6 times the load on the metadata buffer cache. Try your tests with a single
spindle.
BTW, have you checked your hardware, especially the SCSI controller and drives?
Could it be that the OS is missing interrupts from the disk subsystem, or that
the disks are failing to communicate every time, tying up the SCSI bus for a
while and stalling the other disks? Do you have all 6 disks on one SCSI bus?
The usual advice is not to put more than 3-5 disks on a single bus for
disk-intensive loads.
> > Whatever is causing the problem is throttling us to the tune of maybe 40
> > requests/second (or more). Despite not being (apparently) disk bound,
> > I'm tempted to try async-io and see what happens.
> >
> > If you're not using async I/O, try this for kicks. Put gettimeofday()
> > calls around open/close/read/write/unlink calls on disk objects and
> > record the latencies of these calls. I suspect you'll find that open
> > dominates the others.
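
For what it's worth, the gettimeofday() idea quoted above could be sketched
roughly like this in C. This is only an illustration under my own
assumptions: the names elapsed_us, time_file_ops and struct op_times are made
up, and it times one create/write/close/unlink cycle on a scratch file rather
than squid's own calls on its cache objects.

```c
/* Sketch: wrap disk syscalls in gettimeofday() and record their latency.
 * All helper names here are hypothetical, not taken from squid. */
#include <fcntl.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Latency of each call, in microseconds. */
struct op_times {
    long open_us;
    long write_us;
    long close_us;
    long unlink_us;
};

/* Microseconds elapsed from *a to *b. */
static long elapsed_us(const struct timeval *a, const struct timeval *b)
{
    return (long)(b->tv_sec - a->tv_sec) * 1000000L
         + (long)(b->tv_usec - a->tv_usec);
}

/* Create, write, close and unlink one file at `path`, timing each call.
 * Returns 0 on success, -1 if any of the calls fails. */
static int time_file_ops(const char *path, struct op_times *t)
{
    static const char buf[4096];      /* zero-filled 4 KB payload */
    struct timeval t0, t1;
    ssize_t n;
    int fd, rc;

    gettimeofday(&t0, NULL);
    fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0600);
    gettimeofday(&t1, NULL);
    if (fd < 0)
        return -1;
    t->open_us = elapsed_us(&t0, &t1);

    gettimeofday(&t0, NULL);
    n = write(fd, buf, sizeof buf);
    gettimeofday(&t1, NULL);
    t->write_us = elapsed_us(&t0, &t1);
    if (n != (ssize_t)sizeof buf) {   /* short write or error: clean up */
        close(fd);
        unlink(path);
        return -1;
    }

    gettimeofday(&t0, NULL);
    rc = close(fd);
    gettimeofday(&t1, NULL);
    t->close_us = elapsed_us(&t0, &t1);
    if (rc != 0) {
        unlink(path);
        return -1;
    }

    gettimeofday(&t0, NULL);
    rc = unlink(path);
    gettimeofday(&t1, NULL);
    t->unlink_us = elapsed_us(&t0, &t1);
    return rc == 0 ? 0 : -1;
}
```

Call time_file_ops() in a loop against a file on one of the cache disks and
compare the four numbers. If open really dominates, it should stand out well
above the others. Note that gettimeofday() follows the wall clock, so a clock
adjustment during the run would distort a sample.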
----------------------------------------------------------------------
Andres Kroonmaa mail: andre@online.ee
Senior Network Engineer
Organization: MicroLink Online Tel: 6308 909
Tallinn, Sakala 19 Pho: +372 6308 909
Estonia, EE0001 http://www.online.ee Fax: +372 6308 901
----------------------------------------------------------------------
Received on Tue Jul 29 2003 - 13:15:57 MDT