I was following the performance thread with a lot of interest. Can
someone with enough experience tell me if the following nuggets I've
gleaned from it are roughly correct?
1) A properly configured Pentium II-based *BSD box (single or
   dual-processor, 350+MHz CPU) running Squid with moderate tuning
   can reasonably be expected to put out 10 Mbit/s or 100
   requests/sec total "down-stream" traffic. Over 12 Mbit/s or 120
   requests/sec is probably not sustainable on a single Squid server,
   regardless of processor speed, RAM, or disk speed, so for over
   10 Mbit/s one should start clustering multiple servers (see the
   ICP sketch after this list).
2) The Squid box should be running no other applications except its
   own caching nameserver, to reduce DNS lookup overhead (see the
   resolver example after this list), and the minimum required for
   system maintenance (sshd or telnetd, cron, etc.).
3) Up to a certain (unknown?) point of diminishing returns,
   maximizing the real RAM available to Squid is the most effective
   performance increase. The cache RAM setting should be kept below
   1/3 * (real RAM - (OS and other RAM needs)) - e.g. for a 512MB
   machine, the Squid cache_mem parameter should be set no higher
   than around 160MB. (160*3 = 480MB, leaving 32MB for the OS, etc.;
   see the example after this list.)
4) Correct use of any available OS file system tuning options is
   going to be the next most important factor in maximizing
   throughput. This would include enabling "softupdates" or any
   similar fast-file-system option available under your OS, setting
   the noatime/noaccesstime option on the file systems used for the
   cache spool, setting an "optimize for time" parameter in tunefs if
   available, and increasing the number of directory inodes cached in
   RAM (example commands after this list).
5) A related factor is the performance gain from spreading file
   access and seek activity across multiple disk spindles, i.e.
   spreading the cache across many drives, maybe even across multiple
   SCSI controllers. In other words, achieved cache performance will
   be significantly greater (maybe nearly doubled) with six 9GB
   drives than with three 18GB drives.
6) Finally, peak performance with a standard UNIX file system
   counter-intuitively requires the drives to be kept permanently
   partly empty. Given a certain size of drive or file system, the
   system will actually perform better if Squid is told to use a
   maximum of 50% of that space rather than 80-90%, because the
   faster file access on the half-empty file system will greatly
   outweigh the slightly higher hit rate a fuller file system would
   give. (Optimum percent full = unknown? See the cache_dir sketch
   after this list.)
Have I got this right? In general the performance issues sound very
similar to the issues in tuning INN news servers for maximum
throughput.
This brings up a few additional "tweak" questions:
Rules of thumb for directory hashing:
If you're dedicating a series of 9GB drives to cache, how many
top-level directories should each be broken into? Is it better to
just go for 256 each, to minimize the size of the leaf directories,
or is some smaller number optimal? Is there any advantage (as there
is in some hashing schemes) to using a prime number of hash buckets
(directories), or to using/avoiding powers of two?
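For scale, my back-of-the-envelope, assuming a ~13KB average object
size (roughly what our cache reports; yours may differ):

    4500 MB / 13 KB    ~= 350,000 objects per drive
    L1=16, L2=256      ->  4096 leaf directories
    350,000 / 4096     ~=  85 files per leaf directory

so even the default 16 top-level directories keeps the leaves small,
and 256 looks like overkill unless objects are much smaller. As I
understand it, Squid allocates swap file numbers sequentially and
derives the directory pair by division/modulo on that number rather
than by hashing the URL, so prime vs. power-of-two bucket counts
shouldn't matter the way they would in a true hash table - but I'd
welcome correction on that.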
Performance drop-off using RAID 5 vs. standalone disks:
Depending on the access patterns of particular applications, using a
RAID system can lead to anything from a sharp increase in performance
(due to the striping spreading sectors across drives), to a slight
fall-off, to a sharp decline. However, the big benefit IMHO is that
a failed disk can't take the SCSI bus down and hence can't take down
the server. (The protection against loss of data obviously isn't
very important for caching.) Normally with apps like Squid which do
their own hashing to distribute workload across disks, the result is
some decline in performance along with the increase in cost.
If it's a slight performance decline, I'll take that in exchange for
the reliability - as I do on our main news server - but if it's a
huge decline, it might be cheaper to get the reliability by deploying
multiple Squid servers with non-RAID disk systems. Anyone have any
perspective on this?
-- Clifton
-- Clifton Royston -- LavaNet Systems Architect -- cliftonr@lava.net
   "An absolute monarch would be absolutely wise and good. But no man
   is strong enough to have no interest. Therefore the best king
   would be Pure Chance. It is Pure Chance that rules the Universe;
   therefore, and only therefore, life is good." - AC