Re: [squid-users] To RAID or not to RAID...

From: Henrik Nordstrom <hno@dont-contact.us>
Date: Wed, 5 Jun 2002 00:14:03 +0200

In the case of Squid, the main benefit from RAID is redundancy, but due
to the I/O pattern only mirroring is suitable, and mirroring is fairly
expensive compared to the alternatives.

Striping has little or possibly even negative impact on Squid due to
the I/O pattern of Squid, and Squid is quite good at distributing the
load on all drives in the first place.

Striping works best for applications needing to perform large amounts
of sequential I/O.

Squid performs large amounts of random I/O in parallel, usually only a
few KB at a time.
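
To illustrate why small requests gain little from striping, here is a
rough sketch of how a simple RAID 0 layout maps a request onto drives.
The function name, the 64 KB chunk size and the four-drive array are
assumptions made up for the example, not properties of any particular
RAID implementation.

```python
def drives_touched(offset, length, chunk_size, n_drives):
    """Return the set of drive indices a request touches on a simple stripe.

    Assumes a plain RAID 0 layout where chunk number k lives on drive
    k % n_drives. Real implementations may differ in the details.
    """
    first_chunk = offset // chunk_size
    last_chunk = (offset + length - 1) // chunk_size
    return {chunk % n_drives for chunk in range(first_chunk, last_chunk + 1)}


CHUNK = 64 * 1024  # assumed stripe chunk size
DRIVES = 4         # assumed number of drives in the stripe

# A typical small cache object read: lands on a single drive.
small = drives_touched(5 * CHUNK, 8 * 1024, CHUNK, DRIVES)

# A large sequential read: spread over every drive in the stripe.
large = drives_touched(0, 1024 * 1024, CHUNK, DRIVES)
```

With these assumed numbers the 8 KB request is served by one drive
only, just as it would be without striping, while the 1 MB sequential
read is spread over all four.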

From a recovery point of view it is also much better to have one
cache_dir per drive than to have a single huge cache_dir.

  a) Restarts will be much quicker as the drives can be rebuilt in
parallel.

  b) If a cache drive fails due to hardware or software failure, you
only lose the cache content of that drive, not your whole cache.

  c) If you ever need to fsck your cache drives, all can be fsck:ed in
parallel, which is considerably quicker than fsck of a large striped
filesystem.

  d) Enlarging/shrinking your cache is simply a matter of
adding/removing a cache_dir. If you stripe your drives you are locked
into the exact hardware configuration used when creating the stripe.
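
As a small sketch of the one-cache_dir-per-drive layout, the helper
below prints one cache_dir line per mount point. The mount points
(/cache1 and so on), the size in MB and the helper name are
assumptions for the example; the line format follows the usual
"cache_dir ufs <directory> <Mbytes> <L1> <L2>" form, so adjust the
values to your own setup.

```python
def cache_dir_lines(mount_points, size_mb, l1=16, l2=256, store_type="ufs"):
    """Build one squid.conf cache_dir line per drive mount point.

    All arguments are illustrative; pick sizes that fit your drives.
    """
    return [
        f"cache_dir {store_type} {path} {size_mb} {l1} {l2}"
        for path in mount_points
    ]


# Hypothetical layout: three drives, each mounted on its own directory.
for line in cache_dir_lines(["/cache1", "/cache2", "/cache3"], 5000):
    print(line)
```

Growing or shrinking the cache then amounts to adding or removing an
entry in the list of mount points.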

Mirroring of the cache drives can provide a higher level of
redundancy, but is quite expensive both in terms of hardware and
speed (you will end up with about 3/5 of your seek capacity). A
similar level of redundancy can be achieved by using some simple
system monitoring tools to detect the hardware error, remove the
failed drive from squid.conf when a failure is detected, and alert
the administrator.
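
A minimal sketch of such a monitoring tool could look like the
following. Everything in it is an assumption for illustration: the
write probe is only one crude way of noticing a failed drive, the
function names are made up, and a real tool would also have to write
the new squid.conf, restart or reconfigure Squid and notify the
administrator.

```python
import os
import tempfile


def drive_is_healthy(cache_path):
    """Crude probe: can a small file be written, synced and read back?"""
    try:
        with tempfile.NamedTemporaryFile(dir=cache_path) as probe:
            probe.write(b"squid-probe")
            probe.flush()
            os.fsync(probe.fileno())
            probe.seek(0)
            return probe.read() == b"squid-probe"
    except OSError:
        return False


def disable_failed_dirs(conf_text, failed_paths):
    """Comment out the cache_dir lines whose directory is in failed_paths."""
    out = []
    for line in conf_text.splitlines():
        fields = line.split()
        # Expected form: cache_dir <type> <directory> <Mbytes> <L1> <L2>
        if len(fields) >= 3 and fields[0] == "cache_dir" and fields[2] in failed_paths:
            out.append("# disabled by monitor: " + line)
        else:
            out.append(line)
    return "\n".join(out) + "\n"
```

A cron job could run drive_is_healthy() on each cache directory and
pass the ones that fail to disable_failed_dirs().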

When considering the MTBF of modern drives and the value of the cached
data, mirroring becomes even less attractive in my opinion.

Regards
Henrik

On Tuesday 04 June 2002 22:49, Winston Gutkowski wrote:
> At the risk of flogging a dead horse, I'd like to add my 2p to this
> discussion. No matter what implementation of RAID you are using, it
> does 1 of 3 things:
>
> 1. Provides for recovery in the event of a hardware failure by
> storing data redundantly; either (a) mirroring it completely (RAID
> 1), or (b) storing composite information in a parity area (RAID 4
> or 5).
> 2. Speeds up I/O operations on filesystems by spreading them across
> multiple physical drives ("Striping" or RAID 0).
> 3. Both of the above (RAID 0+1).
>
> Problems usually arise when you have more than one system trying to
> do the same thing, because they end up conflicting or competing
> with each other. This most often occurs in the area of striping
> because lots of software can also be configured for load-spreading
> to improve I/O performance. My advice to anyone (not just squid
> bods) would be to decide which product you want to do the task: if
> squid, set up multiple cache_dir's but don't stripe; if RAID, set
> up striping with a single cache_dir on a striped volume.
>
> If you are also worried about recovery then your decision may also
> be directed by the need to set up redundant datasets. The Linux
> native software RAID, for example, does not support RAID 0+1, so
> you would be obliged to use RAID for the redundancy and squid for
> the load-balancing. If performance is an issue, RAID 1 is MUCH
> faster than RAID 5, often outperforming a single disk.
>
> In general, configuring a specific product such as squid often has
> performance benefits over something like RAID; but these may be
> minimal, and striping is a solution which can provide benefit for
> any software on your system.
>
> Winston Gutkowski
Received on Tue Jun 04 2002 - 16:35:35 MDT
