>>> One should also consider the difference between
>>> simple RAID and extremely advanced RAID disk systems
>>> (e.g. EMC and other arrays).
>>> The external disk arrays like EMC with internal RAID5 are simply faster
>>> than a JBOD of internal disks.
>
> How many write-cycles does EMC use to back up data after one
> system-used write cycle?
> How many CPU cycles does EMC spend figuring out which disk the
> file-slice is located on, _after_ squid has already hashed the file
> location to figure out which disk the file is located on?
>
> Regardless of speed, unless you can provide a RAID system which has
> fewer than one hardware disk-io read/write per system disk-io
> read/write, you hit these theoretical limits.

I can't quote disk cycle numbers, but I know that our fiber-connected HP
EVA8000s (with ginormous caches and LUNs spread over 72 spindles, even
at RAID5) are one hell of a lot faster than the local disks. The 2 Gbps
fiber connection is the limiting factor for most of our high-bandwidth
apps. In our shop, squid is pretty low bandwidth by comparison. We
normally hover around 100 req/sec with occasional peaks at 200 req/sec.
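
As a rough illustration of the per-write cost being argued about here, the
sketch below uses the textbook small-write penalties (1 physical I/O for a
plain disk, 2 for RAID1, 4 for RAID5) and ignores controller caches entirely.
Those penalty figures, the function names, and the assumption of one disk
write per request are illustrative assumptions, not measurements from the
EVA8000 or from squid.

```python
# Assumed textbook physical I/Os per small logical write (no cache, worst case).
WRITE_PENALTY = {
    "jbod": 1,   # one write to one disk
    "raid1": 2,  # one write to each mirror
    "raid5": 4,  # read data + read parity + write data + write parity
}


def physical_write_iops(logical_writes_per_sec, level):
    """Back-end disk I/Os per second for a given front-end write rate."""
    return logical_writes_per_sec * WRITE_PENALTY[level]


def per_spindle_iops(logical_writes_per_sec, level, spindles):
    """Back-end load per disk, assuming it spreads evenly (an idealisation)."""
    return physical_write_iops(logical_writes_per_sec, level) / spindles


if __name__ == "__main__":
    # 200 req/sec is the peak quoted above; one disk write per request
    # is a deliberately pessimistic assumption.
    for level, spindles in (("jbod", 1), ("raid5", 72)):
        print(level, spindles, per_spindle_iops(200, level, spindles))
```

Under these assumptions RAID5 does multiply the total number of physical
writes, as the quoted argument says, but spreading them over many spindles
keeps the load on any single disk low.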

> But it's not so much a problem of human-noticeable absolute time as
> of the underlying duplicated disk-io cycles, processor-io cycles and
> processor delays that remain.
>
> For now the CPU half of the problem gets masked by the
> single-threadedness of squid (never thought you'd see that being a
> major benefit, eh?). If squid begins using all the CPU threads, the OS
> will lose out on its spare CPU cycles on dual-core machines and RAID
> may become a noticeable problem there.

Your arguments are valid for software RAID, but not for hardware RAID.
Most nicer systems have a dedicated disk controller with its own
processor that handles nothing but the onboard RAID. A fiber-connected
disk array is conceptually similar, but with more horsepower. The CPU
never has to worry about overhead in this case. Perhaps for these
scenarios, squid could use a config flag that tells it to put everything
on one "disk" (as it sees it) and not bother imposing any of its own
overhead for operations that will already be done by the array controller.
This archive was generated by hypermail pre-2.1.9 : Tue Apr 01 2008 - 13:00:05 MDT