> I've been experimenting more with ICP and carp. I've noticed that if I
> have two squid machines configured like so:
>
> squid1.test.rob:
>
> icp_access allow all
> icp_port 3130
> http_port 80 accel vhost vport
> cache_peer squid2.test.rob parent 80 3130 proxy-only
> cache_peer freebsd1.test.rob parent 80 0 originserver
>
> squid2.test.rob:
>
> icp_access allow all
> icp_port 3130
> http_port 80 accel vhost vport
> cache_peer freebsd1.test.rob parent 80 0 originserver
>
> a request to squid1 via a load balancer (lynx --head --dump
> http://freebsd1:81/index3.html) when squid2 has index3.html in cache
> and squid1 does not, results in:
>
> X-Cache: HIT from squid2.test.rob
> X-Cache-Lookup: HIT from squid2.test.rob:80
> X-Cache: MISS from squid1.test.rob
> X-Cache-Lookup: MISS from squid1.test.rob:80
>
> That is what I would expect from a properly working array using ICP.
> My question is: if I want to distribute the load across an array of
> reverse proxies and I want to use a cache peer protocol (ICP, CARP,
> HTCP, Cache Digest, whatever), do I need all requests to my array /
> mesh to come to a 'master' squid acting as a router/load balancer? Or
> do I put a load balancer in front of the array and distribute HTTP
> requests randomly to the squids in the array?
When set up well, you should be able to ask any squid in the array and
get a fast result.
>
> The problem I'm having is if I put a load balancer in front of my
> squid array then each squid must be aware of all other peers in order
> for the squid array to act as a large cache. This results in
> forwarding loops and other problems when using ICP and carp that I've
> been unable to get around so far. But, if I don't put a load balancer
> in front of the array how do I efficiently distribute the load to the
> squids? If all requests come into one 'master' squid then wouldn't
> that squid simply cache everything from the origin server itself?
The 'proxy-only' option indicates that nothing retrieved from a peer
flagged this way is ever to be stored locally.
The most common setup we've seen here, where a master squid does the
load balancing, is to set that squid with 'proxy-only' on all of its
peer lines and some form of ICP/CARP/round-robin/etc. to select a peer
as the source. When done at an ISP as an accelerator matrix it usually
also needs source hashing to get around websites that check that all
object requests come from the same place.
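For instance, the master might carry something like this (a rough
sketch only; cache1/cache2.test.rob are placeholder hostnames for your
array members):

http_port 80 accel vhost vport
# CARP picks a peer by URL hash; proxy-only stops the master from
# keeping its own copy of whatever the peers hand back.
cache_peer cache1.test.rob parent 80 0 carp proxy-only no-query
cache_peer cache2.test.rob parent 80 0 carp proxy-only no-query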
>
> It seems that, for a cache array to work properly when all squids in
> the array serve content from a single origin server, a load balancer
> must be used to distribute the requests (and therefore the cached
> objects) across the machines in the array. Is my thinking correct?
If your definition includes DNS round-robin as a 'load balancer', maybe.
The simplest squid hierarchy is a set of non-linked squids all going
back to a master server, with a single FQDN pointing at many A/AAAA
records, one for each squid.
The peering algorithms are just a layer of service above that, so peers
can check for a 'better' source than the master for any given object.
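In your terms that simplest layout is roughly this on every reverse
proxy (a sketch only, reusing your hostnames), with DNS publishing one
A/AAAA record per squid under the single public FQDN:

http_port 80 accel vhost vport
# no peer links between the squids; each one talks only to the origin.
cache_peer freebsd1.test.rob parent 80 0 originserver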
> And if so, how do I configure the cache_peers lines on the
> squids? I would assume carp is the best protocol to use in this
> instance as it assumes a linear array instead of a hierarchy, but at
> this point I'd be happy to see ANY cache protocol example that would
> distribute the cached objects in a squid cluster.
Whichever layer of squid is allowed to go direct to the master gets a
cache_peer parent pointing at the master.
Whichever layer of squids is allowed to share among themselves gets
cache_peer sibling lines pointing at the others.
Each layer (if any) gets a cache_peer parent pointing at one or more
servers closer to the master than itself. (Going direct to the master
is just a special case / the bottom of this layering.)
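For your two-squid array that would look roughly like this on
squid1.test.rob, with squid2 getting the mirror-image config (a sketch
only, built from your own settings above):

icp_port 3130
icp_access allow all
http_port 80 accel vhost vport
# the sibling line lets squid1 fetch objects squid2 already holds (via
# ICP lookup), while the originserver parent stays the fallback source.
cache_peer squid2.test.rob sibling 80 3130 proxy-only
cache_peer freebsd1.test.rob parent 80 0 originserver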
Peer selection in squid includes many algorithms: carp, round-robin,
weighted-round-robin, closest-only, default, userhash, sourcehash,
first-available.
Amos