Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues from david_at

From: <david_at_lang.hm>
Date: Wed, 4 May 2011 16:36:08 -0700 (PDT)

On Wed, 4 May 2011, Alex Rousskov wrote:

> On 05/04/2011 12:49 PM, david_at_lang.hm wrote:
>
>> I don't know how many developers are working on squid, so I don't know
>> if you are the only person who can do this sort of work or not.
>
> I am sure there are others who can do this. The question is whether you
> can quickly find somebody interested enough to spend their time on your
> problem. In general, folks work on issues that are important to them or
> to their customers. Most active developers donate a lot of free time,
> but it still tends to revolve around issues they care about for one
> reason or another. We all have to prioritize.

I do understand this.

>> do you think that I should join the squid-dev list?
>
> I believe your messages are posted to squid-dev so you are not going to
> reach a wider audience if you do. If you want to write Squid code,
> joining is a good idea!

I don't really have the time to do coding on this project.

> IMHO, you can maximize your chances of getting free help by isolating
> the problem better. For example, perhaps you can try to reproduce it
> with different kinds of fast ACLs (the simpler the better!). This will
> help clarify whether the problem is specific to IPv6, IP, or ACLs in
> general. Test different number of ACLs: Does the problem happen only
> when the number of simple ACLs is huge? Make the problem easier to
> reproduce by posting configuration files (including Polygraph workloads
> or options for some other benchmarking tool you use).
>
> This is not a guarantee that somebody will jump and help you, but fixing
> a well-triaged issue is often much easier.

that's why I'm speaking up. I just have not known what to test.

are there other types of ACLs that I should be testing?

I'll set up some tests with different numbers of ACLs. since I've already
verified that the number of ACLs defined isn't the significant factor,
only the number tested before one succeeds (by moving the ACL that allows
my access from the end of the file to the beginning of the file, keeping
everything else the same), I'll see if the slowdown seems proportional to
the number of rules, or if there is something else going on.
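
A minimal sketch of how such a test series could be generated, assuming
simple "src" ACLs and a single allow rule whose position is varied. The
function name, the ACL names (filler0..., testclient) and the address
ranges are hypothetical and only for illustration; "acl ... src" and
"http_access" are the standard squid.conf directives.

```python
import ipaddress


def build_acl_config(num_acls, allow_position, client_net="10.0.0.0/8"):
    """Return squid.conf text with num_acls filler ACLs and one allow rule.

    allow_position is the number of deny rules that must be tested
    before the allow rule for the test client is reached.
    All names and addresses here are illustrative assumptions.
    """
    if not 0 <= allow_position <= num_acls:
        raise ValueError("allow_position must be between 0 and num_acls")

    # Filler addresses are taken from 198.18.0.0/15 (benchmarking range),
    # so they never match a client inside client_net.
    base = ipaddress.ip_address("198.18.0.1")
    names = [f"filler{i}" for i in range(num_acls)]

    lines = [f"acl {name} src {base + i}" for i, name in enumerate(names)]
    lines.append(f"acl testclient src {client_net}")

    # The total number of rules stays fixed; only the allow rule moves.
    rules = [f"http_access deny {name}" for name in names]
    rules.insert(allow_position, "http_access allow testclient")
    rules.append("http_access deny all")

    return "\n".join(lines + rules) + "\n"


if __name__ == "__main__":
    # Same number of ACLs defined, different number tested before a match.
    for position in (0, 50, 100):
        config = build_acl_config(100, position)
        print(f"# allow rule after {position} deny rules: "
              f"{len(config.splitlines())} config lines")
```

Keeping num_acls constant while changing allow_position separates "ACLs
defined" from "ACLs tested", so throughput can be plotted against the
number of rules evaluated per request.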

any other types of testing I should do?

David Lang

>
> HTH,
>
> Alex.
>
>
>> On Wed, 4 May 2011, Alex Rousskov wrote:
>>
>>> On 05/04/2011 11:41 AM, david_at_lang.hm wrote:
>>>
>>>> anything new on this issue? (including any patches for me to test?)
>>>
>>> If you mean the "ACLs do not scale well" issue, then I do not have any
>>> free cycles to work on it right now. I was happy to clarify the new SMP
>>> architecture and suggest ways to triage the issue further. Let's hope
>>> somebody else can volunteer to do the required legwork.
>>>
>>> Alex.
>>>
>>>
>>>> On Mon, 25 Apr 2011, david_at_lang.hm wrote:
>>>>
>>>>> Date: Mon, 25 Apr 2011 17:14:52 -0700 (PDT)
>>>>> From: david_at_lang.hm
>>>>> To: Alex Rousskov <rousskov_at_measurement-factory.com>
>>>>> Cc: Marcos <mczueira_at_yahoo.com.br>, squid-users_at_squid-cache.org,
>>>>> squid-dev_at_squid-cache.org
>>>>> Subject: Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues
>>>>>
>>>>> On Mon, 25 Apr 2011, Alex Rousskov wrote:
>>>>>
>>>>>> On 04/25/2011 05:31 PM, david_at_lang.hm wrote:
>>>>>>> On Mon, 25 Apr 2011, david_at_lang.hm wrote:
>>>>>>>> On Mon, 25 Apr 2011, Alex Rousskov wrote:
>>>>>>>>> On 04/14/2011 09:06 PM, david_at_lang.hm wrote:
>>>>>>>>>
>>>>>>>>>> In addition, there seems to be some sort of locking between the
>>>>>>>>>> multiple
>>>>>>>>>> worker processes in 3.2 when checking the ACLs
>>>>>>>>>
>>>>>>>>> There are pretty much no locks in the current official SMP code.
>>>>>>>>> This
>>>>>>>>> will change as we start adding shared caches in a week or so, but
>>>>>>>>> even
>>>>>>>>> then the ACLs will remain lock-free. There could be some internal
>>>>>>>>> locking in the 3rd-party libraries used by ACLs (regex and such),
>>>>>>>>> but I
>>>>>>>>> do not know much about them.
>>>>>>>>
>>>>>>>> what are the 3rd party libraries that I would be using?
>>>>>>
>>>>>> See "ldd squid". Here is a sample based on a randomly picked Squid:
>>>>>>
>>>>>> libnsl, libresolv, libstdc++, libgcc_s, libm, libc, libz, libepol
>>>>>>
>>>>>> Please note that I am not saying that any of these have problems in
>>>>>> SMP
>>>>>> environment. I am only saying that Squid itself does not lock anything
>>>>>> at runtime so if our suspect is SMP-related locks, they would have to
>>>>>> reside elsewhere. The other possibility is that we should suspect
>>>>>> something else, of course. IMHO, it is more likely to be something
>>>>>> else:
>>>>>> after all, Squid does not use threads, where such problems are
>>>>>> expected.
>>>>>
>>>>>
>>>>>> BTW, do you see more-or-less even load across CPU cores? If not,
>>>>>> you may
>>>>>> need a patch that we find useful on older Linux kernels. It is
>>>>>> discussed
>>>>>> in the "Will similar workers receive similar amount of work?"
>>>>>> section of
>>>>>> http://wiki.squid-cache.org/Features/SmpScale
>>>>>
>>>>> the load is pretty even across all workers.
>>>>>
>>>>> with the problems described on that page, I would expect uneven
>>>>> utilization at low loads, but at high loads (with the workers busy
>>>>> servicing requests rather than waiting for new connections), I would
>>>>> expect the work to even out (and the types of hacks described in that
>>>>> section to end up costing performance, but not in a way that would
>>>>> scale with the ACL processing load)
>>>>>
>>>>>>> one thought I had is that this could be locking on name lookups. how
>>>>>>> hard would it be to create a quick patch that would bypass the name
>>>>>>> lookups entirely and only do the lookups by IP.
>>>>>>
>>>>>> I did not realize your ACLs use DNS lookups. Squid internal DNS code
>>>>>> does not have any runtime SMP locks. However, the presence of DNS
>>>>>> lookups increases the number of suspects.
>>>>>
>>>>> they don't, everything in my test environment is by IP. But I've seen
>>>>> other software that still runs everything through name lookups, even
>>>>> if what's presented to the software (both in what's requested and in
>>>>> the ACLs) is all done by IPs. It's an easy way to bullet-proof the
>>>>> input (if it's a name it gets resolved, if it's an IP, the IP comes
>>>>> back as-is, and it works for IPv4 and IPv6, no need to have logic that
>>>>> looks at the value and tries to figure out if the user intended to
>>>>> type a name or an IP). I don't know how squid is working internally
>>>>> (it's a pretty large codebase, and I haven't tried to really dive into
>>>>> it) so I don't know if squid does this or not.
>>>>>
>>>>>> A patch you propose does not sound difficult to me, but since I cannot
>>>>>> contribute such a patch soon, it is probably better to test with ACLs
>>>>>> that do not require any DNS lookups instead.
>>>>>>
>>>>>>
>>>>>>> if that regains the speed and/or scalability it would point fingers
>>>>>>> fairly conclusively at the DNS components.
>>>>>>>
>>>>>>> this is the only thing that I can think of that should be shared
>>>>>>> between
>>>>>>> multiple workers processing ACLs
>>>>>>
>>>>>> but it is _not_ currently shared from Squid point of view.
>>>>>
>>>>> Ok, I was assuming from the description of things that there would be
>>>>> one DNS process that all the workers would be accessing. from the way
>>>>> it's described in the documentation it sounds as if it's already a
>>>>> separate process, so I was thinking that it was possible that if each
>>>>> ACL IP address is being put through a single DNS process, I could be
>>>>> running into contention on that process (and having to do name lookups
>>>>> for both IPv6 and then falling back to IPv4 would explain the severe
>>>>> performance hit far more than the difference between IPs being 128 bit
>>>>> values instead of 32 bit values)
>>>>>
>>>>> David Lang
>>>>>
>>>>>
>>>
>>>
>
>
Received on Wed May 04 2011 - 23:36:17 MDT
