Hello,
     During the last couple of weeks, we have spent a lot of time 
comparing Squid v3.0 and v3.2 performance under various conditions to 
understand why v3.2 is sometimes 3-10% slower than v3.0. This email 
shares our findings and suggests actions for addressing the problems.
We made several discoveries that will improve v3.2 performance, 
including one regression bug, but my overall conclusion is that most of 
the observed slowdown can be attributed to code reorganization and 
various new features added after v3.0.
I am referring to this phenomenon as "death by a thousand cuts" because 
these changes have negligible overhead in most locations. We had to use 
custom, low-level profiling to find the main culprits, but most of the 
effects of the new code in an isolated function or method are 
indistinguishable from noise. It is their combined effect that matters.
Here are a few cases where we were able to measure the performance 
penalty by rewriting/optimizing the code:
    no addrinfo in comm_local_port: 2.0%
    no addrinfo in comm_accept:     0.2%
    no NtoA in client_db: 0.2% for small number of clients
    no zeroOnPush for some MEMPROXY_CLASSes: 0.8%
The percentages above can be interpreted as "Squid became X% faster when 
the corresponding overheads of the new code were removed". These numbers 
are provided for illustration only; the exact values and the meaning of 
"faster" are not important here. What's important is that most of the 
isolated overheads are far _smaller_ than the above numbers, yet they add 
up to a measurable 3-10% performance degradation.
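To illustrate how such small gains combine, here is a minimal sketch that compounds the four measured improvements listed above. It assumes the gains are independent and multiply, which our measurements do not establish (the meaning of "faster" is deliberately loose here); the function name combined_gain is made up for this illustration.

```python
import math

# Speedups reported above, in percent, keyed by the change that produced them.
measured_gains = {
    "no addrinfo in comm_local_port": 2.0,
    "no addrinfo in comm_accept": 0.2,
    "no NtoA in client_db": 0.2,
    "no zeroOnPush for some MEMPROXY_CLASSes": 0.8,
}


def combined_gain(percent_gains):
    """Compound percentage gains, ASSUMING they are independent and multiply."""
    factor = math.prod(1.0 + gain / 100.0 for gain in percent_gains)
    return (factor - 1.0) * 100.0


total = combined_gain(measured_gains.values())
print(f"combined gain: {total:.2f}%")  # prints "combined gain: 3.23%"
```

Under that assumption, the four measurable cases alone would account for roughly the low end of the 3-10% range.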
Two changes stand out the most in this "death by a thousand cuts" 
category: asynchronous calls and Ip::Address. Both changes are 
necessary, but they add performance overheads we should be aware of.
I am not sure whether asynchronous calls can be significantly optimized. 
I have one idea that I am going to try, but if it does not work, then we 
will have to accept the performance price of this important API and 
optimize to compensate elsewhere. Few things can be worse than going 
back to spaghetti code!
As for Ip::Address, its implementation and use may need to be optimized, 
but I need your help to understand whether my suspicions are reasonable. 
I will send a separate email discussing IPv6-related overheads.
There are other overheads that we inherited from Squid v3.0 and Squid2. 
Ip::Address and async calls are special because, from the users' point of 
view, these overheads did not exist in the Squid version they are 
running now, so they want them gone.
It may be tempting to ignore these minor regressions and just fight 
major expenses such as excessive memory copying or slow parsing. The net 
result will be "better performance" anyway. However, I believe we have 
to do both because even if we start with bigger problems and eliminate 
them, the currently smaller problems will become relatively big. And 
fixes for ignored problems may become costlier with time.
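The "relatively big" argument is simple arithmetic. The sketch below uses purely hypothetical cost figures (not measurements) to show how the share of a small, untouched overhead grows once a major expense is removed; the name share_pct is made up for this illustration.

```python
def share_pct(part, total):
    """Return `part` as a percentage of `total`."""
    return 100.0 * part / total


# Hypothetical per-request costs in arbitrary units; NOT measured values.
major_expense = 40.0   # e.g., excessive memory copying
minor_overhead = 5.0   # e.g., the sum of many small "cuts"
everything_else = 55.0

before = share_pct(minor_overhead, major_expense + minor_overhead + everything_else)
after = share_pct(minor_overhead, minor_overhead + everything_else)

print(f"minor overhead share before: {before:.1f}%")  # prints 5.0%
print(f"minor overhead share after:  {after:.1f}%")   # prints 8.3%
```

The absolute cost of the small overhead does not change, but it becomes a larger fraction of what remains.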
We should also discuss whether it makes sense to start doing semi-regular 
and/or on-demand performance testing using "standardized" 
environment(s) and workload(s) so that performance regressions like the 
ones described above can be detected and dealt with earlier. It would be 
sad if we had to go through the same time-wasting exercise during the v3.3 
release.
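As a starting point for that discussion, here is a minimal sketch of what an automated check could look like. Everything in it is an assumption: the workload names, the requests-per-second samples, the 1% tolerance, and the function find_regressions are hypothetical and do not describe any existing Squid test tool.

```python
from statistics import mean


def find_regressions(baseline, current, tolerance_pct=1.0):
    """Return {workload: slowdown_pct} for workloads whose mean throughput
    dropped by more than tolerance_pct relative to the baseline."""
    regressions = {}
    for workload, base_samples in baseline.items():
        base = mean(base_samples)
        cur = mean(current[workload])
        slowdown_pct = 100.0 * (base - cur) / base
        if slowdown_pct > tolerance_pct:
            regressions[workload] = round(slowdown_pct, 2)
    return regressions


# Hypothetical requests/second samples from repeated runs of each workload.
baseline = {
    "small-objects": [10000, 10100, 9900],
    "large-objects": [2000, 2010, 1990],
}
current = {
    "small-objects": [9500, 9600, 9400],
    "large-objects": [1995, 2005, 2000],
}

regressions = find_regressions(baseline, current)
print(regressions)  # prints {'small-objects': 5.0}
```

A real setup would need more than a comparison of means (for example, enough runs to separate a 0.2% change from noise), but even a crude threshold like this would flag a regression of several percent soon after it is introduced.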
Thank you,
Alex.
Received on Thu Nov 18 2010 - 02:30:16 MST