On Thu, Jan 30, 2014 at 12:39 AM, Alex Rousskov
<rousskov_at_measurement-factory.com> wrote:
> On 01/29/2014 02:32 PM, Kinkie wrote:
>> On Wed, Jan 29, 2014 at 7:52 PM, Alex Rousskov wrote:
>>> On 01/29/2014 07:08 AM, Kinkie wrote:
>
>> (in a trunk checkout)
>> bzr diff -r lp:/squid/squid/vector-to-stdvector
>>
>> The resulting diff is reversed, but that should be easy enough to manage.
>
> Thanks! Not sure whether reversing a diff in my head is easier than
> downloading the branch :-(.
Maybe using qdiff (from the qbzr plugin) would help by presenting the
diff in a graphical format?
Apart from that, I don't know.
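For reference, bzr can also produce the diff in the conventional
direction if both trees are named explicitly; a minimal sketch,
assuming the current directory is a trunk checkout:

  bzr diff --old . --new lp:/squid/squid/vector-to-stdvector

Here --old is the base (trunk) and --new is the branch, so the
branch's additions show up as additions in the output.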
>>> Can you give any specific examples of the code change that you would
>>> attribute to a loss of performance when using std::vector? I did not
>>> notice any obvious cases, but I did not look closely.
>
>
>> I suspect that it's all those lines doing vector.items[accessor] and
>> thus using C-style unchecked accesses.
>
> std::vector[] element access operator does not check anything either, as
> we discussed earlier. IIRC, you [correctly] used a few std::vector::at()
> calls as well, but I did not see any in a critical performance path.
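To make the distinction concrete, here is a minimal, self-contained
sketch (illustrative only, not Squid code) of the two access styles:

  #include <iostream>
  #include <stdexcept>
  #include <vector>

  int main() {
      std::vector<int> v{1, 2, 3};

      // operator[] does no bounds checking: an out-of-range index is
      // undefined behavior, exactly like a raw C array access.
      const int fast = v[1];

      // at() validates the index and throws std::out_of_range on
      // failure, paying for the check with a branch on every access.
      try {
          const int checked = v.at(10); // index 10 is past the end
          (void)checked;
      } catch (const std::out_of_range &e) {
          std::cerr << "out of range: " << e.what() << "\n";
      }
      return fast == 2 ? 0 : 1;
  }

So as long as the hot paths use operator[], std::vector access should
cost the same as the old Vector's unchecked access.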
>>>> test conditions:
>>>> - done on rs-ubuntu-saucy-perf (4-core VM, 4 GB RAM)
>>>> - testing with ab: 1m requests @ 10 parallelism with keepalive,
>>>> stressing the TCP_MEM_HIT code path on a cold cache
>>>> - run on a multicore VM with the default out-of-the-box
>>>> configuration; ab running on the same hardware over the loopback
>>>> interface
>>>> - immediately after ab exits, collect counters (mgr:counters)
>>>>
>>>> numbers (for trunk / stdvector)
>>>> - mean response time: 1.032/1.060ms
>>>> - cpu_time: 102.878167/106.013725
>
>
>> If you can suggest a more thorough set of commands using the rough
>> tools I have, I'll gladly run them.
>
> How about repeating the same pair of tests a few times, in random order?
> Do you get consistent results?
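For the record, that methodology maps onto an ab invocation along
these lines; a sketch where the proxy port and origin URL are
assumptions, not from this thread:

  # 1m requests, 10 parallel clients, keep-alive, through the proxy
  # on loopback; the URL must point at a cacheable object so repeat
  # fetches are served from memory (TCP_MEM_HIT)
  ab -X 127.0.0.1:3128 -n 1000000 -c 10 -k http://example.com/object
  # immediately afterwards, collect the counters:
  squidclient mgr:counters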
OK, here are some numbers (same testing methodology as before):
Trunk: mean RPS (CPU time)
10029.11 (996.661872)
9786.60 (1021.695007)
10116.93 (988.395665)
9958.71 (1004.039956)
stdvector: mean RPS (CPU time)
9732.57 (1027.426563)
10388.38 (962.418333)
10332.17 (967.824790)
Some other insights I got by varying parameters:
By raw RPS, performance seems to vary with the number of parallel
clients in this order (best to worst):
100 > 10 > 500 > 1
For fun, I also tried stripping configuration that is not strictly
needed (e.g. logging, access control) and running 3 workers, leaving
the fourth core to ab: the best result was 39900.52 RPS. Useless, but
impressive :)
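In squid.conf terms that stripped-down SMP setup would look roughly
like this; a sketch with assumed directives, not the actual test
configuration:

  # benchmark-only settings: 3 SMP workers, leaving one core for ab
  workers 3
  # drop access logging and replace the default ruleset with a
  # wide-open ACL
  access_log none
  http_access allow all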
-- 
/kinkie