On 20 Mar 2002, at 23:03, Henrik Nordstrom <hno@squid-cache.org> wrote:
> On Wednesday 20 March 2002 17:25, Andres Kroonmaa wrote:
> >
> > I know from experience that tcp_time_wait_interval and
>
> the TIME_WAIT is quite well optimized in Linux to batch all the
> sockets together in larger chunks when under load, thus avoiding
> excessive time checks. Though Solaris has done something similar by now...
> ...
> Basically the only thing a TIME_WAIT socket consumes while in
> TIME_WAIT is the memory for the socket. If you receive a lot of stray
> packets then TCP processing of those stray packets may also inflict a
> small penalty from having to look thru a few sockets before finding
> the correct one (or none), but you need to have really a lot of
> TIME_WAIT sockets and a lot of stray packets for this to be of
> significance.
Not only that. A socket in TIME_WAIT holds its ephemeral port and keeps
it from being reused (and the related file FD is not fully released,
imho, but I may be wrong here), and this TW table must be scanned every
time a socket() call is made.
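To see how small that ephemeral pool actually is, the Solaris
anonymous port range can be read with ndd (just a sketch; the defaults
may differ between releases):
ndd -get /dev/tcp tcp_smallest_anon_port
ndd -get /dev/tcp tcp_largest_anon_port
Iirc a stock box reports 32768 and 65535, i.e. about 32K usable ports.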
> Because of this (and politics regarding what the user should be
> allowed to tune) the length of TIME_WAIT cannot be tuned in Linux. It
> is fixed at 60 seconds, or slightly more when under load due to the
> infrequent flushing of TW buckets.
Every socket that is actively closed goes into TIME_WAIT. The Solaris
default is 240 seconds (as per RFC 1122, 4.2.2.13), which is way too
high. Under high load, say 200 tps with each connection lasting under
0.5 seconds, you flood all your ephemeral ports into TIME_WAIT if you
don't lower the timer. The time to find a free ephemeral port is what
hits performance, not so much the background tasks. Besides, if I
understand correctly, Solaris tries to do that work at the time a
socket creation/close is requested (to reduce unaccounted background
kernel CPU time?), and this also impacts the application (socket ops
become slower).
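For reference, lowering that timer on Solaris is a one-liner with ndd
(a sketch; the value is in milliseconds, and on older releases the
parameter was called tcp_close_wait_interval):
ndd -set /dev/tcp tcp_time_wait_interval 60000
That brings it down to the same 60 seconds Linux uses.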
> Under low load (less than 2^5 TW sockets) each TIME_WAIT period is
> timed individually.
2^5 is a negligible load. Given the mean session time and the
TIME_WAIT timer, you can estimate the typical number of sockets in
TIME_WAIT state from the number of open sockets. For example, given a
10-second average session time and a 60-second timer, you'd expect to
have 6x as many sockets in TIME_WAIT as you have open TCP sessions. If
you do 200 tps, this ratio will be way higher. On Solaris, you start
to worry only when the number of sockets in TIME_WAIT is in the
thousands.
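To put rough numbers on it (my arithmetic, assuming the ~32K
anonymous port range mentioned above): at 200 tps with the 240-second
default, steady state is roughly 200 * 240 = 48000 sockets in
TIME_WAIT, which already exceeds the whole ephemeral range. With the
timer lowered to 60 seconds the same load leaves roughly
200 * 60 = 12000, which still fits.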
I don't know how Solaris optimises the issue, but I suspect they
don't do the bucket stuff.
There are valid tricks to reuse sockets in TW state, but afaik this
depends on the number of IP peers that communicate. Thus a test
between a few client hosts and servers is very different from a test
between zillions of clients/servers.
But the TCP hash table tunable of the kernel can have a direct
relation to the matter; in fact Sun is said to raise it to 256K during
web performance tests, so my suggestion of 8K was very conservative.
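For completeness, that hash size is set at boot time in /etc/system
(a sketch; I believe the value gets rounded to a power of two, and the
default varies by Solaris release):
set tcp:tcp_conn_hash_size = 262144
A reboot is needed for the change to take effect.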
On Solaris, I sometimes check with:
netstat -na | nawk '{print $NF}' | sort | uniq -c | sort -n
and make sure that the number of sockets in the *_WAIT states is not
higher than the number of sockets in the ESTABLISHED state.
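The same trick should work on Linux with plain awk, restricted to TCP
so that unix-domain sockets don't pollute the counts (untested off the
top of my head):
netstat -nat | awk '{print $NF}' | sort | uniq -c | sort -n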
> > tcp_fin_wait_2_flush_interval impose performance hit on a
> > Solaris system. I think on Linux too.
>
> What exactly is it?
The best (to my knowledge) starting point for Solaris internals
tuning is:
http://www.sean.de/Solaris/tune.html#tcp_close_wait_interval
tcp_fin_wait_2_flush_interval
This value seems to describe the (BSD) timer interval which prevents a
connection from staying in the FIN_WAIT_2 state forever. FIN_WAIT_2 is
reached if a connection closes actively: the FIN is acknowledged, but
the FIN from the passive side hasn't arrived yet - and maybe never
will.
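On Solaris it can be inspected, and shortened if needed, with ndd as
well (a sketch; the value is in milliseconds and 67500 is only an
example, I haven't measured the effect myself):
ndd -get /dev/tcp tcp_fin_wait_2_flush_interval
ndd -set /dev/tcp tcp_fin_wait_2_flush_interval 67500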
> Linux seems to share FIN_WAIT2 processing with TIME_WAIT, using the
> process described above. Only the length of FIN_WAIT can be tuned.
>
> > build up thousands of sockets in timewait. Now Linux afaik
> > handles this automatically by dropping oldest timewait sockets
> > without a notice to keep their count decent, breaching standard,
>
> Not that I know of.
Well, RFC 1122 requires TIME_WAIT to last 2*MSL = 240 seconds, with
MSL set at 2 minutes. As you said, Linux fixes it at 60 seconds, which
is more useful but ignores the RFC. This has an immediate impact on
performance when comparing Linux to Solaris.
But I may indeed be misinformed about the Linux shortcut of dropping
TW sockets without notice; I can't recall where I got that from.
> The only similar thing I have found in Linux is that it immediately
> drops TIME_WAIT sockets if above the allowable queue length,
> complaining loudly about the fact in the process. The TW queue limit
> defaults to 180K sockets on my 320MB laptop; more can be allowed at
> runtime if needed.
Btw, what does the tcp_tw_recycle tunable actually do on Linux?
tcp_max_orphans should imho also be related to the same matter.
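The relevant Linux knobs should all be readable with sysctl on a 2.4
kernel, something like this (a sketch, names from memory;
tcp_max_tw_buckets is presumably the 180K queue limit you mention):
sysctl net.ipv4.tcp_fin_timeout net.ipv4.tcp_tw_recycle \
       net.ipv4.tcp_max_tw_buckets net.ipv4.tcp_max_orphans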
> Linux also drops TIME_WAIT sockets before the period has expired in
> response to matching RST packets.
I guess you mean FIN_WAIT2 here?
> All of the above is based on reading the Linux-2.4 stack and my
> experiense from using Linux.
You are surely better informed, so if you notice an error of mine,
don't let it pass ;)
------------------------------------
Andres Kroonmaa <andre@online.ee>
CTO, Microlink Online
Tel: 6501 731, Fax: 6501 725
Pärnu mnt. 158, Tallinn,
11317 Estonia