Alex Rousskov <rousskov_at_measurement-factory.com> writes:
> On 01/13/2013 03:10 PM, Rainer Weikusat wrote:
>> Assuming that a client attempted to contact an HTTPS server which is
>> actually a 'port 443 blackhole' (meaning that attempts to connect to
>> the corresponding address and port 443 don't result in any kind of
>> reply) and this request was intercepted by a squid configured to do
>> 'server first' SSL bumping, the timeout squid enforces for the
>> asynchronous connect requests ultimately triggers an assert in
>> forward.cc. This happens because the ConnOpener::timeout method calls
>> ConnOpener::connect which, in turn, calls comm_connect_addr to
>> determine the status of the connection attempt. This routine uses
>> getsockopt/SOL_SOCKET/SO_ERROR to determine whether the connect
>> succeeded. Because nothing was received from the remote endpoint, at
>> least on Linux, the result will be 'no error', which means that a
>> 'false positive' 'connection successfully established' situation
>> occurs.
>
> Hi Rainer,
>
> Nice analysis, thank you! Have you seen the discussion about
> ConnOpener problems in the squid-dev thread called "ICAP connections
> under heavy loads"? (If you have not, please review -- it is mostly not
> about ICAP). I suspect the comprehensive solution sketched out there
> solves this problem as well.
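To illustrate the pitfall in the quoted analysis: after a non-blocking connect() returns EINPROGRESS, SO_ERROR stays 0 while the handshake is still pending (at least on Linux, as noted above), so reading 0 at timeout time does not mean "connected". The following is a minimal C sketch, assuming a POSIX system and a socket already set to O_NONBLOCK. The names `connect_with_timeout` and `conn_result` are hypothetical and are not Squid's ConnOpener or comm_connect_addr API. The sketch consults SO_ERROR only after poll() reports the socket ready and treats a poll() timeout as a timeout in its own right.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>

/* Hypothetical result codes for this sketch (not Squid's own types). */
typedef enum { CONN_OK, CONN_FAILED, CONN_TIMEOUT } conn_result;

/*
 * Connect the non-blocking socket fd to addr, waiting at most timeout_ms.
 * SO_ERROR is read only after poll() reports the socket writable (or in
 * error): while the handshake is still pending, SO_ERROR is simply 0.
 * On CONN_FAILED, *err holds the errno-style reason.
 */
static conn_result connect_with_timeout(int fd, const struct sockaddr *addr,
                                        socklen_t addrlen, int timeout_ms,
                                        int *err)
{
    *err = 0;
    if (connect(fd, addr, addrlen) == 0)
        return CONN_OK;                 /* completed immediately */
    if (errno != EINPROGRESS) {
        *err = errno;                   /* e.g. ECONNREFUSED, ENETUNREACH */
        return CONN_FAILED;
    }

    struct pollfd pfd = { .fd = fd, .events = POLLOUT };
    int n = poll(&pfd, 1, timeout_ms);
    if (n < 0) {                        /* poll() itself failed; EINTR is not retried here */
        *err = errno;
        return CONN_FAILED;
    }
    if (n == 0)
        return CONN_TIMEOUT;            /* still pending: SO_ERROR would read 0, which is NOT success */

    int soerr = 0;
    socklen_t len = sizeof soerr;
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &soerr, &len) < 0) {
        *err = errno;
        return CONN_FAILED;
    }
    if (soerr != 0) {                   /* handshake finished with an error */
        *err = soerr;
        return CONN_FAILED;
    }
    return CONN_OK;                     /* ready and no pending error: connected */
}
```

Keeping the "did it time out?" decision (poll() returning 0) separate from the "did it succeed?" decision (SO_ERROR after readiness) avoids the false positive. It does not remove the inherent race: a connection may still complete just after the timeout fires.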
Well, fixing the timeout handling because of another problem it also
causes would fix this problem as well (and another nascent one I happen
to be aware of: the only reason this doesn't bomb out for plain HTTP,
too, is that the client side times out first). But I've read through
the discussion, and I agree with the opinion that the 'squid vs tcp'
race is a moot issue. Because the connect is asynchronous, it may
succeed at any time after connect was called, so adding a single,
additional check for 'did it succeed in the meantime' is not going to
solve the problem: the connection could succeed one microsecond after
that check. Any timeout shorter than the connection establishment
timeout enforced by the kernel will occasionally cause a spurious
connection failure. Actually, even the kernel timeout will occasionally
cause one, because the SYN-ACK could arrive immediately after the kernel
has decided to give up, aka "the internet isn't reliable". I also agree
with the other opinion that the existing timeout handling code is
heavily contorted and that the connect code should deal with
connections and the timeout code with timeouts, especially considering
that the 'check for timeout' in connect is also done in checkTimeouts
(comm.cc) and that the connect check, in its present form, will
conclude 'no timeout' after the checkTimeouts code has already
concluded 'timeout'.
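A one-shot "did it succeed in the meantime?" check of the kind discussed above can at least avoid the SO_ERROR false positive by also asking for the peer address: getpeername() fails with ENOTCONN while the handshake is still pending. The sketch below assumes a POSIX system; `connect_status` is a hypothetical helper, not part of Squid. Its answer is only a snapshot: as argued above, the connection may complete a microsecond later, so such a check cannot close the race.

```c
#include <errno.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: one-shot snapshot of a (non-blocking) connect.
 * Returns 1 if connected, 0 if not (yet) connected, -1 on failure (*err set).
 * SO_ERROR == 0 alone is ambiguous; getpeername() failing with ENOTCONN
 * distinguishes "no peer yet" from "established".
 */
static int connect_status(int fd, int *err)
{
    int soerr = 0;
    socklen_t len = sizeof soerr;

    *err = 0;
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &soerr, &len) < 0) {
        *err = errno;
        return -1;
    }
    if (soerr != 0) {                   /* note: reading SO_ERROR also clears it */
        *err = soerr;
        return -1;
    }

    struct sockaddr_storage peer;
    socklen_t plen = sizeof peer;
    if (getpeername(fd, (struct sockaddr *)&peer, &plen) == 0)
        return 1;                       /* a peer exists: connection established */
    if (errno == ENOTCONN)
        return 0;                       /* no error recorded, but no peer either */
    *err = errno;
    return -1;
}
```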
In any case, I need a working solution for this now because my
employer uses 3.3 for at least one customer. Since I really don't like
the hack I did yesterday, I've coded what I consider to be a sensible
approach to dealing with this issue instead. Because my boss also
requested that I make this available to the project, I'm going to
send a third e-mail with the second version of the patch.
Received on Mon Jan 14 2013 - 19:59:07 MST