On 09/19/2010 05:35 AM, Amos Jeffries wrote:
> Round 2 of only src/comm/ changes.
>
> I believe this accounts for all the bits requested to date. Including
> adding Subscriptions for ConnAcceptor.
* The patch is missing the new Subscription class. Based on your subject 
line, I am guessing you will post that separately.
* Comm::ConnOpener::start() should call connect() directly. Start() is 
already async, there is no need to pay the async price again. Moreover, 
by doing an async call, you are more likely to encounter timeouts even 
before we call connect() because the clock starts ticking before the 
call is fired.
* Comm::ConnOpener::timeout() should call connect() directly. Timeout() 
is already async, there is no need to pay the async price again.
* I do not like how Comm::ConnOpener::connect() calls itself. I assume 
you did not invent this, but the code will fail to produce the 
intended(?) delay if there are no other async calls already scheduled, 
which is something the class does not really control. Recall that async 
calls are AsyncCallQueue::fire()d until there are none left...
At the very least, please add a comment explaining why we use such a 
hack instead of either using an explicit loop or waiting a little bit 
via eventAdd.
* calls_.connect_ should not be reused in Comm::ConnOpener::connect() 
because AsyncCall objects may not support being fired twice -- they may 
have state. The connect_ object is not needed at all if you address the 
above comments. If you do not, set it to NULL at the start of 
Comm::ConnOpener::connect() and recreate when needed.
* I may be wrong, but the interesting "Is opened conn fully open?" 
comment below may reveal a couple of bugs:
> +    // recover what we can from the job
> +    if (conn_ != NULL && conn_->isOpen()) {
> +        // it never reached fully open, so abort the FD
> +        conn_->close();
> +        ...
Clearly, conn_ is open inside that if-statement (there is no such thing 
as half-open conn, I hope; we have enough headaches from half-closed 
ones!). There is nothing wrong with the connection. Instead, it is the 
ConnOpener job itself that got into a half-finished state here, most 
likely due to an exception after the connection got opened and before it 
was sent to the caller. To clean up, two separate actions are needed:
a) Close the opened connection. The beginning of your if-statement will 
do the job, but do not call doneConnecting(). Here is a polished version:
   if (conn_ != NULL && conn_->isOpen()) {
       Must(callback_ != NULL); // or there should be no conn_
       // we never sent this open connection to the caller, so close
       conn_->close();
       fd_table[conn_->fd].flags.open = 0;
       conn_ = NULL;
   }
b) Notify the caller we are dying:
   if (callback_ != NULL) {
       // do not let the caller wait forever
       doneConnecting(COMM_ERR_CONNECT, 0);
   }
I am not sure about the order. The above order means "caller can handle 
and should get nil params.conn". You can remove "conn_ = NULL" if 
"caller should get a non-nil but closed params.conn", but there will 
still be two if-statements.
Please check that I did not miss some other inter-state dependencies in 
the above sketch. You know this code much better than I am.
* Consider moving case COMM_OK code, including lookupLocalAddress() to 
something like Connection::opened(). The code seems to be manipulating 
conn_ without access to Comm::ConnOpener data except perhaps host_. I 
could have missed some other dependencies, of course, but it feels like 
a Connection business to set things up after the connection has been 
established. The ConnOpener job is done at that stage.
* Can we leave resetting fd_table[conn_->fd].flags.open to comm_close(), 
which will be called from via conn_->close()? It feels wrong to force it 
to zero way before the connection is actually closed but I understand 
that you may be fighting some hidden dependencies.
* Should we test _peer cbdata-validity before dereferencing it below?
> +        if (_peer)
> +            _peer->stats.conn_open--;
* The API below seems like a bad idea:
> +    /** retrieve the peer pointer for use.
> +     * The caller is responsible for all CBDATA operations regarding the
> +     * used of the pointer returned.
> +     */
> +    peer * const getPeer() const { return _peer; }
because it leads to bugs like that (missing cbdata check):
> +        if (conn_->getPeer())
> +            conn_->getPeer()->stats.conn_open++;
I suggest checking cbdata validity of _peer in getPeer(), returning NULL 
if the pointer is invalid, and using getPeer() even internally.
* Our Vector::shift is both slow and buggy. This is not your problem, I 
guess, but perhaps you can add an XXX comment to 
Comm::AcceptLimiter::kick() to replace Vector with an efficient queue type.
* Can you explain how "syncWithComm() side effects hit 
theCallSub->callback()"? Is not theCallSub->callback() itself constant 
and its result is stored in a local variable? I do not understand how 
syncWithComm() can affect theCallSub itself. Thank you.
* Why check calls_.earlyAbort_, calls_.timeout_, and connStart_ in 
ConnOpener::start(). Start is called only once and these checks seem to 
imply that those variables can be set somewhere else, before the start, 
even though they are private.
* Consider s/connStart_/connectStart_/ for clarity. Too many conn 
prefixes...
* Please use "const TextException &e" in
> +   } catch(TextException &e) {
* The code below does not need to declare temp outside its scope:
> +        ConnAcceptor *temp = deferred.shift();
> +        if (temp != NULL) {
* cbdataReferenceDone() already sets its parameter to NULL. No need to 
do that twice:
> +    /* clear any previous ptr */
> +    if (_peer) {
> +        cbdataReferenceDone(_peer);
> +        _peer = NULL;
> +    }
* s/COMM flags set on this connection/COMM flags; see COMM_ defines 
above/ or something of that kind.
* Consider s/_peer/peer_/ in Connection for consistency with most other 
code.
* s/unused//g. C++ does not require naming unused parameters.
* s/As soon as comm IO accepts/As soon as commSetSelect accepts/
just to be more precise, because many comm IO calls already accept 
AsyncCalls.
HTH,
Alex.
Received on Mon Sep 20 2010 - 22:01:04 MDT
This archive was generated by hypermail 2.2.0 : Wed Sep 22 2010 - 12:00:07 MDT