Re: [MERGE] c++-refactor HttpHdrCc

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Mon, 26 Sep 2011 16:57:12 -0600

On 09/14/2011 12:30 PM, Kinkie wrote:
> On Tue, Sep 13, 2011 at 6:11 PM, Alex Rousskov
> <rousskov_at_measurement-factory.com> wrote:
>> On 09/12/2011 05:38 PM, Kinkie wrote:
>>> Hi,
>>> the attached bundle refactors the HttpHdrCc into a c++ class,
>>> attempting also to clean the code up. Main changes:
>>> - 6 less global functions in protos.h
>>> - 1 less struct in structs.h
>>> - proper c++ construction/destruction semantics, standard MemPool integration
>>
>>> - name->id header parsing now done via std::map instead of iterating
>>> over list: complexity should go from O(n) to O(log n) with STL-derived
>>> libs.
>>
>> On the other hand, you now deallocate and allocate a new SquidString for
>> every CC directive just to find that directive ID so the total cost may
>> be not be lower. I think we can do much better, and even if we cannot or
>> will not, I do not think we should commit the "optimization" part of
>> this change until proper performance regression testing.
>
> Point taken. I've implemented a small helper class (basically a const
> char*+length) to support the lookup in a very cheap way. I'd like to
> get your feedback on it before trying to extend its use (it's
> currently an in-cc-file definition).

You are on the right track with this class. It can be useful for a lot
of things (we use similar classes in other projects). The new class
needs some polishing though:

> +/** dumb char* /length combo for quick lookups.
> + *
> + * Data is not copied.
> + * Validity of the pointed-to storage is responsibility of the caller.
> + * */

The class data is copied. Consider:

/** A char* plus length combination. Useful for temporary storing
 * and quickly looking up strings.
 *
 * The pointed-to string may not be null-terminated.
 *
 * Not meant for stand-alone storage. Validity of the
 * pointed-to string is responsibility of the caller.
 **/

The "may not be null-terminated" part can be removed, but it would be
useful to preserve it as it will allow us to use this class for many
other things. Please note that if you do preserve it, you would need to
fix comparison functions and add a stream output operator (among other,
minor changes).

> +class strblob {

The class name must be changed to follow Squid style and, hopefully have
more meaning. Consider "StringArea". You may want to move it to
src/base/StringArea.h.

> + const char *thePtr;
> + size_t theLen;

I would s/thePtr/theStart/ but it is your call.

> + bool operator==(strblob &s) const { return theLen==s.theLen && 0==strncmp(thePtr,s.thePtr,theLen); }

The parameter "s" should be declared const.

Value result specific before the computation putting avoid please:
    strcmp() == 0.

Please add != operator, implemented using the == operator.

Or you can remove the whole operator if it is not needed.

> + bool operator< ( const strblob &s2) const { return strncmp(thePtr,s2.thePtr,theLen) < 0; }

This will crash if s2.theLen is smaller than this->theLen. Needs more
len-specific checks. Note that for your current purposes, you can
declare shorter strings smaller even if they are not so in the dictionary.

This is one reason why such comparison may need to be moved outside the
object. Another reason is that we will probably want case-insensitive
comparison in other contexts.

Rename "s2" to "s" or similar.

> + const HttpHeaderCcFields *f=&CcAttrs[i];

"f" can be a [const] reference to avoid taking the address and then
dereferencing it all the time:

    const HttpHeaderCcFields &f = CcAttrs[i];

> + const strblob tmpstr(item,nlen);
> + const CcNameToIdMap_t::iterator i=CcNameToIdMap.find(tmpstr);

Consider s/tmpstr/lookup/ or some such. You can avoid this named object
altogether if you just create strblob when calling find(). The iterator
should be constant, I think. Here is a sketch:

    const CcNameToIdMap_t::const_iterator match =
        CcNameToIdMap.find(strblob(item, nlen));

> + cc->setMaxAge(MAX_AGE_UNSET);

So it is set or unset? If you do not want to add an explicit clear
MaxAge() method, how about s/MAX_AGE_UNSET/MAX_AGE_UNKNOWN/?

Similar for S_MAXAGE_UNSET, MIN_FRESH_UNSET, MAX_STALE_ALWAYS, etc. (see
the blob below for the full list).

> + static const int32_t MAX_AGE_UNSET=-1; //max-age is unset
> + static const int32_t S_MAXAGE_UNSET=-1; //s-maxage is unset
> + static const int32_t MAX_STALE_UNSET=-1; //max-stale is unset
> + static const int32_t MAX_STALE_ALWAYS=-2; //max-stale is set to no value
> + static const int32_t STALE_IF_ERROR_UNSET=-1; //stale_if_error is unset
> + static const int32_t MIN_FRESH_UNSET=-1; //min_fresh is unset

The old types were "int". You are making them more specific now with
"int32_t". I am not sure such a change is withing this patch scope, but
even if it is, perhaps you should add a HttpCcDirective typedef or
similar to avoid spreading int32_t throughout the code.

Can the above constants be converted into an enum?

> + EBIT_CLR(mask, CC_MIN_FRESH);
> + this->min_fresh=STALE_IF_ERROR_UNSET;

Consistency typo.

> +public:
...
> + int32_t mask;
> +private:
> + int32_t max_age;
> + int32_t s_maxage;
...

Seems wrong for the mask to remain public if we hid most of the values.

> + int32_t mask;
> +private:
> + int32_t max_age;
> + int32_t s_maxage;
> + int32_t max_stale;
> + int32_t stale_if_error;
> + int32_t min_fresh;
> +public:
> + String other;

It would be nice to document at least the mask and Other fields.

> + bool gotList;

Please declare when first use and add const if possible.

> -## acl needs wordlist. wordlist needs MemBug
> +## acl needs wordlist. wordlist needs MemBuf

An unrelated change?

> - if (cc && cc->min_fresh > 0) {
> + const int32_t minFresh=cc->getMinFresh();
> + if (cc && minFresh!=HttpHdrCc::MIN_FRESH_UNSET) {

This change smells like a nil pointer dereference to me, but perhaps the
missing context makes it OK?

> + if (cc && minFresh!=HttpHdrCc::MIN_FRESH_UNSET) {
...
> + entry->mem_obj->getReply()->cache_control->getStaleIfError() != HttpHdrCc::STALE_IF_ERROR_UNSET &&
...
> + if (cc->getMaxAge() != HttpHdrCc::MAX_AGE_UNSET) {

These and other similar comparisons now look even worse than they were.
How about adding HttpHdrCc::hasMaxAge() and such?

>> And if you want to mix optimization with cleanup, here is one place
>> where you can save quite a few cycles for many messages by not ignoring
>> the return value of getList():
>>
>>> getList(HDR_CACHE_CONTROL, &s);
>>>
>>> - cc = httpHdrCcParseCreate(&s);
>>> + cc=new HttpHdrCc();
>>> + if (!cc->parseInit(s)) {
>>> + delete cc;
>>> + cc = NULL;
>>> + }

> Done.

> + bool gotList;
>
> if (!CBIT_TEST(mask, HDR_CACHE_CONTROL))
> return NULL;
> PROF_start(HttpHeader_getCc);
>
> - getList(HDR_CACHE_CONTROL, &s);
> -
> - cc = httpHdrCcParseCreate(&s);
> + gotList=getList(HDR_CACHE_CONTROL, &s);
> +
> + cc=new HttpHdrCc();
> + if (!gotList)
> + return cc;
> +
> + if (!cc->parse(s)) {
> + delete cc;
> + cc = NULL;
> + }

Hmm.. I did not see that we already have a HDR_CACHE_CONTROL test above
that code. The new code now optimizes a rare case of an empty but
present Cache-Control header. This can only lead to bugs, with no
positive performance changes. We should remove this exception. Sorry.

(One more reason to provide more than 3 lines of context during review!)

>>> -HttpHeaderFieldInfo *CcFieldsInfo = NULL;
>>> +/// Map an header name to its type, to expedite parsing
>>> +typedef std::map<String,http_hdr_cc_type> HdrCcNameToIdMap_t;
>>> +static HdrCcNameToIdMap_t HdrCcNameToIdMap;
>>
>> HdrCcNameToIdMap should be constant so that we do not accidentally
>> modify it. You can make it a pointer and use similar-to-current
>> initialization code or try to initialize statically.
>
> Are you sure we run such a risk? I've now changed it so that it is a
> static data member with const key. Do you think it could be enough?

I was not worried about somebody modifying the key specifically. I was
more worried about somebody modifying the map (adding, removing, or
changing its elements). Thus, a constant key does not help much. This is
not a big deal though.

>>> +class HttpHdrCc
>>> +{
>>> +
>>> +public:
>>> + int32_t mask;
>>> + int32_t max_age;
>>> + int32_t s_maxage;
>>> + int32_t max_stale;
>>> + int32_t stale_if_error;
>>> + int32_t min_fresh;
>>> + String other;
>>> +
>>> + HttpHdrCc(int32_t max_age_=-1, int32_t s_maxage_=-1,
>>
>> Please move public data member after public methods. It would be nice to
>> document them too, especially those that do not correspond to CC
>> directives. Also, check whether we really need all of the above,
>> especially mask, as public fields (more on that later below).
>
> Done. Documented "mask" and "other", the others should be straightforward.
> public mask could be avoided if we do full getters/setters, but doing
> that elegantly would be scope-balooning.

Not done? And you did "balloon the scope" by adding full
getters/setters, removing your excuse to leave the mask exposed, right?

>>> + HttpHdrCc(int32_t max_age_=-1, int32_t s_maxage_=-1,
>>> + int32_t max_stale_=-1, int32_t min_fresh_=-1) :
>>
>> Do we really all five of these constructors, including the one that
>> allows for implicit conversion from int32_t to HttpHdrCc? I do not think
>> so. Please leave just what we need and use "explicit" to prevent
>> implicit conversions as needed.
>
> Overengineering. Changed to only the default contstructor (we need it
> to set various fields to "-1" by default).
> Ok for explicit.

An explicit keyword does not make sense for a default constructor, does
it? Not sure why it compiles for you, but I would recommend removing
that keyword because your current code does not allow implicit type
conversions.

>>> + /** reset the structure to a clear state.
>>> + *
>>> + */
>>> + void clear();
>>
>> The "clear state" is an undefined concept. Consider a more specific
>> description such as:
>> /// reset to the after-default-construction state
>>
>> (and there is probably a more elegant way of saying that).
>
> How about "reset all data members to default state"? Otherwise I'll
> use your suggestion.

I can leave with either one, but "reset to default state" is probably best.

BTW, the latest patch lost documentation for this (and some other?) members.

HTH,

Alex.
Received on Mon Sep 26 2011 - 22:58:09 MDT

This archive was generated by hypermail 2.2.0 : Tue Sep 27 2011 - 12:00:03 MDT