Re: [PATCH] SBuf c-string comparisons from Amos Jeffries on 2014-04-18 (squid-dev)

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Sat, 19 Apr 2014 06:12:10 +1200

On 18/04/2014 6:54 a.m., Alex Rousskov wrote:
> On 04/17/2014 04:43 AM, Amos Jeffries wrote:
>> On 17/04/2014 7:23 a.m., Alex Rousskov wrote:
>>> Any reasonable/popular implementation selected by the developer. This is
>>> a one-time check done by the developer, not an automated check done
>>> during Squid build. Sorry I was not clear about that.
>
>> So glibc: a do-while loop scanning word-by-word with individual
>> byte-by-byte loop (unrolled) over the bytes in each word.
>
> Yeah (if by "word" you mean "4 characters"). The important part here is
> that they _are_ trying to optimize the scan using low level tricks such
> as partial loop unrolling. That makes the cloning approach more
> attractive than if they did not, but I am not sure it makes it
> attractive enough. I am still OK with the hand-rolled loops direction,
> as your latest patch suggests.

Yes I did mean that by "word", as in the 32-bit/4-byte CPU or memory
'word' they seem to be optimizing for.

I opted for 'hand rolling' by copying the simple(r) strncasecmp() code
loop for both our cases instead of the strncmp() optimized code.

>
>> The cloning mechanism uses strlen() internally. So no benefit, but extra
>> malloc+free costs.
>
> AFAICT, cloning should not call strlen(). We are cloning the SBuf object
> ("this") which has a known length. We are not cloning the c-string. The
> only purpose of cloning is null termination of "this" -- SBuf::c_str()
> is not a constant method. Terminating "this" is enough to use the
> standard strncmp().
>

The c_str() part of the cloning is where the allocation cycle is added
to provide the guarantee that terminator and resulting (ab)uses of the
c_str() value are okay. Which is really worse than strlen() IMO.

>
>> Patch with hand-rolled scanner attached.
>
>
>> + // 0-length comparison is always true regardless of buffer states
>> + if (!n || (!length() && *s=='\0')) {
>> + ++stats.compareFast;
>> + return 0;
>> + }
>> +
>> + // N-length compare MUST provide a non-NULL C-string pointer
>> + assert(s);
>
> The assertion appears too late to save us from dereferencing NULL s in
> the if-statement code above. I suggest carefully removing that overly
> complicated if-statement (adjusting the rest of the code accordingly).
>
> assert(!n || s);
> ... adjusted loops go here ...
>
> I do not insist on removing the if-statement, but if you keep it, please
> adjust it so that it asserts instead of dereferencing NULL s from broken
> callers.

That "" vs "" case you had me throw in fails the unit tests if the check
for 0-length strings. Good thing you did. :-)

I can move the check below the assert so it becomes
(EXHIBIT A):

    // 0-length comparison is always true regardless of buffer states
    if (!n) {
        ++stats.compareFast;
        return 0;
    }

// N-length compare MUST provide a non-NULL C-string pointer
assert(s);

    // when this is a 0-length string, no need for any complexity.
    if (!length()) {
        ++stats.compareFast;
        return '\0' - *s;
    }

This will also catch the all compares when the SBuf is empty string
regardess of the length of s. Which was the "left" overread bug you
mentioned below.

>
>> + size_type byteCount = min(length(), n);
>
> Using npos in min() makes me uncomfortable but should work because
> size_type is unsigned. If you keep this code, please add a comment. For
> example:
>
> // n may be npos, but we treat that as a huge positive value
>

Done.

>
>> + while ((rv = *left - *right++) == 0) {
>
> "left" may point beyond allocated memory if length() is zero but *s is not.
>

I was under the impression SBuf can never have a NULL buffer. But
anyway, this is resolved by the above change preventing this whole area
of code being reached.

>
>> + if (length() < n)
>> + return '\0' - *right;
>
> "right" might also point beyond allocated memory by now -- it was
> unconditionally incremented in the loops and I do not think the non-zero
> byteCount check protects us from reaching this code in some cases.
>

1) The only case which can reach this dereference is when byteCount
reached 0.

2) "left" was also unconditionally incremented in the if-statement
condition before breaking out of the loop at the point byteCount reached 0.

3) we have not seen *right=='\0' yet. Because we only just reached EOS
on left OR byteCount would not have been decremented to 0 yet.

So we are safe in testing *right. Assuming that it is either
0-terminated or n is smaller than its endpoint. If those are not true we
will die in same way system strn*() do and even strlen() will not help
us with that one.

>
>> + size_type byteCount = min(length(), n);
> ...
>> + while ((rv = *left - *right++) == 0) {
>> + if (*left++ == '\0' || --byteCount == 0)
>> + break;
>
> byteCount may underflow here (become huge) if it starts as zero because
> length() is zero.

Solved by the earlier !length() check.

>
>
>> + while ((rv = *left - *right++) == 0) {
>> + if (*left++ == '\0' || --byteCount == 0)
>> + break;
>
> This loop in conjunction with mind boggling after-loop checks is just
> too smart for the human brains IMHO. This level of complexity creates a
> constant trickle of bugs. Please consider a more straightforward
> approach that is nearly as efficient. Here is a sketch:
>

I'm sorry they boggle your mind. I have added comments (below) to
clarify what/why each exists.

> {
> assert(!n || s);
> ...
> const char *left = buf();
> leftCount = length();
> const char *right = s; // or just rename s to right?
> // n may be npos, but we treat that as a huge positive value
> while (n) {
> const c1 = leftCount-- ? *left++ : '\0';
> const c2 = *right++;
> if (!c1 || c1 != c2) // covers !c2 as well
> return c1 - c2;
> n--;
> }
> // all n characters were identical (and none was \0)
> return 0;
> }
>
> The above requires some polishing (such as adding c1/2 types and a
> case-insensitive loop) but, AFAICT, the only added inefficiency here is
> the leftCount--? expression. I think it is worth the gain of much easier
> to understand code.
>
> However, I cannot insist that you switch to this simpler approach. If
> you prefer to keep fixing the more complex code, I will do my best to
> find the time to keep reviewing it.

I see you are suggesting the FreeBSD libc strncmp() method
http://fxr.watson.org/fxr/source/string/strncmp.c?v=FREEBSD6-LIBC

I was following the GNU libc strncase() method
http://www.ic.unicamp.br/~islene/2s2008-mo806/libc/string/strncase.c

> while ((result = TOLOWER (*p1) - TOLOWER (*p2++)) == 0)
> if (*p1++ == '\0' || --n == 0)
> break;
>
> return result;

The only new part is the if-statement(s) after looping. I have combined
them and added comments to clarify.

They/it are equivalent with your (leftCount-- ? *left++ : '\0'), but the
extra test is run only once per call instead of once per character.

I have merged the two into one with documentation to clarify
(EXHIBIT B):

    // If we stopped scanning because we reached the end of buf()
    // pretend we have a 0-terminator there to compare.
    // NP: the loop already incremented "right" ready for this
    if (!byteCount && length() < n)
        return '\0' - *right;

    // If we found a difference within the scan area,
    // or we found a '\0',
    // or all n characters were identical (and none was \0).
    return rv;

If you are okay with the new bits marked (EXHIBIT A) and (EXHIBIT B) I
think we are done. These have already been checked against all unit
tests and passed.

Amos
Received on Fri Apr 18 2014 - 18:12:24 MDT

This archive was generated by hypermail 2.2.0 : Sat Apr 19 2014 - 12:00:13 MDT