On 03/02/2011 03:00 AM, Amos Jeffries wrote:
> On Tue, 01 Mar 2011 20:29:51 +0200, Tsantilas Christos wrote:
>> This patch is forgotten too.
>> Any comments?
>>
>
> Just doing a quick audit now...
>
>......
> "
>
> src/client_side.cc:
> - the logic is wrong
> - in setLogUri() you do not need to call stringHasCntl(uri). The
> unescape call will do a one-pass scan and check these as it goes.
> also an old bug is now visible, stringHasCntl() does not check the full
> set of characters that need encoding :(
Well, does not check for  rfc1738_unsafe_chars, is not difficult to fix 
it. I can include a fix in this patch if required.
>
> With that fixed its clear that stringHasWhitespace() needs to be
> performed *before* any escaping and duplicating is done. Some shuffling
> can avoid doing that scan pass entirely on most config setups.
This is true, the rfc1738_escape_unscaped, escapes the whitespace so 
user whitespace policy are not applied.
Your patch is not correct too, because the uris must always escaped if 
they contain non printable characters, before logged. Your code only 
take cares for whitespaces...
A better solution requires 2 strdup/malloc if there are both whitespaces 
and ctl characters in the uri. I do not know how often are URIs with 
whitespaces and ctl characters but I do not like ...
The problem was to find a way to handle the cases you have unescaped 
URIs with whitespaces which causes problem on log analysis.
In my original patch if there are cntl characters escape them and escape 
the whitespaces too. In the case there are not cntl characters apply 
user defined whitespace policy....
Yes the above is not the best, but I think it is fair for most cases.
The overall source code which is related with http client request 
parsing is a little confused, and I believe that we are spending a lot 
of CPU cycles checking and parsing again and again the same strings.
My opinion is that a better solution is to rewrite or add a new 
rfc1738_escape_unescaped library function as follows:
rfc1738_escape_unescaped_uri(const char *uri,
        const char *escaped_uri, size_t escaped_uri_len,
        int whitespace_policy = URI_WHITESPACE_ENCODE)
uri: the original URI
escaped_uri: the buffer to copy the URI
escaped_uri_len: the length of the buffer
whitespace_policy: method to handle the whitespaces 
(URI_WHITESPACE_ALLOW, URI_WHITESPACE_ENCODE etc.... )
The above will save at list 2 full string passes (stringHasWhitespace() 
and stringHasCntl() functions) for every URI to be logged and in worst 
case 4 string passes and 2 strdup/mallocs
Regards,
     Christos
>
> What I think is needed is:
>
> if (!cleanUrl)
> ...
> else {
> switch (Config.uri_whitespace) {
> case URI_WHITESPACE_ALLOW:
> http->log_uri = xstrndup(uri, MAX_URL);
> break;
>
> case URI_WHITESPACE_ENCODE:
> http->log_uri = xstrndup(rfc1738_escape_unescaped(uri), MAX_URL);
> break;
>
> case URI_WHITESPACE_CHOP:
> http->log_uri = xstrndup(uri, min(MAX_URL,strcspn(http->log_uri,
> w_space)));
> break;
>
> case URI_WHITESPACE_DENY:
> case URI_WHITESPACE_STRIP:
> default:
> {
> char *q = static_cast<char*>(xmalloc(strlen(uri)*3 + 1));
> const char *t = uri;
> while (*t) {
> if (!xisspace(*t))
> *q++ = *t;
> t++;
> }
> *q = '\0';
> http->log_uri = xstrndup(rfc1738_escape_unescaped(q), MAX_URL);
> xfree(q);
> }
> }
>
>
>
> Amos
>
Received on Thu Mar 03 2011 - 16:24:11 MST
This archive was generated by hypermail 2.2.0 : Fri Mar 04 2011 - 12:00:02 MST