On Wed, 31 Aug 2005, Darryl L. Miles wrote:
> My config entry (based on the example for Apache common format):
>
> logformat combined %>a %ui %un [%tl] "%rm %ru HTTP/%rv" %Hs %<st 
> "%{Referer}>h" "%{User-Agent}>h" %Ss:%Sh
>
>
> The problem is that my logfile stats program is unable to parse the line.
> Looks like someone is trawling for an awstats.pl bug.  An example entry is:
>
> WARN:a1cpu4.bz.log:1786006 parse error for length at w;wget"
> WARN: 213.61.102.218 - - [15/Aug/2005:22:39:01 +0100] "GET 
> http://62.XX.XX.109//awstats.pl"w;wget" HTTP/1.1" 404 454 "-" 
> "Mozilla/4.0(compatible; MSIE 6.0; Windows 98)" TCP_MISS:DIRECT
>
>
> What I expected to see was the conversion of:
>
> "GET http://62.XX.XX.109//awstats.pl"w;wget" HTTP/1.1"
>
> into (with an additional \ character), which is what Apache does:
>
> "GET http://62.XX.XX.109//awstats.pl\"w;wget" HTTP/1.1"

Right, the automatic quoting selection currently doesn't handle this case
very well; it defaults to no quoting of the URL data.

You should get the expected output if you explicitly select the quoted
string output format for the URL field (note the " in %"ru):

logformat combined %>a %ui %un [%tl] "%rm %"ru HTTP/%rv" %Hs %<st "%{Referer}>h" "%{User-Agent}>h" %Ss:%Sh
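
To illustrate why the quoted string format matters to a log parser, here is
a minimal Python sketch. It is not Squid's code: quote_field() is a
hypothetical helper that assumes the escaping consists of putting a
backslash in front of each backslash and double quote, and the URL is a
made-up example.

```python
import re

def quote_field(value):
    """Backslash-escape backslashes and double quotes (assumed rules)."""
    return value.replace("\\", "\\\\").replace('"', '\\"')

# A quoted field: opening quote, then any mix of ordinary characters and
# backslash-escaped characters, then the closing quote.
QUOTED = re.compile(r'"((?:[^"\\]|\\.)*)"')

url = 'http://example.com/a"b'  # hypothetical URL with an embedded quote

raw = '"GET ' + url + ' HTTP/1.1"'                   # no quoting applied
escaped = '"GET ' + quote_field(url) + ' HTTP/1.1"'  # quoted string format

# Without escaping, the field appears to end at the embedded quote.
raw_field = QUOTED.match(raw).group(1)

# With escaping, the parser sees the whole request line as one field.
escaped_field = QUOTED.match(escaped).group(1)

print(raw_field)
print(escaped_field)
```

With the unescaped line the parser stops at the embedded quote and loses
the rest of the request, which is the kind of parse error reported above.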

The attached incremental patch should correct the logformat directive to
automatically use quoted string escaping on any format element found
within a quoted string (not only when the quotes are immediately around the
item, as in the Referer and User-Agent cases), and similarly for bracketed
items. I have also tried to make the description of the format selectors
a little easier to understand.

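
The selection rule described above can be sketched roughly as follows. This
is an illustration in Python, not the patch itself: pick_quoting() and the
simplified TOKEN pattern are hypothetical and only cover the format codes
used in this thread.

```python
import re

# Simplified, hypothetical token pattern: '%', an optional {argument},
# an optional '<' or '>', then the letters of the format code.
TOKEN = re.compile(r'%(?:\{[^}]*\})?[<>]?[A-Za-z]+')

def pick_quoting(fmt):
    """Return (token, mode) pairs, where mode depends on the enclosing text."""
    result = []
    in_quotes = False
    in_brackets = False
    i = 0
    while i < len(fmt):
        m = TOKEN.match(fmt, i)
        if m:
            if in_quotes:
                mode = "quoted"
            elif in_brackets:
                mode = "bracketed"
            else:
                mode = "plain"
            result.append((m.group(0), mode))
            i = m.end()
            continue
        ch = fmt[i]
        if ch == '"':
            in_quotes = not in_quotes   # entering or leaving a quoted string
        elif ch == '[' and not in_quotes:
            in_brackets = True
        elif ch == ']' and not in_quotes:
            in_brackets = False
        i += 1
    return result

fmt = ('%>a %ui %un [%tl] "%rm %ru HTTP/%rv" %Hs %<st '
       '"%{Referer}>h" "%{User-Agent}>h" %Ss:%Sh')
modes = dict(pick_quoting(fmt))
for token, mode in modes.items():
    print(token, mode)
```

The point of the sketch is that %ru is treated as quoted even though the
quotes are not immediately around it, while %tl picks up the bracketed
mode from the surrounding [ ].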
Regards
Henrik