On 27/09/2013 5:23 p.m., Alex Rousskov wrote:
> On 09/26/2013 12:13 PM, Amos Jeffries wrote:
>> On 09/26/2013, Alex Rousskov wrote:
>>> The only real problem with /re/ syntax as the default is that it does
>>> not work well with URLs, which are very common in Squid patterns. That
>>> is why I think a string-based "re" may be a better default for Squid.
>
>> Which means that it makes escaping mandatory in one form or another.
>> Which is giant leap #1 down the slippery slope towards
>> "/http:\\\/\\\/foo\\/i broke
>> it\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\/?how/"
> The "re" syntax does not use / characters at all so your example, if I
> understand it correctly, would be written as
>
> "http://foo/i broke it how"
:-) "if I understand it correctly" is precisely my point. Even you, with
your skill at reading syntax, have to go slowly, and you still fell for the
trap. There are only two passes of unescaping done: one in the quoted-string
squid parser and one in the library. The remaining bytes used for the
pattern should match URLs with 16 bare '\' characters, but only
if the match is in the path segment of the target URL.
i.e. these exact two sub-strings:
/http://foo/i broke it\\\\\\\\\\\\\\\\/how/
/http://foo/i broke it\\\\\\\\\\\\\\\how/
Mistakes here are *very* common and, like the trap I set for you above,
easily caused in a large number of ways. Let's go out of our way to avoid
needing escaping at all in squid.conf patterns. It is long overdue to end
the nested escaping madness.
We have not yet fallen down that slippery slope, so let's not do it now.
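
To make the two unescaping passes concrete, here is a minimal Python sketch. It is not Squid's parser or any regex library's code: unescape_once() is a hypothetical helper that assumes each pass simply drops a backslash and keeps the character after it, and the run of 64 backslashes is an assumed input chosen so that two passes leave 16.

```python
def unescape_once(text: str) -> str:
    """One unescaping pass: drop each escaping backslash, keep what follows it.

    Hypothetical helper for illustration only; real parsers may treat
    sequences such as \\n or \\t specially.
    """
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            out.append(text[i + 1])  # keep the escaped character literally
            i += 2
        else:
            out.append(text[i])      # ordinary character (or a lone trailing backslash)
            i += 1
    return "".join(out)


typed = "\\" * 64                           # assumed: 64 backslashes typed in the config
after_config_pass = unescape_once(typed)    # what a quoted-string pass would hand on
after_regex_pass = unescape_once(after_config_pass)  # literal bytes the pattern matches

print(len(typed), len(after_config_pass), len(after_regex_pass))
```

Each pass halves the run, so the number of backslashes that has to be typed doubles with every layer of escaping that is added.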
>
>> With string based or any other delimiter (including '/') we cannot
>> differentiate the pattern token from the delimiter token without
>> escaping the pattern token, then any escape-characters in the pattern as
>> well.
> Sure, but that includes delimiters like () and []. The best solution for
> that problem that I know of is to allow folks to use the delimiter that
> they want (probably because it does not occur in their specific RE).
> Perl uses that approach, and I personally use that Perl feature
> frequently. I can think of only one other alternative (approach 1
> described below) but it seems too complex to me.
>
>
>> Given your code expertise you have possibly read the same or
>> similar language design document I did about this problem.
> Sorry, I do not know which document you are talking about, but I would
> be very happy to read it, especially if it proves me wrong.
>
>
>> Using () brackets or [] brackets we get that nice pairing guarantee
>> from regex (in all the flavours I'm aware of) and can apply the above
>> mentioned algorithm without any escaping necessary at the squid.conf
>> level.
> Are you proposing to use the regular expression library itself (or an
> equivalent hand-written code) to extract regular expressions from
> squid.conf? That is the only case where the RE syntax helps guarantee
> something. In all other cases, before the regex library gets the regular
> expression and can guarantee anything, Squid has to extract that
> expression from squid.conf.
>
> There are two ways Squid can extract a regular expression from squid.conf:
>
> 1) By understanding full regular expression syntax. This is doable, but
> is not currently supported and is not easy to support correctly (unless
> the RE library exposes such parsing support for us). This does not
> require escaping RE parts that confuse the Squid parser because there
> are no such parts -- the Squid parser becomes fully RE-aware itself!
>
> 2) By only understanding squid.conf expression syntax. This is what we
> currently support (albeit with poor syntax) and it is relatively easy to
> support as long as we keep the syntax simple. This does require escaping
> RE parts that confuse the Squid parser (often resulting in double
> escaping unless mitigated by a configurable RE delimiter).
>
> For example, consider the following regular expression that starts with
> a letter "e" and ends with a right square bracket "]". This example RE
> matches one of three sequences of 5 characters such as "ends)".
>
> ends[) (]
>
> Using approach (1), we could write
>
> acl foo url_regex (ends[) (])
>
> and the Squid parser would use the outer parentheses to find the end of
> the regular expression and would identify that the parentheses and space
> inside the square brackets are a part of that regular expression. No
> problem (except that implementing such a parser is probably very difficult).
>
>
> Using approach (2) with parens as fixed RE delimiters, we could write
>
> acl foo url_regex (ends[\) \(])
>
> and the Squid parser would use the outer parentheses to find the end of
> the regular expression and would unescape and ignore the parentheses
> inside the square brackets. No problem (except that the admin must
> escape those inner parentheses, which quickly becomes tricky when the RE
> itself uses parentheses for grouping).
>
> Using approach (2) with flexible RE delimiter, we could write
>
> acl foo url_regex /ends[) (]/
> or
> acl foo url_regex {ends[) (]}
> or
> acl foo url_regex @ends[) (]@
>
> and it will all work without double escaping.
>
>
> And, just for completeness' sake, I would mention that we do need some RE
> delimiter in this case because without it any parser would see two
> [invalid] regular expressions separated by space instead of one:
>
> acl foo url_regex ends[) (]
>
>
> I hope the above illustrates why no single fixed RE delimiter can solve
> the double escaping problem (and why the RE syntax itself does not help)
> unless we start supporting full RE syntax inside the squid.conf parser
> itself.
>
>
> Cheers,
>
> Alex.
>
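
For reference, approach (2) with a flexible delimiter, as quoted above, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Squid code: extract_pattern() and the PAIRED table are hypothetical names, and the sketch assumes the first character of the token selects the delimiter and that no escaping is done at the config level.

```python
import re
from typing import Tuple

# Assumed set of paired delimiters; any other first character closes itself.
PAIRED = {"(": ")", "[": "]", "{": "}", "<": ">"}


def extract_pattern(token_text: str) -> Tuple[str, str]:
    """Split text that starts with a delimiter into (pattern, remaining text).

    Hypothetical helper: the pattern ends at the FIRST closing delimiter,
    with no knowledge of regular expression syntax at all.
    """
    if not token_text:
        raise ValueError("empty pattern token")
    opener = token_text[0]
    closer = PAIRED.get(opener, opener)
    end = token_text.find(closer, 1)
    if end == -1:
        raise ValueError("missing closing delimiter " + repr(closer))
    return token_text[1:end], token_text[end + 1:]


# Delimiters that do not occur in the RE give the intended pattern unchanged.
for written in ("/ends[) (]/", "{ends[) (]}", "@ends[) (]@"):
    pattern, _rest = extract_pattern(written)
    print(written, "->", pattern, bool(re.fullmatch(pattern, "ends)")))

# A delimiter that does occur in the RE cuts the pattern short.
print(extract_pattern("(ends[) (])"))
```

The last call shows the failure mode under discussion: with parens as the delimiter and no escaping, this simple scan stops at the first ")" inside the character class, so the admin must either escape it or pick another delimiter.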
Received on Fri Sep 27 2013 - 07:05:12 MDT