Hi all,
I am attaching a new patch which try to handle the reported problems.
Just adding a new method which do special parsing for tokens like the
regex has many effects, for example the ConfigParser::TokenUndo
mechanism can not work.
This patch:
1) By default disables quoted tokens
("configuration_includes_quoted_values off")
2) If configuration_includes_quoted_values is off the quoted tokens
parsed using the ConfigParser::NextToken include the quotes, to keep
compatibility with older releases.
3) For the cases where quoted strings are required (wordlists, Notes
parsing, Headers with acl), the new ConfigParser::NextQuotedToken method
added.
The old wordlists parser allowed escaping any character, this patch
will return an error if you try to escape alphanumeric characters. The
\r \n and \t have the C semantics.
4) Add the ConfigParser::RegexPattern() to get the next regex token
5) Add the ConfigParser::RegexStrtokFile() to get the next regex token
which is compatible with the old strtokFile
6) Removes the ConfigParser::TokenUndo method. The new method
ConfigParser::NextTokenPreview() which can be used to preview the next
token is added. This method if the next token is invalid (eg unquoted
with special characters) return as token the "SQUID_ERROR_TOKEN" (we do
not want to call self_destruct while previewing next element).
7) In this patch I kept the ConfigParser::TokenPutBack method which is
used in only one place (acl regex). However this method should removed
in the future, with the ConfigParser::Undo_ member and the
ConfigParser::Undo() method
Notes
======
1) The current trunk parser read a line, and the tokens stored in this
line and the line modified while parsed. This patch consider the line we
are parsing as const and stores parsed tokens to
ConfigParser::CfgLineTokens_ std::queue:
- we may need to parse again the line (NextTokenPreview/NextToken) so
we do not want to modify it
- The current line tokens must stored somewhere to support the following:
char *name = ConfigParser::NextToken();
char *value = ConfigParser::NextToken();
The ConfigParser::CfgLineTokens_ emptied every time new config line is read.
2) A set of new flags defined under ConfigParser class to define the
type of parsing: ParseRegex_ (next token is regex) ParseQuotedOrToEOL_
(next token is quoted or to-EOL), PreviewMode_ (just do preview do not
pop token)
This method is not the best, but it is not so bad....
3) The goal of new parser was to have a small and simple parser, but now
looks very complex. But it very is difficult to keep compatibility with
a simple parser.
Probably we will need to re-implement it after the old configuration
file style support removed from squid.
On 07/31/2013 07:59 PM, Alex Rousskov wrote:
> Christos, Amos,
>
> Thank you for working on this! I hope we can avoid repeating the
> same set of mistakes thrice by carefully considering the big picture and
> upgrade path. Here is my understanding of how we want things to work,
> based on the review of the reported bugs and this thread so far:
>
>
> 1. configuration_includes_quoted_values defaults to off. The setting
> scope extends from the place where the directive is used to either the
> end of configuration or to the next use, whichever comes first.
OK this patch make configuration_includes_quoted_values defaults to off
>
>
> 2. When configuration_includes_quoted_values is on, new "strict syntax"
> rules are enforced:
>
> 2a. "quoted values" and function()s are supported. %Macros are supported
> inside quoted values and only inside them. Unknown functions and macros
> terminate Squid. Code uses NextToken() to get tokens by default. A small
> subset of hand-picked directives may use NextQuotedOrToEol() or other
> special methods instead of NextToken() to accommodate more legacy cases.
> Logformat is one such example.
I think this patch cover this requirement.
>
> 2b. A % sign can be \-escaped to block macro expansion in quoted strings
> where needed. A few "standard" escape sequences are supported such as
> \n. Unknown escape sequences terminate Squid.
Any not alphanumeric character can be escaped in this patch.
Also the \n\t\r are supported.
We can add/remove more if required.
>
> 2c. By default, token delimiter is whitespace. Bare (i.e., unquoted)
> tokens containing any character other than alphanumeric, underscore,
> period, '(', ')', '=', and '-' terminate Squid. For example, foo{bar},
> foo_at_bar, and foo"bar" terminate Squid in strict syntax mode.
OK.
In this patch allow the following ".,()-=_%/:"
We can easily add remove characters from this list.
If we remove from this list the ':' then we should require quotes to
define http, icap or ecap urls.
The % used to define percent values and the '/' to define filenames.
>
> 3. When configuration_includes_quoted_values is off, old "legacy syntax"
> is supported, to the extent possible:
>
> 3a. "quoted values", functions(), and %macros are not recognized or
> treated specially compared to Squid v3.3. Code uses NextToken() to get
> tokens by default. A small subset of hand-picked directives may use
> NextQuotedOrToEol() instead of NextToken() but that method does _not_
> treat quoted strings specially in legacy mode. Logformat is one such
> example.
OK on this.
I just add the NextQuotedToken to support quoted tokens in "legacy
syntax" mode.
We need it in wordlist tokens, Notes and Header with acls....
>
> 3b. I am not sure about "file.cfg" include syntax in legacy mode. I
> think it would be OK _not_ to support it (because fixing configurations
> to use new parameters() syntax should not be very difficult in most
> cases), but I may be wrong. If it is easy to continue to support
> "file.cfg" include style in NextToken() working in legacy mode, then we
> should do it.
This patch supports "file.cfg" in legacy mode.
>
>
> 4. Exceptions: A small set of directives that already supported quoted
> strings correctly before the introduction of
> configuration_includes_quoted_values must continue to support them.
> These directives should call NextQuotedOrLegacy() method. The method
> temporary forces configuration_includes_quoted_values to ON if and only
> if the next token starts with a quote.
I used the NextQuotedToken for this ...
>
> This may create a few upgrade problems when, for example, somebody is
> using unsupported %macros inside quoted strings with these options, but
> the support for quoted strings in those headers was added relatively
> recently so there should not be many such cases, and it is not practical
> to treat these options as a complex third class. They have to obey all
> strict syntax rules when they use quoted strings (and also when
> configuration_includes_quoted_values is on, of course).
>
>
> 5. Future changes:
>
> 5a. We revisit handling of 'single quoted strings' later, after the
> above is done. They are useful to disable macros and standard escape
> sequences in strings, but they are not a "must have" for now, and we
> need to decide how to treat \-escapes inside them. We will probably
> allow escaping of two characters only: \ and '.
>
> 5b. We revisit handling of REs, after the above is done. We will
> probably add a proper dedicated /quoting mechanism/ for them. For now,
> most REs require configuration_includes_quoted_values set to off.
>
> 5c. I recommend combining 5a and 5b implementation to support a simple
> generic quoting mechanism with an admin-selectable quoting character.
> That is almost a must for REs but also help with general quoting of
> complex expressions. See "Quote and Quote-like Operators" in Perl
> (perlop man page) for ideas (but we only need a few lines from that
> table). Note how 2c facilitates this future change by immediately
> prohibiting unquoted tokens that may become quoted strings later. For
> example q{foobar} is prohibited if configuration_includes_quoted_values
> is on.
>
>
> Anything I missed or misrepresented?
>
>
> Thank you,
>
> Alex.
>
This archive was generated by hypermail 2.2.0 : Wed Aug 14 2013 - 12:00:17 MDT