On 09/14/2012 01:23 AM, Kinkie wrote:
> 3. define a sane overall syntax, ignoring backwards compatibility,
I am not sure others would support such a significant change/effort, but
here is how I would approach this:
I. Change parsing code so that all options have to call something like
ConfigParser::nextElement()
rather than calling one of
strtok(NULL, w_space),
strtok(NULL, my_special_pattern), or
ConfigParser::strtokFile()
functions and their combinations or variations. Converging on a
well-defined element extraction API will require non-trivial adjustment
of some of the parsing code, especially code that uses custom
strtok() patterns, such as the "eol" parsing code. The syntax of some of
those eol options will change.
We may need to add a robust block-quoting mechanism to handle inclusion
of HTML and other quote-rich text (see err_html_text for example).
Alternatively, such directives should be converted to load their text
from a file.
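
To illustrate step I, here is a minimal sketch of a single extraction
point that directive parsers would call instead of strtok(). ElementReader
and countParameters() are hypothetical names used only for this
illustration, not existing Squid code. The sketch splits on spaces and
tabs only; quoting and macros are deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the proposed ConfigParser::nextElement().
class ElementReader {
public:
    explicit ElementReader(const std::string &line) : line_(line), pos_(0) {}

    // Stores the next whitespace-delimited element in `element` and
    // returns true, or returns false at the end of the line.
    bool nextElement(std::string &element) {
        const std::string ws = " \t";
        const std::size_t start = line_.find_first_not_of(ws, pos_);
        if (start == std::string::npos) {
            pos_ = line_.size();
            return false;
        }
        std::size_t end = line_.find_first_of(ws, start);
        if (end == std::string::npos)
            end = line_.size();
        pos_ = end;
        element = line_.substr(start, end - start);
        return true;
    }

private:
    std::string line_;
    std::size_t pos_;
};

// Example of a directive parser that never calls strtok() itself:
// it skips the directive name and counts the remaining elements.
inline std::size_t countParameters(const std::string &line) {
    ElementReader reader(line);
    std::string element;
    if (!reader.nextElement(element))
        return 0;
    std::size_t n = 0;
    while (reader.nextElement(element))
        ++n;
    return n;
}
```

With one entry point like this, a change in element syntax is made in one
place rather than in every directive parser.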
II. Change ConfigParser and friends to follow these rules:
0. Preprocessing.
line = TBD but no changes expected here; we will continue
to support line continuations using backslashes.
comment = prefix-comment / suffix-comment
prefix-comment = a line that starts with optional whitespace
followed by <#>
suffix-comment = optional whitespace followed by <#>,
followed by optional whitespace and end of line
Comments are stripped first. Continuation lines are then merged if
needed and fed into stage-1 parser described below.
ConfigParser::nextElement() returns nil at the end of a [merged] line.
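
The preprocessing stage could be sketched roughly as follows. This is an
illustration under assumptions, not the actual implementation: the
function names are hypothetical, the suffix-comment rule is read
literally (a trailing <#> with nothing but whitespace after it), merged
pieces are simply concatenated, and a line ending in an escaped backslash
is not treated specially.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// True for a prefix-comment: optional whitespace followed by '#'.
inline bool isPrefixComment(const std::string &line) {
    const std::size_t first = line.find_first_not_of(" \t");
    return first != std::string::npos && line[first] == '#';
}

// Removes a suffix-comment, read literally from the grammar:
// optional whitespace, '#', optional whitespace, end of line.
inline std::string stripSuffixComment(const std::string &line) {
    const std::string ws = " \t";
    const std::size_t last = line.find_last_not_of(ws);
    if (last == std::string::npos || line[last] != '#')
        return line;
    if (last == 0)
        return std::string();
    const std::size_t keep = line.find_last_not_of(ws, last - 1);
    return keep == std::string::npos ? std::string() : line.substr(0, keep + 1);
}

// Strips comments first, then merges backslash-continued lines.
// Blank and whitespace-only results are dropped.
inline std::vector<std::string> preprocess(const std::vector<std::string> &raw) {
    std::vector<std::string> merged;
    std::string pending;
    for (std::size_t i = 0; i < raw.size(); ++i) {
        if (isPrefixComment(raw[i]))
            continue; // comments vanish before continuation handling
        std::string line = stripSuffixComment(raw[i]);
        const bool continues = !line.empty() && line[line.size() - 1] == '\\';
        if (continues)
            line.erase(line.size() - 1);
        pending += line;
        if (continues)
            continue;
        if (pending.find_first_not_of(" \t") != std::string::npos)
            merged.push_back(pending);
        pending.clear();
    }
    if (pending.find_first_not_of(" \t") != std::string::npos)
        merged.push_back(pending);
    return merged;
}
```

Because comments are removed before merging, a comment line placed in the
middle of a continued directive does not break the continuation.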
1. Structure.
config = *( directive / whitespace )
directive = token *( word / whitespace )
At this stage, the parsing is applied to [possibly merged] lines. That
is, there are no newline characters at this stage. Whitespace is ignored.
2. Word syntax.
word = token / single_quoted_string / double_quoted_string
token = 1*tchar
single_quoted_string = <'> *(sqchar / escaped-pair) <'>
double_quoted_string = <"> *(dqchar / escaped-pair) <">
tchar = any char except whitespace, quotes, and backslash
sqchar = any char except single quote and backslash
dqchar = any char except double quote and backslash
escaped-pair = backslash followed by any char except new line
The quotes surrounding quoted strings are removed before the word is
returned to the higher-level code. However, their presence is remembered
in the word flags as it is significant for word interpretation described
below.
Legal backslashes are removed before the word is returned to the
higher-level code. TBD: This is not exactly true because "\$macro" is
not a macro but "$macro" is.
At the expense of some backward compatibility, we can exclude more
special characters from tokens (e.g., we can exclude parentheses and
various operator signs in case we decide to support arithmetic or logic
operations later). Alternatively, we can declare certain tokens reserved
for future use.
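
The word syntax above might be implemented along these lines. This is
only a sketch: Word, Quoting, and splitWords() are hypothetical names,
errors are reported with exceptions for brevity, and the open "\$macro"
question is ignored (every legal backslash is simply removed).

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

enum class Quoting { None, Single, Double };

// A word with quotes and backslashes removed; the quoting style is
// remembered because later interpretation depends on it.
struct Word {
    std::string text;
    Quoting quoting = Quoting::None;
};

// Splits one preprocessed line into words.
// Throws std::runtime_error on a syntax error.
inline std::vector<Word> splitWords(const std::string &line) {
    std::vector<Word> words;
    std::size_t i = 0;
    while (i < line.size()) {
        const char c = line[i];
        if (c == ' ' || c == '\t') {
            ++i; // whitespace between words is ignored
            continue;
        }
        Word w;
        if (c == '\'' || c == '"') {
            w.quoting = (c == '"') ? Quoting::Double : Quoting::Single;
            ++i; // skip the opening quote
            bool closed = false;
            while (i < line.size() && !closed) {
                if (line[i] == '\\') {
                    // escaped-pair: keep the char after the backslash
                    if (i + 1 >= line.size())
                        throw std::runtime_error("dangling backslash");
                    w.text += line[i + 1];
                    i += 2;
                } else if (line[i] == c) {
                    closed = true;
                    ++i; // skip the closing quote
                } else {
                    w.text += line[i++];
                }
            }
            if (!closed)
                throw std::runtime_error("unterminated quoted string");
        } else {
            // token: ends at whitespace or a quote; backslash is illegal
            while (i < line.size() && line[i] != ' ' && line[i] != '\t' &&
                   line[i] != '\'' && line[i] != '"') {
                if (line[i] == '\\')
                    throw std::runtime_error("backslash in token");
                w.text += line[i++];
            }
        }
        words.push_back(w);
    }
    return words;
}
```

For example, the line acl "two words" 'it\'s' plain yields four words:
acl, two words, it's, and plain, with the second flagged as double-quoted
and the third as single-quoted.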
3. Word interpretation.
The following rules are used to go from a syntax-level "word" (a
sequence of characters) to a semantics-level "element" object returned
by ConfigParser::nextElement().
* Words that start with a 5-letter "file:" prefix are interpreted as
file names. The corresponding file is loaded and run through the
preprocessor. Each line in that file is then interpreted as a single
word. These words are returned via ConfigParser::nextElement() API,
transparently to the caller. TBD: Detail and explain that from-file word
syntax is different from #2 word syntax above because multiple tokens on
one line are interpreted as a single word even if they are not quoted.
TBD: We should honor quoted lines, but do we honor line continuations in
these files?
* Double-quoted words are checked for macros. Macros are prohibited in
directive names. It is also an error to specify a macro that the
corresponding squid.conf directive parameter does not support. The
directive and the macros determine exactly when and how the macros are
expanded. TBD: Detail $macro and ${macro(parameters)} syntax.
* Tokens and single-quoted words are not checked for macros.
TBD: We may want to interpret \n and \r inside double-quoted strings
specially so that it is possible to include new lines in directive
parameters. It may make sense to reserve other \<alphanumeric> sequences
as well.
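
The interpretation rules above could be expressed as a small
classification step. The sketch below makes several assumptions that the
proposal leaves open: ElementKind and classify() are hypothetical names,
the "file:" check is applied regardless of quoting, and "contains <$>" is
only a placeholder for the macro syntax that is still TBD.

```cpp
#include <cassert>
#include <string>

enum class Quoting { None, Single, Double };

enum class ElementKind {
    Plain,          // returned to the caller as is
    FileList,       // "file:" prefix: load the file, one word per line
    MacroCandidate  // double-quoted and may contain macros
};

// Decides how a syntax-level word should be interpreted.
// `text` is the word after quote and backslash removal.
inline ElementKind classify(const std::string &text, Quoting quoting) {
    static const std::string prefix = "file:";
    if (text.compare(0, prefix.size(), prefix) == 0)
        return ElementKind::FileList;
    // Only double-quoted words are checked for macros; tokens and
    // single-quoted words are never expanded.
    if (quoting == Quoting::Double && text.find('$') != std::string::npos)
        return ElementKind::MacroCandidate;
    return ElementKind::Plain;
}
```

Whether a MacroCandidate is legal would still be decided by the directive
that receives it, since each directive parameter determines which macros
it supports.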
The above needs more polishing and detail, but should be consistent and
mostly backward compatible. The syntax will handle ACL values with
spaces. Is that something we should move towards?
Thank you,
Alex.
Received on Fri Sep 14 2012 - 16:10:25 MDT