RE: [squid-users] How can I ignore 'form inputs' on a urlpath_regex? from Henrik Nordstrom on 2005-05-16 (squid-users)

From: Henrik Nordstrom <hno@dont-contact.us>
Date: Mon, 16 May 2005 23:34:12 +0200 (CEST)

On Wed, 4 May 2005, Chris Robertson wrote:

> For the record, something like:
>
> ^[^?]*sex[^?]*
>
> should fit the bill(1), but would be an absolute CPU hog. Not to mention
> that you would have to do something to match other "bad" words (e.g. one
> regex for each word, or a single really long regex utilizing the "or"
> operator).
>
> Chris
>
> (1) If my logic is correct, this statement should translate to "From the
> beginning of the line, match anything but a question mark zero or more
> times, the letters s, e and x all together and anything but a question mark
> zero or more times". Not pretty, but it would do what you are asking.

Correct. But you should also realise that the letters s, e, x are not the
same as the word sex. Those letters appear in many other words not at all
related to sex (the most common example is the place name Sussex).
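
Both effects are easy to see in a small Python sketch. Python's re module
only stands in for Squid's regex engine here, and the sample URL paths and
the helper name matches() are made up for illustration:

```python
import re

# Chris's pattern: "sex" must appear before any '?', i.e. outside the query string.
PATH_ONLY = re.compile(r"^[^?]*sex[^?]*")

def matches(url_path: str) -> bool:
    """Return True if the letters 'sex' occur ahead of the first '?'."""
    return PATH_ONLY.search(url_path) is not None

samples = [
    "/sex/index.html",            # in the path: matched, as intended
    "/search.cgi?q=sex",          # only in the form input: ignored, as intended
    "/visit/sussex/hotels.html",  # unrelated word containing the letters: also matched
]

for path in samples:
    print(path, matches(path))
```

The last sample is the false positive described above: the pattern tests
for a run of letters, not for a word.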

GNU regex has magic patterns for matching word boundaries which can do
wonders at improving the accuracy of word-based patterns.

        There are two special cases(!) of bracket expressions: the
        bracket expressions `[[:<:]]' and `[[:>:]]' match the null
        string at the beginning and end of a word respectively. A
        word is defined as a sequence of word characters which is
        neither preceded nor followed by word characters. A word
        character is an alnum character (as defined by wctype(3)) or
        an underscore. This is an extension, compatible with but not
        specified by POSIX 1003.2, and should be used with caution in
        software intended to be portable to other systems.

man 7 regex

or
info regex

for details on the regex language on your system.
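
As a rough illustration of what word boundaries buy you, here is a sketch
in Python. Note the assumption: Python's re module does not accept the
`[[:<:]]' and `[[:>:]]' forms quoted above, so this uses its own \b
assertion, which matches at either edge of a word. The sample paths and
the helper name matches_word() are hypothetical. Check your system's regex
documentation for the syntax Squid will actually accept.

```python
import re

# Same idea as the path-only pattern, but "sex" must stand alone as a word.
# \b is Python's word-boundary assertion (a stand-in for [[:<:]] / [[:>:]]).
WORD_ONLY = re.compile(r"^[^?]*\bsex\b[^?]*")

def matches_word(url_path: str) -> bool:
    """Return True if 'sex' occurs as a whole word ahead of the first '?'."""
    return WORD_ONLY.search(url_path) is not None

samples = [
    "/sex/index.html",            # whole word in the path: matched
    "/visit/sussex/hotels.html",  # part of a longer word: not matched
    "/search.cgi?q=sex",          # only in the query string: not matched
    "/sex_shop/index.html",       # underscore is a word character: not matched
]

for path in samples:
    print(path, matches_word(path))
```

The last sample shows the flip side of the definition quoted above: since
an underscore counts as a word character, "sex_shop" is a single word and
the boundary pattern no longer matches it.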

Regards
Henrik
Received on Mon May 16 2005 - 15:34:14 MDT
