On Wed, 4 May 2005, Chris Robertson wrote:
> For the record, something like:
>
> ^[^?]*sex[^?]*
>
> should fit the bill(1), but would be an absolute CPU hog. Not to mention
> that you would have to do something to match other "bad" words (e.g. one
> regex for each word, or a single really long regex utilizing the "or"
> operator).
>
> Chris
>
> (1) If my logic is correct, this statement should translate to "From the
> beginning of the line, match anything but a question mark zero or more
> times, the letters s, e and x all together and anything but a question mark
> zero or more times". Not pretty, but it would do what you are asking.

Correct. But you should also realise that the letters s, e, x are not the
same as the word sex. That sequence of letters appears in many other words
not at all related to sex (the most common example is the county of Sussex).
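
To make the false-positive problem concrete, here is a minimal sketch in
Python. It assumes Python's `re` module as a stand-in for the POSIX regex
library Squid uses; for this simple pattern the match/no-match result should
be the same in both. The URLs are made up purely for illustration.

```python
import re

# The pattern from the quoted message: from the start of the line, any
# characters except '?', then the letters "sex", then any non-'?' characters.
NAIVE = re.compile(r"^[^?]*sex[^?]*")


def naive_blocks(url: str) -> bool:
    """Return True if the naive pattern matches somewhere in the URL."""
    return NAIVE.search(url) is not None


if __name__ == "__main__":
    # Hypothetical example URLs, chosen only to illustrate the behaviour.
    for url in [
        "http://www.example.com/sex/",    # intended hit
        "http://www.sussex.ac.uk/",       # false positive: "sussex" contains "sex"
        "http://www.example.com/?q=sex",  # "sex" only after '?', so no match
    ]:
        print(naive_blocks(url), url)
```

Because the trailing `[^?]*` can always match the empty string, it does not
change which URLs match; only the leading `^[^?]*` matters, restricting the
search to the part of the URL before any '?'.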

GNU regex has magic patterns for matching word boundaries, which can do
wonders for the accuracy of word-based patterns:

    There are two special cases(!) of bracket expressions: the
    bracket expressions `[[:<:]]' and `[[:>:]]' match the null
    string at the beginning and end of a word respectively. A
    word is defined as a sequence of word characters which is
    neither preceded nor followed by word characters. A word
    character is an alnum character (as defined by wctype(3)) or
    an underscore. This is an extension, compatible with but not
    specified by POSIX 1003.2, and should be used with caution in
    software intended to be portable to other systems.

See "man 7 regex" or "info regex" for details on the regex language on
your system.
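
A minimal sketch of the word-boundary idea, again in Python. Python's `re`
does not understand `[[:<:]]' and `[[:>:]]', so this swaps in `\b`, which
matches the empty string between a word character and a non-word character.
For str patterns Python treats Unicode alphanumerics and the underscore as
word characters, close to the definition quoted above. `\b` matches either
edge of a word, but placed before a letter it can only be a word start and
after a letter only a word end. The list `BAD_WORDS` and the helper
`word_blocks` are hypothetical; joining the words with "|" is the single
"or" regex approach mentioned in the quoted message.

```python
import re

# Hypothetical list of words to block; each is escaped so that any regex
# metacharacters in it are matched literally.
BAD_WORDS = ["sex", "porn"]

# \b stands in for [[:<:]] and [[:>:]]: it matches the empty string between a
# word character (alphanumeric or underscore) and a non-word character, or at
# either end of the string.
WORDS = "|".join(re.escape(w) for w in BAD_WORDS)
BOUNDED = re.compile(r"^[^?]*\b(?:" + WORDS + r")\b")


def word_blocks(url: str) -> bool:
    """Return True if a whole listed word appears before any '?' in the URL."""
    return BOUNDED.search(url) is not None


if __name__ == "__main__":
    for url in [
        "http://www.example.com/sex/",    # whole word: matched
        "http://www.sussex.ac.uk/",       # "sex" inside "sussex": not matched
        "http://www.example.com/sex_ed",  # '_' is a word character: not matched
        "http://porn.example.com/",       # second word in the alternation
    ]:
        print(word_blocks(url), url)
```

On a system whose regex library supports the extension quoted above, the
corresponding pattern would presumably be written ^[^?]*[[:<:]]sex[[:>:]],
but check "man 7 regex" there first, since the extension is not portable.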
Regards
Henrik
Received on Mon May 16 2005 - 15:34:14 MDT