On 12/15/2013 04:54 AM, Kinkie wrote:
> attached patch is from branch lp:~squid/squid/characterset/
> + std::vector<bool> chars_; //std::vector<bool> is optimized
According to STL documentation for std::vector<bool> specialization:
> These changes provide a quirky interface to this specialization and
> favor memory optimization over processing (which may or may not suit
> your needs).
This is kind of the opposite of what we want. We want access speed, not
RAM or copying speed savings. I suspect we do not want std::vector<bool>
here! The same docs suggest using a different element type (char,
unsigned char, or custom) if a different optimization trade-off is
needed. I think we should follow that advice if we want to optimize this
(which was the primary point of doing a custom CharacterSet).
> It extracts some of the work done in parser-ng to implement Alex'
> Tokenizer API, bringing O(N) find-any{,-not}-of matching to SBuf.
> Note: in IRC discussions, Amos mentioned find_first_of and
> find_first_not_of not being naming convention compliant (that's right,
> they are modeled after std::string). I'm fine with changing them as a
> follow-up commit if this is the collegial decision; I'd avoid do to it
> as part of this merge.
I agree that the two should be renamed to use camelCase [as a follow up
commit].
> + * \return length() if all characters in the SBuf are from set
and the implementation matches that documentation:
> +SBuf::size_type
> +SBuf::find_first_not_of(const CharacterSet &set, size_type startPos) const
> +{
...
> + debugs(24, 7, "not found");
> + return length();
> +}
but STL docs say:
> Return Value
> The position of the first character that does not match.
> If no such characters are found, the function returns string::npos.
which makes more sense than returning length() IMO.
> + CharacterSet(const char *label, const char * const c) : name(label), chars_(std::vector<bool>(256,false)) {
> + size_t clen = strlen(c);
> + for (size_t i = 0; i < clen; ++i)
> + chars_[static_cast<uint8_t>(c[i])] = true;
> + }
Since we are using set.name [for debugging] without checking, please set
name to "anonymous" or similar if the label parameter is NULL.
> + size_t clen = strlen(c);
This variable can be const.
> + size_t clen = strlen(c);
> + for (size_t i = 0; i < clen; ++i)
> + chars_[static_cast<uint8_t>(c[i])] = true;
Please use add() to add characters to a set.
> + /// add a given char to the character set. c must be >= 0.
> + CharacterSet & add(const unsigned char c) {chars_[static_cast<uint8_t>(c)] = true; return *this; }
The "c must be >= 0" comment is redundant given that "c" is unsigned. If
you are worried about folks adding negative signed chars, try adding a
corresponding add(const signed char) method with a check.
> + /// name of this character set
> + const char * name;
The comment does not really document the data member (the data member
name itself says exactly the same thing already). Consider:
/// optional set label for debugging ("anonymous" by default)
> +#include <vector>
> +
> +class CharacterSet
Missing class description. Consider:
/// An optimized set of C chars,
/// with a quick membership test and merge support.
> + /// characters defined in this set
> + std::vector<bool> chars_; //std::vector<bool> is optimized
s/defined/present/ or s/defined/included/
* CharacterSet creation and merging (+=) are heavy operations that do
not benefit from being inlined. I know it is a pain to deal with
Makefile dependencies, and I do not insist on this change, but please do
seriously consider placing them in .cc file.
FWIW, the following problems can be solved easier or better in a .cc
file. With a little bit of additional work, you can even remove #include
<vector> from CharacterSet users that way!
> + //precondition: src.chars_.size() == chars_.size()
There is no such precondition. I should be able to merge ALPHA and
NUMERIC sets, for example. Rewrite the += operator loop to simply
this->add() every character in the src set. Use std::for_each or another
<algorithm> for that if possible.
> + CharacterSet(const char *label, const char * const c) : name(label), chars_(std::vector<bool>(256,false)) {
> + std::vector<bool>::const_iterator s = src.chars_.begin();
> + const std::vector<bool>::const_iterator e = src.chars_.end();
> + std::vector<bool> chars_; //std::vector<bool> is optimized
Please avoid code duplication by typedef-ing std::vector<bool> as Chars
or similar and using the corresponding type name everywhere.
HTH,
Alex.
Received on Mon Dec 16 2013 - 16:41:11 MST
This archive was generated by hypermail 2.2.0 : Tue Dec 17 2013 - 12:00:11 MST