My opinions on the subject:
Buffer operates on octets and nothing but octets, where each octet is an
8-bit unsigned integer.
String is encoding aware, decomposing those octets into characters.
But I don't see why we would ever need to support UCS-2 or other
wide-character encodings. Within the scope of HTTP and related protocols,
strings are either US-ASCII, UTF-8 or Latin-1, all of which fit nicely in
the octet world. We also do not need encoding-aware upper/lower case
distinction, only US-ASCII case awareness.
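To illustrate what "only US-ASCII case awareness" could mean in practice, here is a minimal sketch. The names asciiToLower and asciiCaseEqual are made up for this example and are not existing functions. Only the 26 letters A-Z are folded, so UTF-8 sequences and Latin-1 octets pass through unchanged.

```cpp
#include <cstddef>

// Hypothetical helper: fold only the US-ASCII letters 'A'..'Z'.
// Octets >= 0x80 (UTF-8 sequences, Latin-1 letters) are returned as-is.
inline unsigned char asciiToLower(unsigned char c)
{
    return (c >= 'A' && c <= 'Z')
        ? static_cast<unsigned char>(c + ('a' - 'A'))
        : c;
}

// Hypothetical helper: case-insensitive equality of two octet ranges,
// using US-ASCII folding only. Lengths are explicit, so embedded NUL
// octets are compared like any other octet.
inline bool asciiCaseEqual(const char *a, std::size_t aLen,
                           const char *b, std::size_t bLen)
{
    if (aLen != bLen)
        return false;
    for (std::size_t i = 0; i < aLen; ++i) {
        const unsigned char x = asciiToLower(static_cast<unsigned char>(a[i]));
        const unsigned char y = asciiToLower(static_cast<unsigned char>(b[i]));
        if (x != y)
            return false;
    }
    return true;
}
```

This would be enough for things like header field names, where "Content-Length" and "content-length" must match, without pulling in any locale or encoding machinery.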
I also agree with Alex that there is no need for < > or == in buffers as
such. A string is fully capable of holding a binary blob. These
operators should in our context always map to memcmp().
The only difference between binary regions and strings for the < > ==
class of operators is the need for case-insensitive operations. But
case-insensitive comparison should use other operators, which leaves
< > == the same in both contexts.
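A rough sketch of what mapping < > == onto memcmp() could look like for two regions of possibly different length. The function name regionCompare is hypothetical, and the rule that a shorter region sorts before a longer one sharing the same prefix is an assumption of this example, not something settled in the discussion.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>

// Hypothetical helper: three-way comparison of two octet regions.
// Returns <0, 0 or >0 like memcmp(). Assumption: when one region is a
// prefix of the other, the shorter one sorts first.
inline int regionCompare(const void *a, std::size_t aLen,
                         const void *b, std::size_t bLen)
{
    const std::size_t common = std::min(aLen, bLen);
    if (common > 0) {
        const int r = std::memcmp(a, b, common);
        if (r != 0)
            return r;
    }
    if (aLen == bLen)
        return 0;
    return aLen < bLen ? -1 : 1;
}
```

Because the lengths are explicit, the same code works for text and for binary blobs with embedded NUL octets, which is why no separate set of operators is needed for the two cases.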
From the discussion it's apparent to me that the current naming
convention isn't the best. Buffer should be String.
The design I'd like to see is
- Low-level refcounted memory area (address, size, refcount).
- Memory area "splitter". (memory area, current used offset). Helper
class that hands out Buffer regions from a memory area. This is the
primary interface for producing Buffer regions.
- Buffer, region of a memory area. (memory area, offset, size).
- String, subclass of buffer adding < > == and strstr operators, plus
case-insensitive variants of == and strstr operators (and maybe < > as
well). No additional data members.
- A StringV container class allowing large strings to be built from a list
of Strings, supporting vector access (for I/O), incremental strstr
searches (with a separate state class) and extracting regions as String
or StringV. Extracting as String may need a copy if the requested area
is not linear in memory.
Of these only the low-level memory area and perhaps StringV need to be
refcounted. Buffer & String are small enough to be copied, or passed as a
const reference in most function/method calls.
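The layering above could be sketched roughly as follows. This is only an illustration under stated assumptions: the names MemArea, MemSplitter, append(), compare(), find() and caseEquals() are invented for the example, std::shared_ptr stands in for a hand-written refcount, and StringV is left out for brevity.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <memory>
#include <utility>

// Low-level memory area (address, size). The refcount is provided by
// std::shared_ptr here, an assumption made to keep the sketch short.
struct MemArea {
    explicit MemArea(std::size_t n) : data(new char[n]), size(n) {}
    std::unique_ptr<char[]> data;
    std::size_t size;
};
using MemAreaPtr = std::shared_ptr<MemArea>;

// Buffer: a region of a memory area (memory area, offset, size).
class Buffer {
public:
    Buffer() = default;
    Buffer(MemAreaPtr a, std::size_t off, std::size_t len)
        : area_(std::move(a)), offset_(off), size_(len) {}
    const char *data() const { return area_ ? area_->data.get() + offset_ : nullptr; }
    std::size_t size() const { return size_; }
private:
    MemAreaPtr area_;
    std::size_t offset_ = 0;
    std::size_t size_ = 0;
};

// Splitter: (memory area, current used offset). Hands out Buffer regions.
class MemSplitter {
public:
    explicit MemSplitter(std::size_t capacity)
        : area_(std::make_shared<MemArea>(capacity)) {}
    // Copies len octets into the unused tail and returns a Buffer for them,
    // or an empty Buffer if the area has no room left.
    Buffer append(const char *src, std::size_t len) {
        if (len > area_->size - used_)
            return Buffer();
        std::memcpy(area_->data.get() + used_, src, len);
        Buffer b(area_, used_, len);
        used_ += len;
        return b;
    }
private:
    MemAreaPtr area_;
    std::size_t used_ = 0;
};

// String: Buffer plus comparison and search. No additional data members.
class String : public Buffer {
public:
    String() = default;
    explicit String(const Buffer &b) : Buffer(b) {}
    // memcmp()-based three-way compare; a shorter prefix sorts first.
    int compare(const String &o) const {
        const std::size_t n = std::min(size(), o.size());
        const int r = n ? std::memcmp(data(), o.data(), n) : 0;
        if (r != 0 || size() == o.size())
            return r;
        return size() < o.size() ? -1 : 1;
    }
    bool operator==(const String &o) const { return compare(o) == 0; }
    bool operator<(const String &o) const { return compare(o) < 0; }
    bool operator>(const String &o) const { return compare(o) > 0; }
    // strstr-like search: pointer to the first match, or nullptr.
    const char *find(const String &needle) const {
        const char *end = data() + size();
        const char *hit = std::search(data(), end, needle.data(),
                                      needle.data() + needle.size());
        return (hit == end && needle.size() > 0) ? nullptr : hit;
    }
    // Case-insensitive equality, folding US-ASCII letters only.
    bool caseEquals(const String &o) const {
        if (size() != o.size())
            return false;
        for (std::size_t i = 0; i < size(); ++i)
            if (lower(data()[i]) != lower(o.data()[i]))
                return false;
        return true;
    }
private:
    static unsigned char lower(char c) {
        const unsigned char u = static_cast<unsigned char>(c);
        return (u >= 'A' && u <= 'Z') ? static_cast<unsigned char>(u + 32) : u;
    }
};
```

In this sketch a Buffer or String is one shared pointer plus two size_t values, so copying it is cheap and never copies the underlying octets. Several regions produced by the same splitter share one memory area, which is released when the last region referring to it goes away.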
I am not 100% sure on the placement of String. It's possible this should
be fully merged into Buffer, but I think it provides a good separation.
It's entirely possible we will end up using String all over the place
with Buffer just being used internally to String and some very low-level
I/O stuff.
Regards
Henrik
Received on Wed Jan 21 2009 - 10:41:43 MST