Re: pseudo-specs for a String class

From: Henrik Nordstrom <henrik_at_henriknordstrom.net>
Date: Wed, 27 Aug 2008 01:33:47 +0200

On Wed, 2008-08-27 at 00:24 +0200, Kinkie wrote:

> This is quite different from my current approach, by which Strings get
> created and drive the instantiations of Bufs (MemoryRegions).
> I feel that you'd be trying to reimplement parts of the memory
> manager. Maximum efficiency, at the expense of quite a bit of
> flexibility.

MemoryChunk (not Region).

Both modes are needed. It depends on the use.

String will create the MemoryChunk automatically if passed the data.

But some data sources, such as networking, have other needs and work
better the other way around: they provide the data first, and Strings
are then created from that same buffer. Yes, it's possible to build an
interface for this using only String, by introducing a special truncate
operation which frees data in the MemoryChunk (Buf) via String, but that
exposes an operation which is not always safe.
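
To make the two modes concrete, here is a minimal sketch. The class layout, the shared_ptr ownership and every member name are assumptions made for illustration only, not the proposed API: one constructor lets String create its own MemoryChunk from the data, the other builds a String over a chunk that already exists (for example one filled by a network read).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for the MemoryChunk (Buf) discussed above.
struct MemoryChunk {
    std::vector<char> data;
};

// Hypothetical String: an (offset, length) window into a shared chunk.
class String {
public:
    // Mode 1: String is passed the data and creates the chunk itself.
    String(const char *src, std::size_t len)
        : chunk_(std::make_shared<MemoryChunk>()), off_(0), len_(len) {
        chunk_->data.assign(src, src + len);
    }

    // Mode 2: the chunk exists first and the String refers to part of it.
    String(std::shared_ptr<MemoryChunk> chunk, std::size_t off, std::size_t len)
        : chunk_(std::move(chunk)), off_(off), len_(len) {
        assert(off_ + len_ <= chunk_->data.size());
    }

    const char *data() const { return chunk_->data.data() + off_; }
    std::size_t size() const { return len_; }
    bool sharesChunkWith(const String &o) const { return chunk_ == o.chunk_; }

private:
    std::shared_ptr<MemoryChunk> chunk_;
    std::size_t off_;
    std::size_t len_;
};
```

In the second mode several Strings can point into the same buffer without any copying, which is what a network reader wants.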

> Hm... interesting for annotation purposes, but is it really significant?

The difference between String and MemoryRegion? Not sure. But it also
doesn't hurt as you can cast freely between the two (even when using
references).

> My thoughts: \0 is special, and would only be significant when strings
> need to be exported from the memory-managed code onto nonmanaged code.

Yes.

> Generally speaking, the safest way to do so is by copy rather than by
> reference, but I'd rather also keep the ability to export by reference
> - hoping the caller knows what they're doing. In that case the \0 is a
> must-have safeguard, in some cases might require copying. Unfortunate
> but unavoidable.

Agreed.
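
A rough sketch of the two export paths, with hypothetical function names and a plain buffer/offset/length interface assumed purely for illustration: exporting by copy always yields a \0-terminated result, while exporting by reference only works if a \0 already sits directly after the string inside its buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Export by copy: always safe, the copy carries its own trailing \0.
std::string exportByCopy(const char *data, std::size_t len) {
    return std::string(data, len);
}

// Export by reference: returns a pointer into the buffer only when the byte
// right after the string is a \0 that still lies inside the buffer.
// Returns nullptr otherwise, in which case the caller has to copy.
const char *exportByReference(const char *buf, std::size_t bufLen,
                              std::size_t off, std::size_t len) {
    assert(off + len <= bufLen);
    if (off + len < bufLen && buf[off + len] == '\0')
        return buf + off;
    return nullptr;
}
```

Note that this sketch does not deal with \0 bytes embedded in the string itself; unmanaged code would see such a string as truncated.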

> > I think we are at the point
> > where we can fully drop the \0 without too much headache, but it's
> > also true that in all cases where we tokenise a string there are
> > separators we can nuke and replace by \0's... However, with the \0
> > casting between MemoryRegion and String is tricky (needs to copy if
> > there is no \0) and tokenising gets destructive as it destroys the
> > original string by replacing separators by \0.
>
> Well, tokenising should be replaced by substringing really.. it could
> mean having to drop strtok().

Substringing is a form of tokenising: splitting a long String into its
components. How that's done is an implementation detail.
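
For illustration, a non-destructive split could look roughly like the sketch below. It uses std::string as a stand-in for String, and the function name and the (offset, length) return type are made up for this example. Unlike strtok(), it never writes \0 into the input; it only reports where each component lies.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Returns (offset, length) pairs for each non-empty component of s.
// The input is left untouched; empty components are skipped, as with strtok().
std::vector<std::pair<std::size_t, std::size_t>>
splitRanges(const std::string &s, char sep) {
    std::vector<std::pair<std::size_t, std::size_t>> out;
    std::size_t start = 0;
    while (start <= s.size()) {
        std::size_t end = s.find(sep, start);
        if (end == std::string::npos)
            end = s.size();
        if (end > start)
            out.emplace_back(start, end - start);
        start = end + 1;
    }
    return out;
}
```

Each pair could then back a substring that shares the original buffer, so the original string stays usable after splitting.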

> > Other modifications of String/MemoryRegion content generally require a
> > COW operation.
>
> It depends: I expect a rather common case to be when only one String
> owns a Buf/MemoryBlob. In that case modifications are cheap.

That's a very common COW optimization, and assumed.
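
As a minimal single-threaded sketch of that optimization (the CowString class and its members are hypothetical, and std::shared_ptr stands in for whatever reference counting the real Buf would use): a write copies the buffer only when it is shared, and modifies it in place when this String is the sole owner.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical copy-on-write string; copies of it share one buffer.
class CowString {
public:
    explicit CowString(const std::string &s)
        : buf_(std::make_shared<std::string>(s)) {}

    char at(std::size_t i) const { return buf_->at(i); }

    void set(std::size_t i, char c) {
        // Shared buffer: detach into a private copy before writing.
        // Sole owner: skip the copy and write in place (the cheap case).
        if (buf_.use_count() > 1)
            buf_ = std::make_shared<std::string>(*buf_);
        buf_->at(i) = c;
    }

    bool sharesBufferWith(const CowString &o) const { return buf_ == o.buf_; }

private:
    std::shared_ptr<std::string> buf_;
};
```

The use_count() check is only reliable without concurrent owners; a threaded design would need something stronger.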

Regards
Henrik

Received on Tue Aug 26 2008 - 23:33:56 MDT
