Forgot to respond to the previous post.
Apologies on my part, as well.

In data 3/20/2005, "Nikolai Weibull"
<mailing-lists.ruby-talk / rawuncut.elitemail.org> ha scritto:

>* Glenn Parker (Mar 20, 2005 17:10):
>> > > > Well, the problem is that a String is represented by one long
>> > > > sequence of bytes in memory.
>
>> > > AFAIK, that is not actually a requirement for a Ruby
>> > > implementation.
>
>> > What does that mean?  That is how it's implemented in Ruby.  There's
>> > no definition of how Ruby should be implemented.
>
>> Sure there is: a Ruby implementation should implement the Ruby
>> language as currently defined.  What I'm saying is that there is
>> nothing in the definition of the Ruby language that forces an
>> implementor to use contiguous blocks of memory for strings or arrays.
>
>Well, yes there is.  You are forgetting the C interface:
>RSTRING(str)->ptr is used in a lot of modules.

YARV could be considered an implementation of ruby, so it's
not _necessarily_ the same although this is just a semantic
point.

>> Managing large strings as linked chunks is something text editors
>> already do for performance.
>
>It's not a very good representation for a text editor, though.  The
>lookup method becomes O(n), which is just terrible.  Read [1] for an
>overview of representing sequences.  I have already mentioned Ropes a
>couple of times as a possible solution to the problem of dealing with
>large strings.
>
>> Assertions about how strings should or should not be used are based,
>> in part, on assumptions that should be questioned.
>
>Yes, but the whole point is that they haven't and it seems that people
>don't realize how their implementation limits their use.  A String is
>simply not well-suited for sequences longer than some certain limit
>(depending on system and so on).  There are ways to alleviate these
>issues.  For me, the most obviously good solution would be to add
>another sequence data structure that would fit use-cases where long
>sequences are expected.

Ropes are used fairly widely and would be a good solution to this
problem. In addition, I see no reason why Strings and Ropes could
not use the same idiom as Fixnum and Bignum, where the former is
automatically converted (but the latter may also be explicitly
instantiated).

>> > The implementation is the standard.
>
>> I would rather say: The most widely used implementation is the de
>> facto standard.  :)
>
>Which, in this case, is like saying what I already said.
>
>> > That isn't to say that they are the final word of how a String
>> > should be represented, but if it will, in a future release, be
>> > represented in some other way, perhaps having destructive versus
>> > non-destructive methods will make no sense,
>
>> I think what I'm suggesting would actually make destructive methods
>> more appealing, without reducing the object creation overhead for
>> non-destructive methods.
>
>I don't see how.  One of the gains of breaking the string up is that
>parts of it can be reused in strings based on the original, e.g.,
>"abc" + "def" can be implemented as an O(1)-operation where we simply
>create a string that points to the two substrings.  slice can be
>implemented by pointing into the string we are taking a slice of and
>slice! can be implemented by breaking the string up and pointing to the
>two parts of the string that remain after the deletion.  Implementing
>slice and slice! in this manner means that they will be equally
>efficient, both time- and space-wise.
>
>See [1] and [2] for further discussions on the ideas I have presented
>here.
>
>A representation of a sequence as a sequence of continuous bytes in
>memory is very efficient for lookups and it's space optimal.  Most
>operations will perform very efficiently for shorter sequences as well,
>but degrades horribly for long sequences.  I am not advocating that the
>"sequence of continuous bytes" implementation for String be replaced, it
>fits very well with many use-cases.  What I am advocating is that we
>should realize that it doesn't fit every case and that a second sequence
>data structure should be added for dealing with the cases where a String
>isn't optimal.  I feel that most of the destructive methods on Strings
>are optimizing the wrong problem.  It's not excessive copying that
>should be avoided, but excessive computation.
>
>My solution will only solve the destructive vs. non-destructive problem
>for those methods that deal with substring of the original string, e.g.,
>concat, chomp, delete, gsub, lstrip, rstrip, slice, strip, sub.  For
>capitalize, downcase, swapcase, tr, tr_s, and upcase, one would still
>have to create a new copy.  I wonder, though, how often the difference
>between capitalize and capitalize! in execution speed actually makes a
>difference,

Interesting... I certainly see the validity of your call to optimize
the nondestructive operations; I'm sure the performance can be reasonably
close to destructive when properly designed. However, it will still
have an impact in _some_ programs even when we're dealing with these
minimal differences.

And I still assert that it's conceptually better to s.chomp! than
to s = s.chomp :)

>        nikolai
>
>[1] http://citeseer.ist.psu.edu/crowley98data.html
>[2] http://rubyurl.com/rubyurl/show/2FRbO (PDF)

E