On 9/4/07, Gustav Munkby <grddev / gmail.com> wrote:
> > CPU(user+sys,sec)   mem(peak,MB)
> > ------------------- ------------
> >  build  sort  total build  sort  class
> >  -----  ----  ----- -----  ----  -----
> >   0.10  1.70   1.80   287  1327  String
> >   0.01  0.27   0.28    22   153  Porth::Rope
> >   0.02  0.83   0.85    22    34  Choudhury::Rope *
> >   0.02  0.06   0.08    22    29  Mahurin::StringRope +
> >   0.00  0.08   0.08    22    30  Mahurin::DenormalStringRope +
> >   0.02  0.14   0.16    22    29  Mahurin::MutableStringRope
> >   0.07  0.10   0.17   151   151  Munkby::ArrayRope
> >   0.02  0.73   0.75    22   655  Kalenkovich::Rope *
>
> Thanks for running the tests!  I do think it is a shame, however,
> that the table does not include the notion of safety that I mentioned
> in my submission. If any of these classes should every go into a
> gem as ara mentioned, the user of that gem at least ought to be
> able to choose the safe version.
>
> Here is a very simple scenario where such a problem is introduced.
> One could probably do worse changes to the source strings, but this
> was the first example I came up with.
>
> ss = %w[test append]
> r1, r2 = ss.map {|x| Mahurin::StringRope.new(x) }
> r1 <<= r2
> ss.first << "ing"
> r1.length # => 10
> r1.to_s.length # => 13
>
> I have only tested Mahurin::StringRope, but would like to know for
> which other classes I need to be extra careful when adding strings
> to the rope. I did find, however, that by removing my "safety dups",
> the combined running time (on my machine) is about the same for
> my solution as for Mahurin::StringRope.

My classes that do the work are are immutable (there is a wrapper
class that makes a mutable rope).  I assumed that original data you
give (strings) won't change.  The caller should know whether the input
data will change or not and dup if necessary.

Some of the methods in your ArrayRope do modify the underlying
strings.  Because of that, you really need to dup the input.  But, you
could easily fix those few methods to dup when you want to make a
change.  I made a version of your code that doesn't do the dups up
front and benchmarked it.

I assume most of the other mutable rope solutions have similar
problems and need to implement a proper COW (copy-on-write) solution.

At first I thought your solution was linear to do a slice, but then I
noticed that you do a binary search.  I applaud your solution.  I
think many C++ STL deque implementations use this same scheme (two
level array).  Maybe that is why a rope isn't officially in STL -
because deque is good enough.

Here are the new results along with your non-dup solution and
Kalenkovich's fixes:

CPU(user+sys,sec)   mem(peak,MB)
------------------- ------------
 build  sort  total build  sort  class
 -----  ----  ----- -----  ----  -----
  0.10  1.70   1.80   287  1327  String
  0.01  0.27   0.28    22   153  Porth::Rope
  0.02  0.83   0.85    22    34  Choudhury::Rope *
  0.02  0.06   0.08    22    29  Mahurin::StringRope +
  0.00  0.08   0.08    22    30  Mahurin::DenormalStringRope +
  0.02  0.14   0.16    22    29  Mahurin::MutableStringRope
  0.07  0.10   0.17   151   151  Munkby::ArrayRope
  0.00  0.09   0.09    22    22  Munkby::ArrayRope (no dup)
  0.01  0.12   0.13    22    22  Kalenkovich::Rope (fixed)

CPU : minimum from 40 iterations
mem : peak over the 40 iterations

* : self-checking failed
+ : immutable benchmark needed


As Eugene Kalenkovich mentioned my benchmark test shouldn't have
excepted self to be returned from normalize for mutable ropes.  I
fixed this in my test.