------art_17604_4118040.1188812565770
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

On 8/31/07, Ruby Quiz <james / grayproductions.net> wrote:
> by John Miller
>
> This week's task is to implement the Rope data structure as a Ruby class.  This
> topic comes out of the ICFP programming competition
> (http://www.icfpcontest.com/) which had competitors manipulating a 7.5 million
> character string this year.

My solution deviates slightly from the problem specification.
The most important difference is that instead of implementing
a binary tree to store the strings, they are all stored in an
array instead.

The class itself is quite long, since I wanted to implement
many of the methods of the built-in String class. Some of the
methods will require significant work to actually implement.
Most notably the Regexp-related methods, since there is no
way to instruct the Regexp matcher to ask for more characters
once it has reached the end of a string.

Actually, all String methods are implemented, by implementing
method missing and delegating to the composed string if the
string class can handle the method. After delegating to the
string class, we convert any String result to a new rope and
we also make sure to replace our content by the string we
delegated to, to make sure that destructive methods works as
expected.

In fact, we replace the content of our rope as soon as to_s
is called. since the reason for  lazy concatenation is to
avoid the concatenation cost, we can just as well keep the
concatenated string when we already had to pay the price.

The benchmark results are:

# String
Build:   0.170000   0.150000   0.320000 (  0.324341)
Sort:   3.470000   3.630000   7.100000 (  7.126741)

# ArrayRope
Build:   0.010000   0.010000   0.020000 (  0.009744)
Sort:   0.130000   0.000000   0.130000 (  0.138330)

# ArrayRope (with dup)
Build:   0.240000   0.160000   0.400000 (  0.402201)
Sort:   0.160000   0.000000   0.160000 (  0.163022)

For the unprotected case, sorting was ~52 times faster,
and building was ~33 times faster.

However, since the string class is not immutable, there is a
slight problem with this approach. The strings could added
to the rope could be modified "under the hood". We can easily
protect against that by making copies of the strings when we
add them to the rope. Since the built-in String shares the
actual data between the two instances (until they change),
this is not so much of a memory issue as one could expect.

By adding dup (in initialize/append/prepend) we end up with
the following times, which trades of some of our speed/memory
for a bit of safety. (This is actually the submitted solution)

Compared with the string approach, building is now (for obvious
reasons) slower than the String, but only about 25%.
Sorting is still much faster than the String case, namely ~44
times as fast.

!g

------art_17604_4118040.1188812565770
Content-Type: application/octet-stream; name=array_rope.rb
Content-Transfer-Encoding: base64
X-Attachment-Id: f_f64rore4
Content-Disposition: attachment; filename="array_rope.rb"

Y2xhc3MgQXJyYXkKICBkZWYgdXBwZXJfYm91bmQodmFsdWUpCiAgICBsbywgaGkgPSAwLCBzaXpl
LTEKICAgIHdoaWxlIGxvIDw9IGhpCiAgCSAgbWlkID0gKGxvICsgaGkpIC8gMgogIAkgIGlmIHZh
bHVlID49IGF0KG1pZCkKCSAgICAgIGxvID0gbWlkICsgMQogIAkgIGVsc2UKCSAgICAgIGhpID0g
bWlkIC0gMQogIAkgIGVuZAogIAllbmQKICAgIGxvCiAgZW5kCmVuZAoKY2xhc3MgQXJyYXlSb3Bl
CiAgYXR0cl9yZWFkZXIgOnNlZ21lbnRzCiAgZGVmIGluaXRpYWxpemUoKnNlZ21lbnRzKQogICAg
cmFpc2Ugc2VnbWVudHMuaW5zcGVjdCBpZiBzZWdtZW50cy5hbnk/IHsgfHN8IHMubmlsPyB9CiAg
ICBAc2VnbWVudHMgPSBzZWdtZW50cy5tYXAgeyB8c3wgcy5kdXAgfQogICAgY29tcHV0ZV9vZmZz
ZXRzCiAgZW5kCgogIGRlZiBjb21wdXRlX29mZnNldHMKICAgIGwgPSAwCiAgICBAb2Zmc2V0cyA9
IEBzZWdtZW50cy5tYXAgeyB8c3wgbCArPSBzLmxlbmd0aCB9CiAgICBAb2Zmc2V0cy51bnNoaWZ0
IDAgICAgCiAgZW5kCgogIGRlZiBsZW5ndGgKICAgIEBvZmZzZXRzLmxhc3QKICBlbmQKICBhbGlh
cyA6c2l6ZSA6bGVuZ3RoCiAgZGVmIGVtcHR5PwogICAgQHNlZ21lbnRzLmVtcHR5PwogIGVuZAoK
ICBkZWYgdG9fcwogICAgY2FzZSBAc2VnbWVudHMuc2l6ZQogICAgd2hlbiAwOyAiIgogICAgd2hl
biAxOyBAc2VnbWVudHMuZmlyc3QuZHVwCiAgICBlbHNlCiAgICAgICMgV2hlbiBjb252ZXJ0aW5n
IHRvIGEgc3RyaW5nLCB3ZSBtdXN0IGRvIHRoZSBleHBlbnNpdmUKICAgICAgIyBjb25jYXRlbmF0
aW9uLCBzbyBsZXRzIGhpamFjayB0aGF0IGluZm9ybWF0aW9uIGFuZCB1c2UgdGhlCiAgICAgICMg
bmV3IHN0cmluZyBpbnRlcm5hbGx5IGFzIHdlbGwuCiAgICAgIHMgPSAoQHNlZ21lbnRzICogIiIp
CiAgICAgIEBzZWdtZW50cywgQG9mZnNldHMgPSBbc10sIFswLHMubGVuZ3RoXQogICAgICBzLmR1
cAogICAgZW5kCiAgZW5kCiAgCiAgZGVmIG1ldGhvZF9taXNzaW5nKG0sICphcmdzLCAmYmxvY2sp
CiAgICAjIE1ha2Ugc3VyZSB3ZSBoYW5kbGUgcmVtYWluaW5nIHN0cmluZyBtZXRob2RzIGFjY29y
ZGluZ2x5CiAgICBpZiAiIi5yZXNwb25kX3RvPyhtKQogICAgICBzID0gdG9fcwogICAgICByID0g
cy5fX3NlbmRfXyhtLCAqYXJncywgJmJsb2NrKQoKICAgICAgIyBHdWVzcyB0aGF0IHRoaXMgaXMg
dGhlIHJpZ2h0IHRoaW5nIHRvIGRvCiAgICAgIGlmIHIuZXF1YWw/IHMKICAgICAgICByID0gc2Vs
ZiAKICAgICAgZWxzaWYgci5pc19hPyBTdHJpbmcKICAgICAgICByID0gQXJyYXlSb3BlLm5ldyBy
CiAgICAgIGVuZAoKICAgICAgIyBUaGUgbWV0aG9kIG1pZ2h0IGNoYW5nZSB0aGUgY29udGVudHMg
b2YgdGhlIHN0cmluZwogICAgICByZXBsYWNlIHMKCiAgICAgIHIKICAgIGVsc2UKICAgICAgc3Vw
ZXIobSwgKmFyZ3MsICZibG9jaykKICAgIGVuZAogIGVuZAogIAogIGRlZiByZXNwb25kc190bz8o
bSkKICAgICMgTWFrZSBzdXJlIHdlIGhhbmRsZSByZW1haW5pbmcgc3RyaW5nIG1ldGhvZHMgYWNj
b3JkaW5nbHkKICAgIHN1cGVyIHx8ICIiLnJlc3BvbmRzX3RvPyhtKQogIGVuZAogIAogIGRlZiBh
cHBlbmQoKnNlZ21lbnRzKQogICAgc2VnbWVudHMuZWFjaCBkbyB8c3wKICAgICAgaWYgcy5pc19h
PyBBcnJheVJvcGUKICAgICAgICBhcHBlbmQgKnMuc2VnbWVudHMKICAgICAgZWxzaWYgIXMuZW1w
dHk/CiAgICAgICAgQHNlZ21lbnRzIDw8IHMuZHVwCiAgICAgICAgQG9mZnNldHMgPDwgQG9mZnNl
dHMubGFzdCArIHMubGVuZ3RoCiAgICAgIGVuZAogICAgZW5kCiAgICBzZWxmCiAgZW5kCiAgYWxp
YXMgOjw8IDphcHBlbmQKICBkZWYgcHJlcGVuZCgqc2VnbWVudHMpCiAgICBzZWdtZW50cy5lYWNo
IGRvIHxzfAogICAgICBpZiBzLmlzX2E/IEFycmF5Um9wZQogICAgICAgIHByZXBlbmQgKnMuc2Vn
bWVudHMKICAgICAgZWxzaWYgIXMuZW1wdHk/CiAgICAgICAgQHNlZ21lbnRzLnVuc2hpZnQgcy5k
dXAKICAgICAgICBjb21wdXRlX29mZnNldHMKICAgICAgZW5kCiAgICBlbmQKICAgIHNlbGYKICBl
bmQKICAKICBkZWYgc2xpY2Uoc3RhcnQsbGVuZ3RoPW5pbCkKICAgIHJldHVybiB0b19zLnNsaWNl
KHN0YXJ0LCBsZW5ndGh8fDApIGlmIHN0YXJ0LmlzX2E/IFJlZ2V4cAogICAgcmV0dXJuIGluZGV4
KHN0YXJ0KSBpZiBzdGFydC5pc19hPyBTdHJpbmcKICAgIGlmICFsZW5ndGggJiYgc3RhcnQuaXNf
YT8oUmFuZ2UpCiAgICAgIGxlbmd0aCA9IHN0YXJ0Lmxhc3QgLSBzdGFydC5maXJzdCAtIChleGNs
dWRlX2VuZD8gPyAxIDogMCkKICAgICAgc3RhcnQgPSBzdGFydC5maXJzdAogICAgZW5kCiAgICBz
dGFydCA9IHNlbGYubGVuZ3RoICsgc3RhcnQgaWYgc3RhcnQgPCAwCiAgICBpZiBzdGFydCA8IDAg
fHwgc3RhcnQgPiBzZWxmLmxlbmd0aAogICAgICBuaWwKICAgIGVsc2lmICFsZW5ndGgKICAgICAg
QHNlZ21lbnRzLmRldGVjdCB7IHxzfCAoc3RhcnQgLT0gcy5sZW5ndGgpIDwgMCB9LnNsaWNlKHN0
YXJ0KQogICAgZWxzZQogICAgICBpbmRleCA9IEBvZmZzZXRzLnVwcGVyX2JvdW5kKHN0YXJ0KSAt
IDEKICAgICAgcm9wZSA9IEFycmF5Um9wZS5uZXcoCiAgICAgICAgQHNlZ21lbnRzLmF0KGluZGV4
KS5zbGljZShzdGFydCAtIEBvZmZzZXRzLmF0KGluZGV4KSwgbGVuZ3RoKSkKICAgICAgcmVtYWlu
ID0gbGVuZ3RoIC0gcm9wZS5sZW5ndGgKICAgICAgd2hpbGUgcmVtYWluID4gMCAmJiAoaW5kZXgg
Kz0gMSkgPCBAc2VnbWVudHMubGVuZ3RoCiAgICAgICAgcm9wZSA8PCBAc2VnbWVudHMuYXQoaW5k
ZXgpLnNsaWNlKDAsIHJlbWFpbikKICAgICAgICByZW1haW4gPSBsZW5ndGggLSByb3BlLmxlbmd0
aAogICAgICBlbmQKICAgICAgcm9wZQogICAgZW5kCiAgZW5kCiAgYWxpYXMgOltdIDpzbGljZQog
IAogIGRlZiBkdXA7IEFycmF5Um9wZS5uZXcoKkBzZWdtZW50cyk7IGVuZAogIGFsaWFzIDpjbG9u
ZSA6ZHVwCiAgCiAgZGVmICsob3RoZXIpOyBkdXAgPDwgb3RoZXI7IGVuZAoKICBpbmNsdWRlIEVu
dW1lcmFibGUKICBkZWYgZWFjaCAmYmxvY2sKICAgIGxhc3QgPSBuaWwKICAgIEBzZWdtZW50cy5l
YWNoIGRvIHxzfAogICAgICBzLmVhY2ggZG8gfGx8CiAgICAgICAgaWYgbFstMV0gPT0gP1xuCiAg
ICAgICAgICB5aWVsZChsYXN0ID8gIiN7bGFzdH0je2x9IiA6IGwpCiAgICAgICAgICBsYXN0ID0g
bmlsCiAgICAgICAgZWxzZQogICAgICAgICAgbGFzdCA9ICIje2xhc3R9I3tsfSIKICAgICAgICBl
bmQKICAgICAgZW5kCiAgICBlbmQKICAgIHlpZWxkIGxhc3QgaWYgbGFzdAogICAgc2VsZgogIGVu
ZAogIGRlZiBlYWNoX2J5dGUgJmJsb2NrCiAgICBAc2VnbWVudHMuZWFjaCB7IHxzfCBzLmVhY2hf
Ynl0ZSAmYmxvY2sgfQogICAgc2VsZgogIGVuZAoKICBpbmNsdWRlIENvbXBhcmFibGUKICAld1s8
PT4gY2FzZWNtcF0uZWFjaCBkbyB8bXwKICAgIG1vZHVsZV9ldmFsIDw8LUVPTQogICAgICBkZWYg
I3ttfShvdGhlcikKICAgICAgICByZXN1bHQgPSAwCiAgICAgICAgQHNlZ21lbnRzLnppcChAb2Zm
c2V0cykgZG8gfHNlZ21lbnQsb2Zmc2V0fAogICAgICAgICAgcmVzdWx0ID0gb3RoZXIuc2xpY2Uo
b2Zmc2V0LCBzZWdtZW50Lmxlbmd0aCkuI3ttfShzZWdtZW50KQogICAgICAgICAgYnJlYWsgaWYg
cmVzdWx0ICE9IDAKICAgICAgICBlbmQKICAgICAgICAtcmVzdWx0CiAgICAgIGVuZAogICAgRU9N
CiAgZW5kCiAgZGVmID09KG90aGVyKQogICAgQG9mZnNldHMubGFzdCA9PSBvdGhlci5sZW5ndGgg
JiYgKHNlbGYgPD0+IG90aGVyKSA9PSAwCiAgZW5kCiAgYWxpYXMgOmVxbD8gOj09CiAgCiAgZGVm
IGhhc2g7IEBzZWdtZW50cy5pbmplY3QoMCkgeyB8aCwgc3wgaCArIHMuaGFzaCB9OyBlbmQKCiAg
ZGVmIHJlcGxhY2Ugc3RyCiAgICBAc2VnbWVudHMsIEBvZmZzZXRzID0gW10sIFswXQogICAgYXBw
ZW5kIHN0cgogIGVuZAoKICAld1tkb3duY2FzZSB1cGNhc2UgY2FwaXRhbGl6ZSBzd2FwY2FzZV0u
ZWFjaCBkbyB8bXwKICAgIG1vZHVsZV9ldmFsIDw8LUVPTQogICAgICBkZWYgI3ttfSEKICAgICAg
ICBAc2VnbWVudHMuaW5qZWN0KG5pbCkgZG8gfHIsc3wKICAgICAgICAgIHMuI3sgbSB9ISA/IHNl
bGYgOiByCiAgICAgICAgZW5kCiAgICAgIGVuZAogICAgRU9NCiAgZW5kCiAgCiAgZGVmIHJzdHJp
cCEKICAgIHdoaWxlIEBzZWdtZW50cy5sYXN0LnJzdHJpcCEKICAgICAgaWYgQHNlZ21lbnRzLmxh
c3QuZW1wdHk/CiAgICAgICAgQHNlZ21lbnRzLnBvcAogICAgICAgIEBvZmZzZXRzLnBvcAogICAg
ICBlbHNlCiAgICAgICAgQG9mZnNldHNbLTFdID0gQG9mZnNldHNbLTJdICsgQHNlZ21lbnRzWy0x
XS5sZW5ndGgKICAgICAgICByZXR1cm4gc2VsZgogICAgICBlbmQKICAgIGVuZAogIGVuZAoKICBk
ZWYgbHN0cmlwIQogICAgd2hpbGUgQHNlZ21lbnRzLmZpcnN0LmxzdHJpcCEKICAgICAgaWYgQHNl
Z21lbnRzLmZpcnN0LmVtcHR5PwogICAgICAgIEBzZWdtZW50cy5zaGlmdAogICAgICBlbHNlCiAg
ICAgICAgY29tcHV0ZV9vZmZzZXRzCiAgICAgICAgcmV0dXJuIHNlbGYKICAgICAgZW5kCiAgICBl
bmQKICBlbmQKCiAgZGVmIHN0cmlwIQogICAgcnN0cmlwISA/IGxzdHJpcCEgfHwgc2VsZiA6IGxz
dHJpcCEKICBlbmQKCiAgZGVmIGNob3AhCiAgICB1bmxlc3MgQHNlZ21lbnRzLmVtcHR5PwogICAg
ICBpZiBAc2VnbWVudHMubGFzdC5jaG9wIQogICAgICAgIGlmIEBzZWdtZW50cy5sYXN0LmVtcHR5
PwogICAgICAgICAgQHNlZ21lbnRzLnBvcAogICAgICAgICAgQG9mZnNldHMucG9wCiAgICAgICAg
ZWxzZQogICAgICAgICAgQG9mZnNldHNbLTFdIC09IDEKICAgICAgICBlbmQKICAgICAgICByZXR1
cm4gc2VsZgogICAgICBlbmQKICAgIGVuZAogIGVuZAogIAogIGRlZiBjaG9tcCEoc2VwPSQvKQog
ICAgdW5sZXNzIEBzZWdtZW50cy5lbXB0eT8KICAgICAgaWYgQHNlZ21lbnRzLmxhc3QuY2hvbXAh
KHNlcCkKICAgICAgICBpZiBAc2VnbWVudHMubGFzdC5lbXB0eT8KICAgICAgICAgIEBzZWdtZW50
cy5wb3AKICAgICAgICAgIEBvZmZzZXRzLnBvcAogICAgICAgIGVsc2UKICAgICAgICAgIEBvZmZz
ZXRzWy0xXSAtPSAxCiAgICAgICAgZW5kCiAgICAgICAgcmV0dXJuIHNlbGYKICAgICAgZW5kCiAg
ICBlbmQKICBlbmQKICAKICBkZWYgbmV4dCEKICAgIChAc2VnbWVudHMubGVuZ3RoLTEpLmRvd250
bygwKSBkbyB8aXwKICAgICAgcyA9IEBzZWdtZW50cy5hdChpKQogICAgICBuID0gcy5uZXh0CiAg
ICAgIHIgPSBuLmxlbmd0aCA+IHMubGVuZ3RoCiAgICAgIHMucmVwbGFjZSBuCiAgICAgIHMuc2xp
Y2UhKC0xKSBpZiByICYmIGkgPiAwCiAgICAgIGJyZWFrIHVubGVzcyByCiAgICBlbmQKICAgIHNl
bGYKICBlbmQKCiAgJXdbZG93bmNhc2UgdXBjYXNlIGNhcGl0YWxpemUgc3dhcGNhc2UgbHN0cmlw
IHJzdHJpcCBzdHJpcCBjaG9wIGNob21wIG5leHRdLmVhY2ggZG8gfG18CiAgICBtb2R1bGVfZXZh
bCA8PC1FT00KICAgICAgZGVmICN7bX1zdHJpcAogICAgICAgIGQgPSBkdXAKICAgICAgICBkLiN7
bX1zdHJpcCEKICAgICAgICBkCiAgICAgIGVuZAogICAgRU9NCiAgZW5kCmVuZAo------art_17604_4118040.1188812565770--