------ art_17604_4118040.1188812565770 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline On 8/31/07, Ruby Quiz <james / grayproductions.net> wrote: > by John Miller > > This week's task is to implement the Rope data structure as a Ruby class. This > topic comes out of the ICFP programming competition > (http://www.icfpcontest.com/) which had competitors manipulating a 7.5 million > character string this year. My solution deviates slightly from the problem specification. The most important difference is that instead of implementing a binary tree to store the strings, they are all stored in an array instead. The class itself is quite long, since I wanted to implement many of the methods of the built-in String class. Some of the methods will require significant work to actually implement. Most notably the Regexp-related methods, since there is no way to instruct the Regexp matcher to ask for more characters once it has reached the end of a string. Actually, all String methods are implemented, by implementing method missing and delegating to the composed string if the string class can handle the method. After delegating to the string class, we convert any String result to a new rope and we also make sure to replace our content by the string we delegated to, to make sure that destructive methods works as expected. In fact, we replace the content of our rope as soon as to_s is called. since the reason for lazy concatenation is to avoid the concatenation cost, we can just as well keep the concatenated string when we already had to pay the price. The benchmark results are: # String Build: 0.170000 0.150000 0.320000 ( 0.324341) Sort: 3.470000 3.630000 7.100000 ( 7.126741) # ArrayRope Build: 0.010000 0.010000 0.020000 ( 0.009744) Sort: 0.130000 0.000000 0.130000 ( 0.138330) # ArrayRope (with dup) Build: 0.240000 0.160000 0.400000 ( 0.402201) Sort: 0.160000 0.000000 0.160000 ( 0.163022) For the unprotected case, sorting was ~52 times faster, and building was ~33 times faster. However, since the string class is not immutable, there is a slight problem with this approach. The strings could added to the rope could be modified "under the hood". We can easily protect against that by making copies of the strings when we add them to the rope. Since the built-in String shares the actual data between the two instances (until they change), this is not so much of a memory issue as one could expect. By adding dup (in initialize/append/prepend) we end up with the following times, which trades of some of our speed/memory for a bit of safety. (This is actually the submitted solution) Compared with the string approach, building is now (for obvious reasons) slower than the String, but only about 25%. Sorting is still much faster than the String case, namely ~44 times as fast. !g ------ art_17604_4118040.1188812565770 Content-Type: application/octet-stream; name=array_rope.rb Content-Transfer-Encoding: base64 X-Attachment-Id: f_f64rore4 Content-Disposition: attachment; filename="array_rope.rb" Y2xhc3MgQXJyYXkKICBkZWYgdXBwZXJfYm91bmQodmFsdWUpCiAgICBsbywgaGkgPSAwLCBzaXpl LTEKICAgIHdoaWxlIGxvIDw9IGhpCiAgCSAgbWlkID0gKGxvICsgaGkpIC8gMgogIAkgIGlmIHZh bHVlID49IGF0KG1pZCkKCSAgICAgIGxvID0gbWlkICsgMQogIAkgIGVsc2UKCSAgICAgIGhpID0g bWlkIC0gMQogIAkgIGVuZAogIAllbmQKICAgIGxvCiAgZW5kCmVuZAoKY2xhc3MgQXJyYXlSb3Bl CiAgYXR0cl9yZWFkZXIgOnNlZ21lbnRzCiAgZGVmIGluaXRpYWxpemUoKnNlZ21lbnRzKQogICAg cmFpc2Ugc2VnbWVudHMuaW5zcGVjdCBpZiBzZWdtZW50cy5hbnk/IHsgfHN8IHMubmlsPyB9CiAg ICBAc2VnbWVudHMgPSBzZWdtZW50cy5tYXAgeyB8c3wgcy5kdXAgfQogICAgY29tcHV0ZV9vZmZz ZXRzCiAgZW5kCgogIGRlZiBjb21wdXRlX29mZnNldHMKICAgIGwgPSAwCiAgICBAb2Zmc2V0cyA9 IEBzZWdtZW50cy5tYXAgeyB8c3wgbCArPSBzLmxlbmd0aCB9CiAgICBAb2Zmc2V0cy51bnNoaWZ0 IDAgICAgCiAgZW5kCgogIGRlZiBsZW5ndGgKICAgIEBvZmZzZXRzLmxhc3QKICBlbmQKICBhbGlh cyA6c2l6ZSA6bGVuZ3RoCiAgZGVmIGVtcHR5PwogICAgQHNlZ21lbnRzLmVtcHR5PwogIGVuZAoK ICBkZWYgdG9fcwogICAgY2FzZSBAc2VnbWVudHMuc2l6ZQogICAgd2hlbiAwOyAiIgogICAgd2hl biAxOyBAc2VnbWVudHMuZmlyc3QuZHVwCiAgICBlbHNlCiAgICAgICMgV2hlbiBjb252ZXJ0aW5n IHRvIGEgc3RyaW5nLCB3ZSBtdXN0IGRvIHRoZSBleHBlbnNpdmUKICAgICAgIyBjb25jYXRlbmF0 aW9uLCBzbyBsZXRzIGhpamFjayB0aGF0IGluZm9ybWF0aW9uIGFuZCB1c2UgdGhlCiAgICAgICMg bmV3IHN0cmluZyBpbnRlcm5hbGx5IGFzIHdlbGwuCiAgICAgIHMgPSAoQHNlZ21lbnRzICogIiIp CiAgICAgIEBzZWdtZW50cywgQG9mZnNldHMgPSBbc10sIFswLHMubGVuZ3RoXQogICAgICBzLmR1 cAogICAgZW5kCiAgZW5kCiAgCiAgZGVmIG1ldGhvZF9taXNzaW5nKG0sICphcmdzLCAmYmxvY2sp CiAgICAjIE1ha2Ugc3VyZSB3ZSBoYW5kbGUgcmVtYWluaW5nIHN0cmluZyBtZXRob2RzIGFjY29y ZGluZ2x5CiAgICBpZiAiIi5yZXNwb25kX3RvPyhtKQogICAgICBzID0gdG9fcwogICAgICByID0g cy5fX3NlbmRfXyhtLCAqYXJncywgJmJsb2NrKQoKICAgICAgIyBHdWVzcyB0aGF0IHRoaXMgaXMg dGhlIHJpZ2h0IHRoaW5nIHRvIGRvCiAgICAgIGlmIHIuZXF1YWw/IHMKICAgICAgICByID0gc2Vs ZiAKICAgICAgZWxzaWYgci5pc19hPyBTdHJpbmcKICAgICAgICByID0gQXJyYXlSb3BlLm5ldyBy CiAgICAgIGVuZAoKICAgICAgIyBUaGUgbWV0aG9kIG1pZ2h0IGNoYW5nZSB0aGUgY29udGVudHMg b2YgdGhlIHN0cmluZwogICAgICByZXBsYWNlIHMKCiAgICAgIHIKICAgIGVsc2UKICAgICAgc3Vw ZXIobSwgKmFyZ3MsICZibG9jaykKICAgIGVuZAogIGVuZAogIAogIGRlZiByZXNwb25kc190bz8o bSkKICAgICMgTWFrZSBzdXJlIHdlIGhhbmRsZSByZW1haW5pbmcgc3RyaW5nIG1ldGhvZHMgYWNj b3JkaW5nbHkKICAgIHN1cGVyIHx8ICIiLnJlc3BvbmRzX3RvPyhtKQogIGVuZAogIAogIGRlZiBh cHBlbmQoKnNlZ21lbnRzKQogICAgc2VnbWVudHMuZWFjaCBkbyB8c3wKICAgICAgaWYgcy5pc19h PyBBcnJheVJvcGUKICAgICAgICBhcHBlbmQgKnMuc2VnbWVudHMKICAgICAgZWxzaWYgIXMuZW1w dHk/CiAgICAgICAgQHNlZ21lbnRzIDw8IHMuZHVwCiAgICAgICAgQG9mZnNldHMgPDwgQG9mZnNl dHMubGFzdCArIHMubGVuZ3RoCiAgICAgIGVuZAogICAgZW5kCiAgICBzZWxmCiAgZW5kCiAgYWxp YXMgOjw8IDphcHBlbmQKICBkZWYgcHJlcGVuZCgqc2VnbWVudHMpCiAgICBzZWdtZW50cy5lYWNo IGRvIHxzfAogICAgICBpZiBzLmlzX2E/IEFycmF5Um9wZQogICAgICAgIHByZXBlbmQgKnMuc2Vn bWVudHMKICAgICAgZWxzaWYgIXMuZW1wdHk/CiAgICAgICAgQHNlZ21lbnRzLnVuc2hpZnQgcy5k dXAKICAgICAgICBjb21wdXRlX29mZnNldHMKICAgICAgZW5kCiAgICBlbmQKICAgIHNlbGYKICBl bmQKICAKICBkZWYgc2xpY2Uoc3RhcnQsbGVuZ3RoPW5pbCkKICAgIHJldHVybiB0b19zLnNsaWNl KHN0YXJ0LCBsZW5ndGh8fDApIGlmIHN0YXJ0LmlzX2E/IFJlZ2V4cAogICAgcmV0dXJuIGluZGV4 KHN0YXJ0KSBpZiBzdGFydC5pc19hPyBTdHJpbmcKICAgIGlmICFsZW5ndGggJiYgc3RhcnQuaXNf YT8oUmFuZ2UpCiAgICAgIGxlbmd0aCA9IHN0YXJ0Lmxhc3QgLSBzdGFydC5maXJzdCAtIChleGNs dWRlX2VuZD8gPyAxIDogMCkKICAgICAgc3RhcnQgPSBzdGFydC5maXJzdAogICAgZW5kCiAgICBz dGFydCA9IHNlbGYubGVuZ3RoICsgc3RhcnQgaWYgc3RhcnQgPCAwCiAgICBpZiBzdGFydCA8IDAg fHwgc3RhcnQgPiBzZWxmLmxlbmd0aAogICAgICBuaWwKICAgIGVsc2lmICFsZW5ndGgKICAgICAg QHNlZ21lbnRzLmRldGVjdCB7IHxzfCAoc3RhcnQgLT0gcy5sZW5ndGgpIDwgMCB9LnNsaWNlKHN0 YXJ0KQogICAgZWxzZQogICAgICBpbmRleCA9IEBvZmZzZXRzLnVwcGVyX2JvdW5kKHN0YXJ0KSAt IDEKICAgICAgcm9wZSA9IEFycmF5Um9wZS5uZXcoCiAgICAgICAgQHNlZ21lbnRzLmF0KGluZGV4 KS5zbGljZShzdGFydCAtIEBvZmZzZXRzLmF0KGluZGV4KSwgbGVuZ3RoKSkKICAgICAgcmVtYWlu ID0gbGVuZ3RoIC0gcm9wZS5sZW5ndGgKICAgICAgd2hpbGUgcmVtYWluID4gMCAmJiAoaW5kZXgg Kz0gMSkgPCBAc2VnbWVudHMubGVuZ3RoCiAgICAgICAgcm9wZSA8PCBAc2VnbWVudHMuYXQoaW5k ZXgpLnNsaWNlKDAsIHJlbWFpbikKICAgICAgICByZW1haW4gPSBsZW5ndGggLSByb3BlLmxlbmd0 aAogICAgICBlbmQKICAgICAgcm9wZQogICAgZW5kCiAgZW5kCiAgYWxpYXMgOltdIDpzbGljZQog IAogIGRlZiBkdXA7IEFycmF5Um9wZS5uZXcoKkBzZWdtZW50cyk7IGVuZAogIGFsaWFzIDpjbG9u ZSA6ZHVwCiAgCiAgZGVmICsob3RoZXIpOyBkdXAgPDwgb3RoZXI7IGVuZAoKICBpbmNsdWRlIEVu dW1lcmFibGUKICBkZWYgZWFjaCAmYmxvY2sKICAgIGxhc3QgPSBuaWwKICAgIEBzZWdtZW50cy5l YWNoIGRvIHxzfAogICAgICBzLmVhY2ggZG8gfGx8CiAgICAgICAgaWYgbFstMV0gPT0gP1xuCiAg ICAgICAgICB5aWVsZChsYXN0ID8gIiN7bGFzdH0je2x9IiA6IGwpCiAgICAgICAgICBsYXN0ID0g bmlsCiAgICAgICAgZWxzZQogICAgICAgICAgbGFzdCA9ICIje2xhc3R9I3tsfSIKICAgICAgICBl bmQKICAgICAgZW5kCiAgICBlbmQKICAgIHlpZWxkIGxhc3QgaWYgbGFzdAogICAgc2VsZgogIGVu ZAogIGRlZiBlYWNoX2J5dGUgJmJsb2NrCiAgICBAc2VnbWVudHMuZWFjaCB7IHxzfCBzLmVhY2hf Ynl0ZSAmYmxvY2sgfQogICAgc2VsZgogIGVuZAoKICBpbmNsdWRlIENvbXBhcmFibGUKICAld1s8 PT4gY2FzZWNtcF0uZWFjaCBkbyB8bXwKICAgIG1vZHVsZV9ldmFsIDw8LUVPTQogICAgICBkZWYg I3ttfShvdGhlcikKICAgICAgICByZXN1bHQgPSAwCiAgICAgICAgQHNlZ21lbnRzLnppcChAb2Zm c2V0cykgZG8gfHNlZ21lbnQsb2Zmc2V0fAogICAgICAgICAgcmVzdWx0ID0gb3RoZXIuc2xpY2Uo b2Zmc2V0LCBzZWdtZW50Lmxlbmd0aCkuI3ttfShzZWdtZW50KQogICAgICAgICAgYnJlYWsgaWYg cmVzdWx0ICE9IDAKICAgICAgICBlbmQKICAgICAgICAtcmVzdWx0CiAgICAgIGVuZAogICAgRU9N CiAgZW5kCiAgZGVmID09KG90aGVyKQogICAgQG9mZnNldHMubGFzdCA9PSBvdGhlci5sZW5ndGgg JiYgKHNlbGYgPD0+IG90aGVyKSA9PSAwCiAgZW5kCiAgYWxpYXMgOmVxbD8gOj09CiAgCiAgZGVm IGhhc2g7IEBzZWdtZW50cy5pbmplY3QoMCkgeyB8aCwgc3wgaCArIHMuaGFzaCB9OyBlbmQKCiAg ZGVmIHJlcGxhY2Ugc3RyCiAgICBAc2VnbWVudHMsIEBvZmZzZXRzID0gW10sIFswXQogICAgYXBw ZW5kIHN0cgogIGVuZAoKICAld1tkb3duY2FzZSB1cGNhc2UgY2FwaXRhbGl6ZSBzd2FwY2FzZV0u ZWFjaCBkbyB8bXwKICAgIG1vZHVsZV9ldmFsIDw8LUVPTQogICAgICBkZWYgI3ttfSEKICAgICAg ICBAc2VnbWVudHMuaW5qZWN0KG5pbCkgZG8gfHIsc3wKICAgICAgICAgIHMuI3sgbSB9ISA/IHNl bGYgOiByCiAgICAgICAgZW5kCiAgICAgIGVuZAogICAgRU9NCiAgZW5kCiAgCiAgZGVmIHJzdHJp cCEKICAgIHdoaWxlIEBzZWdtZW50cy5sYXN0LnJzdHJpcCEKICAgICAgaWYgQHNlZ21lbnRzLmxh c3QuZW1wdHk/CiAgICAgICAgQHNlZ21lbnRzLnBvcAogICAgICAgIEBvZmZzZXRzLnBvcAogICAg ICBlbHNlCiAgICAgICAgQG9mZnNldHNbLTFdID0gQG9mZnNldHNbLTJdICsgQHNlZ21lbnRzWy0x XS5sZW5ndGgKICAgICAgICByZXR1cm4gc2VsZgogICAgICBlbmQKICAgIGVuZAogIGVuZAoKICBk ZWYgbHN0cmlwIQogICAgd2hpbGUgQHNlZ21lbnRzLmZpcnN0LmxzdHJpcCEKICAgICAgaWYgQHNl Z21lbnRzLmZpcnN0LmVtcHR5PwogICAgICAgIEBzZWdtZW50cy5zaGlmdAogICAgICBlbHNlCiAg ICAgICAgY29tcHV0ZV9vZmZzZXRzCiAgICAgICAgcmV0dXJuIHNlbGYKICAgICAgZW5kCiAgICBl bmQKICBlbmQKCiAgZGVmIHN0cmlwIQogICAgcnN0cmlwISA/IGxzdHJpcCEgfHwgc2VsZiA6IGxz dHJpcCEKICBlbmQKCiAgZGVmIGNob3AhCiAgICB1bmxlc3MgQHNlZ21lbnRzLmVtcHR5PwogICAg ICBpZiBAc2VnbWVudHMubGFzdC5jaG9wIQogICAgICAgIGlmIEBzZWdtZW50cy5sYXN0LmVtcHR5 PwogICAgICAgICAgQHNlZ21lbnRzLnBvcAogICAgICAgICAgQG9mZnNldHMucG9wCiAgICAgICAg ZWxzZQogICAgICAgICAgQG9mZnNldHNbLTFdIC09IDEKICAgICAgICBlbmQKICAgICAgICByZXR1 cm4gc2VsZgogICAgICBlbmQKICAgIGVuZAogIGVuZAogIAogIGRlZiBjaG9tcCEoc2VwPSQvKQog ICAgdW5sZXNzIEBzZWdtZW50cy5lbXB0eT8KICAgICAgaWYgQHNlZ21lbnRzLmxhc3QuY2hvbXAh KHNlcCkKICAgICAgICBpZiBAc2VnbWVudHMubGFzdC5lbXB0eT8KICAgICAgICAgIEBzZWdtZW50 cy5wb3AKICAgICAgICAgIEBvZmZzZXRzLnBvcAogICAgICAgIGVsc2UKICAgICAgICAgIEBvZmZz ZXRzWy0xXSAtPSAxCiAgICAgICAgZW5kCiAgICAgICAgcmV0dXJuIHNlbGYKICAgICAgZW5kCiAg ICBlbmQKICBlbmQKICAKICBkZWYgbmV4dCEKICAgIChAc2VnbWVudHMubGVuZ3RoLTEpLmRvd250 bygwKSBkbyB8aXwKICAgICAgcyA9IEBzZWdtZW50cy5hdChpKQogICAgICBuID0gcy5uZXh0CiAg ICAgIHIgPSBuLmxlbmd0aCA+IHMubGVuZ3RoCiAgICAgIHMucmVwbGFjZSBuCiAgICAgIHMuc2xp Y2UhKC0xKSBpZiByICYmIGkgPiAwCiAgICAgIGJyZWFrIHVubGVzcyByCiAgICBlbmQKICAgIHNl bGYKICBlbmQKCiAgJXdbZG93bmNhc2UgdXBjYXNlIGNhcGl0YWxpemUgc3dhcGNhc2UgbHN0cmlw IHJzdHJpcCBzdHJpcCBjaG9wIGNob21wIG5leHRdLmVhY2ggZG8gfG18CiAgICBtb2R1bGVfZXZh bCA8PC1FT00KICAgICAgZGVmICN7bX1zdHJpcAogICAgICAgIGQgPSBkdXAKICAgICAgICBkLiN7 bX1zdHJpcCEKICAgICAgICBkCiAgICAgIGVuZAogICAgRU9NCiAgZW5kCmVuZAo------ art_17604_4118040.1188812565770--