--eJnRUKwClWJh1Khz
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Fri, Aug 31, 2007 at 10:19:54PM +0900, Ruby Quiz wrote:
> by John Miller
> 
> This week's task is to implement the Rope data structure as a Ruby class.  This
> topic comes out of the ICFP programming competition
> (http://www.icfpcontest.com/) which had competitors manipulating a 7.5 million
> character string this year.

I happened to have implemented ropes in OCaml recently, so I generated a Ruby
extension using rocaml to see how well it would perform.

Without further ado, here are the results I'm getting for SIZE  12 * 1024,
CHUNKS  12:

$ time ruby -r himadri_choudhury.rb bm.rb Rope
Build:   0.130000   0.000000   0.130000 (  0.129476)
Sort:  10.340000   0.050000  10.390000 ( 10.648223)

$ time ruby -rocaml_rope bm.rb OCaml::Rope
Build:   0.020000   0.000000   0.020000 (  0.018946)
Sort:   0.100000   0.000000   0.100000 (  0.108499)

$ ruby eric_mahurin.rb StringRope
[...]
Build:   0.060000   0.000000   0.060000 (  0.057299)
Sort:   0.870000   0.000000   0.870000 (  0.896493)

For SIZE  024, CHUNKS  6384:

$ ruby eric_mahurin.rb StringRope
[...]
Build:   3.470000   0.040000   3.510000 (  3.588875)
Sort:  89.110000   0.700000  89.810000 ( 92.179962)

$ time ruby -rocaml_rope bm.rb OCaml::Rope
[...]
Build:   0.360000   0.000000   0.360000 (  0.378352)
Sort:   3.940000   0.040000   3.980000 (  4.079140)

At that point the pure Ruby rope is taking over 6 times more memory than
the OCaml one. I ascribe this to iv_tbls being very heavy and to memory
fragmentation.

I benchmarked Himadri's implementation first and was surprised by the
exceedingly large speed difference --- I expected one, not two orders of
magnitude for this code, as there's enough Ruby code in common in qsort to
mask the speed gains in the rope operations. However, Eric's solution proved
that it was just due to a slow Ruby implementation.

Here's the interface definition (extconf.rb):


EXT_NAME ocaml_rope" OCAML_PACKAGES w[] CAML_LIBS w[] CAML_OBJS w[] CAML_FLAGS " CAML_INCLUDES ] require 'rocaml' Interface.generate("ocaml_rope") do |iface| def_class("Rope", :under "OCaml") do |c| rope .abstract_type fun "empty", UNIT rope, :as "new_empty" fun "of_string", STRING rope, :as "new_from_string" method "sub", [rope, INT, INT] rope, :as "slice" method "concat", [rope, rope] rope method "length", rope INT method "get", [rope, INT] INT method "to_string", rope STRING, :as "to_s" end end require 'rocaml_extconf'
As you can see, OCaml::Rope is purely functional, and the interface differs a bit from that expected by bm.rb (a modified version that works with immutable ropes is attached), so I adapted it with the following ocaml_rope.rb, which also loads the extension:
module OCaml # Rope will be placed in this module end require "ocaml_rope.so" module OCaml class Rope def self.new(str ") case str when String; new_from_string str when Rope; str when ""; new_empty else new_from_string(str.to_str) rescue new_from_string(str.to_s) end end def prepend(rope) rope.append(self) end alias_method :append, :concat alias_method :<<, :append end end
The OCaml code is attached, in case anybody wants to look at it. Incidentally, it weighs a bit under 220 lines, which is also the amount taken by Himadri's and Eric's solutions. Unlike them, rope.ml features O(1) concatenation for small elements; this accounts for a large part of the code and the complexity of some patterns. O(1) concatenation doesn't really affect performance in the use case exerted by bm.rb anyway. -- Mauricio Fernandez - http://eigenclass.org --eJnRUKwClWJh1Khz Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="bm.rb" require 'benchmark' #This code make a String/Rope of CHUNKS chunks of text #each chunck is SIZE bytes long. Each chunk starts with #an 8 byte number. Initially the chunks are shuffled the #qsort method sorts them into ascending order. # #pass the name of the class to use as a parameter #ruby -r rope.rb this_file Rope puts 'preparing data...' TextClass ARGV.shift || "String").split(/::/).inject(Object){|s,x| s.const_get(x)} def qsort(text) return TextClass.new if text.length 0 pivot ext.slice(0,8).to_s.to_i less extClass.new more extClass.new offset +SIZE while (offset < text.length) i ext.slice(offset,8).to_s.to_i if i < pivot less << ext.slice(offset,8+SIZE) else more << ext.slice(offset,8+SIZE) end offset ffset + 8+SIZE end #print "*" return qsort(less) << text.slice(0,8+SIZE) << qsort(more) end SIZE * 1024 CHUNKS 2768 CHARS w[R O P E] data extClass.new bulk_string TextClass.new(Array.new(SIZE) { CHARS[rand(4)] }.join) puts 'Building Text...' build enchmark.measure do (0..CHUNKS).sort_by { rand }.each do |n| data ata << TextClass.new(sprintf("%08i",n)) << bulk_string end data ata.normalize if data.respond_to? :normalize end GC.start sort enchmark.measure do puts "Sorting Text..." qsort(data) puts"\nEND" end puts "Build: #{build}Sort: #{sort}" --eJnRUKwClWJh1Khz Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="rope.ml" type t Empty (* left, left size, right, right size, height *) | Concat of t * int * t * int * int | Leaf of string type forest_element mutable c : t; mutable len : int } let str_append ^) let empty_str " let string_of_string_list l tring.concat "" l let max_height 8 let leaf_size 56 exception Out_of_bounds let empty mpty (* by construction, there cannot be Empty or Leaf "" leaves *) let is_empty unction Empty -> true | _ -> false let height unction Empty | Leaf _ -> 0 | Concat(_,_,_,_,h) -> h let rec length unction Empty -> 0 | Leaf s -> String.length s | Concat(_,cl,_,cr,_) -> cl + cr let make_concat l r let hl eight l and hr eight r in let cl ength l and cr ength r in Concat(l, cl, r, cr, if hl > r then hl + 1 else hr + 1) let min_len let fib_tbl rray.make max_height 0 in let rec fib n atch fib_tbl.(n) with 0 -> let last ib (n - 1) and prev ib (n - 2) in let r ast + prev in let r f r > last then r else last in (* check overflow *) fib_tbl.(n) <- r; r | n -> n in fib_tbl.(0) <- leaf_size + 1; fib_tbl.(1) <- 3 * leaf_size / 2 + 1; Array.init max_height (fun i -> if i then 1 else fib (i - 1)) let max_length in_len.(Array.length min_len - 1) let concat_fast l r atch l with Empty -> r | Leaf _ | Concat(_,_,_,_,_) -> match r with Empty -> l | Leaf _ | Concat(_,_,_,_,_) -> make_concat l r (* based on Hans-J. Boehm's *) let add_forest forest rope len let i ef 0 in let sum ef empty in while len > min_len.(!i+1) do if forest.(!i).c <> Empty then begin sum : oncat_fast forest.(!i).c !sum; forest.(!i).c <- Empty end; incr i done; sum : oncat_fast !sum rope; let sum_len ef (length !sum) in while !sum_len > in_len.(!i) do if forest.(!i).c <> Empty then begin sum : oncat_fast forest.(!i).c !sum; sum_len : sum_len + forest.(!i).len; forest.(!i).c <- Empty; end; incr i done; decr i; forest.(!i).c <- !sum; forest.(!i).len <- !sum_len let concat_forest forest Array.fold_left (fun s x -> concat_fast x.c s) Empty forest let rec balance_insert rope len forest atch rope with Empty -> () | Leaf _ -> add_forest forest rope len | Concat(l,cl,r,cr,h) when h > ax_height || len < min_len.(h) -> balance_insert l cl forest; balance_insert r cr forest | x -> add_forest forest x len (* function or balanced *) let balance r match r with Empty -> Empty | Leaf _ -> r | _ -> let forest rray.init max_height (fun _ -> {c mpty; len }) in balance_insert r (length r) forest; concat_forest forest let bal_if_needed l r let r ake_concat l r in if height r < max_height then r else balance r let concat_str l unction Empty | Concat(_,_,_,_,_) -> invalid_arg "concat_str" | Leaf rs as r -> let lenr tring.length rs in match l with | Empty -> r | Leaf ls -> let slen enr + String.length ls in if slen < eaf_size then Leaf (str_append ls rs) else make_concat l r (* height *) | Concat(ll, cll, Leaf lrs, clr, h) -> let slen lr + lenr in if clr + lenr < eaf_size then Concat(ll, cll, Leaf (str_append lrs rs), slen, h) else bal_if_needed l r | _ -> bal_if_needed l r let append_char c r oncat_str r (Leaf (String.make 1 c)) let concat l unction Empty -> l | Leaf _ as r -> concat_str l r | Concat(Leaf rls,rlc,rr,rc,h) as r -> (match l with Empty -> r | Concat(_,_,_,_,_) -> bal_if_needed l r | Leaf ls -> let slen lc + String.length ls in if slen < eaf_size then Concat(Leaf(str_append ls rls), slen, rr, rc, h) else bal_if_needed l r) | r -> (match l with Empty -> r | _ -> bal_if_needed l r) let prepend_char c r oncat (Leaf (String.make 1 c)) r let rec get i unction Empty -> raise Out_of_bounds | Leaf s -> if i > && i < String.length s then String.unsafe_get s i else raise Out_of_bounds | Concat (l, cl, r, cr, _) -> if i < cl then get i l else get (i - cl) r let of_string unction s when String.length s -> Empty | s -> let min (x:int) (y:int) f x < then x else y in let rec loop r s len i if i < len then (* len - i > 0, thus Leaf "" can't happen *) loop (concat r (Leaf (String.sub s i (min (len - i) leaf_size)))) s len (i + leaf_size) else r in loop Empty s (String.length s) 0 let rec sub start len unction Empty -> if start <> 0 || len <> 0 then raise Out_of_bounds else Empty | Leaf s -> if len > 0 then (* Leaf "" cannot happen *) (try Leaf (String.sub s start len) with _ -> raise Out_of_bounds) else if len < 0 || start < 0 || start > String.length s then raise Out_of_bounds else Empty | Concat(l,cl,r,cr,_) -> if start < 0 || len < 0 || start + len > cl + cr then raise Out_of_bounds; let left if start then if len > l then l else sub 0 len l else if start > cl then Empty else if start + len > l then sub start (cl - start) l else sub start len l in let right if start < l then let upto tart + len in if upto l + cr then r else if upto < cl then Empty else sub 0 (upto - cl) r else sub (start - cl) len r in concat left right let to_string r let rec strings l unction Empty -> l | Leaf s -> s :: l | Concat(left,_,right,_,_) -> strings (strings l right) left in string_of_string_list (strings [] r) let insert start rope r concat (concat (sub 0 start r) rope) (sub start (length r - start) r) let remove start len r concat (sub 0 start r) (sub (start + len) (length r - start - len) r) let () let r name v allback.register ("Rope." ^ name) v in r "empty" (fun () -> empty); r "of_string" of_string; r "sub" (fun r n m -> sub n m r); r "concat" concat; r "length" length; r "get" (fun r i -> get i r); r "to_string" to_string --eJnRUKwClWJh1Khz--