On Fri, Aug 31, 2007 at 10:19:54PM +0900, Ruby Quiz wrote:
> by John Miller
> 
> This week's task is to implement the Rope data structure as a Ruby class.  This
> topic comes out of the ICFP programming competition
> (http://www.icfpcontest.com/) which had competitors manipulating a 7.5 million
> character string this year.

I happened to have implemented ropes in OCaml recently, so I generated a Ruby
extension using rocaml to see how well it would perform.

Without further ado, here are the results I'm getting for SIZE = 512 * 1024,
CHUNKS = 512:

$ time ruby -r himadri_choudhury.rb bm.rb Rope
Build:   0.130000   0.000000   0.130000 (  0.129476)
Sort:  10.340000   0.050000  10.390000 ( 10.648223)

$ time ruby -rocaml_rope bm.rb OCaml::Rope
Build:   0.020000   0.000000   0.020000 (  0.018946)
Sort:   0.100000   0.000000   0.100000 (  0.108499)

$ ruby eric_mahurin.rb StringRope
[...]
Build:   0.060000   0.000000   0.060000 (  0.057299)
Sort:   0.870000   0.000000   0.870000 (  0.896493)

For SIZE = 1024, CHUNKS = 16384:

$ ruby eric_mahurin.rb StringRope
[...]
Build:   3.470000   0.040000   3.510000 (  3.588875)
Sort:  89.110000   0.700000  89.810000 ( 92.179962)

$ time ruby -rocaml_rope bm.rb OCaml::Rope
[...]
Build:   0.360000   0.000000   0.360000 (  0.378352)
Sort:   3.940000   0.040000   3.980000 (  4.079140)

At that point the pure Ruby rope is taking over 6 times more memory than
the OCaml one. I ascribe this to iv_tbls being very heavy and to memory
fragmentation.

I benchmarked Himadri's implementation first and was surprised by the
exceedingly large speed difference --- I expected one, not two orders of
magnitude for this code, as there's enough Ruby code in common in qsort to
mask the speed gains in the rope operations. However, Eric's solution proved
that it was just due to a slow Ruby implementation.

Here's the interface definition (extconf.rb):


EXT_NAME = "ocaml_rope" OCAML_PACKAGES = %w[] CAML_LIBS = %w[] CAML_OBJS = %w[] CAML_FLAGS = "" CAML_INCLUDES = [] require 'rocaml' Interface.generate("ocaml_rope") do |iface| def_class("Rope", :under => "OCaml") do |c| rope = c.abstract_type fun "empty", UNIT => rope, :as => "new_empty" fun "of_string", STRING => rope, :as => "new_from_string" method "sub", [rope, INT, INT] => rope, :as => "slice" method "concat", [rope, rope] => rope method "length", rope => INT method "get", [rope, INT] => INT method "to_string", rope => STRING, :as => "to_s" end end require 'rocaml_extconf'
As you can see, OCaml::Rope is purely functional, and the interface differs a bit from that expected by bm.rb (a modified version that works with immutable ropes is attached), so I adapted it with the following ocaml_rope.rb, which also loads the extension:
module OCaml # Rope will be placed in this module end require "ocaml_rope.so" module OCaml class Rope def self.new(str = "") case str when String; new_from_string str when Rope; str when ""; new_empty else new_from_string(str.to_str) rescue new_from_string(str.to_s) end end def prepend(rope) rope.append(self) end alias_method :append, :concat alias_method :<<, :append end end
The OCaml code is attached, in case anybody wants to look at it. Incidentally, it weighs a bit under 220 lines, which is also the amount taken by Himadri's and Eric's solutions. Unlike them, rope.ml features O(1) concatenation for small elements; this accounts for a large part of the code and the complexity of some patterns. O(1) concatenation doesn't really affect performance in the use case exerted by bm.rb anyway. -- Mauricio Fernandez - http://eigenclass.org require 'benchmark' #This code make a String/Rope of CHUNKS chunks of text #each chunck is SIZE bytes long. Each chunk starts with #an 8 byte number. Initially the chunks are shuffled the #qsort method sorts them into ascending order. # #pass the name of the class to use as a parameter #ruby -r rope.rb this_file Rope puts 'preparing data...' TextClass = (ARGV.shift || "String").split(/::/).inject(Object){|s,x| s.const_get(x)} def qsort(text) return TextClass.new if text.length == 0 pivot = text.slice(0,8).to_s.to_i less = TextClass.new more = TextClass.new offset = 8+SIZE while (offset < text.length) i = text.slice(offset,8).to_s.to_i if i < pivot less <<= text.slice(offset,8+SIZE) else more <<= text.slice(offset,8+SIZE) end offset = offset + 8+SIZE end #print "*" return qsort(less) << text.slice(0,8+SIZE) << qsort(more) end SIZE = 1 * 1024 CHUNKS = 32768 CHARS = %w[R O P E] data = TextClass.new bulk_string = TextClass.new(Array.new(SIZE) { CHARS[rand(4)] }.join) puts 'Building Text...' build = Benchmark.measure do (0..CHUNKS).sort_by { rand }.each do |n| data = data << TextClass.new(sprintf("%08i",n)) << bulk_string end data = data.normalize if data.respond_to? :normalize end GC.start sort = Benchmark.measure do puts "Sorting Text..." qsort(data) puts"\nEND" end puts "Build: #{build}Sort: #{sort}" type t = Empty (* left, left size, right, right size, height *) | Concat of t * int * t * int * int | Leaf of string type forest_element = { mutable c : t; mutable len : int } let str_append = (^) let empty_str = "" let string_of_string_list l = String.concat "" l let max_height = 48 let leaf_size = 256 exception Out_of_bounds let empty = Empty (* by construction, there cannot be Empty or Leaf "" leaves *) let is_empty = function Empty -> true | _ -> false let height = function Empty | Leaf _ -> 0 | Concat(_,_,_,_,h) -> h let rec length = function Empty -> 0 | Leaf s -> String.length s | Concat(_,cl,_,cr,_) -> cl + cr let make_concat l r = let hl = height l and hr = height r in let cl = length l and cr = length r in Concat(l, cl, r, cr, if hl >= hr then hl + 1 else hr + 1) let min_len = let fib_tbl = Array.make max_height 0 in let rec fib n = match fib_tbl.(n) with 0 -> let last = fib (n - 1) and prev = fib (n - 2) in let r = last + prev in let r = if r > last then r else last in (* check overflow *) fib_tbl.(n) <- r; r | n -> n in fib_tbl.(0) <- leaf_size + 1; fib_tbl.(1) <- 3 * leaf_size / 2 + 1; Array.init max_height (fun i -> if i = 0 then 1 else fib (i - 1)) let max_length = min_len.(Array.length min_len - 1) let concat_fast l r = match l with Empty -> r | Leaf _ | Concat(_,_,_,_,_) -> match r with Empty -> l | Leaf _ | Concat(_,_,_,_,_) -> make_concat l r (* based on Hans-J. Boehm's *) let add_forest forest rope len = let i = ref 0 in let sum = ref empty in while len > min_len.(!i+1) do if forest.(!i).c <> Empty then begin sum := concat_fast forest.(!i).c !sum; forest.(!i).c <- Empty end; incr i done; sum := concat_fast !sum rope; let sum_len = ref (length !sum) in while !sum_len >= min_len.(!i) do if forest.(!i).c <> Empty then begin sum := concat_fast forest.(!i).c !sum; sum_len := !sum_len + forest.(!i).len; forest.(!i).c <- Empty; end; incr i done; decr i; forest.(!i).c <- !sum; forest.(!i).len <- !sum_len let concat_forest forest = Array.fold_left (fun s x -> concat_fast x.c s) Empty forest let rec balance_insert rope len forest = match rope with Empty -> () | Leaf _ -> add_forest forest rope len | Concat(l,cl,r,cr,h) when h >= max_height || len < min_len.(h) -> balance_insert l cl forest; balance_insert r cr forest | x -> add_forest forest x len (* function or balanced *) let balance r = match r with Empty -> Empty | Leaf _ -> r | _ -> let forest = Array.init max_height (fun _ -> {c = Empty; len = 0}) in balance_insert r (length r) forest; concat_forest forest let bal_if_needed l r = let r = make_concat l r in if height r < max_height then r else balance r let concat_str l = function Empty | Concat(_,_,_,_,_) -> invalid_arg "concat_str" | Leaf rs as r -> let lenr = String.length rs in match l with | Empty -> r | Leaf ls -> let slen = lenr + String.length ls in if slen <= leaf_size then Leaf (str_append ls rs) else make_concat l r (* height = 1 *) | Concat(ll, cll, Leaf lrs, clr, h) -> let slen = clr + lenr in if clr + lenr <= leaf_size then Concat(ll, cll, Leaf (str_append lrs rs), slen, h) else bal_if_needed l r | _ -> bal_if_needed l r let append_char c r = concat_str r (Leaf (String.make 1 c)) let concat l = function Empty -> l | Leaf _ as r -> concat_str l r | Concat(Leaf rls,rlc,rr,rc,h) as r -> (match l with Empty -> r | Concat(_,_,_,_,_) -> bal_if_needed l r | Leaf ls -> let slen = rlc + String.length ls in if slen <= leaf_size then Concat(Leaf(str_append ls rls), slen, rr, rc, h) else bal_if_needed l r) | r -> (match l with Empty -> r | _ -> bal_if_needed l r) let prepend_char c r = concat (Leaf (String.make 1 c)) r let rec get i = function Empty -> raise Out_of_bounds | Leaf s -> if i >= 0 && i < String.length s then String.unsafe_get s i else raise Out_of_bounds | Concat (l, cl, r, cr, _) -> if i < cl then get i l else get (i - cl) r let of_string = function s when String.length s = 0 -> Empty | s -> let min (x:int) (y:int) = if x <= y then x else y in let rec loop r s len i = if i < len then (* len - i > 0, thus Leaf "" can't happen *) loop (concat r (Leaf (String.sub s i (min (len - i) leaf_size)))) s len (i + leaf_size) else r in loop Empty s (String.length s) 0 let rec sub start len = function Empty -> if start <> 0 || len <> 0 then raise Out_of_bounds else Empty | Leaf s -> if len > 0 then (* Leaf "" cannot happen *) (try Leaf (String.sub s start len) with _ -> raise Out_of_bounds) else if len < 0 || start < 0 || start > String.length s then raise Out_of_bounds else Empty | Concat(l,cl,r,cr,_) -> if start < 0 || len < 0 || start + len > cl + cr then raise Out_of_bounds; let left = if start = 0 then if len >= cl then l else sub 0 len l else if start > cl then Empty else if start + len >= cl then sub start (cl - start) l else sub start len l in let right = if start <= cl then let upto = start + len in if upto = cl + cr then r else if upto < cl then Empty else sub 0 (upto - cl) r else sub (start - cl) len r in concat left right let to_string r = let rec strings l = function Empty -> l | Leaf s -> s :: l | Concat(left,_,right,_,_) -> strings (strings l right) left in string_of_string_list (strings [] r) let insert start rope r = concat (concat (sub 0 start r) rope) (sub start (length r - start) r) let remove start len r = concat (sub 0 start r) (sub (start + len) (length r - start - len) r) let () = let r name v = Callback.register ("Rope." ^ name) v in r "empty" (fun () -> empty); r "of_string" of_string; r "sub" (fun r n m -> sub n m r); r "concat" concat; r "length" length; r "get" (fun r i -> get i r); r "to_string" to_string