------art_33989_25453491.1202668439212
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

I'd like to bring up the issue of how characters are represented in
ruby 1.9 from a performance standpoint.  In a recent ruby-quiz (parsing
JSON), the fastest pure-ruby solution was simply an LL(1) parser that looked
at one character at a time (it beat various Regexp solutions).  With ruby 1.9,
its runtime increased by 4X, making it a slow solution.  A simple benchmark at
the end of this message counts words in an LL(1) fashion.  With ruby
1.8.6, it can count the words in Homer's Iliad in 1.46s on my machine; in
ruby 1.9 (from ubuntu gutsy) it takes 52.87s (a 36X increase in runtime).

I'm writing a ruby DSL parser/lexer generator (it could also replace Regexp
functionality), so this performance issue in ruby 1.9 is a serious problem
for me.

The problem, of course, is that every character in ruby 1.9 becomes a normal
ruby object (String), whereas in ruby 1.8 they were immediates (Fixnums).
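
To make the difference concrete, here is a small sketch (my own illustration,
assuming CRuby 1.9 or later) that compares indexing a String with reading a
byte.  String#[] and String#getbyte are standard methods; the variable names
are just for the example.

```ruby
s = "abc"

ch   = s[0]          # a one-character String in Ruby 1.9 and later
byte = s.getbyte(0)  # an Integer: the byte value of "a"

puts ch.class
puts byte.class

# On CRuby each indexing call is expected to build a new String object,
# so two lookups of the same position are usually not the same object.
puts s[0].equal?(s[0])

# Small integers are immediates, so no object is allocated and the two
# results are identical.
puts s.getbyte(0).equal?(s.getbyte(0))
```

The per-character allocation in the first case is the cost that a
character-at-a-time parser pays on every step.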

I'd like to propose that at least ASCII characters in ruby 1.9 be made into
immediates:

* at a minimum, characters should be read-only/frozen.  Allowing them to be
mutable will inhibit many future optimizations.
* give (small) characters a separate class with string-like (read-only)
functionality.
* possibly add a base class that String and this new character class would
both descend from.
* eventually make instances of this small (i.e. ASCII or even unicode)
character class immediate objects

If the above were done, one of these immediate characters would relate to a
frozen String as a Fixnum relates to a Bignum, and a common base class for
them would play the role that the Integer class plays for those two.
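
As a rough approximation of what read-only characters would allow, the sketch
below shares one frozen String per ASCII code instead of allocating a new one
on each lookup.  This is only an illustration in plain ruby, not the proposed
interpreter change: ASCII_CHARS and char_at are hypothetical names, and the
helper assumes ASCII-only input.

```ruby
# Hypothetical illustration only; nothing here is part of Ruby itself.
# One frozen one-character String per ASCII code, built once up front.
ASCII_CHARS = Array.new(128) { |code| code.chr.freeze }.freeze

# Returns the shared frozen character at byte position +index+, or nil
# past the end of the string.  Assumes ASCII-only input: a byte of 128
# or more raises IndexError from Array#fetch.
def char_at(str, index)
  byte = str.getbyte(index)
  return nil if byte.nil?
  ASCII_CHARS.fetch(byte)
end

a = char_at("hello", 0)
b = char_at("oh", 1)
puts a.equal?(b)   # both lookups hand back the same shared "h" object
puts a.frozen?
```

Because the characters are frozen, sharing them is safe; with mutable
characters each caller would need its own copy.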

Please consider this significant performance issue in ruby 1.9.

Eric


#!/usr/bin/env ruby

require 'benchmark'
require 'stringio'

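# Counts words, runs of whitespace and punctuation characters by looking
# at one character at a time (LL(1) style).
def io_getc(io)
    io.rewind
    io0 = io.getc
    words = 0
    strings = 0       # never incremented below; always reported as 0
    spacing = 0
    punctuation = 0
    while (true)
         case io0
         when ?a..?z, ?A..?Z, ?_
             words += 1
             io0 = io.getc
             # consume the rest of the word
             io0 = io.getc while (case io0;when ?a..?z,?A..?Z,?_,?0..?9;1;end)
         when ?\s,?\t,?\n,?\r
             spacing += 1
             io0 = io.getc
             # consume the rest of the whitespace run
             io0 = io.getc while (case io0;when ?\s,?\t,?\n,?\r;1;end)
         when nil
             break
         else
             punctuation += 1
             io0 = io.getc
         end
    end
    return words, strings, spacing, punctuation
end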

file_name = "Homer - Iliad.txt"
system("wget  http://www.e-text.org/text/Homer%20-%20Iliad.txt") unless File.exist?(file_name)
text = IO.read(file_name)

io = StringIO.new(text)
#io = File.open(file_name)

Benchmark.bmbm { |b|
    b.report("IO#getc") { p io_getc(io) }
}
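
For comparison, here is a sketch of the same counting loop written against
byte values instead of characters, which is roughly what the benchmark does
under ruby 1.8.  It assumes ASCII text and relies on IO#getbyte and
StringIO#getbyte, which are available in ruby 1.9.  The names
count_tokens_bytes, word_start?, word_char? and space? are my own, and the
unused "strings" counter is dropped.  I have not timed this variant.

```ruby
require 'stringio'

# Byte-value predicates for the character classes used by the benchmark.
def word_start?(b)
  (b >= 97 && b <= 122) || (b >= 65 && b <= 90) || b == 95   # a-z, A-Z, _
end

def word_char?(b)
  word_start?(b) || (b >= 48 && b <= 57)                      # plus 0-9
end

def space?(b)
  b == 32 || b == 9 || b == 10 || b == 13                     # \s \t \n \r
end

# Same LL(1) loop as io_getc, but each step reads an Integer byte
# rather than a one-character String.
def count_tokens_bytes(io)
  io.rewind
  byte = io.getbyte
  words = spacing = punctuation = 0
  while byte
    if word_start?(byte)
      words += 1
      byte = io.getbyte
      byte = io.getbyte while byte && word_char?(byte)
    elsif space?(byte)
      spacing += 1
      byte = io.getbyte
      byte = io.getbyte while byte && space?(byte)
    else
      punctuation += 1
      byte = io.getbyte
    end
  end
  [words, spacing, punctuation]
end

p count_tokens_bytes(StringIO.new("Sing, O goddess"))   # prints [3, 2, 1]
```

This only sidesteps the issue for byte-oriented input; it does not help code
that needs real characters, which is why I'd still like to see the proposal
above considered.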

------art_33989_25453491.1202668439212--