------ art_33989_25453491.1202668439212
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

I'd like to bring up the issue of how characters are represented in
ruby 1.9 from a performance standpoint. In a recent ruby-quiz (parsing
JSON), the fastest pure-ruby solution was simply an LL(1) parser that
looked at one character at a time (it beat various Regexp solutions).
With ruby 1.9, its runtime increased by 4X, making it a slow solution.
A simple benchmark at the end of this message counts words in an LL(1)
fashion. With ruby 1.8.6, it can count the words in Homer's Iliad in
1.46s on my machine; in ruby 1.9 (from ubuntu gutsy) it takes 52.87s
(a 36X increase in runtime).

I'm writing a ruby DSL parser/lexer generator (it could also replace
Regexp functionality). This performance issue in ruby 1.9 is a serious
problem for it.

The problem, of course, is that every character in ruby 1.9 becomes a
normal ruby object (a String), whereas in ruby 1.8 characters were
immediates (Fixnums).
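
The difference is easy to see from a script. This is only a minimal
sketch, assuming ruby 1.9 or later; the variable names are made up for
illustration:

```ruby
s = "abc"

literal = ?a            # character literal
indexed = s[0]          # indexing a String
byte    = s.getbyte(0)  # raw byte access

p literal.class              # String on 1.9 and later (Fixnum on 1.8)
p indexed.class              # String on 1.9 and later (Fixnum on 1.8)
p s[0].equal?(s[0])          # false: each call builds a new String object
p byte.class                 # Integer (Fixnum before ruby 2.4), an immediate
p s.getbyte(0).equal?(byte)  # true: immediates need no allocation
```

Every character an LL(1) loop looks at therefore costs one String
allocation in 1.9, where 1.8 handed back an immediate.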

I'd like to propose that at least ASCII characters in ruby 1.9 be made
into immediates:

* at a minimum, characters should be read-only/frozen. Allowing them to
  be mutable will inhibit many future optimizations.
* give (small) characters a separate class with string-like (read-only)
  functionality.
* possibly add a base class that String and this new character class
  would both descend from.
* eventually make the objects of this small (i.e. ASCII or even unicode)
  character class immediates.

If the above were done, one of these immediate characters would relate
to a frozen String much as a Fixnum relates to a Bignum, and a common
base class for the two would play the role that Integer plays for the
numbers.
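
To make the idea concrete, here is a rough pure-ruby approximation.
It is only a sketch: the class name Char, the ASCII table and the Char[]
lookup are all hypothetical and exist nowhere in ruby, and true
immediates would need support inside the interpreter. Sharing one frozen
object per ASCII code merely imitates the "no allocation" property:

```ruby
# Hypothetical sketch only: none of these names exist in ruby.
class Char
  include Comparable
  attr_reader :ord

  def initialize(ord)
    @ord = ord
    freeze                          # characters are read-only
  end

  # One shared, frozen object per ASCII code, built once up front.
  ASCII = Array.new(128) { |i| new(i) }.freeze

  # Look a character up by code; non-ASCII codes fall back to allocation.
  def self.[](ord)
    ASCII.fetch(ord) { new(ord) }
  end

  def <=>(other)
    other.is_a?(Char) ? ord <=> other.ord : nil
  end

  def to_s
    [ord].pack("U")                 # one-character UTF-8 String
  end
end

p Char[97].equal?(Char[97])   # true: same object, nothing allocated
p Char[97].frozen?            # true
p Char[97] < Char[98]         # true
```

A lexer comparing Char[...] values this way would never create garbage
for ASCII input, which is the property the 1.8 Fixnum characters had.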

Please consider this significant performance issue in ruby 1.9.

Eric

#!/usr/bin/env ruby

require 'benchmark'
require 'stringio'

# Classify the input one character at a time, LL(1) style, and count
# words, runs of whitespace and punctuation characters.
def io_getc(io)
  io.rewind
  io0 = io.getc
  words = 0
  strings = 0        # never incremented; kept so the result has 4 fields
  spacing = 0
  punctuation = 0
  while (true)
    case io0
    when ?a..?z, ?A..?Z, ?_
      words += 1
      io0 = io.getc
      # consume the rest of the word
      io0 = io.getc while (case io0;when ?a..?z,?A..?Z,?_,?0..?9;1;end)
    when ?\s,?\t,?\n,?\r
      spacing += 1
      io0 = io.getc
      # consume the rest of the whitespace run
      io0 = io.getc while (case io0;when ?\s,?\t,?\n,?\r;1;end)
    when nil
      break          # end of input
    else
      punctuation += 1
      io0 = io.getc
    end
  end
  return words, strings, spacing, punctuation
end

file_name = "Homer - Iliad.txt"
unless File.exist?(file_name)
  system("wget http://www.e-text.org/text/Homer%20-%20Iliad.txt")
end
text = IO.read(file_name)

io = StringIO.new(text)
#io = File.open(file_name)

Benchmark.bmbm { |b|
  b.report("IO#getc") { p io_getc(io) }
}
------ art_33989_25453491.1202668439212--