------art_4304_15657722.1203261087188
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

On Feb 16, 2008 8:51 PM, Charles Oliver Nutter <charles.nutter / sun.com>
wrote:

> Eric Mahurin wrote:
> > I'd like to bring up the issue of how characters are represented in ruby
> > 1.9 from a performance standpoint.  In a recent ruby-quiz (parsing
> > JSON), the fastest pure-ruby solution was simply an LL(1) parser that
> > looked at one character at a time (it beat various Regexp solutions).
> > With ruby 1.9, the runtime increased by 4X making it a slow solution.  A
> > simple benchmark is at the end of this message that counts words in an
> > LL(1) fashion.  With ruby 1.8.6, it can could the words in Homer's Iliad
> > in 1.46s on my machine and in ruby 1.9 (from ubuntu gutsy) it takes
> > 52.87s (36X increase in runtime).
>
> Interesting benchmark; may I include it in JRuby's collection of
> benchmarks?
>
> - Charlie
>

Of course!  It is basically just a test of IO#getc or StringIO#getc along
with comparison of those characters with some fixed characters.  I think it
is a good representation of LL(1) character parsing/lexing.  LR/LALR/DFA/NFA
character parsing/lexing will have similar operations.  Instead of recursive
code, you'll have a state machine/stack for those.  Also, for any of these,
table lookup could be done instead of comparison, but it would cost a bit of
memory for character sets (probably slower too).

Most people coding in ruby use Regexp (DFA or NFA parsing itself?) at the
character-level, so the above isn't necessarily representative.  I just find
using Regexp with an IO more trouble than it is worth when writing a clean
parser/lexer (generator).

Eric

------art_4304_15657722.1203261087188
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

On Feb 16, 2008 8:51 PM, Charles Oliver Nutter &lt;<a hrefailto:charles.nutter / sun.com">charles.nutter / sun.com</a>&gt; wrote:<br><div classmail_quote"><blockquote classmail_quote" styleorder-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<div classh2E3d">Eric Mahurin wrote:<br>&gt; I&#39;d like to bring up the issue of how characters are represented in ruby<br>&gt; 1.9 from a performance standpoint. &nbsp;In a recent ruby-quiz (parsing<br>&gt; JSON), the fastest pure-ruby solution was simply an LL(1) parser that<br>
&gt; looked at one character at a time (it beat various Regexp solutions).<br>&gt; With ruby 1.9, the runtime increased by 4X making it a slow solution. &nbsp;A<br>&gt; simple benchmark is at the end of this message that counts words in an<br>
&gt; LL(1) fashion. &nbsp;With ruby 1.8.6, it can could the words in Homer&#39;s Iliad<br>&gt; in 1.46s on my machine and in ruby 1.9 (from ubuntu gutsy) it takes<br>&gt; 52.87s (36X increase in runtime).<br><br></div>Interesting benchmark; may I include it in JRuby&#39;s collection of benchmarks?<br>
<br>- Charlie<br></blockquote></div><br>Of course!&nbsp; It is basically just a test of IO#getc or StringIO#getc along with comparison of those characters with some fixed characters.&nbsp; I think it is a good representation of LL(1) character parsing/lexing.&nbsp; LR/LALR/DFA/NFA character parsing/lexing will have similar operations.&nbsp; Instead of recursive code, you&#39;ll have a state machine/stack for those.&nbsp; Also, for any of these, table lookup could be done instead of comparison, but it would cost a bit of memory for character sets (probably slower too).<br>
<br>Most people coding in ruby use Regexp (DFA or NFA parsing itself?) at the character-level, so the above isn&#39;t necessarily representative.&nbsp; I just find using Regexp with an IO more trouble than it is worth when writing a clean parser/lexer (generator).<br>
<br>Eric<br><br><br>

------art_4304_15657722.1203261087188--