------ art_4304_15657722.1203261087188 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline On Feb 16, 2008 8:51 PM, Charles Oliver Nutter <charles.nutter / sun.com> wrote: > Eric Mahurin wrote: > > I'd like to bring up the issue of how characters are represented in ruby > > 1.9 from a performance standpoint. In a recent ruby-quiz (parsing > > JSON), the fastest pure-ruby solution was simply an LL(1) parser that > > looked at one character at a time (it beat various Regexp solutions). > > With ruby 1.9, the runtime increased by 4X making it a slow solution. A > > simple benchmark is at the end of this message that counts words in an > > LL(1) fashion. With ruby 1.8.6, it can could the words in Homer's Iliad > > in 1.46s on my machine and in ruby 1.9 (from ubuntu gutsy) it takes > > 52.87s (36X increase in runtime). > > Interesting benchmark; may I include it in JRuby's collection of > benchmarks? > > - Charlie > Of course! It is basically just a test of IO#getc or StringIO#getc along with comparison of those characters with some fixed characters. I think it is a good representation of LL(1) character parsing/lexing. LR/LALR/DFA/NFA character parsing/lexing will have similar operations. Instead of recursive code, you'll have a state machine/stack for those. Also, for any of these, table lookup could be done instead of comparison, but it would cost a bit of memory for character sets (probably slower too). Most people coding in ruby use Regexp (DFA or NFA parsing itself?) at the character-level, so the above isn't necessarily representative. I just find using Regexp with an IO more trouble than it is worth when writing a clean parser/lexer (generator). Eric ------ art_4304_15657722.1203261087188 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline On Feb 16, 2008 8:51 PM, Charles Oliver Nutter <<a href ailto:charles.nutter / sun.com">charles.nutter / sun.com</a>> wrote:<br><div class mail_quote"><blockquote class mail_quote" styleorder-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"> <div class h2E3d">Eric Mahurin wrote:<br>> I'd like to bring up the issue of how characters are represented in ruby<br>> 1.9 from a performance standpoint. In a recent ruby-quiz (parsing<br>> JSON), the fastest pure-ruby solution was simply an LL(1) parser that<br> > looked at one character at a time (it beat various Regexp solutions).<br>> With ruby 1.9, the runtime increased by 4X making it a slow solution. A<br>> simple benchmark is at the end of this message that counts words in an<br> > LL(1) fashion. With ruby 1.8.6, it can could the words in Homer's Iliad<br>> in 1.46s on my machine and in ruby 1.9 (from ubuntu gutsy) it takes<br>> 52.87s (36X increase in runtime).<br><br></div>Interesting benchmark; may I include it in JRuby's collection of benchmarks?<br> <br>- Charlie<br></blockquote></div><br>Of course! It is basically just a test of IO#getc or StringIO#getc along with comparison of those characters with some fixed characters. I think it is a good representation of LL(1) character parsing/lexing. LR/LALR/DFA/NFA character parsing/lexing will have similar operations. Instead of recursive code, you'll have a state machine/stack for those. Also, for any of these, table lookup could be done instead of comparison, but it would cost a bit of memory for character sets (probably slower too).<br> <br>Most people coding in ruby use Regexp (DFA or NFA parsing itself?) at the character-level, so the above isn't necessarily representative. I just find using Regexp with an IO more trouble than it is worth when writing a clean parser/lexer (generator).<br> <br>Eric<br><br><br> ------ art_4304_15657722.1203261087188--