Not to interrupt this thread, but what is rockit? I've seen it a couple of
times on this list and, being new to Ruby, I have to ask: is it something
within Ruby or a general technology someone is trying to implement through
Ruby? Or neither?

TIA,
Derek

----- Original Message -----
From: "Robert Feldt" <feldt / ce.chalmers.se>
To: "ruby-talk ML" <ruby-talk / ruby-lang.org>
Cc: <ruby-talk / netlab.co.jp>
Sent: Thursday, June 14, 2001 6:43 AM
Subject: [ruby-talk:16473] Re: Opinion sought: parsing non-regular languages

> On Thu, 14 Jun 2001, Dave Thomas wrote:
>
> > Robert Feldt <feldt / ce.chalmers.se> writes:
> >
> > > Any comments or ideas? Which solution would you prefer if you'd get to
> > > choose?
> >
> > I know next to nothing about parsing and lexing, but it seems to me
> > that you'd spend an inordinate amount of effort trying to produce a
> > lexer that could deal with the kind of things that people dream up for
> > tokens. Identifying regular expressions is a particularly hairy case
> >
> Yes, sounds reasonable.
>
> > that comes to mind: /[[]/ and friends are all special cases.
> >
> I'm not sure we're talking about the same thing here but wouldn't
>
> /\/((\\\/)|[^\/])*\/[iomx]*/
>
> cut it? That's what I use to find regexps in rockit grammars so I hope I'm
> not too far off the mark...
>
> > So, I'd be in favor of providing simple hooks for adding my own code
> > to the lexer. If this isn't language independent, then have a
> > provision to add lexer chunks in multiple languages: you can get it
> > all working in Ruby, then when you want to produce your C-based
> > parser, give a command line option and it will choose your C-based
> > lexer chunk.
> >
> Thanks for your opinion; this is close to what I feel is the right
> thing. I'm thinking something like:
>
> Grammar Ruby
>   Tokenizers (Ruby) # First one is default. No name needed if only one.
>     ...
>   Tokenizers (C)
>     ...
>   Tokens
>     ...
>
> And it's a good thing not to spend too much time on this issue since people
> will not very often work with non-regular languages. I'm glad I asked for
> your opinion.
>
> > In the code example you showed for this (S2), you had a dedicated call
> > per symbol. I assume this means that these chunks must be called often
> > on a trial and error basis, looking for a match. You might be able to
> > make this more efficient by (a) providing some context as part of the call
> > and/or (b) allowing the lexing chunk to return the type of symbol
> > found:
> >
> Yes, the API needs more thinking. However, the penalty might not be
> as high as you'd think since what tokens might match can be inferred from
> the parsing context (the current production/rule being applied). Often
> this will limit the number of tokens that can match. It's a good
> thing though to encourage that only the minimum is taken care of by a
> tokenizer. If we know that there must be a leading % we can generate
> faster lexers.
>
> > Tokenizers
> >   def delimited_string
> >     s, cp = @string, @current_position
> >     type = case s[cp]
> >            when 'q' then DelimQString
> >            when 'Q' then DelimIString
> >            when 'x' then DelimXString
> >            when 'r' then Regexp
> >            else return nil
> >            end
> >     # .. more stuff
> >     return type, end_pos, position_of_next_unconsumed_char
> >   end
> >
> Yes, that's better than my example. Thanks.
>
> Thanks,
>
> Robert
>
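
For anyone following along, here is a rough, self-contained sketch of what Dave's delimited_string chunk could look like once the "more stuff" is filled in. It is only an illustration of idea (b), a lexing chunk that returns the type of symbol it found. The class name DelimitedLexer, the TYPES and CLOSERS tables, and the use of symbols as token types are all made up for this example and are not rockit's API. It assumes the current position sits just past the leading %, and it ignores nested delimiters and escapes.

```ruby
# Hypothetical sketch, not rockit code: one lexing chunk that reports
# which kind of delimited token it found instead of one call per symbol.
class DelimitedLexer
  # Character after '%' => token type (symbols stand in for token classes).
  TYPES = {
    'q' => :DelimQString,
    'Q' => :DelimIString,
    'x' => :DelimXString,
    'r' => :Regexp
  }.freeze

  # Bracket-style openers and their closers; any other delimiter closes itself.
  CLOSERS = { '(' => ')', '[' => ']', '{' => '}', '<' => '>' }.freeze

  def initialize(string, current_position = 0)
    @string = string
    @current_position = current_position
  end

  # Assumes @current_position points just past a leading '%'.
  # Returns [type, end_pos, position_of_next_unconsumed_char],
  # or nil when no delimited token starts here.
  def delimited_string
    s, cp = @string, @current_position
    type = TYPES[s[cp]]
    return nil unless type

    opener = s[cp + 1]
    return nil unless opener

    closer = CLOSERS.fetch(opener, opener)
    # Simplification: first closer wins, so no nesting and no escapes.
    end_pos = s.index(closer, cp + 2)
    return nil unless end_pos

    [type, end_pos, end_pos + 1]
  end
end

lexer = DelimitedLexer.new('%q(hello) rest', 1)
p lexer.delimited_string
```

Because the chunk returns the type together with the end position, the parser needs one call here rather than four trial calls, which is the saving Dave describes.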