Not to interrupt this thread, but what is rockit?  I've seen it mentioned a
couple of times on this list and, being new to Ruby, I'm wondering: is it
something within Ruby, or a general technology someone is trying to implement
through Ruby?  Or neither?

TIA,
Derek
----- Original Message -----
From: "Robert Feldt" <feldt / ce.chalmers.se>
To: "ruby-talk ML" <ruby-talk / ruby-lang.org>
Cc: <ruby-talk / netlab.co.jp>
Sent: Thursday, June 14, 2001 6:43 AM
Subject: [ruby-talk:16473] Re: Opinion sought: parsing non-regular languages


> On Thu, 14 Jun 2001, Dave Thomas wrote:
>
> > Robert Feldt <feldt / ce.chalmers.se> writes:
> >
> > > Any comments or ideas? Which solution would you prefer if you'd get to
> > > choose?
> >
> > I know next to nothing about parsing and lexing, but it seems to me
> > that you'd spend an inordinate amount of effort trying to produce a
> > lexer that could deal with the kind of things that people dream up for
> > tokens. Identifying regular expressions is a particularly hairy case
> >
> Yes, sounds reasonable.
>
> > that comes to mind: /[[]/ and friends are all special cases.
> >
> I'm not sure we're talking about the same thing here but wouldn't
>
> /\/((\\\/)|[^\/])*\/[iomx]*/
>
> cut it? That's what I use to find regexps in rockit grammars so I hope I'm
> not too far off the mark...
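
A minimal sketch of trying the pattern quoted above in Ruby. The constant name, the helper method and the sample grammar line are made up for illustration; nothing here is taken from rockit itself.

```ruby
# Pattern quoted above for spotting regexp literals in a grammar file.
REGEXP_LITERAL = /\/((\\\/)|[^\/])*\/[iomx]*/

# Hypothetical helper: collect every full match found in +text+.
# A block is used because String#scan would otherwise return the
# capture groups rather than the whole match.
def regexp_literals(text)
  found = []
  text.scan(REGEXP_LITERAL) { found << Regexp.last_match(0) }
  found
end

# Made-up grammar line containing two regexp literals.
line = 'Number = /\d+/  Path = /a\/b/i'
p regexp_literals(line)
```

The escaped slash in the second literal is consumed by the `\\\/` alternative, so the match runs on to the real closing slash and its trailing option letter.
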
>
> > So, I'd be in favor of providing simple hooks for adding my own code
> > to the lexer. If this isn't language independent, then have a
> > provision to add lexer chunks in multiple languages: you can get it
> > all working in Ruby, then when you want to produce your C-based
> > parser, give a command line option and it will choose your C-based
> > lexer chunk.
> >
> Thanks for your opinion; this is close to what I feel is the right
> thing. I'm thinking something like:
>
>   Grammar Ruby
>     Tokenizers (Ruby) # First one is default. No name needed if only one.
>       ...
>     Tokenizers (C)
>       ...
>     Tokens
>       ...
>
> And it's a good thing not to spend too much time on this issue since people
> will not very often work with non-regular languages. I'm glad I asked for
> your opinion.
>
> > In the code example you showed for this (S2), you had a dedicated call
> > per symbol. I assume this means that these chunks must be called often
> > on a trial and error basis, looking for a match. You might be able to
> > make this more efficient by (a) providing some context as part of the
> > call and/or (b) allowing the lexing chunk to return the type of symbol
> > found:
> >
> Yes, the API needs more thinking. However, the penalty might not be
> as high as you'd think since what tokens might match can be inferred from
> the parsing context (the current production/rule being applied). Often
> this will limit the number of tokens that can match. It's a good
> thing though to encourage that only the minimum is taken care of by a
> tokenizer. If we know that there must be a leading % we can generate
> faster lexers.
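
One way the idea in the paragraph above might look in Ruby, as a rough sketch only. The token table, the `next_token` method and the simplified patterns are all assumptions made for this example and are not rockit's API: the parser passes in the tokens it currently expects, and a tokenizer with a known leading character is skipped cheaply when that character is absent.

```ruby
# Hypothetical table: token name => [required first character or nil, pattern].
TOKENS = {
  delimited_string: ['%', /\A%[qQxr]\(.*?\)/],
  integer:          [nil, /\A\d+/],
  identifier:       [nil, /\A[a-z_]\w*/]
}.freeze

# Try only the tokens the current parsing context allows (+expected+).
# Returns [name, matched_text], or nil when none of them match at +pos+.
def next_token(input, pos, expected)
  expected.each do |name|
    first_char, pattern = TOKENS.fetch(name)
    # Cheap rejection: skip the pattern if the leading character is wrong.
    next if first_char && input[pos] != first_char
    m = pattern.match(input[pos..-1])
    return [name, m[0]] if m
  end
  nil
end

p next_token('%q(hi) 42', 0, [:integer, :delimited_string])
p next_token('42', 0, [:delimited_string])
```
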
>
> >   Tokenizers
> >      def delimited_string
> >        s, cp = @string, @current_position
> >        type = case s[cp]
> >               when 'q' then DelimQString
> >               when 'Q' then DelimIString
> >               when 'x' then DelimXString
> >               when 'r' then Regexp
> >               else return nil
> >               end
> >        # .. more stuff
> >        return type, end_pos, position_of_next_unconsumed_char
> >      end
> >
> Yes, that's better than my example. Thanks.
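
A self-contained sketch of how the quoted `delimited_string` chunk could be filled in. The class name, the symbols standing in for the token types, and the delimiter handling are assumptions for illustration; nesting and escaped delimiters are deliberately not handled.

```ruby
# Hypothetical stand-in for a tokenizer chunk; not rockit's actual API.
class SketchTokenizer
  CLOSERS = { '(' => ')', '[' => ']', '{' => '}', '<' => '>' }.freeze
  TYPES = { 'q' => :DelimQString, 'Q' => :DelimIString,
            'x' => :DelimXString, 'r' => :Regexp }.freeze

  def initialize(string, position = 0)
    @string = string
    @current_position = position
  end

  # Returns [type, end_pos, position_of_next_unconsumed_char],
  # or nil when no delimited string starts at the current position.
  def delimited_string
    s, cp = @string, @current_position
    return nil unless s[cp] == '%'      # the required leading %
    type = TYPES[s[cp + 1]]
    return nil unless type
    opener = s[cp + 2]
    return nil unless opener
    # Bracket-like openers close with their partner, others with themselves.
    closer = CLOSERS.fetch(opener, opener)
    end_pos = s.index(closer, cp + 3)
    return nil unless end_pos
    [type, end_pos, end_pos + 1]
  end
end

p SketchTokenizer.new('%q(hello) rest').delimited_string
p SketchTokenizer.new('%r/ab/').delimited_string
```

Returning the type together with the positions means the caller needs one call to learn which of the four token kinds it found, instead of one trial call per kind.
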
>
> Thanks,
>
> Robert
>