Robert Feldt <feldt / ce.chalmers.se> writes:

> Any comments or ideas? Which solution would you prefer if you'd get to
> choose?

I know next to nothing about parsing and lexing, but it seems to me
that you'd spend an inordinate amount of effort trying to produce a
lexer that could deal with the kind of things that people dream up for
tokens. Identifying regular expressions is a particularly hairy case
that comes to mind: /[[]/ and friends are all special cases.

So, I'd be in favor of providing simple hooks for adding my own code
to the lexer. If this isn't language independent, then have a
provision to add lexer chunks in multiple languages: you can get it
all working in Ruby, then when you want to produce your C-based
parser, give a command-line option and it will choose your C-based
lexer chunk.

In the code example you showed for this (S2), you had a dedicated call
per symbol. I assume this means that these chunks must be called often
on a trial and error basis, looking for a match. You might be able to
make this more efficient by (a) providing some context as part of the call
and/or (b) allowing the lexing chunk to return the type of symbol
found:

  Tokens
    Blank = /\s+/  :skip:

    calculated = %#{delimited_string}    << i.e. '%' followed by a match
                                            by the delimited_string routine

  Tokenizers
     def delimited_string
       s, cp = @string, @current_position
       # take a one-character String so the 'q', 'Q', ... tests can match
       type = case s[cp, 1]
              when 'q' then DelimQString
              when 'Q' then DelimIString
              when 'x' then DelimXString
              when 'r' then Regexp
              else return nil
              end
       # .. more stuff
       return type
     end
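
To make (b) a bit more concrete, here is a minimal runnable sketch of a
lexer whose hook reports the type of symbol it found. Everything in it is
an assumption made for illustration: the SketchLexer class, the Token
struct, the CLOSERS and KINDS tables, and the :Other fallback are all
hypothetical names, not part of the generator under discussion. It also
ignores nested delimiters and escapes entirely.

```ruby
# Hypothetical sketch only; none of these names come from a real tool.
class SketchLexer
  Token = Struct.new(:type, :text)

  # Opening delimiter => closing delimiter for bracket-style pairs.
  # Any other opener is assumed to close with the same character.
  CLOSERS = { '(' => ')', '[' => ']', '{' => '}', '<' => '>' }.freeze

  # Letter following '%' => token type reported by the hook.
  KINDS = {
    'q' => :DelimQString,
    'Q' => :DelimIString,
    'x' => :DelimXString,
    'r' => :Regexp
  }.freeze

  def initialize(string)
    @string = string
    @current_position = 0
  end

  # Returns the next Token, or nil at end of input.
  def next_token
    skip_blanks
    return nil if @current_position >= @string.length

    start = @current_position
    if @string[start, 1] == '%'
      @current_position = start + 1
      type = delimited_string          # one call, and it names the type
      return Token.new(type, @string[start...@current_position]) if type
    end

    # Fallback: hand back a single character as a catch-all token.
    @current_position = start + 1
    Token.new(:Other, @string[start, 1])
  end

  # Collects every remaining token into an Array.
  def tokens
    result = []
    while (tok = next_token)
      result << tok
    end
    result
  end

  private

  def skip_blanks
    while @current_position < @string.length &&
          @string[@current_position].match?(/\s/)
      @current_position += 1
    end
  end

  # The hook: called just after '%'. On a match it advances
  # @current_position past the closing delimiter and returns the token
  # type; otherwise it returns nil and the caller falls back.
  def delimited_string
    s, cp = @string, @current_position
    type = KINDS[s[cp, 1]]
    return nil unless type

    opener = s[cp + 1, 1]
    return nil if opener.nil? || opener.empty?

    closer = CLOSERS.fetch(opener, opener)
    finish = s.index(closer, cp + 2)   # no nesting or escape handling
    return nil unless finish

    @current_position = finish + 1
    type
  end
end
```

With this, SketchLexer.new("%q(hi) %r[a+]").tokens gives one DelimQString
token and one Regexp token, and the lexer only had to call the hook once
per '%' rather than once per candidate symbol.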


But, as I said, I know little about the complexities of such a beast.



Dave