Hi .. 

 On Wednesday 26 January 2005 09:08, Hugh Sasse Staff Elec Eng wrote:
> I seem to have run into my parsing problem again.  Whatever I'm
> doing I usually end up having to parse non-simplistic input, and I'm
> still not happy about the apparently available solutions to this.
> So I'm wondering what other people do.
>
My personal solution is to use Coco/R, a generator that produces both a
scanner and an LL(1) recursive-descent parser from one grammar file.
You can find more information at:

  http://www.scifac.ru.ac.za/coco

The primary advantage of this approach, IMHO, is that all of the grammar and
scanning rules live in a single file (rather than being split across separate
files, as with lex/yacc).  This makes the grammar quite easy to read and
extend, once you are familiar with the process.  Ryan Davies has a pure Ruby
version, and I have a Ruby extension version.  Both seem to work well for
little languages.

> [1] I find that thinking in the manner of a shift/reduce parser is
> particularly unnatural to me.  ... Maybe there is something I can read which 
> will turn the problem around, so it becomes easy to handle?

Pat Terry has a book "Compilers and Compiler Generators" that covers LL(1) 
(and other) topics very well.  You can find it at:

  http://www.scifac.ru.ac.za/compilers/

The primary disadvantage of Coco/R is the LL(1) part.  The parser has to pick
the next production by looking at a single token, so your grammar must avoid
left recursion and must not have alternatives that start with the same token.
As an example, Ruby cannot, as far as I have tried, be converted into an LL(1)
grammar, though C can.
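
To make the single-token restriction a bit more concrete, here is a rough
Ruby sketch.  It is not part of Coco/R: the helper name `ll1_conflicts` and
the "call" alternative are made up for illustration, and the start-token sets
are written out by hand rather than computed from a grammar.

```ruby
# Hypothetical helper, not part of Coco/R: given the tokens that can start
# each alternative of a rule, list the tokens claimed by more than one
# alternative.  An LL(1) parser cannot choose between those alternatives.
def ll1_conflicts(first_sets)
  owner = {}                                  # token => first alternative seen
  conflicts = Hash.new { |h, k| h[k] = [] }   # token => clashing alternatives
  first_sets.each do |alt, tokens|
    tokens.each do |tok|
      if owner.key?(tok)
        conflicts[tok] |= [owner[tok], alt]
      else
        owner[tok] = alt
      end
    end
  end
  conflicts
end

# Stat from a four-function calculator (ignoring the empty alternative):
# a write statement starts with WRITE, an assignment with an identifier.
calc_stat = { write: ['WRITE'], assign: ['ident'] }
puts ll1_conflicts(calc_stat).empty?          # prints "true"

# Invented extension: a call statement that also starts with an identifier.
with_call = { write: ['WRITE'], assign: ['ident'], call: ['ident'] }
ll1_conflicts(with_call).each do |tok, alts|
  puts "#{tok}: #{alts.join(', ')}"           # prints "ident: assign, call"
end
```

In a case like the second one, the usual fix is to factor the grammar so the
shared leading token is parsed once and the decision is made on what follows.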

Below is a simple example grammar (the famous four-function calculator) for my
extension library.  Note that Coco/R generates a Ruby extension from it.  Once
you have compiled and linked that extension, you can use it in Ruby like this:

# ---( test.rb )-------------
require 'Calc'

f = File.readlines("calc.inp")
t = Calc.new
t.run(f)

if t.success
   puts "parsed ok!"
   t.capture.each { |ans| puts " ans==#{ans}" }
else
   puts "Errors ::"
   t.errs.each { |err| puts " --> #{err}" }
end


# ---( calc.inp )-----------
var a,b,c,d;

write 1+(2*3)+4;
write 100/10;

a := 37-12-(4*5);
write a;
b := a*16;
write b*2



# ---( calc.atg )-----------
$C   /* Generate Main Module */
COMPILER Calc

#define upcase(c)       ((c >= 'a' && c <= 'z')? c-32:c)
int VARS[10000];

int get_spix()
{
  char name[20];
  LEX_S(name, sizeof(name) - 1);
  if (strlen(name) >= 2)
    return 26*(upcase(name[1])-'A')+(upcase(name[0])-'A');
  else
    return (upcase(name[0])-'A');
}

int get_number()
{
  char name[20];
  LEX_S(name, sizeof(name) - 1);
  return atoi(name);
}

void new_var(int spix)
{
  VARS[spix] = 0;
}

int get_var(int spix)
{
  return VARS[spix];
}

void write_val(int val)
{
  char tmp[20];

  sprintf(tmp, "%d", val);
  t_capture_output(tmp);
}

void set_var(int spix, int val)
{
  VARS[spix] = val;
}

IGNORE CASE

CHARACTERS
  letter = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz".
  digit  = "0123456789".
  eol    = CHR(13) .
  lf     = CHR(10) .

COMMENTS
  FROM '--' TO eol

IGNORE eol + lf

TOKENS
  ident  = letter {letter | digit} .
  number = digit {digit} .

PRODUCTIONS
  Calc =
    [Declarations] StatSeq .

  Declarations
    =                                     (. int spix; .)
       'VAR'
       Ident <&spix>                      (. new_var(spix); .)
       { ',' Ident <&spix>                (. new_var(spix); .)
       } ';'.

  StatSeq =
    Stat {';' Stat}.

  Stat
    =                                     (. int spix, val; .)
      | "WRITE" Expr <&val>               (. write_val(val); .)
      | Ident <&spix> ":=" Expr <&val>    (. set_var(spix, val); .) .

  Expr <int *exprVal>
    =                                     (. int termVal; .)
      Term <exprVal>
      {  '+' Term <&termVal>              (. *exprVal += termVal; .)
      |  '-' Term <&termVal>              (. *exprVal -= termVal; .)
      } .

  Term <int *termVal>
    =                                     (. int factVal; .)
      Fact <termVal>
      {  '*' Fact <&factVal>              (. *termVal *= factVal; .)
      |  '/' Fact <&factVal>              (. *termVal /= factVal; .)
      } .

  Fact <int *factVal>
    =                                     (. int spix; .)
         Ident <&spix>                    (. *factVal = get_var(spix); .)
      |  number                           (. *factVal = get_number(); .)
      | '(' Expr <factVal> ')' .

  Ident <int *spix>
    = ident                               (. *spix = get_spix(); .) .

END Calc.
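
For comparison, the Expr / Term / Fact productions map almost one-to-one onto
a hand-written recursive-descent parser.  The following is only a sketch in
plain Ruby, not the code Coco/R generates: the class name `CalcSketch`, its
methods, and the variable hash are my own assumptions, and the tokenizer
simply skips characters it does not recognise.

```ruby
# Minimal sketch (hypothetical, not Coco/R output): one method per production.
class CalcSketch
  TOKEN = %r{\s*(\d+|[A-Za-z][A-Za-z0-9]*|[-+*/()])}

  def initialize(vars = {})
    @vars = vars                      # variable name (lower case) => value
  end

  def eval_expr(src)
    @tokens = src.scan(TOKEN).flatten
    @pos = 0
    val = expr
    raise ArgumentError, "unexpected #{peek.inspect}" unless peek.nil?
    val
  end

  private

  def peek
    @tokens[@pos]                     # the single token of lookahead
  end

  def advance
    tok = @tokens[@pos]
    @pos += 1
    tok
  end

  # Expr = Term { '+' Term | '-' Term } .
  def expr
    val = term
    while ['+', '-'].include?(peek)
      op = advance
      rhs = term
      val = op == '+' ? val + rhs : val - rhs
    end
    val
  end

  # Term = Fact { '*' Fact | '/' Fact } .
  def term
    val = fact
    while ['*', '/'].include?(peek)
      op = advance
      rhs = fact
      val = op == '*' ? val * rhs : val / rhs
    end
    val
  end

  # Fact = Ident | number | '(' Expr ')' .
  def fact
    tok = advance
    case tok
    when /\A\d+\z/     then tok.to_i
    when /\A[A-Za-z]/  then @vars.fetch(tok.downcase, 0)
    when '('
      val = expr
      raise ArgumentError, "expected ')'" unless advance == ')'
      val
    else
      raise ArgumentError, "unexpected #{tok.inspect}"
    end
  end
end

calc = CalcSketch.new('a' => 5)
puts calc.eval_expr('1+(2*3)+4')      # prints 11
puts calc.eval_expr('100/10')         # prints 10
puts calc.eval_expr('a*16')           # prints 80
```

Each loop decides whether to continue by peeking at one token, which is
exactly the decision an LL(1) grammar has to make possible.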


I hope that this helps.

Regards,

-- 
-mark.  (probertm at acm dot org)