Hi ..

On Wednesday 26 January 2005 09:08, Hugh Sasse Staff Elec Eng wrote:
> I seem to have run into my parsing problem again.  Whatever I'm
> doing I usually end up having to parse non-simplistic input, and I'm
> still not happy about the apparently available solutions to this.
> So I'm wondering what other people do.
>

My personal solution to this is to use Coco/R, an LL(1) scanner/parser
generator.  You can find more information at:

    http://www.scifac.ru.ac.za/coco

The primary advantage of this approach, IMHO, is that all of the
grammar and scanning rules live in a single file (unlike the lex/yacc
approach, which splits them across two).  This makes the grammar quite
easy to read and extend once you are familiar with the process.

Ryan Davies has a pure ruby version, and I have a ruby extension
version.  Both seem to work well for little languages.

> [1] I find that thinking in the manner of a shift/reduce parser is
> particularly unnatural to me. ... Maybe there is something I can read which
> will turn the problem around, so it becomes easy to handle?

Pat Terry has a book, "Compilers and Compiler Generators", that covers
LL(1) (and other) topics very well.  You can find it at:

    http://www.scifac.ru.ac.za/compilers/

The primary disadvantage of Coco/R is the LL(1) part.  The parser
decides what to do next by looking at a single token, so your grammar
needs to be fairly well formed and not arbitrarily complex.  As an
example, Ruby cannot, as far as I have tried, be converted into an
LL(1) grammar, though C can.

Below is a simple example grammar (the famous four-function calculator)
for my extension library.  Note that this generates a Ruby extension.
Once you compile and link it, you can use it from Ruby like this:

# ---( test.rb )-------------
require 'Calc'
f = File.readlines("calc.inp")
t = Calc.new
t.run(f)
if t.success
  puts "parsed ok!"
  t.capture.each { |ans| puts " ans==#{ans}" }
else
  puts "Errors ::"
  t.errs.each { |err| puts " --> #{err}" }
end

# ---( calc.inp )-----------
var a,b,c,d;
write 1+(2*3)+4;
write 100/10;
a := 37-12-(4*5);
write a;
b := a*16;
write b*2

# ---( calc.atg )-----------
$C /* Generate Main Module */
COMPILER Calc

#define upcase(c) ((c >= 'a' && c <= 'z')? c-32:c)

int VARS[10000];

int get_spix()
{
  char name[20];
  LEX_S(name, sizeof(name) - 1);
  if (strlen(name) >= 2)
    return 26*(upcase(name[1])-'A')+(upcase(name[0])-'A');
  else
    return (upcase(name[0])-'A');
}

int get_number()
{
  char name[20];
  LEX_S(name, sizeof(name) - 1);
  return atoi(name);
}

void new_var(int spix)
{
  VARS[spix] = 0;
}

int get_var(int spix)
{
  return VARS[spix];
}

void write_val(int val)
{
  char tmp[20];
  sprintf(tmp, "%d", val);
  t_capture_output(tmp);
}

void set_var(int spix, int val)
{
  VARS[spix] = val;
}

IGNORE CASE

CHARACTERS
  letter = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz".
  digit  = "0123456789".
  eol    = CHR(13) .
  lf     = CHR(10) .

COMMENTS FROM '--' TO eol

IGNORE eol + lf

TOKENS
  ident  = letter {letter | digit} .
  number = digit {digit} .

PRODUCTIONS
  Calc
    = [Declarations] StatSeq .

  Declarations
    =                           (. int spix; .)
      'VAR' Ident <&spix>       (. new_var(spix); .)
      { ',' Ident <&spix>       (. new_var(spix); .)
      } ';'.

  StatSeq
    = Stat {';' Stat}.

  Stat
    =                           (. int spix, val; .)
    | "WRITE" Expr <&val>       (. write_val(val); .)
    | Ident <&spix> ":="
      Expr <&val>               (. set_var(spix, val); .) .

  Expr <int *exprVal>
    =                           (. int termVal; .)
      Term <exprVal>
      { '+' Term <&termVal>     (. *exprVal += termVal; .)
      | '-' Term <&termVal>     (. *exprVal -= termVal; .)
      } .

  Term <int *termVal>
    =                           (. int factVal; .)
      Fact <termVal>
      { '*' Fact <&factVal>     (. *termVal *= factVal; .)
      | '/' Fact <&factVal>     (. *termVal /= factVal; .)
      } .

  Fact <int *factVal>
    =                           (. int spix; .)
      Ident <&spix>             (. *factVal = get_var(spix); .)
    | number                    (. *factVal = get_number(); .)
    | '(' Expr <factVal> ')' .

  Ident <int *spix>
    = ident                     (. *spix = get_spix(); .) .

END Calc.

I hope that this helps.
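If LL(1) still feels abstract, it may help to see roughly what a parser for the grammar above does at run time: one method per production, each deciding what to do by peeking at a single token. The following is only a hand-written sketch in plain Ruby, not the code Coco/R generates. The class name MiniCalc and its methods are made up for illustration, variables are kept in a Hash rather than the VARS array, and division uses Ruby's integer division (which rounds differently from C for negative operands).

```ruby
require 'strscan'

# Hypothetical hand-written counterpart of the Calc grammar (not Coco/R output).
class MiniCalc
  KEYWORDS = %w[VAR WRITE].freeze

  attr_reader :capture

  def initialize(src)
    @tokens  = tokenize(src)
    @vars    = {}   # variable name (upper-cased) => integer value
    @capture = []   # output of each WRITE statement, as strings
  end

  # Calc = [Declarations] StatSeq .
  def run
    declarations if peek?('VAR')
    stat_seq
    raise "unexpected #{@tokens.first.inspect}" unless @tokens.empty?
    self
  end

  private

  # Strip '--' comments, then split the text into tokens.
  def tokenize(src)
    ss = StringScanner.new(src.gsub(/--.*$/, ''))
    tokens = []
    loop do
      ss.skip(/\s+/)
      break if ss.eos?
      tok = ss.scan(/:=|[A-Za-z][A-Za-z0-9]*|\d+|[-+*\/();,]/)
      raise "unexpected character at offset #{ss.pos}" unless tok
      tokens << tok
    end
    tokens
  end

  # One token of lookahead, compared case-insensitively (IGNORE CASE).
  def peek?(word)
    @tokens.first && @tokens.first.upcase == word
  end

  def accept(word)
    peek?(word) ? @tokens.shift : nil
  end

  def expect(word)
    accept(word) || raise("expected #{word}, got #{@tokens.first.inspect}")
  end

  def ident
    tok = @tokens.first
    ok = tok && tok.match?(/\A[A-Za-z]/) && !KEYWORDS.include?(tok.upcase)
    raise "expected identifier, got #{tok.inspect}" unless ok
    @tokens.shift.upcase
  end

  # Declarations = 'VAR' Ident { ',' Ident } ';' .
  def declarations
    expect('VAR')
    loop do
      @vars[ident] = 0
      break unless accept(',')
    end
    expect(';')
  end

  # StatSeq = Stat { ';' Stat } .
  def stat_seq
    stat
    stat while accept(';')
  end

  # Stat = "WRITE" Expr | Ident ":=" Expr .
  def stat
    if accept('WRITE')
      @capture << expr.to_s
    else
      name = ident
      expect(':=')
      @vars[name] = expr
    end
  end

  # Expr = Term { ('+' | '-') Term } .
  def expr
    val = term
    while (op = accept('+') || accept('-'))
      rhs = term
      val = op == '+' ? val + rhs : val - rhs
    end
    val
  end

  # Term = Fact { ('*' | '/') Fact } .
  def term
    val = fact
    while (op = accept('*') || accept('/'))
      rhs = fact
      val = op == '*' ? val * rhs : val / rhs
    end
    val
  end

  # Fact = Ident | number | '(' Expr ')' .
  def fact
    tok = @tokens.first
    if accept('(')
      val = expr
      expect(')')
      val
    elsif tok && tok.match?(/\A\d+\z/)
      @tokens.shift.to_i
    else
      @vars.fetch(ident, 0)   # undeclared variables read as 0
    end
  end
end

calc = MiniCalc.new(<<~INPUT).run
  var a,b,c,d;
  write 1+(2*3)+4;
  write 100/10;
  a := 37-12-(4*5);
  write a;
  b := a*16;
  write b*2
INPUT
calc.capture.each { |ans| puts " ans==#{ans}" }
```

Fed the calc.inp text, this prints 11, 10, 5 and 160. Notice that no method ever needs to look past the next token or back up, which is exactly the restriction that "LL(1)" places on the grammar.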
Regards, -- -mark. (probertm at acm dot org)