------art_7990_1327777.1202187756281
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

On Feb 4, 2008 7:29 PM, James Gray <james / grayproductions.net> wrote:

> Here's my own recursive descent parser (based on the not-quite-correct
> quiz tests):
>

Hi James,

I benchmarked your code.  I also was curious how well a good hand-crafted
Regexp recursive descent parser (using the StringScanner) would compare to
my hand-crafted LL(1) (1 character lookahead) parser.  So, I took your code
and applied some of the same optimizations that I used:
  - minimize method calls (inline at least where a method is called once)
  - minimize object creation (put results in the callers output buffer
instead of returning an AST)
  - minimize exceptions

Here are the benchmark results with this and the other parsers you posted
(couldn't get the ghostwheel parser to work well):

ch/s    author/gem
----    ----------
-       Pawel Radecki (RE, couldn't parse benchmark JSON)
-       ghostwheel (ghostwheel, couldn't parse benchmark JSON)
1226    James Edward Gray II (peggy, fails one test)
3214    Justin Ethier (RE lexer, ruby eval, fixed number parsing)
4054    Eric Mahurin (Grammar0, no lexer, no parser generation)
4078    Eric I (Treetop, unicode broken)
6534    oksteev (Treetop, mismatches in benchmark)
8313    Clifford Heath (Treetop, had to remove handling of "\/")
17320   Alexander Stedile (RE, recursive descent)
54586   Eric Mahurin (Grammar, no lexer, v0.5)
137989  Paolo Bonzini (RE, recursive descent)
166041  Thomas Link (RE lexer, ruby eval, ruby 1.9 results)
186042  James Edward Gray II (RE, recursive descent)
220289  json (pure ruby version)
223486  Eric Mahurin (Grammar, no lexer, unreleased)
224823  fjson (uses C extensions)
287292  James Edward Gray II (RE, recursive descent, Eric optimized)
333368  Thomas Link & Paolo Bonzini (RE + eval, unicode broken)
388670  Eric Mahurin (recursive descent)
553081  Eric Mahurin (Grammar, no lexer, unreleased, w/ ruby2cext)
1522250 json (w/ C extensions)


I'd like to see a RACC parser for JSON.


Here is the optimized version of your recursive-descent Regexp/StringScanner
parser:


require "strscan"

class JSONParser

  def parse(input)
    @input  tringScanner.new(input.strip)
    parse_value(out) or error("Illegal JSON value")
    out[0]
  end

  private

  def parse_value(out)
    if @input.scan(/\{\s*/)
      object  }
      kv  ]
      while @input.scan(/"/)
        parse_string(kv)
        @input.scan(/\s*:\s*/) or error("Expecting object separator")
        parse_value(kv) or error("Illegal JSON value")
        object[kv[0]]  v[1]
        @input.scan(/\s*,\s*/) or break
        kv.clear
      end
      @input.scan(/\s*\}\s*/) or error("Unclosed object")
      out << object
    elsif @input.scan(/\[\s*/)
      array  ]
      while parse_value(array)
        @input.scan(/\s*,\s*/) or break
      end
      @input.scan(/\s*\]\s*/) or error("Unclosed array")
      out << array
    elsif @input.scan(/"/)
      parse_string(out)
    elsif @input.scan(/-?(?:0|[1-9]\d*)(?:\.\d+(?:[eE][+-]?\d+)?)?\b/)
      out << eval(@input.matched)
    elsif @input.scan(/\b(?:true|false|null)\b/)
      out << eval(@input.matched.sub("null", "nil"))
    end
  end

  def parse_string(out)
    string  "
    while true
      if @input.scan(/[^\\"]+/)
        string << @input.matched
      elsif @input.scan(%r{\\["\\/]})
        string << @input.matched[-1]
      elsif @input.scan(/\\[bfnrt]/)
        string << eval(%Q{"#{@input.matched}"})
      elsif @input.scan(/\\u[0-9a-fA-F]{4}/)
        string << [Integer("0x#{@input.matched[2..-1]}")].pack("U")
      else
        break
      end
    end
    @input.scan(/"\s*/) or error("Unclosed string")
    out << string
  end

  def error(message)
    if @input.eos?
      raise "Unexpected end of input."
    else
      raise "#{message}:  #{@input.peek(@input.string.length)}"
    end
  end
end

------art_7990_1327777.1202187756281--