------ art_7990_1327777.1202187756281 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline On Feb 4, 2008 7:29 PM, James Gray <james / grayproductions.net> wrote: > Here's my own recursive descent parser (based on the not-quite-correct > quiz tests): > Hi James, I benchmarked your code. I also was curious how well a good hand-crafted Regexp recursive descent parser (using the StringScanner) would compare to my hand-crafted LL(1) (1 character lookahead) parser. So, I took your code and applied some of the same optimizations that I used: - minimize method calls (inline at least where a method is called once) - minimize object creation (put results in the callers output buffer instead of returning an AST) - minimize exceptions Here are the benchmark results with this and the other parsers you posted (couldn't get the ghostwheel parser to work well): ch/s author/gem ---- ---------- - Pawel Radecki (RE, couldn't parse benchmark JSON) - ghostwheel (ghostwheel, couldn't parse benchmark JSON) 1226 James Edward Gray II (peggy, fails one test) 3214 Justin Ethier (RE lexer, ruby eval, fixed number parsing) 4054 Eric Mahurin (Grammar0, no lexer, no parser generation) 4078 Eric I (Treetop, unicode broken) 6534 oksteev (Treetop, mismatches in benchmark) 8313 Clifford Heath (Treetop, had to remove handling of "\/") 17320 Alexander Stedile (RE, recursive descent) 54586 Eric Mahurin (Grammar, no lexer, v0.5) 137989 Paolo Bonzini (RE, recursive descent) 166041 Thomas Link (RE lexer, ruby eval, ruby 1.9 results) 186042 James Edward Gray II (RE, recursive descent) 220289 json (pure ruby version) 223486 Eric Mahurin (Grammar, no lexer, unreleased) 224823 fjson (uses C extensions) 287292 James Edward Gray II (RE, recursive descent, Eric optimized) 333368 Thomas Link & Paolo Bonzini (RE + eval, unicode broken) 388670 Eric Mahurin (recursive descent) 553081 Eric Mahurin (Grammar, no lexer, unreleased, w/ ruby2cext) 1522250 json (w/ C extensions) I'd like to see a RACC parser for JSON. Here is the optimized version of your recursive-descent Regexp/StringScanner parser: require "strscan" class JSONParser def parse(input) @input tringScanner.new(input.strip) parse_value(out ) or error("Illegal JSON value") out[0] end private def parse_value(out) if @input.scan(/\{\s*/) object } kv ] while @input.scan(/"/) parse_string(kv) @input.scan(/\s*:\s*/) or error("Expecting object separator") parse_value(kv) or error("Illegal JSON value") object[kv[0]] v[1] @input.scan(/\s*,\s*/) or break kv.clear end @input.scan(/\s*\}\s*/) or error("Unclosed object") out << object elsif @input.scan(/\[\s*/) array ] while parse_value(array) @input.scan(/\s*,\s*/) or break end @input.scan(/\s*\]\s*/) or error("Unclosed array") out << array elsif @input.scan(/"/) parse_string(out) elsif @input.scan(/-?(?:0|[1-9]\d*)(?:\.\d+(?:[eE][+-]?\d+)?)?\b/) out << eval(@input.matched) elsif @input.scan(/\b(?:true|false|null)\b/) out << eval(@input.matched.sub("null", "nil")) end end def parse_string(out) string " while true if @input.scan(/[^\\"]+/) string << @input.matched elsif @input.scan(%r{\\["\\/]}) string << @input.matched[-1] elsif @input.scan(/\\[bfnrt]/) string << eval(%Q{"#{@input.matched}"}) elsif @input.scan(/\\u[0-9a-fA-F]{4}/) string << [Integer("0x#{@input.matched[2..-1]}")].pack("U") else break end end @input.scan(/"\s*/) or error("Unclosed string") out << string end def error(message) if @input.eos? raise "Unexpected end of input." else raise "#{message}: #{@input.peek(@input.string.length)}" end end end ------ art_7990_1327777.1202187756281--