Here's mine, done with Treetop. It also includes a
Readline-based interactive checker. The generated
parser from Treetop has a slightly different interface,
so I've included JEG's test program with an adapter at
the top.
The nicest thing about using Treetop is how close the
grammar is to the JSON spec :-).
I prefer to convert hash keys to symbols, but the test
cases don't allow that so I stripped out my .to_sym's.
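
For anyone who wants the symbol keys back without touching the grammar, here is a minimal sketch that converts the keys after parsing. The helper name symbolize_keys is my own invention for illustration, not part of Treetop or of the quiz code.

```ruby
# Hypothetical helper: recursively convert the String keys of parsed
# JSON data (nested Hashes and Arrays) to Symbols.
def symbolize_keys(value)
  case value
  when Hash
    value.each_with_object({}) do |(key, val), result|
      result[key.to_sym] = symbolize_keys(val)
    end
  when Array
    value.map { |item| symbolize_keys(item) }
  else
    value # numbers, strings, true, false and nil pass through unchanged
  end
end

parsed = { "name" => "quiz", "tags" => [{ "id" => 155 }] }
p symbolize_keys(parsed)
```

Doing the conversion as a post-processing step keeps the parser's output compatible with the test cases.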
Note that the test cases are rather limited in things
like whitespace handling. (In fact the JSON spec is
actually incorrect, in that it doesn't define which rules
constitute tokens that may be separated by whitespace!)
Whitespace in Treetop must be handled explicitly, and it's
easy to miss a spot where it should be skipped, so the
tests should cover that.
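
As a rough sketch of the kind of whitespace coverage meant here, the check below parses several spacings of the same document and reports any that disagree with the compact form. Everything in it (COMPACT, VARIANTS, whitespace_failures) is made up for illustration, and it uses Ruby's standard-library JSON purely as a stand-in parser.

```ruby
require 'json' # Ruby's stdlib JSON, only a stand-in parser for this sketch

COMPACT = '{"a":[1,2,{"b":null}],"c":true}'

# The same document with whitespace in every place it might be allowed.
VARIANTS = [
  ' {"a":[1,2,{"b":null}],"c":true} ',
  '{ "a" : [ 1 , 2 , { "b" : null } ] , "c" : true }',
  "{\n\t\"a\":\r\n\t[1,\t2,\n{\"b\"\t:\tnull}\n]\n,\"c\":\ntrue\r\n}"
]

# Returns the variants that fail to parse or that parse to a different
# value than the compact form. The block supplies the parser under test.
def whitespace_failures(compact, variants)
  expected = yield(compact)
  variants.reject do |text|
    begin
      yield(text) == expected
    rescue StandardError
      false
    end
  end
end

failures = whitespace_failures(COMPACT, VARIANTS) { |text| JSON.parse(text) }
puts failures.empty? ? "all whitespace variants agree" : failures.inspect
```

To try it against the Treetop parser, the block would presumably become { |text| JSONParser.new.parse(text) }, using the adapter class from this post.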
I cut corners on full Unicode support, as commented in my code,
but there's another test case you should apply: parsing
the string "\\u1234", which should throw an exception.
You'll see that my code is missing that exception, and
will misbehave instead :-).
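
One possible way to avoid both the eval and the mis-handled escapes is to decode every escape in a single left-to-right pass, so an escaped backslash is consumed before a following "u1234" can be mistaken for a Unicode escape. This is only a sketch under my own assumptions: the names ESCAPES and unescape_json_string_body are hypothetical, it takes the text between the quotes, and it does not combine UTF-16 surrogate pairs.

```ruby
# Hypothetical table of the single-character escapes JSON allows.
ESCAPES = {
  '"' => '"', '\\' => '\\', '/' => '/',
  'b' => "\b", 'f' => "\f", 'n' => "\n", 'r' => "\r", 't' => "\t"
}.freeze

# Decode the escapes in the body of a JSON string (the part between the
# quotes). Each backslash sequence is matched exactly once, left to right.
def unescape_json_string_body(body)
  body.gsub(/\\(?:u([0-9A-Fa-f]{4})|(.))/m) do
    if $1
      [$1.hex].pack('U') # code point to UTF-8; surrogate pairs not combined
    else
      ESCAPES.fetch($2) { raise ArgumentError, "invalid escape: \\#{$2}" }
    end
  end
end

p unescape_json_string_body('caf\u00e9')     # a real Unicode escape
p unescape_json_string_body('C:\\\\u1234')   # an escaped backslash, then "u1234"
```

In the grammar itself this logic would more naturally live in the "char" rule, as the comment in the string rule already suggests.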
It wasn't clear from the quiz or the JSON spec whether an
integer is valid JSON. I elected to accept any value, not
just an object or array.
Treetop now uses Polyglot, which loads the generated .rb
file if you've generated it, or the .treetop file if not.
Clifford Heath.
First, the interactive test program:
require 'treetop'
require 'json' # Note that we can require the Treetop file directly.
require 'readline'
parser = JsonParser.new
# The second argument to readline is a boolean: true adds each line to the history.
while line = Readline.readline("? ", true)
  begin
    tree = parser.parse(line)
    if tree
      p tree.obj
    else
      puts parser.failure_reason
    end
  rescue => e
    puts e
    p e.backtrace
    p tree if tree
  end
end
puts
Now, my test adapter:
class JSONParser
  def parse(text)
    parser = JsonParser.new
    p = parser.parse(text)
    raise parser.failure_reason unless p
    p.obj
  end
end
Finally, the grammar itself:
# Treetop grammar for JSON for Ruby Quiz #155 by Clifford Heath.
grammar Json
  rule json
    value
  end
  rule object
    '{' s pairs:pairs? s '}' s
    { def obj
        pairs.empty? ? {} : pairs.obj
      end
    }
  end
  rule pairs
    member rest:(s ',' s member)*
    { def obj
        rest.elements.inject({eval(member.k.text_value) => member.value.obj}) { |h, e|
          h[eval(e.member.k.text_value)] = e.member.value.obj
          h
        }
      end
    }
  end
  rule member # key/value pair of an object
    k:string s ':' s value
  end
  rule array
    '[' s e:elements? s ']'
    { def obj
        e.empty? ? [] : e.obj
      end
    }
  end
  rule elements # elements of an array
    value rest:(s ',' s value)*
    { def obj
        rest.elements.inject([value.obj]) { |a, e|
          a << e.value.obj
        }
      end
    }
  end
  rule value
    s alt:(string / number / object / array
      / 'true' { def obj; true; end }
      / 'false' { def obj; false; end }
      / 'null' { def obj; nil; end }
    )
    { def obj; alt.obj; end }
  end
  rule string
    '"' char* '"' { def obj
        eval(
          # Strip Unicode characters down to the chr equivalent.
          # Note that I'm cheating here: '"\\u4321"' should assert,
          # and there are cases that will succeed but corrupt the data.
          # This should be handled in the "char" rule.
          text_value.gsub(/\\u..../) { |unicode|
            eval("0x"+unicode[2..-1]).chr
          }
        )
      end
    }
  end
  rule char
    '\\' [\"\\\/bfnrt]
    / '\\u' hex hex hex hex
    / (![\\"] .)
  end
  rule hex
    [0-9A-Fa-f]
  end
  rule number
    int frac? exp? { def obj; eval(text_value); end }
  end
  rule int # Any integer
    '-'? ([1-9] [0-9]* / '0')
    { def obj; eval(text_value); end }
  end
  rule frac # The fractional part of a floating-point number
    '.' [0-9]+
  end
  rule exp # An exponent
    [eE] [-+]? [0-9]+
  end
  rule s # Any amount of whitespace
    [ \t\n\r]*
  end
end
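
The number and int rules above convert their text with eval. As a hedged alternative, the matched text could be converted with Ruby's Integer and Float instead, which raise on anything that isn't a number. The helper name json_number is hypothetical, and the sketch assumes its input has already matched the number rule.

```ruby
# Hypothetical replacement for eval(text_value) in the number rule.
# Text containing a fraction or an exponent becomes a Float; anything
# else becomes an Integer (base 10, so it is never read as octal).
def json_number(text)
  text.match?(/[.eE]/) ? Float(text) : Integer(text, 10)
end

p json_number("-12")
p json_number("3.25")
p json_number("1E+2")
```

Inside the grammar the action would then presumably read { def obj; json_number(text_value); end }, with the helper defined somewhere the generated parser can see it.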