On Feb 5, 2008 11:44 AM, tho_mica_l <micathom / gmail.com> wrote:

> > Maybe, but then making a fast parser wouldn't be any fun :)
>
> Since the figures differ slightly from Eric
> Mahurin's benchmark it's possible that I did something wrong. But in
> this case I did it equally wrong for all solutions. The code is down
> below.


We should probably assume all of these benchmarks have +-50%
error.  The performance is highly data-set and phase-of-the-moon dependent.
You can still judge whether something has non-linear performance (i.e.
quadratic runtime) or judge whether one solution is 5-10X faster than
another.  But, if two solutions are within 2X of each other in a benchmark,
I don't think there is a clear winner.
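
As a rough sketch of those two judgments (the helper names and the timings
passed to scaling_exponent are made up for illustration; the ch/s figures are
taken from the tally in this message):

```ruby
# Estimate the exponent k in "time ~ size**k" from two measurements.
# k near 1 suggests linear behavior, k near 2 suggests quadratic runtime.
def scaling_exponent(size1, time1, size2, time2)
  Math.log(time2.to_f / time1) / Math.log(size2.to_f / size1)
end

# Call a winner only when the two throughputs differ by more than
# `factor` (2X by default, per the rule of thumb above).
def clear_winner?(rate_a, rate_b, factor = 2.0)
  hi, lo = [rate_a, rate_b].max, [rate_a, rate_b].min
  hi > lo * factor
end

# Doubling the input while the time quadruples gives an exponent close to 2.
puts scaling_exponent(10_000, 0.5, 20_000, 2.0)

puts clear_winner?(223_486, 224_823)   # false: well within 2X of each other
puts clear_winner?(54_586, 388_670)    # true: roughly 7X apart
```

With +-50% noise, an exponent estimated from only two sizes is itself
rough, so it is safer to look at several doublings before calling something
quadratic.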

It does look like some solutions have quadratic runtime on ruby 1.9.  I
didn't observe this on 1.8.6.

I added all of the unit tests I found in this thread, plus this one:

 def test_int_parsing
   assert_same(0,     @parser.parse("0"))
   assert_same(42,      @parser.parse("42"))
   assert_same(-13,     @parser.parse("-13"))
 end

and removed these that don't seem correct:

   #assert_raise(RuntimeError) { @parser.parse(%{"\u0022; p 123; \u0022Busted"}) }
   #assert_equal("\\u0022; p 123; \u0022Busted",
   #            @parser.parse(%{"\\u0022; p 123; \\u0022Busted"}))
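
A small sketch of what assert_same in test_int_parsing adds over assert_equal:
assert_equal compares with ==, which also accepts a Float of the same value,
while assert_same checks object identity.  The helper name below is made up
for illustration.

```ruby
# Hypothetical helper mirroring the identity check behind assert_same.
def same_object?(expected, actual)
  expected.equal?(actual)
end

puts 42 == 42.0               # true: == compares numeric value across Integer and Float
puts same_object?(42, 42.0)   # false: an Integer and a Float are never the same object
puts same_object?(42, 42)     # true: small integers are immediate values
```

So a parser that returns 42.0 for "42" passes test_number_parsing but fails
test_int_parsing.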

Here is a tally of failures(F) and errors(E) using this expanded unit test
suite:

ch/s    F E  author/gem
----    - -  ----------
-       5 0  Pawel Radecki (RE, recursive descent)
-       6 2  ghostwheel (ghostwheel)
1226    3 2  James Edward Gray II (peggy)
3214    5 1  Justin Ethier (RE lexer, ruby eval, fixed numbers)
4054    0 0  Eric Mahurin (Grammar0, no lexer, no parser generation)
4078    2 0  Eric I (Treetop, unicode broken)
6534    2 0  Steve (Treetop, mismatches in benchmark)
8313    1 1  Clifford Heath (Treetop, removed handling of "\/")
17320   0 0  Alexander Stedile (RE, recursive descent)
54586   0 0  Eric Mahurin (Grammar, no lexer, v0.5)
137989  2 1  Paolo Bonzini (RE, recursive descent)
166041  2 1  Thomas Link (RE lexer, ruby eval, ruby 1.9 results)
186042  5 0  James Edward Gray II (RE, recursive descent)
220289  1 7* json
223486  0 0  Eric Mahurin (Grammar, no lexer, unreleased)
224823  6 0  fjson (uses C extensions)
287292  5 0  James Edward Gray II (RE, recursive, Eric optimized)
333368  3 0  Thomas Link & Paolo Bonzini (RE + eval, unicode broken)
388670  0 0  Eric Mahurin (recursive descent)
553081  4 9  Eric Mahurin (Grammar, no lexer, unreleased, ruby2cext)
1522250 0 7* json (w/ C extensions)

For the json gem, all of the failures happen because the tests are invalid:
top-level JSON should only be an array or an object.
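
One possible way to run scalar tests against a parser with that restriction
is to wrap the input in an array and unwrap the result.  This is only a
sketch: the class name WrappedJSONParser is made up, it assumes the json
library's JSON.parse and JSON::ParserError, and newer releases of that
library may accept top-level scalars directly.

```ruby
require "json"

# Hypothetical adapter: wraps the text in [ ] so the top level is always
# an array, then returns the single element that was parsed.
class WrappedJSONParser
  def parse(text)
    result = JSON.parse("[#{text}]")
    # Reject input such as "1,2" or "" that only became valid by wrapping.
    raise "expected exactly one JSON value" unless result.size == 1
    result.first
  rescue JSON::ParserError => e
    # The test suite expects RuntimeError for malformed input.
    raise RuntimeError, e.message
  end
end

parser = WrappedJSONParser.new
p parser.parse("42")              # 42
p parser.parse(%q{{"a": [1, 2]}}) # {"a"=>[1, 2]} on older Rubies; newer ones print {"a" => [1, 2]}
```

The wrapping changes what is being measured slightly, so it suits the
correctness tests better than the benchmark.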

My Grammar with ruby2cext didn't work well with unit testing because it
didn't handle creating the parser multiple times.  Need to fix that.

Has anyone been able to benchmark the ghostwheel json parser?  I would like
to see how well it does.

Here is the complete set of unit tests I used:

require "test/unit"

class TestJSONParser < Test::Unit::TestCase
 def setup
   @parser = JSONParser.new
 end

 def test_keyword_parsing
   assert_equal(true,  @parser.parse("true"))
   assert_equal(false, @parser.parse("false"))
   assert_equal(nil,   @parser.parse("null"))
 end

 def test_number_parsing
   assert_equal(42,      @parser.parse("42"))
   assert_equal(-13,     @parser.parse("-13"))
   assert_equal(3.1415,  @parser.parse("3.1415"))
   assert_equal(-0.01,   @parser.parse("-0.01"))

   assert_equal(0.2e1,   @parser.parse("0.2e1"))
   assert_equal(0.2e+1,  @parser.parse("0.2e+1"))
   assert_equal(0.2e-1,  @parser.parse("0.2e-1"))
   assert_equal(0.2E1,   @parser.parse("0.2e1"))
 end

 def test_string_parsing
   assert_equal(String.new,          @parser.parse(%Q{""}))
   assert_equal("JSON",              @parser.parse(%Q{"JSON"}))

   assert_equal( %Q{nested "quotes"},
                 @parser.parse('"nested \"quotes\""') )
   assert_equal("\n",                @parser.parse(%Q{"\\n"}))
   assert_equal( "a",
                 @parser.parse(%Q{"\\u#{"%04X" % ?a}"}) )
 end

 def test_array_parsing
   assert_equal(Array.new, @parser.parse(%Q{[]}))
   assert_equal( ["JSON", 3.1415, true],
                 @parser.parse(%Q{["JSON", 3.1415, true]}) )
   assert_equal([1, [2, [3]]], @parser.parse(%Q{[1, [2, [3]]]}))
 end

 def test_object_parsing
   assert_equal(Hash.new, @parser.parse(%Q{{}}))
   assert_equal( {"JSON" => 3.1415, "data" => true},
                 @parser.parse(%Q{{"JSON": 3.1415, "data": true}}) )
   assert_equal( { "Array"  => [1, 2, 3],
                   "Object" => {"nested" => "objects"} },
                 @parser.parse(<<-END_OBJECT) )
   {"Array": [1, 2, 3], "Object": {"nested": "objects"}}
   END_OBJECT
 end

 def test_parse_errors
   assert_raise(RuntimeError) { @parser.parse("{") }
   assert_raise(RuntimeError) { @parser.parse(%q{{"key": true false}}) }

   assert_raise(RuntimeError) { @parser.parse("[") }
   assert_raise(RuntimeError) { @parser.parse("[1,,2]") }

   assert_raise(RuntimeError) { @parser.parse(%Q{"}) }
   assert_raise(RuntimeError) { @parser.parse(%Q{"\\i"}) }

   assert_raise(RuntimeError) { @parser.parse("$1,000") }
   assert_raise(RuntimeError) { @parser.parse("1_000") }
   assert_raise(RuntimeError) { @parser.parse("1K") }

   assert_raise(RuntimeError) { @parser.parse("unknown") }
 end

 def test_int_parsing
   assert_same(0,     @parser.parse("0"))
   assert_same(42,      @parser.parse("42"))
   assert_same(-13,     @parser.parse("-13"))
 end

 def test_more_numbers
   assert_equal(5, @parser.parse("5"))
   assert_equal(-5, @parser.parse("-5"))
   assert_equal 45.33, @parser.parse("45.33")
   assert_equal 0.33, @parser.parse("0.33")
   assert_equal 0.0, @parser.parse("0.0")
   assert_equal 0, @parser.parse("0")
   assert_raises(RuntimeError) { @parser.parse("-5.-4") }
   assert_raises(RuntimeError) { @parser.parse("01234") }
   assert_equal(0.2e1, @parser.parse("0.2E1"))
   assert_equal(42e10, @parser.parse("42E10"))
 end

 def test_more_string
   assert_equal("abc\befg", @parser.parse(%Q{"abc\\befg"}))
   assert_equal("abc\nefg", @parser.parse(%Q{"abc\\nefg"}))
   assert_equal("abc\refg", @parser.parse(%Q{"abc\\refg"}))
   assert_equal("abc\fefg", @parser.parse(%Q{"abc\\fefg"}))
   assert_equal("abc\tefg", @parser.parse(%Q{"abc\\tefg"}))
   assert_equal("abc\\efg", @parser.parse(%Q{"abc\\\\efg"}))
   assert_equal("abc/efg", @parser.parse(%Q{"abc\\/efg"}))
 end

 def test_more_object_parsing
   assert_equal({'a'=>2,'b'=>4}, @parser.parse(%Q{{   "a" : 2 , "b":4 }}))
   assert_raises(RuntimeError) { @parser.parse(%Q{{   "a" : 2, }}) }
   assert_raises(RuntimeError) { @parser.parse(%Q{[   "a" , 2, ]}) }
 end

 def test_alexander
   assert_raise(RuntimeError) { @parser.parse(%Q{"a" "b"}) }
 end

 def test_thomas
   assert_raise(RuntimeError) { @parser.parse(%{p "Busted"}) }
   assert_raise(RuntimeError) { @parser.parse(%{[], p "Busted"}) }
   assert_raise(RuntimeError) { @parser.parse(%{[p "Busted"]}) }
   assert_raise(RuntimeError) { @parser.parse(%{{1=>STDOUT.puts("Busted")}}) }
   #assert_raise(RuntimeError) { @parser.parse(%{"\u0022; p 123; \u0022Busted"}) }
   assert_raise(RuntimeError) { @parser.parse(%{"" p 123; ""}) }
   #assert_equal("\\u0022; p 123; \u0022Busted",
   #            @parser.parse(%{"\\u0022; p 123; \\u0022Busted"}))
   assert_equal('#{p 123}', @parser.parse(%q{"#{p 123}"}))
   assert_equal(['#{`ls -r`}'], @parser.parse(%q{["#{`ls -r`}"]}))
   assert_equal('#{p 123}', @parser.parse(%q{"\\u0023{p 123}"}))
   assert_equal('#{p 123}', @parser.parse(%q{"\u0023{p 123}"}))
 end

 def test_thomas2
   assert_raise(RuntimeError) { @parser.parse(%{[], p "Foo"}) }
   assert_raise(RuntimeError) { @parser.parse(%{""; p 123; "Foo"}) }
   assert_raise(RuntimeError) { @parser.parse(%{"" p 123; ""}) }

   assert_raises(RuntimeError) { @parser.parse("-5.-4") }
   assert_raises(RuntimeError) { @parser.parse(%Q{{   "a" : 2, }}) }
   assert_raise(RuntimeError) { @parser.parse(%q{true false}) }
 end

end
