> Maybe, but then making a fast parser wouldn't be any fun :)

Ever since I ran my first preliminary benchmark, I have been wondering
how big the advantage of a C-based parser would actually be, so I
looked into the question a bit further. To also see how your
solutions "scale", I cleaned up my benchmarks a little. The following
includes all submissions that I could get to run with ruby19 (some
would not run, for whatever reason). I don't have json for ruby18
installed, which is why I didn't run this test with ruby18.

The objects are generated before the test, and the tests run in a
tight loop, so the influence of the benchmarking code should be
rather marginal.

For each of 4 nesting depths (10, 20, 30 and 40, the labels used
below), objects were generated until their JSON representations added
up to about 2MB. The average chunk size ranges from about 45 to 900
bytes. The object set is identical for all solutions, so the numbers
are directly comparable. Since the figures differ slightly from Eric
Mahurin's benchmark, it's possible that I did something wrong. But in
that case I did it equally wrong for all solutions. The code is down
below.

Regards,
Thomas.


Input chunks:
10: n=43475 avg.size=46.01 tot.size=2000236
20: n=12856 avg.size=155.61 tot.size=2000543
30: n=4897 avg.size=408.51 tot.size=2000483
40: n=2236 avg.size=894.47 tot.size=2000045



Ruby19 json
      user     system      total        real
10  2.274000   0.000000   2.274000 (  2.294000)
20  1.402000   0.000000   1.402000 (  1.432000)
30  1.041000   0.000000   1.041000 (  1.061000)
40  1.282000   0.000000   1.282000 (  1.302000)

10 871942 chars/sec (2000236/2.29)
20 1397027 chars/sec (2000543/1.43)
30 1885469 chars/sec (2000483/1.06)
40 1536132 chars/sec (2000045/1.30)


"solution_tml.rb"
      user     system      total        real
10  8.452000   0.010000   8.462000 (  8.633000)
20  6.570000   0.000000   6.570000 (  6.599000)
30  6.068000   0.000000   6.068000 (  6.119000)
40  5.659000   0.000000   5.659000 (  5.698000)

10 231696 chars/sec (2000236/8.63)
20 303158 chars/sec (2000543/6.60)
30 326929 chars/sec (2000483/6.12)
40 351008 chars/sec (2000045/5.70)


"solution_tml_pb.rb" (modified by P Bonzini)
      user     system      total        real
10  8.151000   0.000000   8.151000 (  8.192000)
20  5.849000   0.000000   5.849000 (  5.879000)
30  5.307000   0.000000   5.307000 (  5.337000)
40  5.238000   0.000000   5.238000 (  5.268000)

10 244169 chars/sec (2000236/8.19)
20 340286 chars/sec (2000543/5.88)
30 374832 chars/sec (2000483/5.34)
40 379659 chars/sec (2000045/5.27)


"solution_eric_i.rb"
      user     system      total        real
10 158.318000   0.040000 158.358000 (158.798000)
20 162.133000   0.030000 162.163000 (162.845000)
30 170.305000   0.030000 170.335000 (170.525000)
40 193.187000   0.070000 193.257000 (193.458000)

10 12596 chars/sec (2000236/158.80)
20 12284 chars/sec (2000543/162.85)
30 11731 chars/sec (2000483/170.53)
40 10338 chars/sec (2000045/193.46)


"solution_eric_mahurin3.rb"
      user     system      total        real
10  7.631000   0.000000   7.631000 (  7.641000)
20  6.319000   0.000000   6.319000 (  6.329000)
30  6.179000   0.000000   6.179000 (  6.179000)
40  5.769000   0.000000   5.769000 (  5.778000)

10 261776 chars/sec (2000236/7.64)
20 316091 chars/sec (2000543/6.33)
30 323755 chars/sec (2000483/6.18)
40 346148 chars/sec (2000045/5.78)


"solution_james_gray.rb"
      user     system      total        real
10 13.820000   0.000000  13.820000 ( 13.890000)
20 12.117000   0.000000  12.117000 ( 12.138000)
30 12.909000   0.000000  12.909000 ( 12.918000)
40 15.051000   0.010000  15.061000 ( 15.082000)

10 144005 chars/sec (2000236/13.89)
20 164816 chars/sec (2000543/12.14)
30 154860 chars/sec (2000483/12.92)
40 132611 chars/sec (2000045/15.08)


"solution_justin_ethier.rb"
      user     system      total        real
10 17.025000   0.000000  17.025000 ( 17.025000)
20 17.915000   0.040000  17.955000 ( 17.985000)
30 28.001000   0.021000  28.022000 ( 28.041000)
40 51.253000   0.070000  51.323000 ( 51.394000)

10 117488 chars/sec (2000236/17.03)
20 111233 chars/sec (2000543/17.98)
30 71341 chars/sec (2000483/28.04)
40 38915 chars/sec (2000045/51.39)


"solution_paolo_bonzini.rb"
      user     system      total        real
10 11.036000   0.000000  11.036000 ( 11.036000)
20 17.045000   0.030000  17.075000 ( 17.104000)
30 32.717000   0.020000  32.737000 ( 32.857000)
40 69.119000   0.070000  69.189000 ( 69.310000)

10 181246 chars/sec (2000236/11.04)
20 116963 chars/sec (2000543/17.10)
30 60884 chars/sec (2000483/32.86)
40 28856 chars/sec (2000045/69.31)


"solution_steve.rb"
      user     system      total        real
10 210.152000   0.040000 210.192000 (210.573000)
20 215.260000   0.060000 215.320000 (215.590000)
30 223.201000   0.110000 223.311000 (228.368000)
40 241.257000   0.260000 241.517000 (248.868000)

10 9499 chars/sec (2000236/210.57)
20 9279 chars/sec (2000543/215.59)
30 8759 chars/sec (2000483/228.37)
40 8036 chars/sec (2000045/248.87)
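
To put a rough number on the C-vs-Ruby question, the chars/sec figures
above can be divided by each other. The following is only a sketch: it
assumes the "Ruby19 json" run used the C extension, it picks
solution_tml_pb.rb as one of the faster pure-Ruby submissions, and the
names C_EXT_JSON, TML_PB and speedup are made up for this illustration.

```ruby
# chars/sec figures copied from the results above, keyed by depth label.
C_EXT_JSON = { 10 => 871942, 20 => 1397027, 30 => 1885469, 40 => 1536132 }
TML_PB     = { 10 => 244169, 20 => 340286,  30 => 374832,  40 => 379659 }

# Returns a hash mapping each depth label to fast/slow as a Float.
def speedup(fast, slow)
  fast.keys.each_with_object({}) { |k, h| h[k] = fast[k].to_f / slow[k] }
end

speedup(C_EXT_JSON, TML_PB).each { |k, r| puts "%s: %.1fx" % [k, r] }
```

With these inputs the ratio comes out at roughly 3.6 to 5, depending
on the depth.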



Benchmark code:

require 'benchmark'
# require 'json/pure'
require 'json'

N = 2000
S = [10, 20, 30, 40]

# This is a slightly enhanced version of Ara's object generator.
# Objects are generated via RandomObject.generate(nil, DEPTH)
# -- the first argument defines which object types are eligible
# and can be ignored in this context.
require 'tml/random-object'

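puts 'Preparing objects ...'
# sizes[s]   -- total JSON size (in chars) of all objects of depth s
# objects[s] -- array of [object, json] pairs of depth s
sizes   = Hash.new
objects = S.inject({}) do |h, s|
    size = 0
    a = h[s] = []
    n = N * 1000    # target: about 2MB of JSON per depth
    # Keep generating objects until the target size is reached.
    while size < n
        o = RandomObject.generate(nil, s)
        j = o.to_json
        a << [o, j]
        size += j.size
    end
    sizes[s] = size.to_f
    h
end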

ARGV.each do |arg|
    p arg
    require arg

    parser = JSONParser.new

    throughput = []
    Benchmark.bm do |b|
        S.each do |s|
            t = b.report(s.to_s) do
                objects[s].each do |o, j|
                    if o != parser.parse(j)
                        raise RuntimeError
                    end
                end
            end
            throughput << "%s %d chars/sec (%d/%0.2f)" %
                [s, sizes[s] / t.real, sizes[s], t.real]
        end
    end
    puts
    puts throughput.join("\n")
    puts
    puts

end

objects.each do |s, z|
    puts "%s: n=%d avg.size=%0.2f tot.size=%d" %
    [s, z.size, sizes[s].to_f / z.size, sizes[s]]

end
puts
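
A note on reading the chars/sec lines: the quotient is computed from
the unrounded real time, while the time shown in parentheses is
rounded to two decimals, so dividing the two printed numbers by hand
gives a slightly different result. A small sketch with the first json
row (the helper name throughput_line is made up for this example; the
format string is the one from the benchmark code):

```ruby
# Formats one throughput line the same way the benchmark script does.
# total_size: summed JSON size in chars (a Float, as in sizes[s]);
# real: elapsed wall-clock seconds.
def throughput_line(label, total_size, real)
  "%s %d chars/sec (%d/%0.2f)" % [label, total_size / real, total_size, real]
end

puts throughput_line(10, 2000236.0, 2.294)
# prints "10 871942 chars/sec (2000236/2.29)"
```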