Serializing and deserializing is a bottleneck, and msgpack is fast: about
twice as fast as Marshal for a small hash in this benchmark:

maasha@mel:~$ cat benchmark.rb
#!/usr/bin/env ruby

require 'benchmark'
require 'msgpack'

n = 100_000

h = {
  zero:  0,
  one:   1,
  two:   2,
  three: 3,
  four:  4,
  five:  5,
  six:   6
}

Benchmark.bm() do |x|
  x.report("Marshal") { n.times { m = Marshal.dump h; u = Marshal.load m } }
  x.report("MsgPack") { n.times { m = h.to_msgpack;   u = MessagePack.unpack m } }
end

maasha@mel:~$ ./benchmark.rb
       user     system      total        real
Marshal  1.770000   0.000000   1.770000 (  1.774334)
MsgPack  0.900000   0.020000   0.920000 (  0.921232)


Now concerning CAT: This is a placeholder for commands that will read in 
data in particular formats from specified files. I have found time and 
time again that it is extremely useful (using Biopieces 1.0) to be able 
to process data like this:

p = Pipe.new
p << CAT["file1"]
p << GREP["/foobar/"]
p << CAT["file2"]
p.execute_processes

I know that this is not the way UNIX cat works, and that BASH can do 
stuff like this neatly. But this is not UNIX, and CAT is not UNIX cat. 
I really would like to let ALL commands read any incoming records and 
emit them again (pending given options) along with any newly created 
records.
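
To make the pass-through idea concrete, here is a rough sketch in plain Ruby. It is not the Biopieces API: the names cat, grep and run_pipe are made up for illustration, records are assumed to be simple hashes, and cat takes an array of lines instead of a file name so the example runs on its own. Each command re-emits what it received (grep only the matching records) and then adds its own new records.

```ruby
# Hypothetical sketch, NOT the Biopieces API: every command is a lambda that
# takes an Enumerable of incoming records and returns an Enumerator.

# Pass through all incoming records, then emit one new record per line.
# `lines` stands in for the contents of a file (assumption for this sketch).
def cat(lines)
  lambda do |input|
    Enumerator.new do |out|
      input.each { |rec| out << rec }
      lines.each { |line| out << { line: line } }
    end
  end
end

# Re-emit only the incoming records whose :line value matches the pattern.
def grep(pattern)
  lambda do |input|
    Enumerator.new do |out|
      input.each { |rec| out << rec if rec[:line] =~ pattern }
    end
  end
end

# Chain the commands, starting from an empty record stream, and collect
# whatever comes out of the last command.
def run_pipe(*commands)
  commands.inject([]) { |stream, cmd| cmd.call(stream) }.to_a
end

file1 = ["foobar one", "baz"]
file2 = ["qux"]

result = run_pipe(cat(file1), grep(/foobar/), cat(file2))
result.each { |rec| puts rec[:line] }
```

Here the records from file2 are not filtered by grep, because the second cat sits after it in the pipe; that is the behaviour I am after with CAT.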


Cheers,


Martin

-- 
Posted via http://www.ruby-forum.com/.