On Wed, Jan 22, 2014 at 2:27 PM, Martin Hansen <lists / ruby-forum.com> wrote:
> Serializing and deserializing is a bottleneck and msgpack is fast:
>
> maasha@mel:~$ cat benchmark.rb
> #!/usr/bin/env ruby
>
> require 'benchmark'
> require 'msgpack'
>
> n = 100_000
>
> h = {
>   zero:  0,
>   one:   1,
>   two:   2,
>   three: 3,
>   four:  4,
>   five:  5,
>   six:   6
> }
>
> Benchmark.bm() do |x|
>   x.report("Marshal") { n.times { m = Marshal.dump h; u = Marshal.load m } }
>   x.report("MsgPack") { n.times { m = h.to_msgpack; u = MessagePack.unpack m } }
> end
>
> maasha@mel:~$ ./benchmark.rb
>        user     system      total        real
> Marshal  1.770000   0.000000   1.770000 (  1.774334)
> MsgPack  0.900000   0.020000   0.920000 (  0.921232)

Impressive, although Marshal does not look really slow either: 1.77 s
for 100,000 round trips is still under 18 µs each.

> Now concerning CAT: This is a placeholder for commands that will read in
> data in particular formats from specified files. I have found time and
> time again that it is extremely useful (using Biopieces 1.0) to be able
> to process data like this:
>
> p = Pipe.new
> p << CAT["file1"]
> p << GREP["/foobar/"]
> p << CAT["file2"]
> p.execute_processes
>
> I know that this is not the way UNIX cat works, and that BASH can do
> stuff like this neatly. But this is not UNIX and UNIX cat. I really
> would like to let ALL commands read any incoming records and emit them
> again (pending given options) along with any newly created records.

Well, but then you need a way to decide whether you want to read from
the pipe or not, because the first CAT will always block on $stdin.  I
suggest doing it like UNIX cat and using "-" as the identifier for
stdin:

CAT = lambda do |*args|
  args << '-' if args.empty?

  lambda do |io_in, io_out|
    args.each do |file|
      case file
      when "-"
        io_in.each_line {|line| line.chomp!; io_out.puts line}
      else
        File.foreach(file) {|line| io_out.puts line}
      end
    end
  end
end

Now your example above looks like this:

p = Pipe.new
p << CAT["file1"]
p << GREP["/foobar/"]
p << CAT["-", "file2"]
p.execute_processes

:-)

For passing messages through pipes I suggest writing a wrapper class
around Pipe which uses lambdas with a single argument and passes each
return value on to the next lambda in the chain.
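
A minimal sketch of that idea follows. Pipe's internals are not shown
in this thread, so the sketch stands alone instead of wrapping Pipe.
MessagePipe, SELECT, ADD_LENGTH and the "nil drops the message" rule
are all made-up names and conventions for illustration only:

```ruby
# Hypothetical class, not part of Pipe or Biopieces: it chains
# single-argument lambdas and hands each return value to the next stage.
class MessagePipe
  def initialize
    @stages = []
  end

  # Append a stage (anything responding to #call with one argument).
  # Returns self so that stages can be appended in one expression.
  def <<(stage)
    @stages << stage
    self
  end

  # Run every message through all stages in order and collect the results.
  # Assumed convention: a stage returning nil drops the message.
  def execute(messages)
    messages.each_with_object([]) do |msg, out|
      result = @stages.reduce(msg) do |m, stage|
        break nil if m.nil?   # stop early once a stage dropped the message
        stage.call(m)
      end
      out << result unless result.nil?
    end
  end
end

# Hypothetical stages: a filter factory and a stage that adds a field.
SELECT = lambda do |pattern|
  lambda { |rec| rec if rec[:seq] =~ pattern }
end

ADD_LENGTH = lambda { |rec| rec.merge(len: rec[:seq].length) }

pipe = MessagePipe.new
pipe << SELECT[/foobar/] << ADD_LENGTH

records = [{ seq: "foobar" }, { seq: "baz" }]
result  = pipe.execute(records)
p result
```

Here the "baz" record is dropped by SELECT and the "foobar" record
reaches the end with an added :len key.  Whether a real wrapper should
use nil for dropping, or let a stage emit several records, is a design
choice I have left open.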

Kind regards

robert

-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/