ara.t.howard / noaa.gov wrote:
> On Sat, 2 Dec 2006, Paul Lutus wrote:
>
> > Joel VanderWerf wrote:
> >
> >> Paul Lutus wrote:
> >>> ara.t.howard / noaa.gov wrote:
> >>>
> >>>> for years i've felt that i should be able to pipe numerical output into
> >>>> some unix command like so
> >>>>
> >>>>    cat list | mean
> >>>>    cat list | sum
> >>>>    cat list | minmax
> >>>>
> >>>> etc.
> >>>>
> >>>> and have never found one.  right now i'm building a ruby version -
> >>>> before i continue, does anyone know a standard unix or ruby version of
> >>>> this?
> >>>
> >>> It is so easy to create in Ruby, a matter of minutes, that it is not
> >>> terribly important to do the search you are suggesting.
> >>
> >> Disagree.
> >
> > It's a bit too late to disagree, in the face of the evidence that I said it,
> > then I did it.
>
> i agree that it's easy to emulate awk, but shouldn't we do something better in
> ruby?  i'm personally always inspired by ruby's elegance to write something
> better and more extensible than something i could easily do in the
> shell/awk/perl/c/etc, and i've found that, over the long run (say more than 3
> days), my productivity increases in an exponential way if i simply embrace
> ruby's power to write clear and re-usable code and code it right 'the first
> time.'  imho it's a shame to write throw-away scripts in ruby.
>
> here's what i've got so far:  the concept is that each line may contain 'n'
> columns of numbers, which is to say the input is not a simple list of numbers,
> but a list of __rows__ of numbers: a table.  any non-numeric data is ignored,
> eliminating the need to grep out crud.  also, integer arithmetic is attempted
> where possible but the code falls back to floats when needed.  all numeric
> input must be valid - no use of #to_i or #to_f, preferring Integer() and
> Float().  the code abstracts all of the input, computation, and output
> functions and is user-extensible via the use of duck-typed filters.  it's also
> usable both as a library and from the command-line.
>
>
> first some examples of usage:
>
>
>    mussel:~/eg/ruby/listc > cat input.a
>    1
>    2
>    3
>
>    mussel:~/eg/ruby/listc > ./listc sum < input.a
>    6
>
>    mussel:~/eg/ruby/listc > ./listc mean < input.a
>    2.0
>
>
>    mussel:~/eg/ruby/listc > cat input.b
>    1 2
>    3 4
>    5 6
>
>    mussel:~/eg/ruby/listc > ./listc median < input.b
>    3.0 4.0
>
>
>    mussel:~/eg/ruby/listc > cat input.c
>    foo 1 bar 2
>    a 3 b 4
>    x 5 y 6
>
>    mussel:~/eg/ruby/listc > ./listc minmax < input.c
>    1:5 2:6
>
>    mussel:~/eg/ruby/listc > ./listc min < input.c
>    1 2
>
>    mussel:~/eg/ruby/listc > ./listc max < input.c
>    5 6
>
>
>
>    mussel:~/eg/ruby/listc > cat input.d
>    ---
>    -
>      elapsed : 770.1453289
>    -
>      elapsed : 620.9993257
>    -
>      elapsed : 1440.629573
>
>    mussel:~/eg/ruby/listc > ./listc mean < input.d
>    943.924742533333
>
>
>
> now the code (i'm not golfing; for you non-vim users, the strange '#--{{{' and
> '#--}}}' markers are 'folds': the lines between them appear as one single line
> to me):
>
>
>    mussel:~/eg/ruby/listc > cat ./listc
>    #! /usr/bin/env ruby
>
>    class Main
>    #--{{{
>      OPS = %w( sum add mean avg median max min minmax )
>
>      def main
>        op = ARGV.shift.to_s.strip.downcase
>
>        klass =
>          case op
>            when 'sum', 'add'
>              SumFilter
>            when 'mean', 'avg'
>              MeanFilter
>            when 'median'
>              MedianFilter
>            when 'minmax'
>              MinMaxFilter
>            when 'max'
>              MaxFilter
>            when 'min'
>              MinFilter
>            else
>              abort "bad op <#{ op }> not in <#{ OPS.join ',' }>"
>          end
>
>        filter = klass.new
>
>        $stdin.each{|line| filter << line}
>
>        filter.result >> $stdout
>      end
>    #--}}}
>    end
>
>    def Main(*a, &b) Main.new(*a, &b).main end
>
>    module FilterUtils
>    #--{{{
>      def extract_numbers line
>        fields = line.strip.split(%r/\s+/)
>        fields.map{|f| Integer(f) rescue Float(f) rescue nil}.compact
>      end
>
>      class List < Array
>        def >> port = STDOUT
>          port << join(' ')
>          port << "\n"
>        end
>        def self.from other
>          new.instance_eval{ replace other; self }
>        end
>      end
>      def new_list l = nil
>        l ? (List === l ? l : List.from(l)) : List.new
>      end
>
>      class MultiList < Array
>        def >> port = STDOUT
>          port << map{|elem| elem.join(':')}.join(' ')
>          port << "\n"
>        end
>        def self.from other
>          new.instance_eval{ replace other; self }
>        end
>      end
>      def new_multilist ml = nil
>        ml ? (MultiList === ml ? ml : MultiList.from(ml)) : MultiList.new
>      end
>    #--}}}
>    end
>
>    class SumFilter
>    #--{{{
>      include FilterUtils
>      attr 'sum'
>      def initialize
>        @sum = new_list
>      end
>      def << line
>        numbers = extract_numbers line
>        numbers.each_with_index do |n,i|
>          @sum[i] ||= 0
>          @sum[i] += n
>        end
>      end
>      def result
>        @sum
>      end
>    #--}}}
>    end
>
>    class MeanFilter
>    #--{{{
>      include FilterUtils
>      attr 'sum'
>      attr 'count'
>      def initialize
>        @sum = new_list
>        @count = new_list
>      end
>      def << line
>        numbers = extract_numbers line
>        numbers.each_with_index do |n,i|
>          @sum[i] ||= 0
>          @count[i] ||= 0
>          @sum[i] += n
>          @count[i] += 1
>        end
>      end
>      def result
>        mean = new_list
>        @sum.zip(@count){|s,c| mean << (s.to_f/c.to_f)}
>        mean
>      end
>    #--}}}
>    end
>
>    class MedianFilter
>    #--{{{
>      include FilterUtils
>      attr 'min'
>      attr 'max'
>      def initialize
>        @min = new_list
>        @max = new_list
>      end
>      def << line
>        numbers = extract_numbers line
>        numbers.each_with_index do |n,i|
>          @min[i] ||= n
>          @min[i] = [ @min[i], n ].min
>          @max[i] ||= n
>          @max[i] = [ @max[i], n ].max
>        end
>      end
>      def result
>        median = new_list
>        @min.zip(@max){|mi,ma| median << (mi + ((ma - mi)/2.0))}
>        median
>      end
>    #--}}}
>    end
>
>    class MinMaxFilter
>    #--{{{
>      include FilterUtils
>      attr 'minmax'
>      def initialize
>        @minmax = new_multilist
>      end
>      def << line
>        numbers = extract_numbers line
>        numbers.each_with_index do |n,i|
>          @minmax[i] ||= [n,n]
>          @minmax[i][0] = [ @minmax[i][0], n ].min
>          @minmax[i][1] = [ @minmax[i][1], n ].max
>        end
>      end
>      def result
>        @minmax
>      end
>    #--}}}
>    end
>
>    class MinFilter < MinMaxFilter
>    #--{{{
>      def result
>        new_list @minmax.map{|minmax| [minmax.first]}
>      end
>    #--}}}
>    end
>
>    class MaxFilter < MinMaxFilter
>    #--{{{
>      def result
>        new_list @minmax.map{|minmax| [minmax.last]}
>      end
>    #--}}}
>    end
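
One note on the quoted MedianFilter: it tracks only the smallest and largest
value per column and returns the point halfway between them, which is the
midrange rather than the median.  The two agree on input.b (3.0 4.0) but not
in general.  A true median has to keep every value for a column, so it cannot
be a constant-memory running filter.  Below is a minimal sketch of that idea;
column_medians is a hypothetical helper of mine, not part of listc, and it
assumes the same "skip anything non-numeric" parsing rule as the quoted code.

```ruby
# Hypothetical helper, not part of listc: true per-column median.
# Unlike the running filters, it stores every numeric value it sees.
def column_medians(lines)
  columns = []
  lines.each do |line|
    # Same parsing rule as the quoted code: Integer(), then Float(), else skip.
    numbers = line.split.map { |f| Integer(f) rescue Float(f) rescue nil }.compact
    numbers.each_with_index { |n, i| (columns[i] ||= []) << n }
  end
  columns.map do |col|
    sorted = col.sort
    mid = sorted.size / 2
    # Odd count: middle element.  Even count: mean of the two middle elements.
    sorted.size.odd? ? sorted[mid].to_f : (sorted[mid - 1] + sorted[mid]) / 2.0
  end
end

p column_medians(%w( 1 2 10 ))   # median is 2.0, whereas the midrange is 5.5
```

The cost is memory proportional to the input, which is the trade-off against
the min/max approach.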

Here's another short version.  It can handle very large files because it keeps
only one running value per column instead of storing the input.


# Each op is a step proc (running value, current number) -> new running value.
# :mean is a pair: the step proc plus a finishing proc (total, count) -> mean.
ops = {
  :sum    => proc{|cum,cur| (cum||0) + cur},
  :mean   => [ proc{|cum,cur| (cum||0) + cur},
               proc{|cum,count| Float(cum)/count} ],
  :min    => proc{|cum,cur| cum ? [cum,cur].min : cur},
  :max    => proc{|cum,cur| cum ? [cum,cur].max : cur},
  :minmax => proc{|cum,cur|
               cum ? [[cum[0],cur].min, [cum[1],cur].max] : [cur,cur] }
}
ops[:add] = ops[:sum]
ops[:avg] = ops[:mean]

# to_s guards against a missing argument (nil has no to_sym).
op = ops[ ARGV.shift.to_s.to_sym ] or
  abort "op not in #{ops.keys.join(',')}"
step, finish = Array(op)

# Count numbers per column, not lines, so that lines without any numbers
# (such as the '---' and '-' lines in input.d) do not skew the mean.
counts = []; cumulative = []
ARGF.each_line{|line|
  values = line.split.map{|s|
    Integer(s) rescue Float(s) rescue nil}.compact
  values.each_with_index{|val,i|
    counts[i] = (counts[i]||0) + 1
    cumulative[i] = step.call( cumulative[i], val ) }
}
puts cumulative.zip(counts).map{|x,n| finish ? finish.call(x,n) : x}.
  map{|x| Array(x).join(":")}.join(" ")
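
Both scripts rely on the same parsing rule: try Integer(), then Float(), and
drop the field if neither accepts it.  The sketch below shows how that differs
from #to_i and #to_f, which quietly accept a numeric prefix or return 0.  The
name parse_number is mine, purely for illustration.

```ruby
# Strict parsing as used in both scripts: Integer() first, then Float(),
# and nil for any field that is not entirely a number.
def parse_number(field)
  Integer(field) rescue Float(field) rescue nil
end

%w( 42 3.5 1e3 12abc elapsed : ).each do |f|
  puts "#{f.inspect}: strict=#{parse_number(f).inspect} to_i=#{f.to_i} to_f=#{f.to_f}"
end
```

Here "12abc" and "elapsed" come back as nil from the strict version, while
#to_i would turn them into 12 and 0.  One caveat: Integer() honors radix
prefixes, so "010" is read as octal 8 and "0x1f" as 31, which may or may not
be what you want for zero-padded data.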