ara.t.howard / noaa.gov wrote:
> On Sat, 2 Dec 2006, Paul Lutus wrote:
>
> > Joel VanderWerf wrote:
> >
> >> Paul Lutus wrote:
> >>> ara.t.howard / noaa.gov wrote:
> >>>
> >>>> for years i've felt that i should be able to pipe numerical output into
> >>>> some unix command like so
> >>>>
> >>>>   cat list | mean
> >>>>   cat list | sum
> >>>>   cat list | minmax
> >>>>
> >>>> etc.
> >>>>
> >>>> and have never found one. right now i'm building a ruby version -
> >>>> before i continue, does anyone know a standard unix or ruby version of
> >>>> this?
> >>>
> >>> It is so easy to create in Ruby, a matter of minutes, that it is not
> >>> terribly important to do the search you are suggesting.
> >>
> >> Disagree.
> >
> > It's a bit too late to disagree, in the face of the evidence that I said it,
> > then I did it.
>
> i agree that it's easy to emulate awk, but shouldn't we do something better in
> ruby? i'm personally always inspired by ruby's elegance to write something
> better and more extensible than something i could easily do in the
> shell/awk/perl/c/etc and find that, over the long run (say more than 3 days)
> i've found that my productivity increases in an exponential way if i simply
> embrace ruby's power to write clear and re-usable code and code it right 'the
> first time.' imho it's a shame to write throw-away scripts in ruby.
>
> here's what i've got so far: the concept is that each line may contain 'n'
> columns of numbers, which is to say the input is not a simple list of numbers,
> but a list of __rows__ of numbers: a table. any non-numeric data is ignored,
> eliminating the need to grep out crud. also, integer arithmetic is attempted
> where possible but the code falls back to floats when needed. all numeric
> input must be valid - no use of #to_i or #to_f, preferring Integer() and
> Float(). the code abstracts all of the input, computation, and output
> functions and is user-extensible via the use of duck-typed filters.
> it's also usable both as a library or from the command-line
>
> first some examples of usage:
>
>
>   mussel:~/eg/ruby/listc > cat input.a
>   1
>   2
>   3
>
>   mussel:~/eg/ruby/listc > ./listc sum < input.a
>   6
>
>   mussel:~/eg/ruby/listc > ./listc mean < input.a
>   2.0
>
>
>   mussel:~/eg/ruby/listc > cat input.b
>   1 2
>   3 4
>   5 6
>
>   mussel:~/eg/ruby/listc > ./listc median < input.b
>   3.0 4.0
>
>
>   mussel:~/eg/ruby/listc > cat input.c
>   foo 1 bar 2
>   a 3 b 4
>   x 5 y 6
>
>   mussel:~/eg/ruby/listc > ./listc minmax < input.c
>   1:5 2:6
>
>   mussel:~/eg/ruby/listc > ./listc min < input.c
>   1 2
>
>   mussel:~/eg/ruby/listc > ./listc max < input.c
>   5 6
>
>
>   mussel:~/eg/ruby/listc > cat input.d
>   ---
>   -
>     elapsed : 770.1453289
>   -
>     elapsed : 620.9993257
>   -
>     elapsed : 1440.629573
>
>   mussel:~/eg/ruby/listc > ./listc mean < input.d
>   943.924742533333
>
>
> now the code (i'm not golfing, for you non-vim users strange markers are
> 'folds': those lines appear as one single line to me):
>
>
>   mussel:~/eg/ruby/listc > cat ./listc
>   #! /usr/bin/env ruby
>
>   class Main
>   #--{{{
>     OPS = %w( sum add mean avg median max min minmax )
>
>     def main
>       op = ARGV.shift.to_s.strip.downcase
>
>       klass =
>         case op
>           when 'sum', 'add'
>             SumFilter
>           when 'mean', 'avg'
>             MeanFilter
>           when 'median'
>             MedianFilter
>           when 'minmax'
>             MinMaxFilter
>           when 'max'
>             MaxFilter
>           when 'min'
>             MinFilter
>           else
>             abort "bad op <#{ op }> not in <#{ OPS.join ',' }>"
>         end
>
>       filter = klass.new
>
>       $stdin.each{|line| filter << line}
>
>       filter.result >> $stdout
>     end
>   #--}}}
>   end
>
>   def Main(*a, &b) Main.new(*a, &b).main end
>
>   module FilterUtils
>   #--{{{
>     def extract_numbers line
>       fields = line.strip.split(%r/\s+/)
>       fields.map{|f| Integer(f) rescue Float(f) rescue nil}.compact
>     end
>
>     class List < Array
>       def >> port = STDOUT
>         port << join(' ')
>         port << "\n"
>       end
>       def self.from other
>         new.instance_eval{ replace other; self }
>       end
>     end
>     def new_list l = nil
>       l ? (List === l ?
>         l : List.from(l)) : List.new
>     end
>
>     class MultiList < Array
>       def >> port = STDOUT
>         port << map{|elem| elem.join(':')}.join(' ')
>         port << "\n"
>       end
>       def self.from other
>         new.instance_eval{ replace other; self }
>       end
>     end
>     def new_multilist ml = nil
>       ml ? (MultiList === ml ? ml : MultiList.from(ml)) : MultiList.new
>     end
>   #--}}}
>   end
>
>   class SumFilter
>   #--{{{
>     include FilterUtils
>     attr 'sum'
>     def initialize
>       @sum = new_list
>     end
>     def << line
>       numbers = extract_numbers line
>       numbers.each_with_index do |n,i|
>         @sum[i] ||= 0
>         @sum[i] += n
>       end
>     end
>     def result
>       @sum
>     end
>   #--}}}
>   end
>
>   class MeanFilter
>   #--{{{
>     include FilterUtils
>     attr 'sum'
>     attr 'count'
>     def initialize
>       @sum = new_list
>       @count = new_list
>     end
>     def << line
>       numbers = extract_numbers line
>       numbers.each_with_index do |n,i|
>         @sum[i] ||= 0
>         @count[i] ||= 0
>         @sum[i] += n
>         @count[i] += 1
>       end
>     end
>     def result
>       mean = new_list
>       @sum.zip(@count){|s,c| mean << (s.to_f/c.to_f)}
>       mean
>     end
>   #--}}}
>   end
>
>   class MedianFilter
>   #--{{{
>     include FilterUtils
>     attr 'min'
>     attr 'max'
>     def initialize
>       @min = new_list
>       @max = new_list
>     end
>     def << line
>       numbers = extract_numbers line
>       numbers.each_with_index do |n,i|
>         @min[i] ||= n
>         @min[i] = [ @min[i], n ].min
>         @max[i] ||= n
>         @max[i] = [ @max[i], n ].max
>       end
>     end
>     def result
>       median = new_list
>       @min.zip(@max){|mi,ma| median << (mi + ((ma - mi)/2.0))}
>       median
>     end
>   #--}}}
>   end
>
>   class MinMaxFilter
>   #--{{{
>     include FilterUtils
>     attr 'min'
>     attr 'max'
>     def initialize
>       @minmax = new_multilist
>     end
>     def << line
>       numbers = extract_numbers line
>       numbers.each_with_index do |n,i|
>         @minmax[i] ||= [n,n]
>         @minmax[i][0] = [ @minmax[i][0], n ].min
>         @minmax[i][1] = [ @minmax[i][1], n ].max
>       end
>     end
>     def result
>       @minmax
>     end
>   #--}}}
>   end
>
>   class MinFilter < MinMaxFilter
>   #--{{{
>     def result
>       new_list @minmax.map{|minmax| [minmax.first]}
>     end
>   #--}}}
>   end
>
>   class
>     MaxFilter < MinMaxFilter
>   #--{{{
>     def result
>       new_list @minmax.map{|minmax| [minmax.last]}
>     end
>   #--}}}
>   end

Here's another, shorter version. It keeps only a running value and a count
per column, so it can handle very large files.

ops = {
  :sum    => proc{|cum,cur| (cum||0) + cur},
  :mean   => [ proc{|cum,cur| (cum||0) + cur},
               proc{|cum,count| Float(cum)/count} ],
  :min    => proc{|cum,cur| cum ? [cum,cur].min : cur},
  :max    => proc{|cum,cur| cum ? [cum,cur].max : cur},
  :minmax => proc{|cum,cur| cum ? [[cum[0],cur].min, [cum[1],cur].max] :
                                  [cur,cur] }
}
ops[:add] = ops[:sum]
ops[:avg] = ops[:mean]

# to_s guards against a missing argument (nil has no to_sym)
op = ops[ ARGV.shift.to_s.to_sym ] or
  abort "op not in #{ops.keys.join(',')}"

# counts are kept per column, so lines without numbers don't skew the mean
cumulative = []; counts = []
ARGF.each_line{|line|
  values = line.split.map{|s| Integer(s) rescue Float(s) rescue nil}.compact
  values.each_with_index{|val,i|
    counts[i] = (counts[i] || 0) + 1
    cumulative[i] = Array(op)[0].call( cumulative[i], val ) } }

# a two-element op (mean) applies its second proc to the total and the count
puts cumulative.zip(counts).map{|x,n| op.class==Array ? op[1].call(x,n) : x}.
  map{|x| Array(x).join(":")}.join(" ")
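
One op that does not fit the running-total pattern is median. The MedianFilter in the quoted script returns the midpoint of each column's min and max, which matches the median for input.b but not in general (for 1, 2, 100 it gives 50.5). Below is a rough sketch of a per-column median, assuming it is acceptable to hold every value in memory. The names to_number, extract_numbers and column_medians are made up for this illustration and are not part of either script.

```ruby
# Hypothetical helpers for illustration only; not taken from either script.

# Convert one field to a number: Integer() first, then Float(),
# nil if the field is not numeric.
def to_number(field)
  Integer(field)
rescue ArgumentError
  begin
    Float(field)
  rescue ArgumentError
    nil
  end
end

# Keep only the numeric fields of a line, in order.
def extract_numbers(line)
  line.split.map { |f| to_number(f) }.compact
end

# Collect every value per column, then take the middle of the sorted
# column (or the average of the two middle values for an even count).
# Unlike sum/mean/min/max this stores all of the input.
def column_medians(lines)
  columns = []
  lines.each do |line|
    extract_numbers(line).each_with_index do |n, i|
      (columns[i] ||= []) << n
    end
  end
  columns.map do |col|
    sorted = col.sort
    mid = sorted.size / 2
    sorted.size.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
  end
end
```

From the command line this could be driven with something like `puts column_medians($stdin).join(' ')`. The memory cost grows with the input, so it trades away the large-file property of the short version.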