On 11.08.2007 06:19, Ryan Davis wrote:
>
> On Aug 10, 2007, at 13:54 , William James wrote:
>
>> On Aug 10, 1:29 pm, Frank Meyer <lolz.ll... / gmail.com> wrote:
>>> I've written a little ruby program which can sort logfiles with the
>>> following format:
>>>
>>> 4.text text text
>>> 1.text text text
>>> 2.text text text
>>> 10.text text text
>>> 2.text2 text2 text2
>> ...
>> File.open( ARGV.first, "r+" ){|file|
>>   array = file.readlines
>>   file.rewind
>>   file.truncate(0)
>>   file.puts array.sort_by{|s| s[/^\d+/].to_i }
>> }
>
> your version takes a lot of memory, is slow, and doesn't properly sort
> the content of the line, just the number. swap the two "2." lines and
> you'll see what I mean. Using the right tool for the job (`sort`) does
> wonders:
>
> % ruby -e 'n = 1_000_000; File.open("blah.txt", "w") { |f| n.times { m = rand 5; f.puts "#{rand n}. file#{m} file#{m} file#{m}" } }'
> % cp blah.txt blah2.txt
> % time ruby -e 'File.open( ARGV.first, "r+" ) { |file| array = file.readlines; file.rewind; file.truncate(0); file.puts array.sort_by{|s| s[/^\d+/].to_i } }' blah.txt
> real 0m8.182s ...
> % time ruby -e 'path = ARGV.shift; system %(sort -n "#{path}" > "#{path}.tmp"); File.rename "#{path}.tmp", path' blah2.txt
> real 0m3.175s ...
> % cmp blah.txt blah2.txt
> blah.txt blah2.txt differ: char 50, line 3
> % head blah.txt blah2.txt
> ==> blah.txt <==
> 3. file4 file4 file4
> 4. file4 file4 file4
> 6. file3 file3 file3
> 6. file1 file1 file1
> 6. file0 file0 file0
> 7. file0 file0 file0
> 7. file4 file4 file4
> 8. file1 file1 file1
> 8. file3 file3 file3
> 8. file3 file3 file3
>
> ==> blah2.txt <==
> 3. file4 file4 file4
> 4. file4 file4 file4
> 6. file0 file0 file0
> 6. file1 file1 file1
> 6. file3 file3 file3
> 7. file0 file0 file0
> 7. file4 file4 file4
> 8. file1 file1 file1
> 8. file3 file3 file3
> 8. file3 file3 file3
> 532 %

It's a one liner:

ruby -i.bak -e 'puts ARGF.readlines.sort_by {|l| l[/^\d+/].to_i}' file

Less memory usage:

ruby -i.bak -e 'puts ARGF.readlines.sort! {|a,b| a[/^\d+/].to_i <=> b[/^\d+/].to_i}' file

Kind regards

robert
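
Regarding the point quoted above that sorting on the number alone leaves lines with equal numbers in arbitrary order: a minimal sketch, assuming ties should be broken by the text of the line itself. The helper name `sort_log_lines` is made up for illustration and is not from the thread, and the resulting order is not guaranteed to match `sort -n` byte for byte, since that depends on the locale.

```ruby
# Hypothetical helper (not from the thread): sort by the leading number,
# then by the whole line, so lines with equal numbers get a fixed order.
def sort_log_lines(lines)
  # Arrays compare element by element: the number first, the line text second.
  # A line without a leading number yields nil.to_i == 0 and sorts first.
  lines.sort_by { |line| [line[/^\d+/].to_i, line] }
end

sample = [
  "4.text text text\n",
  "1.text text text\n",
  "2.text2 text2 text2\n",
  "10.text text text\n",
  "2.text text text\n"
]

puts sort_log_lines(sample)
```

The same block can be dropped into the one liner in place of `{|l| l[/^\d+/].to_i}`; it costs one small array per line, so it does not help with the memory concern.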