On Aug 11, 7:15 am, Robert Klemme <shortcut... / googlemail.com> wrote:
> On 11.08.2007 06:19, Ryan Davis wrote:
>
> > On Aug 10, 2007, at 13:54 , William James wrote:
>
> >> On Aug 10, 1:29 pm, Frank Meyer <lolz.ll... / gmail.com> wrote:
> >>> I've written a little ruby program which can sort logfiles with the
> >>> following format:
>
> >>> 4.text text text
> >>> 1.text text text
> >>> 2.text text text
> >>> 10.text text text
> >>> 2.text2 text2 text2
> >> ...
> >> File.open( ARGV.first, "r+" ){|file|
> >>   array = file.readlines
> >>   file.rewind
> >>   file.truncate(0)
> >>   file.puts array.sort_by{|s| s[/^\d+/].to_i }
> >> }
>
> > your version takes a lot of memory, is slow, and doesn't properly sort
> > the content of the line, just the number. swap the two "2." lines and
> > you'll see what I mean. Using the right tool for the job (`sort`) does
> > wonders:
>
> > % ruby -e 'n = 1_000_000; File.open("blah.txt", "w") { |f| n.times { m =
> > rand 5; f.puts "#{rand n}. file#{m} file#{m} file#{m}" } }'
> > % cp blah.txt blah2.txt
> > % time ruby -e 'File.open( ARGV.first, "r+" ) { |file| array =
> > file.readlines; file.rewind; file.truncate(0); file.puts
> > array.sort_by{|s| s[/^\d+/].to_i } }' blah.txt
> > real    0m8.182s ...
> > % time ruby -e 'path = ARGV.shift; system %(sort -n "#{path}" >
> > "#{path}.tmp"); File.rename "#{path}.tmp", path' blah2.txt
> > real    0m3.175s ...
> > % cmp blah.txt blah2.txt
> > blah.txt blah2.txt differ: char 50, line 3
> > % head blah.txt blah2.txt
> > ==> blah.txt <==
> > 3. file4 file4 file4
> > 4. file4 file4 file4
> > 6. file3 file3 file3
> > 6. file1 file1 file1
> > 6. file0 file0 file0
> > 7. file0 file0 file0
> > 7. file4 file4 file4
> > 8. file1 file1 file1
> > 8. file3 file3 file3
> > 8. file3 file3 file3
>
> > ==> blah2.txt <==
> > 3. file4 file4 file4
> > 4. file4 file4 file4
> > 6. file0 file0 file0
> > 6. file1 file1 file1
> > 6. file3 file3 file3
> > 7. file0 file0 file0
> > 7. file4 file4 file4
> > 8. file1 file1 file1
> > 8. file3 file3 file3
> > 8. file3 file3 file3
> > 532 %
>
> It's a one liner:
>
> ruby -i.bak -e 'puts ARGF.readlines.sort_by {|l| l[/^\d+/].to_i}' file

It's my understanding that -i never rewrites the file in place: the
output goes to a new file, and with -i.bak the original is kept
under the .bak name.  Doesn't this cause unnecessary disk
fragmentation?
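
For comparison, here is the write-a-new-file-then-rename pattern spelled out in plain Ruby.  This is only a sketch of the general idea, not a claim about how -i is implemented internally; sort_file_via_temp and the ".tmp" suffix are names I made up for illustration.

```ruby
# Hypothetical helper: sort a log file by its leading number by writing the
# result to a sibling temp file and then renaming it over the original.
def sort_file_via_temp(path)
  tmp = "#{path}.tmp"                 # assumed temp name, next to the original
  lines = File.readlines(path)        # whole file is held in memory
  File.open(tmp, "w") do |out|
    out.puts lines.sort_by { |l| l[/^\d+/].to_i }
  end
  File.rename(tmp, path)              # the sorted copy replaces the original
end
```

The sorted data always lands in freshly allocated blocks, whereas the r+ / truncate version earlier in the thread reuses the existing file.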

>
> Less memory usage:
>
> ruby -i.bak -e 'puts ARGF.readlines.sort! {|a,b| a[/^\d+/].to_i <=>
> b[/^\d+/].to_i}' file

Of course, you're trading speed for memory: the block reruns the
regex on both lines for every comparison.
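
A rough way to see the trade-off is to count how often the key gets extracted.  The sketch below uses the sample lines from the original post; key_of and the counter are just illustration names.  The last line also shows one possible way to break ties on the rest of the line, which is what the complaint about the two "2." lines was about.

```ruby
lines = ["4.text text text\n", "1.text text text\n", "2.text text text\n",
         "10.text text text\n", "2.text2 text2 text2\n"]

calls = 0
key_of = lambda { |line| calls += 1; line[/^\d+/].to_i }

# sort_by: one key extraction per line, at the cost of a temporary
# array of [key, line] pairs.
calls = 0
by_key = lines.sort_by { |l| key_of.(l) }
sort_by_calls = calls

# sort with a block: no temporary pairs, but two extractions per comparison.
calls = 0
by_cmp = lines.sort { |a, b| key_of.(a) <=> key_of.(b) }
sort_calls = calls

puts "sort_by: #{sort_by_calls} extractions, sort: #{sort_calls} extractions"

# Tie-break on the whole line so equal numbers come out in a fixed order.
tie_broken = lines.sort_by { |l| [l[/^\d+/].to_i, l] }
puts tie_broken
```

The gap in extraction counts grows with the number of lines, since sort_by stays at one per line while the block form pays on every comparison.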

>
> Kind regards
>
>         robert