"Mark Wilson" <mwilson13 / cox.net> wrote in message news:6A036FE6-89A3-11D7-BF36-000393876156 / cox.net...
> I want to do the following as efficiently as possible.
>
> I have a file with many lines (I'll call it 'big_file.txt').
>
> Each line of 'big_file.txt' has the following format:
>
> dir1/dir2/dir3/file_name.txt
>
> Every line in 'big_file.txt' is unique. 'dir1' and 'dir2' are the same
> for every line. 'dir3' and 'file_name.txt' have many values, but some
> lines have duplicate values for 'dir3' and some lines have duplicate
> values for 'file_name.txt'. In other words, 'big_file.txt' is a list of
> paths describing a file system with directories that hold multiple
> files, and some files exist in more than one directory. 'big_file.txt'
> is about 100MB.
>
> I have another file with many lines (I'll call it 'smaller_file.txt').
> 'smaller_file.txt' has fewer lines than 'big_file.txt'.
>
> Each line of 'smaller_file.txt' has the following format:
>
> file_name.txt
>
> Every line in 'smaller_file.txt' is unique. The 'file_name.txt' strings
> in 'smaller_file.txt' are a subset of the 'file_name.txt' strings in
> 'big_file.txt'. 'smaller_file.txt' is about 20MB.
>
> 'smaller_file.txt' has been sorted. 'big_file.txt' has been sorted on
> the 'file_name.txt' substring (or field).
>
> I want to go through 'big_file.txt' and 'smaller_file.txt' and write
> output to a third file, 'expanded_file.txt', such output being the full
> paths (taken from 'big_file.txt') to the files listed in
> 'smaller_file.txt'.
>
> I have written the following program, which works but is very slow:
>
> #!/usr/local/bin/ruby
>
> a=File.open("smaller_file.txt","r")
> b=File.open("big_file.txt","r")
> c=File.open("expanded_file.txt","a")
>
> a.each { |file_name|
>
>    b.rewind
>
>    b.each {|path|
>
>      c.puts(path) if path.include?("#{file_name}")
>    }
> }
>
> a.close
> b.close
> c.close
>
> I realize that, because the files are sorted, I should not need to
> rewind 'big_file.txt' to the beginning of the file for every iteration,
> but I don't know how to tell the program how far back to rewind once it
> gets to the end of 'big_file.txt' and is ready to iterate with another
> file_name.
>
> Any thoughts, ideas, _code_, etc. would be welcome.
>
> Thank you.
>
>



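b = File.open("big_file.txt", "r")
c = $stdout                      # or File.open("expanded_file.txt", "w")

# Read ahead in b. Use chomp, not chomp!: chomp! returns nil when
# there is no trailing newline to remove.
path = b.readline.chomp

File.open("smaller_file.txt", "r") do |a|
  a.each { |file_name|
    file_name.chomp!
    $stderr.puts "(#{file_name})"   # progress trace, kept out of the output
    begin
      # Skip paths whose file name sorts before the one wanted.
      # Assumes both files were sorted in the same byte-wise order.
      path = b.readline.chomp while File.basename(path) < file_name
      # Emit every path for this file name; the first mismatch ends the run.
      while File.basename(path) == file_name
        c.puts(path)
        path = b.readline.chomp
      end
    rescue EOFError
      $stderr.puts "EOF big_file.txt"
      break                       # big_file.txt is exhausted, nothing left to match
    end
  }
end

b.close
#c.close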
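
If the two files turn out not to be sorted in exactly the same order, a lookup table avoids the question altogether. The following is only a sketch of that alternative, not the merge above: it assumes the names from 'smaller_file.txt' (about 20MB) fit in memory, and expand_paths is a made-up helper name, not anything from the original program. It reads 'big_file.txt' once and keeps every path whose file name is in the set.

```ruby
require "set"

# Hypothetical helper (name invented for this sketch).
# small, big: readable IO-like objects; out: writable IO-like object.
# Writes to out every line of big whose file name is listed in small.
def expand_paths(small, big, out)
  # Load the wanted file names into a set for constant-time lookup.
  wanted = Set.new
  small.each_line { |line| wanted << line.chomp }

  # Single pass over the big file; no rewinding, no sorting required.
  big.each_line do |line|
    path = line.chomp
    out.puts(path) if wanted.include?(File.basename(path))
  end
end
```

To run it against the real files, open 'smaller_file.txt' and 'big_file.txt' for reading and 'expanded_file.txt' for writing, then pass the three handles to expand_paths in that order. The output comes out in 'big_file.txt' order.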


daz