"Mark Wilson" <mwilson13 / cox.net> wrote in message news:6A036FE6-89A3-11D7-BF36-000393876156 / cox.net... > I want to do the following, using as efficiently as possible. > > I have a file with many lines (I'll call it 'big_file.txt'). > > Each line of 'big_file.txt' has the following format: > > dir1/dir2/dir3/file_name.txt > > Every line in 'big_file.txt' is unique. 'dir1' and 'dir2' are the same > for every line. 'dir3' and 'file_name.txt' have many values, but some > lines have duplicate values for 'dir3' and some lines have duplicate > values for 'file_name.txt'. In other words, 'big_file.txt' is a list of > paths describing a file systems with directories that hold multiple > files, and some files exist in more than one directory. 'big_file.txt' > is about 100MB. > > I have another file with many lines (I'll call it 'smaller_file.txt'). > 'smaller_file.txt' has less lines than 'big_file.txt'. > > Each line of 'smaller_file.txt' has the following format: > > file_name.txt > > Every line in 'smaller_file.txt' is unique. The 'file_name.txt' strings > in 'smaller_file.txt' are a subset of the 'file_name.txt' strings in > 'big_file.txt'. 'smaller_file.txt' is about 20MB. > > 'smaller_file.txt' has been sorted. 'big_file.txt' has been sorted on > the 'file_name.txt' substring (or field). > > I want to go through 'big_file.txt' and 'smaller_file.txt' and write > output to a third file, 'expanded_file.txt', such output being the full > paths (taken from 'big_file.txt') to the files listed in > 'smaller_file.txt'. 
>
> I have written the following program, which works but is very slow:
>
> #!/usr/local/bin/ruby
>
> a=File.open("smaller_file.txt","r")
> b=File.open("big_file.txt","r")
> c=File.open("expanded_file.txt","a")
>
> a.each { |file_name|
>
>   b.rewind
>
>   b.each {|path|
>
>     c.puts(path) if path.include?("#{file_name}")
>   }
> }
>
> a.close
> b.close
> c.close
>
> I realize that, because the files are sorted, I should not need to
> rewind 'big_file.txt' to the beginning of the file for every iteration,
> but I don't know how to tell the program how far back to rewind once it
> gets to the end of 'big_file.txt' and is ready to iterate with another
> file_name.
>
> Any thoughts, ideas, _code_, etc. would be welcome.
>
> Thank you.
>

b = File.open("big_file.txt", "r")
c = $stdout

File.open("smaller_file.txt", "r") do |a|
  begin
    # chomp rather than chomp!: chomp! returns nil when there is nothing to strip
    path = b.readline.chomp   # read ahead in b
    a.each do |file_name|
      file_name.chomp!
      print "(", file_name, ")\n"
      # skip big_file entries whose names sort before the one we want
      # (assumes both files were sorted with the same byte-wise ordering)
      while File.basename(path) < file_name
        path = b.readline.chomp
      end
      # copy out every path that ends in this file_name
      while File.basename(path) == file_name
        c.puts(path)
        path = b.readline.chomp
      end
    end
  rescue EOFError
    # big_file.txt is exhausted, so no later file_name can match
    puts "EOF big_file.txt"
  end
end

b.close
#c.close

daz
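
If the two files might not have been sorted with exactly the same ordering, a set lookup is another option that avoids the rewind without depending on sort order at all. What follows is only a sketch of that swapped-in technique, not the merge approach above: `each_expanded_path` is a hypothetical helper name, and it assumes the names from 'smaller_file.txt' (about 20MB) fit comfortably in memory.

```ruby
require 'set'

# Hypothetical helper (not from the thread): yields each path whose
# basename is one of the wanted names. Both arguments only need to
# respond to #each and hand back lines, so open File objects work too.
def each_expanded_path(names, paths)
  # Load every wanted file name into a set for constant-time lookups.
  wanted = Set.new
  names.each { |name| wanted << name.chomp }

  # One pass over the big list, no rewind needed.
  paths.each do |path|
    path = path.chomp
    yield path if wanted.include?(File.basename(path))
  end
end

# Possible usage with the file names from the original post:
#
#   File.open("expanded_file.txt", "w") do |out|
#     File.open("smaller_file.txt") do |small|
#       File.open("big_file.txt") do |big|
#         each_expanded_path(small, big) { |path| out.puts(path) }
#       end
#     end
#   end
```

The output comes out in the order of 'big_file.txt', so paths sharing a file name stay grouped only if that file really is sorted on the name field.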