On Mon, May 19, 2003 at 11:41:42AM +0900, Mark Wilson wrote:
> I have written the following program, which works but is very slow:
> 
> #!/usr/local/bin/ruby
> 
> a=File.open("smaller_file.txt","r")
> b=File.open("big_file.txt","r")
> c=File.open("expanded_file.txt","a")
> 
> a.each { |file_name|
> 
>   b.rewind
> 
>   b.each {|path|
> 
>     c.puts(path) if path.include?("#{file_name}")
>   }
> }
> 
> a.close
> b.close
> c.close

It's O(n^2)!!!
If I got it right you need not rewind, for the following holds:
(notation: F(file_name) == position of first record in big_file
corresponding to file_name)

    next file_name > file_name ==> F(next file_name) > F(file_name)

So you can make it in a single pass.

The way I see it, your problem is just iterating in both files in
parallel, which is however easily done with an external iterator:

(untested)

catch(:finished) do

while file_name = a.gets
  catch(:next) do
    loop do
      throw :finished unless big_line = b.gets
      big_line =~ %r{/([^/]+)}
      name = $1
      if name > file_name 
        b.seek( -(big_line.size), SEEK_CUR)
        throw :next
      end
      c.puts big_line if name == file_name
    end
  end
end

end
  
This should be O(n).

-- 
 _           _                             
| |__   __ _| |_ ___ _ __ ___   __ _ _ __  
| '_ \ / _` | __/ __| '_ ` _ \ / _` | '_ \ 
| |_) | (_| | |_\__ \ | | | | | (_| | | | |
|_.__/ \__,_|\__|___/_| |_| |_|\__,_|_| |_|
	Running Debian GNU/Linux Sid (unstable)
batsman dot geo at yahoo dot com

-  long    f_ffree;    /* free file nodes in fs */
+  long    f_ffree;    /* freie Dateiknoten im Dateisystem */
	-- Seen in a translation