Carlos Diaz wrote:
> However, I'm trying to automate this script to go out and look at the 
> contents of this directory and do something for each file in there.  For 
> example, I want to do something like this:
> 
> Dir.foreach("http://username:password@172.16.1.1/logs") {|x| <do some 
> logic here>}
> 
> However, I'm not sure if there is a simple way to do something like this 
> in Ruby.  Anyone encountered this before?

I assume you're talking about the normal automatically generated 
directory page, where Apache generates a list of files with links to 
each file. In which case...

require 'uri'
require 'open-uri'
require 'html/htmltokenizer'

class WebPage
   attr_reader :links # URLs of all links on page

   # Get a web page from a specified URL
   def get(url)
     @uri = URI.parse(url)
     # URI.open comes from open-uri; since Ruby 3.0 a bare open(url)
     # no longer fetches URLs
     URI.open(url) {|result| @body = result.read }
   end

   # Parse the web page, extracting links
   def parse
     return unless @body
     tokenizer = HTMLTokenizer.new(@body)
     @links = Array.new
     while tag = tokenizer.getTag('a')
       # Normalize to a full URL
       url = tag.attr_hash['href']
       uri = @uri.merge(url)
       @links.push(uri.to_s)
     end
   end
end

wp = WebPage.new
wp.get('http://www.example.com/')
wp.parse
wp.links.each {|link| puts link }
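
Since the goal is to do something for each file in the directory, you will probably want to throw away the links that aren't files. Here is a rough sketch of one way to filter the list. The file_links helper is my own invention, not part of any library, and I'm assuming an Apache-style listing with column-sort links (?C=N;O=D and friends), a parent-directory link, and subdirectory links ending in a slash. Other servers may lay things out differently.

```ruby
require 'uri'

# Hypothetical helper: from the absolute links found on a listing page,
# keep only those that look like files directly inside dir_url.
# Assumes dir_url ends with a slash.
def file_links(dir_url, links)
  dir = URI.parse(dir_url)
  links.select do |link|
    uri = URI.parse(link)
    # Drop links to other hosts and column-sort links (they carry a query)
    next false unless uri.host == dir.host && uri.query.nil?
    # Drop the parent-directory link and anything else outside dir_url
    next false unless uri.path.start_with?(dir.path)
    rest = uri.path[dir.path.length..-1]
    # Drop the directory itself and anything with a further slash
    !rest.empty? && !rest.include?('/')
  end
end

# Example input, made up for illustration
links = [
  'http://www.example.com/logs/?C=N;O=D',
  'http://www.example.com/',
  'http://www.example.com/logs/access.log',
  'http://www.example.com/logs/old/',
  'http://www.example.com/logs/error.log'
]
file_links('http://www.example.com/logs/', links).each {|f| puts f }
```

With the class above you would pass in wp.links and the URL you gave to wp.get.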

You'll find HTMLTokenizer at 
<URL:http://rubyforge.org/projects/htmltokenizer/>. You could also do it 
with REXML, of course, but the code would probably be a little harder to 
follow.

Making the above code robust to things like <a> elements with no href is 
left as an exercise for the reader :-)
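
For anyone who wants a head start on that exercise, here is a minimal sketch of the normalization step on its own, using only the standard uri library. The resolve_links helper is hypothetical; it takes the raw href values (nil where an <a> has no href) and skips anything it can't resolve, rather than raising.

```ruby
require 'uri'

# Hypothetical helper: resolve raw href values against a base URL,
# skipping anchors with no usable href.
def resolve_links(base_url, hrefs)
  base = URI.parse(base_url)
  hrefs.each_with_object([]) do |href, links|
    # <a> with no href at all, or an empty one
    next if href.nil? || href.strip.empty?
    begin
      links << base.merge(href.strip).to_s
    rescue URI::Error
      # Not a valid URI reference (e.g. contains a space); skip it
    end
  end
end

# Example input, made up for illustration
hrefs = ['access.log', nil, '', '/icons/', 'http://www.example.org/x', 'bad uri']
resolve_links('http://www.example.com/logs/', hrefs).each {|l| puts l }
```

In the parse method above, the same nil check and rescue would go around the @uri.merge(url) call inside the while loop.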


mathew