pkellner wrote:
> I was really hoping for some code or pseudo code.  I'm new to ruby and
> have been thrashing over this for hours.  I promise to put some back
> later when I know more about this. (and sadly, I'm not a regular
> expression wizard)

I use WWW::Mechanize to slurp down numerous CafePress shop pages and
snarf out the img info, which I use to automagically create the product
pages for rubystuff.com.

The code sample here is a much simplified version.  Mechanize lets you
use custom classes to encapsulate node types, which in turn makes it
simpler to manipulate assorted HTML elements.  I need to extract assorted
data from image URLs, so I coded up some additional trickery not shown
here.

Also note that some sites reject bots, spiders, etc. when the declared
user-agent is not one they find acceptable.  Hence the random selection
from UA here.

#!/usr/local/bin/ruby

require 'logger'
require 'mechanize'

UA = [ 'Windows IE 6'   ,
       'Windows Mozilla',
       'Mac Safari'     ,
       'Mac Mozilla'    ,
       'Linux Mozilla'  ,
       'Linux Konqueror' ]

# Wrap certain nodes in an Img class to make
# node attribute access a bit easier to grok.
class Img

  attr_reader :alt, :src

  def initialize( node )
    @node = node
    # Default to empty strings so callers never see nil.
    @alt  = ''
    @src  = ''
    if @node.attributes[ 'alt' ]
      @alt = @node.attributes[ 'alt' ].to_s.strip
    end
    if @node.attributes[ 'src' ]
      @src = @node.attributes[ 'src' ].to_s.strip
    end
  end

end

# Now with Rails tote bags and thongs and stuff!
url = 'http://www.cafepress.com/rubyonrailsshop'

agent = WWW::Mechanize.new {|a| a.log = Logger.new( STDERR ) }

# Pick one of the aliases at random.
agent.user_agent_alias = UA[ rand( UA.size ) ]

# This tells Mechanize to watch for certain elements, and
# map matching nodes to the keyed class.  Here, when an img
# element is encountered, mechanize will use the node to create
# an Img object and store it for us.
agent.watch_for_set = { 'img' => Img }

page = agent.get( url )

# Get the watch items we're interested in
images = page.watches[ 'img' ]

# What did we get?
images.each do |img|
  p img.src
end

#----------------

Hope this helps.

Get Mechanize from rubyforge.org, from the wee project page.

http://rubyforge.org/projects/wee/

James Britt

-- 
http://www.ruby-doc.org - The Ruby Documentation Site
http://www.rubyxml.com - News, Articles, and Listings for Ruby & XML
http://www.rubystuff.com - The Ruby Store for Ruby Stuff
http://www.jamesbritt.com - Playing with Better Toys
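
P.S.  A footnote on the random user-agent pick, since it is easy to get
the bound wrong.  rand( n ) with an Integer n returns an Integer from 0
up to n - 1, so indexing with rand( list.size ) can reach every entry,
while a bound of size - 1 would leave the last entry unreachable.  The
sketch below needs no network and no Mechanize.  ALIASES and pick_alias
are made-up names for illustration only, not part of Mechanize or of the
script above.

```ruby
# Hypothetical stand-in for the UA list; the name ALIASES is an assumption.
ALIASES = [ 'Windows IE 6', 'Windows Mozilla', 'Mac Safari',
            'Mac Mozilla', 'Linux Mozilla', 'Linux Konqueror' ]

# Hypothetical helper: return one entry of the list, chosen at random.
# rand( n ) yields an Integer from 0 up to n - 1, which is exactly
# the range of valid indexes for a list of n items.
def pick_alias( list )
  list[ rand( list.size ) ]
end

# Draw many times and count how often each entry comes up.
seen = Hash.new( 0 )
10_000.times { seen[ pick_alias( ALIASES ) ] += 1 }

# Print the tally; every alias should show a nonzero count.
ALIASES.each { |name| puts "#{name}: #{seen[ name ]}" }
```

On a newer Ruby, Array#sample does the same job in one call, so
ALIASES.sample would be a reasonable substitute for the helper.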