The top link on del.icio.us is a site with all the Calvin + Hobbes
strips. I thought I'd download them before the site gets taken down.
Here's the code if you want it; it's very short. Newcomers can also see
how easy it is to use open-uri for simple web scraping.
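For anyone new to this, the technique has two steps. First, fetch a page's HTML as a string with open-uri. Second, pull out the parts you want with String#scan. Here is a minimal sketch of the second step. It runs on a small made-up HTML snippet so it works offline; the markup and file names are hypothetical.

```ruby
require "open-uri" # in real use: html = URI.open(url, &:read)

# Hypothetical stand-in for a fetched index page.
html = <<~HTML
  <A href="1985/index.html">1985</A>
  <A href="1986/index.html">1986</A>
  <A href="about.html">About</A>
HTML

# Only links whose target starts with "1" are captured, so about.html is skipped.
# With one capture group, scan returns an array of one-element arrays,
# so you need [0] or flatten to get plain strings.
matches = html.scan(/A href="(1.+?)">/)
p matches         # => [["1985/index.html"], ["1986/index.html"]]
p matches.flatten # => ["1985/index.html", "1986/index.html"]
```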

Before running this, consider buying the comics. What is your
motivation for not paying for them? If it's a bad one, don't do it. (I
already own them all on paper and want an electronic version.) Also,
create the c+h_archives folder or change the output path. FYI, the
images total about 112 MB; there are 3691 of them.
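If you'd rather not create the folder by hand, the script can create it itself. This is a small sketch that assumes you want an optional command-line argument for the output path. The argument handling is my own addition and not part of the original script.

```ruby
require "fileutils"

# Use the first command-line argument as the output folder if one is given.
# Otherwise fall back to the default "c+h_archives".
output_dir = ARGV[0] || "c+h_archives"

# mkdir_p creates any missing parent folders. It does nothing if the
# folder already exists, so it is safe to run every time.
FileUtils.mkdir_p(output_dir)
puts "saving images to #{File.expand_path(output_dir)}"
```

Run it as `ruby script.rb ~/comics/ch` to pick a different folder, or with no argument to keep the default.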

Code below or here: http://pastie.caboo.se/88946

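require "open-uri"
require "fileutils"

base_url   = "http://www.marcellosendos.ch/comics/ch/"
output_dir = "c+h_archives"
FileUtils.mkdir_p(output_dir) # create the output folder if it's missing

# URI.open needs Ruby 2.5+ (Ruby 3.0 dropped URL support from plain
# open); on older Rubies, open(url) does the same thing.
# Fetch the index page and collect every archive link starting with "1".
URI.open(base_url + "index.html") do |index|
  index.read.scan(/A href="(1.+?)">/).flatten.each do |archive_path|
    archive_page_link = base_url + archive_path
    # Strip the trailing file name so relative image paths can be appended.
    base_image_url = archive_page_link.sub(%r{/\w+\.\w+$}, "/")
    URI.open(archive_page_link) do |archive_page|
      archive_page.read.scan(/src="(.+?\.gif)">/).flatten.each do |img_path|
        img_url = base_image_url + img_path
        begin
          URI.open(img_url) do |image_file|
            # "wb" writes the GIF bytes unchanged (this matters on Windows).
            File.open(File.join(output_dir, img_path), "wb") do |local_file|
              local_file.write(image_file.read)
            end
          end
        rescue StandardError => e
          # There are five broken image links; log them and keep going.
          puts "failed to get #{img_url}: #{e.message}"
        end
      end
    end
  end
end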
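The script computes `base_image_url` by using a regex to strip the file name off the archive page URL, then appends each image path. Ruby's standard `URI.join` performs the same relative-URL resolution a browser does, and it also handles paths like `../img.gif` correctly. The sketch below compares the two approaches. The page URL and image names are hypothetical.

```ruby
require "uri"

# Hypothetical archive page and image names, for illustration only.
page_url = "http://example.com/comics/ch/1985/index.html"

# Same idea as the regex in the script: drop the file name, keep the folder.
via_regex = page_url.sub(%r{/\w+\.\w+$}, "/") + "ch851118.gif"

# URI.join resolves the image path relative to the page URL.
via_join = URI.join(page_url, "ch851118.gif").to_s

puts via_regex # => http://example.com/comics/ch/1985/ch851118.gif
puts via_join  # => http://example.com/comics/ch/1985/ch851118.gif

# Unlike plain string concatenation, URI.join also resolves "../" segments.
puts URI.join(page_url, "../shared/title.gif").to_s
# => http://example.com/comics/ch/shared/title.gif
```

For the simple file names on this site, both approaches give the same URL. `URI.join` is the safer choice if a page ever uses relative paths that climb out of its folder.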
-- 
Posted via http://www.ruby-forum.com/.