On 4/1/06, rati_lion / yahoo.com <rati_lion / yahoo.com> wrote:
> Hi all
>
> I am new to Ruby. Found it as an interesting language. Can anyone help
> me with a simple code in Ruby to check for all the dead and live links
> in a website ?
>
> Thanks
> Rati
>
>
>

It is a wonderful language. The request is a bit rough, but here is
some (very) simple code to get you started. It uses some great Ruby
libraries, including Rubyful Soup
(http://www.crummy.com/software/RubyfulSoup/), which is available as a
gem. As written it only checks the links on one page; you would need to
make it "walk" the links to recursively check a whole site.

Hope it helps
pth

require 'open-uri'
require 'uri'
require 'rubyful_soup'

url = 'http://www.yahoo.com/'
uri = URI.parse(url)
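# open-uri lets open() fetch a URL; on Ruby 2.5 and later use
# URI.open(uri) instead, since plain open() stopped doing this in 3.0
html = open(uri).read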
soup = BeautifulSoup.new(html)

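# Search the soup for anchor tags and collect their href attributes,
# dropping anchors that have no href at all
links = soup.find_all('a').map { |a| a['href'] }.compact

# Remove javascript: and mailto: pseudo-links, which cannot be fetched
links.delete_if { |href| href =~ /\A\s*(javascript|mailto):/i }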

links.each do |l|
  # resolve relative paths against the page URL; URI#merge handles
  # "../", query strings and already-absolute links
  begin
    link = uri.merge(l)
  rescue URI::Error => e
    # the href is not a parseable URI, report it and move on
    puts "#{l}: #{e}"
    next
  end

  # check the link
  begin
    open(link).read
    # if we made it here the link is probably good
  rescue StandardError => e  # not Exception, so Ctrl-C still stops the run
    puts "#{link}: #{e}"
  end
end
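
To make the "walk" mentioned above a little more concrete, here is a
rough sketch of one way it could be done. It is not part of the script
above and makes several assumptions: check_site, extract_links and the
fetch argument are hypothetical names, the regular expression is only a
crude stand-in for Rubyful Soup, and fetch is any callable that takes a
URL string and returns the page body or raises when the page cannot be
retrieved. Pages on other hosts are checked but their links are not
followed, and max_pages caps the number of fetches.

```ruby
require 'uri'
require 'set'

# Crude href extractor; a stand-in for a real HTML parser.
def extract_links(html)
  html.scan(/<a\s[^>]*href\s*=\s*["']([^"']*)["']/i).flatten
end

# Breadth-first walk starting at start_url.
# fetch: hypothetical callable, URL string -> body, raises on failure.
# Returns a hash mapping each dead URL to its error message.
def check_site(start_url, fetch, max_pages = 100)
  start   = URI.parse(start_url)
  queue   = [start.to_s]
  seen    = Set.new(queue)
  dead    = {}
  fetched = 0

  until queue.empty? || fetched >= max_pages
    url = queue.shift
    fetched += 1

    begin
      html = fetch.call(url)
    rescue StandardError => e
      dead[url] = e.message
      next
    end

    # Check external pages, but only follow links on the starting host.
    next unless URI.parse(url).host == start.host

    extract_links(html).each do |href|
      next if href =~ /\A\s*(javascript|mailto):/i
      begin
        link = URI.join(url, href)   # resolve relative to the current page
      rescue URI::Error
        next                         # unparseable href, skip it
      end
      next unless %w[http https].include?(link.scheme)
      link.fragment = nil            # "#section" points at the same page
      target = link.to_s
      queue << target if seen.add?(target)  # add? is nil for known URLs
    end
  end

  dead
end
```

For real use, a fetcher along the lines of
lambda { |u| URI.open(u).read } (with open-uri loaded, Ruby 2.5 or
later) should work, though for a large site a HEAD request per link
would be kinder to the server than downloading every body.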