Just remember that with screen scraping you are expecting a document served by a web server, and on top of that you are generally expecting a very particular structure in that document. Web sites change frequently and without notice, and even the smallest change can break your scraper. So inspect the various pages of the sites you plan to scrape carefully, then write your scraper to check for each thing it needs and not fail if something isn't found. With some clever programming and a little knowledge of the site, you can make a simple but smart scraper. However, it will still be pretty fragile. HTML/XHTML is just too loose and too much like human language: full of ambiguity and implicit meaning that a human would pick up on but a machine will struggle with.
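To illustrate the "check for things and don't fail" idea, here is a minimal sketch in Python using only the standard library's `html.parser`. The class name `FirstTextByClass`, the helper `extract_text`, and the `price` CSS class are all made up for this example; a real site will need its own selectors, and this is one possible approach rather than the only way to do it.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag; they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class FirstTextByClass(HTMLParser):
    """Collects the text of the first element carrying a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0       # > 0 while we are inside the matching element
        self.done = False    # True once the matching element has closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1  # nested element inside the match
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.css_class in classes:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True

    def handle_data(self, data):
        if self.depth and not self.done:
            self.chunks.append(data)


def extract_text(html, css_class):
    """Return the element's text, or None if the page no longer has it."""
    parser = FirstTextByClass(css_class)
    parser.feed(html)
    parser.close()
    text = " ".join("".join(parser.chunks).split())  # collapse whitespace
    return text or None


if __name__ == "__main__":
    old_layout = '<div><span class="price big">$ 19.99</span></div>'
    new_layout = '<div><span class="amount">$ 19.99</span></div>'
    for page in (old_layout, new_layout):
        price = extract_text(page, "price")
        if price is None:
            # The site changed: report it and carry on instead of crashing.
            print("price not found, skipping")
        else:
            print("price:", price)
```

The point of the design is that a missing element comes back as `None`, so the caller decides what to do (skip, log, alert) rather than the scraper dying on the first layout change. It is still fragile in the sense described above: if the site renames the class, the scraper keeps running but finds nothing.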