On 9/19/07, Chuck Dawit <chuckdawit / gmail.com> wrote:
>
> I submitted a post a few days ago about scraping the web for Cisco
> products. I didn't receive much input, so I thought I would ask
> again. Here are the requirements. I have a list of 2000 URLs that all
> have Cisco in their domain names, for example:
>
>   http://www.soldbycisco.net
>   http://www.ciscoindia.net
>   http://www.ciscobootcamp.net
>   http://www.cisco-guy.net
>
> I want to scrape through them and determine which websites are
> selling new Cisco products. I'm actually looking for around 20 or so
> products (ex. WIC-1T, NM-4E, WS-G2950-24). One idea I was given was to
> split the pages into ones with forms and those without forms. Those
> without forms probably won't have anything for sale, so I can eliminate
> those. But I really don't know how to handle the rest after that. Does
> anyone have a different/better approach? Any help would be appreciated.
> --
> Posted via http://www.ruby-forum.com/.

Not to make your problem worse, but you will need to differentiate between
new and used equipment too.

-- 
"Hey brother Christian with your high and mighty errand,
Your actions speak so loud, I can't hear a word you're saying."
-Greg Graffin (Bad Religion)
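
For what it's worth, the filtering discussed above (pages with forms, the
product codes, and new versus used wording) could be sketched roughly like
this in Ruby. It is only a first pass under stated assumptions: the method
names classify_page and likely_sells_new? and the keyword lists are made up
for illustration, the tag stripping is crude, and fetching the 2000 pages is
left out entirely.

```ruby
# Product codes taken from the original post (the real list has about 20).
PRODUCTS = %w[WIC-1T NM-4E WS-G2950-24].freeze

# Assumed, incomplete keyword lists; tune them against real pages.
USED_HINTS = /\b(used|refurbished|refurb|pre-owned|second-hand)\b/i
NEW_HINTS  = /\b(new|factory sealed|sealed)\b/i

# Inspect one page's HTML and report what was found.
# (Hypothetical helper, not part of any library.)
def classify_page(html)
  # Crude tag stripping: good enough for keyword matching, not for parsing.
  text = html.gsub(/<[^>]*>/, ' ')

  found = PRODUCTS.select { |p| text =~ /\b#{Regexp.escape(p)}\b/i }

  {
    has_form:      !!(html =~ /<form\b/i),
    products:      found,
    mentions_new:  !!(text =~ NEW_HINTS),
    mentions_used: !!(text =~ USED_HINTS)
  }
end

# Conservative rule: a form, at least one product code, "new" wording,
# and no "used" wording. Pages selling both new and used gear are rejected
# by this rule and would need a closer look.
def likely_sells_new?(info)
  info[:has_form] && !info[:products].empty? &&
    info[:mentions_new] && !info[:mentions_used]
end
```

Calling likely_sells_new?(classify_page(html)) on each downloaded page would
give a rough shortlist to review by hand. Pages that mention both new and
used equipment are the ambiguous cases and probably deserve their own pile.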