Thanks to all who responded.

It looks like I could write some tiny Perl scripts with Mechanize and 
pipe their output to a Ruby program, so that I could do most of the work 
in Ruby instead of Perl.  Perl reminds me of work, and that is a bad 
thing.  Writing Perl also doesn't teach me any Ruby, which is an 
important part of this hack.
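
For the record, the Ruby end of that pipe would be pretty simple.  Here 
is a rough, untested sketch; the tab-separated "url<TAB>title" line 
format is just something I invented for the example, not anything the 
Perl side actually produces:

#!/usr/bin/env ruby
# Reads lines piped in from the Perl/Mechanize scripts and does the
# real work in Ruby.  Assumes each input line looks like "url<TAB>title".
$stdin.each_line do |line|
  url, title = line.chomp.split("\t", 2)
  next if url.nil? || url.empty?

  # Placeholder for whatever per-page processing ends up living in Ruby.
  puts "Got #{title.inspect} from #{url}"
end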

I also just found a mention in the Pickaxe book of the open-uri library, 
which will let me grab lines from a URL.  This, I gather, gives me 
something roughly equivalent to piping the output of curl into a Ruby 
program.
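
Here is roughly what I'm picturing with open-uri.  Untested; the sitemap 
URL is made up, and the regex is just a crude stand-in for real parsing 
with RubyfulSoup or Mechanize.  (Depending on your Ruby version the call 
may need to be URI.open instead of a bare open.)

require 'open-uri'

# Fetch the sitemap and walk it line by line, much like
# `curl ... | ruby ...` would.
open('http://example.com/sitemap.html') do |f|
  f.each_line do |line|
    # Crude link extraction for the sketch; a real parser would do better.
    puts $1 if line =~ /href="([^"]+)"/
  end
end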

Thanks for the help, guys; I think I'm armed with enough to be 
dangerous now.

jp


Ryan Leavengood wrote:
> On 3/26/06, Jeff Pritchard <jp / jeffpritchard.com> wrote:
>>
>> I was wondering if anyone could point me to some example code that is
>> using RubyfulSoup to parse a sitemap to get links to all the pages on
>> that site and request each page and grab things from it.
> 
> WWW::Mechanize makes this easy. The HTML parsing has been pretty
> robust in my experience. So far I've used it to scrape my library's
> web site to see when books are due and automatically renew them, as
> well as log into Cingular.com and get my mobile phone minutes. The
> library web site has weird redirects and some other things that
> Mechanize handles great, and the Cingular site has a weird multi-step
> login system that I got going as well without too much trouble.
> 
> When I needed support for check boxes in the form on the library
> web site, the author of WWW::Mechanize, Michael Neumann, added them in
> less than 24 hours.
> 
> So anyhow, this is a slick library, and very useful.
> 
> Ryan


-- 
Posted via http://www.ruby-forum.com/.