-------- Original Message --------
> Date: Tue, 13 Nov 2007 22:21:14 +0900
> From: Jean Nibee <theopensourceguy / gmail.com>
> To: ruby-talk / ruby-lang.org
> Subject: Open URI & web scraping. Part II

> Hi
> 
> (short form of a post I made yesterday that got no love, I suspect it's
> because I was long-winded)
> 
> In a nutshell: if I use open-uri (and Hpricot) to download a web page and
> 'scrape' all the images to my local disk, dynamic images always come out
> malformed (size 0), but static images are fine.
> 
> An example would be: <img
> src="http://myserver:8080/Someservlet?name=blah&param=value&etc=etc">
> 
> Whether I copy/paste this URL into another browser or use open-uri to
> "get" the image, I get an error of:
> 
> XML Parsing Error: no element found
> Location: http://myserver:8080/Someservlet?name=blah&param=value&etc=etc
> Line Number 1, Column 1:
> 
> BUT, this image is displayed PERFECTLY in the html.
> 
> How can I get this image to download? (I suspect it's the mime type
> being set on the server side but I am not 100% sure)
> 
> ***
> OUTPUT
> ***
> [[URI information...]]
> Fetched document:
> http://myserver:8080/Someservlet?name=blah&param=value&etc=etc
> Content Type: application/voicexml+xml
> Charset:
> Content-Encoding:
> Last Modified:
> IMAGE INFO!!! ->
> Writing to file ::
> D:\sandbox\auto_attendant\archive_reports\trunk\dumps\1194882652_854.gif
> 
> Thanks for your help.
> -- 
> Posted via http://www.ruby-forum.com/.

Dear Jean,

maybe you can use Ruby's rio (http://rio.rubyforge.org/) to download
an entire website. I'm thinking in particular of the examples given in
http://rio.rubyforge.org/classes/RIO/Doc/INTRO.html under the headings

"Creating a Rio that refers to a web page" and
"Creating a Rio that refers to a file or directory on a FTP server".

Otherwise, you might get better responses on the Rails mailing list.
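
If you would rather stay with open-uri, here is a minimal sketch of how one might fetch a single image and notice an empty response. It is not tested against your servlet. The helper names save_image and fetch_image and the optional headers argument are my own invention for illustration, and whether the servlet needs any extra request headers at all is purely an assumption.

```ruby
require "open-uri"

# Copy an already-open IO-like object to disk in binary mode ("wb"),
# which matters on Windows. Returns the number of bytes written so the
# caller can detect an empty (size 0) response.
def save_image(io, path)
  data = io.read
  File.open(path, "wb") { |f| f.write(data) }
  data.bytesize
end

# Hypothetical helper: fetch one image URL with open-uri and save it.
# `headers` is an optional hash of extra request headers; which ones the
# server expects (if any) is an assumption, not something known here.
def fetch_image(url, path, headers = {})
  URI.open(url, headers) do |remote|
    warn "#{url}: Content-Type #{remote.content_type}"
    bytes = save_image(remote, path)
    warn "#{url}: empty response body" if bytes.zero?
    bytes
  end
end
```

Usage would look something like fetch_image(img_url, "dump.gif", "Referer" => page_url), where img_url and page_url are placeholders. If the byte count is still 0, the server itself is sending an empty body for that request, and the problem is on the request side rather than in how the file is written.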

Best regards,

Axel 






