On Thu, 8 Nov 2012 01:07:41 +0900, Panagiotis Atmatzidis wrote:

> Hello,
> 
> I'm trying to retrieve search results from the internet using nokogiri and open-uri. Apparently 'open-uri' can't handle UTF-8 directly. So I'm trying to convert the string to ASCII, but I still come up with an error. Here is the chunk of code:
> ----------------------------------------
> # encoding: UTF-8
>  
> require "nokogiri"
> require "open-uri"
> 
> word = "˦˦Ǧͦɦ"
> ascii_word = word.force_encoding("ASCII").to_s
> result = open("http://search.lycos.com/web?q=#{ascii_word}", "User-Agent" => "HTTP_USER_AGENT:Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US) AppleWebKit/534.13 (KHTML, like Gecko) Chrome/9.0.597.47 S
> doc = Nokogiri::HTML(result)
> ----------------------------------------
> And the error I get is: 
> ----------------------------------------
> [...]:in `open': invalid byte sequence in US-ASCII (ArgumentError)
> 	from lycos.rb:8:in `<main>'
> ----------------------------------------
> 
> I'm on MacOSX ML, using ruby (rvm) 1.9.3 .
> 

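force_encoding only relabels the bytes of the string; it doesn't convert them, so the non-ASCII bytes are still there when open sees them. As per the URI RFC (2396, since superseded by 3986), you need to percent-encode the non-ASCII bit, thusly: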

#!/usr/bin/ruby
# encoding: UTF-8

require "nokogiri"
require "open-uri"

word = URI.encode("˦˦Ǧͦɦ")
result = open("http://search.lycos.com/web?q=#{word}",
              "User-Agent" =>
              "HTTP_USER_AGENT:Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US) AppleWebKit/534.13 (KHTML, like Gecko) Chrome/9.0.597.47")
doc = Nokogiri::HTML(result)
puts doc
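
If URI.encode isn't available to you (it was deprecated for a long time and is gone as of Ruby 3.0), a rough alternative is URI.encode_www_form_component, which percent-encodes a single query value. The sketch below only builds the URL and makes no request. The helper name search_url and the sample word are placeholders of mine, not something from the original script:

```ruby
# encoding: UTF-8
require "uri"

# Hypothetical helper: append a percent-encoded query value to a base URL.
# URI.encode_www_form_component encodes the UTF-8 bytes of the term as %XX
# escapes (and turns spaces into "+"), so the result is plain ASCII.
def search_url(base, term)
  "#{base}?q=#{URI.encode_www_form_component(term)}"
end

# Placeholder search term; substitute the real one.
url = search_url("http://search.lycos.com/web", "καφές")
puts url
```

The resulting string is ASCII-only, so it can be handed to open the same way as above.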

-jh