On Tue, Oct 9, 2012 at 6:45 PM, Sybren Kooistra <lists / ruby-forum.com> wrote:
> Hi all,
>
> I have written a script that opens all URLs in a text file one by one,
> parses them and finally saves the results into an Excel file.
>
> When I run the code on a textfile with just a few urls, it works
> perfectly.
> When I run the code on a textfile with many thousands of urls, I get an
> error ("in 'initialize': getaddrinfo: Name or service not known
> (SocketError)"). What might be causing the issue?
>
> CODE:
> require 'nokogiri'
> require 'open-uri'
> require 'rubygems'
> require 'writeexcel'
>
> workbook = WriteExcel.new('parseresult.xlsx')
> worksheet = workbook.add_worksheet
> row = 0
>
> File.foreach("websites.txt") do |line| #loop on basis urls textfile
>
> searchablefile = Nokogiri::HTML(open(line)) #open each url
>
> #creation of variables
> referentieid = searchablefile.at_xpath("//td/strong[contains(text(),
> 'Referentie')]/parent::*/following-sibling::*")
> status = searchablefile.at_xpath("//td/strong[contains(text(),
> 'Status')]/parent::*/following-sibling::*")
>
> unless searchablefile.at_xpath("//td/strong[contains(text(),
> 'Referentie')]/parent::*/following-sibling::*").nil?
> worksheet.write(row, 1, referentieid.content)
> end
> unless searchablefile.at_xpath("//td/strong[contains(text(),
> 'Status')]/parent::*/following-sibling::*").nil?
> worksheet.write(row,  2, status.content)
> end
> row += 1 #next row for next url
> end
> workbook.close
>
> ERROR:
> wadiem@wadiem-TECRA-A2:~$ ruby directerubyparsewoningmarkt.rb
> /home/wadiem/.rvm/rubies/ruby-1.9.2-p320/lib/ruby/1.9.1/net/http.rb:644:in
> `initialize': getaddrinfo: Name or service not known (SocketError)
>   from
> /home/wadiem/.rvm/rubies/ruby-1.9.2-p320/lib/ruby/1.9.1/net/http.rb:644:in
> `open'
>   from
> /home/wadiem/.rvm/rubies/ruby-1.9.2-p320/lib/ruby/1.9.1/net/http.rb:644:in
> `block in connect'
>   from
> /home/wadiem/.rvm/rubies/ruby-1.9.2-p320/lib/ruby/1.9.1/timeout.rb:44:in
> `timeout'
>   from
> /home/wadiem/.rvm/rubies/ruby-1.9.2-p320/lib/ruby/1.9.1/timeout.rb:89:in
> `timeout'
>   from
> /home/wadiem/.rvm/rubies/ruby-1.9.2-p320/lib/ruby/1.9.1/net/http.rb:644:in
> `connect'
>   from
> /home/wadiem/.rvm/rubies/ruby-1.9.2-p320/lib/ruby/1.9.1/net/http.rb:637:in
> `do_start'
>   from
> /home/wadiem/.rvm/rubies/ruby-1.9.2-p320/lib/ruby/1.9.1/net/http.rb:626:in
> `start'
>   from
> /home/wadiem/.rvm/rubies/ruby-1.9.2-p320/lib/ruby/1.9.1/open-uri.rb:306:in
> `open_http'
>   from
> /home/wadiem/.rvm/rubies/ruby-1.9.2-p320/lib/ruby/1.9.1/open-uri.rb:769:in
> `buffer_open'
>   from
> /home/wadiem/.rvm/rubies/ruby-1.9.2-p320/lib/ruby/1.9.1/open-uri.rb:203:in
> `block in open_loop'
>   from
> /home/wadiem/.rvm/rubies/ruby-1.9.2-p320/lib/ruby/1.9.1/open-uri.rb:201:in
> `catch'
>   from
> /home/wadiem/.rvm/rubies/ruby-1.9.2-p320/lib/ruby/1.9.1/open-uri.rb:201:in
> `open_loop'
>   from
> /home/wadiem/.rvm/rubies/ruby-1.9.2-p320/lib/ruby/1.9.1/open-uri.rb:146:in
> `open_uri'
>   from
> /home/wadiem/.rvm/rubies/ruby-1.9.2-p320/lib/ruby/1.9.1/open-uri.rb:671:in
> `open'
>   from
> /home/wadiem/.rvm/rubies/ruby-1.9.2-p320/lib/ruby/1.9.1/open-uri.rb:33:in
> `open'
>   from directerubyparsewoningmarkt.rb:12:in `block in <main>'
>   from directerubyparsewoningmarkt.rb:10:in `foreach'
>   from directerubyparsewoningmarkt.rb:10:in `<main>'
>
> Thanks a bunch.
>
> --
> Posted via http://www.ruby-forum.com/.
>

1.9.2p290 :001 > require 'open-uri'
 => true
1.9.2p290 :002 > open("http://ldfmldmflasfmkdfm")
SocketError: getaddrinfo: Name or service not known

There's probably an invalid URL in that file, for example one whose
hostname doesn't resolve. Can you print each URL before opening it?
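
Something along these lines might help track it down. It is only a rough sketch: each_url is a made-up helper name, not something from your script or from open-uri. It strips the trailing newline that File.foreach leaves on every line, skips blank lines, prints each URL before the block runs, and collects the ones that raise SocketError instead of aborting the whole run.

```ruby
require 'socket' # defines SocketError

# Hypothetical helper (not part of the original script): prints each URL
# before handing it to the block, and collects the ones that fail to resolve.
def each_url(lines)
  failed = []
  lines.each_with_index do |line, index|
    url = line.strip                     # File.foreach keeps the trailing newline
    next if url.empty?                   # skip blank lines
    puts "#{index + 1}: #{url.inspect}"  # inspect exposes stray characters
    begin
      yield url
    rescue SocketError => e
      warn "  could not resolve #{url.inspect}: #{e.message}"
      failed << url
    end
  end
  failed
end
```

With your script this could be called as failed = each_url(File.foreach("websites.txt")) { |url| Nokogiri::HTML(open(url)) }, and failed would then hold the URLs to look at. On Ruby 2.5 and later, URI.open(url) takes the place of the bare open(url).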

Jesus.