On Nov 9, 10:51 pm, 7stud -- <bbxx789_0... / yahoo.com> wrote:

> Mark Thomas wrote:

> > As I just posted in another message, it works for me. I wonder what's
> > different about my environment. Are you using Nokogiri 1.4.0?

> Yes, however I get a warning message that informs me that I'm using an
> outdated version of libxml2:

> $ ruby -v
> ruby 1.8.6 (2007-03-13 patchlevel 0) [i686-darwin8.11.1]

> $ nokogiri -v
> HI.  You're using libxml2 version 2.6.16 which is over 4 years old and
> has plenty of bugs.  We suggest that for maximum HTML/XML parsing pleasure,
> you upgrade your version of libxml2 and re-install nokogiri.  If you like
> using libxml2 version 2.6.16, but don't like this warning, please define the
> constant I_KNOW_I_AM_USING_AN_OLD_AND_BUGGY_VERSION_OF_LIBXML2 before requring
> nokogiri.

> /usr/local/lib/ruby/gems/1.8/gems/nokogiri-1.4.0/lib/nokogiri/xml/builder.rb:272:
> warning: parenthesize argument(s) for future version
> ---
> nokogiri: 1.4.0
> warnings: []

> libxml:
>   compiled: 2.6.16
>   loaded: 2.6.16
>   binding: extension

> So it could be something with that, or maybe it has something to do with
> the fact that ruby 1.8.7 back ports some stuff from ruby 1.9.
> --
> Posted via http://www.ruby-forum.com/.

OK, when I put Mark's code in a file and ran it (instead of entering it in
an irb session), it DOES work. However, it doesn't capture the website
URL, which 7stud's approach does. I haven't figured out how to do that
with this approach, and merely adding more items to xpaths doesn't
work.

So Mark, how can your approach be used to capture the URL at the end
of the data section?

Here's the file I used with Mark's approach:

File: scrape1.rb
----------------------------
require 'rubygems'
require 'open-uri'
require 'nokogiri'

def scrape(id)
  id = id.to_s
  url = "http://www.xyz.org/../../..ID=#{id}"
  # open-uri makes open() fetch URLs on Ruby 1.8; on Ruby 2.5+ use URI.open(url)
  doc = Nokogiri::HTML.parse(open(url))

  # Every field is located relative to the section header div
  prefix = '//div[@class="sectionHeaderText"]/following-sibling::'
  xpaths = {
    :name      => "#{prefix}b/text()",
    :addr      => "#{prefix}text()[2]",
    :citystzip => "#{prefix}text()[3]",
    :country   => "#{prefix}text()[4]",
    :phone     => "#{prefix}text()[5]",
    :web       => "#{prefix}text()[6]",
    :url       => "#{prefix}text()[7]"
  }

  results = {}
  xpaths.each do |data, xpath|
    # Drop CR/LF/tab characters, collapse repeated spaces, trim the ends
    results[data] = doc.search(xpath).to_s.gsub(/\n|\t|\r/, '').squeeze(' ').strip
    puts "#{data} = " + results[data]
  end
  results
end
------------------------------

And use as before:  info = scrape 1234
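The cleanup one-liner inside the loop is easier to check in isolation. A sketch pulling the same chain into a hypothetical `clean_text` helper:

```ruby
# Hypothetical helper: the same cleanup chain used in scrape1.rb's loop.
# Removes CR, LF and tab characters, collapses runs of spaces, trims the ends.
def clean_text(raw)
  raw.gsub(/\n|\t|\r/, '').squeeze(' ').strip
end

puts clean_text("\n\t  Acme   Corp \r\n")
```

Because line breaks are deleted rather than replaced, a value that spans two lines in the HTML, such as `"Main St\nSuite 5"`, comes out as `"Main StSuite 5"`; using `gsub(/[\r\n\t]+/, ' ')` instead would keep a space there.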