On Nov 12, 9:48   
> I'm writing this command-line ruby script and it needs to be able to
> submit a search string and get back google result links. Remarkably, I
> google this subject and I am finding Google APIs to do every possible
> thing imaginable except for this. The only thing I found was Goose which
> is apparently based on a deprecated API.
>
> I tried just looking at the actual HTML at the google home page, but
> that is the most nasty mess of web code I've ever seen. Please tell me
> somebody else has reverse engineered it so I don't have to...
>
> --
> Posted via http://www.ruby-forum.com/.

An approach which can prove handy is to "screen scrape" the data
from the HTML. One of the easiest ways is with Firefox with the
Firebug add-on installed. With Firebug, you can inspect the elements
on the page, and view *formatted* source.

After you figure out how the data you are looking for is tagged,
or can be located, there are Ruby tools like Hpricot and Nokogiri
which allow one to quickly throw together an extraction routine.

For example, a few minutes ago, I did a Google search on "helium high
voice", and came up with a few lines of code to extract the first page
of links as follows:

1. I inspected the links, and found that they all seem to have
   'class="l"'.
2. I copied the ugly source from a source-view window, and pasted it
   into scite (any editor would do), but in scite it's easy to view
   changes in output as you experiment.
3. I opened up a few blank lines in the script, and pasted the HTML
   source under an __END__ marker, which makes it available as the
   'DATA' pseudo file.
4. I tried a couple of things using Nokogiri, and found something that
   seemed to work.

The code:

  # coding: utf-8
  require 'nokogiri'

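  # Parse the HTML pasted below __END__ (exposed as the DATA pseudo file).
  html_doc = Nokogiri::HTML(DATA.read)
  # Print the href of every link tagged class="l".
  puts html_doc.css("a.l").map { |el| el["href"] }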

  __END__
  (the ugly HTML page source goes here)

The output:

  http://www.answerbag.com/q_view/1420
  http://www.straightdope.com/columns/read/1803/why-does-helium-make-your-voice-squeaky
  http://en.wikipedia.org/wiki/Helium
  http://answers.yahoo.com/question/index?qid=20060606123434AAxjX5A
  http://blog.sciencegeekgirl.com/2009/03/26/myth-helium-makes-your-voice-high-pitched/
  http://wiki.answers.com/Q/Why_does_helium_make_your_voice_go_high
  http://ilovebacteria.com/helium.htm
  http://www.hrwiki.org/wiki/helium
  http://www.helium.com/items/1905495-why-does-helium-make-your-voice-squeaky
  http://www.youtube.com/watch?v=Pq8sCwWEG9k
  http://www.youtube.com/watch?v=MiZALF1VZe4

For production, just build the query URL, retrieve the page directly,
and extract the array of URLs from it.

There is no guarantee that Google won't change its markup and break
this particular code, but with such a high-level method of page
scraping it wouldn't be hard to adjust. Moreover, this technique can
be used in many situations, and once you've done a few sites, you'll
find most applications are as easy as parsing XML or adapting JSON
from "data only" APIs. After all, you get to see exactly what data is
available on the pages, which may include useful things that an API
might not make available.