Hello,

> Digression: when solving a problem like this, it is often much easier to
> write a few lines of HTML than to try to use a high-powered library to
> accomplish it.

I don't see why it is an advantage here. The first solution in this thread:

-------------------------------------------------------------------
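require 'rubygems'
require 'hpricot'

# One Struct member per <td>; every record spans 8 cells.
Record = Struct.new(:name, :date, :name_again, :some_num,
                    :buy_link, :some_num2, :letters, :price)
records = []

# doc holds the page's HTML
cells = Hpricot(doc)/"/table/tr/td"

cells.map { |elem| elem.inner_html }.each_slice(8) do |slice|
  records << Record.new(*slice)
end

# Drop the leading currency sign and compare the prices as numbers
p records.sort_by { |record| record.price[1..-1].to_f }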
------------------------------------------------------------------

is shorter, copes with malformed HTML and even does the sorting, which I
believe was the OP's main goal. So why not use a high-powered library?
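To make the slicing and sorting steps concrete without needing Hpricot installed, here is a stdlib-only sketch. The cell values are made up for illustration; it only assumes 8 cells per record with the price last and prefixed by `$`. It shows `each_slice(8)` building the records, and why the price should be compared as a number rather than as a string.

```ruby
# Made-up stand-in for the cell texts the parser would return:
# 8 cells per record, price in the last cell.
cells = %w[Alice 2007-01-01 Alice 1 buy 2 ab $10.00
           Bob   2007-01-02 Bob   3 buy 4 cd $9.50
           Carol 2007-01-03 Carol 5 buy 6 ef $100.00]

Record = Struct.new(:name, :date, :name_again, :some_num,
                    :buy_link, :some_num2, :letters, :price)

# Group every 8 consecutive cells into one Record.
records = cells.each_slice(8).map { |slice| Record.new(*slice) }

# Strip the leading "$" and sort numerically.
by_price = records.sort_by { |record| record.price[1..-1].to_f }
p by_price.map(&:name) # prints ["Bob", "Alice", "Carol"]

# Sorting the raw strings is lexicographic: "$10.00" < "$100.00" < "$9.50".
by_string = records.sort_by(&:price)
p by_string.map(&:name) # prints ["Alice", "Carol", "Bob"]
```

The string sort is the trap: it looks right on small samples and silently breaks once prices have different numbers of digits.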

Disclaimer: that solution was actually mine, but that is not why I am
pointing to it. I do so because I think that parsing all the cells in one
line with a robust HTML parser is much better in practice than applying a
basic set of regexps and then patching their results with ad-hoc rules
(for missing close tags etc.) derived from just 3 examples. I believe the
Hpricot-powered solution above will work with 100 records, too (unless
the other 97 get *really* messed up, but in that case the regexps will
fail miserably as well), whereas the
we-do-not-need-any-high-powered-library approach may need another 25
patches for the other errors in the 100-record HTML...
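A small illustration of the ad-hoc patching described above, as a sketch with a hypothetical row that is missing one `</td>`: the naive regexp silently merges two cells into one, and the patch that fixes it only covers this particular kind of breakage.

```ruby
# Hypothetical row with a missing </td>, the kind of damage real pages have.
row = "<tr><td>Alice<td>$10.00</td></tr>"

# Naive regexp: assumes every <td> is closed by its own </td>.
naive = row.scan(%r{<td>(.*?)</td>}mi).flatten
p naive # prints ["Alice<td>$10.00"]: two cells merged into one

# Ad-hoc patch for this one failure: split on the opening tags instead,
# then cut each piece at the first closing </td> or </tr>.
patched = row.split(/<td[^>]*>/i).drop(1).map do |cell|
  cell.sub(%r{</t[dr]>.*}mi, "").strip
end
p patched # prints ["Alice", "$10.00"]
```

Each new kind of breakage in a larger page tends to need yet another rule like this one, which is exactly the work a tolerant HTML parser already does.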

I do not dispute that parsing the page with regexps, and seeing what's
going on under the hood, can teach you a lot, but I am quite sure that
feeding a real-life page to an HTML parser is safer than the regexp
approach.

Of course, if this question is just a theoretical one and there will
never be 100 (or even more than 3) records, only these 3, then forget
about this mail.

Cheers,
Peter

__
http://www.rubyrailways.com