Several years ago, one of the members of the group offered me this
routine, which does a pretty good job of extracting the text from an
HTML page.

#--------------------------------------------------------------------
#    Strip HTML Tags from Line
#--------------------------------------------------------------------

def striphtml(line)
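    # Turn newlines into spaces first so that tags split across lines
    # still match ('.' does not match a newline), then delete every tag.
    line.gsub(/\n/, ' ').gsub(/<.*?>/, '')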
end
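
One caveat: the routine above removes the tags themselves, so the
contents of <script> and <style> blocks stay in the output, and entities
such as &amp; are left undecoded. The following is only a rough sketch of
how that might be handled with the same regex approach. The name
striphtml_text and the short entity table are my own assumptions, not
part of the original routine, and a real HTML parser would be the safer
choice for messy pages.

```ruby
# Hypothetical variant of striphtml (not from the original routine).
# Assumes the whole page is passed in as one string.
def striphtml_text(html)
  # Small, deliberately incomplete entity table (an assumption for this sketch).
  entities = { 'amp' => '&', 'lt' => '<', 'gt' => '>', 'quot' => '"', 'nbsp' => ' ' }

  text = html.gsub(/<(script|style)\b.*?<\/\1\s*>/mi, ' ') # drop script/style blocks, body included
  text = text.gsub(/<!--.*?-->/m, ' ')                     # drop comments
  text = text.gsub(/<.*?>/m, ' ')                          # drop remaining tags (/m lets '.' span newlines)
  text = text.gsub(/&(amp|lt|gt|quot|nbsp);/) { entities[$1] } # decode the few known entities
  text.gsub(/\s+/, ' ').strip                              # collapse runs of whitespace
end

page = "<html><head><style>p { color: red }</style></head>\n" \
       "<body><p>Fish &amp; <b>chips</b></p>\n" \
       "<script>var x = 1 < 2;</script></body></html>"
puts striphtml_text(page)
```

Tags are replaced with a space rather than an empty string here so that
words separated only by markup do not run together. The extra spaces are
collapsed in the last step.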