Ben Giddings wrote:
> On Monday 09 May 2005 15:04, Sam Kong wrote:
> 
>>Hi, all!
>>
>>Quite often, when I need to read a list of web pages, I download the
>>html sources and save them in a single file like a.html.
>>If they are mostly text, I open the html in a web browser, select all,
>>copy it to an editor and save it.
>>I want to make the process shorter.
>>How can I extract the text from html source?
>>I'm sure there are many parsers for it.
>>What is the most convenient one?
> 
> 
> You may find my HTMLTokenizer library convenient for this.  To do what you 
> need, all you'd do is keep calling "tokenizer.getText()"
> 
> http://rubyforge.org/projects/htmltokenizer/


WWW::Mechanize sits atop such a process, but makes it easier to define 
what to do for selected elements and such.

Just sayin' ...


James