Chris Gallagher wrote:

> OK, that code all works great, but I have one last question :)
> 
> This is allowing me to scrape the values of the class attributes on tags and
> any other attributes like that. My question is, how would I modify the
> code in order to get it to capture, say, a block of text such as:
> 
>  <p>this is text that i want to scrape</p>
> 
> Any ideas?

Really simple:

array = page_content.scan(%r{<p>(.*?)</p>}m).flatten

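This returns an array in which each element is the text of one paragraph from
the original page. The non-greedy (.*?) stops at the first closing </p>, and
the m flag lets a paragraph span several lines.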
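Here is a minimal, self-contained sketch of the one-liner in action. The HTML
string below is hypothetical and stands in for the real page. The second pattern
is an illustrative variant, not part of the original question: the plain <p>
pattern skips opening tags that carry attributes, such as <p class="note">, and
the variant tolerates them.

```ruby
# Hypothetical sample HTML standing in for the fetched page.
page_content = <<~HTML
  <html><body>
  <p>this is text that i want to scrape</p>
  <p>a second paragraph
  that spans two lines</p>
  <p class="note">a paragraph with an attribute</p>
  </body></html>
HTML

# The one-liner: (.*?) is non-greedy, so each match stops at the first </p>;
# the m flag lets . match newlines, so multi-line paragraphs are captured.
paragraphs = page_content.scan(%r{<p>(.*?)</p>}m).flatten

# Illustrative variant: also accept attributes inside the opening <p ...> tag,
# while still rejecting other tags that merely start with "p", like <pre>.
all_paragraphs = page_content.scan(%r{<p(?:\s[^>]*)?>(.*?)</p>}m).flatten

p paragraphs
# => ["this is text that i want to scrape", "a second paragraph\nthat spans two lines"]
p all_paragraphs.length
# => 3
```

Both patterns are still just regular expressions, so they assume reasonably
tidy markup (no nested <p> elements, no "</p>" inside comments or scripts).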

This is why it is a bad idea to adopt a package or library for something
that is easier to accomplish with a few lines of code, or even one line, as
in this case.

At first the library seems as though it can do anything, with no need to
understand what is actually going on. Pretty quickly you encounter
something the library cannot do, and you have to ... understand what is
going on. Then you abandon the library and write normal code.

In Ruby, writing normal code is so easy that the traditional cautions
against adopting miraculous libraries should be amplified tenfold.

-- 
Paul Lutus
http://www.arachnoid.com