On Nov 19, 7:14 pm, William James <w_a_x_... / yahoo.com> wrote:
> On Nov 19, 12:41 pm, cskilbeck <charlieskilb... / gmail.com> wrote:
>
> > Hi,
>
> > I need to extract everything between <table> and </table> on a website
> > (there's only one table on the page). So far I have:
>
> > require 'open-uri'
> > page = URI.open('http://xxx.html').read
> > page.gsub!(/\n/,"")
> > page.gsub!(/\r/,"")
> > inner = page.scan(%r{.*<table.*>(.*)</table>.*}m)
> > print inner
>
> > but inner is empty - any ideas?
>
> > If I substitute line 2 with
>
> > page = '123<table>456</table>789'
>
> > I get inner = 456, which is correct.
>
> inner = page[ %r{<table.*?>(.*?)</table>}mi, 1]

Thanks, all, for your help. Non-greedy matching is the key.
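For anyone finding this thread later, here is a small sketch of why the original pattern came back empty. The sample page string is made up for illustration; any markup where a `>` (say, the end of a closing `</tr>`) sits right before `</table>` behaves the same way. With the greedy `<table.*>`, the engine backtracks only as far as it must, so it stops at the *last* `>` before `</table>` and leaves nothing for the capture group. In the short test string the only such `>` is the one in `<table>` itself, which is why that case seemed to work. Note also that the `m` flag already lets `.` match newlines in Ruby, so the two `gsub!` calls aren't needed.

```ruby
# Hypothetical sample page (not the real one): the '>' of </tr> sits just before </table>.
page = '<html><body><table border="1"><tr><td>456</td></tr></table></body></html>'

# Original greedy pattern: <table.*> runs to the last '>' that still lets
# (.*)</table> match, i.e. the end of </tr>, so the capture is empty.
greedy = page.scan(%r{.*<table.*>(.*)</table>.*}m)
p greedy

# The short test string has no '>' between <table> and </table> except the
# tag's own, so the greedy pattern happens to capture the right text there.
simple = '123<table>456</table>789'.scan(%r{.*<table.*>(.*)</table>.*}m)
p simple

# Non-greedy version from the reply: <table.*?> stops at the first '>' after
# "<table", and (.*?) stops at the first </table>.
inner = page[%r{<table.*?>(.*?)</table>}mi, 1]
p inner
```

Running it should print `[[""]]`, `[["456"]]`, and `"<tr><td>456</td></tr>"` in that order. A regex is fine for a one-off page like this, but it will still trip over nested tables or a `>` inside an attribute value.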