On Nov 19, 7:14 pm, William James <w_a_x_... / yahoo.com> wrote:
> On Nov 19, 12:41 pm, cskilbeck <charlieskilb... / gmail.com> wrote:
>
> > Hi,
> >
> > I need to extract everything between <table> and </table> on a website
> > (there's only one table on the page). So far I have:
> >
> > require 'open-uri'
> > page = open('http://xxx.html').read
> > page.gsub!(/\n/,"")
> > page.gsub!(/\r/,"")
> > inner = page.scan(%r{.*<table.*>(.*)</table>.*}m)
> > print inner
> >
> > but inner is empty - any ideas?
> >
> > If I substitute line 2 with
> >
> > page = '123<table>456</table>789'
> >
> > I get inner = 456, which is correct.
>
> inner = page[ %r{<table.*?>(.*?)</table>}mi, 1]

Thanks all for your help. Non-greedy matching is the key.
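
For anyone finding this thread later, here is a small sketch of why the greedy version came back empty. The sample markup is made up (the real page isn't shown in the thread), and I'm assuming the table contains nested tags such as <tr>/<td>, which is probably what tripped up the original pattern:

```ruby
# Hypothetical sample markup; the real page from the thread is not shown.
page = '<p>intro</p><table border="1"><tr><td>456</td></tr></table><p>end</p>'

# Greedy: "<table.*>" runs up to the LAST ">" that still lets "</table>"
# match, so it swallows the rows and the capture group is left empty.
greedy = page[%r{<table.*>(.*)</table>}m, 1]

# Non-greedy: "<table.*?>" stops at the first ">", and "(.*?)" stops at
# the first "</table>", so the capture holds the table body.
lazy = page[%r{<table.*?>(.*?)</table>}mi, 1]

p greedy  # => ""
p lazy    # => "<tr><td>456</td></tr>"
```

The '123<table>456</table>789' test string worked with the greedy pattern only because there is just one ">" between "<table" and "</table>", so greedy and non-greedy matching end up at the same place.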