On Feb 29, 1:14 am, William James <w_a_x_... / yahoo.com> wrote:
> On Feb 28, 9:50 am, William James <w_a_x_... / yahoo.com> wrote:
>
> > On Feb 28, 12:36 am, Chirantan <chirantan.rajh... / gmail.com> wrote:
> >
> > > I have HTML code in a string. I want to retrieve the content (can
> > > be any HTML code with any number of tags) inside the div, from after
> > > the heading to the end of the div.
> > >
> > > Example,
> > >
> > > <div class="info">
> > > <h5>Tagline:</h5>
> > > Yippee Ki Yay Mo - John 6:27
> > > </div>
> > >
> > > <div class="info">
> > > <h5>Plot Outline:</h5>
> > > John McClane takes on an Internet-based terrorist organization who is
> > > systematically shutting down the United States. <a class="tn15more
> > > inline" href="http://www.imdb.com/title/tt0337978/plotsummary"
> > > onclick="(new Image()).src='/rg/title-tease/plotsummary/images/b.gif?
> > > link=/title/tt0337978/plotsummary';">more</a>
> > > </div>
> > >
> > > In the above example, if "Plot Outline:" is the header that I am
> > > looking for, then the regex should give me -
> > >
> > > John McClane takes on an Internet-based terrorist organization who is
> > > systematically shutting down the United States. <a class="tn15more
> > > inline" href="http://www.imdb.com/title/tt0337978/plotsummary"
> > > onclick="(new Image()).src='/rg/title-tease/plotsummary/images/b.gif?
> > > link=/title/tt0337978/plotsummary';">more</a>
> > >
> > > And if "Tagline:" is what I am looking for, then the regex should give me -
> > >
> > > Yippee Ki Yay Mo - John 6:27
> > >
> > > I hope the problem statement is clear.
> >
> > Note that this will give spurious results if an html comment happens
> > to contain what you are looking for.
> >
> > def find_header header, html
> >   # Put all of the DIVs in an array.
> >   divs = html.scan( %r{<div.*?>(.*?)</div>}im ).flatten
> >   divs.each{|s|
> >     if s =~ %r{<h(\d)>#{header}</h\1>(.*)}im
> >       return $2.strip
> >     end
> >   }
> >   return nil
> > end
> >
> > html = DATA.read
> >
> > puts find_header( "Plot Outline:", html )
> >
> > __END__
> > <div class="info">
> > <h5>Tagline:</h5>
> > Yippee Ki Yay Mo - John 6:27
> > </div>
> >
> > <div class="info">
> > <h5>Plot Outline:</h5>
> > John McClane takes on an Internet-based terrorist organization who is
> > systematically shutting down the United States. <a class="tn15more
> > inline" href="http://www.imdb.com/title/tt0337978/plotsummary"
> > onclick="(new Image()).src='/rg/title-tease/plotsummary/images/b.gif?
> > link=/title/tt0337978/plotsummary';">more</a>
> > </div>
>
> More concise:
>
> def find_header header, html
>   html.scan( %r{<div.*?>(.*?)</div>}im ).flatten.each{|s|
>     return $1.strip if s =~ %r{<h5>#{header}</h5>(.*)}im }
>   return nil
> end

Thank you William and Mark,

The code worked. :-)

Thanks a lot.
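
A note for anyone reusing the snippets quoted above: the header text is interpolated into the regex as-is, so a heading that contains regex metacharacters (parentheses, dots, question marks) would not be matched literally. Below is a minimal sketch of the same scan-then-match approach with the header passed through Ruby's standard `Regexp.escape` first. The names `find_section` and `SAMPLE_HTML`, and the "Notes (misc.):" heading with its placeholder text, are made up for illustration and do not come from the thread.

```ruby
# Hypothetical variant of the quoted find_header (the name is an assumption).
# Returns the text after <hN>header</hN> inside the first matching div, or nil.
def find_section(header, html)
  # Escape the header so characters like "(" or "." are matched literally.
  escaped = Regexp.escape(header)

  # Collect the body of every div (non-greedy, so nested divs are NOT handled).
  html.scan(%r{<div\b[^>]*>(.*?)</div>}im).flatten.each do |body|
    # (\d) captures the heading level; \1 requires the same level to close it.
    return $2.strip if body =~ %r{<h(\d)[^>]*>\s*#{escaped}\s*</h\1>(.*)}im
  end
  nil
end

# Sample input: the first div is from the thread, the second is invented.
SAMPLE_HTML = <<~HTML
  <div class="info">
  <h5>Tagline:</h5>
  Yippee Ki Yay Mo - John 6:27
  </div>

  <div class="info">
  <h5>Notes (misc.):</h5>
  placeholder text
  </div>
HTML

puts find_section("Tagline:", SAMPLE_HTML)
puts find_section("Notes (misc.):", SAMPLE_HTML)
```

The caveat quoted above still applies: a regex cannot tell real markup from markup inside an HTML comment, and the non-greedy `(.*?)</div>` stops at the first closing tag, so a div nested inside the target div would cut the result short.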