On Feb 28, 12:36 am, Chirantan <chirantan.rajh... / gmail.com> wrote:
> I have an html code into string. I want to retrieve the content (Can
> be any HTML code with any number of tags) present inside the div after
> the heading till the end of the div.
>
> Example,
>
> <div class="info">
> <h5>Tagline:</h5>
> Yippee Ki Yay Mo - John 6:27
> </div>
>
> <div class="info">
> <h5>Plot Outline:</h5>
> John McClane takes on an Internet-based terrorist organization who is
> systematically shutting down the United States. <a class="tn15more
> inline" href="http://www.imdb.com/title/tt0337978/plotsummary"
> onclick="(new Image()).src='/rg/title-tease/plotsummary/images/b.gif?
> link=/title/tt0337978/plotsummary';">more</a>
> </div>
>
> In the above example, Plot Outline is header that I am looking for
> then, regex should give me -
>
> John McClane takes on an Internet-based terrorist organization who is
> systematically shutting down the United States. <a class="tn15more
> inline" href="http://www.imdb.com/title/tt0337978/plotsummary"
> onclick="(new Image()).src='/rg/title-tease/plotsummary/images/b.gif?
> link=/title/tt0337978/plotsummary';">more</a>
>
> And if "Tagline:" is what I am looking for then regex should give me -
>
> Yippee Ki Yay Mo - John 6:27
>
> I hope the problem statement is clear.

Note that this will give spurious results if an html comment
happens to contain what you are looking for.

def find_header header, html
  # Put all of the DIVs in an array.
  divs = html.scan( %r{<div.*?>(.*?)</div>}im ).flatten
  divs.each{|s|
    if s =~ %r{<h(\d)>#{header}</h\1>(.*)}im
      return $2.strip
    end
  }
  return nil
end

html = DATA.read
puts find_header( "Plot Outline:", html )

__END__
<div class="info">
<h5>Tagline:</h5>
Yippee Ki Yay Mo - John 6:27
</div>

<div class="info">
<h5>Plot Outline:</h5>
John McClane takes on an Internet-based terrorist organization who is
systematically shutting down the United States.
<a class="tn15more inline"
href="http://www.imdb.com/title/tt0337978/plotsummary"
onclick="(new Image()).src='/rg/title-tease/plotsummary/images/b.gif?
link=/title/tt0337978/plotsummary';">more</a>
</div>
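
If the comment problem matters for your pages, one possible workaround is to strip HTML comments before scanning. The sketch below is only an illustration: the name find_header_safe and the sample string are made up for this example, and it also passes the header through Regexp.escape so that a header containing characters like "?" or "(" is matched literally. It is still regex-based, so it will not cope with nested divs.

```ruby
# Hypothetical variant of find_header; the name is not from the script above.
def find_header_safe(header, html)
  # Remove HTML comments first so commented-out markup cannot match.
  cleaned = html.gsub(/<!--.*?-->/m, "")
  # Escape the header so regex metacharacters in it are taken literally.
  pattern = %r{<h(\d)>\s*#{Regexp.escape(header)}\s*</h\1>(.*)}im
  # Collect the body of every DIV, then look for the heading inside each one.
  cleaned.scan(%r{<div\b[^>]*>(.*?)</div>}im).flatten.each do |body|
    m = pattern.match(body)
    return m[2].strip if m
  end
  nil
end

# Made-up sample: the commented-out DIV would match first without the gsub.
sample = <<~HTML
  <!-- <div class="info"><h5>Tagline:</h5> old tagline </div> -->
  <div class="info">
  <h5>Tagline:</h5>
  Yippee Ki Yay Mo - John 6:27
  </div>
HTML

puts find_header_safe("Tagline:", sample)
# prints: Yippee Ki Yay Mo - John 6:27
```

For anything much messier than this, a real HTML parser is probably the safer route.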