Arun Kumar wrote:
> Hi,
> I know that what I'm going to ask is the solution to a simple
> problem. But as I'm new to Ruby I have not learnt a lot about regular
> expressions in Ruby.
> 
> Can anybody tell me how to extract all the contents which are included
> inside the '<html>' and '</html>' tags and also how to extract the text
> given in between the '<a>' and '</a>' tags using regular expressions. I
> know it can be extracted using the 'scan' method but I don't know what
> the matching patterns or expressions should be. Can anybody please help me?
> 
> Regards
> Arun

s = "<a>hello world</a>"
# <.*?> matches non-greedily up to the nearest ">", so each tag is removed
new_s = s.gsub(/<.*?>/, "")
puts new_s

--output:--
hello world
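
Since you mentioned scan: here is a rough sketch of how the text between the '<a>' and '</a>' tags could be collected with it. The method name link_texts and the sample string are made up for illustration, and a regex like this only copes with simple, non-nested markup.

```ruby
# Hypothetical helper: returns the text inside each <a>...</a> pair.
def link_texts(html)
  # <a\b[^>]*> also accepts opening tags that carry attributes.
  # (.*?) is non-greedy, so each match stops at the nearest </a>.
  # The m flag lets . match newlines; the i flag ignores tag case.
  html.scan(/<a\b[^>]*>(.*?)<\/a>/mi).flatten
end

sample = '<a>hello world</a> and <a href="/x">goodbye</a>'
p link_texts(sample)
```

--output:--
["hello world", "goodbye"]

With one parenthesized group, scan returns an array of one-element arrays, which is why the sketch calls flatten on the result.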




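html = DATA.read  # DATA is everything after __END__ in this file
# Regexp::MULTILINE lets . match newlines, so (.*) can span several lines
regex = Regexp.new("<html>(.*)</html>", Regexp::MULTILINE)
puts html[regex, 1]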

__END__
<html>
<head>
  <title>html page</title>
</head>
<body>
  <div>hello</div>
  <div>world</div>
  <div>goodbye</div>
</body>
</html>


--output:--
<head>
  <title>html page</title>
</head>
<body>
  <div>hello</div>
  <div>world</div>
  <div>goodbye</div>
</body>


In the expression:

html[regex, 1]

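The 1 says to return the text matched by the first parenthesized group in
the regex, here (.*), instead of the whole match. If the regex does not
match at all, the result is nil.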
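
To see how the index picks a group, here is a small sketch. The capture method and the sample strings are made up for illustration; the only real API involved is String#[] with a regex and an integer. The last two lines also show what happens to the '<html>' pattern with and without the multiline flag.

```ruby
# Hypothetical helper: returns group `index` of the first match, or nil.
def capture(str, regex, index)
  str[regex, index]
end

line  = "<title>html page</title>"
regex = /<(\w+)>(.*?)<\/\1>/   # \1 refers back to the tag name in group 1

p capture(line, regex, 0)       # index 0 is the whole match
p capture(line, regex, 1)       # first group: the tag name
p capture(line, regex, 2)       # second group: the text between the tags
p capture("no tags", regex, 1)  # no match at all

page = "<html>\n<body>hi</body>\n</html>"
p capture(page, /<html>(.*)<\/html>/, 1)   # without m, . stops at newlines
p capture(page, /<html>(.*)<\/html>/m, 1)  # with m, (.*) spans the lines
```

--output:--
"<title>html page</title>"
"title"
"html page"
nil
nil
"\n<body>hi</body>\n"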




-- 
Posted via http://www.ruby-forum.com/.