On Jan 9, 2006, at 5:57 AM, Austin Ziegler wrote:
> On 09/01/06, jotto <jonathan.otto / gmail.com> wrote:
>> I can't find a method to remove HTML from a string in the core  
>> API. PHP
>> has something called strip_tags. Does Ruby have anything like this?
>> http://us3.php.net/manual/en/function.strip-tags.php
>
> Not built in. It's not really appropriate for the core language.
> That's one of the things that makes PHP easy to use for people who are
> trying to do simple things, but makes it hard when you get into
> engineering and maintaining real programs. As was suggested by the
> other respondent, it's relatively easy to remove:
>
>   a.gsub(%r{</?[^>]+?>}, '')

...just pray that the HTML you are modifying is valid, and not some
garbage file that web browsers happen to render as intended. For
example, watch the above regexp go to town on some invalid HTML, where a
stray '<' in the text or in a script swallows everything up to the next '>':


class String
  # Naive tag stripper: deletes every run from a '<' to the next '>'.
  def strip_tags
    gsub(%r{</?[^>]+?>}, '')
  end
end

source = <<ENDHTML
<html><body>
<p>I'm pretending to know how to code. I <3 HTML, it's teh best!!!!</p>
<script>
for ( i=0; i<10; i++ ){ document.write(i+'<br>') }
</script>
BLASTOFFS!!!!
</body>
ENDHTML

puts source.strip_tags
#=>
#=> I'm pretending to know how to code. I
#=>
#=> for ( i=0; i') }
#=>
#=> BLASTOFFS!!!!
#=>
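
To see where it goes wrong, it helps to ask the regexp what it actually matched. Here is a minimal sketch, assuming the same pattern as above. The helper name matched_spans and the two sample strings are mine, purely for illustration. String#scan returns every substring the pattern matches, which is exactly what gsub would have thrown away:

```ruby
# Hypothetical helper (not part of the original post): returns every
# substring that the strip_tags pattern above would delete from +text+.
def matched_spans(text)
  text.scan(%r{</?[^>]+?>})
end

p matched_spans("<p>I <3 HTML, it's teh best!!!!</p>")
#=> ["<p>", "<3 HTML, it's teh best!!!!</p>"]

p matched_spans("for ( i=0; i<10; i++ ){ document.write(i+'<br>') }")
#=> ["<10; i++ ){ document.write(i+'<br>"]
```

The non-greedy +? doesn't save you here: [^>] happily matches another '<', so each match runs from whatever '<' it finds to the next '>', whether or not that '<' ever started a tag.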