Hello --
There's been altogether too little hacking on this
list recently. So grab your pickaxes!
First of all... I *know* that it is considered horrible ever
to write anything in the realm of Web programming that
reinvents, fails to use existing modules, dares to presume
that it might work, etc. Pointers to existing code that
actually does what I'm trying to do are welcome, but don't
worry about enlightening me in the abstract. I'm on the
case :-)
The idea is:
Given a string, do one of two things to it:
1. kill all SGML/HTML-style tags, by turning <, > into
&lt;, &gt;
2. kill all such tags except those specifically
designated as OK.
The idea, of course, is to render arbitrary text input
tag-safe. (I know this isn't the only security gap imaginable,
but it's what this exercise is trying to do.)
Here's the current code, and a test (where the 'this' tag
is OK and all others are not). The question is... can we
break this code and sneak horrible/destructive things into
the input?
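module TagKiller
  # Escape every angle bracket so that no tag survives.
  def kill_tags(text)
    text .
      gsub('>', '&gt;') .
      gsub('<', '&lt;')
  end

  # Escape every tag except those whose name is listed in tags.
  def kill_except(text, tags=[])
    tagRE = '(<\s*/?\s*([^>/\s]+)[^>]*/?>)'
    # Compile explicitly: a plain String pattern is matched literally by gsub.
    text.gsub(Regexp.new(tagRE)) do |match|
      tag = $2.downcase.strip
      if tags.include? tag
        match
      else
        kill_tags(match)
      end
    end
  end
end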
include TagKiller
$text = <<-EOM
<this is a tag> content </this> here's something afterward
<here is> a bad tag </here>
<this good tag starts on
one line> and ends here </this>
<this/>
EOM
print kill_except($text, %w{this} )
__END__
Output:
<this is a tag> content </this> here's something afterward
&lt;here is&gt; a bad tag &lt;/here&gt;
<this good tag starts on
one line> and ends here </this>
<this/>
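
To get the hunt started, here is a rough sketch of two probes. These are only my guesses at likely weak spots, not a verified list of exploits. The first relies on the pattern checking only the tag name, so an OK tag can carry any attribute along with it. The second relies on the pattern needing a closing '>' before it notices a tag at all. The module is repeated in compact form so the snippet runs standalone; the probe strings and the onmouseover attribute are made up for illustration.

```ruby
# Standalone copy of the module so this snippet runs by itself.
module TagKiller
  def kill_tags(text)
    text.gsub('>', '&gt;').gsub('<', '&lt;')
  end

  def kill_except(text, tags = [])
    tag_re = %r{(<\s*/?\s*([^>/\s]+)[^>]*/?>)}
    text.gsub(tag_re) do |match|
      tag = $2.downcase.strip
      tags.include?(tag) ? match : kill_tags(match)
    end
  end
end

include TagKiller

# Hypothetical probe 1: an allowed tag name carrying an attribute.
attr_probe  = '<this onmouseover="alert(1)">hi</this> <that>no</that>'
attr_result = kill_except(attr_probe, %w{this})
puts attr_result

# Hypothetical probe 2: a disallowed tag that is never closed with '>'.
open_probe  = '<script src=x'
open_result = kill_except(open_probe, %w{this})
puts open_result
```

Running it shows the attribute riding through untouched on the allowed tag, and the unterminated '<script' coming back unescaped because the pattern never matches it.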
David
--
David Alan Black
home: dblack / candle.superlink.net
work: blackdav / shu.edu
Web: http://pirate.shu.edu/~blackdav