Hello --

There's been altogether too little hacking on this 
list recently.  So grab your pickaxes!

First of all... I *know* that it is considered horrible ever
to write anything in the realm of Web programming that
reinvents the wheel, fails to use existing modules, dares to
presume that it might work, etc.  Pointers to existing code
that actually does what I'm trying to do are welcome, but
don't worry about enlightening me in the abstract.  I'm on
the case :-)

The idea is:

Given a string, do one of two things to it: 

   1. kill all SGML/HTML-style tags, by turning < and > into
      &lt; and &gt;
   2. kill all such tags except those specifically
      designated as OK.

The idea, of course, is to render arbitrary text input
tag-safe.  (I know this isn't the only security gap imaginable,
but it's what this exercise is trying to do.)

Here's the current code, and a test (where the 'this' tag
is OK and all others are not).  The question is... can we
break this code and sneak horrible/destructive things into
the input?

   module TagKiller

     # Escape every angle bracket so that no tag survives.
     def kill_tags(text)
       text.gsub('>', '&gt;').gsub('<', '&lt;')
     end

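     # Escape every tag whose name is not listed in tags.
     def kill_except(text, tags=[])
       # Group 1 is the whole tag, group 2 the tag name.  This is a
       # Regexp, not a String: current Rubies match a String
       # pattern passed to gsub literally.
       tagRE = %r{(<\s*/?\s*([^>/\s]+)[^>]*/?>)}
       text.gsub(tagRE) do |match|
         tag = $2.downcase.strip
         if tags.include? tag
           match
         else
           kill_tags(match)
         end
       end
     end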

   end


   include TagKiller

   $text = <<-EOM
   <this is a tag> content </this> here's something afterward
   <here is> a bad tag </here>
   <this good tag starts on
   one line> and ends here </this>
   <this/>
   EOM

   print kill_except($text, %w{this} )

__END__

Output:

   <this is a tag> content </this> here's something afterward
   &lt;here is&gt; a bad tag &lt;/here&gt;
   <this good tag starts on
   one line> and ends here </this>
   <this/>
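
As a starting point for the breaking, here is a self-contained sketch
of a few probes.  It assumes a current Ruby and restates the two
methods under the hypothetical name TagKillerSketch, with the pattern
written as a Regexp literal.  The probe strings are made up for
illustration and are nowhere near an exhaustive list.

```ruby
# Sketch only: TagKillerSketch is a hypothetical stand-in for TagKiller,
# restated here so the probes run on their own.
module TagKillerSketch
  # Group 1 is the whole tag, group 2 the tag name.
  TAG_RE = %r{(<\s*/?\s*([^>/\s]+)[^>]*/?>)}

  module_function

  # Escape every angle bracket in text.
  def kill_tags(text)
    text.gsub('>', '&gt;').gsub('<', '&lt;')
  end

  # Escape every tag whose name is not in tags.
  def kill_except(text, tags = [])
    text.gsub(TAG_RE) do |match|
      name = Regexp.last_match(2).downcase
      tags.include?(name) ? match : kill_tags(match)
    end
  end
end

ok = %w{this}

# A tag that is not on the list gets escaped.
puts TagKillerSketch.kill_except('<b>bold</b>', ok)
# prints: &lt;b&gt;bold&lt;/b&gt;

# Only the tag name is checked, so attributes on an OK tag pass through.
puts TagKillerSketch.kill_except('<this onclick="alert(1)">x</this>', ok)
# prints: <this onclick="alert(1)">x</this>

# A "<" with no closing ">" never matches the pattern and is left alone.
puts TagKillerSketch.kill_except('<script src=x', ok)
# prints: <script src=x
```

Whether the last two cases matter depends on what ends up rendering
the result, but they seem worth a look.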


David

-- 
David Alan Black
home: dblack / candle.superlink.net
work: blackdav / shu.edu
Web:  http://pirate.shu.edu/~blackdav