Hello --

On Mon, 26 Nov 2001, HarryO wrote:

> I want to do something similar to what I've seen on slashdot, where the
> user is allowed to enter HTML for responses to articles, but the set of
> tags that are acceptable is only a subset of full  HTML.
>
> Now, rather than trying to do something like edit the string to remove
> any matches of UNacceptable tags, I'd prefer to be able to define which
> ones ARE acceptable and then edit the string to remove anything else
> that looks like a tag but doesn't match the ones I've decided are OK.
>
> My logic here is that I can easily define the list of those I'm happy to
> have, but that would be a relatively small subset of the entire range of
> tags.
>
> So, can anyone suggest the right way to approach this?

I've unearthed something I wrote a while ago for exactly this purpose.
I haven't reexamined it to see whether I'd do it the same way again,
but here it is in case it's of use.

See the end of the file for a usage example.


------  tagkiller.rb ----
class String

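  # Return every index at which e (a String or Regexp) matches,
  # yielding each index to the block if one is given.
  def indices(e)
    ixs = []
    tmp = dup   # scan a copy, so the block may safely modify the receiver
    i = -1
    while i = tmp.index(e,i+1)
      ixs.push i
      yield i if block_given?
    end
    return ixs
  end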

end

module TagKiller

  # Escape every angle bracket so that no markup survives.
  def kill_tags(text)
    text.gsub('>', '&gt;').gsub('<', '&lt;')
  end

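  # Escape every '<' and '>' in str (in place) except the brackets that
  # delimit a tag whose element name is listed in ok_tags.
  # Relies on match_angle and element_at from TagFinder.
  def kill_tags_except(str, ok_tags=[])
    ok = {}
    str.indices('<') do |i|
      m = match_angle(str,i)
      if m and ok_tags.include? element_at(str,i)
        ok[m] = ok[i] = true
      end
    end
    # Work from the end so earlier indices stay valid as the string grows.
    str.indices(/[<>]/).reject {|i| ok[i]} .reverse .each do |i|
      str[i] = kill_tags(str[i].chr)
    end
    str
  end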

end

module TagFinder

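  # Index of the '>' that balances the '<' at index i, or nil if the
  # brackets never balance. Nested '<' ... '>' pairs are counted.
  def match_angle(str,i=0)
    c = 0
    while i < str.size
      case str[i]
      when ?< then c += 1
      when ?> then c -= 1
      end
      # Test before advancing, so a closing '>' that is the very last
      # character of the string still counts as a match.
      return i if c == 0
      i += 1
    end
    return nil
  end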

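  # Name of the element whose tag opens at index i, or nil.
  # Anchored with \A so that a later tag is never picked up by mistake.
  def element_at(str,i=0)
    /\A<\s*\/?([^\s>\/]+)/.match(str[i..-1])
    $1
  end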

end

if __FILE__ == $0
  def ttest
    include TagKiller
    include TagFinder
    text = <<EOM
    <this good tag <em ok>starts/> on <this>one line
    and ends<bad/> here> <ba></ba>
EOM

    ok_tags =  %w(this em)
    print kill_tags_except(text, ok_tags)

  end

  ttest
end

__END__
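
For comparison, here is a minimal single-pass sketch of the same
whitelist idea. It is not part of tagkiller.rb: the method name
escape_tags_except is made up for illustration, and it swaps the
bracket-counting approach for a single regex, so it only recognises
simple tags with no angle brackets nested inside them.

```ruby
# Hypothetical helper, not part of tagkiller.rb: escape every angle
# bracket except those that delimit a whitelisted tag.
def escape_tags_except(str, ok_tags = [])
  # First alternative: a whole simple tag, capturing its element name.
  # Second alternative: any stray '<' or '>' that is not part of one.
  str.gsub(/<\s*\/?([^\s<>\/]+)[^<>]*>|[<>]/) do |m|
    name = $1   # nil when only a stray bracket matched
    if name && ok_tags.include?(name)
      m                                       # allowed tag: keep as is
    else
      m.gsub('<', '&lt;').gsub('>', '&gt;')   # anything else: escape
    end
  end
end

puts escape_tags_except("<em>hi</em> <script>x</script> 1 < 2", %w(em))
```

Unlike kill_tags_except, this returns a new string and leaves its
argument alone. Like the code above, it keeps whatever attributes
appear inside an allowed tag, so on its own it does not guard against
script injected through attributes.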



David

-- 
David Alan Black
home: dblack / candle.superlink.net
work: blackdav / shu.edu
Web:  http://pirate.shu.edu/~blackdav