"Jim Freeze" <jim / freeze.org> wrote in message news:20030606162954.A29519 / freeze.org...
> Hi:
>
> I am trying to strip <script>blah</script> statements,
> that occur across multiple lines, from an html document.
>
> For a <script> and </script> on a single line I find that this works:
>
>   html.gsub(/<script[^>]*>.*(<\/script>)/i, "")
>
> I don't see why this does not work across multiple lines:
>
>   html.gsub(/<script[^>]*>.*(<\/script>){1,1}/mi, "")
>
> I thought the {1,1} would tell it to match only one </script>.
>
> Regexp experts, I welcome your input.
>
> --
> Jim Freeze
> ----------
> Bumper sticker:
>
> "All the parts falling off this car are of the very finest British
> manufacture"
>

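Put a ? after .* to make it non-greedy: .*? stops at the first
</script> instead of running on to the last one. The {1,1} only says
the group must occur exactly once; it does not limit how much .* eats.
Here is the same idea with SPAN tags: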


html = <<EHTML
<P>
<IMG SRC="/images/logos/logo.gif" WIDTH="96" HEIGHT="51"
ALT="Blah" BORDER="0" ALIGN="left">&nbsp; &nbsp;
<P>
<P>&nbsp;<BR>

<SPAN CLASS="title3">Notes blah
</SPAN>
<pre> stuff
</pre>
<span CLASS="contentSectionHeading"> - lose this
</span><BR>

<pre> more stuff
</pre>
EHTML

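# .*? is lazy, so each match ends at the first </SPAN>;
# /m lets . match newlines, /i ignores case.
html.gsub!(/<SPAN[^>]*>.*?(<\/SPAN>)/mi, "")
puts html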


#------------------------------------
<P>
<IMG SRC="/images/logos/logo.gif" WIDTH="96" HEIGHT="51"
ALT="Blah" BORDER="0" ALIGN="left">&nbsp; &nbsp;
<P>
<P>&nbsp;<BR>


<pre> stuff
</pre>
<BR>

<pre> more stuff
</pre>
#------------------------------------
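
Applied to the original <script> question, the same change might look like the sketch below. The sample document is made up for illustration, and the names greedy and lazy are only there to compare the two patterns side by side.

```ruby
# Hypothetical sample document with two multi-line <script> blocks.
html = <<EHTML
<p>one</p>
<script type="text/javascript">
  var a = 1;
</script>
<p>two</p>
<SCRIPT>
  var b = 2;
</SCRIPT>
<p>three</p>
EHTML

# Greedy: .* runs on to the LAST </script>, so "<p>two</p>" is lost too.
greedy = html.gsub(/<script[^>]*>.*<\/script>/mi, "")

# Lazy: .*? stops at the FIRST </script> after each opening tag.
lazy = html.gsub(/<script[^>]*>.*?<\/script>/mi, "")

puts lazy
```

Note that gsub returns a new string and leaves html alone; use gsub! if you want to change it in place. A regexp like this is fine for simple pages, but it can still be fooled, for example by a literal "</script>" inside a JavaScript string.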


daz