>>>>> "M" == Marko Schulz <in6x059 / public.uni-hamburg.de> writes:

M> - or look at Tom Christiansen's striphtml:
M>        http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/striphtml.gz
M>          Yes, it is Perl, but it should be quite easy to translate it
M>          into Ruby. It consists mainly of just 3 regexps and one
M>          hash.
         
 Yes, it's in Perl, and this is why you won't use it :-)

 For example, these *complex* regexps can't parse this valid document:

pigeon% cat aa.html
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<HTML>
<HEAD>
 <TITLE>aa</TITLE>
</HEAD>
<BODY>

Numbers 2<1

</BODY>
</HTML>
pigeon% 
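 To make the failure concrete, here is a small Ruby sketch. The pattern in
 `strip_tags_naive` is an assumed "from < to the next >" rule, not the exact
 regexp from striphtml, and both function names are hypothetical. Roughly
 speaking, in HTML a "<" only starts markup when it is followed by a letter,
 "/", "!" or "?", so "2<1" is plain text; the naive rule eats "<1" together
 with "</BODY>".

```ruby
# The BODY of the example document above.
BODY = "<BODY>\n\nNumbers 2<1\n\n</BODY>\n"

# Naive tag stripping (assumed rule, not striphtml's exact regexp):
# anything from "<" to the next ">" is treated as a tag.
def strip_tags_naive(html)
  html.gsub(/<[^>]*>/, '')
end

# Narrower heuristic: only treat "<" as markup when it is followed by
# a letter, "/", "!" or "?", so "2<1" is left alone.
def strip_tags_narrower(html)
  html.gsub(%r{<[A-Za-z/!?][^>]*>}, '')
end

p strip_tags_naive(BODY)    # prints "\n\nNumbers 2\n" ("<1" is lost)
p strip_tags_narrower(BODY) # prints "\n\nNumbers 2<1\n\n\n"
```

 The narrower pattern is still a regexp, not a parser: a ">" inside a quoted
 attribute value, as in <IMG ALT="a>b">, still cuts the tag short.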

 There are 2 errors in this script:
  * he wants to validate an HTML document, and a regexp is not the right
    tool for this
  * it tries to separate comments from the rest of the document; the
    biggest error is here:

#########################################################
# first we'll shoot all the <!-- comments -->
#########################################################

 Never, never use it :-)))
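 Here is a Ruby sketch of one way the comment-first pass goes wrong. The
 sample document is made up, and `<!--.*?-->` is an assumed stand-in for a
 comment-first regexp, not striphtml's exact code. An attribute value may
 legitimately contain "<!--", and a pass that ignores context then deletes
 real text up to the next "-->". `strip_markup` is a hypothetical single-pass
 alternative built on Ruby's standard `strscan` library. It only knows about
 comments, tags and quoted attribute values, so it is a sketch, not a real
 HTML parser.

```ruby
require 'strscan'

# Made-up sample: the ALT value legitimately contains "<!--".
DOC = <<~HTML
  <P><IMG SRC="a.gif" ALT="<!--"> keep this text
  <!-- a real comment --> and this too</P>
HTML

# Comment-first pass (assumed pattern, not striphtml's exact code): it
# removes anything that looks like a comment, wherever it occurs.
def strip_comments_first(html)
  html.gsub(/<!--.*?-->/m, '')
end

# Hypothetical single-pass scanner: comments and tags are recognized only
# where markup can start, and quoted attribute values are skipped whole.
def strip_markup(html)
  s = StringScanner.new(html)
  out = +''
  until s.eos?
    if s.scan(/<!--.*?-->/m)
      # a comment in text context: drop it
    elsif s.check(%r{<[A-Za-z/!?]})
      s.getch # consume "<"
      loop do
        break if s.eos?
        next if s.scan(/"[^"]*"|'[^']*'/) # quoted value may hold < or >
        break if s.scan(/>/)              # end of the tag
        s.getch                           # any other character in the tag
      end
    else
      out << s.getch # plain text, including a lone "<" as in "2<1"
    end
  end
  out
end

p strip_comments_first(DOC) # "keep this text" is gone, the IMG tag is broken
p strip_markup(DOC)         # prints " keep this text\n and this too\n"
```

 Because the scanner decides what a "<!--" means from where it appears, the
 comment inside ALT is never mistaken for a real one.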


Guy Decoux