>>>>> "M" == Marko Schulz <in6x059 / public.uni-hamburg.de> writes:

M> - or look at Tom Christiansens striphtml:
M>   http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/striphtml.gz

M> Yes, it is perl, but it should be quite easy to translate it
M> into ruby. Mainly it consists only of 3 regexps and one
M> hash.

 Yes, it's in perl, and this is why you won't use it :-)

 For example, these *complex* regexps can't parse this valid document:

pigeon% cat aa.html
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<HTML>
<HEAD>
<TITLE>aa</TITLE>
</HEAD>
<BODY>
Numbers 2<1
</BODY>
</HTML>
pigeon%

 There are 2 errors in this script:

 * it wants to validate an HTML document, and a regexp is not the right
   tool for this

 * it tries to separate comments from the rest of the document, i.e. the
   biggest error is here

#########################################################
# first we'll shoot all the <!-- comments -->
#########################################################

 Never, never use it :-)))

Guy Decoux
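
 To illustrate the `2<1` problem in ruby, here is a minimal sketch, assuming a naive tag-stripping regexp. The pattern NAIVE_TAG and the helper naive_strip are hypothetical and are not the regexps from striphtml; they only show how a regexp can mistake the `<` in `2<1` for the start of a tag.

```ruby
# Hypothetical stand-in for a regexp-based tag stripper.
# This is NOT one of the striphtml patterns, only an illustration.
NAIVE_TAG = /<[^>]*>/

# Remove everything that looks like <...> from the given string.
def naive_strip(html)
  html.gsub(NAIVE_TAG, "")
end

# The body of aa.html from the example above.
body = "<BODY>\nNumbers 2<1\n</BODY>"

stripped = naive_strip(body)

# The "<" in "2<1" is taken as a tag opener, so "<1\n</BODY>" is
# swallowed as one bogus tag and the "1" disappears from the text.
puts stripped.inspect
```

 A parser that knows the HTML grammar would keep `2<1` as character data, because `<` followed by a digit does not open a tag.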