hemant wrote:
> On 7/23/07, Robert Dober <robert.dober / gmail.com> wrote:
>> On 7/23/07, geetha <sangeetha.geethu05 / gmail.com> wrote:
>> >  Hi,
>> > I am doing a string search in one HTML file using Ruby.
>> > If the search string is htmlentities means I have not match that word.
>> > How can I do it? Please, anyone, help me.....
>> >
>> > regards,
>> > S.Sangeetha.
>> >
>> We might be able to help you better if you post the data and what you
>> expect to get out from it exactly.
>>
>> Robert
>>
>> -- 
> 
> Robert:
> If search string has html entities, then do not proceed with search.
> 
> Well, it's very hard to define whether a query string has HTML entities or
> not.
No it's not...

> For example, do you consider following string has HTML entities?
> 
> b = "hello world and so what; and < and there we go >"
> 
> dunno yes and no, but if your answer is yes,
Then you'd be wrong.

irb(main):001:0> require 'rexml/text'
=> true
irb(main):002:0> re = REXML::Text::REFERENCE
=> /(?:&([\w:][\-\w\d\.:]*);|&#\d+;|&#x[0-9a-fA-F]+;)/
irb(main):003:0> "this &amp; that" =~ re
=> 5
irb(main):004:0> "hello world and so what; and < and there we go >" =~ re
=> nil

Admittedly I'm not checking against the list of defined HTML entities, 
just for anything shaped like a valid XML entity reference. But given 
that the latter is a superset of the former, and undefined entity 
references probably shouldn't occur within a valid HTML document anyway, 
it's good enough for most purposes...
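
If you want to wrap that check around the search itself, something like 
the following rough sketch might do. The pattern is copied verbatim from 
the irb output above so the snippet runs without requiring REXML; the 
method names contains_entity_reference? and search_unless_entities are 
made up for illustration, and plain String#index stands in for whatever 
search you are actually doing.

```ruby
# Hypothetical helper: true if the string contains anything shaped like an
# entity reference (&name;, &#123; or &#x1F;). The pattern is the one
# printed for REXML::Text::REFERENCE in the irb session above.
def contains_entity_reference?(query)
  reference = /(?:&([\w:][\-\w\d\.:]*);|&#\d+;|&#x[0-9a-fA-F]+;)/
  !!(query =~ reference)
end

# Hypothetical helper: skip the search when the query contains an entity
# reference, otherwise return the offset of the first match (or nil).
# String#index is an assumption standing in for the real search.
def search_unless_entities(text, query)
  return nil if contains_entity_reference?(query)
  text.index(query)
end
```

A bare "&" or ";" on its own does not trip the check; only a complete 
&...; sequence does.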

-- 
Alex