Hi --

On Thu, 21 Jun 2007, Stephen Ball wrote:

> On 6/20/07, Daniel DeLorme <dan-ml / dan42.com> wrote:
>> That doesn't really explain why the regexp finds an extra empty string.
>> I know that zero occurrences is one match but after a greedy match that
>> matches everything, there should be (logically?) no other match. I am no
>> stranger to regexps and the result is counter-intuitive to me; I would
>> consider it a bug. Or at least a very very peculiar behavior.
>> 
>> Daniel
>> 
>
> It's because the pattern /.*/ matches everything, including the
> absence of everything. Yes, with the proper regexes you can indeed have
> tea and no tea at the same time. Certainly peculiar, but occasionally
> useful.
>
> So: since * matches "zero or more" characters when it starts the
> search for .* it matches the absence (the 'zero') and then matches the
> string (the 'or more').

It's the other way around, though: it matches "hello" *first*, and
then "".  So the zero-width match (which I admit I'm among those who
find unexpected) happens at the end of the string.
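
Here's a small sketch of that ordering.  It assumes the matches are
being collected with String#scan (the original call isn't quoted in
this message), and the variable names are just for illustration:

```ruby
str = "hello"

# scan collects every successive match of the pattern.
matches = str.scan(/.*/)
p matches   # => ["hello", ""]

# Record where each match starts, to see the order they arrive in.
offsets = []
str.scan(/.*/) { offsets << Regexp.last_match.begin(0) }
p offsets   # => [0, 5]
```

The greedy match starts at offset 0 and consumes all five characters;
the empty match then turns up at offset 5, the end of the string.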

> To prevent this you need to indicate to your regular expression that
> you only want the subset of 'everything' that is actually something.
> Here are a couple ways to do this:
>
> /.+/ will match 1 or more of something, so doesn't return the absence
>
> /^.*/ will anchor the match at the start of the line, in a way
> bypassing the match of zero (the pattern /^.*$/ makes this more
> clear).

Here, again, "hello" is matched first, so /^.*/ matches it but doesn't
match a second time (""), because the position after "hello" isn't at
the start of a line and ^ can't match there.
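
For comparison, a quick sketch of the two suggested patterns, again
assuming String#scan on the single-line string "hello" (variable
names are illustrative only):

```ruby
str = "hello"

# .+ needs at least one character, so nothing is left to match at the end.
plus = str.scan(/.+/)
p plus       # => ["hello"]

# ^ only matches at the start of a line; offset 5 follows "o", so the
# empty match at the end of the string is ruled out.
anchored = str.scan(/^.*/)
p anchored   # => ["hello"]
```

Both give a single match here, but for different reasons: /.+/ can't
match the empty string at all, while /^.*/ could, just not at that
position.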


David

-- 
* Books:
   RAILS ROUTING (new! http://www.awprofessional.com/title/0321509242)
   RUBY FOR RAILS (http://www.manning.com/black)
* Ruby/Rails training
     & consulting:  Ruby Power and Light, LLC (http://www.rubypal.com)