On 21.06.2007 16:12, Rob Biedenharn wrote:
> On Jun 21, 2007, at 9:47 AM, Stephen Ball wrote:
> 
>> On 6/20/07, Daniel DeLorme <dan-ml / dan42.com> wrote:
>>> That doesn't really explain why the regexp finds an extra empty string.
>>> I know that zero occurrences is one match but after a greedy match that
>>> matches everything, there should be (logically?) no other match. I am no
>>> stranger to regexps and the result is counter-intuitive to me; I would
>>> consider it a bug. Or at least a very very peculiar behavior.
>>>
>>> Daniel
>>
>> It's because the pattern /.*/ matches everything, including the
>> absence of everything. Yes, with the proper regexes you can indeed have
>> tea and no tea at the same time. Certainly peculiar, but occasionally
>> useful.
>> ...
>> -- Stephen
> 
> That still doesn't really explain why "hello".scan(/.*/) => ["hello", ""]
> 
> Why wouldn't it be ["hello", "", "", "", "", "", "", "", "", "", "", "", 
> ... ] since I (or rather the OP) could continue to match zero characters 
> (bytes) at the end of the string forever?  It does seem that it might be 
> that a termination condition is checked a bit later than it should be in 
> this case.

As far as I remember it works like this: first, .* matches the whole
string.  Then the "cursor" is placed behind the match, i.e. after the
last char of the match, and matching starts again.  At this position the
empty sequence matches, because .* also accepts zero characters and
we're at the end of the string.  After that empty match the cursor is
advanced one step (to avoid endless repetition) and - alas! - we're
past the end of the string and matching stops.
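
To make the cursor description concrete, here is a rough sketch of such a
loop.  scan_sketch is a made-up helper for illustration only, not how
String#scan is actually implemented; it just follows the steps described
above (match, move the cursor behind the match, step one further after
an empty match).

```ruby
# Hypothetical helper, illustration only: mimics the cursor walk
# described above.  It is NOT Ruby's real String#scan implementation.
def scan_sketch(str, re)
  results = []
  pos = 0
  # The cursor may sit at str.length (just behind the last char);
  # an empty match is still possible there.
  while pos <= str.length
    m = re.match(str, pos)   # try to match starting at the cursor
    break unless m
    results << m[0]
    if m.end(0) == m.begin(0)
      # Empty match: step one char forward to avoid endless repetition.
      pos = m.end(0) + 1
    else
      # Non-empty match: place the cursor directly behind the match.
      pos = m.end(0)
    end
  end
  results
end

p scan_sketch("hello", /.*/)   # => ["hello", ""]
```

The second element is the empty match found with the cursor at offset 5;
after it the cursor moves to 6, which is past the end, so the loop ends.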

For learning regular expressions, the Regex Coach is a great program;
it lets you step through the matching process graphically:
http://weitz.de/regex-coach/

See also this thread: 
http://groups.google.de/group/comp.lang.ruby/browse_frm/thread/9bf7989dd42374f7/f759612390ff905f?lnk=st&q=&rnum=10#f759612390ff905f

Btw, for replacing the whole string, String#replace is much better; it
changes the contents in place, as the unchanged object_id shows:

irb(main):001:0> s = "foo"
=> "foo"
irb(main):002:0> s.object_id
=> 1073540760
irb(main):003:0> s.replace "bar"
=> "bar"
irb(main):004:0> s.object_id
=> 1073540760
irb(main):005:0> s
=> "bar"
irb(main):006:0>
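
A small sketch of why the in-place behaviour matters.  The variable
names are made up, and the gsub line assumes the alternative under
discussion is a substitution with /.*/; the extra empty match at the end
of the string then gets replaced too.

```ruby
# Illustration only; variable names are hypothetical.
s = "foo".dup        # dup so the string is mutable even with frozen literals
other = s            # second reference to the same String object

s.replace("bar")     # changes the contents in place
p other              # => "bar"   (other sees the change)
p other.equal?(s)    # => true    (still the very same object)

# Substituting with /.*/ also hits the empty match at the end:
p "foo".gsub(/.*/, "bar")   # => "barbar"

s = "baz"            # plain assignment rebinds s to a new object
p other              # => "bar"   (other is unaffected)
p other.equal?(s)    # => false
```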

Kind regards

	robert