On Wed, 18 Aug 2004 19:31:01 +0900, Robert Klemme <bob.news / gmx.net> wrote:
> "Austin Ziegler" <halostatue / gmail.com> wrote in message
> news:9e7db91104081713254f2eb39e / mail.gmail.com...
>> str = '<span id="1"> <span> ...</span> </span> '
>> re = /(<(\/?)span> )/i
>> 
>> str.scan(re)
>> # => [["<span> ", ""], ["</span> ", "/"], ["</span> ", "/"]]
>> 
>> matches = []
>> str.scan(re) do
>>   matches << Regexp.last_match
>> end
>> 
>> matches.each do |match|
>>   match.captures.each_with_index do |capture, ii|
>>     soff, eoff = match.offset(ii + 1)
>>     puts %Q("#{capture}" #{soff} .. #{eoff})
>>   end
>> end
> While that works, isn't it ridiculous that one has to resort to a
> class method ("Regexp.last_match")? I mean, there should rather be
> something like
> 
> /o/.each( "foo" ) do |md|
>   # md is MatchData
> end

There's a simple solution, and I'll probably open an RCR for it if
others agree. When given a block, String#scan, #sub, and #gsub should
yield MatchData objects, not Strings. There are probably other
methods too, but those are the ones that come to mind. This *will*
break some code, unfortunately, but that can be mitigated by adding
MatchData#to_str. IMO, this will make #gsub much easier to deal with,
as you won't have to resort to either Regexp.last_match or the $1..$9
variables to work with captures. My Regexp.last_match call only
presumes that Regexp.last_match is actually threadsafe, whereas we
know that the ugly Perlish $ variables are threadsafe. I think this
is an acceptable level of incompatibility, given the #to_str
mitigation and the amount of flexibility that would be gained. As far
as I know, it wouldn't require *that* big a change, because for
Regexp.last_match to work, there must already be a MatchData object
*somewhere*.
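
To make the idea concrete, here is a rough sketch that emulates the
proposed behaviour with current Ruby. Note that gsub_with_match is a
hypothetical helper invented for illustration, not an existing method,
and the MatchData#to_str definition is only an assumption about what a
compatibility shim might look like; Ruby itself does not define it.

```ruby
# Hypothetical helper, not part of Ruby: behaves like String#gsub with a
# block, except that the block receives the MatchData rather than the
# matched String.
def gsub_with_match(str, pattern)
  str.gsub(pattern) { yield Regexp.last_match }
end

# Assumption for illustration only: give MatchData an implicit String
# conversion, so that code which concatenates the block argument with a
# String keeps working. Ruby does not define MatchData#to_str itself.
class MatchData
  def to_str
    to_s
  end
end

str = '<span id="1"> <span> ...</span> </span> '
re  = /(<(\/?)span> )/i

# Captures and offsets are available directly on the block argument.
seen = []
result = gsub_with_match(str, re) do |md|
  seen << [md[1], md.begin(1), md.end(1)]
  md[2].empty? ? "<div> " : "</div> "
end

# Implicit conversion: String#+ accepts the MatchData because of to_str.
md = re.match(str)
label = "first match: " + md
```

With this, the block never touches Regexp.last_match or the $
variables, and concatenation still works through #to_str. Code that
calls String-only methods on the block argument (for example #upcase)
would still need an explicit #to_s.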

What do you think?

-austin
-- 
Austin Ziegler * halostatue / gmail.com
               * Alternate: austin / halostatue.ca