On Oct 3, 2005, at 7:01 AM, Gavin Kistner wrote:
> str = 'foo,bar ,, baz,qux,,,jorb,jing,,,,blat'
> out = []
> str.scan( /(.+?[^,],{2}*)(?:,(?!,)|$)/ ){ |a,b|
>     out << a.gsub( ',,', ',' )
> }
> p out
> #=> ["foo", "bar , baz", "qux,", "jorb", "jing,,blat"]

Whenever I find myself about to do something like the above, I say to  
myself:

"Hey, buddy, pre-allocating an array and shoving stuff onto it in a  
block is neat as an exercise of the closure, but you should be using  
something like #map."

Unfortunately, it would appear that #scan doesn't automagically map  
the returned value from each iteration to an array. Man, wouldn't  
that be nice?
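As it happens, #scan already does half of the job. Called without a block, String#scan returns an array with one entry per match (an array of the captures, when the regexp has groups), and that array can be chained straight into #map. The sketch below uses the same string and regexp as the quoted code. It assumes the block only needs the captures, not the full MatchData:

```ruby
str = 'foo,bar ,, baz,qux,,,jorb,jing,,,,blat'

# Without a block, String#scan returns one array of captures per match,
# e.g. [["foo"], ["bar ,, baz"], ...]; |(saved)| unpacks each one
out = str.scan( /(.+?[^,],{2}*)(?:,(?!,)|$)/ ).map{ |(saved)|
  saved.gsub( ',,', ',' )
}
p out
#=> ["foo", "bar , baz", "qux,", "jorb", "jing,,blat"]
```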

Following is my hackish attempt at a String#scan_and_map method  
that does the above.

A few questions for the gurus:
a) Is there a better way to deal with bol (beginning-of-line) anchors  
when using StringScanner? (Boy, it'd be nice if there were a  
Regexp#uses_bol_at_start_of_match? method.)
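Some background on why the bol check is needed at all, as a hedged sketch. String#scan searches the whole string, so ^ only matches at genuine line starts. StringScanner, by default, treats the scan pointer as the start of the string, so ^ matches wherever the pointer happens to sit. Newer Ruby releases added a fixed_anchor: option to StringScanner.new that keeps ^ tied to real line starts; the snippet assumes a Ruby recent enough to accept it. The sample text and positions are made up for illustration.

```ruby
require 'strscan'

text = "alpha beta\ngamma delta"

# String#scan searches the whole string: ^ only matches at real line starts
starts = text.scan( /^\w+/ )
p starts         #=> ["alpha", "gamma"]

# Default StringScanner: the scan pointer counts as the start of the
# string, so ^ "matches" in the middle of the first line
loose = StringScanner.new( text )
loose.pos = 6    # pointer sits on "beta"
loose_word = loose.scan( /^\w+/ )
p loose_word     #=> "beta"

# fixed_anchor: true (newer Rubies only) ties ^ to real line starts
strict = StringScanner.new( text, fixed_anchor: true )
strict.pos = 6
strict_mid = strict.scan( /^\w+/ )
p strict_mid     #=> nil
strict.pos = 11  # just past the newline, on "gamma"
strict_line = strict.scan( /^\w+/ )
p strict_line    #=> "gamma"
```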

b) Is there a clean way to tell the 'arity' of a regexp (how many  
captures it has, at most)? (Boy, it'd be nice if there were a  
Regexp#arity method.)
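One possible workaround for the arity question, sketched as an assumption rather than a built-in: OR the regexp with an empty pattern via Regexp.union, so the combined regexp always matches the empty string, then count the captures of that match. Regexp.union wraps each part in a non-capturing group, so the original groups are unchanged. MatchData#captures has one slot per group in the pattern (nil for groups that did not take part), so its size is the group count. capture_count is a hypothetical helper name.

```ruby
# Hypothetical helper: number of capture groups in +regexp+.
# The // branch guarantees a match against "" even when +regexp+
# itself cannot match it; captures then has one slot per group.
def capture_count( regexp )
  Regexp.union( regexp, // ).match( '' ).captures.size
end

p capture_count( /(.+?[^,],{2}*)(?:,(?!,)|$)/ )   #=> 1
p capture_count( /(\d+)-(\d+)-(\d+)/ )            #=> 3
p capture_count( /no groups here/ )               #=> 0
p capture_count( /\(escaped\)/ )                  #=> 0
```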

c) Without knowing the arity, is there a clean/fast way to gather all  
the 1..n submatches held in StringScanner? (Boy, it'd be nice if  
StringScanner gave you access to all the captures as a single array,  
and if it set the $1..$9 vars.)
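Later Ruby releases did add most of this wish to StringScanner. #captures returns the groups of the most recent match as an array, and #size returns the number of groups plus one for the whole match. The short sketch below assumes a Ruby whose strscan provides both methods; the date string is just sample input.

```ruby
require 'strscan'

ss = StringScanner.new( "2005-10-03 07:01" )
ss.scan( /(\d+)-(\d+)-(\d+)/ )

# Groups 1..n of the last match, without knowing n in advance
p ss.captures    #=> ["2005", "10", "03"]

# Number of groups, plus one for the whole match
p ss.size        #=> 4

# The older ss[i] style works too, once the count is known
groups = (1...ss.size).map{ |i| ss[i] }
p groups         #=> ["2005", "10", "03"]
```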

require 'strscan'
class String
   def scan_and_map( regexp )
     # A naive check for beginning of line
     use_bol = regexp.inspect =~ /\/(?:\((?:\?:)?)*\^/

     # A naive check for sub-expression groups
     # Will be fooled by an unescaped ( inside [], or by a non-capturing
     # (?:...) group, for example
     use_groups = regexp.inspect =~ /(\^|[^\\])\\{2}*\(/

     results = []
     ss = StringScanner.new( self )
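     until ss.eos?
       unless ss.match?( regexp )
         # Jump to the *start* of the next match: scan_until would consume
         # it, silently dropping a result. Stop when no match remains.
         break unless ss.skip_until( regexp )
         ss.pos -= ss.matched_size
       end
       if use_bol && !ss.bol?
         ss.pos += 1
       else
         result = ss.scan( regexp )
         # The match found above can fail once re-anchored here
         # (e.g. with a lookbehind); just move on
         next unless result
         zero_width = result.empty?
         if use_groups
           result = (1..9).map{ |i| ss[i] }
         end
         # Step past zero-length matches so the loop always advances
         ss.getch if zero_width
         results << yield( result )
       end
     end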
     results
   end
end


str = 'foo,bar ,, baz,qux,,,jorb,jing,,,,blat'
p str.scan_and_map( /(.+?[^,],{2}*)(?:,(?!,)|$)/ ){ |saved,others|
   saved.gsub( ',,', ',' )
}
#=> ["foo", "bar , baz", "qux,", "jorb", "jing,,blat"]
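A different route around questions a) through c), offered as a sketch rather than a definitive answer: let String#scan do the matching (it already handles ^ correctly), wrap it in an Enumerator with enum_for, and read Regexp.last_match inside #map. Each iteration then sees a full MatchData, so there is no need to know how many groups the regexp has. The name map_matches is hypothetical, picked so it does not clash with scan_and_map.

```ruby
class String
  # Hypothetical alternative to scan_and_map: yields the MatchData for
  # each match and collects the block's return values into an array.
  # String#scan sets $~ before each iteration, and Regexp.last_match
  # reads it back inside the #map block.
  def map_matches( regexp )
    enum_for( :scan, regexp ).map{ yield Regexp.last_match }
  end
end

str = 'foo,bar ,, baz,qux,,,jorb,jing,,,,blat'
out = str.map_matches( /(.+?[^,],{2}*)(?:,(?!,)|$)/ ){ |md|
  md[1].gsub( ',,', ',' )
}
p out
#=> ["foo", "bar , baz", "qux,", "jorb", "jing,,blat"]
```

Because the block receives the MatchData itself, md.pre_match, md.begin(0) and friends are available as well.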