Daniel Berger wrote:
> Robert Klemme wrote:
>> William James <w_a_x_man / yahoo.com> wrote:
>>
>>> =begin
>>>
>>> Unlike Gawk and Mawk, Ruby won't accept a regular expression as a
>>> record-separator. Let's fix that. The substring matched by the
>>> record-separator is automatically removed from the record, but it
>>> can be obtained by RecSep#terminator.
>>>
>>> Typical usage:
>>>
>>> File.open("stuff.txt"){|handle|
>>>  reader = RecSep.new( handle, /^\d+\.\n/ )
>>>  reader.each {|x| p x }
>>> }
>>
>>
>> I'd prefer something integrated with IO, e.g.
>>
>> File.open("foo") {|io| io.each_chunk(/:/) {|ch| p ch}}
>>
>> module RegularIOChunks
>>  def each_chunk(rx, read_buffer = 1024)
>>    buff = ""
>>    loop do
>>      until ( match = ( rx.match( buff ) ) )
>>        part = read(read_buffer)
>>
>>        if part.nil?
>>          yield buff
>>          return self
>>        end
>>
>>        buff << part
>>      end
>>
>>      yield match.pre_match
>>      buff = match.post_match
>>    end
>>  end
>> end
>>
>> class IO
>>  include RegularIOChunks
>> end
>>
>> Kind regards
>>
>>    robert
>
> This would *not* be easy to implement.  Consider backtracking (do we
> put it back in the stream?) and greediness (how much do we read?).
> Unless you want to forbid greedy regular expressions and ignore
> backtracking (not to mention certain switches), this gets real ugly,
> real quick.

Right!  My main point was that I'd prefer a solution that is integrated
with IO, i.e. no extra instance needs to be created (at least not
explicitly).  Just a question of usability.

One implementation option would be to keep reading not just until the
first match but until the match no longer changes.  That would at least
deal with cases like /a{3,10}/ where a read boundary falls in the middle
of a run of ten "a"s.  Without it you would get a match for the first
half of the run when you wanted to match the whole run.
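
A minimal sketch of that idea, under some assumptions: the names
StableRegexChunks and each_stable_chunk are made up for illustration, the
regexp never matches the empty string, and "the match no longer changes"
is narrowed to "the match ends before the end of the buffer".  That
narrower test covers the /a{3,10}/ case but not patterns whose match can
still change later (lookahead, optional tails).

```ruby
require "stringio"

# Hypothetical helper, not part of Ruby's IO.
# Assumes rx never matches the empty string.
module StableRegexChunks
  def each_stable_chunk(rx, read_buffer = 1024)
    buff = +""
    eof = false
    loop do
      match = rx.match(buff)
      # A missing match, or one that reaches the end of the buffer, is
      # provisional: more input could create or lengthen it.
      while !eof && (match.nil? || match.end(0) == buff.size)
        part = read(read_buffer)
        if part.nil?
          eof = true
        else
          buff << part
          match = rx.match(buff)
        end
      end

      if match.nil?
        # No further separator: hand out the remainder, if any.
        yield buff unless buff.empty?
        return self
      end

      yield match.pre_match
      buff = match.post_match
    end
  end
end

# StringIO is used only to keep the demo self-contained.
class StringIO
  include StableRegexChunks
end

chunks = []
# A read size of 4 cuts the run of five "a"s after the third one.
StringIO.new("xaaaaay").each_stable_chunk(/a{3,10}/, 4) { |c| chunks << c }
p chunks  # => ["x", "y"]
```

With the each_chunk I posted earlier, the same input and read size give
"x" and then "aay", because the first three "a"s are taken as the
separator before the rest of the run has been read.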

> This has come up wrt Perl as well on p5p.  Take a look here for one
> thread in midstream:
>
> http://www.nntp.perl.org/group/perl.perl5.porters/64830
>
> Rumor has it that setting $/ to a regex will be legal in Perl 6, but
> I think there will be several restrictions.

As you mention, the general problem with applying regexps is a conceptual
one: because of greedy quantifiers, in the worst case the whole file is
read into memory (just consider using /.+/ as delimiter), which doesn't
fit well with the streaming approach. :-)
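
To make the worst case concrete, here is a small sketch that counts how
much input gets buffered before the first separator match stops touching
the end of the buffer.  The method name buffered_before_first_match and
the tiny read size are assumptions for the demo, and I use /.+/m so that
"." also matches newlines, which it does not do in Ruby without /m.

```ruby
require "stringio"

# Hypothetical helper: reads until the first match of rx ends before the
# end of the buffer (or until EOF) and returns the number of characters
# that had to be buffered to get there.
def buffered_before_first_match(io, rx, read_buffer = 4)
  buff = +""
  while (part = io.read(read_buffer))
    buff << part
    match = rx.match(buff)
    break if match && match.end(0) < buff.size
  end
  buff.size
end

data = "ab:" * 10  # 30 characters

short  = buffered_before_first_match(StringIO.new(data), /:/)
greedy = buffered_before_first_match(StringIO.new(data), /.+/m)

p short   # => 4
p greedy  # => 30
```

A fixed separator like /:/ is settled after the first read, while the
greedy one never is, so everything up to EOF ends up in the buffer.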

Kind regards

    robert