This version reads farther ahead, so that a greedy regular expression
is not matched against a separator cut short at the end of a chunk.
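
A small sketch of why the read-ahead matters. It uses the separator
/\n\s*\n/ and plain strings only; the variable names and the sample
input are made up for illustration, and the point where the input is
split stands in for a chunk boundary.

```ruby
rec_sep = /\n\s*\n/

# Pretend the input is "one\n\n\n\ntwo", but the first chunk read
# stops in the middle of the run of newlines.
buffer = "one\n\n".dup

early = rec_sep.match(buffer)
# The separator matches, but nothing follows it, so we cannot tell
# whether more of the separator is still waiting in the file.
p early[0]           # "\n\n"
p early.post_match   # ""

# After the next chunk arrives, the same regexp matches a longer run.
buffer << "\n\ntwo"
late = rec_sep.match(buffer)
p late[0]            # "\n\n\n\n"
p late.post_match    # "two"
```

Accepting the early match would leave "\n\n" at the front of the next
record; waiting until post_match is non-empty avoids that.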

=begin

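Unlike Gawk and Mawk, Ruby won't accept a regular expression as a
record separator (IO#gets and IO#each_line take only a string).
Let's fix that. The substring matched by the record separator is
removed from the record automatically, but it can be retrieved with
RecSep#terminator. The terminator is nil when the last record ends
at end of file without a separator.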

Typical usage:

File.open("stuff.txt"){|handle|
  reader = RecSep.new( handle, /^\d+\.\n/ )
  reader.each {|x| p x }
}

=end


class RecSep

  def initialize( file_handle, record_separator, chunk_size=10_000 )
    @handle = file_handle
    @rec_sep = record_separator
    @chunk_size = chunk_size
    @buffer = ""
    @terminator = nil
  end

  attr_reader :terminator, :buffer

  def get_rec
    ## The record-separator may be something like /\n\s*\n/, which
    ## could match more text once further input arrives, so we read
    ## until there's something left over in the buffer after the
    ## match (or until end of file).
    match = nil
    loop do
      match = @rec_sep.match( @buffer )
      break  if match && match.post_match.size > 0
      s = @handle.read( @chunk_size )
      break  if not s    ## end of file
      @buffer << s
    end

    if match
      @buffer = match.post_match
      @terminator = match.to_s
      match.pre_match
    else
      ## No separator left: whatever remains is the last record.
      @terminator = nil
      return nil  if "" == @buffer
      s, @buffer = @buffer, ""
      s
    end
  end

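  def each
    ## Without a block, return an Enumerator, as Ruby's own
    ## #each methods do.
    return enum_for( :each )  unless block_given?
    while s = get_rec
      yield s
    end
  end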

end
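
For comparison, when the whole input already fits in memory, the same
record/terminator pairing can be sketched with String#partition. This
is not part of RecSep: split_records is a hypothetical helper written
only for this illustration, and like RecSep it assumes the separator
never matches the empty string.

```ruby
# Hypothetical helper, not part of RecSep: split an in-memory string
# into [record, terminator] pairs, using a regexp as the separator.
# Assumes rec_sep never matches the empty string.
def split_records(text, rec_sep)
  pairs = []
  rest = text
  until rest.empty?
    # partition returns [before, match, after]; when nothing matches
    # it returns [rest, "", ""].
    record, terminator, rest = rest.partition(rec_sep)
    # Mirror RecSep#terminator: nil when no separator ended the record.
    pairs << [record, terminator.empty? ? nil : terminator]
  end
  pairs
end

pairs = split_records("one\n\ntwo\n \nthree", /\n\s*\n/)
pairs.each { |record, terminator| p [record, terminator] }
# ["one", "\n\n"]
# ["two", "\n \n"]
# ["three", nil]
```

RecSep yields the same records but reads the file a chunk at a time,
so it also works on input too large to load in one go.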