On Wed, Apr 27, 2011 at 10:02 PM, Cee Joe <cyril_jose / ymail.com> wrote:
> Hi all,
>
> In a bit of a rut. Have a file with a lot of text. I want to separate
> the text in this file as entries. Each entry that I would be separating,
> would be done so using IO.pos and when that cursor reaches a certain
> character in the file, it will ideally place all the content before that
> character into a buffer. Then the cursor will continue reading until it
> hits that same character again and put that content into a buffer, so on
> and so forth. (Character I'll be reading would be a greater than symbol)
>
> Would I use a do iterator or use a while loop with a gets method? Or
> readlines perhaps?
>
> File:
>>entry 1
> rubyrubyrubyrubyrubyrubyrubyruby
> (newline here which I don't want)
>>entry 2
> rubyrubyrubyrubyrubyrubyrubyruby
>
> Entry1 and entry2 will be in separate buffers which I would be able to
> access again.
>
> buffer1 = >entry 1
> rubyrubyrubyrubyrubyrubyrubyruby
>
> buffer2 = >entry 2
> rubyrubyrubyrubyrubyrubyrubyruby
>
>
> PS. The file is huge, so I don't want to read it into memory. What is
> the best way to approach this? Any suggestions or comments would be
> helpful. Thanks!

One of the simplest approaches is to use Ruby's support for arbitrary
record separators. File.foreach reads one record at a time, so the
whole file is never held in memory:

File.foreach file_name, ">" do |chunk|
  chunk.chomp! ">"
  chunk.gsub!(/\r\n?|\n/, '') # remove line terminators
  # if you need the leading ">":
  # chunk[0,0] = ">"
  p chunk
end
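
If the newline between the header line and the body should survive, as
in the buffer1/buffer2 example, a variation along these lines might
work. It is only a sketch: the method name each_entry and the sample
data are made up for illustration, and it assumes ">" never occurs
inside an entry's body. It also skips the empty chunk that precedes the
first ">" and strips only trailing whitespace from each entry.

```ruby
require "tempfile"

# Hypothetical helper: yields each ">"-delimited entry with its leading ">"
# restored and trailing blank lines removed, one record at a time.
def each_entry(path)
  return enum_for(:each_entry, path) unless block_given?

  File.foreach(path, ">") do |chunk|
    chunk.chomp!(">")      # drop the delimiter that ends this record
    chunk.rstrip!          # drop the trailing newline(s) before the next ">"
    next if chunk.empty?   # the text before the first ">" is empty
    yield ">" + chunk
  end
end

# Small demonstration with a temporary file shaped like the example above.
sample = Tempfile.new("entries")
sample.write(">entry 1\nrubyrubyruby\n\n>entry 2\nrubyrubyruby\n")
sample.close

# to_a is used here only because the sample is tiny.
entries = each_entry(sample.path).to_a
```

For a huge file, pass a block to each_entry instead of calling to_a, so
that only one entry is in memory at a time.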

Kind regards

robert

-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/