2009/9/17 Jason Lillywhite <jason.lillywhite / gmail.com>:
> I want to make sure I do what is most efficient when dealing with
> multiple and potentially large files.
>
> I need to take row(n) and row(n+1) from a file and use the data to do
> things in other parts of my program. Then the program will iterate by
> incrementing n. I may have up to 30 files, each having 50,000 rows.
>
> My question is should I read row(n) and row(n+1), accessing the file
> again and again on each iteration of the main program? Or should I just
> read the whole file into memory (say, an array) then just grab items
> from the array by index in the main program?

Other schemes can be devised too:

1. read each file once, remembering the byte offset (IO#tell) of
every row, and then access rows via IO#seek (first sketch below)

2. since you are incrementing n: read row n, remember the position,
read row n + 1; next time round, #seek to that position and continue
reading

3. as 2, but remember line n + 1 so you do not have to read it again
(second sketch below)

4. if the access pattern across files is not round robin but
something else, you might get better results by keeping more data in
memory for the least recently accessed files

5. read files in chunks of x lines and keep them in memory, thus
reducing file accesses (third sketch below)

...
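
For example, scheme 1 could look roughly like this (only a sketch;
the file name is a placeholder and none of it is tested):

  # one pass per file to build an index of byte offsets, one per row
  offsets = []
  File.open("data01.txt") do |io|
    until io.eof?
      offsets << io.pos    # IO#pos is the same as IO#tell
      io.gets
    end
  end

  # later: jump straight to row n; row n + 1 is simply the next
  # line, so a single seek fetches both
  n = 42                   # example row index
  File.open("data01.txt") do |io|
    io.seek(offsets[n])
    row_n      = io.gets
    row_n_plus = io.gets
  end

30 files x 50,000 rows is only 1.5 million stored integers, far
cheaper than keeping all the lines themselves in memory.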

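Schemes 2 and 3 collapse into a small cursor per file.  If you can
keep all 30 files open at once, each IO remembers its own position
anyway, so the explicit #seek is only needed when you have to close
and reopen files.  Again just a sketch, with made-up names:

  # hands out [row_n, row_n_plus_1] pairs while reading every line
  # from disk exactly once (the cached line is scheme 3's refinement)
  class RowCursor
    def initialize(path)
      @io     = File.open(path)
      @cached = @io.gets     # prime the cache with row 0
    end

    def step
      current = @cached
      @cached = @io.gets     # nil once we run past the last row
      [current, @cached]
    end

    def close
      @io.close
    end
  end
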
It really depends on what you do with those files, what your access
patterns are, etc.
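
That said, if the rows really are consumed strictly in order, scheme
5 is often the simplest compromise.  A rough sketch, with the chunk
size picked arbitrarily:

  # serves rows out of an in-memory chunk of CHUNK lines, touching
  # the file only once per chunk; assumes n only ever moves forward
  class ChunkedReader
    CHUNK = 1_000

    def initialize(path)
      @io    = File.open(path)
      @lines = []
      @base  = 0             # row number of @lines[0]
    end

    def row(n)
      refill(n) unless n >= @base && n < @base + @lines.size
      @lines[n - @base]      # nil once past the end of the file
    end

    private

    def refill(n)
      @base  = n             # only correct for forward-only access
      @lines = []
      CHUNK.times do
        line = @io.gets or break
        @lines << line
      end
    end
  end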

Kind regards

robert


-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/