On Thu, Feb 24, 2011 at 4:09 AM, Philip Rhoades <phil / pricom.com.au> wrote:
> I have script that does:
>
> - statistical processing from data in 50x32x20 (32,000) large input files
>
> - writes a small text file (22 lines with one or more columns of numbers)
> for each input file
>
> - read all small files back in again for final processing.
>
> Profiling shows that IO is taking up more than 60% of the time - short of
> making fewer, larger files for the data (which is inconvenient for random
> viewing/ processing of individual results) are there other alternatives to
> using the "File" and "IO" classes that would be faster?

I think whatever you do, as long as you do not get rid of the IO or
improve the IO access patterns, your performance gains will only be
marginal. Even a C extension would not help you if you stick with the
same IO patterns.

We should probably learn more about the nature of your processing, but
considering that you only write 32,000 * 22 * 80 (estimated line
length) = 56,320,000 bytes (~54MB), NOT writing those small files is
probably an option. Burning 54MB of memory on a structure suitable for
later processing (i.e. you do not need to parse all those small files)
is a small price compared to the large amount of IO you need to do to
read that data back again (plus the CPU cycles for parsing).

The second best option would be to keep the data in memory as before
but still write those small files if you really need them (for example
for later processing). In this case you could put the writing in a
separate thread so your main processing can continue on the state in
memory. That way you'll gain another improvement.

For reading the large files I would use at most two threads because I
assume they all reside on the same filesystem. With two threads, one
can do calculations (e.g. parsing, aggregating) while the other is
doing IO. If you have more threads you'll likely see a slowdown
because you may introduce too many seeks etc.
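
To make the second option a bit more concrete, here is a minimal sketch of how it could look. Everything named here is an assumption for illustration: compute_stats is a hypothetical stand-in for your real processing, the result_N.txt file names are made up, and the calls used (Queue without a require, Dir.children, Array#sum) assume a reasonably recent Ruby. The results stay in a Hash for the final pass, and a single background thread writes the small files.

```ruby
require "tmpdir"

# Hypothetical stand-in for the real statistical processing:
# returns 22 rows of numbers for one input.
def compute_stats(input_id)
  Array.new(22) { |row| [input_id, row, input_id * row] }
end

# Process all inputs, keep the results in memory, and let one
# background thread write the small per-input files.
def process_all(input_ids, out_dir)
  results = {}          # input_id => rows, used directly for the final pass
  queue   = Queue.new   # thread-safe hand-off to the writer thread

  writer = Thread.new do
    # Queue#pop blocks until work arrives; :done is our stop marker.
    while (job = queue.pop) != :done
      id, rows = job
      File.open(File.join(out_dir, "result_#{id}.txt"), "w") do |f|
        rows.each { |r| f.puts r.join(" ") }
      end
    end
  end

  input_ids.each do |id|
    rows = compute_stats(id)
    results[id] = rows    # keep for final processing
    queue << [id, rows]   # writing happens in the background
  end

  queue << :done
  writer.join             # make sure every file is on disk before returning
  results
end

Dir.mktmpdir do |dir|
  results = process_all(1..5, dir)
  # Final processing works on the in-memory data; no files are read back.
  total = results.values.flatten(1).sum { |_, _, v| v }
  puts "files written: #{Dir.children(dir).size}, total: #{total}"
end
```

The point of the single writer thread is that the main loop never waits on the small files, and the final step never has to parse them again. If you do not need the files at all, drop the queue and the writer and just keep the Hash.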

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/