On Thu, 24 Feb 2011 12:09:48 +0900, Philip Rhoades wrote:
> People,
>
> I have a script that does:
>
> - statistical processing from data in 50x32x20 (32,000) large input 
> files
>
> - writes a small text file (22 lines with one or more columns of 
> numbers)
> for each input file
>
> - reads all small files back in again for final processing.
>
> Profiling shows that IO is taking up more than 60% of the time - 
> short of
> making fewer, larger files for the data (which is inconvenient for 
> random
> viewing/ processing of individual results) are there other 
> alternatives to
> using the "File" and "IO" classes that would be faster?
>
> Thanks,
>
> Phil.

 I can think of two approaches here.

 First, you can write one large file (perhaps building it in memory 
 first) and then split it afterwards.
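 A minimal sketch of that first approach, assuming the small results can 
 be collected into a Hash and that no data line itself begins with the 
 "== " marker used here as a separator (names and the marker are my own 
 illustration, not your format):

```ruby
require "stringio"

# Build every small result file in memory first (filename => contents).
results = {}
1.upto(3) do |i|
  buf = StringIO.new
  22.times { |row| buf.puts "#{row} #{i * row}" }  # stand-in for the real stats
  results["result_#{i}.txt"] = buf.string
end

# One sequential write of a single combined file instead of many small writes.
File.open("combined.dat", "w") do |f|
  results.each do |name, body|
    f.puts "== #{name}"
    f.write(body)
  end
end

# Split the combined file back into individual files only when they are
# needed for random viewing of single results.
current = nil
File.foreach("combined.dat") do |line|
  if line.start_with?("== ")
    current.close if current
    current = File.open(line[3..-1].chomp, "w")
  else
    current.write(line)
  end
end
current.close if current
```

 The combined write and the later split are both sequential passes, which 
 is what buys you the reduced seeking.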

 Second, if you're on *nix, you can write your output files to a tmpfs.
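 For the second approach, a sketch using /dev/shm, which is a common 
 tmpfs mount on Linux (the path and directory name are assumptions; the 
 fallback to the ordinary temp dir just keeps it runnable elsewhere, 
 without the speed win):

```ruby
require "tmpdir"

# Prefer a RAM-backed tmpfs mount if one is available.
base = File.directory?("/dev/shm") ? "/dev/shm" : Dir.tmpdir
dir  = File.join(base, "results")
Dir.mkdir(dir) unless File.directory?(dir)

# Small files written here live in memory, so the many tiny writes and
# re-reads never touch the disk.
path = File.join(dir, "result_1.txt")
File.write(path, "0 0\n1 1\n")
puts File.read(path)
```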

 Both should reduce the number of seeks and improve performance.

-- 
   WBR, Peter Zotov.