Ken Bloom wrote:

> Now let's start writing. We write a block at the beginning of line 0, 
> then we seek to the beginning of line 1 and write another block. Since 
> Linux allows sparse files it doesn't write anything in the blocks between 
> these two blocks.

Yes
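
(For the record, a quick Python sketch of that write pattern -- the
block size, line size, and file name are made up for illustration;
only the seek makes the skipped range stay a hole:)

  BLOCK = 4096                          # hypothetical block size
  LINE  = 10 * 1024 * 1024              # assume a 10 MB line, purely for illustration

  with open("matrix.dat", "wb") as f:   # hypothetical file name
      f.write(b"x" * BLOCK)             # one block at the start of line 0
      f.seek(LINE)                      # jump to the start of line 1
      f.write(b"x" * BLOCK)             # the skipped range stays an unallocated hole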

> They'll get filled in later when we write more blocks 
> there. The two blocks we just wrote are now next to each other on disk. 

Not necessarily. Filesystems may protect themselves against these
access patterns to avoid fragmentation (ext2/3 already try to keep
several processes writing at the same time from allocating in the same
cylinder groups, precisely for that reason). In fact, if I understand
the ext2 source code correctly, this is one of the things indirect
block allocation partially prevents: non-contiguous parts are allocated
in different areas of the disk, and the filesystem tries to place each
new part next to a contiguous one.
I guess you'd probably end up with the matrix split into large
contiguous chunks spread across one or perhaps more cylinder groups on
disk.

> Keep going, and you'll find that both your reads and writes are 
> contiguous.

This probably isn't true, and if you want to get down to the real meat,
you'll have to evaluate the seek times and the number of seeks. On a
modern drive, a cylinder probably stores on the order of megabytes or
tens of megabytes. This means that more than one line of the matrix
fits in a single cylinder: consecutive lines can be accessed with no
seek, or a minimal (track-to-adjacent-track) seek. The only penalty is
that you use a low percentage of what the disk could deliver in one
revolution.
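
(Back-of-the-envelope, with assumed figures -- roughly 10 MB per
cylinder and lines of 16384 64-bit values:)

  cylinder = 10 * 1024 * 1024      # assume ~10 MB per cylinder
  line     = 16384 * 8             # one line of 16384 64-bit values = 128 KB
  print(cylinder // line)          # -> 80 consecutive lines share one cylinder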

On the other hand, if you access lines in non-contiguous parts of the
matrix in smaller segments (typical when you transpose by sub-matrix),
you reduce the amount of data used on each revolution and incur more
seeks. If you are doing tens or hundreds of thousands of them, this
begins to take a huge toll. Indeed, reading 2 GB sequentially on a
modern SATA disk takes at most 40 s (50+ MB/s), so seeking can become
the performance bottleneck as soon as you get into the tens of
thousands of seeks, even quick ones, and a 2 GB matrix of 64-bit
integers already has a number of lines in this ballpark.
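
(Rough Python arithmetic, assuming 50 MB/s sequential throughput and
an 8 ms average seek:)

  matrix_bytes = 2 * 1024**3                      # 2 GB of 64-bit integers
  n            = int((matrix_bytes // 8) ** 0.5)  # square side: 16384 lines
  seq_read_s   = matrix_bytes / (50 * 1024**2)    # ~41 s of pure sequential reading
  seek_s       = n * 0.008                        # one 8 ms seek per line: ~131 s
  print(n, round(seq_read_s), round(seek_s))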

> On the other hand, if you try to transpose back, the file 
> that you just finished writing is the pathological worst case for seek 
> time.

This is not obvious, and at least it's not the case for ext2/3 (unless
your filesystem is already heavily fragmented). I believe similar
allocation heuristics are used in various modern filesystems to avoid
file fragmentation.

> 
> Does this sound about right? In practice it may not be worth optimizing 
> seek time, since that's very much the realm of the operating system.
> 

Unfortunately, when you deal with data that can't fit in cache, you
pretty much defeat any mechanism the OS uses to optimize disk access,
so you can't rely on it. From the number of lines in the matrix you can
guess the order of magnitude of the seeks needed, and you can guess
that algorithms which don't optimize for them will perform far worse
than the ones that do.
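
(A crude model, counting one seek per non-contiguous line segment
touched, with an assumed 4096-wide tile and 8 ms seeks; the exact
numbers don't matter, only the ratio:)

  n = 16384                             # lines in the matrix (see above)
  B = 4096                              # tile side; one B x B tile of 8-byte values is 128 MB
  naive   = n * n                       # element at a time: every access lands in another line
  blocked = 2 * B * (n // B) ** 2       # read + write B line segments per tile
  print(naive, blocked, blocked * 0.008)  # ~2.7e8 vs ~1.3e5 seeks, ~17 min of seeking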

Now what would be real fun is the OP telling us that he stores these
matrices on an SSD with access times on the order of 0.1 ms :-) Then
I'd eat my hat and advise going with the easiest algorithm to code :-)

Lionel