On Wed, 26 Oct 2005, Cameron McBride wrote:

> On 10/25/05, Ara.T.Howard <Ara.T.Howard / noaa.gov> wrote:
>
>> what did you do?  mine supports read/write mmap, so an narray can be backed by
>> a file and any write to the narray writes through to the file: very dangerous,
>> not recommended, and extremely useful for tweaking a gb grid with minimal io.
>
> btw, this sounds like a fun hack.  is this anywhere public?  (yes, I'm
> ignoring the "not recommended"; sometimes the dangerous toys are the most fun...)

here you go:

   http://codeforpeople.com/lib/ruby/nmap/

a teaser:


   jib:~/eg/ruby/nmap/narray_mmap/nmap-0.0.0 > cat sample.rb
   require "mmap"
   require "narray"

   class NMap
     %w( path na_type dims bytes_per_pixel size mmap narray ).each do |m|
       attr_accessor m
     end
     alias_method "na", "narray"
     alias_method "na=", "narray="

     def initialize path, na_type, *dims
       self.path = path
       self.na_type = na_type
       self.dims = dims
       self.bytes_per_pixel = NArray::new(na_type, 1).to_s.size
       self.size = dims.inject(1){|s,d| s * d}
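       #
       # this is a race condition - atomic creation is required for real code
       #
       unless File.exist?(path)
         # new file: size it to hold the whole grid, map it, zero it
         open(path, "w"){|f| f.truncate(size * bytes_per_pixel)}
         map
         narray[] = 0
       else
         # existing file: just map it, keeping whatever values it holds
         map
       end
     end
     def map
       # MAP_SHARED: stores into the mapped memory are written back to the file
       self.mmap = Mmap::new path, "rw", Mmap::MAP_SHARED
       # view the mapped bytes as an narray of the given type and shape
       self.narray = NArray::str mmap.to_str, na_type, *dims
     end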
   end

   nmap = NMap::new 'data', NArray::INT, 3, 2

   p nmap.na
   nmap.na[] = nmap.na + 1

   nmap.mmap.msync
   nmap.mmap.munmap

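the comment in initialize admits the exists-then-create check is racy.  here is a minimal stdlib-only sketch of one way to close that window, assuming POSIX O_EXCL semantics on a local filesystem (older NFS versions don't honor it).  create_grid_file is a made-up name, not part of nmap:

```ruby
# hypothetical helper, not part of nmap: atomically create a grid file of
# nbytes bytes.  returns true if this call created the file and false if it
# already existed (e.g. another process won the race).
def create_grid_file(path, nbytes)
  # CREAT|EXCL folds "does it exist?" and "create it" into one atomic open;
  # the open raises Errno::EEXIST if the file is already there
  File.open(path, File::WRONLY | File::CREAT | File::EXCL) do |f|
    f.truncate(nbytes) # extends the new, empty file with zero bytes
  end
  true
rescue Errno::EEXIST
  false
end
```

in initialize, the unless/else could then become `fresh = create_grid_file(path, size * bytes_per_pixel)`, then `map`, then `narray[] = 0 if fresh`.  it only closes the exists-then-create window, though: a process that loses the race can still map the file before the winner has sized it, so really robust code would build the file under a temporary name and link it into place.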

we run it a few times:


   jib:~/eg/ruby/nmap/narray_mmap/nmap-0.0.0 > ruby sample.rb
   NArray(ref).int(3,2):
   [ [ 0, 0, 0 ],
     [ 0, 0, 0 ] ]

   jib:~/eg/ruby/nmap/narray_mmap/nmap-0.0.0 > ruby sample.rb
   NArray(ref).int(3,2):
   [ [ 1, 1, 1 ],
     [ 1, 1, 1 ] ]

   jib:~/eg/ruby/nmap/narray_mmap/nmap-0.0.0 > ruby sample.rb
   NArray(ref).int(3,2):
   [ [ 2, 2, 2 ],
     [ 2, 2, 2 ] ]

   jib:~/eg/ruby/nmap/narray_mmap/nmap-0.0.0 > ruby sample.rb
   NArray(ref).int(3,2):
   [ [ 3, 3, 3 ],
     [ 3, 3, 3 ] ]


note

   - the data is backed by a file
   - the file is magically changed with no explicit io in the program
   - if you change only one row of a gb grid, only the few pages holding that
     row are read/written.  example:

   jib:~/eg/ruby/nmap/narray_mmap/nmap-0.0.0 > ls -ltar gb
   -rw-rw-r--    1 ahoward  ahoward  1073741824 Oct 26 11:15 gb


   jib:~/eg/ruby/nmap/narray_mmap/nmap-0.0.0 > time ruby a.rb 1
   NArray(ref).byte(1024,1024,1024):
   [ [ [ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... ],
       [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... ],
       [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... ],
       [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... ],
       [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... ],
       [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... ],
       [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... ],
       [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... ],
       [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... ],
       [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... ],
    ...

   real    0m0.029s
   user    0m0.010s
   sys     0m0.020s


   jib:~/eg/ruby/nmap/narray_mmap/nmap-0.0.0 > time ruby a.rb 42
   NArray(ref).byte(1024,1024,1024):
   [ [ [ 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, ... ],
       [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... ],
       [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... ],
       [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... ],
       [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... ],
       [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... ],
       [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... ],
       [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... ],
       [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... ],
       [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... ],
    ...

   real    0m0.028s
   user    0m0.020s
   sys     0m0.000s


so, that's about 0.03 seconds (28-29 ms of wall-clock time) to map a
gigabyte-sized cube, change one row of it, and write it back out!  not too
shabby.
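a rough sketch of why so little io happens.  narray stores the first dimension fastest, so element [i,j,k] of a byte(1024,1024,1024) cube sits at byte offset i + 1024*j + 1024*1024*k.  judging from the output, a.rb (not shown) writes the first row, na[true,0,0], which is 1024 contiguous bytes.  assuming 4096-byte pages (common on x86 linux; Etc.sysconf(Etc::SC_PAGESIZE) tells you what your box uses), the made-up helpers below count the pages involved:

```ruby
# back-of-the-envelope page math for a file-backed grid.  assumes the
# first dimension varies fastest (as in NArray) and a 4096-byte page;
# byte_offset and pages_touched are hypothetical helper names.

PAGE = 4096 # assumption: typical x86 linux page size

# byte offset of element idx (e.g. [i, j, k]) in a grid of shape dims
def byte_offset(idx, dims, bytes_per_pixel)
  offset = 0
  stride = 1
  idx.zip(dims).each do |i, d|
    offset += i * stride # dim 0 has stride 1, dim 1 stride dims[0], ...
    stride *= d
  end
  offset * bytes_per_pixel
end

# range of page numbers covered by len bytes starting at byte first
def pages_touched(first, len, page = PAGE)
  (first / page)..((first + len - 1) / page)
end

dims      = [1024, 1024, 1024]
bpp       = 1 # NArray.byte
file_size = dims.inject(1) { |s, d| s * d } * bpp

# na[true, 0, 0]: dims[0] contiguous bytes starting at element [0, 0, 0]
row = pages_touched(byte_offset([0, 0, 0], dims, bpp), dims[0] * bpp)

# na[0, 0, true]: one byte per 1024x1024 plane, each 1 MiB apart
column = (0...dims[2]).map { |k| byte_offset([0, 0, k], dims, bpp) / PAGE }.uniq

puts file_size        # 1073741824, the size ls reports for gb
puts file_size / PAGE # 262144 pages in the whole file
puts row.size         # 1 page for the row
puts column.size      # 1024 pages for the column
```

so the row is 1 page out of 262144.  a slice along the last dimension is a different story: its bytes are 1 MiB apart, so it touches 1024 pages.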

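the speed, of course, comes from the fact that mmap doesn't actually
read/write the entire file: only the pages you touch are read in, and only
the ones you modify are written back.  but using narray to get around ruby
loops and pack when writing binary data is also a huge speedup.  for
instance, doing something like

   nmap.na[] = nmap.na * 2

doubles the entire grid in place, and so in the file, without loops or type
conversion (pack).  note the []= : a plain "nmap.na = nmap.na * 2" would just
point the accessor at a new in-memory narray and leave the file untouched.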
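for comparison, a minimal sketch of the loops-and-pack route that narray lets you skip, applied to a grid like sample.rb's.  it assumes NArray::INT elements are 4-byte native-endian signed ints (pack directive "l"), and increment_grid_file is a made-up name.  note it turns every element into a ruby Integer and back, and reads and rewrites the whole file with explicit io:

```ruby
require "tmpdir"

# hypothetical baseline, not part of nmap: add delta to every element of a
# binary grid file the "loops and pack" way.  assumes elements are 4-byte
# native-endian signed ints (assumed to match NArray::INT's layout).
def increment_grid_file(path, delta = 1)
  File.open(path, "r+b") do |f|
    values = f.read.unpack("l*")          # every byte of the file -> ruby Integers
    values = values.map { |v| v + delta } # one ruby-level step per element
    f.rewind
    f.write(values.pack("l*"))            # ruby Integers -> bytes, whole file again
  end
end

# mimic sample.rb: a 3x2 grid of zeros, incremented on three "runs"
Dir.mktmpdir do |dir|
  path = File.join(dir, "data")
  File.binwrite(path, Array.new(6, 0).pack("l*"))
  3.times { increment_grid_file(path) }
  p File.binread(path).unpack("l*") # prints [3, 3, 3, 3, 3, 3]
end
```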

regards.

-a
-- 
===============================================================================
| email :: ara [dot] t [dot] howard [at] noaa [dot] gov
| phone :: 303.497.6469
| anything that contradicts experience and logic should be abandoned.
| -- h.h. the 14th dalai lama
===============================================================================