On Tue, Sep 21, 2004 at 06:24:52PM +0900, Dick Davies wrote:
> > I first learned about this approach via Eivind Eklund when talking about
> > OVCS.  It's the method used by Subversion and monotone (AFAIR): index
> > data by its digest. A number of interesting things happen when you do so:
> > * full-tree versioning
> > * "implicit deltas" and fairly efficient compression of the data
> > * ...
> 
> By 'index by digest', do you mean something like Venti:
>  
> http://www.cs.bell-labs.com/sys/doc/venti/venti.html

Yes, the fundamental idea is the same.
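At its core it's just a map from digest to blob. A minimal Ruby sketch
(class and method names invented here, with SHA-1 standing in for
whatever digest you pick):

  require 'digest/sha1'

  # Content-addressed store: data is keyed by its own digest,
  # so identical fragments are stored exactly once.
  class DigestStore
    def initialize
      @blobs = {}              # digest => data; persist this for real use
    end

    def store(data)
      key = Digest::SHA1.hexdigest(data)
      @blobs[key] ||= data     # duplicate content maps to the same key
      key
    end

    def fetch(key)
      @blobs[key]
    end
  end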

> ? I tried playing with a ruby-based version of this a while ago, but couldn't 
> find a good way of chopping up files to store them efficiently.....

A rolling CRC over a sliding window will do, e.g.

  if crc(buffer, offset, CRCLEN) % AVERAGE_LENGTH == 1
     chop up to current offset
     insert fragment
  else
     offset += 1
     force a cut anyway once offset >= MAX_FRAGMENT_SIZE
  end

that gives you chunks of length averaging AVERAGE_LENGTH in most
cases. Lower values mean higher P(node reuse), but there's a per-chunk
overhead (key + pointer to it in a list, etc.).
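Fleshed out into a runnable Ruby sketch (names and constants are my
own choices; a plain additive sliding-window sum stands in for a real
rolling CRC, which rsync-style implementations compute incrementally):

  CRCLEN       = 48     # rolling window size in bytes
  AVERAGE_LEN  = 1024   # target average chunk length
  MAX_FRAGMENT = 4096   # force a cut past this size

  # Returns the byte offsets at which to cut `data` into fragments.
  def chunk_boundaries(data)
    bounds = []
    start  = 0
    sum    = 0
    data.each_byte.with_index do |byte, i|
      sum += byte
      sum -= data.getbyte(i - CRCLEN) if i >= CRCLEN  # slide the window
      len = i - start + 1
      # skip boundaries inside the first window to avoid tiny chunks
      if (len >= CRCLEN && sum % AVERAGE_LEN == 1) || len >= MAX_FRAGMENT
        bounds << (i + 1)   # cut just after this byte
        start = i + 1
      end
    end
    bounds << data.bytesize if start < data.bytesize
    bounds
  end

Feeding the fragments into the store above (path is a placeholder):

  store = DigestStore.new
  data  = File.binread(path)
  prev  = 0
  chunk_boundaries(data).each do |cut|
    store.store(data.byteslice(prev...cut))
    prev = cut
  end

Since cut points depend only on nearby bytes, a localized edit only
moves the boundaries around it; the remaining fragments keep their
digests and get reused, which is where the "implicit deltas" above
come from.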

-- 
Running Debian GNU/Linux Sid (unstable)
batsman dot geo at yahoo dot com