--- Nikolai Weibull
<mailing-lists.ruby-talk / rawuncut.elitemail.org> wrote:

> Eric Mahurin wrote:
> 
> > Nikolai Weibull wrote:
> 
> > > Eric Mahurin wrote:
> > > 
> > > [...]
> > > 
> > > > This sure would be nice for easy and high performance
> > > > implementations of circular and gap buffers.
> 
> > > Please do explain,
> > >         nikolai
> 
> > OK.  I've been thinking about this stuff quite a bit while
> working on
> > my cursor package.  Let's start with a gap buffer.  The
> traditional
> > approach is to a have an array with the data before the
> cursor at the
> > beginning of the array and data after the cursor at the end
> of the
> > array.  In the middle is the "gap" and could be gigabytes
> of virtual
> > address space if you have enough control over virtual
> memory (you
> > don't in Ruby).  Something like this:
> > 
> >    A  B  C  D  E -------------- F  G  H  I
> >    0  1  2  3  4 ---the gap--- -4 -3 -2 -1
> >   ^             ^              ^          ^
> > begin          before         after      end
> > 
> > All single element operations (move cursor, read, write,
> and
> > especially insert/delete) are O(1) operations.  Compare
> this to an
> > array/string where insert/delete are O(n).
> 
> I meant, please explain how this applies to circular (?) and
> gap
> buffers, whatever you mean by "circular and gap buffers".  I
> know how a
> gap buffer works (but perhaps other people on this list
> don't, so your
> explanation has hopefully not fallen on muted ears).
> 
> The operations are not all O(1).  Insert and delete are still
> O(n).  It
> can, however, be reasoned that for normal use these
> operations will
> perform as if O(1).  Doing random insertions/deletions will
> require O(n)
> time, though.  The move-to operation (as I refer to it) can
> be made to
> always operate in O(1) time, if the gap is only moved when an
> insert/delete is actually performed.

Sorry, I was referring to all single element operations done at
the cursor location - those are all O(1).  To move the cursor
to a random location is O(n).

> > Another way you could organize the data above would be like
> this:
> >  
> >     F   G   H   I   A   B   C   D   E
> >     0   1   2   3   4   5   6   7   8
> >   ^               ^                   ^
> > after=0      begin_end             before
> > 
> > In this case, the "gap" is what is outside the array.  If
> single all
> > single element operations on the ends of this array (push,
> pop, shift,
> > unshift) were O(1), then all our single element operations
> at the
> > cursor in this structure would also be O(1).  The problem
> is
> > shift/unshift usually aren't.  But
> > they could be by simply moving the start array pointer
> around
> > (shift looks to be O(1)).
> 
> No, it would be very strange to have both O(1) shift and push
> on an array.

Well, in the current Array implementation, push/pop are
obviously O(1), but also shift is too (judging from the
benchmarks).  I'd imagine a shift is done by just moving the
start of array pointer and not any data.  unshifts are also
O(1) as long as you don't unshift more than you shifted (then
they become O(n)).  If unshifts tried to stretch the memory
allocation to the left or realloc the array in a new space like
pushes did, you could make them always O(1) (discounting
ocassional allocation time).  Then the above scheme would work
fine for both a gap buffer and easily a circular buffer.

> A deque is a lot easier to implement using a linked
> list, but
> then you loose O(1) lookup.
> 
> A possible solution is to use a "circular gap", i.e., a gap
> that can
> span both ends of the buffer as necessary, e.g.,
> 
>         +---------+--------------------------+---------+
>         |   Gap   |          Buffer          |   Gap   | .
>         +---------+--------------------------+---------+
> 
> Allowing for such gaps in a buffer can be very beneficial for
> certain
> modification patterns for sure.  Climacs has support for this
> kind of
> gap buffering strategy through the Flexichain package.

Thats what I was trying to describe below with the circular
buffers.  It is like the conventional gap buffer, but the gap
can wraparound (can be in the middle or the ends).

> > Another approach I'm taking now is implementing this gap
> buffer by
> > using 2 arrays/strings, where one represents what's before
> the cursor
> > and one represents what's after the cursor:
> > 
> >     A   B   C   D   E
> >     0   1   2   3   4
> >   ^                   ^
> > begin               before
> > 
> >     I   H   G   F
> >     0   1   2   3
> >   ^               ^
> >  end            after
> > 
> > I store the "after" array/string in reverse so that all
> operations at
> > the cursor occur at the ends of one or both of these
> arrays/strings.
> 
> > I haven't seen either of these approaches before.  Has
> anybody else
> > done them?
> 
> This is more or less typical of modifications of lists in,
> for example,
> Haskell, albeit I haven't seen it being used in a text
> editor.  The
> problem with this solution is that you'll have to modify two
> strings/arrays whenever you move the cursor, but
> pushing/popping is
> efficient, so it's quite a good solution (for small buffers).
> 
> > For implementing a circular buffer, the first two
> implementations
> > above can be easily adapted by removing the begin, end, or
> begin_end
> > indices and allowing to wrap around or pass through them.
> > Implementing a circular buffer using ideas from the last
> > implementation above is a little trickier, but I think I
> have a
> > solution.
> 
> Well, why not just use a circular list?

That will be one of the implementations in Cursor::Circular
eventually.  Along with others.

> People seem to be
> forgetting
> about linked lists lately.  They do have their applications
> you know
> :-).

My application started with a general parser.  I wanted a good
cursor API for the character stream and the token stream.  I've
gone overboard on this cursor stuff.

> > This is just a sampling of implementations that my next
> cursor
> > release will provide.  There are many more possibilities
> > (linked-lists, hybrids, trees, etc) that I probably won't
> get
> > to yet.
> 
> It's nice to see that someone is taking an interest in this
> kind of
> stuff.  String and Array are great for many applications, but
> perform
> terribly for certain kinds of behavioral patterns and are in
> no way a
> universal solution.  I believe Ruby would benefit by having
> more data
> structures that would perform gracefully for concatenations,
> e.g., ropes
> from STL or something similar,

Thanks.  This thing I'm doing is STL on a lot of steriods.  I
think it more generally gives a full API into any sequential
data structure you can think of - array, string, IO, buffered
IO, linked lists (double and single linked), gap buffer, linked
lists of fixed size blocks, various circular buffers, a
reversal of an of these, a concatenation of any of these, etc. 
A nice feature that is missing from many of these external
iterator API's that mine has is ability to save/restore the
position of a cursor (the thing holding the position is in fact
another cursor).



		
____________________________________________________ 
Yahoo! Sports 
Rekindle the Rivalries. Sign up for Fantasy Football 
http://football.fantasysports.yahoo.com