Eric Mahurin wrote:

> Nikolai Weibull wrote:

> > Eric Mahurin wrote:
> > 
> > [...]
> > 
> > > This sure would be nice for easy and high performance
> > > implementations of circular and gap buffers.

> > Please do explain,
> >         nikolai

> OK.  I've been thinking about this stuff quite a bit while working on
> my cursor package.  Let's start with a gap buffer.  The traditional
> approach is to a have an array with the data before the cursor at the
> beginning of the array and data after the cursor at the end of the
> array.  In the middle is the "gap" and could be gigabytes of virtual
> address space if you have enough control over virtual memory (you
> don't in Ruby).  Something like this:
> 
>    A  B  C  D  E -------------- F  G  H  I
>    0  1  2  3  4 ---the gap--- -4 -3 -2 -1
>   ^             ^              ^          ^
> begin          before         after      end
> 
> All single element operations (move cursor, read, write, and
> especially insert/delete) are O(1) operations.  Compare this to an
> array/string where insert/delete are O(n).

I meant, please explain how this applies to circular (?) and gap
buffers, whatever you mean by "circular and gap buffers".  I know how a
gap buffer works (but perhaps other people on this list don't, so your
explanation has hopefully not fallen on muted ears).

The operations are not all O(1).  Insert and delete are still O(n).  It
can, however, be reasoned that for normal use these operations will
perform as if O(1).  Doing random insertions/deletions will require O(n)
time, though.  The move-to operation (as I refer to it) can be made to
always operate in O(1) time, if the gap is only moved when an
insert/delete is actually performed.

> Another way you could organize the data above would be like this:
>  
>     F   G   H   I   A   B   C   D   E
>     0   1   2   3   4   5   6   7   8
>   ^               ^                   ^
> after=0      begin_end             before
> 
> In this case, the "gap" is what is outside the array.  If single all
> single element operations on the ends of this array (push, pop, shift,
> unshift) were O(1), then all our single element operations at the
> cursor in this structure would also be O(1).  The problem is
> shift/unshift usually aren't.  But
> they could be by simply moving the start array pointer around
> (shift looks to be O(1)).

No, it would be very strange to have both O(1) shift and push on an
array.  A deque is a lot easier to implement using a linked list, but
then you loose O(1) lookup.

A possible solution is to use a "circular gap", i.e., a gap that can
span both ends of the buffer as necessary, e.g.,

        +---------+--------------------------+---------+
        |   Gap   |          Buffer          |   Gap   | .
        +---------+--------------------------+---------+

Allowing for such gaps in a buffer can be very beneficial for certain
modification patterns for sure.  Climacs has support for this kind of
gap buffering strategy through the Flexichain package.

> Another approach I'm taking now is implementing this gap buffer by
> using 2 arrays/strings, where one represents what's before the cursor
> and one represents what's after the cursor:
> 
>     A   B   C   D   E
>     0   1   2   3   4
>   ^                   ^
> begin               before
> 
>     I   H   G   F
>     0   1   2   3
>   ^               ^
>  end            after
> 
> I store the "after" array/string in reverse so that all operations at
> the cursor occur at the ends of one or both of these arrays/strings.

> I haven't seen either of these approaches before.  Has anybody else
> done them?

This is more or less typical of modifications of lists in, for example,
Haskell, albeit I haven't seen it being used in a text editor.  The
problem with this solution is that you'll have to modify two
strings/arrays whenever you move the cursor, but pushing/popping is
efficient, so it's quite a good solution (for small buffers).

> For implementing a circular buffer, the first two implementations
> above can be easily adapted by removing the begin, end, or begin_end
> indices and allowing to wrap around or pass through them.
> Implementing a circular buffer using ideas from the last
> implementation above is a little trickier, but I think I have a
> solution.

Well, why not just use a circular list?  People seem to be forgetting
about linked lists lately.  They do have their applications you know
:-).

> This is just a sampling of implementations that my next cursor
> release will provide.  There are many more possibilities
> (linked-lists, hybrids, trees, etc) that I probably won't get
> to yet.

It's nice to see that someone is taking an interest in this kind of
stuff.  String and Array are great for many applications, but perform
terribly for certain kinds of behavioral patterns and are in no way a
universal solution.  I believe Ruby would benefit by having more data
structures that would perform gracefully for concatenations, e.g., ropes
from STL or something similar,
        nikolai

-- 
Nikolai Weibull: now available free of charge at http://bitwi.se/!
Born in Chicago, IL USA; currently residing in Gothenburg, Sweden.
main(){printf(&linux["\021%six\012\0"],(linux)["have"]+"fun"-97);}