On 10/26/07, Ruby Quiz <james / grayproductions.net> wrote:
> The task is to implement data structures for efficiently editing text. One
> common data structure is a gap buffer:
>
>         http://en.wikipedia.org/wiki/Gap_buffer
>
> Other options to consider include: ropes (see quiz #137), linked lists, simple
> strings, or a multi-level combination of data structures (i.e. for lines vs.
> characters in a line). There are many data structures that may work efficiently
> with simple editing operations, but not all of those data structures will work
> well for more complex functionality.

Since I haven't seen any solutions yet, I'd thought I'd elaborate on
some possible solutions I was thinking of:

* conventional gap buffer implemented with a string.  The string would
have 3 parts (a couple indices might be used for the markers): before
the cursor, the "gap" (containing garbage), and after the cursor.
When you have no more space in the gap, copy the data to a new string
using a larger gap.  Think about how the gap should be sized.  I
believe most text editors use a gap buffer.

* gap buffer implemented with two strings.  Represent the text before
the cursor with one string and the text after the cursor with another
string that is reversed.  All operations around the cursor correspond
to operations at the ends of these strings - O(1).  Unlike the
conventional gap buffer, you don't have to deal with reallocation (gap
filing up).  Instead let ruby do its string reallocation when a
string's capacity fills up.  The memory efficiency might not be as
good as the conventional gap buffer, though.

* gap buffer with deferred gap movement.  The "gap" is really only
useful when making edits (insertions/deletions).  For simple cursor
movements, you don't have to "move" the gap immediately.  The cursor
position could be changed independently of the gap.  But, you'll have
to synchronize the gap to the cursor position once an edit is needed.
Random access cursor movement can also be done in O(1) time, but
you'll still have to pay for it if you immediately make an edit at
that new position.

* use a rope implementation (quiz #137).  You could start with the
inefficient string solution I gave earlier and operate on a rope
instead (making new ones if using an immutable implementation).  It
may be difficult to make the solution as fast as the gap buffers for
simple stuff (likely slower by O(log(n))), but ropes will probably be
more powerful for complex functionality.  I believe ropes were
invented for text editing (something called Coral I believe).

* linked lists.  The text could be represented as a linked list of
characters.  The cursor position might correspond to one of the linked
list nodes or links.  All operations around the cursor should be O(1).
 An advantage over gap buffers is that you can efficiently implement
multiple cursors (possibly from different editing panes or even
different users).  The downside is the memory usage is higher and
random access might be higher.

* use a two level data structure.  One data structure could be used
for lines and another for the characters within each line.  All of the
above data structures apply to both.  For managing lines, you'd need
to use arrays instead of strings in the data structure since the
element is a line instead of a character.  Using a two-level data
structure will give better performance for line operations (i.e.
up/down).

Enjoy!

Eric