
So much depends on the application. It's important to decide whether or
not we're willing to write fairly specialized implementations for particular
classes of application. I've done a fair bit of that and I'm not afraid to
do it, but it's an inherently inefficient process. If you go the opposite
direction, though, toward a general-purpose lightweight data manager, you
might as well just use SQLite. If we put our heads together, we ought to be
able to find a middle ground. For example, what would a generic architecture
for a lightweight database look like? Which pieces are essential and which
are optional, and precisely what interfaces do they expose? Done right, that
would let us optimize for particular apps by swapping out components.
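
To make that question concrete, here is one possible split, sketched in
Python purely for illustration. It assumes a record is a flat dict of
strings, that storage is the essential piece and indexes are optional, and
that each index is registered under the field it covers. The names Storage,
Index, MemoryStorage, ExactIndex and Table are hypothetical; they are not
KirbyBase or SQLite APIs. A plain-text or qdbm-backed class could, in
principle, implement Storage without Table changing.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List, Optional, Set, Tuple

Record = Dict[str, str]  # assumption: a record is a flat dict of strings


class Storage(ABC):
    """Essential component: owns the records, keyed by an integer id."""
    @abstractmethod
    def put(self, rid: int, record: Record) -> None: ...
    @abstractmethod
    def get(self, rid: int) -> Optional[Record]: ...
    @abstractmethod
    def delete(self, rid: int) -> None: ...
    @abstractmethod
    def scan(self) -> Iterator[Tuple[int, Record]]: ...


class Index(ABC):
    """Optional component: maps a search key to the ids of matching records."""
    @abstractmethod
    def add(self, rid: int, record: Record) -> None: ...
    @abstractmethod
    def remove(self, rid: int, record: Record) -> None: ...
    @abstractmethod
    def lookup(self, key: str) -> Set[int]: ...


class MemoryStorage(Storage):
    """Simplest backend; a file- or dbm-backed class could replace it."""
    def __init__(self) -> None:
        self._rows: Dict[int, Record] = {}

    def put(self, rid: int, record: Record) -> None:
        self._rows[rid] = dict(record)

    def get(self, rid: int) -> Optional[Record]:
        return self._rows.get(rid)

    def delete(self, rid: int) -> None:
        self._rows.pop(rid, None)

    def scan(self) -> Iterator[Tuple[int, Record]]:
        return iter(list(self._rows.items()))


class ExactIndex(Index):
    """Exact-match index on a single field."""
    def __init__(self, field: str) -> None:
        self.field = field
        self._map: Dict[str, Set[int]] = {}

    def add(self, rid: int, record: Record) -> None:
        if self.field in record:
            self._map.setdefault(record[self.field], set()).add(rid)

    def remove(self, rid: int, record: Record) -> None:
        if self.field in record:
            self._map.get(record[self.field], set()).discard(rid)

    def lookup(self, key: str) -> Set[int]:
        return set(self._map.get(key, set()))


class Table:
    """Wires one Storage to any number of Indexes; both are swappable."""
    def __init__(self, storage: Storage, indexes: Optional[Dict[str, Index]] = None) -> None:
        self.storage = storage
        self.indexes = indexes or {}  # keyed by the field each index covers
        self._next_id = 0

    def insert(self, record: Record) -> int:
        rid = self._next_id
        self._next_id += 1
        self.storage.put(rid, record)
        for index in self.indexes.values():
            index.add(rid, record)
        return rid

    def delete(self, rid: int) -> None:
        record = self.storage.get(rid)
        if record is None:
            return
        for index in self.indexes.values():
            index.remove(rid, record)
        self.storage.delete(rid)

    def find(self, field: str, value: str) -> List[Record]:
        if field in self.indexes:  # indexed path
            rows = (self.storage.get(rid) for rid in sorted(self.indexes[field].lookup(value)))
            return [row for row in rows if row is not None]
        # no index registered for this field: fall back to a full scan
        return [row for _, row in self.storage.scan() if row.get(field) == value]
```

Table.find uses an index when one is registered for the field and scans
otherwise, so an app-specific index (word, substring, range) could be
dropped in without touching the storage layer.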

The two big problems with journaled data storage are collecting the
garbage and rebuilding the indices when a run starts (because there's no
graceful place to put them in the journal). One of the apps I often deal
with is indexed searching on logfiles. A journaling arena implemented in
virtual memory works well here because garbage collection is automatic, but
fast index generation would be a huge plus. These apps generally index whole
words, Lucene-style.
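
A rough sketch of both problems, assuming an append-only file of JSON lines
rather than a virtual-memory arena, so garbage has to be collected
explicitly by compaction. The whole-word inverted index is never written to
the journal; it is rebuilt by replaying the journal at startup, which is the
cost described above. The class name, file format and tokenizer are all
illustrative assumptions.

```python
import json
import os
import re
from typing import Dict, List, Set

WORD = re.compile(r"\w+")  # assumption: a "word" is a run of \w characters


def words(text: str) -> Set[str]:
    return set(WORD.findall(text.lower()))


class JournalStore:
    """Append-only journal on disk; records and word index live in memory.

    Assumption: the journal is a text file of JSON lines, not a VM arena.
    """

    def __init__(self, path: str) -> None:
        self.path = path
        self.rows: Dict[str, str] = {}
        self.index: Dict[str, Set[str]] = {}  # word -> keys of records containing it
        if os.path.exists(path):
            # The index is not stored in the journal, so every startup
            # pays for a full replay to rebuild it.
            with open(path, encoding="utf-8") as f:
                for line in f:
                    self._apply(json.loads(line))

    def _apply(self, entry: dict) -> None:
        key = entry["key"]
        old = self.rows.pop(key, None)
        if old is not None:  # the previous version is now garbage
            for w in words(old):
                self.index[w].discard(key)
        if entry["op"] == "put":
            self.rows[key] = entry["text"]
            for w in words(entry["text"]):
                self.index.setdefault(w, set()).add(key)

    def _log(self, entry: dict) -> None:
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(entry) + "\n")
        self._apply(entry)

    def put(self, key: str, text: str) -> None:
        self._log({"op": "put", "key": key, "text": text})

    def delete(self, key: str) -> None:
        self._log({"op": "del", "key": key})

    def search(self, word: str) -> List[str]:
        return sorted(self.index.get(word.lower(), set()))

    def compact(self) -> None:
        """Collect garbage: rewrite only live records, then swap files."""
        tmp = self.path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            for key, text in self.rows.items():
                f.write(json.dumps({"op": "put", "key": key, "text": text}) + "\n")
        os.replace(tmp, self.path)
```

Keeping the table in memory and treating the file purely as a redo log is
also close to the in-memory-plus-journal scheme in the quoted message below.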

Another thing I do a lot is LDAP directories augmented with presence data.
Here, journaling is a tougher fit because most of the data never becomes
garbage, but some of it becomes garbage at very high rates. Indexing is also
a serious problem because LDAP permits partial-word (substring) searches.
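
One common way to serve substring queries such as (cn=*ann*) is a trigram
index. That technique is swapped in here for illustration; nothing above
says which approach is used. Every three-character slice of a value is
indexed, the candidate sets for the query's trigrams are intersected, and
the survivors are verified, since a trigram match is necessary but not
sufficient. The sketch assumes one indexed attribute per entry and uses
hypothetical names; removal is incremental so high-churn presence data can
be updated in place.

```python
from typing import Dict, Set


def trigrams(s: str) -> Set[str]:
    s = s.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)}


class TrigramIndex:
    """Candidate filter for substring queries on a single attribute."""

    def __init__(self) -> None:
        self.values: Dict[str, str] = {}      # entry id -> attribute value
        self.grams: Dict[str, Set[str]] = {}  # trigram -> entry ids

    def add(self, eid: str, value: str) -> None:
        self.remove(eid)  # replacing a value first drops its old trigrams
        self.values[eid] = value
        for g in trigrams(value):
            self.grams.setdefault(g, set()).add(eid)

    def remove(self, eid: str) -> None:
        old = self.values.pop(eid, None)
        if old is not None:
            for g in trigrams(old):
                self.grams[g].discard(eid)

    def contains(self, fragment: str) -> Set[str]:
        frag = fragment.lower()
        grams = trigrams(frag)
        if not grams:  # fragments shorter than 3 chars: fall back to a scan
            candidates = set(self.values)
        else:
            candidates = set.intersection(*(self.grams.get(g, set()) for g in grams))
        # Verify each candidate, because shared trigrams do not imply a match.
        return {e for e in candidates if frag in self.values[e].lower()}
```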

Skiplists look really interesting, thanks for pointing them out.
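
For reference, a minimal skiplist in the style of William Pugh's design
(randomized node levels with p = 0.5, capped at 16), sketched in Python. It
is not the implementation mentioned below, and the class and method names
are hypothetical. It shows why a skiplist sits between a hash and an array:
lookups and inserts are expected O(log n), and unlike a hash it keeps keys
ordered, so range scans are cheap.

```python
import random
from typing import Any, Iterator, List, Optional, Tuple


class _Node:
    __slots__ = ("key", "value", "forward")

    def __init__(self, key: Any, value: Any, level: int) -> None:
        self.key = key
        self.value = value
        self.forward: List[Optional["_Node"]] = [None] * level


class SkipList:
    """Ordered map with expected O(log n) search and insert."""

    MAX_LEVEL = 16
    P = 0.5

    def __init__(self, seed: Optional[int] = None) -> None:
        self._rng = random.Random(seed)
        self.head = _Node(None, None, self.MAX_LEVEL)  # sentinel, never compared
        self.level = 1

    def _random_level(self) -> int:
        lvl = 1
        while lvl < self.MAX_LEVEL and self._rng.random() < self.P:
            lvl += 1
        return lvl

    def _predecessors(self, key: Any) -> List[_Node]:
        # For each level, the last node whose key is < key (head if none).
        update = [self.head] * self.MAX_LEVEL
        node = self.head
        for i in range(self.level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        return update

    def insert(self, key: Any, value: Any) -> None:
        update = self._predecessors(key)
        nxt = update[0].forward[0]
        if nxt is not None and nxt.key == key:
            nxt.value = value  # key already present: overwrite
            return
        lvl = self._random_level()
        self.level = max(self.level, lvl)  # new top levels start from head
        node = _Node(key, value, lvl)
        for i in range(lvl):
            node.forward[i] = update[i].forward[i]
            update[i].forward[i] = node

    def get(self, key: Any, default: Any = None) -> Any:
        nxt = self._predecessors(key)[0].forward[0]
        return nxt.value if nxt is not None and nxt.key == key else default

    def items_between(self, lo: Any, hi: Any) -> Iterator[Tuple[Any, Any]]:
        """Yield (key, value) pairs with lo <= key < hi, in key order."""
        node = self._predecessors(lo)[0].forward[0]
        while node is not None and node.key < hi:
            yield node.key, node.value
            node = node.forward[0]
```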

On 5/17/06, Jamey Cribbs <cribbsj / oakwood.org> wrote:
>
> Francis Cianfrocca wrote:
> > I've been interested for a long time in doing a fast system for
> > datasets up to a few million rows with journalled updates. (I wrote the
> > journal-arena already, I use it for fast access to log entries.) If you
> > like, I'd be happy to help out also.
> Sure thing!
>
> I've been doing some thinking about Logan's idea of offering a choice of
> multiple backends.  That sounds intriguing.  I could see that idea,
> coupled with something like qdbm, to provide different options for the
> user.
>
> The default backend could remain the plain-text files that KirbyBase
> currently uses.  But, if the user has qdbm (as an example) installed,
> KirbyBase could optionally use that as the backend resulting in a
> noticeable (I would hope) performance increase.
>
> By the way, does anyone have experience using qdbm?  It looks like the
> library is well maintained and there is plenty of documentation.  I'm
> curious whether it is reliable and whether it scales.
>
> As far as the indexing goes, I've recently become infatuated with
> Skiplists.  In some informal testing, my very crude skiplist
> implementation seemed to be a nice balance between a hash and an array
> for general lookups.  I've also thought about trying out a B-Tree
> implementation for the indexing, with the added possibility of
> storing/accessing the indexes from disk as opposed to loading them
> completely into memory before use.  I think Logan mentioned that he had
> some thoughts in that direction.
>
> I would definitely be interested in seeing your work with journaling.  I
> was thinking of possibly using journaling in conjunction with having the
> entire table in memory.  Updates and deletes would be made directly to
> the table in memory as well as written to a journal file, so that, when
> it came time to write the table back out to disk, I would only have to
> update the table's disk file with the contents of the journal file as
> opposed to having to write out the memory-based table.  Like I said,
> this is just brainstorming.  It may not even make sense to try to keep
> the entire table in memory.
>
> Jamey
>