The problem:  Read a structured file (the details are irrelevant)
and generate a program which, when run, would produce the original
file.  Modify the program, re-run it, and you have a relatively
easy way to change the original structured file.

For various reasons, C++ seemed like the way to go when I first did
this.  It worked, until I was handed structured files that were 4 MB
long and the generated C++ program went on for some 600,000 lines
and g++ crashed trying to compile it.

I needed a scripting language.  I should have realized that a long
time ago.

I tried Ruby.  Ruby had a political advantage:  It wasn't Perl.
The problem with Perl was that it's been around long enough to have
a reputation, deserved or not, for being hard to read.  I don't
mind Perl, but my boss was visibly relieved to hear I was using
something else.

I could have tried Python, but the Ruby book (Programming Ruby)
was shorter.  Ruby had some other, less tangible, advantages
which I won't go into.

If what follows has already been discussed, let me know.  I looked,
but I couldn't find anything.

I wrote a Ruby script generator that worked for small input files.
It worked for large input files, too, but it was slow, very slow.
It took 18 minutes to reproduce a 4 MB structured file.  My
generated C++ program couldn't do it at all, but 18 minutes was
still too long.

When I took a closer look, I realized that most of those 18 minutes
were spent parsing the script, and most of that time was spent in
list_append.

The problem, I finally figured out, was the arrays.  The generated
script had lots of them, and many of them were big, some with more
than 10,000 elements.

At this point, I realized that mine wasn't your typical Ruby
application.  Normal, handwritten programs don't have arrays that
huge.

What to do...  I could give up and try, say, Python, or I could see
if there was a way to tweak Ruby.

I didn't have time to learn Python.

Back to Ruby...

list_append (from 1.6.8) looks like this:

---
static NODE*
list_append(head, tail)
     NODE *head, *tail;
{
     NODE *last;

     if (head == 0) return NEW_LIST(tail);

     last = head;
     while (last->nd_next) {
	last = last->nd_next;
     }

     last->nd_next = NEW_LIST(tail);
     head->nd_alen += 1;
     return head;
}
---

Following nd_next links for each of n array elements takes
O(n**2) time.  A tail pointer would have been useful, but with
the node structure being as compact as it is, there was no room
for one.

Desperate times called for desperate measures.  I don't expect
anyone to like what I did.  I'm not crazy about it myself.

I added another word to the node:

---
typedef struct RNode {
     ...
     union {
	struct RNode *node;
     } u4;
} NODE;

....

#define nd_tail  u4.node

....

static NODE*
list_append(head, tail)
     NODE *head, *tail;
{
     NODE *last;

     if (head == 0) return NEW_LIST(tail);

     if (head->nd_tail) {
	last = head->nd_tail;
     } else {
	last = head;
     }

     head->nd_tail = last->nd_next = NEW_LIST(tail);
     head->nd_alen += 1;
     return head;
}
---

It worked.  The monster script that generated 4 MB of structured
stuff ran in under six minutes instead of 18.  This is almost
acceptable, and it's a lot better than a g++ compilation that
takes 15 minutes to not work at all.

My immediate problem seems to be solved, but I don't want to make
my company dependant on a patched version of Ruby forever, so I'd
like some feedback.

I can see this going one of several ways:

1.  Array parsing should be improved.  Adding a word to the node
structure is crude and unimaginitive, but it may be necessary.

2.  Array parsing should be improved, and there's a better way
to fix it than by bloating the node size by 33%.

3.  Array parsing could be improved, but don't expect it to
happen any time soon.  If you insist on having 10,000-element
arrays, you might consider finding another language.

If the answer is #3, I don't mind.  I can deliver what I've got
and take the time to try something else.  On the other hand,
I'd be a fool to assume anything.

Thanks for taking the time to read this.

Opinions...?

Louis Krupp