Hi y'all,

This may end up being regarded as an incendiary posting, but it's not
meant to be. This is just an observation from a relative Ruby (in
general) and Rails (in particular) newb.

So I'm beavering away at my lovely little start-up desk and really
rather enjoying Ruby (in between the moments of utter frustration :)
and I start coding up some ETL processes to load and merge masses of
data into my bouncing baby web-system. And all is relatively good
until I get to my first tricky merge process where I have to
disambiguate names and otherwise harmonize my various data sources.

The process takes over 12 hours to run using ActiveRecord to provide
my DB access. For 5500 records.

I tweak DB indices. I get out CachedModel. I read a lot of code and
eat heaps of metaprogrammed object spaghetti. I run benchmarks. And I
finally conclude that my access patterns are totally defeating the
metaprogramming and requiring excessive DB traffic, even though I
can't really prove it.
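To make the shape of the problem concrete, here's a toy sketch (no
real DB, all names invented for illustration - this is not my actual
code): one lookup query per source record, so the round-trips scale
with the size of the input.

```ruby
# Stand-in for a DB table that counts how many "queries" we issue.
# Mimics the one-SELECT-per-record pattern of Model.find_by_name(...).
class FakeTable
  attr_reader :query_count

  def initialize(rows)
    @rows = rows
    @query_count = 0
  end

  # Each call is one simulated round-trip to the database.
  def find_by_name(name)
    @query_count += 1
    @rows.find { |r| r[:name] == name }
  end
end

people = FakeTable.new([{ name: "Ada" }, { name: "Bob" }])
source = ["Ada", "Bob", "Ada", "Carol"] * 100  # 400 source records

# One query per record: 400 round-trips for 400 rows.
matches = source.map { |n| people.find_by_name(n) }
puts people.query_count  # => 400
```

Swap the fake for ActiveRecord and multiply each round-trip by real
network and ORM overhead, and hours disappear fast.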

So eventually I rewrote the program using a different language (which
I don't mention to avoid starting a flame-war - I could have used Ruby
on top of the MySQL interfaces) with a cache strategy that is better
suited to the DB access pattern.

The new process takes 55 *seconds*.
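For the curious, the cache strategy amounts to something like this
(again a toy sketch with invented identifiers, not the real program):
fetch everything once, index it in a Hash, and do all the merging in
memory, so lookups cost nothing after the initial load.

```ruby
# One bulk fetch up front (imagine a single SELECT * here), then
# index the rows by the merge key.
rows = [{ id: 1, name: "Ada" }, { id: 2, name: "Bob" }]
by_name = rows.each_with_object({}) { |r, h| h[r[:name]] = r }

source = ["Ada", "Bob", "Ada", "Carol"] * 100  # 400 source records

# Zero further round-trips: every lookup is an in-memory Hash hit
# (or miss, for names needing disambiguation).
matched, unmatched = source.partition { |n| by_name.key?(n) }
puts matched.size    # => 300
puts unmatched.size  # => 100
```

Same data, same merge logic - the only change is that the access
pattern now matches the storage: one scan, then O(1) lookups.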

The moral of the story: there isn't one, really. If I were 100% sure
that I wouldn't need to re-run the data (either because of
undiscovered bugs or b0rk3n data from the vendor) the 12-hour run
would have been an efficient use of my time, probably. But performance
does matter when it has an impact on the amount of time I have to
spend waiting for critical-path processes. If there is a moral, it is
simply: know your tools. And that community excitement is no
substitute for good documentation.

Anyway, I am still happy with both Ruby and Rails. But this was a
lovely opportunity to re-learn a lesson I've learned too many times
before.

david rush
--
http://cyber-rush.org/drr -- a very messy web^Wconstruction site