2007/7/27, David Rush <kumoyuki / gmail.com>:
> So I'm beavering away at my lovely little start-up desk and really
> rather enjoying Ruby (in between the moments of utter frustration :)
> and I start coding up some ETL processes to load and merge masses of
> data into my bouncing baby web-system. And all is relatively good
> until I get to my first tricky merge process where I have to
> disambiguate names and otherwise harmonize my various data sources.
>
> The process takes over 12 hours to run using ActiveRecord to provide
> my DB access. For 5500 records.
>
> I tweak DB indices. I get out CachedModel. I read a lot of code and
> eat heaps of metaprogrammed object spaghetti. I run benchmarks. And I
> finally conclude that my access patterns are totally defeating the
> metaprogramming and requiring excessive DB traffic - even though I
> can't really prove it.
>
> So eventually I rewrote the program using a different language (which
> I don't mention to avoid starting a flame-war - I could have used Ruby
> on top of the MySQL interfaces) with a cache strategy that is better
> suited to the DB access pattern.
>
> The new process takes 55 *seconds*.

Makes me wonder: why did you choose a different language? As you said
yourself, you could have implemented the same strategy in Ruby as well.

Also, it seems for 5500 records you do not need a caching strategy - you
could just slurp in all the stuff into mem, do your transformations and
write it back. It seems with this approach even AR would have provided
sufficient performance, wouldn't it?

Kind regards

robert
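
For illustration, the "slurp everything into memory, transform, write back" idea could look roughly like the plain-Ruby sketch below. Everything in it is an assumption made for the example: the Record struct, the normalize_name rule, and the sample rows are hypothetical and do not come from the original program, and the actual load and write-back steps (through AR or the MySQL interface) are only indicated in comments.

```ruby
# Hypothetical record shape; in the real process these rows would be
# loaded from the database in a single query instead of built by hand.
Record = Struct.new(:id, :name, :source)

# Hypothetical harmonization rule: trim, collapse whitespace, downcase.
def normalize_name(name)
  name.strip.gsub(/\s+/, " ").downcase
end

# Groups all records by normalized name entirely in memory and merges
# each group into one row, so no per-record database lookups are needed.
def merge_records(records)
  records.group_by { |r| normalize_name(r.name) }.map do |key, group|
    { name: key, ids: group.map(&:id), sources: group.map(&:source).uniq }
  end
end

# Step 1: slurp (stand-in data for the single bulk read).
records = [
  Record.new(1, "Ann  Lee", "crm"),
  Record.new(2, "ann lee ", "billing"),
  Record.new(3, "Bob Ray", "crm"),
]

# Step 2: transform in memory.
merged = merge_records(records)

# Step 3: write back (here just printed; a real run would do one bulk write).
merged.each { |row| puts row.inspect }
```

With only a few thousand records, the whole data set fits comfortably in memory, so the database is touched once for the read and once for the write rather than once per record.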