Hello Hugh,
I'd propose modifying your main logic as follows:
require 'benchmark'
include Benchmark
# Declare the tables up front: a variable first assigned inside a block
# is local to that block and would not be visible in the later blocks.
new_table = old_table = nil
puts measure { new_table = TableMaker.new("hugh.csv",
                                          "update_tables.sql") }
puts measure { new_table.update_database() }
puts measure { old_table = TableMaker.new("hugh.csv.old") }
puts measure { new_table.make_cards("cards.out") }
puts measure { new_table.make_cards("new_cards.out",
                                    new_table.diff_students(old_table)) }
and then running it with a reduced test set. That should give you a hint
as to where the time is spent. I have read the code you posted, but cannot
find a performance hog in it. Perhaps you meant to say 'huge.csv' instead
of 'hugh.csv'? How many students are there? How many courses? How many
courses per student on average?
Also, I assume you know that fetching the image files over HTTP can
potentially be very slow. To speed that up, you could parallelize the
process by using a queue, a few workers and a stub image that you can
return.
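
To make the queue-and-workers idea a bit more concrete, here is a rough
sketch. It is not taken from your code: fetch_images, STUB_IMAGE and
DEFAULT_FETCHER are names I made up for illustration, and I am assuming
the stub is what you hand back when a download fails. It needs Ruby 2.3
or later for Queue#close.

```ruby
require 'net/http'
require 'uri'

# Hypothetical placeholder handed back when a download fails;
# in practice this would be the bytes of a real stub image.
STUB_IMAGE = "stub-image".freeze

# Default fetcher: a plain HTTP GET that returns the response body.
DEFAULT_FETCHER = lambda do |url|
  response = Net::HTTP.get_response(URI(url))
  raise "HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)
  response.body
end

# Fetch every URL with a fixed number of worker threads.
# Returns a hash mapping url => image data (or the stub on failure).
def fetch_images(urls, worker_count: 4, stub: STUB_IMAGE, fetcher: DEFAULT_FETCHER)
  jobs = Queue.new
  urls.each { |url| jobs << url }
  jobs.close # once closed and empty, pop returns nil and the workers stop

  results = {}
  lock = Mutex.new

  workers = Array.new(worker_count) do
    Thread.new do
      while (url = jobs.pop)
        data = begin
          fetcher.call(url)
        rescue StandardError
          stub # assumption: fall back to the stub on any fetch error
        end
        lock.synchronize { results[url] = data }
      end
    end
  end

  workers.each(&:join)
  results
end
```

You would call it as fetch_images(list_of_urls, worker_count: 8) and then
build the cards from the returned hash. Since the threads spend most of
their time waiting on the network, a handful of workers should already
help.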
Or you can of course just wait for the machine ;) Too bad Moore's law
doesn't say that you actually get a new machine every 18 months, only
that one is available.
best greetings,
kaspar