James Dinkel wrote:
> I need to store some information with my ruby program and I am not sure
> on what would be the best method.  I'm mostly concerned about what would
> be the most efficient use of cpu resources.
> 
> Basically, I will have a list of names each belonging to one of 5
> categories.  Sort of like this:
> 
> Cat1
> -name1
> -name2
> -name3
> -etc...
> 
> Cat2
> -name4
> -name5
> -name6
> -etc...
> 
> Cat3
> -name7
> -name8
> -name9
> -etc...
> 
> There will be hundreds of names, evenly divided between the categories.
> But each name will go in only one category, there is no relation between
> categories or anything like that.  All the information will be
> completely rewritten once a day and then read several times throughout
> the day.
> 
> My choices for storage are an sqlite database (using ActiveRecord), a
> flat text file of my own design, a YAML file, or an XML file.

IMHO, databases are at their best when data is modified regularly by
concurrent clients and you want constraints enforced during those
concurrent writes.

In your case, the data is mostly static and constraints are easily
handled outside the storage layer (you overwrite all the data with
another consistent version in one pass). I'd advise using the simplest
storage method, which is probably a YAML dump of an object holding all
this data.
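
To make the YAML suggestion concrete, here is a minimal sketch. It
assumes the data fits in a plain Hash mapping each category to an Array
of names; the sample names and the file path are made up for
illustration.

```ruby
require "yaml"
require "tmpdir"

# Hypothetical sample data: category name => list of names.
categories = {
  "Cat1" => ["name1", "name2", "name3"],
  "Cat2" => ["name4", "name5", "name6"],
  "Cat3" => ["name7", "name8", "name9"],
}

# The path is an assumption; put the file wherever suits your program.
yaml_path = File.join(Dir.tmpdir, "categories.yml")

# Daily rewrite: serialize the whole Hash in one pass.
File.write(yaml_path, categories.to_yaml)

# Reads: load the whole structure back into memory.
loaded = YAML.load_file(yaml_path)
puts loaded["Cat2"].inspect
```

With only a few hundred names, the whole structure can stay in memory
between reads, so the file is parsed only when it needs to be.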

Marshal.dump/load is an option too. It may be faster than YAML, if that
matters to you (I haven't benchmarked it, so you'd better measure it
yourself if you need fast reads/writes). Its output is not
human-readable, which can be a drawback when debugging.
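
A rough sketch of the Marshal variant, with a small timing comparison
you can adapt. The sample data, the file path and the iteration count
are assumptions, and the timings will depend on your Ruby version and
data, so no numbers are claimed here.

```ruby
require "yaml"
require "benchmark"
require "tmpdir"

# Hypothetical sample data: category name => list of names.
categories = {
  "Cat1" => ["name1", "name2", "name3"],
  "Cat2" => ["name4", "name5", "name6"],
}

# The path is an assumption for illustration.
marshal_path = File.join(Dir.tmpdir, "categories.bin")

# Marshal produces binary data, so use the binary read/write helpers.
File.binwrite(marshal_path, Marshal.dump(categories))
restored = Marshal.load(File.binread(marshal_path))

# Compare load times on in-memory strings (arbitrary iteration count).
marshal_blob = Marshal.dump(categories)
yaml_text = categories.to_yaml
marshal_time = Benchmark.realtime { 1_000.times { Marshal.load(marshal_blob) } }
yaml_time = Benchmark.realtime { 1_000.times { YAML.load(yaml_text) } }
puts format("Marshal: %.4fs  YAML: %.4fs", marshal_time, yaml_time)
```

Only call Marshal.load on files your own program wrote, since loading
untrusted Marshal data is unsafe.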

That was the code/integration complexity side of your problem.

As for the performance side of the problem:

If you dump your data to a temporary file and then rename it over the
final destination, you can use a neat hack for long-running processes
that need fresh data: design a little cache that checks the mtime of
the backing store (the final destination) on read accesses and reloads
the data when the mtime changes.

mtime checks are cheap and simple to code, and if the need arises for
really high throughput you can reduce their number with some TTL logic:
only check the mtime again once a given delay has elapsed since the
last check.
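
The write-then-rename step and the mtime-checking cache could be
sketched as follows. The names write_atomically and MtimeCache, the
ttl parameter and the file path are all hypothetical, and the sketch
assumes YAML as the storage format and a temporary file on the same
filesystem as the destination (so that the rename is atomic on POSIX
systems).

```ruby
require "yaml"
require "tmpdir"

# Writer side: dump to a temporary file, then rename it over the final
# destination so readers never see a half-written file.
def write_atomically(path, data)
  tmp = "#{path}.tmp.#{Process.pid}"
  File.write(tmp, data.to_yaml)
  File.rename(tmp, path)
end

# Reader side: reloads the file only when its mtime has changed.
# With ttl > 0, the mtime is checked at most once every ttl seconds.
class MtimeCache
  def initialize(path, ttl: 0)
    @path = path
    @ttl = ttl
    @data = nil
    @mtime = nil
    @checked_at = nil
  end

  def data
    now = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    if @checked_at.nil? || now - @checked_at >= @ttl
      @checked_at = now
      mtime = File.mtime(@path)
      if mtime != @mtime
        @data = YAML.load_file(@path)
        @mtime = mtime
      end
    end
    @data
  end
end

# Usage: the path is an assumption for illustration.
store = File.join(Dir.tmpdir, "categories_cache_#{Process.pid}.yml")
write_atomically(store, { "Cat1" => ["name1", "name2"] })
cache = MtimeCache.new(store, ttl: 0)
puts cache.data["Cat1"].inspect
```

With ttl: 0 every read costs one stat call; a larger ttl trades a
little staleness for fewer checks.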

Lionel