Igal Koshevoy wrote:
> Janus Bor wrote:
>> No, I didn't try it and it might actually work: Every sequence has a 
>> size of ~1kb, so 50 000 sequences would probably be around 50mb. But 
>> getting all this data will take hours, so I need to implement a system 
>> that will not lose all data if the program is terminated abnormally.
>>   
> Here are some simple alternatives for persisting and retrieving your 
> data in the order I'd recommend them based on what you've described so far:
> 
> 1. PStore standard library: Put your objects into a magical hash, that's 
> automatically persisted to a file. Probably the quickest and easiest 
> solution. See 
> http://www.ruby-doc.org/stdlib/libdoc/pstore/rdoc/classes/PStore.html

PStore rewrites the whole file on every commit; it does not append 
incrementally. With ~50mb of data, that's not really what the OP is 
looking for, IMO.
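
A minimal sketch of what I mean, using only the standard library. The file 
name, the keys and the fake "ACGT" data are made up for illustration. Each 
committed transaction marshals the entire table again, so the cost of every 
save grows with the total amount of data already stored:

```ruby
require 'pstore'
require 'tmpdir'

# Hypothetical file name and keys, for illustration only.
path  = File.join(Dir.tmpdir, "sequences_demo_#{Process.pid}.pstore")
store = PStore.new(path)

sizes = []
3.times do |i|
  store.transaction do
    store["seq#{i}"] = "ACGT" * 250   # stand-in for one ~1kb sequence
  end
  # The whole table was written out on commit, not just the new entry.
  sizes << File.size(path)
end

# Read everything back in a read-only transaction.
values = store.transaction(true) { (0...3).map { |i| store["seq#{i}"] } }

File.delete(path)
```

Committing once per sequence is safe against crashes, but with 50 000 
sequences the last commits would each rewrite roughly the whole 50mb.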

> 2. Lightweight SQL database: Maybe store sequences in SQLite as BLOBs. 
> Probably the best long-term solution, but will require you to work 
> harder to transform data to and from storage. See 
> http://sqlite-ruby.rubyforge.org/

Not clear that would be better than files. Maybe so, if the individual 
strings are short. Would be interesting to get some benchmarks on this 
question.

> 3. Marshal core class: Dump objects to and from strings, and then 
> files. Useful if you need something more than PStore, but still want to 
> persist objects directly. See http://ruby-doc.org/core/classes/Marshal.html

PStore uses Marshal, so it's odd to say that Marshal is more than PStore.
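
That said, plain Marshal can be used incrementally, which PStore cannot. A 
rough sketch, assuming an append-only log file (the path and the record 
layout of [id, sequence] pairs are my own invention here): each record is 
dumped to the end of the file, and on restart the records are loaded back 
one at a time.

```ruby
require 'tmpdir'

# Hypothetical log file, for illustration only.
path = File.join(Dir.tmpdir, "sequences_demo_#{Process.pid}.marshal")

# Append one record per fetched sequence and flush right away, so an
# abnormal exit should cost at most the record being written.
File.open(path, "ab") do |f|
  3.times do |i|
    Marshal.dump(["seq#{i}", "ACGT" * 250], f)
    f.flush
  end
end

# On restart, read the records back one by one.
recovered = {}
File.open(path, "rb") do |f|
  until f.eof?
    begin
      id, seq = Marshal.load(f)
      recovered[id] = seq
    rescue EOFError, ArgumentError
      break   # truncated final record: keep what was read so far
    end
  end
end

File.delete(path)
```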

If you're looking for a way to manage marshalled (or string or yaml...) 
data in multiple files, using file paths as db keys, look no further than:

http://raa.ruby-lang.org/project/fsdb/

I think the Set/Hash + many files option is best here, though.
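
Roughly what I have in mind, as a sketch rather than a finished program. 
The fetch_missing method, the directory and the ids are hypothetical, and 
the "ACGT" string stands in for the slow download. Each sequence goes into 
its own file named after its id, and the Set of finished ids is rebuilt 
from the directory listing on startup, so a restart only fetches what is 
missing:

```ruby
require 'set'
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: fetch every wanted id that has no file yet.
# Returns the number of sequences actually fetched.
def fetch_missing(dir, wanted)
  # Rebuild the set of finished ids from the file names,
  # skipping ".", ".." and hidden temp files.
  done = Set.new(Dir.entries(dir).reject { |n| n.start_with?(".") })
  fetched = 0
  wanted.each do |id|
    next if done.include?(id)
    data = "ACGT" * 250                      # stand-in for the slow fetch
    tmp  = File.join(dir, ".#{id}.tmp")
    File.open(tmp, "wb") { |f| f.write(data) }
    # Rename only after the write finished, so a half-written
    # file is never mistaken for a complete sequence.
    File.rename(tmp, File.join(dir, id))
    done << id
    fetched += 1
  end
  fetched
end

dir = Dir.mktmpdir("sequences_demo")
first_run  = fetch_missing(dir, %w[seq0 seq1])        # first, interrupted run
second_run = fetch_missing(dir, %w[seq0 seq1 seq2])   # restart with full list
files = Dir.entries(dir).reject { |n| n.start_with?(".") }.sort
FileUtils.remove_entry(dir)
```

The second call skips seq0 and seq1 because their files already exist.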

-- 
       vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407