phlip wrote:
> Janus Bor wrote:
> 
>> I'm pretty new to Ruby and programming in general. Here's my problem:
>> 
>> I'm writing a program that will automatically download protein sequences
>> from a server and write them into the corresponding file. Every single
>> sequence has a unique id and I have to eliminate duplicates. However, as
>> the number of sequences might exceed 50 000, I can't simply save all
>> sequences in a hash (with their id as key)
> 
> How do you know that? Did you try it, as an experiment?

No, I didn't try it, and it might actually work: every sequence is 
about 1 KB, so 50 000 sequences would come to roughly 50 MB. But 
downloading all this data will take hours, so I need a system that 
won't lose everything if the program terminates abnormally.
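
To make that concrete, here is a rough sketch of what I have in mind, not a finished design. It assumes the ids are short strings without newlines. The class name SeenIds and the one-id-per-line log file are made up for illustration. Every new id is appended to the log right away, and the log is read back on startup, so an interrupted run can pick up where it stopped.

```ruby
# Hypothetical helper: remembers which sequence ids were already stored,
# backed by a plain-text log with one id per line.
class SeenIds
  def initialize(path)
    @seen = {}
    # Reload ids recorded by a previous (possibly interrupted) run.
    if File.exist?(path)
      File.foreach(path) { |line| @seen[line.chomp] = true }
    end
    @log = File.open(path, 'a')
    @log.sync = true # no Ruby-side buffering: each line goes to the OS at once
  end

  def seen?(id)
    @seen.key?(id.to_s)
  end

  # Returns true if the id was new (and is now logged), false for a duplicate.
  def mark(id)
    return false if seen?(id)
    @seen[id.to_s] = true
    @log.puts(id)
    true
  end

  def close
    @log.close
  end
end
```

The download loop would call mark(id) after writing a sequence to its file and skip any id for which seen?(id) is already true. Note that sync = true only turns off Ruby's own buffering; surviving a power failure would additionally need something like IO#fsync.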

Joel VanderWerf wrote:
> Janus Bor wrote:
>> 
>> Would this method be more efficient? Is there a more elegant way? Also,
>> can Ruby handle arrays/hashes of this size?
> 
> It's not so bad to use true as a hash value. But if it bothers you,
> there is the Set class, which is really a hash underneath, but the
> interface is set-membership rather than associative lookup:
> 
> require 'set'
> 
> s = Set.new
> 
> s << 123
> s << 456
> 
> p s.include?(456) # ==> true
> p s.include?(789) # ==> false

Thanks, that's exactly what I was looking for! I didn't know Set 
basically works like a hash without values...
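
For the duplicate check itself, something like the following should do. It is only a sketch with made-up sample records (the ids and sequences are placeholders, not real data). It relies on Set#add?, which returns nil when the element is already in the set, so it can be used directly as the filter condition.

```ruby
require 'set'

# Hypothetical sample data: [id, sequence] pairs, with one id repeated.
records = [
  ['P1', 'MKTAYIAK'],
  ['P2', 'GSHMLEDP'],
  ['P1', 'MKTAYIAK'],
]

seen = Set.new

# add? inserts the id and returns the set (truthy) the first time,
# and returns nil (falsy) for every later occurrence of the same id.
unique = records.select { |id, _seq| seen.add?(id) }

p unique.map(&:first) # ==> ["P1", "P2"]
```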
-- 
Posted via http://www.ruby-forum.com/.