Philip Rhoades wrote in post #988264:
>> Your sample code looks like it's handling numeric-style data (although I
>> realise this is just a test case for the problems you're having).
>> Integers in the range -2^30..2^30-1 (or larger on a 64-bit machine)
>> have their values encoded within the reference, so no memory allocation
>> is done.
>
>
> Are you talking about the hash key or the hash values?

Either.

> - the values in
> the real script will all be floats . .

Then they will be allocated on the heap, just like strings. I presume 
you're aware of the inherent inaccuracy of floats (in any language), and 
are OK with this.

>> 1.0/2.0 == 1.0 + 1.0/2.0 - 1.0
=> true
>> 1.0/10.0 == 1.0 + 1.0/10.0 - 1.0
=> false
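
If exact equality bites you, the usual workaround is to compare within a tolerance. A minimal sketch, assuming a relative tolerance of 1e-9 is acceptable for your data (nearly_equal? is a name made up here, not a built-in):

```ruby
# Hypothetical helper: treat two floats as equal when they differ by no
# more than a small tolerance, scaled by the larger magnitude (min 1.0).
def nearly_equal?(a, b, tolerance = 1e-9)
  (a - b).abs <= tolerance * [a.abs, b.abs, 1.0].max
end

x = 1.0 / 10.0
y = 1.0 + 1.0 / 10.0 - 1.0

puts x == y               # => false (exact comparison)
puts nearly_equal?(x, y)  # => true  (tolerance-based comparison)
```

The right tolerance depends on the scale of your values, so treat 1e-9 as a placeholder.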

>> Or, if you're handling a relatively small set of unique values, you
>> could use symbols instead of strings. Each symbol reference again
>> doesn't allocate any memory; it just points to the entry in the symbol
>> table.
>
> Not sure what you mean - example?

a = []
a[0] = :foo
a[1] = :foo
a[2] = :foo

puts a[0].object_id
puts a[1].object_id
puts a[2].object_id
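
All three lines print the same object_id. To see the contrast with strings, here is a small sketch (distinct_objects is a helper name invented for this example) that counts how many different objects sit behind a list of references:

```ruby
# Hypothetical helper: count how many distinct objects a list refers to.
def distinct_objects(list)
  list.map { |item| item.object_id }.uniq.size
end

symbols = Array.new(3) { :foo }
strings = Array.new(3) { String.new("foo") }  # force a fresh string each time

puts distinct_objects(symbols)  # => 1, every :foo is the same object
puts distinct_objects(strings)  # => 3, each string is a separate heap object
```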

>> Or you could use frozen strings and share the references.
>>
>> LABEL1 = "00".freeze
>> LABEL2 = "01".freeze
>> MAP = {LABEL1 => LABEL1, LABEL2 => LABEL2}
>> a = MAP["00"]
>> puts a.object_id
>> puts LABEL1.object_id
>
>
> I ran that code but I don't understand how it helps . .

It uses less memory if you have (say) millions of identical strings. It 
may help garbage collection performance, but not much else.

>> Although that's more work than symbols, it might be useful depending on
>> your use case. For example, you could replace a subset of the values you
>> see with these frozen strings (which covers the majority of the data),
>> whilst still allowing arbitrary other strings.
>
>
> Still not clear - examples?

Suppose the strings "foo" and "bar" comprise 80% of your hash keys or 
values. Then mapping them to the same frozen string means that you only 
have one instance of string "foo" and one instance of string "bar" in 
the system, instead of (say) millions of distinct strings. You can still 
use individual strings for the other 20%.
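
As a rough sketch of that idea (build_shared_table and shared_string are hypothetical helper names, and "foo"/"bar" just stand in for whatever your common values are):

```ruby
# Build a lookup table mapping each common value to one frozen instance.
def build_shared_table(values)
  table = {}
  values.each do |v|
    frozen = v.dup.freeze
    table[frozen] = frozen
  end
  table
end

# Return the shared instance for a common value; any other string is
# passed through untouched.
def shared_string(table, str)
  table.fetch(str, str)
end

table = build_shared_table(["foo", "bar"])

# Simulate input where every value arrives as a freshly allocated string.
input  = %w[foo bar foo baz foo].map { |s| String.new(s) }
values = input.map { |s| shared_string(table, s) }

puts values[0].equal?(values[2])  # => true, both "foo" entries are one object
puts values[3].equal?(input[3])   # => true, "baz" is still its own string
```

Once the originals in input are dropped, only one "foo" and one "bar" remain live, however many times they occurred.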

This is really an edge optimisation, though; you shouldn't need to be 
worrying about these things. If they are significant, then perhaps 
ruby is the wrong language for the problem in hand.

> The other thing that occurred to me was that on my 64-bit machine maybe
> I could run 2-3 threads for inserting into the hash table?

Noooo..... even in ruby 1.9, there is a global interpreter lock, so only 
one thread runs Ruby code at a time. Multiple threads gain you nothing 
really, except for threads which are blocked on I/O.

Even if there were not, having multiple threads contending on the same 
hash (and controlling access via, say, a mutex) would be pretty much 
guaranteed to make performance worse not better.
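
To make the contention concrete, here is a minimal sketch of what multi-threaded insertion into one hash would have to look like (fill_with_threads is a made-up name, and the thread and item counts are arbitrary). Every insert goes through the same mutex, so the inserts still happen one at a time; the threads only add locking overhead.

```ruby
# Hypothetical example: several threads inserting into one shared hash.
def fill_with_threads(thread_count, per_thread)
  hash = {}
  lock = Mutex.new

  threads = (0...thread_count).map do |t|
    Thread.new do
      per_thread.times do |i|
        key = t * per_thread + i
        # Only one thread at a time can be inside this block.
        lock.synchronize { hash[key] = key.to_f }
      end
    end
  end

  threads.each { |th| th.join }
  hash
end

result = fill_with_threads(3, 10_000)
puts result.size  # => 30000
```

If you want numbers for your own machine, time this against a plain single-threaded loop doing the same inserts.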

Regards,

Brian.

-- 
Posted via http://www.ruby-forum.com/.