Hello -

A friend and I have been working on a Ruby implementation of a Bayesian spam filter, as described in Paul Graham's "A Plan for Spam". It's fully functional, but I've been trying to squeeze more performance out of it, as it's quite slow at the moment (15 minutes to run across 20 megs of email). Using profile and rbprof, I've determined that our tokenizer method is the main source of slowness. After some careful benchmarking, I've narrowed the problem down to this:

    # This is run about a million times
    h = Hash.new(0)
    data.scan(iptok).each do |tok|
      h[tok] += 1
    end

At first, I thought I could do something like this:

    # This is run about a million times
    h = Hash.new(0)
    data.scan(iptok).each do |tok|
      h[tok].succ
    end

But then I realized that h[tok].succ only returns a new Integer; it doesn't modify the value stored in the hash. I did notice, however, that it ran about twice as fast as the former example.

So, my question is: does the assignment in the first example really have that much overhead? If so, is there any way to do the first example using Inline::C or something similar?

Thanks in advance,
Travis
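P.S. Two things I'm planning to try, in case anyone can confirm they're on the right track. First, if I'm reading the docs correctly, String#scan yields each match to a block when one is given, which should avoid building the intermediate array that .each then walks:

    # Untested sketch: same counting logic, but scan yields each
    # token directly instead of returning an array first.
    h = Hash.new(0)
    data.scan(iptok) { |tok| h[tok] += 1 }

Second, here's my rough guess at what an Inline::C version might look like, assuming RubyInline's builder.c passes VALUE arguments through untouched (the class name Tokenizer and method name bump are just placeholders I made up):

    require 'inline'

    class Tokenizer
      inline do |builder|
        builder.c <<-'EOC'
          /* Increment h[tok] in C, treating a missing key as 0. */
          VALUE bump(VALUE h, VALUE tok) {
            VALUE n = rb_hash_aref(h, tok);
            long count = NIL_P(n) ? 0 : FIX2LONG(n);
            rb_hash_aset(h, tok, LONG2FIX(count + 1));
            return h;
          }
        EOC
      end
    end

    # Intended usage -- though I suspect the Ruby-to-C method-call
    # overhead may cancel out whatever the C body saves:
    #   t = Tokenizer.new
    #   data.scan(iptok) { |tok| t.bump(h, tok) }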