I think I'm going to end up answering my own question here. I tried a Bloom-filter-style approach and various grep schemes, none of which scaled well to large data sets. So far my best solution has been to use Ferret and index on trigrams instead of the unigrams I was using before. That sped up my search by ~100x. I'm open to other ideas if anyone has them, but for now this should be fast enough.
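
For anyone wondering why trigrams help, here is a rough sketch of the idea in plain Ruby. It assumes the goal is case-insensitive substring matching, and it does not use Ferret's API at all. `trigrams` and `TrigramIndex` are hypothetical names made up for this illustration. Each document is indexed under every 3-character window it contains, so a query only has to verify the documents that share all of its trigrams instead of scanning everything.

```ruby
require 'set'

# Hypothetical helper: split a string into its distinct, overlapping
# 3-character windows (lowercased so matching is case-insensitive).
def trigrams(text)
  s = text.downcase
  return [] if s.length < 3
  (0..s.length - 3).map { |i| s[i, 3] }.uniq
end

# Toy in-memory inverted index: trigram => Set of document ids.
# This is an illustration only, not how Ferret stores its index.
class TrigramIndex
  def initialize
    @postings = Hash.new { |h, k| h[k] = Set.new }
    @docs = {}
  end

  def add(id, text)
    @docs[id] = text
    trigrams(text).each { |g| @postings[g] << id }
  end

  # Returns the sorted ids of documents that contain `query` as a substring.
  def search(query)
    grams = trigrams(query)
    candidates =
      if grams.empty?
        # Queries under 3 characters have no trigrams: fall back to a full scan.
        @docs.keys
      else
        # Intersect the posting sets; fetch avoids creating empty entries
        # for trigrams that were never indexed.
        grams.map { |g| @postings.fetch(g, Set.new) }.reduce(:&).to_a
      end
    # Sharing every trigram does not guarantee a match, so verify each candidate.
    needle = query.downcase
    candidates.select { |id| @docs[id].downcase.include?(needle) }.sort
  end
end

index = TrigramIndex.new
index.add(1, "full text search with Ferret")
index.add(2, "bloom filter experiments")
index.add(3, "grep over large data sets")
p index.search("filter")
```

The final verification pass matters because a document can contain all of a query's trigrams without containing the query itself. The trade-off is a larger index, since every document is stored under many more keys than with whole-word terms.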