I am trying to find similar files to reduce redundancy in a large project. The 'Text' gem works well for this, but even short strings take a long time, and large strings, such as 20k HTML files, take an enormous amount of time.

My script looks like this:

    require 'rubygems'
    require 'text'

    a = file_one
    b = file_two

    puts Text::Levenshtein.distance(a, b)

It would be nice to be able to short-circuit the comparison once the distance crosses a maximum value, but that isn't possible. It would be even better to be able to compare long strings the way PHP's similar_text does, which gives a nice percentage output.

I have to do a lot of comparisons, about 40 million. Is there something already written?
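
To make the short-circuit idea concrete, here is a rough pure-Ruby sketch of the behaviour I have in mind. The names `bounded_levenshtein` and `similarity_percent` are hypothetical helpers of my own and are not part of the 'text' gem. The percentage is derived from the edit distance, so it is not the same algorithm as PHP's similar_text and will not give the same numbers.

```ruby
# Hypothetical helper (not from the 'text' gem): Levenshtein distance that
# returns nil as soon as the result is guaranteed to exceed max.
def bounded_levenshtein(a, b, max)
  a = a.codepoints
  b = b.codepoints

  # The difference in length is a lower bound on the edit distance.
  return nil if (a.length - b.length).abs > max

  prev = (0..b.length).to_a
  a.each_with_index do |ca, i|
    curr = [i + 1]
    row_min = curr[0]
    b.each_with_index do |cb, j|
      cost = ca == cb ? 0 : 1
      value = [prev[j] + cost, prev[j + 1] + 1, curr[j] + 1].min
      curr << value
      row_min = value if value < row_min
    end
    # The smallest value in a row never decreases from one row to the next,
    # so once it exceeds max the final distance must exceed max as well.
    return nil if row_min > max
    prev = curr
  end

  prev.last <= max ? prev.last : nil
end

# Hypothetical percentage score based on edit distance: 100.0 means identical.
# Returns nil when the pair was abandoned for exceeding max.
def similarity_percent(a, b, max)
  longest = [a.length, b.length].max
  return 100.0 if longest.zero?

  distance = bounded_levenshtein(a, b, max)
  distance && (100.0 * (longest - distance) / longest)
end

puts bounded_levenshtein("kitten", "sitting", 5).inspect
puts similarity_percent("abcd", "abcf", 2).inspect
```

The early exit only saves time for pairs that are clearly different. Two similar 20k files would still fill the whole table, so this alone probably does not make 40 million comparisons practical in pure Ruby.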