On Wed, 7 Sep 2005, ruby / danb64.com wrote:

> What I'm trying to accomplish is this: I am processing a large
> number of items (almost 100,000 rows) of data and trying to find
> the duplicate items. [\n inserted by hgs]

If you wish to remove duplicate items, read about uniq ...

> I create an MD5 hash based on all of the
> elements within each row. Then I check to see if the MD5 value
> already exists in the list, and if it does, I know the item is a

Which is basically a search. You may find in practice that for a
largish number of items, it is quicker to search a Hash or a Set
than an array: Array#include? has to scan the elements one by one,
while a Hash or Set lookup takes roughly constant time.

> duplicate. If not, then I add it to the list. Very few of them
[...]
> searching and a lot of inserting going on. For that reason, it
> won't be acceptable to re-sort it each time I insert an item.

And Hashes don't get sorted, so they are more suited to this.

>
> Thanks in advance,

Other speed tips I've been gathering are at:
http://www.eng.cse.dmu.ac.uk/~hgs/ruby/performance/
You'll see I was trying to solve a similar problem....

>
> Dan
>

Hugh
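For illustration, here is a minimal sketch of the MD5-per-row approach Dan describes, with the "list" swapped for a Set as suggested above. The method name find_duplicates, the sample rows and the "\x1F" field separator are all hypothetical; this is one way to do it, not Dan's actual code.

```ruby
require 'digest/md5'
require 'set'

# Hypothetical helper: returns every row whose contents were already seen.
# Assumes each row is an array of strings.
def find_duplicates(rows)
  seen = Set.new # MD5 digests of rows encountered so far
  duplicates = []
  rows.each do |row|
    # Join with an assumed separator ("\x1F") so ["ab", "c"] and ["a", "bc"]
    # give different digests; pick something that cannot occur in your data.
    key = Digest::MD5.hexdigest(row.join("\x1F"))
    # Set#add? returns nil when the key is already present, i.e. a duplicate.
    duplicates << row unless seen.add?(key)
  end
  duplicates
end

rows = [
  ["1", "apple", "red"],
  ["2", "banana", "yellow"],
  ["1", "apple", "red"]
]

p find_duplicates(rows) # prints [["1", "apple", "red"]]
p rows.uniq.size        # prints 2
```

If you only need the de-duplicated rows, rows.uniq does the whole job. A plain Hash (seen[key] = true, checked with seen.key?(key)) works just as well as the Set. Since Ruby arrays hash by content, you could also store the rows themselves in the Set and skip MD5 entirely. The digest mainly keeps each stored key small when rows are wide.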