On Wed, 7 Sep 2005, ruby / danb64.com wrote:

> What I'm trying to accomplish is this:  I am processing a large
> number of items (almost 100,000 rows) of data and trying to find
> the duplicate items. [\n inserted by hgs]

If you just want to remove the duplicate items, read about Array#uniq ...
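
A minimal sketch of that, assuming each row is already held as an array
of fields (the sample rows below are made up for illustration):

```ruby
# Hypothetical sample rows; the real ones would come from your data source.
rows = [
  ["alice", "42", "london"],
  ["bob",   "17", "leeds"],
  ["alice", "42", "london"],
]

# Array#uniq keeps the first occurrence of each row and drops later
# rows that are equal to it.
unique_rows = rows.uniq
```

This only helps if you want the duplicates gone; it does not tell you
which rows were the duplicates.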

> I create an MD5 hash based on all of the
> elements within each row.  Then I check to see if the MD5 value
> already exists in the list, and if it does, I know the item is a

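Which is basically a search.  Array#include? scans the array element
by element, whereas a Hash or Set lookup takes roughly constant time,
so you may find in practice that for a largish number of items it is
quicker to search a Hash or a Set than an Array.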
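
Something along these lines, as a rough sketch only: the sample rows and
the "\0" field separator are my assumptions, not anything from your
code.

```ruby
require 'set'
require 'digest/md5'

# Hypothetical sample rows standing in for the real data.
rows = [
  ["alice", "42", "london"],
  ["bob",   "17", "leeds"],
  ["alice", "42", "london"],
]

seen       = Set.new  # MD5 digests seen so far
duplicates = []       # rows whose digest was already in the set

rows.each do |row|
  # Join with a separator assumed not to occur in the data, so that
  # ["ab", "c"] and ["a", "bc"] produce different digests.
  digest = Digest::MD5.hexdigest(row.join("\0"))

  # Set#add? returns nil when the element is already present.
  duplicates << row unless seen.add?(digest)
end
```

The check and the insert happen in one call, so each row costs one
hashed lookup rather than a scan of everything seen so far.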

> duplicate.  If not, then I add it to the list.  Very few of them
         [...]
> searching and a lot of inserting going on.  For that reason, it
> won't be acceptable to re-sort it each time I insert an item.

And a Hash doesn't need to be kept sorted: inserting a key never
triggers a re-sort, so it is better suited to this.
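
If you also want to know which earlier row each duplicate matches, a
Hash keyed on the digest can carry that along.  Again just a sketch,
with made-up rows and an assumed "\0" separator:

```ruby
require 'digest/md5'

# Hypothetical sample rows standing in for the real data.
rows = [
  ["alice", "42", "london"],
  ["bob",   "17", "leeds"],
  ["alice", "42", "london"],
]

first_seen = {}  # digest => index of the first row with that digest
dup_pairs  = []  # [index of duplicate, index of the row it repeats]

rows.each_with_index do |row, i|
  digest = Digest::MD5.hexdigest(row.join("\0"))
  if first_seen.key?(digest)
    dup_pairs << [i, first_seen[digest]]
  else
    # Plain insert; nothing has to be re-sorted afterwards.
    first_seen[digest] = i
  end
end
```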
>
> Thanks in advance,

Other speed tips I've been gathering are at:

http://www.eng.cse.dmu.ac.uk/~hgs/ruby/performance/

You'll see I was trying to solve a similar problem...
>
> Dan
>
>
         Hugh