On Feb 9, 2006, at 7:48, chrisjroos / gmail.com wrote:

> Since then it has been running for a further 12 hours trying to use
> that index to obtain likely matches for the same 3000 items; i.e. for
> each of the 3000 items I am trying to get the best matches from the
> index (using find related).
>
> Should I even bother waiting for it to finish or should I be
> investigating something else to achieve similar results?

Can't comment on the time it takes, but the data you're using doesn't seem particularly suited to LSI, in my opinion (and this sort of thing is my occupation these days).

LSI's not magic - what it's doing is taking advantage of the statistical properties of language. So it needs two things to work well: a relatively large set of words compared to the number of items, and the items should be (more or less) standard language.

Obviously I don't know exactly what the product names are, but as a class, product names don't strike me as fitting those constraints very well. Firstly because I expect them to be fairly short (5-6 words, tops?), and secondly because they lack a lot of the syntax and semantic relations that you'd find in a sentence (nominals don't have very much internal structure, in general).

Other approaches that might be promising are standard word/document search (like ferret, already mentioned), or a language model approach, which works using relative frequencies of words. In the power tool domain, for instance, "grit" might correlate highly with "sander", and so you could say that anything with "grit" in it is related to sanding.

That said, I'm not aware of any Ruby libraries which implement this sort of thing, so if you wanted to stick with Ruby, you'd be doing it yourself (it's not a particularly sophisticated approach, though, so it likely wouldn't be that hard).

matthew smillie.

----
Matthew Smillie <M.B.Smillie / sms.ed.ac.uk>
Institute for Communicating and Collaborative Systems
University of Edinburgh
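
For illustration only, here is a rough sketch of the relative-frequency idea from the message above (the "grit"/"sander" example), in plain Ruby. It is one simple way to read "relative frequencies of words", not an established library API: the sample product names and the method names tokenize, cooccurrence_counts and relatedness are all made up for this example, and the score is just an estimate of how often one word appears in names that contain another.

```ruby
require 'set'

# Hypothetical sample data; the real product names are not in this thread.
PRODUCT_NAMES = [
  "orbital sander 80 grit",
  "belt sander 120 grit",
  "sanding sheets 80 grit",
  "cordless drill 18v",
  "hammer drill 18v"
].freeze

# Split a name into a set of lowercase word tokens.
def tokenize(name)
  name.downcase.scan(/[a-z0-9]+/).to_set
end

# Count how many names contain each word, and how many contain
# each unordered pair of words (pairs are stored in sorted order).
def cooccurrence_counts(names)
  word_counts = Hash.new(0)
  pair_counts = Hash.new(0)
  names.each do |name|
    words = tokenize(name).to_a.sort
    words.each { |w| word_counts[w] += 1 }
    words.combination(2) { |a, b| pair_counts[[a, b]] += 1 }
  end
  [word_counts, pair_counts]
end

# Fraction of the names containing `word` that also contain `other`,
# i.e. a crude estimate of P(other | word). Returns 0.0 for unseen words.
def relatedness(word, other, word_counts, pair_counts)
  return 0.0 if word_counts[word].zero?
  pair_counts[[word, other].sort].to_f / word_counts[word]
end
```

With counts built by cooccurrence_counts(PRODUCT_NAMES), relatedness("sander", "grit", ...) is high because every name with "sander" also has "grit", while relatedness("grit", "drill", ...) is zero. With only 5-6 words per name and a few thousand names the counts will be sparse, so some smoothing or a minimum-count threshold would probably be needed on real data.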