On Wed, 07 Feb 2007 04:05:35 +0900, Jason Frankovitz wrote:
> Giles Bowkett wrote:
>>> First of all Giles and Ken, thanks for your answers. It sounds like a
>>> Bayesian approach won't work for what I want to do. This same gem has
>>> another classifier inside it, called Classifier::LSI which does latent
>>> semantic indexing. I don't know much about it yet other than it's not as
>>> fast or as small as a Bayesian classifier. However, would it be more
>>> suited to supporting a "none of the above" feature?
>>>
>>> Or would you recommend something entirely different?
>>
>> Well, a latent semantic indexer is a whole different thing. I know of
>> a company that built a search engine with latent semantic analysis. If
>> you search it for naked pictures of Britney Spears -- just as a stupid
>> example -- it'll also ask you if you want to hear her music or if
>> you're interested in naked pictures of Lindsay Lohan as well. Latent
>> semantic indexers are a very smart technology but I think they require
>> **extremely** large data sets to be useful. They compare patterns of
>> linkage to identify things which must have some latent semantic
>> connection, that is to say, words that are different but mean similar
>> things. There are very few problems for which latent semantic analysis
>> **isn't** overkill.
>>
>
> Well, within the not-too-distant future, we'll be handling a sizable
> dataset so LSI might make sense after all. This would be for a system
> we're building that's doing something quite cool but I can't shout all
> the details from the rooftops just yet :) Would it be all right for me
> to give you specifics via email? I'd be happy to edit the Ruby-germane
> portions of our offline conversation and post them back onto the forum.
> My email is jason at seethroo dot us.

I suggest learning about machine learning techniques in general before you
try to do *anything* quite cool that you can't shout from the rooftops just
yet.
I recommend "Machine Learning" by Tom Mitchell[1].

--Ken

[1] http://www.cs.cmu.edu/~tom/mlbook.html

-- 
Ken Bloom. PhD candidate. Linguistic Cognition Laboratory.
Department of Computer Science. Illinois Institute of Technology.
http://www.iit.edu/~kbloom1/