Dan Fitzpatrick wrote:
> I am trying to build an indexing structure on some phrases. Most phrases
> will have 2 - 5 parts (words). The resulting array will be dumped into
> an index to find the matching phrases. I don't want to do wildcard
> searching on the resulting array to find the phrase.
>
> I would like to turn "This is some text" into
>
> ["This",
>  "This is",
>  "This is some",
>  "This is some text",
>  "is",
>  "is some",
>  "is some text",
>  "some",
>  "some text",
>  "text"]
>
> The order of the resulting array doesn't matter. When someone searches
> for "is some" or "some text", I want it to find this phrase. I don't
> want a search for "is text" to find this phrase though.
>
> My solution so far can find all but the middle elements. In this case,
> "is some". But when the original phrase has more parts, then more middle
> parts are not added to the array.
>
> text = "This is some text"
> #=> "This is some text"
> ws = ''; text.split(/\W/).collect{|w| ws = (ws+' '+w).strip; ws}
> #=> ["This", "This is", "This is some", "This is some text"]
> ws = ''; text.split(/\W/).reverse.collect{|w| ws = (w+' '+ws).strip; ws}
> #=> ["text", "some text", "is some text", "This is some text"]
> text.split(/\W/).collect{|w| w}
> #=> ["This", "is", "some", "text"]
>
> Is there a better Ruby way to do this? Or is there a better data
> structure for retrieving a word or an exact phrase within a
> phrase/sentence without wild-carding the search.
>
> Thanks,
>
> Dan

I think what you want is a suffix tree.

http://en.wikipedia.org/wiki/Suffix_tree
http://www.google.com/search?q=suffix+tree&ie=UTF-8&oe=UTF-8

Luca
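
For phrases this short, simply listing every contiguous run of words may also be good enough: a phrase of n words has n(n+1)/2 such runs, so at most 15 for a 5-word phrase. Below is a minimal sketch of that approach, not a suffix tree. The names subphrases and index are made up for illustration, and it assumes words are separated by non-word characters, as in the original split(/\W/).

```ruby
# Hypothetical helper: return every contiguous run of words in a phrase.
def subphrases(text)
  # Split on runs of non-word characters and drop any empty leading piece.
  words = text.split(/\W+/).reject(&:empty?)
  # For each start position, take every run that begins there.
  (0...words.size).flat_map do |start|
    (start...words.size).map { |stop| words[start..stop].join(' ') }
  end
end

# Hypothetical index: map each sub-phrase to the phrases that contain it.
index = Hash.new { |hash, key| hash[key] = [] }
["This is some text"].each do |phrase|
  subphrases(phrase).each { |sub| index[sub] << phrase }
end

p subphrases("This is some text")
p index.fetch("is some", [])   # exact lookup, no wildcard search
p index.fetch("is text", [])   # words not adjacent, so nothing is found
```

The lookups use fetch with a default so that searching for a missing key does not add an empty entry to the hash. A suffix tree over words gives the same exact-phrase lookups with less duplication once phrases get long.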