Dan Fitzpatrick wrote:

> I am trying to build an indexing structure on some phrases. Most phrases
> will have 2 - 5 parts (words). The resulting array will be dumped into
> an index to find the matching phrases. I don't want to do wildcard
> searching on the resulting array to find the phrase.
> 
> I would like to turn "This is some text" into
> 
> ["This",
> "This is",
> "This is some",
> "This is some text",
> "is",
> "is some",
> "is some text",
> "some",
> "some text",
> "text"]
> 
> The order of the resulting array doesn't matter. When someone searches
> for "is some" or "some text", I want it to find this phrase. I don't
> want a search for "is text" to find this phrase though.
> 
> My solution so far can find all but the middle elements. In this case,
> "is some". But when the original phrase has more parts, then more middle
> parts are not added to the array.
> 
> text = "This is some text"
> #=> "This is some text"
> ws = ''; text.split(/\W/).collect{|w| ws = (ws+' '+w).strip; ws}
> #=> ["This", "This is", "This is some", "This is some text"]
> ws = ''; text.split(/\W/).reverse.collect{|w| ws = (w+' '+ws).strip; ws}
> #=> ["text", "some text", "is some text", "This is some text"]
> text.split(/\W/).collect{|w| w}
> #=> ["This", "is", "some", "text"]
> 
> Is there a better Ruby way to do this? Or is there a better data
> structure for retrieving a word or an exact phrase within a
> phrase/sentence without wild-carding the search?
> 
> Thanks,
> 
> Dan

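I think what you want is a suffix tree, built over words rather than
characters. If you index every suffix of the phrase ("This is some text",
"is some text", "some text", "text"), then any contiguous run of words is
a prefix of one of those suffixes. So "is some" matches and "is text"
does not.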
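
Here is a rough sketch of that idea, assuming words are separated by
whitespace and matching is case-sensitive. It is a plain trie of word
suffixes built from nested hashes, not a true compressed suffix tree, and
PhraseIndex, add and search are names I made up for illustration, not
anything from a library.

```ruby
# Hypothetical word-level suffix trie; names are illustrative only.
class PhraseIndex
  def initialize
    @root = new_node
  end

  # Insert every word-suffix of the phrase, tagging each node on the
  # path with the phrase it came from.
  def add(phrase)
    words = phrase.split
    words.each_index do |start|
      node = @root
      words[start..-1].each do |word|
        node = (node[:children][word] ||= new_node)
        node[:phrases] << phrase unless node[:phrases].include?(phrase)
      end
    end
    self
  end

  # Walk the trie one query word at a time; a missing child means no
  # indexed phrase contains the query as a contiguous run of words.
  def search(query)
    node = @root
    query.split.each do |word|
      node = node[:children][word]
      return [] if node.nil?
    end
    node[:phrases].dup
  end

  private

  def new_node
    { :children => {}, :phrases => [] }
  end
end

index = PhraseIndex.new
index.add("This is some text")
index.search("is some")  #=> ["This is some text"]
index.search("is text")  #=> []
```

The lookup cost depends on the number of words in the query, not on how
many phrases are indexed, which is the main attraction over scanning an
array.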

http://en.wikipedia.org/wiki/Suffix_tree
http://www.google.com/search?q=suffix+tree&ie=UTF-8&oe=UTF-8
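
If you would rather keep the flat array from your post, the middle
elements fall out once you loop over every start position and every end
position. A minimal sketch, again assuming whitespace-separated words
(word_runs is just a name I picked):

```ruby
# Return every contiguous run of words in the text, as strings.
def word_runs(text)
  words = text.split
  (0...words.size).flat_map do |start|
    (start...words.size).map { |stop| words[start..stop].join(' ') }
  end
end

word_runs("This is some text")
#=> ["This", "This is", "This is some", "This is some text",
#    "is", "is some", "is some text", "some", "some text", "text"]
```

A phrase of n words yields n * (n + 1) / 2 entries, so for 2 to 5 words
that is at most 15 per phrase.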

Luca