Hi --

On Sat, 7 Jan 2006, James Edward Gray II wrote:

> On Jan 6, 2006, at 6:08 PM, Richard Livsey wrote:
>
>> I want to split a string into words, but group quoted words together such 
>> that...
>> 
>> some words "some quoted text" some more words
>> 
>> would get split up into:
>> 
>> ["some", "words", "some quoted text", "some", "more", "words"]
>> 
>> So far I'm drawing a blank on the 'Ruby way' to do this and the only 
>> solutions I can think of are turning out to be fairly ugly.
>> 
>> Any advice would be great. Thanks in advance.
>
> I agree that CSV is the way to go, but here's a direct attempt:

Me too (end of disclaimer :-)


>>> example = %Q{some words "some quoted text" some more words}
> => "some words \"some quoted text\" some more words"
>>> example.scan(/\s+|\w+|"[^"]*"/).
> ?>         reject { |token| token =~ /^\s+$/ }.
> ?>         map { |token| token.sub(/^"/, "").sub(/"$/, "") }
> => ["some", "words", "some quoted text", "some", "more", "words"]

I think you could do less work:

   example.scan(/"[^"]+"|\S+/).map { |word| word.delete('"') }

(Or am I overlooking some reason you'd want to capture sequences of
spaces?)

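I changed the \w+ to \S+ in case the words include non-\w characters
(apostrophes, hyphens and so on), and moved it after the | so that the
quoted alternative is tried first. Otherwise \S+ would sponge up the
opening quote along with the first word.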
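
To make that reasoning concrete, here is a small sketch. The method name
split_words and the extra sample strings are made up for illustration;
only the regex and the delete call come from the one-liner above.

```ruby
# Hypothetical helper wrapping the one-liner; the name is an assumption.
def split_words(text)
  # Try the quoted alternative first, then fall back to any run of
  # non-whitespace characters. Afterwards strip the double quotes.
  text.scan(/"[^"]+"|\S+/).map { |word| word.delete('"') }
end

example = %Q{some words "some quoted text" some more words}
p split_words(example)

# \S+ keeps words containing non-\w characters in one piece.
p split_words(%q{it's e-mail "to be sent" now})

# With \S+ listed first, it grabs the opening quote plus the first word,
# so the quoted phrase is never matched as a unit.
p %q{some "quoted text" here}.scan(/\S+|"[^"]+"/)
```

One thing to keep in mind: delete('"') strips every double quote, so a
quote sitting inside an unquoted word is removed as well.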

I guess with zero-width positive lookbehind/ahead one could do it
without the map operation.
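
A rough sketch of what that might look like, assuming a regex engine
with lookbehind support (Oniguruma, or Ruby 1.9 and later). The pattern
and the constant name are my own guess at the idea, not something
tested beyond the example string:

```ruby
# Assumption: the engine supports (?<=...) lookbehind.
# QUOTED_OR_BARE is a hypothetical name used only for this sketch.
# First alternative: text sitting between two double quotes, with the
# quotes themselves left out of the match by the lookarounds.
# Second alternative: a bare word containing no quotes or whitespace.
QUOTED_OR_BARE = /(?<=")[^"]+(?=")|[^"\s]+/

example = %Q{some words "some quoted text" some more words}
p example.scan(QUOTED_OR_BARE)
```

This is fragile, though. With more than one quoted phrase in the
string, the text between a closing quote and the next opening quote
also sits between two quotes, so it would come back as if it were
quoted. The scan-and-map version does not have that problem.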


David

-- 
David A. Black
dblack / wobblini.net

"Ruby for Rails", from Manning Publications, coming April 2006!
http://www.manning.com/books/black