On Wed, Dec 11, 2013 at 10:58 AM, Xavier Noria <fxn / hashref.com> wrote:
> Doing this is tricky, the robustness of a regexp approach depends on what
> you can assume about the input. For example, in a programming language
> escaping a quote \" would be valid but unsupported, or in English
> apostrophes could be taken as single quotes.
>
> A regexp solution that is broken in those scenarios but works for the easy
> cases is:
>
>     ("|')((?:(?!\1).)*)\1
>
> The regexp says: if you match either " or ', then continue matching as long
> as you do not find the matched quote, and until you find the closing quote
> (needed because you could reach end of file with an unbalanced quote).
>
> The second group has the string without quotes.

Interesting solution!  I also tried

("|')([^\1]*)\1

which looked fine initially

irb(main):025:0> "foo 'bar' \"baz\" buz".scan(/("|')([^\1]*)\1/).map(&:last)
=> ["bar", "baz"]

but broke later:

irb(main):030:0> "foo 'bar' \"baz\" buz \"bongo's kongo\"".scan(/("|')([^\1]*)\1/)
=> [["'", "bar' \"baz\" buz \"bongo"]]

where your solution still works:

irb(main):031:0> "foo 'bar' \"baz\" buz \"bongo's kongo\"".scan(/("|')((?:(?!\1).)*)\1/)
=> [["'", "bar"], ["\"", "baz"], ["\"", "bongo's kongo"]]
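A likely reason the character class version breaks, sketched below: inside
[...] Ruby does not treat \1 as a backreference but, as far as I know, as the
octal escape for the character "\x01". So [^\1]* means "anything except
\x01", matches greedily, and only stops at the last quote of the same kind.
The sample strings here are made up for illustration.

```ruby
# Inside a character class, \1 is read as the octal escape for "\x01",
# not as a reference to group 1.
pieces = "a\x01b".scan(/[^\1]+/)
p pieces   # => ["a", "b"]  (the string is split at the control character)

# Consequently the class happily matches quotes, and the greedy * runs
# up to the LAST single quote in the input.
greedy = "'a' 'b'".scan(/("|')([^\1]*)\1/)
p greedy   # => [["'", "a' 'b"]]
```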

However, non-greedy matching gives the same result as your lookahead version:

irb(main):032:0> "foo 'bar' \"baz\" buz \"bongo's kongo\"".scan(/("|')(.*?)\1/)
=> [["'", "bar"], ["\"", "baz"], ["\"", "bongo's kongo"]]
irb(main):033:0> "foo 'bar' \"baz\" buz \"bongo's kongo\"".scan(/("|')(.*?)\1/).map(&:last)
=> ["bar", "baz", "bongo's kongo"]
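The non-greedy variant could be wrapped in a small helper, roughly like this.
The method name quoted_strings is hypothetical, and the /m flag is an
assumption added here so that . also matches newlines inside a quoted string;
the irb sessions above do not use it.

```ruby
# Hypothetical helper: returns the contents of all quoted strings.
# /m (an addition, not in the original pattern) lets . match newlines.
def quoted_strings(text)
  text.scan(/("|')(.*?)\1/m).map(&:last)
end

p quoted_strings("foo 'bar' \"baz\" buz \"bongo's kongo\"")
# => ["bar", "baz", "bongo's kongo"]

# An apostrophe with no partner is skipped, because no closing ' follows it.
p quoted_strings("it's \"fine\"")
# => ["fine"]
```

An apostrophe that does have a later partner would still be taken as an
opening quote, which is the English text caveat mentioned in the quoted mail.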

Adding support for backslash escapes, we get ("|')((?:\\.|(?!\1).)*)\1

irb(main):038:0> "foo 'bar' \"baz\" buz \"bongo's kongo\" gingo said \"foo \\\" bar\" yes".scan(/("|')((?:\\.|(?!\1).)*)\1/).map(&:last)
=> ["bar", "baz", "bongo's kongo", "foo \\\" bar"]
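For readability, the same pattern can be spelled out in extended (/x) mode
with one comment per part. This is only a sketch: the names QUOTED and
extract_quoted are hypothetical, and the escaped characters are returned as
they appear in the input, with the backslash still in place.

```ruby
# Same pattern as above, written with the x flag so it can carry comments.
QUOTED = /
  ("|')          # group 1: the opening quote, double or single
  (              # group 2: the string body without the quotes
    (?:
      \\.        # a backslash plus the character it escapes
      |
      (?!\1).    # or any character that is not the opening quote
    )*
  )
  \1             # the closing quote, same kind as the opening one
/x

# Hypothetical helper returning only the string bodies.
def extract_quoted(text)
  text.scan(QUOTED).map(&:last)
end

input = "foo 'bar' \"baz\" buz \"bongo's kongo\" gingo said \"foo \\\" bar\" yes"
p extract_quoted(input)
# => ["bar", "baz", "bongo's kongo", "foo \\\" bar"]
```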

;-)

Kind regards

robert

-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/