Hello --

[This grew out of some peripheral experimenting related to the
various String#each threads, but (I promise! :-) it's not directly on
that topic.]

This is something I've been discussing and investigating on #ruby-lang
with Martin Chase, Holden Glova, and Michael Granger.

According to the docs I've seen, String#split can take either a string
or a regex as the separator/delimiter argument.  However, and very
surprisingly to me, it turns out that if you provide a string:

  str.split(aString) ...

and if aString is longer than one character, then aString is
automatically converted to a regex.  Examples:

  One-char strings, treated as strings:

    irb(main):001:0> "abc.+def".split("e")
    ["abc.+d", "f"]
    irb(main):002:0> "abc.+def".split(".")
    ["abc", "+def"]

  Strings of more than one char, converted to regexes (!):

    irb(main):003:0> "abc.+def".split(".e")
    ["abc.+", "f"]
    irb(main):004:0> "abc.+def".split(".+")
    []
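For comparison, here is a minimal sketch that passes Regexp objects explicitly, so the separator is a pattern by construction rather than by conversion. It deliberately avoids String separators, since it makes no assumption about how a particular Ruby version treats them. It also uses Regexp.escape to turn the same characters back into a literal match.

```ruby
# Minimal sketch: explicit Regexp separators reproduce the ">1 char"
# results above, independent of how String arguments are handled.
s = "abc.+def"

p s.split(/.e/)   # "." matches any char, so /.e/ matches "de"
                  # => ["abc.+", "f"]
p s.split(/.+/)   # /.+/ matches the whole string, leaving only empty fields
                  # => []

# Escaping the separator makes "." and "+" match literally instead.
p s.split(Regexp.new(Regexp.escape(".e")))   # no literal ".e" in s
                                             # => ["abc.+def"]
p s.split(Regexp.new(Regexp.escape(".+")))   # splits on the literal ".+"
                                             # => ["abc", "def"]
```

Escaping with Regexp.escape is one way to get literal splitting regardless of whether a String argument would have been converted to a regex.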

This also means that a multi-character string with no regex special
characters is "splitting on a string" only by coincidence.  It's really
splitting on a regex that happens to give the results one would have
expected from splitting on a string.  Thus, for example:

  irb(main):003:0> "here there and everywhere".split("er")
  ["h", "e th", "e and ev", "ywh", "e"]

is really treating the string arg as a regex, as shown by:

  irb(main):005:0> "here there and everywhere".split(".r")
  ["h", "e th", "e and ev", "ywh", "e"]

producing the same results: "." matches any character, and every "r"
in that sentence happens to follow an "e".
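A quick way to see why the two splits coincide is to list what /.r/ actually matches. The sketch below uses explicit Regexp objects (so it doesn't depend on String-to-regex conversion) and adds a second word, "carrot", chosen here purely for illustration, where "." matches something other than "e" and the results diverge.

```ruby
# Sketch: every match of /.r/ in the sample sentence is "er",
# which is why splitting on /.r/ and /er/ gives the same fields.
str = "here there and everywhere"

p str.scan(/.r/)                       # => ["er", "er", "er", "er"]
p str.split(/.r/) == str.split(/er/)   # => true

# In "carrot", /.r/ first matches "ar", so the two splits differ.
p "carrot".split(/.r/)                 # => ["c", "rot"]
p "carrot".split(/er/)                 # => ["carrot"]
```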

Any insights on why #split does this?  I found it quite surprising
when I discovered it, and I don't know of anywhere it's documented as
working this way.


David

-- 
David Alan Black
home: dblack / candle.superlink.net
work: blackdav / shu.edu
Web:  http://pirate.shu.edu/~blackdav