Issue #5120 has been updated by Alexey Muranov.


Adam Prescott wrote:
>  
>  str.split("") already gets you the array of "letters" (as does
>  str.chars.to_a), but since you feel that str.split("") should raise an
>  error or have another return value, do you think str.split("") should
>  break existing code which uses split("") to get characters?
>  

Thanks for pointing out str.chars.to_a, but I think it would be more natural to have a single method that does this.
I understand that this would break existing code. I was discussing the issue not from the point of view of maintaining existing code, but from the point of view of "improving" the language, according to what would look like an improvement to me.
As a person new to Ruby, I expressed my "astonishment" at the current behavior of #split, and tried to contribute to POLA (the principle of least astonishment).

>  What's the reasoning behind str.split("") raising an error? I can't
>  see a good reason for it. Equally, I can see no good reason for
>  treating "a".split("") the same in return value as "a".split("a"). In
>  the latter, there is more to be considered because the receiver itself
>  contains "a". Why should "a".split("") return ["", "a", ""]?

I think that #split should treat all separator strings equally, whether empty or not.
Maybe I've missed something (then please point me to the explanation), but I do not see how the treatment of empty and non-empty separators can be particular cases of one general rule.
What is the general rule that gives such different results for empty and non-empty separators?

I think that "a".split("") should return ["", "a", ""], because this would be more logical than "a".split("",-1) returning ["a", ""], as it does now.
I think that in most other cases #split(str) should behave as #split(str,-1) behaves now, because the decision to discard trailing empty elements seems arbitrary.
By analogy with ",".split(",",-1) currently returning ["",""], I think that:
",".split(",") should return ["",""],
"".split("") should return ["",""] (if not forbidden altogether),
",".split("") should return ["", ",", ""] (if not forbidden).

But, as I said, this is only a suggestion to preserve consistency: use the same general rule to split on empty and non-empty strings.
What is the rule now?
It seems that #split on the empty string is treated as a special case, but then it should be a separate method.
The easiest way to be consistent, in my opinion, is to forbid splitting on the empty string and to use a different method to get the array of letters.
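
To make the two options above concrete, here is a minimal sketch. The method names proposed_split and strict_split are hypothetical (neither exists in Ruby), and raising ArgumentError is only one possible way to forbid the empty separator. Both assume the separator is a String other than a single space, which String#split treats specially.

```ruby
# Option 1 (hypothetical): one rule for every separator, keeping all fields.
def proposed_split(str, sep)
  if sep.empty?
    # The empty separator is taken to occur before the first character,
    # between characters, and after the last one.
    [""] + str.chars.to_a + [""]
  else
    # A negative limit keeps trailing empty fields.
    str.split(sep, -1)
  end
end

# Option 2 (hypothetical): forbid the empty separator altogether.
def strict_split(str, sep)
  raise ArgumentError, "empty separator" if sep.empty?
  str.split(sep, -1)
end

p ",".split(",")             # current behavior: []
p proposed_split(",", ",")   # => ["", ""]
p proposed_split("", "")     # => ["", ""]
p proposed_split(",", "")    # => ["", ",", ""]
p proposed_split("a", "")    # => ["", "a", ""]
p strict_split("a,b,", ",")  # => ["a", "b", ""]
p "abc".chars.to_a           # => ["a", "b", "c"], the separate "letters" method
```

With option 2, strict_split("a", "") raises ArgumentError, and str.chars.to_a remains the way to get the letters.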

----------------------------------------
Feature #5120: String#split needs to be logical
http://redmine.ruby-lang.org/issues/5120

Author: Alexey Muranov
Status: Open
Priority: Normal
Assignee: 
Category: 
Target version: 


I would call this a bug, but I am new to Ruby, so I am reporting it as a feature request.

Here are examples showing the surprising and inconsistent behavior of the String#split method:

"aa".split('a')  # => []
"aab".split('a')  # => ["", "", "b"]

"aaa".split('aa')  # => ["", "a"] 
"aaaa".split('aa')  # => []
"aaaaa".split('aa')  # => ["", "", "a"] 

"".split('')  # => []
"a".split('')  # => ["a"]

What is the definition of *split*?
In my opinion, a simple definition should be given that makes it clear what to expect.
For example:

  str1.split(str2) returns a maximal array of non-empty substrings of str1 which can be concatenated with copies of str2 to form str1.

Further details can be added to this definition to make clear what to expect as the result of "baaab".split("aa").
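
As an illustration only, one possible reading of this definition can be sketched on top of the existing method. The name nonempty_split is hypothetical, the separator is assumed to be a non-empty String, and matching from the left is my assumption for resolving cases like "baaab".split("aa"):

```ruby
# Hypothetical helper: split on a non-empty separator and keep only the
# non-empty pieces, scanning for the separator from left to right.
def nonempty_split(str, sep)
  raise ArgumentError, "separator must be non-empty" if sep.empty?
  # The negative limit keeps every field; empty ones are then dropped.
  str.split(sep, -1).reject(&:empty?)
end

p nonempty_split("aa", "a")      # => []
p nonempty_split("aab", "a")     # => ["b"]
p nonempty_split("aaaaa", "aa")  # => ["a"]
p nonempty_split("baaab", "aa")  # => ["b", "ab"]
```

The last call shows why a further rule is needed: ["ba", "b"] would satisfy the definition equally well, and the sketch only picks ["b", "ab"] because it matches from the left.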

Thanks for your attention.


-- 
http://redmine.ruby-lang.org