Hi --

On Mon, 11 Jul 2005, Daniel Brockman wrote:

> "David A. Black" <dblack / wobblini.net> writes:
>
>> String#chop chops off the rightmost character:
>>
>>   irb(main):001:0> "abc".chop
>>   => "ab"
>
> Except if the string ends with a CRLF pair:
>
>    "abc\r\n".chop  #=> "abc"
>
>> You may be thinking of "chomp", which is a specialized "chop"
>> operating only on newline characters.
>
> If you read the docstrings, you get the impression that String#chop
> is more-or-less deprecated in favor of the ``safer'' String#chomp:
>
>    +String#chomp+ is often a safer alternative, as it leaves
>    the string unchanged if it doesn't end in a record separator.

I believe that means safer in the sense that if you're going through,
say, lines in a file, and for some reason there's no \n at the end of
the last line, you won't accidentally cut off a non-\n character.  In
the general case, #chop can't be deprecated in favor of #chomp,
because #chomp doesn't offer the same functionality (chopping off the
last character).

>> So the idea of lchop would be to serve as a left-hand equivalent
>> of chop.
>
> So I suppose if the string starts with a CRLF pair, String#lchop would
> chop off two characters from the left?

That's a good question.  One could argue that the only reason they are
treated together in the first place is that they represent the more
abstract concept "newline" -- and that where they aren't representing
that concept, they should be treated separately.  Or one could go for
the complete symmetry approach.  I guess I'd tend to favor the former
notion, since the idea of left-end/right-end is already irreducibly
asymmetrical in a left-to-right writing system.  (Though then there's
the matter of what would happen given a right-to-left writing system,
etc.)

> Why not go all the way and let all string methods treat CRLF pairs as
> single characters?

See above -- there's no magic association between those two
characters, just the historical fact of their serving the newline
role, and the practical need to acknowledge that role.  I don't think
there would be any advantage to, say, having String#count combine
them, etc. (though of course there would have been an advantage to
global agreement several decades ago on how to represent newline on
various platforms :-)

> I think it's a problem that strings are the only way to go for raw
> byte arrays in Ruby, yet
>
>  * strings lack a few random useful array methods
>
>  * the string methods are not binary safe.

String, like Hash, raises interesting questions about the relation
between itself, Array, and Enumerable.  It's interesting that
String#to_a breaks the string into lines as opposed to characters or
bytes.  That's certainly a behavior one would not expect if the
"arrayness" of strings resided strictly in their status as ordered
collections of bytes.  On the other hand, they are ordered collections
of characters of bytes :-)  I still find myself expecting String#each
to go bytewise.

But the fact that these different objects don't map exactly on to each
other is, I think, one of the points of having a higher and separate
abstraction like Enumerable.  It decouples them, while still not
making it impossible to assimilate them to each other when necessary.


David

--
David A. Black
dblack / wobblini.net
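
For reference, here is a minimal sketch of the chop/chomp behavior quoted above, together with the two possible readings of a left-hand chop. String#lchop is not a core method; the module and method names (LChop, lchop_char, lchop_crlf) are hypothetical and only there to make the two options concrete.

```ruby
# Built-in behavior discussed in the thread.
"abc".chop        # => "ab"
"abc\r\n".chop    # => "abc"  (a trailing CRLF pair is removed as a unit)
"abc".chomp       # => "abc"  (no record separator, so unchanged)
"abc\n".chomp     # => "abc"

# Hypothetical helpers: String#lchop does not exist in core Ruby.
module LChop
  module_function

  # Option 1: CR and LF are treated separately, so exactly one
  # leading character is removed.
  def lchop_char(str)
    str[1..-1] || ""   # the slice is nil for an empty string
  end

  # Option 2: complete symmetry with String#chop, so a leading
  # CRLF pair is removed as a unit.
  def lchop_crlf(str)
    if str[0, 2] == "\r\n"
      str[2..-1]
    else
      str[1..-1] || ""
    end
  end
end

LChop.lchop_char("\r\nabc")   # => "\nabc"
LChop.lchop_crlf("\r\nabc")   # => "abc"
```

The two helpers differ only when the string starts with a CRLF pair, which is the case the question above turns on.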