Hi --

On Mon, 11 Jul 2005, Daniel Brockman wrote:

> "David A. Black" <dblack / wobblini.net> writes:
>
>> String#chop chops off the rightmost character:
>>
>>    irb(main):001:0> "abc".chop
>>    => "ab"
>
> Except if the string ends with a CRLF pair:
>
>   "abc\r\n".chop  #=> "abc"
>
>> You may be thinking of "chomp", which is a specialized "chop"
>> operating only on newline characters.
>
> If you read the docstrings, you get the impression that String#chop
> is more-or-less deprecated in favor of the ``safer'' String#chomp:
>
>   +String#chomp+ is often a safer alternative, as it leaves
>   the string unchanged if it doesn't end in a record separator.

I believe that means safer in the sense that if you're going through,
say, lines in a file, and for some reason there's no \n at the end of
the last line, you won't accidentally cut off a non-\n character.

In the general case, #chop can't be deprecated in favor of #chomp,
because #chomp doesn't offer the same functionality (chopping off the
last character).
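
As a small illustration of that difference (the sample strings and
constant names here are made up for the example), the two methods
agree only when the string actually ends in a newline:

```ruby
# Sample inputs: newline-terminated, CRLF-terminated, and unterminated.
SAMPLES = ["abc\n", "abc\r\n", "abc"]

# chop always removes the last character (a trailing "\r\n" goes as a unit).
CHOPPED = SAMPLES.map { |s| s.chop }

# chomp removes a trailing record separator only if one is present.
CHOMPED = SAMPLES.map { |s| s.chomp }

SAMPLES.each_with_index do |s, i|
  puts "#{s.inspect}: chop=#{CHOPPED[i].inspect} chomp=#{CHOMPED[i].inspect}"
end
```

Only the unterminated "abc" shows the two diverging: chop cuts off the
"c", while chomp leaves the string alone.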

>> So the idea of lchop would be to serve as a left-hand equivalent
>> of chop.
>
> So I suppose if the string starts with a CRLF pair, String#lchop would
> chop off two characters from the left?

That's a good question.  One could argue that the only reason they are
treated together in the first place is that they represent the more
abstract concept "newline" -- and that where they aren't representing
that concept, they should be treated separately.  Or one could go for
the complete symmetry approach.  I guess I'd tend to favor the former
notion, since the idea of left-end/right-end is already irreducibly
asymmetrical in a left-to-right writing system.  (Though then there's
the matter of what would happen given a right-to-left writing system,
etc.)
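
To make the two options concrete, here is a rough sketch.  Both
helpers are hypothetical (Ruby's String has no lchop, and these names
are invented for the example); they are written as plain methods
rather than additions to String:

```ruby
# Hypothetical helpers for illustration; not part of Ruby's String class.

# Asymmetric reading: always remove exactly one character from the left.
def lchop_single(str)
  str[1..-1] || ""          # str[1..-1] is nil for an empty string
end

# Symmetric reading: mirror chop and treat a leading "\r\n" as one unit.
def lchop_symmetric(str)
  if str.start_with?("\r\n")
    str[2..-1]
  else
    str[1..-1] || ""
  end
end

puts lchop_single("\r\nabc").inspect
puts lchop_symmetric("\r\nabc").inspect
```

The two only differ when the string starts with "\r\n": the first
leaves a stray "\n" at the front, the second removes the pair.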

> Why not go all the way and let all string methods treat CRLF pairs as
> single characters?

See above -- there's no magic association between those two
characters, just the historical fact of their serving the newline
role, and the practical need to acknowledge that role.  I don't think
there would be any advantage to, say, having String#count combine
them, etc. (though of course there would have been an advantage to
global agreement several decades ago on how to represent newline on
various platforms :-)
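
For instance (the sample text and constant names are just for
illustration), String#count and String#length see "\r" and "\n" as two
independent characters, and anyone who wants to count the pairs can
ask for that explicitly:

```ruby
# Two CRLF-terminated lines.
TEXT = "one\r\ntwo\r\n"

# length counts "\r" and "\n" separately.
TOTAL_CHARS = TEXT.length

# count treats its argument as a set of characters, so this counts
# every "\r" and every "\n" individually.
CR_OR_LF = TEXT.count("\r\n")

# scan matches the two-character sequence, so this counts the pairs.
CRLF_PAIRS = TEXT.scan("\r\n").length

puts "chars=#{TOTAL_CHARS} cr_or_lf=#{CR_OR_LF} pairs=#{CRLF_PAIRS}"
```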

> I think it's a problem that strings are the only way to go for raw
> byte arrays in Ruby, yet
>
>  * strings lack a few random useful array methods
>
>  * the string methods are not binary safe.

String, like Hash, raises interesting questions about the relation
between itself, Array, and Enumerable.  It's interesting that
String#to_a breaks the string into lines as opposed to characters or
bytes.  That's certainly a behavior one would not expect if the
"arrayness" of strings resided strictly in their status as ordered
collections of bytes.  On the other hand, they are ordered collections
of characters or bytes :-)  I still find myself expecting String#each
to go bytewise.  But the fact that these different objects don't map
exactly on to each other is, I think, one of the points of having a
higher and separate abstraction like Enumerable.  It decouples them,
while still not making it impossible to assimilate them to each other
when necessary.
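
A short sketch of the three possible "arraynesses", using the explicit
iterators (each_line, each_char, each_byte) rather than String#to_a or
String#each, which later Ruby versions (1.9 and up) no longer provide.
The sample strings and constant names are invented for the example:

```ruby
SAMPLE = "ab\ncd\n"

# Linewise: the view that String#to_a takes in the discussion above.
BY_LINE = SAMPLE.each_line.to_a

# Characterwise and bytewise views of the same string.
BY_CHAR = SAMPLE.each_char.to_a
BY_BYTE = SAMPLE.each_byte.to_a

# For non-ASCII text, characters and bytes stop lining up one-to-one.
# "\u00e9" is a single character encoded as two bytes in UTF-8.
ACCENTED = "\u00e9"
ACCENTED_CHARS = ACCENTED.each_char.to_a.length
ACCENTED_BYTES = ACCENTED.each_byte.to_a.length

puts BY_LINE.inspect
puts BY_CHAR.inspect
puts BY_BYTE.inspect
```

Each call hands back something Enumerable, so the choice of unit is
made once, up front, and everything after that works the same way.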


David

-- 
David A. Black
dblack / wobblini.net