Issue #14975 has been updated by shyouhei (Shyouhei Urabe).


ioquatix (Samuel Williams) wrote:
> > Changing String#<< and String#concat is not acceptable.
> 
> Since the current behaviour is underspecified, I feel like we should at least consider it as an option.
> 
> Practically speaking, I don't expect the encoding to change on the receiver. It's surprising, unexpected behaviour.
> 
> I actually think that the arguments should be converted into the encoding of the receiver and appended, not changing the receivers encoding. In the case the conversion fails, it should throw an exception.

Though perhaps not documented at the String#<<'s place, this is not how our M17N is designed in principle.  There is no automatic encoding conversion.  Programmers should be aware of the fact that encodings do exist, and should treat them by their own.  If you want to change this you need a concrete reason why, not just because it _seems_ handy.

That said, you are not requesting to "convert" the string contents but just ignore their encodings.  This is possible I think, depending on operations.  So my advice here is: don't expand your battle front.  Stick to the idea to push the feature you want, that is, appending arbitrary string to a binary.

----------------------------------------
Feature #14975: String#append without changing receiver's encoding
https://bugs.ruby-lang.org/issues/14975#change-73459

* Author: ioquatix (Samuel Williams)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------
I'm not sure where this fits in, but in order to avoid garbage and superfluous function calls, is it possible that `String#<<`, `String#concat` or the (proposed) `String#append` can avoid changing the encoding of the receiver?

Right now it's very tricky to do this in a way that doesn't require extra allocations. Here is what I do:

```ruby
class Buffer < String
	BINARY = Encoding::BINARY
	
	def initialize
		super
		
		force_encoding(BINARY)
	end
	
	def << string
		if string.encoding == BINARY
			super(string)
		else
			super(string.b) # Requires extra allocation.
		end
		
		return self
	end
	
	alias concat <<
end
```

When the receiver is binary, but contains byte sequences, appending UTF_8 can fail:

```
"Foobar".b << "Fbar"
=> "FoobarFbar"

> "Fbar".b << "Fbar"
Encoding::CompatibilityError: incompatible character encodings: ASCII-8BIT and UTF-8
```

So, it's not possible to append data, generally, and then call `force_encoding(Encoding::BINARY)`. One must ensure the string is binary before appending it.

It would be nice if there was a solution which didn't require additional allocations/copies/linear scans for what should basically be a `memcpy`.

See also: https://bugs.ruby-lang.org/issues/14033 and https://bugs.ruby-lang.org/issues/13626#note-3

There are two options to fix this:

1/ Don't change receiver encoding in any case.
2/ Apply 1, but only when receiver is using `Encoding::BINARY`




-- 
https://bugs.ruby-lang.org/

Unsubscribe: <mailto:ruby-core-request / ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>