Issue #14975 has been updated by ioquatix (Samuel Williams).


@jeremyevans0 I agree with you. It's a problem.

Just for completeness, here is the error you are talking about:

```ruby
b = 'a'.force_encoding(Encoding::BINARY)
u = "\u00ff".force_encoding(Encoding::UTF_8)

b << u
b.force_encoding(Encoding::BINARY)

# Encoding::CompatibilityError: incompatible character encodings: UTF-8 and ASCII-8BIT
u << b
```

IMHO, anyone relying on this behaviour is walking on fire. But you are right, there is the potential to break existing code. I believe the correct solution is for people to avoid using binary buffers for this use case. `Encoding::ASCII` already exists and would make more sense. So if we limited the change to `Encoding::BINARY` receivers, it would at least have a specific semantic model. One way to fix the above would be to turn the `Encoding::UTF_8` receiver into `Encoding::BINARY`. I'm not sure I like that solution, but it does work in a predictable way and avoids introducing exceptions where none existed before.
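
As a rough sketch of that last idea, here is what "turn the receiver into `Encoding::BINARY`" could look like when emulated in plain Ruby. The helper name `append_demoting` is made up for illustration; it is not an existing or proposed core method, and the fallback path still pays for a `String#b` copy:

```ruby
# Hypothetical helper (an assumption for illustration, not a core method):
# append +other+ to +receiver+; if the encodings are incompatible, re-tag the
# receiver as Encoding::BINARY and append the raw bytes instead of raising.
def append_demoting(receiver, other)
  receiver << other
rescue Encoding::CompatibilityError
  receiver.force_encoding(Encoding::BINARY)
  receiver << other.b # String#b allocates a binary copy of +other+.
end

u = +"\u00ff"      # UTF-8 receiver containing a non-ASCII character
b = "a\xC3\xBF".b  # binary string containing non-ASCII bytes

append_demoting(u, b)
u.encoding # => Encoding::BINARY, no exception raised
```

With plain `u << b` the same inputs raise `Encoding::CompatibilityError`, so the sketch trades the exception for a silent change of the receiver's encoding.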

Do you think there is a way we can find a compromise? I'd rather not add yet another string concatenation function. I sort of admire Ruby for being opinionated, so I think if we can find a solution here without adding more options/arguments/methods, that would be ideal. WDYT?

----------------------------------------
Feature #14975: String#append without changing receiver's encoding
https://bugs.ruby-lang.org/issues/14975#change-73466

* Author: ioquatix (Samuel Williams)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------
I'm not sure where this fits in, but in order to avoid garbage and superfluous function calls, is it possible that `String#<<`, `String#concat` or the (proposed) `String#append` can avoid changing the encoding of the receiver?
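
To make the behaviour in question concrete, here is a minimal sketch (the variable names are arbitrary) showing `String#<<` re-tagging a binary receiver when the receiver only contains ASCII bytes and the argument is a non-ASCII UTF-8 string:

```ruby
buffer = "header".b      # binary receiver, ASCII-only content
before = buffer.encoding # Encoding::BINARY (ASCII-8BIT)

buffer << "\u00ff"       # append a non-ASCII UTF-8 string
after = buffer.encoding  # the receiver is now tagged Encoding::UTF_8

puts "#{before} -> #{after}"
```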

Right now it's very tricky to do this in a way that doesn't require extra allocations. Here is what I do:

```ruby
class Buffer < String
	BINARY = Encoding::BINARY
	
	def initialize
		super
		
		force_encoding(BINARY)
	end
	
	def << string
		if string.encoding == BINARY
			super(string)
		else
			super(string.b) # Requires extra allocation.
		end
		
		return self
	end
	
	alias concat <<
end
```

When the receiver is binary but already contains non-ASCII byte sequences, appending a UTF-8 string can fail:

```
> "Foobar".b << "Føøbar"
=> "FoobarFøøbar"

> "Føøbar".b << "Føøbar"
Encoding::CompatibilityError: incompatible character encodings: ASCII-8BIT and UTF-8
```

So, in general, it's not possible to append data first and call `force_encoding(Encoding::BINARY)` afterwards. One must ensure the string is binary before appending it.

It would be nice if there was a solution which didn't require additional allocations/copies/linear scans for what should basically be a `memcpy`.
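
For comparison, one workaround that avoids the `String#b` copy is to temporarily re-tag the argument itself. This is only a sketch under stated assumptions: `append_bytes` is a hypothetical helper name, the buffer is assumed to already be `Encoding::BINARY`, and the argument is mutated for the duration of the call, so it is not safe if another thread reads the same string concurrently:

```ruby
# Hypothetical helper (an assumption for illustration, not a core method):
# append the raw bytes of +string+ to a binary +buffer+ without changing the
# buffer's encoding.
def append_bytes(buffer, string)
  original = string.encoding

  if original == Encoding::BINARY
    buffer << string
  elsif string.frozen?
    buffer << string.b # A frozen string cannot be re-tagged; fall back to a copy.
  else
    begin
      string.force_encoding(Encoding::BINARY) # Re-tag in place instead of copying.
      buffer << string
    ensure
      string.force_encoding(original) # Always restore the caller's encoding.
    end
  end

  buffer
end

buffer = "F\xC3\xB8".b      # binary buffer that already holds non-ASCII bytes
text = +"F\u00f8\u00f8bar"  # mutable UTF-8 string

append_bytes(buffer, text)
buffer.encoding # => Encoding::BINARY
text.encoding   # => Encoding::UTF_8 (restored)
```

This still needs two extra `force_encoding` calls per append, which is the kind of superfluous work the proposal is trying to remove.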

See also: https://bugs.ruby-lang.org/issues/14033 and https://bugs.ruby-lang.org/issues/13626#note-3

There are two options to fix this:

1/ Don't change receiver encoding in any case.
2/ Apply 1, but only when receiver is using `Encoding::BINARY`



