Issue #13588 has been updated by duerst (Martin Dürst).


haines (Andrew Haines) wrote:
> phluid61 (Matthew Kerwin) wrote:
> > I hope there are no encodings where valid characters might not be a multiple of the minimum size.
> 
> Me too :) it works for now... the only encodings on Ruby 2.4.1 with `min_enc_len` > 1 are UTF-16 and UTF-32; UTF-16 is variable-length with either 1 or 2 16-bit code units, and UTF-32 is fixed-length.

Not true. There are quite a few East Asian encodings with a maximum length of 2, 3, or 4 bytes, e.g. Shift_JIS, EUC_JP, GB18030, ... But it's still true that the maximum size is a multiple of the minimum size.
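
As a rough illustration of the sizes being discussed, the per-character byte length can be observed from plain Ruby by transcoding a single character and checking `bytesize`. The helper name `encoded_size` is hypothetical and only exists for this sketch; it measures individual characters and does not expose the encoding's declared minimum or maximum.

~~~ ruby
# Hypothetical helper: byte length of one character after transcoding.
def encoded_size(char, encoding)
  char.encode(encoding).bytesize
end

ascii    = "a"
hiragana = "\u3042"     # HIRAGANA LETTER A
emoji    = "\u{1F600}"  # outside the Basic Multilingual Plane

encoded_size(ascii, Encoding::UTF_16LE)      # => 2 (one 16-bit code unit)
encoded_size(emoji, Encoding::UTF_16LE)      # => 4 (surrogate pair)
encoded_size(ascii, Encoding::UTF_32LE)      # => 4
encoded_size(emoji, Encoding::UTF_32LE)      # => 4
encoded_size(ascii, Encoding::Shift_JIS)     # => 1
encoded_size(hiragana, Encoding::Shift_JIS)  # => 2
encoded_size(hiragana, Encoding::EUC_JP)     # => 2
~~~

In each case the observed size is a multiple of the smallest size seen for that encoding, which matches the point above.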


----------------------------------------
Feature #13588: Add Encoding#min_char_size, #max_char_size, #minmax_char_size
https://bugs.ruby-lang.org/issues/13588#change-65151

* Author: haines (Andrew Haines)
* Status: Feedback
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------
When implementing an IO-like object, I'd like to handle encoding correctly. To do so, I need to know the minimum and maximum character sizes for the encoding of the stream I'm reading. However, I can't find a way to access this information from Ruby (I ended up writing a gem with a native extension [1] to do so).

I'd like to propose adding instance methods `min_char_size`, `max_char_size`, and `minmax_char_size` to the `Encoding` class to expose the information stored in the `OnigEncodingType` struct's `min_enc_len` and `max_enc_len` fields.

~~~ ruby
Encoding::UTF_8.min_char_size     # => 1
Encoding::UTF_8.max_char_size     # => 6
Encoding::UTF_8.minmax_char_size  # => [1, 6]
~~~

[1] https://github.com/haines/char_size
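
To make the use case more concrete, here is a minimal sketch of how an IO-like object might use the two sizes when reading one character at a time. The method `read_char` and its `min_size` / `max_size` parameters are hypothetical; since the proposed methods do not exist, the caller passes the sizes in by hand. It assumes that no valid character in the encoding is a prefix of a longer valid character, and that every character size is a multiple of the minimum.

~~~ ruby
require "stringio"

# Hypothetical helper: read one character from a binary IO by growing a
# buffer in steps of min_size bytes until it forms a valid character or
# reaches max_size bytes.
def read_char(io, encoding, min_size, max_size)
  buffer = String.new(encoding: Encoding::BINARY)
  while buffer.bytesize < max_size
    chunk = io.read(min_size)
    break if chunk.nil?  # end of stream
    buffer << chunk
    candidate = buffer.dup.force_encoding(encoding)
    return candidate if candidate.valid_encoding?
  end
  # Nothing read: nil. Otherwise return the (invalid) bytes as-is.
  buffer.empty? ? nil : buffer.force_encoding(encoding)
end

io = StringIO.new("a\u{1F600}".encode(Encoding::UTF_16LE).b)
first  = read_char(io, Encoding::UTF_16LE, 2, 4)  # "a", 2 bytes
second = read_char(io, Encoding::UTF_16LE, 2, 4)  # U+1F600, 4 bytes
third  = read_char(io, Encoding::UTF_16LE, 2, 4)  # nil at end of stream
~~~

With the proposed API, the two size arguments could instead come from `encoding.minmax_char_size`.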


