Issue #13016 has been updated by Shyouhei Urabe.


Martin Dürst wrote:
> Shyouhei Urabe wrote:
> > I noticed that I can't purge `NKF.nkf '-Z4'`.  It can neither be rewritten using String#tr, String#encode, nor String#unicode_normalize.
> 
> Can you give (a pointer to) a detailed description of what NKF, and in particular NKF.nkf -Z4, does exactly? For example, I can't find it at http://blog.layer8.sh/ja/2012/03/31/nkf_command_option/.

It seems there are very few resources online describing this feature.

- I learned about it from the output of "nkf --help", which says "4: JISX0208 Katakana to JISX0201 Katakana".
- A few minutes of googling showed that it has been there since at least 2009: https://osdn.net/projects/nkf/news/17482 (Japanese).
- This seems to be the commit that implemented the feature in nkf: https://github.com/nurse/nkf/commit/958de30bc09aef38f2a44b5da0dbb1bb3c79e7d3
- and it was then copied into our repository in this commit: https://github.com/ruby/ruby/commit/086e5b1a63d77bf5a4ebe10396a430d544fbe505

So, in short, it converts characters into what Unicode calls their "Halfwidth" forms.
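A minimal sketch of why the built-in methods fall short, assuming Ruby 2.2 or later (for String#unicode_normalize) and a UTF-8 source file. A voiced Katakana letter such as U+30AC (ガ) has no single halfwidth counterpart; JIS X 0201 Katakana can only spell it as two characters, U+FF76 (ｶ) followed by the halfwidth voiced sound mark U+FF9E (ﾞ), which is presumably what -Z4 emits. String#tr is one-to-one, so it drops the mark, and NFKC normalization runs in the opposite direction. The constant names are illustrative only.

```ruby
# Illustrative constants (not part of NKF or Ruby): one fullwidth Katakana
# letter and the two-character halfwidth spelling of it.
FULLWIDTH_GA = "\u30AC"        # KATAKANA LETTER GA
HALFWIDTH_GA = "\uFF76\uFF9E"  # HALFWIDTH KATAKANA LETTER KA + VOICED SOUND MARK

# String#tr maps one character to one character: surplus characters in the
# replacement set are ignored, so the voiced sound mark is lost.
via_tr = FULLWIDTH_GA.tr(FULLWIDTH_GA, HALFWIDTH_GA)
p via_tr == HALFWIDTH_GA  # => false (only "\uFF76" survives)

# NFKC folds the halfwidth pair back into one fullwidth letter, i.e. it
# goes fullwidth-ward and cannot produce halfwidth output.
p HALFWIDTH_GA.unicode_normalize(:nfkc) == FULLWIDTH_GA  # => true
```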

> Please note that String#unicode_normalize, as currently implemented, also uses some huge regular expressions (though program-generated). And also has (hopefully) successfully been debugged, although with the help of testing data from Unicode.

Thank you.  That still sounds like a hassle to me.  The proposed functionality would make it a lot easier for me to emulate NKF's Z4.

----------------------------------------
Feature #13016: String#gsub(hash)
https://bugs.ruby-lang.org/issues/13016#change-61931

* Author: Shyouhei Urabe
* Status: Open
* Priority: Normal
* Assignee: 
----------------------------------------
Background: I wanted to drop my script's dependency on NKF.  While doing so, I noticed that I can't purge `NKF.nkf '-Z4'`.  It can be rewritten using neither String#tr, String#encode, nor String#unicode_normalize.  It is theoretically doable using String#gsub, but that requires a hand-crafted, nontrivial regular expression that matches exactly what Z4 is expected to convert.  Such a regular expression is almost impossible to write, and definitely not something one can debug.

Proposal: extend String#gsub so that it also accepts a hash as its only argument, specifying the input-to-output mapping.

```ruby
# now
def convert str
  require 'nkf'
  NKF.nkf '-Z4xm0', str
end

# proposed
def convert str
  map = {  "\u3002" => "\uFF61", "\u300C" => "\uFF62", ... }
  str.gsub map
end
```
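For comparison, a similar effect can be sketched today with the existing two-argument form, String#gsub(pattern, hash), which replaces each match with the hash value for the matched text. This sketch assumes the mapping is a hash of literal strings; `HALFWIDTH_MAP`, `HALFWIDTH_PATTERN` and `to_halfwidth` are hypothetical names, and the table holds only the two entries quoted above plus one voiced letter, not the full Z4 table.

```ruby
# Hypothetical, deliberately incomplete subset of a Z4-style table.
HALFWIDTH_MAP = {
  "\u3002" => "\uFF61",        # IDEOGRAPHIC FULL STOP -> halfwidth
  "\u300C" => "\uFF62",        # LEFT CORNER BRACKET -> halfwidth
  "\u30AC" => "\uFF76\uFF9E",  # KATAKANA GA -> halfwidth KA + voiced sound mark
}.freeze

# Build one alternation from the keys. Regexp.union escapes each literal;
# sorting longest-first keeps a longer key from being shadowed by a
# shorter key that happens to be its prefix.
HALFWIDTH_PATTERN = Regexp.union(HALFWIDTH_MAP.keys.sort_by { |k| -k.length })

# Hypothetical helper: every match is a key of the hash, so gsub's hash
# form always finds a replacement; text outside the table is left alone.
def to_halfwidth(str)
  str.gsub(HALFWIDTH_PATTERN, HALFWIDTH_MAP)
end

p to_halfwidth("\u300C\u30AC\u3002") == "\uFF62\uFF76\uFF9E\uFF61"  # => true
```

Here the regular expression is generated from the table rather than written by hand; as far as this sketch goes, the proposed `str.gsub map` would in effect fold the Regexp.union step into String#gsub itself.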



-- 
https://bugs.ruby-lang.org/
