Issue #13241 has been updated by Yukihiro Matsumoto.
I am neutral about the proposal, but the method names are too generic. They should be prefixed with `unicode_`, for example.
Matz.
----------------------------------------
Feature #13241: Method(s) to access Unicode properties for characters/strings
https://bugs.ruby-lang.org/issues/13241#change-63090
* Author: Martin Dürst
* Status: Open
* Priority: Normal
* Assignee:
* Target version:
----------------------------------------
[This is currently an exploratory proposal.]
Onigmo allows Unicode properties in regular expressions. With this, it is possible, for example, to check whether a string contains any Hiragana:
```
"ABC あ DEF" =~ /\p{hiragana}/
```
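As a small illustration of what the regular-expression route already offers, here is a sketch using only existing String and Regexp methods. The variable names are chosen for this example only.

```ruby
text = "ABC あ DEF"

# =~ returns the character index of the first match, or nil if there is none.
index = text =~ /\p{Hiragana}/

# match? answers the yes/no question without setting $~.
has_katakana = text.match?(/\p{Katakana}/)

# scan collects every character that carries the property.
hiragana_chars = text.scan(/\p{Hiragana}/)

puts index                  # 4
puts has_katakana           # false
puts hiragana_chars.inspect # ["あ"]
```

Note that each of these checks tests for one property that has to be named in advance; none of them reports which script a given character belongs to.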
However, it is currently impossible to ask directly for, e.g., the script of a character. I propose adding a method (or several methods) to String for retrieving such properties. Here are various (to some extent conflicting) examples:
```
"Aあア".script            # => :latin  (script of the first character only)
"Aあア".script            # => [:latin, :hiragana, :katakana]  (array of property values)
"Aあア".property(:script) # => :latin  (specified property of the first character only)
"Aあア".property(:script) # => [:latin, :hiragana, :katakana]  (array of the specified property's values)
"Aあア".properties([:script, :general_category])
# => [[:latin, :Lu], [:hiragana, :Lo], [:katakana, :Lo]]  (arrays of property values, one array per character)
```
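For comparison, the array-returning variant can be roughly approximated today by testing each character against a hand-picked list of script patterns. This is only a sketch of a workaround, not the proposed implementation: the helper name `unicode_scripts` and the constant `SCRIPT_PATTERNS` are hypothetical, and the list covers just four scripts.

```ruby
# Hypothetical workaround, NOT part of Ruby or of this proposal:
# a hand-maintained table of script symbols and matching patterns.
SCRIPT_PATTERNS = {
  latin:    /\p{Latin}/,
  hiragana: /\p{Hiragana}/,
  katakana: /\p{Katakana}/,
  han:      /\p{Han}/
}.freeze

# Returns one entry per character: the first script in the table whose
# pattern matches, or nil if none of the listed scripts matches.
def unicode_scripts(string)
  string.each_char.map do |char|
    match = SCRIPT_PATTERNS.find { |_script, pattern| pattern.match?(char) }
    match&.first
  end
end

p unicode_scripts("Aあア") # [:latin, :hiragana, :katakana]
p unicode_scripts("漢 ")    # [:han, nil]
```

The table has to enumerate every script of interest by hand, and each character may be tested against several patterns, which is the kind of overhead a built-in lookup would avoid.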
The interface is still in flux, comments welcome!
Implementation depends on #13240.
In Python, such functionality (though quite limited in property coverage, and not available directly on strings) is provided by the standard library (see https://docs.python.org/3/library/unicodedata.html).