Issue #13241 has been updated by Dan0042 (Daniel DeLorme).


I had a go at this, and a naive implementation is quite simple. The only real issue is where to store the list of Unicode properties.

```ruby
class String
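  def unicode_properties(*categs)
    # Build the category => { property => regexp } table once and cache it.
    @@props ||= Hash.new.tap do |hash|
      categ = nil
      # Downloaded from https://raw.githubusercontent.com/k-takata/Onigmo/master/doc/UnicodeProps.txt
      txt = File.read(File.expand_path('../UnicodeProps.txt', __FILE__))
      # Lines starting with "* " name a category; lines indented by four spaces name a property.
      txt.scan(/^\* (\S+)|^    (\S.*)/) do |c, prop|
        hash[categ = c.to_sym] = {} if c
        # Skip property names that this Ruby's regexp engine rejects.
        hash[categ][prop.to_sym] = /\p{#{prop}}/ rescue next if prop
      end
    end
    # With no arguments, check every category except the Age=... properties.
    categs = @@props.keys - [:DerivedAges] if categs.empty?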
    result = []
    categs.each do |categ|
      @@props[categ]&.each do |prop,rx|
        result << prop if self =~ rx
      end
    end
    result
  end
end

"".unicode_properties #=> [:Alpha, :Graph, :Lower, :Print, :Word, :Alnum, :Any, :Assigned, :L, :LC, :Ll, :Latin, :Alphabetic, :Cased, :Changes_When_Casefolded, :Changes_When_Casemapped, :Changes_When_Titlecased, :Changes_When_Uppercased, :Grapheme_Base, :ID_Continue, :ID_Start, :Lowercase, :XID_Continue, :XID_Start, :CWCF, :CWCM, :CWT, :CWU, :Gr_Base, :IDC, :IDS, :XIDC, :XIDS, :Latn, :In_Latin_Extended_A]

"".unicode_properties(:DerivedAges) #=> [:"Age=1.1", :"Age=10.0", :"Age=2.0", :"Age=2.1", :"Age=3.0", :"Age=3.1", :"Age=3.2", :"Age=4.0", :"Age=4.1", :"Age=5.0", :"Age=5.1", :"Age=5.2", :"Age=6.0", :"Age=6.1", :"Age=6.2", :"Age=6.3", :"Age=7.0", :"Age=8.0", :"Age=9.0"]

"あ".unicode_properties #=> [:Alpha, :Graph, :Print, :Word, :Alnum, :Any, :Assigned, :L, :Lo, :Hiragana, :Alphabetic, :Grapheme_Base, :ID_Continue, :ID_Start, :XID_Continue, :XID_Start, :Gr_Base, :IDC, :IDS, :XIDC, :XIDS, :Hira, :In_Hiragana]
```
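
The `rescue next` in the loop above presumably exists because the property list can contain names that the running Ruby does not accept. As a minimal sketch of that one step in isolation (the helper name `property_regexp` is made up for illustration and is not part of the code above), a lookup that returns `nil` for unknown names could look like this:

```ruby
# Hypothetical helper, for illustration only.
# Returns a Regexp matching the named Unicode property,
# or nil if this Ruby's regexp engine does not know the name.
def property_regexp(name)
  Regexp.new("\\p{#{name}}")
rescue RegexpError
  nil
end

hiragana = property_regexp("Hiragana")       # a usable Regexp
unknown  = property_regexp("NoSuchProperty") # nil, because the name is rejected
```

Rescuing `RegexpError` explicitly, rather than using a bare `rescue` modifier, keeps unrelated errors visible.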

----------------------------------------
Feature #13241: Method(s) to access Unicode properties for characters/strings
https://bugs.ruby-lang.org/issues/13241#change-80499

* Author: duerst (Martin Dürst)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------
[This is currently an exploratory proposal.]

Onigmo allows Unicode properties in regular expressions. This makes it possible, for example, to check whether a string contains any Hiragana:

```ruby
"ABC あ DEF" =~ /\p{hiragana}/
```
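
As a small illustration of what property matching already supports today, the following sketch applies the same `\p{Hiragana}` pattern with a few core `String` methods. It assumes nothing beyond core Ruby (`String#match?` needs Ruby 2.4 or later); the variable names are only for the example.

```ruby
text = "ABC あ DEF"

# Index of the first Hiragana character, or nil if there is none.
first_index = text =~ /\p{Hiragana}/

# All Hiragana characters found in the string.
hiragana_chars = text.scan(/\p{Hiragana}/)

# Plain boolean test that does not set $~.
has_hiragana = text.match?(/\p{Hiragana}/)
```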

However, it is currently impossible to ask for e.g. the script of a character. I propose to add a method (or some methods) to String to be able to get such properties. Various (to some extent conflicting) examples:

```ruby
"Aあア".script #=> :latin # returns script of first character only

"Aあア".script #=> [:latin, :hiragana, :katakana] # returns array of property values

"Aあア".property(:script) #=> :latin # returns specified property of first character only

"Aあア".property(:script) #=> [:latin, :hiragana, :katakana] # returns array of specified properties' values

"Aあア".properties([:script, :general_category]) #=> [[:latin, :Lu], [:hiragana, :Lo], [:katakana, :Lo]]
                        # returns arrays of property values, one array per character
```
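
To make the array-returning variant concrete, here is a rough sketch of how it could be approximated today with existing regexp properties. This is not the proposed implementation: the module name `ScriptSketch`, the method name `scripts`, and the hand-picked list of four scripts are all assumptions for illustration, and a real version would need the complete script list.

```ruby
# Hypothetical module, for illustration only; not part of Ruby.
module ScriptSketch
  # A small, hand-picked subset of scripts. A real implementation
  # would need the full list from the Unicode data.
  SCRIPTS = {
    latin:    /\p{Latin}/,
    hiragana: /\p{Hiragana}/,
    katakana: /\p{Katakana}/,
    han:      /\p{Han}/,
  }.freeze

  # Returns one script symbol per character,
  # or nil for characters matching none of the listed scripts.
  def self.scripts(str)
    str.each_char.map do |ch|
      found = SCRIPTS.find { |_name, rx| rx.match?(ch) }
      found && found.first
    end
  end
end

ScriptSketch.scripts("Aあア") #=> [:latin, :hiragana, :katakana]
```

Trying each script's regexp in turn is linear in the number of scripts per character, which is one reason direct access to the property data would be preferable.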

The interface is still in flux, comments welcome!

Implementation depends on #13240.


In Python, such functionality is available in the standard library, although with quite limited property coverage and not directly on strings (see https://docs.python.org/3/library/unicodedata.html).


