Hello Nobu,

Many thanks for looking into this. Your commits make the segmentation 
fault go away, that's great!

I can reproduce the problem here, in a simple one-liner that doesn't 
even rely on the \X / grapheme_clusters logic:
   ./ruby -e 'puts "abc" =~ /\p{Grapheme_Cluster_Break=E_Modifier}/'
Before upgrading to Unicode 11.0.0, this works without errors. After 
upgrading, it produces the following error:
-e:1: invalid character property name {Grapheme_Cluster_Break=E_Modifier}: /\p{Grapheme_Cluster_Break=E_Modifier}/

Looking for all the appearances of "Extend" in 
enc/unicode/1[01].0.0/name2ctype.h, I have found only the following 
differences:

 >>>>>>>>
11c11,12
< #define CR_Grapheme_Cluster_Break_Extend CR_Grapheme_Extend
---
 > static const OnigCodePoint CR_Grapheme_Cluster_Break_Extend[] = {
 > }; /* CR_Grapheme_Cluster_Break_Extend */
29a31,33
 > /* 'In_Georgian_Extended': Block */
 > static const OnigCodePoint CR_In_Georgian_Extended[] = {
 > }; /* CR_In_Georgian_Extended */
90a95
 >   CR_In_Georgian_Extended,
 >>>>>>>>

The Georgian stuff is new; a new block of characters for an already 
existing script is often called _Extended. The first few changes may be 
more relevant. As far as I understand, character property names are 
preprocessed by removing spaces/underscores and lowercasing everything.
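
To illustrate what I mean by that preprocessing, here is a rough sketch in 
Ruby. The method name normalize_property_name is made up for this mail, and 
the real lookup is done in C inside the regexp engine, so this only mirrors 
my understanding described above and is not the actual implementation:

```ruby
# Hypothetical helper: mimics the normalization described above
# (drop spaces and underscores, then lowercase). Not the engine's real code.
def normalize_property_name(name)
  name.delete(" _").downcase
end

puts normalize_property_name("Grapheme_Cluster_Break=E_Modifier")
puts normalize_property_name("In_Georgian_Extended")
```

If the lookup really works like this, then only the normalized form of a 
name has to be present in the generated table for the property to be found.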

The problem seems to be related to the following text at 
http://www.unicode.org/versions/Unicode11.0.0/:
 >>>>>>>>
Segmentation-related Changes
Four Grapheme_Cluster_Break and Word_Break classes have become obsolete 
and are no longer used: E_Base, E_Modifier, Glue_After_Zwj, and 
E_Base_GAZ. Those values are still part of the enumeration of the 
property values, because stability constraints prevent removal of 
enumerated property values, even if obsolete; however, these are no 
longer assigned to any characters, and are no longer referred to 
explicitly by any rules in the algorithms.
 >>>>>>>>

That suggests that we may have a problem with properties that exist but 
don't have any characters assigned.
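
A quick way to check this for all four obsolete values would be something 
like the sketch below. The helper name probe_property is hypothetical; 
Regexp.new and RegexpError are plain Ruby. What it prints depends on the 
Unicode data the ruby binary was built with, so I'm not claiming any 
particular output here:

```ruby
# Hypothetical helper: try to compile a pattern and report the outcome.
# Returns :ok on success, or the RegexpError message on failure.
def probe_property(source)
  Regexp.new(source)
  :ok
rescue RegexpError => e
  e.message
end

# The four values that Unicode 11.0.0 lists as obsolete.
%w[E_Base E_Modifier Glue_After_Zwj E_Base_GAZ].each do |value|
  source = "\\p{Grapheme_Cluster_Break=#{value}}"
  puts "#{value}: #{probe_property(source)}"
end
```

If all four fail in the same way while other Grapheme_Cluster_Break values 
compile, that would support the guess about empty properties.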

Also, the current problem is related to the fact that I'm trying to do 
the update from Unicode 10.0.0 to Unicode 11.0.0 in incremental steps. 
It is possible that some dependencies make that impossible, but I'll try 
to find a way, perhaps by temporarily 'cheating' a bit, to allow 
upgrading in steps.

I'll investigate more, but if you find something, that would also be great.

Regards,   Martin.

On 2018/10/16 18:16, Nobu wrote:
> 
> Hi,
> 
> On 2018/10/16 16:31, Martin J. Dürst wrote:
>> The commit below fails on travis-ci
>> (see https://travis-ci.org/ruby/ruby/jobs/442027489).
>>
>> It produces a Ruby segmentation fault when running the tests in
>> test/lib/minitest/unit.rb. I have tried to look at that file and find
>> any kinds of dependencies to Unicode 11.0.0, but I haven't found any.
>>
>> I have also tried to run that file independently, but
>> ./ruby test/runner.rb test/lib/minitest/unit.rb
>> just says
>> Finished tests in 0.006213s, 0.0000 tests/s, 0.0000 assertions/s.
>> 0 tests, 0 assertions, 0 failures, 0 errors, 0 skips
>>
>> Any help is appreciated, thanks very much in advance.
> 
> Added missing checks and error details at r65094-65096.
> It seems some property names are gone.
> 
> ```
> $ ./ruby -e '"\u{80}"[/\X/]'
> Traceback (most recent call last):
>   1: from -e:1:in `<main>'
> -e:1:in `[]': invalid character property name 
> {Grapheme_Cluster_Break=E_Modifier}: /\X/ (RegexpError)
> ```
> ```
> $ ./ruby -e '"\u{80}".grapheme_clusters'
> Traceback (most recent call last):
>   1: from -e:1:in `<main>'
> -e:1:in `grapheme_clusters': cannot compile grapheme cluster regexp: 
> invalid character property name {Grapheme_Cluster_Break=E_Modifier} (fatal)
> ```

