Issue #10416 has been updated by Martin Dürst.


Yui NARUSE wrote:
> For years, file structures of Unicode Data was changed some times.
> Therefore there's no guarantee that Unicode 12 can work with the current script.

I agree (but see last paragraph of this comment). But that's not what this issue is about.

What I'm talking about is that next year, at some point, we will decide to upgrade ruby trunk to Unicode 8.0 (and so on, probably every year). This was the case this year for Unicode 7.0; see issue #9092.

We do this after checking that the new Unicode data files work with the current script (first the beta files and then the final releases), and if they don't work, we update the script. Then we commit, and everybody on trunk gets the changes when they update. But currently this is not the case for the Unicode data files, and people on trunk will have to make a special effort to upgrade.

Besides committing lib/unicode_normalize/tables.rb (nobu reverted it but didn't give any reason why), there's another way to achieve this goal:

Note in a file the versions or timestamps of the 'official' version of the Ruby trunk Unicode data files. This could be part of a .mk file, or a new file. Of the three files we currently download, two have a header (first two lines) like this:
# NormalizationTest-7.0.0.txt
# Date: 2013-11-27, 09:54:41 GMT [MD]
So we could note the version and/or date we want people on trunk to use, and check against it. But one file, UnicodeData.txt, doesn't contain this information in the file itself, so we have to rely on the date in the Last-Modified HTTP header (which we already use to avoid repeated downloads of the same file).
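
To make the idea a bit more concrete, here is a rough sketch of such a check in Ruby. It is only an illustration, not a proposed patch: the constant EXPECTED_UNICODE_VERSION and the method names are made up for this example, and the pinned value would really come from a .mk file or similar. It assumes the header has exactly the two-line shape shown above.

```ruby
require 'time'

# Hypothetical pinned version; in practice this could live in a .mk file.
EXPECTED_UNICODE_VERSION = '7.0.0'

# Parse the two header lines of a Unicode data file, e.g.
#   # NormalizationTest-7.0.0.txt
#   # Date: 2013-11-27, 09:54:41 GMT [MD]
# Returns a hash with :name, :version and :date, or nil if the first
# line does not look like such a header (as for UnicodeData.txt).
def parse_unicode_header(text)
  first, second = text.each_line.first(2).map(&:chomp)
  m = /\A#\s*(?<name>.+)-(?<version>\d+\.\d+\.\d+)\.txt\s*\z/.match(first.to_s)
  return nil unless m
  date = second.to_s[/\A#\s*Date:\s*(\d{4}-\d{2}-\d{2})/, 1]
  { name: m[:name], version: m[:version], date: date }
end

# True if the file carries a header with the version we want on trunk.
def unicode_file_current?(text, expected_version = EXPECTED_UNICODE_VERSION)
  header = parse_unicode_header(text)
  !header.nil? && header[:version] == expected_version
end

# For a file without a header, compare the Last-Modified value
# (an HTTP-date string) against a pinned Time instead.
def last_modified_matches?(http_date, expected_time)
  Time.httpdate(http_date) == expected_time
end
```

A build step could call unicode_file_current? on each downloaded file that has a header, fall back to last_modified_matches? for UnicodeData.txt, and trigger a fresh download when either check fails.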

The reason why UnicodeData.txt doesn't contain these header lines is that it is a very old file, and the Unicode Consortium is quite careful not to make any changes that could affect the users of a file. If data of a different type is needed, it is provided in a separate file.

----------------------------------------
Bug #10416: Create mechanism for updating of Unicode data files downstream when we want
https://bugs.ruby-lang.org/issues/10416#change-49667

* Author: Martin Dürst
* Status: Open
* Priority: Normal
* Assignee: Nobuyoshi Nakada
* Category: build
* Target version: current: 2.2.0
* ruby -v: ruby 2.2.0dev (2014-10-22 trunk 48092) [x86_64-cygwin]
* Backport: 2.0.0: UNKNOWN, 2.1: UNKNOWN
----------------------------------------
The current mechanism for updating Unicode data files will create the following problem:
Downstream compilers/packagers will download Unicode data files ONE time (they may already have done so).

However, if they don't activate ALWAYS_UPDATE_UNICODE = yes, these files will never get updated, and they will stay on Unicode version 7.0 even if in five years Unicode is e.g. on version 12.0.
On the other hand, if they activate ALWAYS_UPDATE_UNICODE = yes (and assuming issue #10415 gets fixed), they constantly update to the latest version of Unicode. That's good for those who actually want this, but not what our current policy is.
What's missing is a way for us (Ruby core) to make sure downstream checkouts update to a new Unicode version when we want them to do so (as we can, e.g., for other parts that are based on Unicode data, see e.g. https://bugs.ruby-lang.org/issues/9092), without sending an email to everybody and hoping they read and follow it.

[Currently, the only solution I know will work is the one pointed out by Yui Naruse in https://bugs.ruby-lang.org/issues/10084#note-17, but I'm okay with any other solution.]






-- 
https://bugs.ruby-lang.org/