Issue #11706 has been updated by Chris Seaton.


I've been dealing with an issue related to this. When Ruby updated to Unicode 7.0, name2ctype.h was updated but name2ctype.src was not, so they're now inconsistent (look at CR_Blank for example).

I found this problem when I tried to update JCodings (part of JRuby), which generates its tables from these files. It uses name2ctype.src, so it got the wrong values.

I'll update JCodings to read from name2ctype.h instead.

You've listed name2ctype.h as an intermediate that should be deleted. I'm not sure that's right - it's actually the original source now isn't it? Pulled from Onigmo. I don't think that one can be deleted.
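
For anyone who wants to see which tables diverge, here is a minimal sketch that compares the CR_* tables of two files. It assumes the tables are written as `static const OnigCodePoint CR_Blank[] = { 8, 0x0009, 0x0009, ... };` in both files, and that no comments containing digits appear inside the braces; the helper names (`code_range_tables`, `differing_tables`) are made up for this example and are not part of JCodings or Ruby.

```ruby
# Sketch: list the CR_* code range tables that differ between two files.
# Assumed table layout (not verified against every file):
#   static const OnigCodePoint CR_Blank[] = { 8, 0x0009, 0x0009, ... };

# Returns { "Blank" => [8, 9, 9, ...], ... } for all CR_* tables in text.
def code_range_tables(text)
  tables = {}
  text.scan(/\bCR_(\w+)\[\]\s*=\s*\{(.*?)\}/m) do |name, body|
    # Hex entries are code points, the plain decimal entry is the range count.
    tables[name] = body.scan(/0x\h+|\d+/).map do |tok|
      tok.start_with?("0x") ? tok.hex : tok.to_i
    end
  end
  tables
end

# Names of tables that are missing on one side or have different contents.
def differing_tables(text_a, text_b)
  a = code_range_tables(text_a)
  b = code_range_tables(text_b)
  (a.keys | b.keys).reject { |name| a[name] == b[name] }.sort
end

# Command-line use: pass the two files to compare.
if __FILE__ == $PROGRAM_NAME && ARGV.size == 2
  puts differing_tables(File.read(ARGV[0]), File.read(ARGV[1]))
end
```

Run with name2ctype.h and name2ctype.src as the two arguments, it should print the names of the tables that are out of sync, if the layout assumption holds.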

----------------------------------------
Bug #11706: Clean up files enc/unicode/name2ctype.{h.blt,kwd,src}
https://bugs.ruby-lang.org/issues/11706#change-55942

* Author: Martin Dürst
* Status: Open
* Priority: Normal
* Assignee: Nobuyoshi Nakada
* ruby -v: 
* Backport: 2.0.0: UNKNOWN, 2.1: UNKNOWN, 2.2: UNKNOWN
----------------------------------------
The files name2ctype.{h.blt,kwd,src} in enc/unicode are intermediate products that are not needed in the repository and haven't been committed consistently. I propose to remove them.

[I'm not sure this is a bug or a feature, but it doesn't provide any new functionality, so feature doesn't seem right.]

[I've assigned this to Nobu for feedback; I can execute it once we agree on a way forward.]


On 2015/11/17 15:39, Nobuyoshi Nakada wrote:

> Please update name2ctype.{h.blt,kwd,src} files too.

Thanks for the reminder. I had a look at these files. Maybe before further commits, we can try to simplify things a bit, and/or to ignore irrelevant stuff.

Sorry this message is long. Looking at the three files you mentioned, I noticed the following:

enc/unicode/name2ctype.kwd was also produced on the Onigmo side when I worked on the update (see also https://github.com/k-takata/Onigmo/pull/58). However, it is not part of the Onigmo distribution.
It was last committed by Yui Naruse at r36070, on 2012/06/14. This is way before the update to Unicode 7.0.0 with r46831.

On 2011/11/20, K. Takata introduced https://github.com/k-takata/Onigmo/blob/master/tool/convert-name2ctype.sh, which is used as:
convert-name2ctype.sh name2ctype.kwd > name2ctype.h
to directly convert from name2ctype.kwd to name2ctype.h (although it produces a few numbered intermediary files which are removed in the last step).

enc/unicode/name2ctype.h.blt was last committed by yourself in r49292 on 2015/01/17. Your log message mentions r46831, but it is unclear why you updated .h.blt and not .kwd and .src. The last commit before this was r36070, the same as for name2ctype.kwd.

enc/unicode/name2ctype.src also was last committed in r36070.
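
Put as a small check, using only the revision numbers quoted above (the constant names are just for illustration):

```ruby
# Last-commit revisions of the three files, as quoted in this message.
LAST_COMMIT = {
  "enc/unicode/name2ctype.kwd"   => 36070,
  "enc/unicode/name2ctype.h.blt" => 49292,
  "enc/unicode/name2ctype.src"   => 36070,
}

# r46831 is the update to Unicode 7.0.0.
UNICODE_7_UPDATE = 46831

# Files whose last commit predates the Unicode 7.0.0 update.
stale = LAST_COMMIT.select { |_file, rev| rev < UNICODE_7_UPDATE }.keys
puts stale
```

This lists .kwd and .src, i.e. two of the three files cannot reflect the Unicode 7.0.0 data.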

Looking at Makefile.in, it contains instructions to create enc/unicode/name2ctype.h from enc/unicode/name2ctype.kwd at http://svn.ruby-lang.org/cgi-bin/viewvc.cgi/trunk/Makefile.in?view=markup#l340. There, .h.blt and .src are mentioned, but my knowledge of shell syntax isn't good enough to understand what exactly is supposed to happen.

My conclusions so far would be:
- name2ctype.{h.blt,kwd,src} are all intermediary files that are not
  actually used directly for building Ruby.
- In the last few years, these three files have been committed only
  rarely and accidentally, not in any visible sync with actual bug fixes
  or feature additions.
- Onigmo no longer uses name2ctype.h.blt and .src, and does not commit
  .kwd.
- The build process on the Onigmo side, although I did it manually, was
  well documented and painless; on the Ruby side, it may be possible to
  build enc/unicode/name2ctype.h (the file that's finally used for
  compilation), but I haven't found how to do so.
- For a process that needs to be done about once a year, this amount of
  manual work seems perfectly fine (at least for me, and I volunteer to
  do it again next year).
- Therefore, I suggest that we don't care about committing
  name2ctype.{h.blt,kwd,src}. If you want me to commit
  enc/unicode/name2ctype.kwd, I can do it (because I have the new
  version). Indeed, it might be better to remove these three files;
  they only make checkouts heavier.
- If we want to simplify the production process, my preference would be
  to update Makefile.in based on convert-name2ctype.sh, or to directly
  integrate convert-name2ctype.sh into tool/enc-unicode.rb
  (why would one want to use sed and friends if we already use ruby?)
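
As a rough illustration of the last point, and short of actually porting the sed steps, a first step could be to drive the conversion from Ruby. The sketch below mirrors the `convert-name2ctype.sh name2ctype.kwd > name2ctype.h` usage quoted above; the method name `regenerate` and the script path in the comment are assumptions, not existing code in tool/enc-unicode.rb.

```ruby
require "open3"

# Sketch: run an external converter on source and write its stdout to
# target, replacing target only if the command succeeds.
# command is an argv array, e.g. ["sh", "tool/convert-name2ctype.sh"]
# (hypothetical path); source is appended as the last argument.
def regenerate(command, source, target)
  output, status = Open3.capture2(*command, source)
  unless status.success?
    raise "#{command.join(' ')} failed with status #{status.exitstatus}"
  end
  # Write to a temporary name first so a failed or partial run never
  # leaves a truncated header behind.
  tmp = "#{target}.tmp"
  File.write(tmp, output)
  File.rename(tmp, target)
  target
end
```

With this, the Ruby side would need only a single call such as `regenerate(["sh", "tool/convert-name2ctype.sh"], "enc/unicode/name2ctype.kwd", "enc/unicode/name2ctype.h")`, and the intermediate files would never have to be committed.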





-- 
https://bugs.ruby-lang.org/
