Issue #15764 has been updated by duerst (Martin Dürst).

Backport set to 2.4: UNKNOWN, 2.5: UNKNOWN, 2.6: UNKNOWN
Assignee set to matz (Yukihiro Matsumoto)
Tracker changed from Feature to Bug

I also think this is a bug. I have changed the tracker accordingly.

I think we should restrict the characters usable in identifiers to some reasonable ranges. I agree that we mainly want to focus on ASCII programs, but we should do at least a sanity check for the rest of Unicode, and that's clearly not happening now.

As a base for this, it's best to look at Unicode Standard Annex #31, Unicode Identifier And Pattern Syntax (http://www.unicode.org/reports/tr31/). A regular expression for the identifier syntax defined in UAX #31 is easily available in Ruby: `/\p{id_start}\p{id_continue}*/`. The character ranges covered by these properties can be checked in enc/unicode/12.1.0/name2ctype.h, from lines 15267 and 15881 (the file is too large for the Web interface to svn).
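
As a rough illustration of the pattern quoted above, the following sketch anchors it so that a whole string must be an identifier. The constant and method names (`UAX31_IDENTIFIER`, `uax31_identifier?`) are made up for this example and do not come from Ruby's parser.

```ruby
# Anchored form of the UAX #31 pattern quoted above.
# UAX31_IDENTIFIER and uax31_identifier? are hypothetical names for this sketch.
UAX31_IDENTIFIER = /\A\p{id_start}\p{id_continue}*\z/

def uax31_identifier?(name)
  UAX31_IDENTIFIER.match?(name)
end

uax31_identifier?("foo")                 # ASCII letters
uax31_identifier?("\u65E5\u672C\u8A9E")  # CJK ideographs count as ID_Start
uax31_identifier?("foo\u00A0bar")        # NO-BREAK SPACE is not ID_Continue
uax31_identifier?("_foo")                # "_" is ID_Continue but not ID_Start
```

The last call returns false, which is why '_' has to be added separately for the initial position.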

The only additions we seem to need are '_' in initial position, sigils for the different kinds of identifiers, and final '!', '?', and '=' for method names.
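
A minimal sketch of what those additions might look like on top of the UAX #31 pattern. This is only an approximation for discussion, not the lexer's actual rules: it ignores the constant/local distinction and special globals, and all names here (`ID_BODY`, `VARIABLE_NAME`, `METHOD_NAME`, `plausible_name?`) are hypothetical.

```ruby
# Identifier body: UAX #31 start characters plus "_" in initial position.
ID_BODY = /(?:\p{id_start}|_)\p{id_continue}*/

# Sigils for global ($), instance (@), and class (@@) variables.
VARIABLE_NAME = /\A(?:\$|@@?)#{ID_BODY}\z/

# Local variables and method names, the latter with an optional final !, ?, or =.
METHOD_NAME = /\A#{ID_BODY}[!?=]?\z/

def plausible_name?(name)
  VARIABLE_NAME.match?(name) || METHOD_NAME.match?(name)
end

plausible_name?("_tmp")          # underscore allowed in initial position
plausible_name?("@@count")       # class variable sigil
plausible_name?("empty?")        # method-name suffix
plausible_name?("foo\u00A0bar")  # rejected: contains NO-BREAK SPACE
```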

I suspect that it may take @nobu just a few hours to actually implement this, and that the backwards-compatibility issues (existing Ruby programs ceasing to work) are extremely minimal and limited to examples that show the problem.

I have added this to the list of issues to be discussed at next week's developers' meeting, but I will not be at the meeting itself. If needed, I can join the discussion on the first day of RubyKaigi itself. I have assigned this issue to Matz because I'd like him to give it a sanity check.

----------------------------------------
Bug #15764: Whitespace and control characters should not be permitted in tokens
https://bugs.ruby-lang.org/issues/15764#change-77610

* Author: BatmanAoD (Kyle Strand)
* Status: Open
* Priority: Normal
* Assignee: matz (Yukihiro Matsumoto)
* Target version: 
* ruby -v: 
* Backport: 2.4: UNKNOWN, 2.5: UNKNOWN, 2.6: UNKNOWN
----------------------------------------
As of Ruby 2.5.1p57, it appears that all valid Unicode code-points above 128 are permitted in tokens. This includes whitespace and control characters.
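
To make the concern concrete, here is a small sketch that classifies a few non-ASCII characters using Unicode properties available in Ruby regular expressions. It does not test what the parser accepts; it only shows which characters are whitespace or control characters and whether UAX #31 would allow them inside an identifier. The helper name `token_char_report` is hypothetical.

```ruby
# Classify a single character with Unicode properties.
# token_char_report is a hypothetical helper for illustration only.
def token_char_report(char)
  {
    codepoint:   format("U+%04X", char.ord),
    whitespace:  char.match?(/\p{Space}/),
    control:     char.match?(/\p{Cntrl}/),
    id_continue: char.match?(/\p{id_continue}/),
  }
end

token_char_report("\u00A0")  # NO-BREAK SPACE: whitespace, not ID_Continue
token_char_report("\u0085")  # NEXT LINE: a C1 control character
token_char_report("\u00E9")  # LATIN SMALL LETTER E WITH ACUTE: ID_Continue
```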

This was demonstrated here: https://gist.github.com/qrohlf/7045823

I have attached the raw download from the above gist.

The issue has been discussed on StackOverflow: https://stackoverflow.com/q/34455427/1858225

I would say this is arguably a bug, but I am marking this ticket as a "feature" since the current behavior could be considered by-design.

---Files--------------------------------
helloworld.rb (543 Bytes)


-- 
https://bugs.ruby-lang.org/
