Issue #12306 has been updated by Sam Saffron.


Just to expand on how hard this is to get right without the framework providing it.

See: 

https://gist.github.com/SamSaffron/d1a9cc8e141e7415e06306369fdedfe5

`/[[:^space:]]/ === str` can allocate significantly more data, including invisible MatchData and String objects, than

 `/\A[[:space:]]*\z/ === str`

(depending on the string being tested).
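One way to check this on a given input is to count allocated objects around each call. This is only a rough sketch: the `allocations` helper and the sample string are made up for illustration, and the numbers will vary by Ruby version and by the string tested.

```ruby
# Hypothetical helper: counts objects allocated while the block runs.
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

# Sample input (an assumption); vary it to see how the counts change.
str = " " * 100 + "x"

non_space = allocations { /[[:^space:]]/ === str }
all_space = allocations { /\A[[:space:]]*\z/ === str }

puts "non-space scan: #{non_space} objects, anchored scan: #{all_space} objects"
```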

There is no way to invoke the regex engine without it implicitly setting a pile of globals, which makes a lot of regex work very inefficient. Three years ago, when I brought this up, Nobu suggested allowing `String#include?` to accept a regex and set no globals; that may be a way to get a bunch of perf out of the regex engine. Alternatively, Ruby 3 could simply stop setting all the globals and require specific methods whenever match data is wanted. I don't know.
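To make the side effect concrete, here is a small sketch showing that `Regexp#===` leaves a MatchData in `$~`. For contrast it also calls `Regexp#match?`, which assumes Ruby 2.4 or later and is documented as not updating `$~`. The method names are hypothetical.

```ruby
# $~ is local to the method frame, so each method starts with it unset.

# Returns whatever Regexp#=== left behind in $~.
def backref_after_case_eq(str)
  /[[:^space:]]/ === str
  $~ # MatchData set as a side effect of the match
end

# Returns whatever Regexp#match? left behind in $~ (assumes Ruby >= 2.4).
def backref_after_match_p(str)
  /[[:^space:]]/.match?(str)
  $~ # match? does not update the backref
end

p backref_after_case_eq(" x ").class
p backref_after_match_p(" x ").class
```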

My point is that doing even something trivial here is practically impossible to do fast in Ruby today.

----------------------------------------
Feature #12306: Implement String #blank? #present? and improve #strip and family to handle unicode
https://bugs.ruby-lang.org/issues/12306#change-58361

* Author: Sam Saffron
* Status: Open
* Priority: Normal
* Assignee: Yukihiro Matsumoto
----------------------------------------
Time and again there have been rejected feature requests to Ruby core to implement `blank` and `present` protocols across all objects as ActiveSupport does. I am fine with this call and think it is fair. 

However, for the narrow case of String, having `#blank?` and `#present?` makes sense. 

- Provides a natural extension over `#strip`, `#lstrip` and `#rstrip`. `("   ".strip.length == 0) == "    ".blank?`

- Plays nicely with ActiveSupport by providing an efficient implementation in Ruby core. Implementing blank efficiently currently requires a C extension; see https://github.com/SamSaffron/fast_blank.
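
For reference, the semantics being requested can be sketched in pure Ruby with a regex. This is an illustration only: `BlankSketch` is a hypothetical module, not the fast_blank or ActiveSupport code, and it pays the regex-engine cost that a core implementation would avoid.

```ruby
# Hypothetical module for illustration; not part of Ruby core.
module BlankSketch
  # Matches strings made up entirely of whitespace (Unicode-aware on UTF-8 strings).
  BLANK = /\A[[:space:]]*\z/

  def self.blank?(str)
    # The empty? check skips the regex engine for the cheapest case.
    str.empty? || BLANK === str
  end

  def self.present?(str)
    !blank?(str)
  end
end

p BlankSketch.blank?("   ")
p BlankSketch.present?(" a ")
```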

However, if this work is to be done, `#strip` and family should probably start dealing with Unicode blanks, e.g.:

```
irb(main):008:0> [0x3000].pack("U")
=> "　"
irb(main):009:0> [0x3000].pack("U").strip.length
=> 1
```
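
As a rough illustration of the requested behaviour, a Unicode-aware strip can be approximated in pure Ruby with `[[:space:]]`. `unicode_strip` is a hypothetical helper, not an existing method, and it may not agree with `#strip` or fast_blank on every character.

```ruby
# Hypothetical helper: removes leading and trailing Unicode whitespace.
def unicode_strip(str)
  str.gsub(/\A[[:space:]]+|[[:space:]]+\z/, "")
end

ideographic_space = [0x3000].pack("U") # U+3000 IDEOGRAPHIC SPACE
padded = "#{ideographic_space} hello #{ideographic_space}"

p unicode_strip(padded)
p unicode_strip(ideographic_space).length
```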

So there are 2 questions / feature requests here:

1. Can we add `blank?` and `present?` to String? 
2. Can we amend `strip` and family to account for Unicode, per: https://github.com/SamSaffron/fast_blank/blob/master/ext/fast_blank/fast_blank.c#L43-L74


