Issue #8206 has been updated by sam.saffron (Sam Saffron).


Fair enough:

 > "" =~ /()|()/
 => 0 
 > "".include? ""
 => true  

I guess optimising for the empty string is tough. 

The empty? trick does reduce the cost significantly for the empty string (0.06 vs 0.18) though it increases cost for all the other cases by 20% 

We are dealing with a pretty subtle micro optimisation here. 
----------------------------------------
Feature #8206: Should Ruby core implement String#blank? 
https://bugs.ruby-lang.org/issues/8206#change-38262

Author: sam.saffron (Sam Saffron)
Status: Open
Priority: Normal
Assignee: 
Category: core
Target version: 


There has been some discussion about porting the #blank? protocol over to Ruby in the past that has been rejected by Matz. 

This proposal is only about String however. 

At the moment to figure out if you have a blank string you would 

"  ".strip.length == 0

The disadvantage is that this forces unneeded allocations and does too much work: 

An optimal implementation would be:

static VALUE
rb_str_blank(VALUE str)
{
  rb_encoding *enc;
  char *s, *e;

  enc = STR_ENC_GET(str);
  s = RSTRING_PTR(str);
  if (!s || RSTRING_LEN(str) == 0) return Qtrue;

  e = RSTRING_END(str);
  while (s < e) {
	  int n;
	  unsigned int cc = rb_enc_codepoint_len(s, e, &n, enc);

	  if (!rb_isspace(cc) && cc != 0) return Qfalse;
    s += n;
  }
  return Qtrue;
}

This in turn is about 5-8x than the regex solution to the problem and way faster than allocating one massive string with strip when length is large. 

Should Ruby take on this method, to accompany #strip following its practice. 

--- 

A slight caveat though is that active support has a somewhat different definition of blank? 

const unsigned int as_blank[26] = {9, 0xa, 0xb, 0xc, 0xd,
  0x20, 0x85, 0xa0, 0x1680, 0x180e, 0x2000, 0x2001,
  0x2002, 0x2003, 0x2004, 0x2005, 0x2006, 0x2007, 0x2008,
  0x2009, 0x200a, 0x2028, 0x2029, 0x202f, 0x205f, 0x3000
};

static VALUE
rb_str_blank_as(VALUE str)
{
  rb_encoding *enc;
  char *s, *e;
  int i;
  int found;

  enc = STR_ENC_GET(str);
  s = RSTRING_PTR(str);
  if (!s || RSTRING_LEN(str) == 0) return Qtrue;

  e = RSTRING_END(str);
  while (s < e) {
	  int n;
	  unsigned int cc = rb_enc_codepoint_len(s, e, &n, enc);

    found = 0;
    for(i=0;i<26;i++){
      unsigned int current = as_blank[i];
      if(current == cc) {
        found = 1;
        break;
      }
      if(cc < current){
        break;
      }
    }

	  if (!found) return Qfalse;
    s += n;
  }
  return Qtrue;
}

Clearly it makes no sense to have such a method. 

If Ruby took over implementing String#blank? it would clash with Active Support. But imho would enforce better API consistency. 

Thoughts?


 


-- 
http://bugs.ruby-lang.org/