Issue #13750 has been updated by naruse (Yui NARUSE).


If you want to avoid that case, you have an option involving the coderange. The coderange is cached information about whether a string (1) contains only 7-bit ASCII characters, (2) also contains 8-bit characters, (3) contains a broken byte sequence, or (4) has not been scanned yet (unknown). Some strings have already had their coderange scanned and cached in the string object, but others have not. The question is whether this casecmp? optimization uses only the cache and does not scan the string when no cache exists, or scans the string when there is no cache. If you use only the cache, I wonder whether strings in real applications actually have a cached coderange. If you scan, I wonder whether it is still faster.
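
As a rough illustration of the coderange states listed above, here is a minimal Ruby-level sketch. The helper name `coderange_category` and the symbols it returns are hypothetical and exist only for this example; the real coderange is a flag kept by the C implementation, and the "unknown" state cannot be observed from Ruby because `String#ascii_only?` and `String#valid_encoding?` scan the string (and cache the result) when no coderange is cached yet.

```ruby
# Hypothetical helper: maps a string to a label resembling its coderange.
# The "unknown" state is not represented, because asking the question
# from Ruby already triggers the scan.
def coderange_category(str)
  if str.ascii_only?
    :seven_bit   # (1) only 7-bit ASCII characters
  elsif str.valid_encoding?
    :valid       # (2) also contains 8-bit characters, byte sequence is valid
  else
    :broken      # (3) broken byte sequence
  end
end

coderange_category("aBcDeF")                            # => :seven_bit
coderange_category("\u{e4 f6 fc}")                      # => :valid
coderange_category("\xFF".dup.force_encoding("UTF-8"))  # => :broken
```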

----------------------------------------
Feature #13750: Improve String#casecmp? and Symbol#casecmp? performance with ASCII string
https://bugs.ruby-lang.org/issues/13750#change-90074

* Author: watson1978 (Shizuo Fujita)
* Status: Open
* Priority: Normal
----------------------------------------
I think String#casecmp and String#casecmp? are similar methods, but they perform differently with ASCII strings.

It seems that String#casecmp handles ASCII strings only, but it is faster than String#casecmp?.

This patch uses the String#casecmp code in String#casecmp? for ASCII strings. However, it introduces a minor penalty for UTF-8 strings, because it has to detect whether the strings are ASCII or UTF-8.
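
The idea can be sketched in Ruby as follows. This is only an illustration under stated assumptions, not the patch itself (the patch is written in C; see the pull request linked under "Patch"). The method name `casecmp_fast?` is hypothetical, and `String#ascii_only?` stands in for the ASCII/UTF-8 detection mentioned above.

```ruby
# Hypothetical Ruby-level sketch of the fast path; not the actual patch.
def casecmp_fast?(a, b)
  if a.ascii_only? && b.ascii_only?
    # ASCII fast path: reuse the cheaper ASCII-only comparison.
    a.casecmp(b) == 0
  else
    # Fall back to the full Unicode case-folding comparison.
    a.casecmp?(b)
  end
end

casecmp_fast?("aBcDeF", "ABCDEF")              # => true
casecmp_fast?("aBcDeF", "abcdefg")             # => false
casecmp_fast?("\u{e4 f6 fc}", "\u{c4 d6 dc}")  # => true
```

The `ascii_only?` checks are the extra detection step: ASCII inputs skip the case-folding work, while non-ASCII inputs pay for the check and then take the original path.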

~~~
String#casecmp? ASCII -> 61.3 % up
String#casecmp? UTF8  ->  1.3 % down
Symbol#casecmp? ASCII -> 80.0 % up
Symbol#casecmp? UTF8  ->  4.0 % down
~~~

### Before
~~~
Calculating -------------------------------------
      String#casecmp      5.961M (± 3.8%) i/s -     29.838M in   5.017907s
String#casecmp? ASCII
                          3.530M (± 8.6%) i/s -     17.554M in   5.034848s
String#casecmp? UTF8      1.252M (± 7.4%) i/s -      6.213M in   5.012168s
      Symbol#casecmp      8.555M (± 2.4%) i/s -     42.822M in   5.009280s
Symbol#casecmp? ASCII
                          4.235M (± 9.7%) i/s -     20.824M in   5.001368s
Symbol#casecmp? UTF8      1.329M (± 0.1%) i/s -      6.704M in   5.043725s
~~~

### After
~~~
Calculating -------------------------------------
      String#casecmp      5.984M (± 6.4%) i/s -     29.829M in   5.020331s
String#casecmp? ASCII
                          5.658M (± 1.5%) i/s -     28.308M in   5.004547s
String#casecmp? UTF8      1.215M (± 4.3%) i/s -      6.132M in   5.060292s
      Symbol#casecmp      8.651M (± 0.9%) i/s -     43.313M in   5.007215s
Symbol#casecmp? ASCII
                          7.462M (± 0.5%) i/s -     37.489M in   5.023892s
Symbol#casecmp? UTF8      1.275M (± 0.2%) i/s -      6.444M in   5.052743s
~~~


### Test code
~~~ruby
require 'benchmark/ips'

Benchmark.ips do |x|
  x.report "String#casecmp" do |loop|
    loop.times { "aBcDeF".casecmp("abcdefg") }
  end
  x.report "String#casecmp? ASCII" do |loop|
    loop.times { "aBcDeF".casecmp?("abcdefg") }
  end
  x.report "String#casecmp? UTF8" do |loop|
    loop.times { "\u{e4 f6 fc}".casecmp?("\u{c4 d6 dc}") }
  end

  x.report "Symbol#casecmp" do |loop|
    loop.times { :aBcDeF.casecmp(:abcdefg) }
  end
  x.report "Symbol#casecmp? ASCII" do |loop|
    loop.times { :aBcDeF.casecmp?(:abcdefg) }
  end
  x.report "Symbol#casecmp? UTF8" do |loop|
    loop.times { :"\u{e4 f6 fc}".casecmp?(:"\u{c4 d6 dc}") }
  end
end
~~~

### Patch
https://github.com/ruby/ruby/pull/1668


