Issue #8129 has been updated by nobu (Nobuyoshi Nakada).


You may want to:
* use a regexp instead of per-character indexing, e.g. String#scan.
* convert the string to a fixed-width wide-character encoding, i.e., UTF-32LE or UTF-32BE.
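Here is a minimal sketch of both suggestions, applied to the same access pattern as the original script. The `326_000` iterations and the offset `23` become parameters. Reading "use regexp" as `scan(/./m)` is an assumption, and `sample_via_scan` and `sample_via_utf32` are hypothetical helper names. In CRuby, `String#[]` on a fixed-width encoding can compute where the n-th character starts. A UTF-8 string that contains non-ASCII characters, by contrast, has to be walked from the beginning on every lookup.

```ruby
# Sketch: the reporter's indexing loop, rewritten two ways.
# `iterations` and `offset` mirror the 326_000 and 23 in the original script.
# Helper names are hypothetical, chosen for illustration only.

# 1. Regexp: String#scan splits the string into characters in one pass;
#    indexing the resulting Array is then constant time.
def sample_via_scan(contents, iterations, offset)
  chars = contents.scan(/./m)
  Array.new(iterations) { |i| chars[(i + offset) % chars.size] }
end

# 2. Fixed-width encoding: every character in UTF-32LE is 4 bytes,
#    so String#[] can compute a character's byte offset directly.
#    Each extracted character is converted back to UTF-8.
def sample_via_utf32(contents, iterations, offset)
  wide = contents.encode("UTF-32LE")
  Array.new(iterations) do |i|
    wide[(i + offset) % wide.size].encode("UTF-8")
  end
end

text = "a\u2014b" * 3 # hypothetical input containing an em dash
p sample_via_scan(text, 5, 23)
p sample_via_utf32(text, 5, 23) == sample_via_scan(text, 5, 23)
```

Both variants traverse the variable-width string once up front, rather than once per lookup.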
----------------------------------------
Bug #8129: String#index has drastically different performance when a single unicode character is included
https://bugs.ruby-lang.org/issues/8129#change-37750

Author: zmoazeni (Zach Moazeni)
Status: Rejected
Priority: Normal
Assignee: 
Category: 
Target version: 
ruby -v: 2.0.0-p0


=begin
I created a simple Ruby script:

 #! /usr/bin/env ruby

 raise "need a file name" unless ARGV[0]
 contents = File.read(ARGV[0])

 # Read single characters with String#[], wrapping around the string
 326_000.times do |i|
   contents[(i + 23) % contents.size]
 end

I have uploaded two files below: one contains only ASCII characters, and the other contains a single Unicode character (an em dash) in its first line.

Character indexing (String#[], as used in the script above) performs dramatically differently on the two strings. Locally, I see ~1.5 seconds with all_ascii.css and ~30 seconds with one_unicode.css on 1.9.3-p385. It gets worse with Ruby 2.0: all_ascii.css still takes ~1 second, but one_unicode.css takes ~2.5 minutes!

Any idea why the performance is so dramatically different between the two?
=end
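The comparison in the report can be reproduced without the attached CSS files. In this sketch, the two strings are synthetic stand-ins. This is an assumption, because the real file contents are not shown here. `index_loop` is a hypothetical helper that wraps the loop from the original script. Absolute timings will vary by machine and Ruby version.

```ruby
require "benchmark"

# Synthetic stand-ins for all_ascii.css and one_unicode.css (an assumption:
# the real contents are not reproduced). Both strings have the same number
# of characters; the second differs only by one em dash (U+2014) at the start.
ascii   = "a { color: red; }\n" * 500
unicode = "\u2014" + ascii[1..-1]

# Hypothetical helper: same access pattern as the original script,
# character indexing via String#[] with an Integer, wrapping with modulo.
def index_loop(contents, iterations = 326_000)
  iterations.times { |i| contents[(i + 23) % contents.size] }
end

Benchmark.bm(12) do |bm|
  bm.report("all ASCII")   { index_loop(ascii) }
  bm.report("one Unicode") { index_loop(unicode) }
end
```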



-- 
http://bugs.ruby-lang.org/