Issue #13110 has been updated by Shugo Maeda.
Eric Wong wrote:
> For reading and parsing operations, I'm not sure they're needed
> because IO#read/read_nonblock/etc all return binary strings when
> passed explicit length arg; and //n exists for Regexp. (And any
> socket server reading without a length arg would be dangerous)
Let me clarify my intention.
I'd like to handle not only single-byte characters but also multibyte
characters efficiently with byte-based operations.
Once a string has been scanned, we already have a byte offset, so we
should not need to scan the string from the beginning again, but the
current API forces us to do so.
In the following example, the byteindex version is much faster than
the index version.
```
lexington:ruby$ cat bench.rb
require "benchmark"

s = File.read("README.ja.md") * 10

Benchmark.bmbm do |x|
  x.report("index") do
    pos = 0
    n = 0
    loop {
      break unless s.index(/\p{Han}/, pos)
      n += 1
      _, pos = Regexp.last_match.offset(0)
    }
  end
  x.report("byteindex") do
    pos = 0
    n = 0
    loop {
      break unless s.byteindex(/\p{Han}/, pos)
      n += 1
      _, pos = Regexp.last_match.byteoffset(0)
    }
  end
end
lexington:ruby$ ./ruby bench.rb
Rehearsal ---------------------------------------------
index       1.060000   0.010000   1.070000 (  1.116932)
byteindex   0.000000   0.010000   0.010000 (  0.004501)
------------------------------------ total: 1.080000sec

                user     system      total        real
index       1.050000   0.000000   1.050000 (  1.080099)
byteindex   0.000000   0.000000   0.000000 (  0.003814)
```
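The gap in these timings comes from offset conversion: in a variable-width encoding such as UTF-8, turning a character index into a byte position means walking the string from its start. Here is a minimal sketch of that relationship, using only methods that exist without the patch (String#index, String#bytesize, String#byteslice). The sample string is an assumption chosen for illustration.

```ruby
s = "あああいいいあああ"   # every character here is 3 bytes in UTF-8

char_pos = s.index("い")            # character offset of the first match
byte_pos = s[0, char_pos].bytesize  # character offset -> byte offset (walks the prefix)

p char_pos  #=> 3
p byte_pos  #=> 9

# Going back from a byte offset to a character offset also walks the prefix.
p s.byteslice(0, byte_pos).length   #=> 3

# Slicing by byte offset reaches the position directly.
p s.byteslice(byte_pos, 3) == "い"  #=> true
```

Each call to index with a character position repeats that walk, which is why the cost grows with the distance from the start of the string.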
----------------------------------------
Bug #13110: Byte-based operations for String
https://bugs.ruby-lang.org/issues/13110#change-62409
* Author: Shugo Maeda
* Status: Open
* Priority: Normal
* Assignee:
* Target version:
* ruby -v:
* Backport: 2.2: UNKNOWN, 2.3: UNKNOWN, 2.4: UNKNOWN
----------------------------------------
How about adding byte-based operations to String?
```
s = "あああいいいあああ"
p s.byteindex(/ああ/, 4) #=> 18
x, y = Regexp.last_match.byteoffset(0) #=> [18, 24]
s.bytesplice(x...y, "おおお")
p s #=> "あああいいいおおおあ"
```
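For comparison, roughly the same edit can be approximated on a Ruby without the proposed methods. This is only a sketch: `byte_range_of` and `bytesplice_sketch` are hypothetical helper names, not part of the patch. The search still takes a character offset, so it does not avoid the rescanning that the proposal is meant to eliminate.

```ruby
# Hypothetical helper: byte range [start, stop) of a match, derived from
# MatchData#pre_match instead of the proposed MatchData#byteoffset.
def byte_range_of(match)
  start = match.pre_match.bytesize
  [start, start + match[0].bytesize]
end

# Hypothetical helper: return a copy of str with bytes start...stop replaced.
# Unlike the proposed bytesplice, it does not modify str in place.
def bytesplice_sketch(str, start, stop, replacement)
  str.byteslice(0, start) + replacement + str.byteslice(stop, str.bytesize - stop)
end

s = "あああいいいあああ"
m = s.match(/ああ/, 2)  # the position here is a character offset, not a byte offset
x, y = byte_range_of(m)
p [x, y]                #=> [18, 24]

t = bytesplice_sketch(s, x, y, "おおお")
p t == "あああいいいおおおあ"  #=> true
```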
---Files--------------------------------
byteindex.diff (2.83 KB)