This is a multi-part message in MIME format.
--------------080200040501060506090307
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit

Hi!

A few days ago I ran some tests with Ruby 1.9.0-0 to compare the
performance of UTF-8 encoded strings with that of ASCII strings.
In one case I got a result I could hardly believe; my first
impression was that I had made an error. I presented the code and
the result in the German Ruby forum, and the results were confirmed
there.

I ran the test with Ruby 1.9.0-0, built on Windows 2000 using
MinGW/MSYS, on a machine with 256 MB and 700 MHz.

In two cases of the following test I used UTF-8 encoded strings
(utf8_1 and utf8_2), both containing 'abcdefgh' * 1000000 + '€'.
The test calls String#<< 10 times to append one character: '€' (three
bytes) in the case of utf8_1, and 'x' (one byte) in the case of utf8_2.
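
As a rough illustration of the sizes involved, the following sketch
shows the character and byte counts for both kinds of append. It is
scaled down to 1000 repetitions so it runs instantly; the scaling and
the names base, with_euro and with_x are assumptions made for this
sketch and are not part of the test itself. It also assumes the source
is read as UTF-8.

```ruby
# encoding: utf-8
# Scaled-down illustration (1000 repetitions instead of 1000000).
# The variable names below are hypothetical, chosen for this sketch only.
base = 'abcdefgh' * 1000 + '€'

with_euro = base.dup << '€'   # appends one character, three bytes
with_x    = base.dup << 'x'   # appends one character, one byte

puts "'€' is #{'€'.bytesize} bytes, 'x' is #{'x'.bytesize} byte"
puts "base:      #{base.length} chars, #{base.bytesize} bytes"            # 8001 chars, 8003 bytes
puts "with_euro: #{with_euro.length} chars, #{with_euro.bytesize} bytes"  # 8002 chars, 8006 bytes
puts "with_x:    #{with_x.length} chars, #{with_x.bytesize} bytes"        # 8002 chars, 8004 bytes
```

Both appends add exactly one character; only the number of bytes
differs.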

The result was that adding the Euro sign takes "no time", but adding
the "x" takes more than six seconds (I attached the source and the
result, both in Windows format with "\r\n", UTF-8 encoded with BOM).

Here is the listing of the program and the result.

 >>>>> Code >>>>>

require 'benchmark'
include Benchmark

# Output labels are German: "Zeichen" = characters, "Encoding von" = encoding of.
utf8_1 = 'abcdefgh' * 1000000 + '€'
utf8_2 = 'abcdefgh' * 1000000 + '€'
ascii  = 'abcdefgh' * 1000000 + 'x'

puts "Zeichen in 'utf8_1':          #{utf8_1.length}"
puts "Bytes in 'utf8_1':            #{utf8_1.bytesize}"
puts "Encoding von 'utf8_1':        #{utf8_1.encoding}"
puts "Zeichen 8000001 in 'utf8_1':  #{utf8_1[8000000]}"
puts ""
puts "Zeichen in 'utf8_2':          #{utf8_2.length}"
puts "Bytes in 'utf8_2':            #{utf8_2.bytesize}"
puts "Encoding von 'utf8_2':        #{utf8_2.encoding}"
puts "Zeichen 8000001 in 'utf8_2':  #{utf8_2[8000000]}"
puts ""
puts "Zeichen in 'ascii':           #{ascii.length}"
puts "Bytes in 'ascii':             #{ascii.bytesize}"
puts "Encoding von 'ascii':         #{ascii.encoding}"
puts "Zeichen 8000001 in 'ascii':   #{ascii[8000000]}"
puts ""

bmbm(10) do |t|
   t.report('utf8_1'){10.times{utf8_1 << '€'}}
   t.report('utf8_2'){10.times{utf8_2 << 'x'}}
   t.report('ascii'){10.times{ascii << 'x'}}
end

puts "Zeichen in 'utf8_1':          #{utf8_1.length}"
puts "Bytes in 'utf8_1':            #{utf8_1.bytesize}"
puts "Encoding von 'utf8_1':        #{utf8_1.encoding}"
puts "Zeichen 8000011 in 'utf8_1':  #{utf8_1[8000010]}"
puts ""
puts "Zeichen in 'utf8_2':          #{utf8_2.length}"
puts "Bytes in 'utf8_2':            #{utf8_2.bytesize}"
puts "Encoding von 'utf8_2':        #{utf8_2.encoding}"
puts "Zeichen 8000011 in 'utf8_2':  #{utf8_2[8000010]}"
puts ""
puts "Zeichen in 'ascii':           #{ascii.length}"
puts "Bytes in 'ascii':             #{ascii.bytesize}"
puts "Encoding von 'ascii':         #{ascii.encoding}"
puts "Zeichen 8000011 in 'ascii':   #{ascii[8000010]}"

 >>>>> Result >>>>>

Zeichen in 'utf8_1':          8000001
Bytes in 'utf8_1':            8000003
Encoding von 'utf8_1':        UTF-8
Zeichen 8000001 in 'utf8_1':  €

Zeichen in 'utf8_2':          8000001
Bytes in 'utf8_2':            8000003
Encoding von 'utf8_2':        UTF-8
Zeichen 8000001 in 'utf8_2':  €

Zeichen in 'ascii':           8000001
Bytes in 'ascii':             8000001
Encoding von 'ascii':         ASCII-8BIT
Zeichen 8000001 in 'ascii':   x

Rehearsal ---------------------------------------------
utf8_1      0.050000   0.010000   0.060000 (  0.080000)
utf8_2      6.279000   0.030000   6.309000 (  6.349000)
ascii       0.050000   0.010000   0.060000 (  0.071000)
------------------------------------ total: 6.429000sec

                 user     system      total        real
utf8_1      0.000000   0.000000   0.000000 (  0.000000)
utf8_2      6.229000   0.000000   6.229000 (  6.369000)
ascii       0.000000   0.000000   0.000000 (  0.000000)
Zeichen in 'utf8_1':          8000021
Bytes in 'utf8_1':            8000063
Encoding von 'utf8_1':        UTF-8
Zeichen 8000011 in 'utf8_1':  €

Zeichen in 'utf8_2':          8000021
Bytes in 'utf8_2':            8000023
Encoding von 'utf8_2':        UTF-8
Zeichen 8000011 in 'utf8_2':  x

Zeichen in 'ascii':           8000021
Bytes in 'ascii':             8000021
Encoding von 'ascii':         ASCII-8BIT
Zeichen 8000011 in 'ascii':   x

 >>>>> EoR >>>>>
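
The final counts correspond to 20 appends per string rather than 10,
because bmbm executes every report block twice (once in the rehearsal
and once in the measured run). The arithmetic can be checked with a
small sketch using the figures from the output above; the helper name
expected_sizes and the two constants are hypothetical, introduced only
for this check.

```ruby
# Benchmark.bmbm runs each report block twice: rehearsal + measured run.
RUNS = 2
APPENDS_PER_RUN = 10

# Returns [characters, bytes] expected after all appends.
# Hypothetical helper, not part of the original test program.
def expected_sizes(chars, bytes, bytes_per_append)
  appends = RUNS * APPENDS_PER_RUN
  [chars + appends, bytes + appends * bytes_per_append]
end

p expected_sizes(8_000_001, 8_000_003, 3)  # utf8_1 => [8000021, 8000063]
p expected_sizes(8_000_001, 8_000_003, 1)  # utf8_2 => [8000021, 8000023]
p expected_sizes(8_000_001, 8_000_001, 1)  # ascii  => [8000021, 8000021]
```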

Wolfgang Nádasi-Donner

--------------080200040501060506090307--