Hi!

Some days ago I ran some tests with Ruby 1.9.0-0 to compare performance when handling UTF-8 encoded strings against ASCII strings. In one case I got a result I couldn't believe; my first impression was that I had made an error. I presented the code and the result in the German Ruby forum, and the results were confirmed there.

I ran the test with Ruby 1.9.0-0, built on Windows 2000 using MinGW/MSYS (256 MB RAM, 700 MHz).

In two cases of the following test I used UTF-8 encoded strings (utf8_1 and utf8_2), both containing 'abcdefgh' * 1000000 + '€'. The third case uses an ASCII string, 'abcdefgh' * 1000000 + 'x'. The test uses String#<< ten times to append one character: '€' (three bytes) in the case of utf8_1, and 'x' (one byte) in the case of utf8_2.

The result was that appending the Euro sign takes "no time", but appending the "x" takes more than six seconds. (I attached the source and the result, both in Windows format with "\r\n" line endings, UTF-8 encoded with BOM.)

Here is the listing of the program and the result.
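A short aside on the counts the listing prints: '€' (U+20AC) is a single character but occupies three bytes in UTF-8, which is why 'abcdefgh' * 1000000 + '€' reports 8000001 characters but 8000003 bytes. The sketch below checks this; it assumes a current Ruby (2.0 or later, where source files default to UTF-8) rather than the 1.9.0-0 build used for the test, and it is not part of the original program.

```ruby
# encoding: utf-8
# Sketch assuming Ruby 2.0+; not part of the original 1.9.0-0 test.

euro = '€'                       # U+20AC EURO SIGN
puts euro.length                 # prints 1 (one character)
puts euro.bytesize               # prints 3 (three bytes in UTF-8)
puts euro.bytes.map { |b| format('%02x', b) }.join(' ')  # prints e2 82 ac

# Same construction as the test strings: 8,000,000 ASCII bytes plus one '€'
s = 'abcdefgh' * 1000000 + '€'
puts s.length                    # prints 8000001
puts s.bytesize                  # prints 8000003
```

The same arithmetic explains the final figures: twenty appended '€' signs add 20 characters but 60 bytes to utf8_1.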
>>>>> Code >>>>>
require 'benchmark'
include Benchmark

utf8_1 = 'abcdefgh' * 1000000 + '€'
utf8_2 = 'abcdefgh' * 1000000 + '€'
ascii  = 'abcdefgh' * 1000000 + 'x'

# Character count, byte count, encoding and last character before appending
puts "Characters in 'utf8_1':         #{utf8_1.length}"
puts "Bytes in 'utf8_1':              #{utf8_1.bytesize}"
puts "Encoding of 'utf8_1':           #{utf8_1.encoding}"
puts "Character 8000001 in 'utf8_1':  #{utf8_1[8000000]}"
puts ""
puts "Characters in 'utf8_2':         #{utf8_2.length}"
puts "Bytes in 'utf8_2':              #{utf8_2.bytesize}"
puts "Encoding of 'utf8_2':           #{utf8_2.encoding}"
puts "Character 8000001 in 'utf8_2':  #{utf8_2[8000000]}"
puts ""
puts "Characters in 'ascii':          #{ascii.length}"
puts "Bytes in 'ascii':               #{ascii.bytesize}"
puts "Encoding of 'ascii':            #{ascii.encoding}"
puts "Character 8000001 in 'ascii':   #{ascii[8000000]}"
puts ""

# Append one character ten times per string; bmbm runs each block twice
# (rehearsal and real run), so every string grows by 20 characters in total
bmbm(10) do |t|
  t.report('utf8_1'){10.times{utf8_1 << '€'}}
  t.report('utf8_2'){10.times{utf8_2 << 'x'}}
  t.report('ascii'){10.times{ascii << 'x'}}
end

# Same figures after appending; character 8000011 is the first appended one
puts "Characters in 'utf8_1':         #{utf8_1.length}"
puts "Bytes in 'utf8_1':              #{utf8_1.bytesize}"
puts "Encoding of 'utf8_1':           #{utf8_1.encoding}"
puts "Character 8000011 in 'utf8_1':  #{utf8_1[8000010]}"
puts ""
puts "Characters in 'utf8_2':         #{utf8_2.length}"
puts "Bytes in 'utf8_2':              #{utf8_2.bytesize}"
puts "Encoding of 'utf8_2':           #{utf8_2.encoding}"
puts "Character 8000011 in 'utf8_2':  #{utf8_2[8000010]}"
puts ""
puts "Characters in 'ascii':          #{ascii.length}"
puts "Bytes in 'ascii':               #{ascii.bytesize}"
puts "Encoding of 'ascii':            #{ascii.encoding}"
puts "Character 8000011 in 'ascii':   #{ascii[8000010]}"
>>>>> Result >>>>>
Characters in 'utf8_1':         8000001
Bytes in 'utf8_1':              8000003
Encoding of 'utf8_1':           UTF-8
Character 8000001 in 'utf8_1':  €

Characters in 'utf8_2':         8000001
Bytes in 'utf8_2':              8000003
Encoding of 'utf8_2':           UTF-8
Character 8000001 in 'utf8_2':  €

Characters in 'ascii':          8000001
Bytes in 'ascii':               8000001
Encoding of 'ascii':            ASCII-8BIT
Character 8000001 in 'ascii':   x

Rehearsal ---------------------------------------------
utf8_1      0.050000   0.010000   0.060000 (  0.080000)
utf8_2      6.279000   0.030000   6.309000 (  6.349000)
ascii       0.050000   0.010000   0.060000 (  0.071000)
------------------------------------ total: 6.429000sec

                user     system      total        real
utf8_1      0.000000   0.000000   0.000000 (  0.000000)
utf8_2      6.229000   0.000000   6.229000 (  6.369000)
ascii       0.000000   0.000000   0.000000 (  0.000000)
Characters in 'utf8_1':         8000021
Bytes in 'utf8_1':              8000063
Encoding of 'utf8_1':           UTF-8
Character 8000011 in 'utf8_1':  €

Characters in 'utf8_2':         8000021
Bytes in 'utf8_2':              8000023
Encoding of 'utf8_2':           UTF-8
Character 8000011 in 'utf8_2':  x

Characters in 'ascii':          8000021
Bytes in 'ascii':               8000021
Encoding of 'ascii':            ASCII-8BIT
Character 8000011 in 'ascii':   x
>>>>> EoR >>>>>

Wolfgang Nádasi-Donner
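For anyone who wants to repeat the comparison on another Ruby build, here is a condensed sketch of the same test. It assumes Ruby 2.0 or later, times each string with Benchmark.realtime instead of bmbm (so each string gets ten appends, not twenty), and forces the ASCII string to ASCII-8BIT to mirror the encoding reported in the result above. The helper append_times is hypothetical, introduced only for this sketch. No particular timings are implied for newer Ruby versions; the sketch only prints what it measures.

```ruby
# encoding: utf-8
# Condensed re-run of the append test; assumes Ruby 2.0 or later.
require 'benchmark'

# Hypothetical helper (not from the original post): appends +piece+ to +str+
# +count+ times in place and returns the elapsed wall-clock time in seconds.
def append_times(str, piece, count = 10)
  Benchmark.realtime { count.times { str << piece } }
end

utf8_1 = 'abcdefgh' * 1_000_000 + '€'
utf8_2 = 'abcdefgh' * 1_000_000 + '€'
# Forced to ASCII-8BIT to mirror the encoding the original run reported.
ascii = ('abcdefgh' * 1_000_000 + 'x').force_encoding(Encoding::ASCII_8BIT)

results = {
  'utf8_1' => append_times(utf8_1, '€'),
  'utf8_2' => append_times(utf8_2, 'x'),
  'ascii'  => append_times(ascii, 'x')
}
results.each { |name, secs| puts format('%-8s %10.6f s', name, secs) }

# One run of ten appends each (the original bmbm run appended twenty):
puts "utf8_1: #{utf8_1.length} chars, #{utf8_1.bytesize} bytes"  # 8000011 chars, 8000033 bytes
puts "utf8_2: #{utf8_2.length} chars, #{utf8_2.bytesize} bytes"  # 8000011 chars, 8000013 bytes
puts "ascii:  #{ascii.length} chars, #{ascii.bytesize} bytes"    # 8000011 chars, 8000011 bytes
```

Timing each string separately, rather than inside one bmbm block, keeps the three measurements independent, at the cost of losing bmbm's rehearsal pass.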