Issue #9847 has been updated by Kenneth Guerin.


Errata:
- comments regarding Test #2 in script are incorrect, but the code is still valid
- Issue was *not* noticed in 1.9.2; that was the last version in which the code worked properly
- Issue started in 2.0

----------------------------------------
Bug #9847: Cannot create new String when using File.read(size, buffer) 
https://bugs.ruby-lang.org/issues/9847#change-46779

* Author: Kenneth Guerin
* Status: Open
* Priority: Normal
* Assignee: 
* Category: core
* Target version: current: 2.2.0
* ruby -v: 2.1.2p95
* Backport: 2.0.0: UNKNOWN, 2.1: UNKNOWN
----------------------------------------
This bug was first noticed in version 1.9.2 and is still present in 2.1.2p95

The attached script does the following to highlight this bug:
- create a file of 13 fixed-length records of a specific size: all records contain a repeated letter, A for the first, B for the second, through M
- Test #1: read all records from the file and store them into an array of Strings, using File.read(size) and storing via 'cache << buffer'
- Test #2: read all records from the file and store them into an array of Strings, using File.read(size, buffer) and storing via 'cache << String.new(buffer)'; buffer is reused for each read, so cache is expected to hold independent copies
- test cycle is run using a record length of 23 and a record length of 24, highlighting Ruby's optimization of short strings
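
The steps above can be sketched roughly as follows. This is not the attached strbug.rb; it is a minimal reconstruction from the description, and the helper names (write_records, read_fresh, read_reused, first_letters) are made up for illustration. It prints the first letter of every cached record so the two tests can be compared at each record size.

```ruby
require "tempfile"

# Write 13 fixed-length records, each one a single repeated letter (A..M).
def write_records(path, size)
  File.open(path, "wb") do |f|
    ("A".."M").each { |letter| f.write(letter * size) }
  end
end

# Test #1: every read returns a fresh String, which is stored directly.
def read_fresh(path, size)
  cache = []
  File.open(path, "rb") do |f|
    while (record = f.read(size))
      cache << record
    end
  end
  cache
end

# Test #2: a single buffer is reused for every read, and a copy made
# with String.new(buffer) is stored.
def read_reused(path, size)
  cache = []
  buffer = String.new
  File.open(path, "rb") do |f|
    # read(size, buffer) returns nil at end of file
    while f.read(size, buffer)
      cache << String.new(buffer)
    end
  end
  cache
end

# Summarize a cache as the first letter of each record.
def first_letters(cache)
  cache.map { |record| record[0] }.join
end

[23, 24].each do |size|
  Tempfile.create("strbug") do |tmp|
    tmp.close
    write_records(tmp.path, size)
    puts "size #{size}, Test #1: #{first_letters(read_fresh(tmp.path, size))}"
    puts "size #{size}, Test #2: #{first_letters(read_reused(tmp.path, size))}"
  end
end
```

Both tests should print ABCDEFGHIJKLM at both sizes. On the affected versions, Test #2 at size 24 is the case this report describes as printing M thirteen times.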

Results of running this script:
- with a record size of 23, Tests #1 & #2 show the cache containing all records:  [ A, B, ... M ]
- with a record size of 24, Test #1 shows the cache containing all records: [ A, B, ... M ]
- with a record size of 24, Test #2 shows the cache containing: [ M, M, ... M ]

Diagnosis & Notes:
- with a record size > 23, after reading a file using File.read(size, buffer), buffer is left in a state that prevents String.new(buffer) from producing an independent String
- variations of this script showed that String.new was creating a new String object on each record read, but the contents of all of the Strings stored in the cache Array were being overwritten on each call to File.read
- this shows that String.new(buffer) creates new String objects, but that all Strings derived from buffer share the same internal memory
- this behavior exists in the "long string" variation of String; short optimized Strings do not share this property
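
For contrast, here is a small sketch of the behavior I expected: a copy made with String.new may share storage with its source internally, but an ordinary in-place change to the source should leave the copy untouched. The 24-byte length is only chosen to match the record size above, and the variable names are illustrative.

```ruby
# A 24-byte string, one byte longer than the short-string case in the report.
buffer = "A" * 24
copy = String.new(buffer)   # may share storage with buffer internally

# Ordinary in-place modification of buffer should not leak into the copy.
buffer.replace("B" * 24)

puts copy.equal?(buffer)    # prints false: they are distinct objects
puts copy == "A" * 24       # prints true: the copy kept its original contents
```

The problem described above is that File.read(size, buffer) does not appear to preserve this guarantee when it refills buffer.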


---Files--------------------------------
strbug.rb (2.07 KB)


-- 
https://bugs.ruby-lang.org/