Bugs item #2154, was opened at 2005-07-23 10:28
You can respond by visiting: 
http://rubyforge.org/tracker/?func=detail&atid=1698&aid=2154&group_id=426

Category: Standard Library
Group: None
Status: Open
Resolution: None
Priority: 3
Submitted By: John Halderman (jhalderman)
Assigned to: Nobody (None)
Summary: ^ and \A match in the middle of the string when using StringScanner.scan

Initial Comment:
The ri documentation says this about StringScanner:

     Scanning a string means remembering the position of a _scan
     pointer_, which is just an index. The point of scanning is to move
     forward a bit at a time, so matches are sought after the scan
     pointer; usually immediately after it.

     Given the string "test string", here are the pertinent scan pointer
     positions:

         t e s t   s t r i n g
       0 1 2 ...             1
                             0

     When you #scan for a pattern (a regular expression), the match must
     occur at the character after the scan pointer. If you use
     #scan_until, then the match can occur anywhere after the scan
     pointer. In both cases, the scan pointer moves _just beyond_ the
     last character of the match, ready to scan again from the next
     character onwards. This is demonstrated by the example above.

With an ordinary Regexp match, ^ matches only at the beginning of the string or at the first character after a \n. A ^ match at the scan pointer would therefore still be valid for the purposes of scan when the character just before the scan pointer is a \n, even though that \n lies before the current scan position. This can be demonstrated with the following code:

r = /^abc/
s = "efg\nabc"
m = r.match(s)
s[m.begin(0)...m.end(0)]

which produces the following output:

irb(main):001:0> r = /^abc/
=> /^abc/
irb(main):002:0> s = "efg\nabc"
=> "efg\nabc"
irb(main):003:0> m = r.match(s)
=> #<MatchData:0xb7eaf45c>
irb(main):004:0> s[m.begin(0)...m.end(0)]
=> "abc"

As you can see, the \n is not included in the match but is required for the match to occur.
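
For contrast, here is a small sketch of the same anchors with plain Regexp#match when no newline precedes the text. The constant names are only for illustration and are not part of the original report.

```ruby
# Plain Regexp#match: ^ needs a line start, \A needs the string start.
LINE_START   = /^abc/.match("efg\nabc")   # matches: "abc" follows a \n
MID_LINE     = /^abc/.match("efgabc")     # nil: "abc" is not at a line start
STRING_START = /\Aabc/.match("efg\nabc")  # nil: "abc" is not at the string start

p LINE_START[0]
p MID_LINE
p STRING_START
```

This is the behavior I would have expected StringScanner#scan to mirror when the scan pointer is in the middle of a line.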

To illustrate the problem I am talking about more clearly, I have provided the following code:

require 'strscan'
sc = StringScanner.new("the white elephant eats grass")
sc.scan(/the\s+/)
sc.bol?
sc.scan(/^white\s+/)
sc.scan(/\Aelephant\s+/)

This code produced the following result using irb:

irb(main):001:0> require 'strscan'
=> true
irb(main):002:0> sc = StringScanner.new("the white elephant eats grass")
=> #<StringScanner 0/29 @ "the w...">
irb(main):003:0> sc.scan(/the\s+/)
=> "the "
irb(main):004:0> sc.bol?
=> false
irb(main):005:0> sc.scan(/^white\s+/)
=> "white "
irb(main):006:0> sc.scan(/\Aelephant\s+/)
=> "elephant "
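
Until this is resolved, one possible workaround is to check the scan position explicitly before scanning with an anchored pattern. The sketch below assumes that bol? and pos report the position as in the transcript above. The helper names scan_at_line_start and scan_at_string_start are hypothetical and are not part of strscan.

```ruby
require 'strscan'

# Hypothetical helper: only scan when the scan pointer is at the
# beginning of a line, so that ^ keeps its usual meaning.
def scan_at_line_start(scanner, pattern)
  return nil unless scanner.bol?
  scanner.scan(pattern)
end

# Hypothetical helper: only scan when the scan pointer is at the
# beginning of the string, so that \A keeps its usual meaning.
def scan_at_string_start(scanner, pattern)
  return nil unless scanner.pos.zero?
  scanner.scan(pattern)
end

sc = StringScanner.new("the white elephant eats grass")
sc.scan(/the\s+/)
p scan_at_line_start(sc, /^white\s+/)        # guarded: pointer is mid-line
p scan_at_string_start(sc, /\Awhite\s+/)     # guarded: pointer is not at 0
```

Both guarded calls return nil here and leave the scan pointer where it was, because the pointer is neither at a line start nor at position 0.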
