Issue #14126 has been updated by MSP-Greg (Greg L).


@nobu

Thank you for the patch. The lex array now looks the way I would expect it to (though I'm not that familiar with parsers).

With 60884, `Ripper.sexp_raw` and `Ripper.sexp` now both return nil. They both 'worked' with 60863 and 60875.
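A quick way to check this is to see whether either method returns a tree at all, since both return nil when Ripper reports a parse error. This is only a sketch: the inputs are assumptions for illustration, because the comment doesn't say which source produced the nil result.

```ruby
require 'ripper'

# Assumed inputs; the report does not name the source that returned nil.
SOURCES = [
  "%w(\n  a\n  b\n  c\n  d\n)\n",
  "%w(\n  AA  BB  CC  DD\n)"
].freeze

# Ripper.sexp_raw and Ripper.sexp return nil instead of a tree on a parse error.
results = SOURCES.map do |src|
  { src: src, sexp_raw: !Ripper.sexp_raw(src).nil?, sexp: !Ripper.sexp(src).nil? }
end

results.each do |r|
  puts "#{r[:src].inspect}: sexp_raw #{r[:sexp_raw] ? 'ok' : 'nil'}, " \
       "sexp #{r[:sexp] ? 'ok' : 'nil'}"
end
```

On a build with the regression, the affected inputs should print nil; on a build without it, both columns should print ok.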

I'm also getting an error from YARD's parser, `syntax error in ``:(2,3): syntax error, unexpected tSTRING_CONTENT, expecting tSTRING_END`, with the following input:

```ruby
%w(
  a
  b
  c
  d
)
```

YARD's parser mostly hooks into Ripper's events, so I think the error is actually raised by Ripper. I'm not sure, though, as I've spent more time with YARD's C parser than with its Ruby parser...
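One way to tell whether the message comes from Ripper itself rather than from YARD is to run Ripper directly on the same input and record any parse errors. This is a minimal sketch: `ErrorCollector` is a hypothetical name, and it assumes Ripper dispatches grammar failures to the private `on_parse_error` handler, which a subclass can override.

```ruby
require 'ripper'

# Hypothetical Ripper subclass that records parse errors with their position.
class ErrorCollector < Ripper
  attr_reader :errors

  def initialize(src)
    super(src)
    @errors = []
  end

  private

  # Called by Ripper with the parser's message when the grammar fails.
  def on_parse_error(msg)
    @errors << [lineno, column, msg]
  end
end

src = "%w(\n  a\n  b\n  c\n  d\n)\n"
parser = ErrorCollector.new(src)
parser.parse

puts "error? #{parser.error?}"
parser.errors.each { |line, col, msg| puts "(#{line},#{col}): #{msg}" }
```

If `error?` is true here and the reported position matches YARD's `(2,3)`, the error most likely originates in Ripper rather than in YARD.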

Thanks, Greg

----------------------------------------
Bug #14126: Recent parse.y (Ripper) changes - lexing, tokenizing
https://bugs.ruby-lang.org/issues/14126#change-67907

* Author: MSP-Greg (Greg L)
* Status: Closed
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: ruby 2.5.0dev (2017-11-22 trunk 60878) [x64-mingw32]
* Backport: 2.3: UNKNOWN, 2.4: UNKNOWN
----------------------------------------
First of all, I'd like to thank @yui-knk for all the work on `parse.y`. I assume some of it is due to the move of `RDoc` from 'seattlerb' to 'ruby', along with `RDoc` now using Ripper instead of its own parser.

I'm a `YARD` user.  Recent commits have broken some of `YARD`'s parsing code, although many of the commits actually fixed odd behavior in `Ripper`.  I did find one thing that seems odd.

It centers on whether `Ripper.tokenize(src).join('') == src` or `Ripper.tokenize(src).join('').length == src.length` should be true.  I believe the actual issue for YARD is the following constraint:

```ruby
combined = ''.dup
Ripper.lex(src).each { |t| combined << t[2] } # t[2] is the token text
src == combined
```

Using the script below, svn 60863 prints true for every source string, but 60878 prints false. The extra whitespace that appears in the `:on_tstring_content` tokens with 60863 has been (understandably) removed in 60878, but it has not been accounted for in the `:on_words_sep` (or `:on_qwords_beg`) tokens.

```ruby
# frozen_string_literal: true

require 'ripper'
require 'pp'

module RipperPercent

  def self.run
    output "%w(\n  AA\n  BB\n  CC\n  DD\n)"
    output "%w(\n\nAA\n\nBB\n\nCC\n\nDD\n)"
    output "%w(\n  AA  BB  CC  DD\n)"
  end

  def self.output(s)
    combined = ''.dup
    Ripper.lex(s).each { |t| combined << t[2] }
    puts
    puts "src    #{s.gsub("\n", "\\n")}"
    puts "lexed  #{combined.gsub("\n", "\\n")}"
    puts "src == lexed is #{s == combined}"

    # puts ; pp Ripper.lex(s)
    # puts Ripper.tokenize(s).inspect
    # pp Ripper.sexp_raw(s)
  end
end
RipperPercent.run
```
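The script above only reports true or false. To see which token loses the whitespace, a helper along these lines (hypothetical, not part of Ripper or YARD) walks the `Ripper.lex` output and returns the first byte offset where the concatenated tokens stop matching the source, along with the token found there.

```ruby
require 'ripper'

# Hypothetical helper: returns nil when the Ripper.lex tokens concatenate back
# to src exactly; otherwise returns the first byte offset where they diverge,
# together with the token being compared at that point.
def first_divergence(src)
  offset = 0
  Ripper.lex(src).each do |pos, event, tok|
    # Each token should start exactly where the previous one ended.
    unless src.byteslice(offset, tok.bytesize) == tok
      return { offset: offset, pos: pos, event: event, tok: tok }
    end
    offset += tok.bytesize
  end
  return nil if offset == src.bytesize

  # All tokens matched, but trailing source was never emitted as a token.
  { offset: offset, pos: nil, event: :missing_tail,
    tok: src.byteslice(offset, src.bytesize - offset) }
end

[
  "%w(\n  AA\n  BB\n  CC\n  DD\n)",
  "%w(\n\nAA\n\nBB\n\nCC\n\nDD\n)",
  "%w(\n  AA  BB  CC  DD\n)"
].each { |src| p first_divergence(src) }
```

On a build where the lexer drops whitespace, the returned `event` and `pos` point at the first token that no longer lines up with the source, which should narrow down whether `:on_words_sep` or `:on_tstring_content` is missing the text.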

As mentioned previously, I'm not much of a C programmer, and much of `Ripper` isn't well documented, so I don't think I can fix this myself, if it is indeed an issue. I'm also aware of the complication that "\n" is sometimes equivalent to a space and at other times equivalent to ';'.
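That ambiguity is visible in the scanner-event names Ripper assigns to newline tokens. The sketch below (the helper name `newline_events` is hypothetical) lists the event for every token containing "\n", for a few assumed sample inputs.

```ruby
require 'ripper'

# Hypothetical helper: scanner-event names of every token containing a newline.
def newline_events(src)
  Ripper.lex(src)
        .select { |_pos, _event, tok| tok.include?("\n") }
        .map { |_pos, event, _tok| event }
end

p newline_events("a = 1\nb = 2\n") # newlines that end statements
p newline_events("foo(1,\n2)\n")   # newline after a comma inside an argument list
p newline_events("%w(a\nb)")       # newline between %w elements
```

Roughly, `:on_nl` marks a newline that terminates a statement (like ';'), `:on_ignored_nl` one the parser skips, and `:on_words_sep` one that only separates `%w` elements (like a space).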

Finally, given all the changes that have occurred, could the version of Ripper be incremented once they seem stable and complete?

Thanks, Greg




-- 
https://bugs.ruby-lang.org/
