On 1 Aug 2008, at 15:45, Matthew Moss wrote:
>
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>
> ## Code Heuristics (#172)
>
>
> This week, your task is to make my job simpler.
>
> Each week, coders send in their submissions to Ruby Quiz problems,
> usually as a mix of quiz discussion and actual code. Your task, then,
> is to take a submission as input, and generate output that is the
> extracted code. Every line of the input that isn't code should be
> prefixed with the comment marker, `#`.


I got lazy and thought I'd let ruby do the hard work. Given some text, feed it through

eval("BEGIN {return true}\n#{code}", nil)

and see if an exception is raised or not (this does have limitations). It's not enough to do this line by line, for example

if foo
   puts "bar"
end

the first and last lines are not, on their own, valid ruby, but as a  
whole it is of course valid.
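
To make the line-by-line problem concrete, here is a minimal sketch. It swaps in Ripper.sexp from the standard library (which returns nil when the text does not parse) for the eval trick above, purely so that the check never runs the text. The helper name parses? is made up for this example.

```ruby
require 'ripper'

# Hypothetical helper: true if the text parses as ruby.
# Ripper.sexp returns nil on a syntax error and never executes the text.
def parses?(text)
  !Ripper.sexp(text).nil?
end

lines = ['if foo', '   puts "bar"', 'end']

# checked one at a time, only the middle line parses
lines.each { |line| puts "#{parses?(line)}: #{line}" }

# checked as a whole, the chunk parses
puts parses?(lines.join("\n"))
```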

For a given chunk of text we first try to find the maximal prefix
(prefix isn't quite the right word, since we only split at lines) that is valid ruby.
To do this we take an empty string and add one line of the input at a time, running eval each time to see if what we have is valid.

If there is no such prefix, then the first line must be comment text
and so we tag that line as a comment. We remove the line and repeat
processing on the remaining lines.
If there is such a prefix, then that prefix is tagged as code; we
remove it and process the remaining lines.
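
The loop described above can be sketched in a few lines. This is only an illustration under assumptions: the syntax check is Ripper.sexp rather than the eval trick, warnings are ignored, and the names parses? and tag_lines are invented for the sketch.

```ruby
require 'ripper'

# Stand-in syntax check (an assumption, not the eval trick described above).
def parses?(text)
  !Ripper.sexp(text).nil?
end

# Hypothetical helper: returns [line, :code or :comment] pairs.
def tag_lines(lines)
  lines = lines.dup
  tagged = []
  until lines.empty?
    # grow the candidate one line at a time, remembering the longest
    # run of leading lines that parsed
    longest = 0
    candidate = String.new
    lines.each_with_index do |line, index|
      candidate << line << "\n"
      longest = index + 1 if parses?(candidate)
    end
    if longest > 0
      longest.times { tagged << [lines.shift, :code] }
    else
      tagged << [lines.shift, :comment]
    end
  end
  tagged
end

text = <<TEXT
Here are 2 versions
if foo
   puts "bar"
end
TEXT

tag_lines(text.split("\n")).each { |line, tag| puts "#{tag}: #{line}" }
```

Note that every prefix is re-parsed from scratch each time a line is added, so this does a lot of repeated work on long submissions; for quiz-sized input that seems a fair price for the simplicity.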

The output formatter sort of does the splitting into separate files - it prints a mark to the screen where it would split (I was too lazy to start messing around with files).

What this code doesn't deal well with is lines with not much text, for example:

I think this does it:
if foo
   bar
end
Fred

The line Fred is marked as code, because that is perfectly legal ruby: it's just the value of the constant Fred. Of course that would
probably blow up if you actually evaluated that line but my valid code detector can't handle that (I can't really think how you could handle this with true certainty without actually executing the code).
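
A small sketch of that gap between "parses" and "runs", again using Ripper.sexp as a stand-in syntax check and assuming nothing in the running program has defined a constant called Fred:

```ruby
require 'ripper'

# parsing succeeds: a bare constant reference is legal syntax
puts !Ripper.sexp('Fred').nil?

# evaluating it is another matter, assuming nothing has defined Fred
begin
  eval('Fred')
rescue NameError => e
  puts "#{e.class}: #{e.message}"
end
```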

Sequences like

hope this helps

also look like legal code (but will produce warnings). To get around  
this we require that our evaling produces no warnings (and thus we  
trust that ruby quiz submitters squash warnings from their code :-)).
Another limitation is that if you were trying to say 'this works in
ruby 1.8 but only this works in 1.9' then this solution would fail if the 1.9 code used some ruby 1.9 specific bit of syntax (obviously if
the example just uses differences in the standard library that is
irrelevant) and you were running this script on ruby 1.8.
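
The no-warnings requirement comes down to pointing $stderr at a StringIO for the duration of the check and looking at whether anything was written. Which sequences actually warn depends on the ruby version, so this sketch only shows the capture mechanism, with an explicit write standing in for a real warning. The helper name captured_stderr is made up.

```ruby
require 'stringio'

# Hypothetical helper: runs the block and returns whatever it wrote to $stderr.
def captured_stderr
  original, $stderr = $stderr, StringIO.new
  yield
  $stderr.string
ensure
  # always put the real stream back, even if the block raised
  $stderr = original
end

noisy = captured_stderr { $stderr.puts 'warning: something looks off' }
quiet = captured_stderr { 1 + 1 }

puts noisy.empty?
puts quiet.empty?
```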

Fred


Usage: (example stolen from Mikael Hoilund's submission to 171)

CodeExtractor.extract(<<TEXT
Oh hi, I just thought I'd golf a solution. I'm sure other people can  
do a much better job than I making a full hexdumping suite, so I just had some fun. Can't seem to get it lower than 78 characters,  
unfortunately.

i=0;$<.read.scan(/.{0,16}/m){puts"%08x "%i+$&.unpack('H4'*8).join(' ');i+=16}

Expanded and parenthesified, clarified:

i = 0
ARGF.read.scan(/.{0,16}/m) {
puts(("%08x " % i) + $&.unpack('H4'*8).join(' '))
i += 16
}

ARGF (aliased as $<) is the file handle of all file names given in the arguments concatenated, STDIN if none, exactly what we need. The
regex to scan matches between 0 and 16 characters (including newline) greedily. Change it to 1,16 if you don't want the empty line at the end.

Instead of letting the block to scan take an argument, I used a trick picked up from the last Ruby Quiz I participated in (Obfuscated  
Email), and use $& inside the block, which is the last regex match.  
Saves two characters \o/
TEXT
)

produces as output:

#Oh hi, I just thought I'd golf a solution. I'm sure other people can do a much better job than I making a full hexdumping suite, so I just had some fun. Can't seem to get it lower than 78 characters,  
unfortunately.

i=0;$<.read.scan(/.{0,16}/m){puts"%08x "%i+$&.unpack('H4'*8).join(' ');i+=16}

-------
#Expanded and parenthesified, clarified:

i = 0
ARGF.read.scan(/.{0,16}/m) {
puts(("%08x " % i) + $&.unpack('H4'*8).join(' '))
i += 16
}

-------
#ARGF (aliased as $<) is the file handle of all file names given in
the arguments concatenated, STDIN if none, exactly what we need. The regex to scan matches between 0 and 16 characters (including newline)
greedily. Change it to 1,16 if you don't want the empty line at the
end.

#Instead of letting the block to scan take an argument, I used a trick picked up from the last Ruby Quiz I participated in (Obfuscated  
Email), and use $& inside the block, which is the last regex match.  
Saves two characters o/


The code:

require 'stringio'
Struct.new 'Line', :data, :code
class CodeExtractor
   attr_reader :lines, :output

   def initialize(text)
     @output = []
     @lines = text.split(/[\r\n]/)
   end

   def extract
     while lines.any?
       process lines
     end
   end

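   def valid_syntax?(code)
     io = StringIO.new
     # capture anything ruby writes to stderr (i.e. warnings) while evaling
     original_err, $stderr = $stderr, io
     # if the text parses, the BEGIN block is meant to run first and return
     # true before any of the submitted code is executed
     eval("BEGIN {return true}\n#{code}")
     raise 'here'
   rescue Exception
     false
   ensure
     $stderr = original_err
     # any warning output means we don't treat the text as code
     return false if io.length > 0
   end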

   #returns the maximum number of lines (contiguous from the start) that are valid ruby
   def valid_code_prefix_length lines
     max_valid_lines = 0
     code = ""
     lines.each_with_index do |line, index|
       code << line
       code << "\n"
       if valid_syntax? code
         max_valid_lines = index + 1
       end
     end
     return max_valid_lines
   end

   def append_output(line, code)
     @output << Struct::Line.new(line, code)
   end

   def process lines
     if (prefix_length = valid_code_prefix_length lines) > 0
       prefix_length.times { append_output lines.shift, true }
     else
       append_output lines.shift, false
     end
   end

   def format_output
     last_line = nil
     @output.each do |line|
       if line.data =~ /^\s*$/
         puts ""
         next
       end
       if last_line && last_line.code && !line.code #transition from code to comment
         puts "-------"
       end
       puts "#{line.code ? '':'#'}#{line.data}"
       last_line = line
     end
   end

   def self.extract(text)
     c= CodeExtractor.new text
     c.extract
     c.format_output
     nil
   end