On 1 Aug 2008, at 15:45, Matthew Moss wrote:
>
> -=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D=
-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-
>
> ## Code Heuristics (#172)
>
>
> This week, your task is to make my job simpler.
>
> Each week, coders send in their submissions to Ruby Quiz problems,
> usually as a mix of quiz discussion and actual code. Your task, then,
> is to take a submission as input, and generate output that is the
> extracted code. Every line of the input that isn't code should be
> prefixed with the comment marker, `#`.


I got lazy and thought I'd let Ruby do the hard work. Given some text,
I feed it through

eval("BEGIN {return true}\n#{code}", nil)

and see if an exception is raised or not (this does have limitations).
It's not enough to do this line by line, for example

if foo
   puts "bar"
end

The first and last lines are not, on their own, valid Ruby, but as a
whole the snippet is of course valid.
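
To make that concrete, here is a minimal sketch. It does not use the
eval trick above: as an assumption it swaps in
RubyVM::InstructionSequence.compile (MRI only, Ruby 1.9 or later), which
parses the text without running it. The helper name parses? and the
constant SNIPPET are made up for illustration.

```ruby
# Hypothetical helper, not part of the submission: true if +code+ parses.
# Assumes MRI. RubyVM::InstructionSequence.compile only compiles the text,
# it never runs it, and it raises SyntaxError when the text is not valid Ruby.
def parses?(code)
  RubyVM::InstructionSequence.compile(code)
  true
rescue SyntaxError
  false
end

SNIPPET = ["if foo", "   puts \"bar\"", "end"]

# Checked one line at a time, only the middle line survives.
SNIPPET.each do |line|
  puts "#{parses?(line + "\n") ? 'valid  ' : 'invalid'} | #{line}"
end

# Checked as a whole, the three lines are fine.
puts "whole snippet valid? #{parses?(SNIPPET.join("\n") + "\n")}"
```

Only the middle line passes on its own, which is why the check has to
work on runs of lines rather than single lines.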

For a given chunk of text we first try to find the maximal prefix
(prefix isn't quite the right word, since we only split at lines) that
is valid Ruby. To do this we take an empty string and add one line of
the input at a time, running eval each time to see if what we have so
far is valid.

If there is no such prefix, then the first line must be comment text,
so we tag that line as a comment, remove it, and repeat the processing
on the remaining lines. If there is such a prefix, then that prefix is
tagged as code; we remove it and process the remaining lines.
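
The two cases above can be sketched as a small loop. This is only an
illustration under stated assumptions, not the code from this post: the
syntax check is again RubyVM::InstructionSequence.compile (MRI, Ruby 1.9
or later) rather than the eval trick, it ignores warnings, and the names
parses?, valid_prefix_length, tag_lines and SAMPLE are hypothetical.

```ruby
# Hypothetical stand-in for the eval check (assumes MRI, Ruby 1.9 or later).
def parses?(code)
  RubyVM::InstructionSequence.compile(code)
  true
rescue SyntaxError
  false
end

# Number of leading lines in the longest valid chunk (0 if there is none).
# Every prefix is tried, because a later line can make an invalid prefix valid.
def valid_prefix_length(lines)
  best = 0
  code = String.new
  lines.each_with_index do |line, index|
    code << line << "\n"
    best = index + 1 if parses?(code)
  end
  best
end

# Returns [line, is_code] pairs, consuming the input from the front.
def tag_lines(lines)
  remaining = lines.dup
  tagged = []
  until remaining.empty?
    count = valid_prefix_length(remaining)
    if count > 0
      count.times { tagged << [remaining.shift, true] }
    else
      tagged << [remaining.shift, false]
    end
  end
  tagged
end

SAMPLE = [
  "Here is my solution :)",
  "if foo",
  "  bar",
  "end",
  "and that is all :)"
]

tag_lines(SAMPLE).each do |line, is_code|
  puts "#{is_code ? 'code   ' : 'comment'} | #{line}"
end
```

The sample prose lines were picked because they can never parse (a
dangling `:)`, a leading `and`). Ordinary sentences are often less
cooperative.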

The output formatter sort of does the splitting into separate files:
it prints a mark to the screen where it would split (I was too lazy to
start messing around with files).

What this code doesn't deal well with is lines with not much text, for
example:

I think this does it:
if foo
   bar
end
Fred

The line Fred is marked as code, because that is perfectly legal Ruby:
it's just the value of the constant Fred. Of course that would
probably blow up if you actually evaluated that line, but my valid code
detector can't handle that (I can't really think how you could handle
this with true certainty without actually executing the code).
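
A small sketch of the gap between "parses" and "runs", assuming MRI and
assuming no constant named Fred is defined in the process. The helpers
parses? and runs? are hypothetical, and runs? really does execute its
argument, so it is no good for untrusted submissions.

```ruby
# Hypothetical syntax-only check (assumes MRI, Ruby 1.9 or later).
def parses?(code)
  RubyVM::InstructionSequence.compile(code)
  true
rescue SyntaxError
  false
end

# Hypothetical execution check: actually evaluates the text at top level.
# Only safe for text you trust.
def runs?(code)
  eval(code, TOPLEVEL_BINDING)
  true
rescue ScriptError, StandardError
  false
end

# A lone constant reference is a complete, valid program...
puts "Fred parses? #{parses?("Fred\n")}"

# ...but evaluating it raises NameError unless Fred happens to exist.
puts "Fred runs?   #{runs?("Fred\n")}"
```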

Sequences like

hope this helps

also look like legal code (but will produce warnings). To get around
this we require that our evaling produces no warnings (and thus we
trust that Ruby Quiz submitters squash warnings from their code :-)).
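
The "no warnings" rule comes down to watching $stderr while the text is
parsed. Below is a minimal sketch of that idea, assuming MRI and using
RubyVM::InstructionSequence.compile in place of the eval trick. The
helper names captured_stderr and quiet_and_valid? are hypothetical.
Which texts warn differs between Ruby versions, so a phrase that warned
on one interpreter may pass silently on another.

```ruby
require 'stringio'

# Hypothetical helper: runs the block with $stderr pointed at a StringIO and
# $VERBOSE switched on, then returns whatever was written to it.
def captured_stderr
  saved_err, saved_verbose = $stderr, $VERBOSE
  $stderr = StringIO.new
  $VERBOSE = true
  yield
  $stderr.string
ensure
  # Always put the real stream and verbosity back, even after an exception.
  $stderr = saved_err
  $VERBOSE = saved_verbose
end

# True only if +code+ parses and nothing was written to $stderr meanwhile.
def quiet_and_valid?(code)
  noise = captured_stderr { RubyVM::InstructionSequence.compile(code) }
  noise.empty?
rescue SyntaxError
  false
end

puts quiet_and_valid?("puts 1 + 2\n")   # clean code
puts quiet_and_valid?("if foo\n")       # does not even parse
```
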
Another limitation: if you were trying to say 'this works in Ruby 1.8
but only this works in 1.9', this solution would fail if the 1.9 code
used some 1.9-specific bit of syntax and you were running this script
on Ruby 1.8. (Obviously, if the example only uses differences in the
standard library, that is irrelevant.)

Fred


Usage: (example stolen from Mikael Hoilund's submission to 171)

CodeExtractor.extract(<<TEXT
Oh hi, I just thought I'd golf a solution. I'm sure other people can
do a much better job than I making a full hexdumping suite, so I just
had some fun. Can't seem to get it lower than 78 characters,
unfortunately.

i=0;$<.read.scan(/.{0,16}/m){puts"%08x "%i+$&.unpack('H4'*8).join(' ');i+=16}

Expanded and parenthesified, clarified:

i = 0
ARGF.read.scan(/.{0,16}/m) {
puts(("%08x " % i) + $&.unpack('H4'*8).join(' '))
i += 16
}

ARGF (aliased as $<) is the file handle of all file names given in the
arguments concatenated, STDIN if none -- exactly what we need. The
regex to scan matches between 0 and 16 characters (including newline)
greedily. Change it to 1,16 if you don't want the empty line at the end.

Instead of letting the block to scan take an argument, I used a trick
I picked up from the last Ruby Quiz I participated in (Obfuscated
Email), and use $& inside the block, which is the last regex match.
Saves two characters \o/
TEXT
)

produces as output:

#Oh hi, I just thought I'd golf a solution. I'm sure other people can
do a much better job than I making a full hexdumping suite, so I just
had some fun. Can't seem to get it lower than 78 characters,
unfortunately.

i=0;$<.read.scan(/.{0,16}/m){puts"%08x "%i+$&.unpack('H4'*8).join(' ');i+=16}

-------
#Expanded and parenthesified, clarified:

i = 0
ARGF.read.scan(/.{0,16}/m) {
puts(("%08x " % i) + $&.unpack('H4'*8).join(' '))
i += 16
}

-------
#ARGF (aliased as $<) is the file handle of all file names given in
the arguments concatenated, STDIN if none -- exactly what we need. The
regex to scan matches between 0 and 16 characters (including newline)
greedily. Change it to 1,16 if you don't want the empty line at the
end.

#Instead of letting the block to scan take an argument, I used a trick
I picked up from the last Ruby Quiz I participated in (Obfuscated
Email), and use $& inside the block, which is the last regex match.
Saves two characters o/


The code:

require 'stringio'
Struct.new 'Line', :data, :code
class CodeExtractor
   attr_reader :lines, :output

   def initialize(text)
     @output = []
     @lines = text.split(/[\r\n]/)
   end

   def extract
     while lines.any?
       process lines
     end
   end

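   def valid_syntax?(code)
     io = StringIO.new
     # capture anything ruby writes to stderr (i.e. warnings) while parsing
     original_err, $stderr = $stderr, io
     # the BEGIN block runs as soon as the text has parsed and returns from
     # this method, so the submitted code itself is never executed
     eval("BEGIN {return true}\n#{code}")
     raise 'here'
   rescue Exception
     false
   ensure
     $stderr = original_err
     # any warning output counts as "not code"
     return false if io.length > 0
   end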

   # returns the maximum number of lines (contiguous from the start)
   # that are valid ruby
   def valid_code_prefix_length lines
     max_valid_lines = 0
     code = ""
     lines.each_with_index do |line, index|
       code << line
       code << "\n"
       if valid_syntax? code
         max_valid_lines = index + 1
       end
     end
     return max_valid_lines
   end

   def append_output(line, code)
     @output << Struct::Line.new(line, code)
   end

   def process lines
     if (prefix_length = valid_code_prefix_length lines) > 0
       prefix_length.times { append_output lines.shift, true }
     else
       append_output lines.shift, false
     end
   end

   def format_output
     last_line = nil
     @output.each do |line|
       if line.data =~ /^\s*$/
         puts ""
         next
       end
       # transition from code to comment
       if last_line && last_line.code && !line.code
         puts "-------"
       end
       puts "#{line.code ? '':'#'}#{line.data}"
       last_line = line
     end
   end

   def self.extract(text)
     c = CodeExtractor.new text
     c.extract
     c.format_output
     nil
   end
end