On 1 Aug 2008, at 15:45, Matthew Moss wrote:
>
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>
> ## Code Heuristics (#172)
>
> This week, your task is to make my job simpler.
>
> Each week, coders send in their submissions to Ruby Quiz problems,
> usually as a mix of quiz discussion and actual code. Your task, then,
> is to take a submission as input, and generate output that is the
> extracted code. Every line of the input that isn't code should be
> prefixed with the comment marker, `#`.

I got lazy and thought I'd let Ruby do the hard work. Given some text, I feed it through

    eval("BEGIN {return true}\n#{code}", nil)

and see if an exception is raised or not (this does have limitations). The idea is that the BEGIN block returns before any of the text itself runs, so the text is only parsed. It's not enough to do this line by line. For example, in

    if foo
      puts "bar"
    end

the first and last lines are not, on their own, valid Ruby, but as a whole it is of course valid.

For a given chunk of text we first try to find the maximal prefix that is valid Ruby (prefix isn't quite the right word, since we only split at line boundaries). To do this we take an empty string and add one line of the input at a time, running eval each time to see if what we have so far is valid.

If there is no such prefix, then the first line must be comment text, so we tag that line as a comment. We remove the line and repeat the processing on the remaining lines.

If there is such a prefix, then that prefix is tagged as code; we remove it and process the remaining lines.

The output formatter sort of does the splitting into separate files: it prints a marker to the screen where it would split (I was too lazy to start messing around with files).

What this code doesn't deal well with is lines with not much text, for example:

    I think this does it:
    if foo
      bar
    end
    Fred

The line Fred is marked as code, because that is perfectly legal Ruby: it's just the value of the constant Fred.
Of course that would probably blow up if you actually evaluated that line, but my valid code detector can't handle that (I can't really think how you could handle this with true certainty without actually executing the code). Sequences like

    hope this helps

also look like legal code (but will produce warnings). To get around this we require that our eval produces no warnings (and thus we trust that Ruby Quiz submitters squash warnings from their code :-)).

Another limitation: if you were trying to say 'this works in ruby 1.8 but only this works in 1.9', then this solution would fail if the 1.9 code used some bit of 1.9-specific syntax and you were running this script on Ruby 1.8 (obviously, if the example just uses differences in the standard library, that is irrelevant).

Fred

Usage (example stolen from Mikael Hoilund's submission to 171):

    CodeExtractor.extract(<<TEXT
    Oh hi, I just thought I'd golf a solution. I'm sure other people can do a much better job than I making a full hexdumping suite, so I just had some fun. Can't seem to get it lower than 78 characters, unfortunately.

    i=0;$<.read.scan(/.{0,16}/m){puts"%08x "%i+$&.unpack('H4'*8).join(' ');i+=16}

    Expanded and parenthesified, clarified:

    i = 0
    ARGF.read.scan(/.{0,16}/m) {
      puts(("%08x " % i) + $&.unpack('H4'*8).join(' '))
      i += 16
    }

    ARGF (aliased as $<) is the file handle of all file names given in the arguments concatenated, STDIN if none - exactly what we need. The regex to scan matches between 0 and 16 characters (including newline) greedily. Change it to 1,16 if you don't want the empty line at the end.

    Instead of letting the block to scan take an argument, I used a trick I picked up from the last Ruby Quiz I participated in (Obfuscated Email), and use $& inside the block, which is the last regex match. Saves two characters \o/
    TEXT
    )

produces as output:

    #Oh hi, I just thought I'd golf a solution. I'm sure other people can do a much better job than I making a full hexdumping suite, so I just had some fun. Can't seem to get it lower than 78 characters, unfortunately.

    i=0;$<.read.scan(/.{0,16}/m){puts"%08x "%i+$&.unpack('H4'*8).join(' ');i+=16}

    -------
    #Expanded and parenthesified, clarified:

    i = 0
    ARGF.read.scan(/.{0,16}/m) {
      puts(("%08x " % i) + $&.unpack('H4'*8).join(' '))
      i += 16
    }

    -------
    #ARGF (aliased as $<) is the file handle of all file names given in the arguments concatenated, STDIN if none - exactly what we need. The regex to scan matches between 0 and 16 characters (including newline) greedily. Change it to 1,16 if you don't want the empty line at the end.

    #Instead of letting the block to scan take an argument, I used a trick I picked up from the last Ruby Quiz I participated in (Obfuscated Email), and use $& inside the block, which is the last regex match. Saves two characters o/

The code:

    require 'stringio'

    # One line of the input, tagged as code (true) or comment (false)
    Struct.new 'Line', :data, :code

    class CodeExtractor
      attr_reader :lines, :output

      def initialize(text)
        @output = []
        @lines = text.split(/[\r\n]/)
      end

      def extract
        while lines.any?
          process lines
        end
      end

      def valid_syntax?(code)
        # capture $stderr so that any warning counts as a failure
        io = StringIO.new
        original_err, $stderr = $stderr, io
        eval("BEGIN {return true}\n#{code}")
        raise 'here' # not reached when the BEGIN block returns
      rescue Exception
        false
      ensure
        $stderr = original_err
        return false if io.length > 0
      end

      #returns the maximum number of lines (contiguous from the start) that are valid ruby
      def valid_code_prefix_length lines
        max_valid_lines = 0
        code = ""
        lines.each_with_index do |line, index|
          code << line
          code << "\n"
          if valid_syntax? code
            max_valid_lines = index + 1
          end
        end
        return max_valid_lines
      end

      def append_output(line, code)
        @output << Struct::Line.new(line, code)
      end

      # tag the longest valid prefix as code, or else the first line as a comment
      def process lines
        if (prefix_length = valid_code_prefix_length lines) > 0
          prefix_length.times { append_output lines.shift, true }
        else
          append_output lines.shift, false
        end
      end

      def format_output
        last_line = nil
        @output.each do |line|
          if line.data =~ /^\s*$/
            puts ""
            next
          end
          if last_line && last_line.code && !line.code #transition from code to comment
            puts "-------"
          end
          puts "#{line.code ? '':'#'}#{line.data}"
          last_line = line
        end
      end

      def self.extract(text)
        c = CodeExtractor.new text
        c.extract
        c.format_output
        nil
      end
    end
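
For anyone trying the prefix-growing idea on a current MRI, here is a minimal sketch of just that step. It is not the code above: it assumes RubyVM::InstructionSequence.compile (MRI-specific) as the syntax check in place of the eval/BEGIN trick, and it does not look at warnings at all. The names parses? and longest_valid_prefix are made up for illustration.

```ruby
# Sketch only: swaps the eval("BEGIN {return true}...") check for
# MRI's RubyVM::InstructionSequence.compile, which parses and compiles
# the text without running it. Warnings are ignored here.
def parses?(code)
  RubyVM::InstructionSequence.compile(code)
  true
rescue SyntaxError
  false
end

# Hypothetical helper: number of lines, counted from the start, in the
# longest run of lines that parses as a whole.
def longest_valid_prefix(lines)
  best = 0
  buffer = +""
  lines.each_with_index do |line, index|
    buffer << line << "\n"
    best = index + 1 if parses?(buffer)
  end
  best
end

snippet = ["if foo", "  puts \"bar\"", "end"]

p parses?(snippet.first)        # => false, "if foo" alone is incomplete
p longest_valid_prefix(snippet) # => 3, the three lines together are valid
```

Because compile never executes the text, a bare constant such as Fred still counts as code in this sketch, which is the same limitation described earlier.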