Daniel Martin wrote:
> Daniel Schierbeck <daniel.schierbeck / gmail.com> writes:
> 
>> I'm trying to write a regular expression that matches bencoded
>> strings, i.e. strings of the form x:y, where x is the numeric length
>> of y.
>>
>> This is valid:
>>
>>   6:foobar
>>
>> while this is not:
>>
>>   4:foo
> 
> I don't think that what you want to do is possible with a mere regular
> expression.
> 
> It might be possible using Perl's special
> evaluate-code-while-in-regexp (??{ code }) feature, but not with any
> language that doesn't allow regular expression evaluations to escape
> back into the host language.
> 
> The problem is that you want to leave crucial portions of the regexp
> uncompiled until the moment that half of the regular expression has
> matched, and this is not possible.
> 
> But matching bencoded data isn't that hard; here's something I just
> whipped up that should handle bencoded data:
> 
> require 'strscan'
> 
> class BencodeScanner
>   def BencodeScanner.scan(str)
>     scan = StringScanner.new(str)
>     r = BencodeScanner.doscan_internal(scan,false)
>     raise "Malformed Bencoded String" unless scan.eos?
>     r
>   end
>   
>   private
>   
>   @@string_regexps = Hash.new {|h,k| h[k] = /:.{#{k}}/m}
>   
>   def BencodeScanner.doscan_internal(scanner, allow_e=true)
>     tok = scanner.scan(/\d+|[ilde]/)
>     case tok
>       when nil
>         raise "Malformed Bencoded String"
>       when 'e'
>         raise "Malformed Bencoded String" unless allow_e
>         return nil
>       when 'l'
>         retval = []
>         while arritem = BencodeScanner.doscan_internal(scanner)
>           retval << arritem
>         end
>         return retval
>       when 'd'
>         retval = {}
>         while key = BencodeScanner.doscan_internal(scanner)
>           val = BencodeScanner.doscan_internal(scanner,false)
>           retval[key] = val
>         end
>         return retval
>       when 'i'
>         raise "Malformed Bencoded String" unless scanner.scan(/-?\d+e/)
>         return scanner.matched[0,scanner.matched.length-1].to_i
>       else
>         raise "Malformed Bencoded String" unless scanner.scan(@@string_regexps[tok])
>         return scanner.matched[1,tok.to_i]
>     end
>   end
> end

Thank you all for your responses!

I've been away for the last two days, so I've only just got an 
opportunity to reply.

Daniel, I've further developed your solution:

   module Bencode
     class BencodingError < StandardError; end

     class << self
       def dump(obj)
         obj.bencode
       end

       def parse(benc)
         require 'strscan'

         scanner = StringScanner.new(benc)
         obj = scan(scanner)
         raise BencodingError unless scanner.eos?
         return obj
       end

       alias_method :load, :parse

       private

       def scan(scanner)
         case token = scanner.scan(/[ild]|\d+:/)
         when nil
           raise BencodingError
         when "i"
           number = scanner.scan(/0|(-?[1-9][0-9]*)/)
           raise BencodingError unless number
           raise BencodingError unless scanner.scan(/e/)
           # Convert the matched digits so callers get an Integer, not a String.
           return number.to_i
         when "l"
           ary = []
           until scanner.peek(1) == "e"
             ary.push(scan(scanner))
           end
           scanner.pos += 1
           return ary
         when "d"
           hsh = {}
           until scanner.peek(1) == "e"
             hsh.store(scan(scanner), scan(scanner))
           end
           scanner.pos += 1
           return hsh
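         when /\d+:/
           length = token.chop.to_i
           str = scanner.peek(length)
           # peek returns fewer bytes when the input is too short for
           # the declared length, so reject that case explicitly.
           raise BencodingError unless str.bytesize == length
           scanner.pos += length
           return str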
         else
           raise BencodingError
         end
       end
     end
   end
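
For the original x:y check on its own, here is a minimal sketch of the same idea: read the length prefix with a regexp, then compare it against what follows. The name bencoded_string? is a hypothetical helper used only for illustration; it is not part of the module above, and it assumes the whole input should be exactly one bencoded string.

```ruby
require 'strscan'

# Hypothetical helper (not part of the Bencode module above): returns
# true when str is exactly one bencoded string, "<length>:<bytes>".
def bencoded_string?(str)
  scanner = StringScanner.new(str)

  # The regexp only handles the "<digits>:" prefix, e.g. "6:".
  prefix = scanner.scan(/\d+:/)
  return false unless prefix

  # The length comparison happens in plain Ruby, which is the part a
  # regular expression alone cannot express.
  length = prefix.chop.to_i
  scanner.rest.bytesize == length
end

bencoded_string?("6:foobar")  # => true
bencoded_string?("4:foo")     # => false
```

The comparison uses bytesize because bencoded lengths count bytes, not characters.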


Cheers, and thank you all for helping me out!
Daniel Schierbeck