Daniel Martin wrote:
> Daniel Schierbeck <daniel.schierbeck / gmail.com> writes:
>
>> I'm trying to write a regular expression that matches bencoded
>> strings, i.e. strings on the form x:y, where x is the numeric length
>> of y.
>>
>> This is valid:
>>
>>   6:foobar
>>
>> while this is not:
>>
>>   4:foo
>
> I don't think that what you want to do is possible with a mere regular
> expression.
>
> It might be possible using perl's special
> evaluate-code-while-in-regexp (??{ code }) feature, but not with any
> language that doesn't allow regular expression evaluations to escape
> back into the host language.
>
> The problem is that you want to leave crucial portions of the regexp
> uncompiled until the moment that half of the regular expression has
> matched, and this is not possible.
>
> But matching bencoded data isn't that hard; here's something I just
> whipped up that should handle bencoded data:
>
> require 'strscan'
>
> class BencodeScanner
>   def BencodeScanner.scan(str)
>     scan = StringScanner.new(str)
>     r = BencodeScanner.doscan_internal(scan,false)
>     raise "Malformed Bencoded String" unless scan.eos?
>     r
>   end
>
>   private
>
>   @@string_regexps = Hash.new {|h,k| h[k] = /:.{#{k}}/m}
>
>   def BencodeScanner.doscan_internal(scanner, allow_e=true)
>     tok = scanner.scan(/\d+|[ilde]/)
>     case tok
>     when nil
>       raise "Malformed Bencoded String"
>     when 'e'
>       raise "Malformed Bencoded String" unless allow_e
>       return nil
>     when 'l'
>       retval = []
>       while arritem = BencodeScanner.doscan_internal(scanner)
>         retval << arritem
>       end
>       return retval
>     when 'd'
>       retval = {}
>       while key = BencodeScanner.doscan_internal(scanner)
>         val = BencodeScanner.doscan_internal(scanner,false)
>         retval[key] = val
>       end
>       return retval
>     when 'i'
>       raise "Malformed Bencoded String" unless scanner.scan(/-?\d+e/)
>       return scanner.matched[0,scanner.matched.length-1].to_i
>     else
>       raise "Malformed Bencoded String" unless scanner.scan(@@string_regexps[tok])
>       return scanner.matched[1,tok.to_i]
>     end
>   end
> end

Thank you all for your responses! I've been away for the last two days,
so I've only just got an opportunity to reply.

Daniel, I've further developed your solution:

  module Bencode
    class BencodingError < StandardError; end

    class << self
      def dump(obj)
        obj.bencode
      end

      def parse(benc)
        require 'strscan'

        scanner = StringScanner.new(benc)
        obj = scan(scanner)
        raise BencodingError unless scanner.eos?
        return obj
      end

      alias_method :load, :parse

      private

      def scan(scanner)
        case token = scanner.scan(/[ild]|\d+:/)
        when nil
          raise BencodingError
        when "i"
          number = scanner.scan(/0|(-?[1-9][0-9]*)/)
          raise BencodingError unless number
          raise BencodingError unless scanner.scan(/e/)
          return number
        when "l"
          ary = []
          until scanner.peek(1) == "e"
            ary.push(scan(scanner))
          end
          scanner.pos += 1
          return ary
        when "d"
          hsh = {}
          until scanner.peek(1) == "e"
            hsh.store(scan(scanner), scan(scanner))
          end
          scanner.pos += 1
          return hsh
        when /\d+:/
          length = token.chop.to_i
          str = scanner.peek(length)
          scanner.pos += length
          return str
        else
          raise BencodingError
        end
      end
    end
  end

Cheers, and thank you all for helping me out!

Daniel Schierbeck
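
Two spots in the Bencode module above may deserve a second look. The "i" branch returns the matched String rather than an Integer, and the string branch does not check that enough data remains, so the original poster's invalid example 4:foo would not end in a BencodingError. What follows is only a minimal sketch of one way to tighten both, not a definitive implementation. The module name BencodeSketch is hypothetical, picked to avoid clashing with Bencode, and dump is left out because it depends on a bencode method that is not shown in this thread.

```ruby
require 'strscan'

# Hypothetical module name; a sketch, not the code from the post above.
module BencodeSketch
  class BencodingError < StandardError; end

  # Parse a complete bencoded string; trailing data is an error.
  def self.parse(benc)
    scanner = StringScanner.new(benc)
    obj = scan(scanner)
    raise BencodingError, "trailing data" unless scanner.eos?
    obj
  end

  def self.scan(scanner)
    case token = scanner.scan(/[ild]|\d+:/)
    when nil
      raise BencodingError, "unexpected input"
    when "i"
      number = scanner.scan(/0|-?[1-9][0-9]*/)
      raise BencodingError, "bad integer" unless number && scanner.scan(/e/)
      number.to_i # convert, so callers get an Integer instead of a String
    when "l"
      ary = []
      # Consume items until the closing "e"; at end of input the
      # recursive call raises because no token can be scanned.
      ary.push(scan(scanner)) until scanner.scan(/e/)
      ary
    when "d"
      hsh = {}
      until scanner.scan(/e/)
        key = scan(scanner)
        hsh[key] = scan(scanner)
      end
      hsh
    else
      # token looks like "6:"; the length counts bytes
      length = token.chop.to_i
      raise BencodingError, "string too short" if scanner.rest_size < length
      str = scanner.peek(length)
      scanner.pos += length
      str
    end
  end
  private_class_method :scan
end
```

With this, BencodeSketch.parse("6:foobar") gives back "foobar", while BencodeSketch.parse("4:foo") raises BencodingError because only three bytes follow the colon.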