On Sun, 17 Oct 2004 22:30:52 -0700, Mark Hubbart <discordantus / gmail.com> wrote: > > There's been some discussion on Ruby Talk lately about Range.member? which tests > > if a given element (often a number) is a member of the set the Range object > > iterates over. Obviously, this kind of test is useful in many aspects of > > programming, but let's approach this problem from a different angle. > > > > This week's quiz is to build a library that adds a class method called build() > > to Regexp. build() should accept a variable number of arguments which can > > include integers and ranges of integers. Have build() return a Regexp object > > that will match only integers in the set of passed arguments. > > Well, I came up with two solutions... But I wasn't able to complete > both of them. Okay, I guess posting this to the list must have jostled my brain around and knocked something loose... because soon after I posted that, I figured out that I had been looking at it the wrong way. So, look at the end of this email for my final version. My stipulations: 1. anchored at the beginning and end (of the line). 2. an arbitrary number of leading zeros should be allowed 3. any integer range should be allowed, even those that are partially or wholly negative I decided that Ranges act like regexen quite often, so it could make sense for there to be a #to_re method. So, I created a to_re method that converts the range into a regular expression. The algorithm used creates small regexps (compared to splatting and joining the ranges), so things are much faster than in the brute force method. The actual Regexp::build method just gives a Regexp union of the results, after making regexen out of any integer arguments. I defined the basic algorithm I was working on in my other email like this: > 1. break the range up into regexp friendly sections, like this: > (23..1024) => > 23..29, > 30..99, > 100..999, > 1000..1019, > 1020..1024 > > 2. convert each range into a string regexp: > 23..29 => "2[3-9]" > 30..99 => "[3-9]\\d" > 100..999 => "[1-9]\\d\\d" > 1000..1019 => "10[01]\\d" > 1020..1024 => "102[0-4]" > > 3. join them all together > /^0*(?:2[3-9]|[3-9]\d|[1-9]\d\d|10[01]\d|102[0-4])$/ I got stuck on step one, trying to break up the range. I could tell it needed to be done in two halves, as my code below shows; but I was stuck on second half, counting up to the end of the range. I solved that problem when I realized that I should count down to the middle, just like I counted up. That sometimes left a middle section behind, which gets turned into one last range and bundled in with the rest. Negative ranges are supported by converting them into positive ranges, then prefixing a - sign in the regexp. If a range spans zero, I split it in two parts, then join the regexen. Here's an example of a constructed regexp. Matching is practically instantaneous, even for large ones: huge = Regexp.build(0..928346798726) ==>/^(?-mix:0*\d|[1-9]\d|[1-9]\d\d|[1-9]\d\d\d|[1-9]\d\d\d\d|[1-9]\d\d\d\d\d|[1-9]\d\d\d\d\d\d|[1-9]\d\d\d\d\d\d\d|[1-9]\d\d\d\d\d\d\d\d|[1-9]\d\d\d\d\d\d\d\d\d|[1-9]\d\d\d\d\d\d\d\d\d\d|[1-8]\d\d\d\d\d\d\d\d\d\d\d|9[01]\d\d\d\d\d\d\d\d\d\d|92[0-7]\d\d\d\d\d\d\d\d\d|928[0-2]\d\d\d\d\d\d\d\d|9283[0-3]\d\d\d\d\d\d\d|92834[0-5]\d\d\d\d\d\d|928346[0-6]\d\d\d\d\d|9283467[0-8]\d\d\d\d|92834679[0-7]\d\d\d|928346798[0-6]\d\d|9283467987[01]\d|92834679872[0-6])$/ "2342234" =~ huge ==>0 The code follows: def Regexp.build(*args) ranges, numbers = args.partition{|item| Range === item} re = ranges.map{|r| r.to_re } + numbers.map{|n| /0*#{n}/ } /^#{Regexp.union(*re)}$/ end class Range def to_re # normalize the range format; we want end inclusive, integer ranges # this part passes the load off to a newly built range if needed. if exclude_end? return( (first.to_i..last.to_i - 1).to_re ) elsif ! (first + last).kind_of?(Integer) return( (first.to_i .. last.to_i).to_re ) end # Deal with ranges that are wholly or partially negative. If range is # only partially negative, split in half and get two regexen. join them # together for the finish. If the range is wholly negative, make it # positive, then add a negative sign to the regexp if first < 0 and last < 0 # return a negatized version of the regexp return /-#{(-last..-first).to_re}/ elsif first < 0 neg = (first..-1).to_re pos = (0..last).to_re return /(?:#{neg}|#{pos})/ end ### First, create an array of new ranges that are more ### suited to regex conversion. # a and z will be the remainders of the endpoints of the range # as we slice it a, z = first, last # build the first part of the list of new ranges. list1 = [] num = first until num > z a = num # recycle the value # get the first power of ten that leaves a remainder v = 10 v *= 10 while num % v == 0 and num != 0 # compute the next value up num += v - num % v # store the value unless it's too high list1 << (a..num-1) unless num > z end # build the second part of the list; counting down. list2 = [] num = last + 1 until num < a z = num - 1 # recycle the value # slice to the nearest power of ten v = 10 v *= 10 while num % v == 0 and num != 0 # compute the next value down num -= num % v # store the value if it fits list2 << (num..z) unless num < a end # get the chewey center part, if needed center = a < z ? [a..z] : [] # our new list list = list1 + center + list2.reverse ### Next, convert each range to a regexp. list.map! do |rng| a, z = rng.first.to_s, rng.last.to_s a.split(//).zip(z.split(//)).map do |(f,l)| case when f == l then f when f.to_i + 1 == l.to_i then "[%s%s]" % [f,l] when f+l == "09" then "\\d" else "[%s-%s]" % [f,l] end end.join # returns the regexp for *that* range end ### Last, return the final regexp /0*#{ list.join("|") }/ end end Thanks for a great quiz! It really twisted my brain something fierce. cheers, Mark