On Sun, 17 Oct 2004 22:30:52 -0700, Mark Hubbart <discordantus / gmail.com> wrote:
> > There's been some discussion on Ruby Talk lately about Range.member? which tests
> > if a given element (often a number) is a member of the set the Range object
> > iterates over.  Obviously, this kind of test is useful in many aspects of
> > programming, but let's approach this problem from a different angle.
> >
> > This week's quiz is to build a library that adds a class method called build()
> > to Regexp.  build() should accept a variable number of arguments which can
> > include integers and ranges of integers.  Have build() return a Regexp object
> > that will match only integers in the set of passed arguments.
> 
> Well, I came up with two solutions... But I wasn't able to complete
> both of them.

Okay, I guess posting this to the list must have jostled my brain
around and knocked something loose... because soon after I posted
that, I figured out that I had been looking at it the wrong way. So,
look at the end of this email for my final version.

My stipulations:
1. anchored at the beginning and end (of the line).
2. an arbitrary number of leading zeros should be allowed
3. any integer range should be allowed, even those that are partially
or wholly negative

I decided that Ranges act like regexen quite often, so it could make
sense for there to be a #to_re method. So, I created a to_re method
that converts the range into a regular expression. The algorithm used
creates small regexps (compared to splatting and joining the ranges),
so things are much faster than in the brute force method. The actual
Regexp::build method just gives a Regexp union of the results, after
making regexen out of any integer arguments.

I defined the basic algorithm I was working on in my other email like this:

> 1. break the range up into regexp friendly sections, like this:
>   (23..1024) =>
>     23..29,
>     30..99,
>     100..999,
>     1000..1019,
>     1020..1024
> 
> 2. convert each range into a string regexp:
>   23..29 => "2[3-9]"
>   30..99 => "[3-9]\\d"
>   100..999 => "[1-9]\\d\\d"
>   1000..1019 => "10[01]\\d"
>   1020..1024 => "102[0-4]"
> 
> 3. join them all together
>   /^0*(?:2[3-9]|[3-9]\d|[1-9]\d\d|10[01]\d|102[0-4])$/

I got stuck on step one, trying to break up the range. I could tell it
needed to be done in two halves, as my code below shows; but I was
stuck on second half, counting up to the end of the range. I solved
that problem when I realized that I should count down to the middle,
just like I counted up. That sometimes left a middle section behind,
which gets turned into one last range and bundled in with the rest.

Negative ranges are supported by converting them into positive ranges,
then prefixing a - sign in the regexp. If a range spans zero, I split
it in two parts, then join the regexen.

Here's an example of a constructed regexp. Matching is practically
instantaneous, even for large ones:

huge = Regexp.build(0..928346798726)
    ==>/^(?-mix:0*\d|[1-9]\d|[1-9]\d\d|[1-9]\d\d\d|[1-9]\d\d\d\d|[1-9]\d\d\d\d\d|[1-9]\d\d\d\d\d\d|[1-9]\d\d\d\d\d\d\d|[1-9]\d\d\d\d\d\d\d\d|[1-9]\d\d\d\d\d\d\d\d\d|[1-9]\d\d\d\d\d\d\d\d\d\d|[1-8]\d\d\d\d\d\d\d\d\d\d\d|9[01]\d\d\d\d\d\d\d\d\d\d|92[0-7]\d\d\d\d\d\d\d\d\d|928[0-2]\d\d\d\d\d\d\d\d|9283[0-3]\d\d\d\d\d\d\d|92834[0-5]\d\d\d\d\d\d|928346[0-6]\d\d\d\d\d|9283467[0-8]\d\d\d\d|92834679[0-7]\d\d\d|928346798[0-6]\d\d|9283467987[01]\d|92834679872[0-6])$/

"2342234" =~ huge
    ==>0

The code follows:


def Regexp.build(*args)
  ranges, numbers = args.partition{|item| Range === item}
  re = ranges.map{|r| r.to_re } + numbers.map{|n| /0*#{n}/ }
  /^#{Regexp.union(*re)}$/
end
  
  
class Range
  def to_re
    # normalize the range format; we want end inclusive, integer ranges
    # this part passes the load off to a newly built range if needed.
    if exclude_end?
      return( (first.to_i..last.to_i - 1).to_re )
    elsif ! (first + last).kind_of?(Integer)
      return( (first.to_i .. last.to_i).to_re )
    end
    
    # Deal with ranges that are wholly or partially negative. If range is
    # only partially negative, split in half and get two regexen. join them
    # together for the finish. If the range is wholly negative, make it
    # positive, then add a negative sign to the regexp
    if first < 0 and last < 0
      # return a negatized version of the regexp
      return /-#{(-last..-first).to_re}/
    elsif first < 0
      neg = (first..-1).to_re
      pos = (0..last).to_re
      return /(?:#{neg}|#{pos})/
    end    
    
    ### First, create an array of new ranges that are more
    ### suited to regex conversion.
    
    # a and z will be the remainders of the endpoints of the range
    # as we slice it
    a, z = first, last
    
    # build the first part of the list of new ranges.
    list1 = []
    num = first
    until num > z
      a = num # recycle the value
      # get the first power of ten that leaves a remainder
      v = 10
      v *= 10 while num % v == 0 and num != 0
      # compute the next value up
      num += v - num % v
      # store the value unless it's too high
      list1 << (a..num-1) unless num > z
    end
    
    # build the second part of the list; counting down.
    list2 = []
    num = last + 1
    until num < a
      z = num - 1 # recycle the value
      # slice to the nearest power of ten
      v = 10
      v *= 10 while num % v == 0 and num != 0
      # compute the next value down
      num -= num % v
      # store the value if it fits
      list2 << (num..z) unless num < a
    end
    # get the chewey center part, if needed
    center = a < z ? [a..z] : []
    # our new list
    list = list1 + center + list2.reverse
    
    ### Next, convert each range to a regexp.
    list.map! do |rng|
      a, z = rng.first.to_s, rng.last.to_s
      a.split(//).zip(z.split(//)).map do |(f,l)|
        case
          when f == l then f
          when f.to_i + 1 == l.to_i then "[%s%s]" % [f,l]
          when f+l == "09" then "\\d"
          else
            "[%s-%s]" % [f,l]
        end
      end.join # returns the regexp for *that* range
    end
    
    ### Last, return the final regexp
    /0*#{ list.join("|") }/
  end
end


Thanks for a great quiz! It really twisted my brain something fierce.

cheers,
Mark