On Oct 17, 2004, at 1:12 PM, Jamis Buck wrote:

> So, according to my calculations, 48+ hours have elapsed.
>
> Thus, here's my solution to Regexp.build(). I assumed the following:

My solution is pretty different and admittedly only so-so in 
functionality.

My main idea was to treat all passed parameters as character data.  
This solves the leading zeros problem by letting you pass things like 
(1..60, "01".."09").  In addition, this approach also allows you to 
pass non-numerical data, though that wasn't part of the quiz.
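
To make the "everything is character data" idea concrete, here is a 
minimal sketch of just the normalizing step, pulled out on its own. The 
helper name to_strings is made up for illustration and is not part of 
the library, and flat_map assumes a newer Ruby than the one this was 
written for.

```ruby
# Hypothetical helper (not in the library): expand ranges and arrays,
# then convert every element to a String.
def to_strings(*args)
  args.flat_map { |e| Array(e) }.map { |e| String(e) }
end

strings = to_strings(1..3, "01".."03", "cat")
p strings  # prints ["1", "2", "3", "01", "02", "03", "cat"]
```

Because "01" and 1 become different strings, zero-padded and unpadded 
forms can sit side by side in the same pattern.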

The other main point of my implementation was to not anchor at all.  
This may make built Regexps less convenient to use, but by allowing you 
to embed them in other patterns it greatly increases usability.  For 
example, if you would like to allow for arbitrary leading zeros, you 
just embed the result of build() in another Regexp object with a 
leading "0*".  You can use embedding to provide whatever anchoring you 
need, set up your own captures, or even to combine several built Regexp 
objects.
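
A rough sketch of the embedding idea follows. To keep it runnable by 
itself, the month pattern is a hand-written stand-in for what 
build(1..12) might return (unanchored and non-capturing); the real 
output may be spelled differently.

```ruby
# Stand-in for a built pattern; assumed shape, not actual build() output.
month = /(?:1[0-2]|[1-9])/

# Allow arbitrary leading zeros and supply the anchoring ourselves.
padded = /\A0*#{month}\z/

# Wrap the built pattern in our own capture group.
captured = /\A(#{month})\/(\d+)\z/

puts "007 matches"   if padded =~ "007"
puts "13 is refused" unless padded =~ "13"
puts captured.match("12/25")[1]  # prints 12
```

The caller decides on anchors and captures, which is the whole reason 
for leaving them out of the built pattern.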

Well, all that is how I intended this to work.  It even gets close at 
times.  <laughs>  Unfortunately, my character collapsing system (to 
regex character classes) is dog slow and only works correctly on 
numerical data.  Put simply, my library makes the quiz's (1..1_000_000) 
example impractical in build time.  If I had it to do over, I would 
approach this part of the problem from a completely different angle.  
This is the one I built to throw away, as the saying goes.
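
For reference, this is roughly the output the collapsing step is after, 
written as a single pass over sorted characters. It is only a sketch 
under assumptions: collapse_chars is a hypothetical name, slice_when 
needs a much newer Ruby, and this is not the code in the library below 
nor a claim about how a rewrite would actually look.

```ruby
# Hypothetical helper: turn single characters into the body of a
# character class, using a range for runs of three or more.
# Characters that are special inside a class ("-", "]", "^", "\")
# are not escaped here.
def collapse_chars(chars)
  runs = chars.uniq.sort.slice_when { |a, b| a.succ != b }
  runs.map { |run|
    run.size >= 3 ? "#{run.first}-#{run.last}" : run.join
  }.join
end

puts collapse_chars(%w[0 1 2 3 5 7 8])  # prints 0-3578
```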

I'll post my library below, and then my unit tests, which probably 
better convey what I was aiming for.

James Edward Gray II

#!/usr/bin/env ruby

class Regexp
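	def self.build( *nums )
		# Expand ranges and arrays, then treat every element as a String.
		nums = nums.map { |e| Array(e) }.flatten.map { |e| String(e) }
		# Longest strings first, so longer alternatives lead the alternation.
		nums = nums.sort_by { |e| [-e.length, e] }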
		
		patterns = [ ]
		while nums.size > 0
			eq, nums = nums.partition { |e| e.length == nums[0].length }
			patterns.push(*build_char_classes( eq ))
		end
		
		/(?:#{patterns.join("|")})/
	end
	
	private
	
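	# Merges equal-length strings that differ only in the last literal
	# character before any existing class into one pattern with a
	# character class at that position.
	def self.build_char_classes( eq_len_strs )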
		results = [ ]

		while eq_len_strs.size > 1
			first = eq_len_strs.shift
			if md = /^([^\[]*)([^\[])(.*)$/.match(first)
				chars = md[2]
				matches, eq_len_strs = eq_len_strs.partition do |e|
					e =~ /^#{Regexp.escape md[1]}(.)#{Regexp.escape md[3]}$/ and chars << $1
				end
				if matches.size == 0
					results << first
					next
				end
				
				chars = build_short_class(chars.squeeze)
				eq_len_strs << "#{md[1]}[#{chars}]#{md[3]}"
			else
				results << first
			end
		end
		results << eq_len_strs[0] if eq_len_strs.size == 1

		results
	end

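	# Collapses runs of three or more consecutive characters in a class
	# body into ranges, e.g. "0123" becomes "0-3".
	def self.build_short_class( char_class )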
		while md = /[^\-\0]{3,}/.match(char_class)
			short = md[0][1..-1].split("").inject(md[0][0, 1]) do |mem, c|
				if (mem.length == 1 or mem[-2] != ?-) and mem[-1, 1].succ == c
					mem + "-" + c
				elsif mem[-2, 2] =~ /-(.)/ and $1.succ == c
					mem[0..-2] + c
				else
					mem + c
				end
			end
			char_class.sub!(md[0], short.split("").join("\0"))
		end
		
		char_class.tr!("\0", "")
		char_class.gsub!(/([^\-])-([^\-])/) do |m|
			if $1.succ == $2 then $1 + $2 else m end
		end
		char_class
	end
end

===  Unit Tests  ===

#!/usr/bin/env ruby

# Usage:  ruby -r regexp_build_lib $0

require "test/unit"

class TestRegexpBuild < Test::Unit::TestCase
	def test_integers
		lucky = /^#{Regexp.build(3, 7)}$/
		assert_match(lucky, "7")
		assert_no_match(lucky, "13")
		assert_match(lucky, "3")

		month = /^#{Regexp.build(1..12)}$/
		assert_no_match(month, "0")
		assert_match(month, "1")
		assert_match(month, "12")
		day = /^#{Regexp.build(1..31)}$/
		assert_match(day, "6")
		assert_match(day, "16")
		assert_no_match(day, "Tues")
		year = /^#{Regexp.build(98, 99, 2000..2005)}$/
		assert_no_match(year, "04")
		assert_match(year, "2004")
		assert_match(year, "99")
		
		num = /^#{Regexp.build(1..1_000)}$/
		assert_no_match(num, "-1")
		(-10_000..10_000).each do |i|
			if i < 1 or i > 1_000
				assert_no_match(num, i.to_s)
			else
				assert_match(num, i.to_s)
			end
		end
	end

	def test_embed
		month = Regexp.build("01".."09", 1..12)
		day = Regexp.build("01".."09", 1..31)
		year = Regexp.build(95..99, "00".."05")
		date = /\b#{month}\/#{day}\/(?:19|20)?#{year}\b/
		
		assert_match(date, "6/16/2000")
		assert_match(date, "12/3/04")
		assert_match(date, "Today is 09/15/2004")
		assert_no_match(date, "Fri Oct 15")
		assert_no_match(date, "13/3/04")
		assert_no_match(date, "There's no date hiding in here:  00/00/00!")
		
		md = /^(#{Regexp.build(1..12)})$/.match("11")
		assert_not_nil(md)
		assert_equal("11", md[1])
	end

	def test_words
		animal = /^#{Regexp.build("cat", "bat", "rat", "dog")}$/
		assert_match(animal, "cat")
		assert_match(animal, "dog")
		assert_no_match(animal, "Wombat")
	end
end