On Jan 24, 2008, at 9:23 AM, Dan Cuddeford wrote:

> So it seems using the two together
>
>   require 'uri'
>
>   uri = URI.parse("http://www.ruBy-lang.org/ARSE")
>   can = uri.normalize
>   p can
>   p can.host
>   p can.path
>
> means the path keeps its case sensitivity but the host is normalized.
>
> I think that's it - however,
>
> try it with ruby-lang..org and
>
> /usr/lib/ruby/1.8/uri/generic.rb:195:in `initialize': the scheme http
> does not accept registry part: www.ruBy-lang..org (or bad hostname?)
> (URI::InvalidURIError)
>        from /usr/lib/ruby/1.8/uri/http.rb:78:in `initialize'
>        from /usr/lib/ruby/1.8/uri/common.rb:488:in `new'
>        from /usr/lib/ruby/1.8/uri/common.rb:488:in `parse'
>        from canon.rb:3
>
> So I guess it needs a bit of error checking beforehand.
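For the error checking mentioned above, one minimal sketch is to rescue `URI::InvalidURIError`, the exception class shown in the trace. It also rejects empty host labels explicitly, because the RFC 3986 parser used by newer Rubies may accept a host like `ruby-lang..org` without raising. The helper name `safe_parse` is hypothetical and not part of the URI library.

```ruby
require 'uri'

# Hypothetical helper (not part of the URI library): returns a parsed URI,
# or nil when URI.parse raises, or when the host has an empty label
# such as "ruby-lang..org" (which newer parsers may otherwise accept).
def safe_parse(str)
  u = URI.parse(str)
  return nil if u.host && u.host.split('.').any? { |label| label.empty? }
  u
rescue URI::InvalidURIError
  nil
end

p safe_parse("http://www.ruBy-lang.org/ARSE")  # a URI::HTTP object
p safe_parse("http://www.ruBy-lang..org/ARSE") # nil (empty host label)
p safe_parse("http://www.ruby-lang.org/a b")   # nil (space is not allowed)
```

Callers can then skip or log the `nil` results before calling `normalize`.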

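require 'uri'

# Lowercase the host (URI#normalize!) and resolve "." and ".." path
# segments; the case of the path itself is left alone.
def canonicalize(uri)
   u = uri.kind_of?(URI) ? uri.dup : URI.parse(uri.to_s)
   u.normalize!
   newpath = u.path.dup
   # "/./" -> "/" and a trailing "/." -> "/"
   newpath.gsub!(%r{/\.(?=/)}, '')
   newpath.sub!(%r{/\.\z}, '/')
   # Repeatedly fold "segment/.." into "/". Segments that are themselves
   # "." or ".." never match, so an unresolvable ".." (e.g. "/../a") is
   # kept and the loop always terminates.
   nil while newpath.sub!(%r{/(?!\.\.?/)[^/]+/\.\.(/|\z)}, '/')
   u.path = newpath
   u.to_s
end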

canonicalize('http://www.Ruby-Lang.ORG/ARSE/done/../../rear/./end/.')
=> "http://www.ruby-lang.org/rear/end/"
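The ".." handling can also be done without regex loops. As an illustrative alternative, walk the path segments with a stack. The name `remove_dot_segments` is borrowed from RFC 3986, section 5.2.4, but this is a simplified sketch, not that exact procedure. One behavioral difference: a ".." that would climb above the root is dropped here, whereas the regex loop keeps it.

```ruby
require 'uri'

# Hypothetical alternative to the regex loop: resolve "." and ".." in an
# absolute path by pushing/popping segments on a stack.
def remove_dot_segments(path)
  out = []
  segments = path.split('/', -1)
  segments.shift                      # drop the "" before the leading "/"
  segments.each_with_index do |seg, i|
    last = (i == segments.length - 1)
    case seg
    when '.'
      out << '' if last               # "/a/." keeps a trailing slash
    when '..'
      out.pop                         # no-op at the root
      out << '' if last               # "/a/b/.." -> "/a/"
    else
      out << seg
    end
  end
  '/' + out.join('/')
end

u = URI.parse('http://www.Ruby-Lang.ORG/ARSE/done/../../rear/./end/.')
u.normalize!
u.path = remove_dot_segments(u.path)
puts u   # prints http://www.ruby-lang.org/rear/end/
```

Splitting on '/' with a limit of -1 keeps trailing empty segments, which is what preserves the trailing slash in paths like "/a/".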

-Rob

Rob Biedenharn		http://agileconsultingllc.com
Rob / AgileConsultingLLC.com