Hi all,

I've been using URI and found an issue that is similar to
http://rubyforge.org/tracker/?group_id=426&atid=1698&func=detail&aid=13504 .

URLs with underscores in the hostname are rejected as invalid, even though
such hosts really exist and resolve.

   % ruby -v && irb
   ruby 1.8.6 (2007-09-24 patchlevel 111) [i686-darwin9.2.0]
   >> require 'uri'
   >> URI.parse("http://mis_nomer.blogspot.com/")
   URI::InvalidURIError: the scheme http does not accept registry part: mis_nomer.blogspot.com (or bad hostname?)
          from /opt/local/lib/ruby/1.8/uri/generic.rb:195:in `initialize'
          from /opt/local/lib/ruby/1.8/uri/http.rb:78:in `initialize'
          from /opt/local/lib/ruby/1.8/uri/common.rb:488:in `new'
          from /opt/local/lib/ruby/1.8/uri/common.rb:488:in `parse'
          from (irb):2

The problem stems from RFC 2396, which defines hostname as:

  hostname      = *( domainlabel "." ) toplabel [ "." ]
  domainlabel   = alphanum | alphanum *( alphanum | "-" ) alphanum
  toplabel      = alpha | alpha *( alphanum | "-" ) alphanum
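
As a rough illustration, the three rules above can be transcribed by hand
into a Ruby regexp.  The constant and method names here are made up for the
example, and this is not the pattern uri/common.rb actually uses; it only
shows why an underscore can never match this grammar.

```ruby
# Hand transcription of the RFC 2396 hostname grammar quoted above.
# All names below are hypothetical, chosen for this sketch only.
ALPHA    = "[a-zA-Z]"
ALPHANUM = "[a-zA-Z0-9]"

# domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum
DOMAINLABEL = "(?:#{ALPHANUM}(?:(?:#{ALPHANUM}|-)*#{ALPHANUM})?)"
# toplabel    = alpha | alpha *( alphanum | "-" ) alphanum
TOPLABEL    = "(?:#{ALPHA}(?:(?:#{ALPHANUM}|-)*#{ALPHANUM})?)"
# hostname    = *( domainlabel "." ) toplabel [ "." ]
RFC2396_HOSTNAME = /\A(?:#{DOMAINLABEL}\.)*#{TOPLABEL}\.?\z/

# Returns true when host fits the RFC 2396 hostname rule.
def rfc2396_hostname?(host)
  !!(RFC2396_HOSTNAME =~ host)
end

rfc2396_hostname?("www.blogspot.com")        # => true
rfc2396_hostname?("mis_nomer.blogspot.com")  # => false, "_" is not alphanum or "-"
```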

But now in RFC 3986, which supersedes 2396, a host is:

  host          = IP-literal / IPv4address / reg-name
  reg-name      = *( unreserved / pct-encoded / sub-delims )
  unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
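
For comparison, here is a minimal sketch of the reg-name rule as a Ruby
regexp.  The pct-encoded and sub-delims character sets are taken from
RFC 3986 itself (they are not quoted above), and the constant and method
names are invented for the example.  It only covers reg-name, not
IP-literal or IPv4address.

```ruby
# Hand transcription of the RFC 3986 reg-name rule.
# All names below are hypothetical, chosen for this sketch only.

# unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED  = "[A-Za-z0-9._~-]"
# pct-encoded = "%" HEXDIG HEXDIG
PCT_ENCODED = "%[0-9A-Fa-f]{2}"
# sub-delims  = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
SUB_DELIMS  = "[!$&'()*+,;=]"

# reg-name    = *( unreserved / pct-encoded / sub-delims )
RFC3986_REG_NAME = /\A(?:#{UNRESERVED}|#{PCT_ENCODED}|#{SUB_DELIMS})*\z/

# Returns true when host fits the RFC 3986 reg-name rule.
def rfc3986_reg_name?(host)
  !!(RFC3986_REG_NAME =~ host)
end

rfc3986_reg_name?("mis_nomer.blogspot.com")  # => true, "_" is unreserved
rfc3986_reg_name?("bad host.example.com")    # => false, space is not allowed
```

Note that this rule is purely syntactic and much looser than the RFC 2396
one: it even accepts an empty string, so matching it says nothing about
whether the name resolves.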

This does appear to conflict with the 'Preferred name syntax' from RFC 1035,
which says only a-z, A-Z, 0-9 and '-' should be in domain names.  I tried to
find an RFC that addresses underscores and other such characters in hostnames,
but had no luck.

Thoughts?

enjoy

-jeremy

--
========================================================================
 Jeremy Hinegardner                              jeremy / hinegardner.org