On Jun 5, 5:24 pm, "Jano Svitok" <jan.svi... / gmail.com> wrote:
> On 6/5/07, Morgan Cheng <morgan.chen... / gmail.com> wrote:
>
> > On Jun 5, 3:00 pm, "Jano Svitok" <jan.svi... / gmail.com> wrote:
> > > On 6/5/07, Morgan Cheng <morgan.chen... / gmail.com> wrote:
> > >
> > > > I am using Ruby 1.8.6. I found that URI cannot parse a URI with "_" in the host.
> > > >
> > > > require 'uri'
> > > > uri = "http://dr_gabriele.podomatic.com/enclosure/2006-08-03T15_09_59-07_00.m4v"
> > > > URI.parse(uri)
> > > >
> > > > Is there any way to work around that?
> > > > thanks
> > >
> > > It seems underscores are not allowed in the host part of a URI, so it's
> > > not a bug. See RFC 2396 (URI) and RFC 1035 (DNS). If you really want it,
> > > you can open the class and redefine some of the methods and/or
> > > manually edit the URI sources.
> >
> > In RFC 2396, "_" is listed among the "Unreserved Characters":
> >
> >   Unreserved characters can be escaped without changing the semantics
> >   of the URI, but this should not be done unless the URI is being used
> >   in a context that does not allow the unescaped character to appear.
> >
> > However, URI.escape doesn't escape "_":
> >
> > require 'uri'
> > original_uri = "http://dr_gabriele.podomatic.com/enclosure/"
> > uri = URI.escape(original_uri)
> > puts uri == original_uri
>
> I'm no expert on DNS, but this is what I have found in appendix A of RFC 2396:
>
>    host          = hostname | IPv4address
>    hostname      = *( domainlabel "." ) toplabel [ "." ]
>    domainlabel   = alphanum | alphanum *( alphanum | "-" ) alphanum
>    toplabel      = alpha | alpha *( alphanum | "-" ) alphanum
>    IPv4address   = 1*digit "." 1*digit "." 1*digit "." 1*digit
>
>    alphanum      = alpha | digit
>    alpha         = lowalpha | upalpha
>
>    lowalpha = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" |
>               "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" |
>               "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z"
>    upalpha  = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" |
>               "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" |
>               "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"
>    digit    = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |
>               "8" | "9"
>
> There's no "_" there. YMMV ;-)

Thanks a lot for your help.

I am just wondering: the Internet is a wild world, and weird non-standard stuff is all around. This non-standard host name is one example, and popular browsers handle such URLs well. Perhaps Ruby should be more robust so that it survives better in such a wild world :-)
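The hostname grammar quoted from RFC 2396 Appendix A can be turned almost directly into a regular expression, which makes it easy to see why "dr_gabriele" is rejected. This is a minimal sketch; `rfc2396_hostname?` and the constant names are hypothetical, not part of Ruby's URI library, and the check covers only the `hostname` rule (not `IPv4address`).

```ruby
# Hypothetical helper mirroring the RFC 2396 Appendix A rules:
#   hostname    = *( domainlabel "." ) toplabel [ "." ]
#   domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum
#   toplabel    = alpha | alpha *( alphanum | "-" ) alphanum
# Note that "_" appears nowhere in these character classes.

# A label starts and ends with an alphanumeric; "-" is allowed only inside.
DOMAINLABEL = /[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?/

# The last label must additionally start with a letter.
TOPLABEL = /[a-zA-Z](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?/

# Zero or more "label." prefixes, a top label, and an optional trailing dot.
HOSTNAME = /\A(?:#{DOMAINLABEL}\.)*#{TOPLABEL}\.?\z/

def rfc2396_hostname?(host)
  !!(HOSTNAME =~ host)
end

puts rfc2396_hostname?("podomatic.com")             # prints true
puts rfc2396_hostname?("dr_gabriele.podomatic.com") # prints false
```

The check is deliberately strict: it follows the RFC text, not what browsers happen to accept.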
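If a program must cope with such URLs anyway, one workaround (other than patching the URI class as suggested above) is to split the URL with a lenient pattern that does not validate the host. This is only a sketch under stated assumptions: `lenient_split` and `LenientURL` are hypothetical names, and the pattern ignores userinfo, IPv6 literals, and fragments.

```ruby
# Hypothetical result type for the lenient split.
LenientURL = Struct.new(:scheme, :host, :port, :path, :query)

# scheme "://" host [":" port] path ["?" query]
# The host is "anything up to / : ? #", so labels containing "_" pass.
# Assumption: no "user@" part and no IPv6 "[...]" host; a "#fragment" is ignored.
LENIENT_RE = %r{\A([a-zA-Z][a-zA-Z0-9+.-]*)://([^/:?#]+)(?::(\d+))?([^?#]*)(?:\?([^#]*))?}

def lenient_split(url)
  m = LENIENT_RE.match(url)
  return nil unless m                      # not of the form scheme://host...
  port = m[3] ? m[3].to_i : nil            # nil when no explicit port
  path = m[4].empty? ? "/" : m[4]          # treat an empty path as "/"
  LenientURL.new(m[1].downcase, m[2], port, path, m[5])
end

u = lenient_split("http://dr_gabriele.podomatic.com/enclosure/2006-08-03T15_09_59-07_00.m4v")
puts u.host  # prints dr_gabriele.podomatic.com
puts u.path  # prints /enclosure/2006-08-03T15_09_59-07_00.m4v
```

This only sidesteps URI's host check; whether a resolver or server accepts a name with "_" in it is a separate question.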