Alex Young <alex / blackkettle.org> writes:

> I'm not sure whether this is a bug:
>
> irb(main):013:0> a = URI.parse("http://www.example.com/foo/bar?a=b")
> => #<URI::HTTP:0xfdbb6d160 URL:http://www.example.com/foo/bar?a=b>
> irb(main):014:0> b = URI.parse("?a=c")
> => #<URI::Generic:0xfdbb6b770 URL:?a=c>
> irb(main):015:0> puts a.merge(b).to_s
> http://www.example.com/foo/?a=c

I'll note that although Firefox agrees with your expectations, Lynx
agrees with the behavior of the uri module.

To understand what the uri module is doing, look at this:

irb(main):001:0> require 'uri'
=> true
irb(main):002:0> b = URI.parse("?a=c")
=> #<URI::Generic:0xfdbccb1d0 URL:?a=c>
irb(main):003:0> b.scheme
=> nil
irb(main):004:0> b.userinfo || b.host || b.port
=> nil
irb(main):005:0> b.path
=> ""
irb(main):006:0> b.query
=> "a=c"
irb(main):007:0> b.fragment
=> nil

That is, the scheme and authority portions of the uri are *nil*, but
the path is present, as the empty string.  When merging an empty path
with the path "/foo/bar", the uri module comes up with "/foo/", which
is what you'd get by treating the empty path as ".", the current
directory.  Not a totally unreasonable choice.
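
For comparison, here is a minimal sketch of what merging an explicit
"." reference gives.  The variable names are only for illustration,
and I deliberately don't show the result for the bare "?a=c", since
that may depend on which version of the uri module you have.

```ruby
require 'uri'

base = URI.parse("http://www.example.com/foo/bar?a=b")

# "." is a well-formed relative path meaning "the current directory",
# so resolving it drops the last segment ("bar") of the base path.
dot_ref = URI.parse(".?a=c")
merged  = base.merge(dot_ref)

puts merged.path    # /foo/
puts merged.query   # a=c
puts merged.to_s    # http://www.example.com/foo/?a=c
```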

In fact, this is a bug, but not the one you think.  "?a=c" is a
malformed relative URI under RFC 2396.  You should get a parse error
trying to create that.

RFC 2396 (section 5, near the bottom of pg. 17) defines a relative
URI as:

      relativeURI   = ( net_path | abs_path | rel_path ) [ "?" query ]

      rel_path      = rel_segment [ abs_path ]

      rel_segment   = 1*( unreserved | escaped |
                          ";" | "@" | "&" | "=" | "+" | "$" | "," )

See the 1* part?  That means that the first segment of a relative path
must consist of at least one character, and since the path alternative
in relativeURI is not optional, an empty path is illegal: nothing can
match before the "?" in "?a=c".  (Note that uri references that begin
with '#' are covered in section 4 of the RFC, and match the rule
"URI-reference" rather than the rule "relativeURI".)
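
If you want to poke at the rule, here is a rough sketch of rel_segment
as a Ruby regexp.  It is my own hand transcription of the grammar (the
constant and method names are made up, and this is not how the uri
module does it), with "unreserved" and "escaped" expanded per the RFC.

```ruby
# RFC 2396 rel_segment, transcribed by hand:
#   unreserved = alphanum | "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"
#   escaped    = "%" hex hex
# The trailing "+" is the "1*" of the grammar: at least one character.
REL_SEGMENT = /\A(?:[A-Za-z0-9\-_.!~*'();@&=+$,]|%[0-9A-Fa-f]{2})+\z/

# Hypothetical helper: true if str is a valid rel_segment.
def rel_segment?(str)
  !!(REL_SEGMENT =~ str)
end

p rel_segment?("")      # false: what precedes the "?" in "?a=c"
p rel_segment?(".")     # true
p rel_segment?("foo")   # true
p rel_segment?("%2")    # false: incomplete escape
```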

Now, given that the URI module does indeed accept relative URIs like
this, perhaps we should redefine URI merging for these pathological
cases so that the URI module behaves as some particular well-known
browser does:

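require 'uri'

module URI
  class Generic
    # Merge other into self, resolving a reference that has an empty
    # path and no authority the way the named browser does.
    def merge_like(browser, other)
      if !other.absolute? && other.path && other.path.empty? &&
         !(other.userinfo || other.host || other.port)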
        case browser
        when :firefox, :netscape
          other = other.dup
          other.path = self.path
        when :ie, :microsoft, :links
          other = other.dup
          if other.query || other.fragment
            other.path = self.path
          else
            other.path = '.'
          end
        when :lynx
          # we're good already, so we don't *need* to do
          # this, but let's pass the real merge function
          # valid relative uris anyway, okay?
          other = other.dup
          if other.query
            other.path = '.'
          else
            other.path = self.path
          end          
        else
          # Could someone test how opera handles the three links on 
          # http://snowplow.org/martin/relative_uri_test.html ?
          raise "Unhandled browser type #{browser}"
        end
      end
      return merge(other)
    end
  end
end
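
To try the Firefox-style behavior without reopening URI::Generic,
something like the following should do.  firefox_style_merge is a
made-up name for this sketch, not anything in the uri library, and it
only covers the :firefox/:netscape case.

```ruby
require 'uri'

# Hypothetical helper (not part of the uri library): give a reference
# that has an empty path and no authority the base's full path before
# merging, so the last path segment of the base is kept.
def firefox_style_merge(base, ref)
  if !ref.absolute? && ref.path && ref.path.empty? &&
     !(ref.userinfo || ref.host || ref.port)
    ref = ref.dup            # don't modify the caller's object
    ref.path = base.path
  end
  base.merge(ref)
end

base = URI.parse("http://www.example.com/foo/bar?a=b")
ref  = URI.parse("?a=c")
puts firefox_style_merge(base, ref).to_s   # http://www.example.com/foo/bar?a=c
```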

-- 
s=%q(  Daniel Martin -- martin / snowplow.org
       puts "s=%q(#{s})",s.to_a.last       )
       puts "s=%q(#{s})",s.to_a.last