--Pql/uPZNXIm1JCle
Content-Type: multipart/mixed; boundary="ryJZkp9/svQ58syV"
Content-Disposition: inline


--ryJZkp9/svQ58syV
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

The attached patch against Ruby 1.8.4 adds XHTML 1.0 output support to
the Ruby CGI library.  It should apply cleanly against Ruby 1.9 as well,
although I haven't tried it.

I've taken special care to _not_ change the output produced by non-XHTML
output types; they should remain byte-for-byte compatible with the output
generated by the Ruby CGI library without this patch (with one exception
that is discussed way, way, way down at the bottom of this message).
Also, any changes to the existing CGI API are in the form of
additional method parameters which default to the old behavior.  In
other words, applications and libraries which depend on Ruby CGI will
continue to work as expected without modification.

XHTML documents are, by definition, also well-formed XML documents,
which, in addition to the extra document type declarations, also means
case-sensitive attribute and element names, balanced element tags, and
quoted attribute values.  With that in mind, here's a list of changes
which apply to XHTML content generation:

  * Added three new output types: "xhtml10", "xhtml10Tr", and
    "xhtml10Fr", for XHTML 1.0 Strict, XHTML 1.0 Transitional, and 
    XHTML 1.0 Frameset, respectively.
  * Lower-cased element and attribute names.
  * Expanded minimized attributes.
  * Balanced element tags.
  * Addition of an XML declaration.

Here's an example that demonstrates each of these changes (the code that
generates this output is attached as "ruby-1.8.4-xhtml_cgi_test.rb"):

Here's the output produced by CGI.rb with an output type of 'html4':

  <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"><HTML><HEAD><TITLE>CGI Output Test (html4)</TITLE><LINK href="style.css" rel="stylesheet" type="text/css"></HEAD><BODY><H1>CGI Output Test (html4)</H1><HR><P>This is a test <ACRONYM title="HyperText Markup Language">HTML</ACRONYM> document.</P><FORM METHOD="post" ENCTYPE="application/x-www-form-urlencoded" ACTION="foobar.cgi"><INPUT NAME="hi" TYPE="hidden" VALUE="hidden"><SELECT NAME="bare_attr_test"><OPTION VALUE="1">foo</OPTION><OPTION SELECTED VALUE="2">bar</OPTION></SELECT><INPUT TYPE="submit" VALUE="Testing Form Output"></FORM><PRE><CODE name="id-map-test">this is a test of name mapping</CODE></PRE></BODY></HTML>

Now, here's the same document, with an output type of 'xhtml10':

  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><?xml version='1.0' encoding='UTF-8'?><html xmlns="http://www.w3.org/1999/xhtml"><head><title>CGI Output Test (xhtml10)</title><link href="style.css" rel="stylesheet" type="text/css" /></head><body><h1>CGI Output Test (xhtml10)</h1><hr /><p>This is a test <acronym title="HyperText Markup Language">HTML</acronym> document.</p><form method="post" enctype="application/x-www-form-urlencoded" action="foobar.cgi"><input name="hi" id="hi" type="hidden" value="hidden" /><select name="bare_attr_test" id="bare_attr_test"><option value="1">foo</option><option selected="selected" value="2">bar</option></select><input type="submit" value="Testing Form Output" /></form><pre><code name="id-map-test" id="id-map-test">this is a test of name mapping</code></pre></body></html>

Note the addition of an XML declaration after the document type
declaration.  While the XHTML spec doesn't _require_ an XML declaration,
it strongly recommends it, especially if the document is being served
with a content type of 'text/html'.  I haven't changed the default
Content-Type for XHTML output from the default value of 'text/html';
I'll elaborate on that a bit more later.  I've also deliberately put the
XML declaration after the DOCTYPE declaration.  Even though my spidey
sense tells me that it should be the other way around, the spec does not
appear to require the XML declaration before the DOCTYPE declaration
and, more importantly, certain browsers (hello, Internet Explorer!)
behave incorrectly if the DOCTYPE declaration doesn't come first.

Other obvious changes are the lower-case element and attribute names,
balanced element tags ("<HR>" vs. "<hr />", plus terminating tags for
elements such as <option>), and non-minimized attributes
("<OPTION SELECTED>" vs. "<option selected='selected'>").
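
The case and empty-element differences can be sketched in a few lines of
Ruby.  This is only an illustration, not the code from the patch: the
Style struct, the two constants, and empty_element are hypothetical
names, and only two of the patch's style flags are modeled.

```ruby
# Hypothetical stand-in for the patch's tag style flags; only the two
# flags needed for this illustration are modeled.
Style = Struct.new(:upcase, :always_close)

HTML_STYLE  = Style.new(true, false)   # upper-case names, no " />"
XHTML_STYLE = Style.new(false, true)   # lower-case names, always closed

# Render an empty element such as <hr> in the given style.
def empty_element(name, style)
  tag = style.upcase ? name.upcase : name.downcase
  style.always_close ? "<#{tag} />" : "<#{tag}>"
end

puts empty_element('hr', HTML_STYLE)   # <HR>
puts empty_element('hr', XHTML_STYLE)  # <hr />
```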

The XHTML 1.0 specification is both explicit about removing minimized
attributes and frustratingly vague about what the value of the new
expanded attributes should be.  The examples provided in the
specification put the name of the attribute as the value, which means, 
for example, that the fragment "<OPTION SELECTED>" would become 
"<option selected='selected'>".  

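The attribute-name-as-value convention can be sketched as follows.  This
is a simplified illustration rather than the patch itself: render_attrs
is a hypothetical helper, a value of true stands for a minimized
attribute, and values are assumed to be HTML-escaped already.

```ruby
# Hypothetical helper: render an attribute hash as a string.  A value of
# `true` marks a minimized attribute; nil/false attributes are dropped.
# Values are assumed to be HTML-escaped already.
def render_attrs(attrs, xhtml)
  attrs.map do |name, value|
    next '' unless value
    key = xhtml ? name.downcase : name.upcase
    if value == true
      # XHTML: expand to key="key"; HTML: emit the bare attribute name
      xhtml ? %( #{key}="#{key}") : " #{key}"
    else
      %( #{key}="#{value}")
    end
  end.join
end

attrs = { 'selected' => true, 'value' => '2' }
puts "<OPTION#{render_attrs(attrs, false)}>"  # <OPTION SELECTED VALUE="2">
puts "<option#{render_attrs(attrs, true)}>"   # <option selected="selected" value="2">
```
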
In order to maintain backwards compatibility with XHTML-challenged user
agents, the "HTML Compatibility Guidelines" section of the XHTML 1.0
specification recommends that XHTML 1.0 documents served with a
Content-Type of "text/html" supply both the 'id' and 'name' attributes
with identical values as fragment identifiers.  This recommendation is
reflected in the patch; 'name' attributes for elements are automatically
cloned as 'id' attributes unless an element has explicitly specified an
'id' attribute.
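
A rough sketch of that cloning rule, assuming attributes are held in a
plain Hash with String keys (with_implicit_id is a hypothetical name;
the patch does this inline while generating the tag):

```ruby
# Hypothetical helper: copy the 'name' attribute to 'id' unless an 'id'
# attribute (in any letter case) was given explicitly.
def with_implicit_id(attrs)
  has_id   = attrs.keys.any? { |k| k.downcase == 'id' }
  name_key = attrs.keys.find { |k| k.downcase == 'name' }
  return attrs if has_id || name_key.nil?

  # merge returns a new Hash, so the caller's attributes are untouched
  attrs.merge('id' => attrs[name_key])
end

p with_implicit_id({ 'name' => 'hi', 'type' => 'hidden' })  # gains 'id' => 'hi'
p with_implicit_id({ 'name' => 'hi', 'id' => 'custom' })    # left unchanged
```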

Now, everybody's favorite topic, content types.  The allowed content
types for XHTML documents, in order of preference according to the
specification, are as follows:

  * 'application/xhtml+xml'
  * 'application/xml'
  * 'text/xml'
  * 'text/html'

The W3C has an entire page explaining the effect each of the allowed
content types has on compliant user agents, so I won't duplicate that
here.  As noted above, I've taken special care to make sure that the
generated XHTML complies with the W3C XHTML 1.0 HTML Compatibility
Guidelines and also renders properly in HTML4-aware/XHTML-challenged
user agents (I shouldn't have to say it at this point, but, Internet
Explorer).  In order to ensure backwards compatibility, the recommended
content type according to the W3C guidelines is 'text/html'.
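
If an application wants to serve a more XHTML-friendly type to user
agents that ask for it, one possible approach is to inspect the
request's Accept header.  The sketch below is an assumption on my part
and not something the patch does; xhtml_content_type is a hypothetical
helper, and it deliberately ignores q-values.

```ruby
# Hypothetical helper: choose a Content-Type for XHTML output from the
# request's Accept header, falling back to 'text/html'.  Quality values
# (";q=...") are ignored in this simplified sketch.
def xhtml_content_type(accept_header)
  accepted = accept_header.to_s.split(',').map do |entry|
    entry.split(';').first.to_s.strip.downcase
  end
  accepted.include?('application/xhtml+xml') ? 'application/xhtml+xml' : 'text/html'
end

puts xhtml_content_type('text/html,application/xhtml+xml;q=0.9')  # application/xhtml+xml
puts xhtml_content_type('text/html, */*')                         # text/html
```

In a CGI script the result could then be passed to the header method,
for example as the 'type' option, with ENV['HTTP_ACCEPT'] as the input.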

As promised, here's the one change I've made to non-XHTML output:

-          "<INPUT TYPE=\"HIDDEN\" NAME=\"#{k}\" VALUE=\"#{v}\">"
+          hidden(k, v)

Obviously this will change the order of the attributes in one isolated
case (although it's not immediately apparent what that case is; that 
code block is wrapped in an "if @output_hidden" statement, and
@output_hidden is never set to a true value). 

Both the patch and the test code used to generate the sample output
above are attached.  Both are also available online at the following
URLs:

  http://diff.pablotron.org/ruby-1.8.4-xhtml_cgi.diff
  http://pablotron.org/files/ruby-1.8.4-xhtml_cgi_test.rb

Finally, if you're having trouble sleeping, here's some recommended
reading:

  * "XHTML 1.0 The Extensible HyperText Markup Language"
    http://www.w3.org/TR/xhtml1/
  * "XHTML Media Types"
    http://www.w3.org/TR/xhtml-media-types/

-- 
Paul Duncan <pabs / pablotron.org>        OpenPGP Key ID: 0x82C29562
http://www.pablotron.org/               http://www.paulduncan.org/

--ryJZkp9/svQ58syV
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="ruby-1.8.4-xhtml_cgi.diff"
Content-Transfer-Encoding: quoted-printable

diff -ur ruby-1.8.4/lib/cgi.rb ruby-1.8.4-xhtml_cgi/lib/cgi.rb
--- ruby-1.8.4/lib/cgi.rb	2005-10-06 21:01:22.000000000 -0400
+++ ruby-1.8.4-xhtml_cgi/lib/cgi.rb	2006-01-20 13:38:26.000000000 -0500
@@ -266,10 +266,13 @@
 #   end
 # 
 #   # add HTML generation methods
-#   CGI.new("html3")    # html3.2
-#   CGI.new("html4")    # html4.01 (Strict)
-#   CGI.new("html4Tr")  # html4.01 Transitional
-#   CGI.new("html4Fr")  # html4.01 Frameset
+#   CGI.new("html3")      # html3.2
+#   CGI.new("html4")      # html4.01 (Strict)
+#   CGI.new("html4Tr")    # html4.01 Transitional
+#   CGI.new("html4Fr")    # html4.01 Frameset
+#   CGI.new("xhtml10")    # XHTML 1.0 (Strict)
+#   CGI.new("xhtml10Tr")  # XHTML 1.0 Transitional
+#   CGI.new("xhtml10Fr")  # XHTML 1.0 Frameset
 #
 class CGI
 
@@ -536,6 +539,24 @@
   # 
   # This method does not perform charset conversion. 
   #
+  # Content-Type and XHTML 1.0 Output:
+  #
+  # In accordance with the oft-heralded Principle of Least Surprise and
+  # both backwards and browser compatibility, the Content-Type for
+  # generated XHTML defaults to 'text/html'.  However, if you're
+  # generating XHTML 1.0 content (i.e., you've created a CGI with the
+  # xhtml10, xhtml10Tr, or xhtml10Fr HTML output types), for user agents
+  # which are XHTML-aware, you might consider using a more
+  # XHTML-friendly content type, such as 'application/xhtml+xml',
+  # 'application/xml', or 'text/xml'.  In particular, the use of
+  # 'application/xhtml+xml' changes the default character encoding,
+  # rendering, and validation behavior that user agents apply to your
+  # document.  The nuances of these content types and the effect they
+  # have on conforming user agents are covered in gory detail in the W3C
+  # note "XHTML Media Types" at the following URL:
+  #
+  #   http://www.w3.org/TR/xhtml-media-types/
+  #
   def header(options = "text/html")
 
     buf = ""
@@ -1224,35 +1245,83 @@
   # Provides methods for code generation for tags following
   # the various DTD element types.
   module TagMaker # :nodoc:
+    TagStyle = Struct.new(
+      # upper-case element and attribute names on output 
+      # (true for HTML, false for XHTML)
+      :upcase,  
+
+      # allow bare (minimized) attributes (attributes without values)
+      # (true for HTML, false for XHTML)
+      :bare_attrs, 
+
+      # always close elements, even if empty
+      # (false for HTML, true for XHTML)
+      :always_close, 
+
+      # add implicit IDs to elements with name attributes but not IDs
+      # (false for HTML, true for XHTML10)
+      :implicit_ids
+    )
+
+    # HTML3/4 tag style
+    # this is declared here because it's the default tag style if
+    # unspecified in the methods below (to preserve
+    # backwards-compatibility for other extensions depending on these
+    # methods)
+    HTML_TAG_STYLE = TagMaker::TagStyle.new(true, true, false, false)
+    XHTML10_TAG_STYLE = TagMaker::TagStyle.new(false, false, true, true)
+
 
     # Generate code for an element with required start and end tags.
     #
     #   - -
-    def nn_element_def(element)
-      nOE_element_def(element, <<-END)
+    def nn_element_def(element, style = HTML_TAG_STYLE)
+      elem_name = style.upcase ? element.upcase : element
+      nOE_element_def(element, <<-END, style)
           if block_given?
             yield.to_s
           else
             ""
           end +
-          "</#{element.upcase}>"
+          "</#{elem_name}>"
       END
     end
 
     # Generate code for an empty element.
     #
     #   - O EMPTY
-    def nOE_element_def(element, append = nil)
+    def nOE_element_def(element, append = nil, style = HTML_TAG_STYLE, close_elem = false)
+      elem_name = style.upcase ? element.upcase : element
+      elem_end = (close_elem && style.always_close) ? ' />' : '>'
+      attr_name = style.upcase ? 'name' : 'name.downcase'
+
       s = <<-END
-          "<#{element.upcase}" + attributes.collect{|name, value|
+          has_id = #{style.implicit_ids}
+          has_id &&= attributes.keys.map { |v| v.downcase}.include?('id')
+
+          "<#{elem_name}" + attributes.collect{|name, value|
             next unless value
-            " " + CGI::escapeHTML(name) +
+            " " + CGI::escapeHTML(#{attr_name}) +
             if true == value
-              ""
+              #{style.bare_attrs} ? "" : '="' + CGI::escapeHTML(#{attr_name}) + '"'
             else
-              '="' + CGI::escapeHTML(value) + '"'
+              val = '="' + CGI::escapeHTML(value) + '"'
+
+              # what we're doing here is cloning the name attribute to
+              # an ID attribute if the implicit_ids style flag is set
+              # and an ID attribute wasn't explicitly specified.  this
+              # is necessary to maintain backwards compatibility with
+              # the existing CGI modules and forwards compatibility with
+              # XHTML 1.0 and (eventually) XHTML 1.1.  This approach has
+              # the added bonus of being guaranteed to work in older
+              # user agents.
+              val += ' id' + val if #{style.implicit_ids} && 
+                                    !has_id && 'name' == name.downcase
+
+              # return attribute value assignment
+              val
             end
-          }.to_s + ">"
+          }.to_s + "#{elem_end}"
       END
       s.sub!(/\Z/, " +") << append if append
       s
@@ -1262,10 +1331,11 @@
     # start) tag is optional.
     #
     #   O O or - O
-    def nO_element_def(element)
-      nOE_element_def(element, <<-END)
+    def nO_element_def(element, style = HTML_TAG_STYLE)
+      elem_name = style.upcase ? element.upcase : element
+      nOE_element_def(element, <<-END, style)
           if block_given?
-            yield.to_s + "</#{element.upcase}>"
+            yield.to_s + "</#{elem_name}>"
           else
             ""
           end
@@ -1554,7 +1624,7 @@
       end
       if @output_hidden
         body += @output_hidden.collect{|k,v|
-          "<INPUT TYPE=\"HIDDEN\" NAME=\"#{k}\" VALUE=\"#{v}\">"
+          hidden(k, v)
         }.to_s
       end
       super(attributes){body}
@@ -1594,6 +1664,14 @@
     # "DOCTYPE", if given, is used as the leading DOCTYPE SGML tag; it
     # should include the entire text of this tag, including angle brackets.
     #
+    # For XHTML 1.0 output, two additional pseudo-attributes, "XMLDECL"
+    # and "XML_ENCODING", are available.  "XMLDECL" is, as the name
+    # implies, the full XML declaration for the output document, and, 
+    # like the "DOCTYPE" pseudo-attribute, should include the entire text
+    # of the tag, including angle brackets. "XML_ENCODING" is the top-
+    # level XML encoding for the document, and defaults to "UTF-8" if
+    # unspecified.
+    #
     # The body of the html element is supplied as a block.
     # 
     #   html{ "string" }
@@ -1647,6 +1725,39 @@
         buf += doctype
       end
 
+      # if the method xmldecl exists, then print out an XML
+      # declaration.  IE doesn't render in strict mode if the first
+      # element in the document isn't a DOCTYPE declaration, so we need
+      # to put the XML declaration after the DOCTYPE declaration, even
+      # though it really makes more sense the other way around.
+      if respond_to?(:xmldecl)
+        buf += if attributes.key?('XMLDECL')
+          # if the pseudo-attribute XMLDECL is specified, then delete it
+          # from the attribute list and use that instead of the
+          # pre-defined XML declaration
+          attributes.delete('XMLDECL')
+        else
+          # if the pseudo-attribute XML_ENCODING is specified, then
+          # delete it from the attribute list and use it instead of
+          # UTF-8
+          encoding = if attributes.key?('XML_ENCODING') 
+            attributes.delete('XML_ENCODING')
+          else
+            'UTF-8'
+          end
+
+          # render the XML declaration with the specified encoding
+          xmldecl(encoding)
+        end
+      end
+      
+
+      # add the XML namespace if the xmlns method is defined and the
+      # xmlns attribute hasn't already been set explicitly
+      unless attributes.key?('xmlns') || !respond_to?(:xmlns)
+        attributes['xmlns'] = xmlns
+      end
+
       if block_given?
         buf += super(attributes){ yield }
       else
@@ -2055,7 +2166,6 @@
 
   # Mixin module for HTML version 3 generation methods.
   module Html3 # :nodoc:
-
     # The DOCTYPE declaration for this version of HTML
     def doctype
       %|<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">|
@@ -2236,6 +2346,154 @@
 
   end # Html4Fr
 
+  # Mixin module for generating XHTML version 1.0
+  module Xhtml10 # :nodoc:
+
+    # The DOCTYPE declaration for this version of HTML
+    def doctype
+      %|<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">|
+    end
+
+    def xmldecl(enc = 'UTF-8')
+      %|<?xml version='1.0' encoding='#{enc}'?>|
+    end
+
+    def xmlns
+      'http://www.w3.org/1999/xhtml'
+    end
+
+    # Initialise the HTML generation methods for this version.
+    def element_init
+      style = TagMaker::XHTML10_TAG_STYLE
+
+      extend TagMaker
+      methods = ""
+      # - -
+      for element in %w[tt i b big small em strong dfn code samp kbd
+        var cite abbr acronym sub sup span bdo address div map object
+        h1 h2 h3 h4 h5 h6 pre q ins del dl ol ul label select optgroup
+        fieldset legend button table title style script noscript
+        textarea form a blockquote caption 
+        html body p dt dd li option thead tfoot tbody colgroup tr th td head ]
+        methods += <<-BEGIN + nn_element_def(element, style) + <<-END
+          def #{element}(attributes = {})
+        BEGIN
+          end
+        END
+      end
+
+      # - O EMPTY
+      for element in %w[img base br area link param hr input col meta ]
+        methods += <<-BEGIN + nOE_element_def(element, nil, style, true) + <<-END
+          def #{element}(attributes = {})
+        BEGIN
+          end
+        END
+      end
+
+      eval(methods)
+    end
+
+  end # Xhtml10
+
+
+  # Mixin module for XHTML version 1.0 transitional generation methods.
+  module Xhtml10Tr # :nodoc:
+
+    # The DOCTYPE declaration for this version of HTML
+    def doctype
+      %|<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">|
+    end
+
+    def xmldecl(enc = 'UTF-8')
+      %|<?xml version='1.0' encoding='#{enc}'?>|
+    end
+
+    def xmlns
+      'http://www.w3.org/1999/xhtml'
+    end
+
+    # Initialise the HTML generation methods for this version.
+    def element_init
+      style = TagMaker::XHTML10_TAG_STYLE
+
+      extend TagMaker
+      methods = ""
+      # - -
+      for element in %w[ tt i b u s strike big small em strong dfn
+          code samp kbd var cite abbr acronym font sub sup span bdo
+          address div center map object applet h1 h2 h3 h4 h5 h6 pre q
+          ins del dl ol ul dir menu label select optgroup fieldset
+          legend button table iframe noframes title style script
+          noscript textarea form a blockquote caption html body p dt dd
+          li option thead tfoot tbody colgroup tr th td head]
+        methods += <<-BEGIN + nn_element_def(element, style) + <<-END
+          def #{element}(attributes = {})
+        BEGIN
+          end
+        END
+      end
+
+      # - O EMPTY
+      for element in %w[ img base basefont br area link param hr input
+          col isindex meta ]
+        methods += <<-BEGIN + nOE_element_def(element, nil, style, true) + <<-END
+          def #{element}(attributes = {})
+        BEGIN
+          end
+        END
+      end
+
+      eval(methods)
+    end
+
+  end # Xhtml10Tr
+  
+  # Mixin module for generating XHTML version 1.0 with framesets.
+  module Xhtml10Fr # :nodoc:
+
+    # The DOCTYPE declaration for this version of HTML
+    def doctype
+      %|<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">|
+    end
+
+    # the XML declaration for this XHTML document
+    def xmldecl(enc = 'UTF-8')
+      %|<?xml version='1.0' encoding='#{enc}'?>|
+    end
+
+    # the XML namespace attribute for this XHTML document
+    def xmlns
+      'http://www.w3.org/1999/xhtml'
+    end
+
+    # Initialise the HTML generation methods for this version.
+    def element_init
+      style = TagMaker::XHTML10_TAG_STYLE
+
+      methods = ""
+      # - -
+      for element in %w[ frameset ]
+        methods += <<-BEGIN + nn_element_def(element, style) + <<-END
+          def #{element}(attributes = {})
+        BEGIN
+          end
+        END
+      end
+
+      # - O EMPTY
+      for element in %w[ frame ]
+        methods += <<-BEGIN + nOE_element_def(element, nil, style, true) + <<-END
+          def #{element}(attributes = {})
+        BEGIN
+          end
+        END
+      end
+      eval(methods)
+    end
+
+  end # Xhtml10Fr
+
 
   # Creates a new CGI instance.
   #
@@ -2246,6 +2504,9 @@
   # html4:: HTML 4.0
   # html4Tr:: HTML 4.0 Transitional
   # html4Fr:: HTML 4.0 with Framesets
+  # xhtml10:: XHTML 1.0 (Strict)
+  # xhtml10Tr:: XHTML 1.0 Transitional
+  # xhtml10Fr:: XHTML 1.0 with Framesets
   #
   # If not specified, no HTML generation methods will be loaded.
   #
@@ -2291,6 +2552,20 @@
       extend Html4Fr
       element_init()
       extend HtmlExtension
+    when 'xhtml10'
+      extend Xhtml10
+      element_init()
+      extend HtmlExtension
+    when 'xhtml10Tr'
+      extend Xhtml10Tr
+      element_init()
+      extend HtmlExtension
+    when 'xhtml10Fr'
+      extend Xhtml10Tr
+      element_init()
+      extend Xhtml10Fr
+      element_init()
+      extend HtmlExtension
     end
   end
 

--ryJZkp9/svQ58syV
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="ruby-1.8.4-xhtml_cgi_test.rb"
Content-Transfer-Encoding: quoted-printable

#!/usr/bin/env ruby

# load the CGI module
require 'cgi'

# build list of test output types and associated test CGIs
HTML_TYPES = %w{xhtml10 html4 html4Tr xhtml10Tr}

# clear stdin (to prevent CGI offline-mode)
$stdin = File.open('/dev/null', 'r')

# iterate over each test CGI and render the same document
HTML_TYPES.each do |key| 
  # create new CGI with the given output type
  cgi = CGI.new(key)

  # generate document title
  doc_title = "CGI Output Test (#{key})"

  puts "#{key}: " << cgi.html { 
    cgi.head { 
      head = cgi.title { doc_title }

      # style link attributes
      link_attrs = { 
        'type' => 'text/css', 
        'href' => 'style.css', 
        'rel'  => 'stylesheet' 
      }

      # test empty element behavior
      head + cgi.link(link_attrs) 
    } + 
    
    cgi.body { 
      cgi.h1 { doc_title } + 

      # test empty element behavior
      cgi.hr + 

      # test element balancing
      cgi.p { 
        'This is a test ' + 
        cgi.acronym('title' => 'HyperText Markup Language') { 'HTML' } + 
        ' document.' 
      } +

      cgi.form('post', 'foobar.cgi') {
        # test hidden element behavior
        cgi.hidden('hi', 'hidden') +

        # test bare (minimized) attribute behavior
        cgi.popup_menu('bare_attr_test', ['1', 'foo'], ['2', 'bar', true]) +
        cgi.submit("Testing Form Output")
      } +

      cgi.pre { 
        # test id munging
        cgi.code('name' => 'id-map-test') {
          'this is a test of name mapping'
        } 
      }
    } 
  } 
end

--ryJZkp9/svQ58syV--

--Pql/uPZNXIm1JCle
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: Digital signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.0 (GNU/Linux)

iD8DBQFD0WBMzdlT34LClWIRAkHIAKCkBmxxFHrg3nVefBs3rDyTTYQYwwCg2Hho
1iopySFljqiWZ0BxbtVPKfM߰i
-----END PGP SIGNATURE-----

--Pql/uPZNXIm1JCle--