On Fri, 2006-01-27 at 16:15, James Britt wrote:
<snip>
> > Historical reasons, mostly. HTML started out as inspired by SGML, rather
> > than compliant with SGML. The people who built the first web servers
> > didn't know much about SGML, and so they reinvented processing
> > instructions in an annoyingly incompatible manner. Ever since, web
> > frameworks have been built on a solid foundation of
> > don't-bother-me-with-the-basics-of-SGML/XML-processing. Quite
> > successfully too, which really grates my cheese. :-)
>
> Do you have any references for this?

Tim Berners-Lee certainly knew about SGML. The original specification explicitly mentions it. See
http://www.w3.org/History/19921103-hypertext/hypertext/WWW/MarkUp/Tags.html

However, HTML was not fully SGML compliant. For example, there is a sentence in the original spec that says:

"Currently HTML documents are transmitted without the normal SGML framing tags, but if these are included parsers will ignore them."

There was also an original test dataset, including this file:
http://www.w3.org/History/19921103-hypertext/hypertext/WWW/Test/test.html

If you look at the source, you can see that it is not fully SGML compliant. For starters, there is no Doctype. Also, there are tags that contain formatted text that is not wrapped in a CDATA section. Neither is allowed in SGML.

An interesting thing to note is that the <P> tag was used to indicate the end of a paragraph in the test document, though the original spec said <P> was a paragraph start tag. I remember that all my early HTML books said <P> was an end tag. Unfortunately, I threw those books away years ago. :-(

Also, I believe the first HTML DTD was for version 2.0, written in 1995. Here is the link:
http://www.w3.org/MarkUp/html-spec/html.dtd

Since there was no DTD for version 1.0, it could not have been SGML compliant.

In all fairness, I could be wrong about there being no HTML 1.0 DTD.
There are notes from 1992 that talk about the future of HTML and "a new DTD", which indicates the existence of an old one. It's just that I haven't found it. Even so, the lack of a requirement for a Doctype would be enough to render HTML non-compliant. More importantly, it would not be parseable by SGML parsers.

At the time, dropping the Doctype and CDATA sections, and not supporting hierarchical chapter and section structures, was probably the right decision. HTML had to be very simple, or people would not have used it. If the design had been "better", we might not have had a web today.

> I'm pretty sure Tim Berners-Lee,
> Marc Andreessen, etc. knew about SGML, and I do not believe that HTML
> ever had PIs.

I have never seen an HTML spec that mentions processing instructions. Nor is there any need to. Processing instructions can be defined by anyone who designs a processing application; they are not tied to a specific DTD or SGML application. (Well, except that some specifications explicitly define some PIs, but there is nothing that prevents users of the DTD from specifying more of them.)

I can't prove that the people who wrote the first web servers did not know about PIs, but I think it is likely. If they had known, what possible reason could they have had for deliberately doing something that was not SGML compliant? (Browser wars and vendor lock-in didn't become major issues until later.)

>
> Also, the xml-dev list is a good place to read varying, but informed,
> opinions on the use of PIs.
>
> For example:
>
> http://lists.xml.org/archives/xml-dev/200505/msg00159.html
>

I follow the list, though not as carefully now as I did a couple of years ago. In addition to the applications mentioned in the thread you refer to, XML editors like XMetaL and Arbortext Editor make use of processing instructions. So do many proprietary SGML/XML processing systems.
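
To illustrate the point above that processing instructions are defined by the processing application rather than by a DTD, here is a minimal Python sketch. The document text and the PI targets "my-app" and "other-tool" are made up for the example, and the helper name collect_pis is hypothetical. No DTD is involved: the parser just hands every PI to the application, which is free to act on the targets it recognizes and skip the rest.

```python
from xml.dom import minidom, Node

# Hypothetical document: two PIs with invented targets, and no DTD at all.
DOC_TEXT = (
    '<?xml version="1.0"?>'
    '<doc><?my-app page-break?><p>Hello</p>'
    '<?other-tool key="value"?></doc>'
)


def collect_pis(node):
    """Walk the tree and return (target, data) for every processing instruction."""
    found = []
    for child in node.childNodes:
        if child.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
            found.append((child.target, child.data))
        else:
            found.extend(collect_pis(child))
    return found


pis = collect_pis(minidom.parseString(DOC_TEXT))

# An application picks out the targets it knows about and ignores the others.
mine = [data for target, data in pis if target == "my-app"]
print(pis)
print(mine)
```

Note that the XML declaration on the first line is not reported as a processing instruction by this parser, even though it looks like one.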
/Henrik

--
http://kallokain.blogspot.com/ - Blogging from the trenches of software development
http://www.henrikmartensson.org/ - Reflections on software development
http://testunitxml.rubyforge.org/ - The Test::Unit::XML Home Page
http://declan.rubyforge.org/ - The Declan Home Page