Ok, now that I've gotten this initial release out, I want to talk a bit
about what it is, how it works, and some of the problems I've faced up
to this point.  Some of these are pretty critical problems for getting
Ruby adopted in the XML community; others are not related to XML at all.

My framework basically revolves around the notion of chaining together
components that modify XML documents.  The first component in the chain
starts a SAX2 stream of events, the next few components modify the
stream, and the final component at the end of the chain outputs the
completed document.
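
A minimal sketch of the chaining idea follows.  The class and method
names (Component, Generator, Upcaser, Serializer, chain) are made up for
illustration and are not the framework's real API, and the serializer
skips escaping to keep things short.

```ruby
# Hypothetical names throughout; this is not the framework's actual API.

# Base class: forwards every event to the next component unchanged.
class Component
  attr_accessor :next_component

  def start_element(uri, local_name, attrs)
    next_component&.start_element(uri, local_name, attrs)
  end

  def end_element(uri, local_name)
    next_component&.end_element(uri, local_name)
  end

  def characters(text)
    next_component&.characters(text)
  end
end

# First link: starts the event stream (hard-coded here instead of
# parsing a file from disk).
class Generator < Component
  def run
    start_element(nil, "page", {})
    start_element(nil, "title", {})
    characters("hello")
    end_element(nil, "title")
    end_element(nil, "page")
  end
end

# Middle link: modifies the stream, here by upcasing character data.
class Upcaser < Component
  def characters(text)
    super(text.upcase)
  end
end

# Last link: turns the events back into markup (no escaping, for brevity).
class Serializer < Component
  attr_reader :output

  def initialize
    @output = +""
  end

  def start_element(_uri, local_name, attrs)
    attr_text = attrs.map { |k, v| %( #{k}="#{v}") }.join
    @output << "<#{local_name}#{attr_text}>"
  end

  def end_element(_uri, local_name)
    @output << "</#{local_name}>"
  end

  def characters(text)
    @output << text
  end
end

# Wires the components together in order and returns the first one.
def chain(*components)
  components.each_cons(2) { |a, b| a.next_component = b }
  components.first
end

serializer = Serializer.new
chain(Generator.new, Upcaser.new, serializer).run
puts serializer.output
```

Adding another transformation just means dropping one more Component
subclass into the call to chain.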

I wrote a similar framework (which I started before I discovered Cocoon2)
in Visual Basic for my employer that used the DOM.  This worked, but
it is a VERY inefficient way to process an XML document, especially
when you start to chain a lot of components together.  Here's an example:

Your initial component generates an XML document by reading it from
disk.  The next component in the chain transforms certain tags (say
framework:include) by loading another XML document from disk and
inserting its contents into the stream.  Simple enough.  But what if
you have another component that inserts a bunch of errors into the
document by searching for the yourapp:errors tag?  Now you can see
the problem: each component has to scan the ENTIRE DOM tree just to make
one change.  This is clearly a bad idea when you start to chain things
together.
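
For contrast, here is a rough sketch of the same two components written
against a stream.  Each one only reacts when its own tag passes by, so
nothing is ever rescanned.  The event format (plain arrays), the
FRAGMENTS table standing in for files on disk, and the method names are
all assumptions made for this example.

```ruby
# Events are plain arrays here: [:start, name, attrs], [:text, string],
# [:end, name].  FRAGMENTS stands in for documents loaded from disk.
FRAGMENTS = {
  "footer.xml" => [[:start, "footer"], [:text, "bye"], [:end, "footer"]]
}

# Replaces <framework:include src="..."/> with the named fragment's events.
def expand_includes(events)
  Enumerator.new do |out|
    events.each do |event|
      kind, name, attrs = event
      if kind != :text && name == "framework:include"
        FRAGMENTS.fetch(attrs["src"]).each { |e| out << e } if kind == :start
      else
        out << event
      end
    end
  end
end

# Replaces <yourapp:errors/> with one <error> element per message.
def insert_errors(events, messages)
  Enumerator.new do |out|
    events.each do |event|
      kind, name = event
      if kind != :text && name == "yourapp:errors"
        next unless kind == :start
        messages.each do |msg|
          out << [:start, "error"] << [:text, msg] << [:end, "error"]
        end
      else
        out << event
      end
    end
  end
end

source = [
  [:start, "page"],
  [:start, "yourapp:errors"], [:end, "yourapp:errors"],
  [:text, "body"],
  [:start, "framework:include", { "src" => "footer.xml" }],
  [:end, "framework:include"],
  [:end, "page"]
]

# Both transformations happen in a single pass over the events.
result = insert_errors(expand_includes(source), ["name is required"]).to_a
result.each { |event| p event }
```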

So the decision to move to a SAX2 stream as opposed to a DOM 
architecture like REXML was an easy one to make (and seeing that the 
Cocoon2 guys did the same thing also helped).  But writing applications 
that use SAX in Ruby is funky at best.

The biggest problem is in dealing with namespaces.  A lot of the Ruby 
XML libraries don't seem to handle XML namespaces very well (if at all). 
 For a project of this scope, namespaces become very critical very 
quickly.  

REXML didn't seem to support streaming namespaces at all (maybe I missed
something).  Even if it did, though, the streaming portions of REXML
didn't seem very fleshed out, or well suited to connecting various
stream consumers and providers together.

So I moved to XMLParser.  Now, XMLParser does support namespaces, but
the way it does so is extremely weird.  I would have thought that
XMLParser would support the full SAX2 interfaces that Expat provides,
but instead it seems to combine the namespace URI with the tag or
attribute name and cram the result into the SAX1 interfaces.
I don't know if this is an XMLParser issue or one of the things Expat
does, but it's not very effective.  I essentially ended up writing a
wrapper class for XMLParser that undid the damage and made the
interfaces more SAX2-like.
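
A sketch of what such a wrapper can look like, assuming the parser hands
back each name as URI, separator character, local name.  The "!"
separator and the names split_qualified, NamespaceAdapter and Recorder
are assumptions for this example, not the code in the release.

```ruby
NS_SEPARATOR = "!" # assumed; use whatever separator the parser was given

# Splits "uri!local" into [uri, local]; a name with no namespace part
# comes back as [nil, name].
def split_qualified(name, sep = NS_SEPARATOR)
  uri, found, local = name.rpartition(sep)
  found.empty? ? [nil, name] : [uri, local]
end

# Receives SAX1-style callbacks with combined names and forwards
# SAX2-style callbacks (uri and local name passed separately).
class NamespaceAdapter
  def initialize(handler)
    @handler = handler
  end

  def start_element(name, attrs)
    uri, local = split_qualified(name)
    # Attribute keys become [uri, local] pairs as well.
    sax2_attrs = attrs.map { |k, v| [split_qualified(k), v] }.to_h
    @handler.start_element(uri, local, sax2_attrs)
  end

  def end_element(name)
    @handler.end_element(*split_qualified(name))
  end
end

# Tiny handler that just records what it is told, for demonstration.
class Recorder
  attr_reader :events

  def initialize
    @events = []
  end

  def start_element(uri, local, attrs)
    @events << [:start, uri, local, attrs]
  end

  def end_element(uri, local)
    @events << [:end, uri, local]
  end
end

recorder = Recorder.new
adapter = NamespaceAdapter.new(recorder)
adapter.start_element("http://example.org/fw!include", { "src" => "a.xml" })
adapter.end_element("http://example.org/fw!include")
p recorder.events
```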

Which brings me to another point.  We've discussed XML implementations 
before, even streaming parsers like SAX.  Everybody has their own little 
way of doing it.  This is a problem, and here's a good example.

My framework streams SAX2 events into a Transformer (an object that
makes changes to the SAX2 stream) that uses the Sablotron interface to
apply a stylesheet to the stream.  To accomplish this, my component has
to collect the SAX2 stream into a Ruby string, load the stylesheet,
pass the Ruby string and the stylesheet into the Sablotron transformer,
and let it do its mojo.  Sablotron returns a Ruby string, which I then
have to *reparse* to generate the new SAX2 stream and send that off to
the next component.

This is hacky at best (and definitely a performance drag).  An ideal
implementation would work more along these lines.  The XSLT processor
needs to use some sort of DOM-like implementation to function (probably
unavoidable).  If the XSLT processor could directly accept SAX events
and build its internal representation from them, that would cut a good
deal of code out of the picture (serializing the stream to a string, and
then parsing that string into the new internal representation).  Then,
because we are using a standard event interface, the final XML document
could be streamed out immediately as SAX2 events generated directly
from the internal DOM, cutting out even more code (no saving the new
XML document to a string and then parsing it ANOTHER time).

That's the ideal.  Imagine if XSLT4R used REXML as its internal
representation, and REXML were capable of generating its tree directly
from SAX2 events and then generating SAX2 events directly from its
tree.  That would make combining the various XML technologies not
just easier, but a LOT more efficient.
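
To make the idea concrete, here is a toy version of both directions:
a tree built straight from events, and events replayed straight from the
tree, with no string in between.  Node, TreeBuilder and emit_events are
invented for this sketch and have nothing to do with REXML's or XSLT4R's
actual classes.

```ruby
# A toy tree: children holds Node objects and plain strings (text).
Node = Struct.new(:uri, :name, :attrs, :children)

# Consumes start/end/characters events and builds a Node tree.
class TreeBuilder
  attr_reader :root

  def initialize
    @root = nil
    @stack = []
  end

  def start_element(uri, name, attrs)
    node = Node.new(uri, name, attrs, [])
    @stack.last.children << node unless @stack.empty?
    @root ||= node
    @stack.push(node)
  end

  def end_element(_uri, _name)
    @stack.pop
  end

  def characters(text)
    @stack.last.children << text unless @stack.empty?
  end
end

# Walks a Node tree and replays it as events on any handler.
def emit_events(node, handler)
  handler.start_element(node.uri, node.name, node.attrs)
  node.children.each do |child|
    child.is_a?(Node) ? emit_events(child, handler) : handler.characters(child)
  end
  handler.end_element(node.uri, node.name)
end

# Events in, tree built.
builder = TreeBuilder.new
builder.start_element(nil, "page", {})
builder.start_element(nil, "title", { "lang" => "en" })
builder.characters("hello")
builder.end_element(nil, "title")
builder.end_element(nil, "page")

# Tree out as events again, fed to a second builder with no
# serialize/reparse step in the middle.
copy = TreeBuilder.new
emit_events(builder.root, copy)
puts copy.root == builder.root
```

An XSLT processor sitting between the two halves would transform the
tree in place and then call something like emit_events on the result.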

Now, I've basically created a SAX2 interface for Ruby in my project
(that sits on top of XMLParser).  There has to be synergy between my
project and projects such as XMLParser, REXML, and NQXML.  I'd like
your opinions on the subject.  I want to find a common ground we
can all agree on, and I am willing to work towards that common ground.

Now, that's the big XML dilemma.  My framework also does a few things
that have nothing to do with XML.  The biggest addition is a new Session
interface.

My Session and Cache interfaces are very straightforward: they are
basically Ruby hashes that get marshalled to disk after every page
request (using file locking to hopefully prevent any conflicts).  This
works, but is hardly ideal.  At some point I want to have the ability to
plug in something like dRuby for a better solution.  But that's not my
problem.  My problem has to do with security concerns.
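
For reference, the marshal-and-lock approach boils down to roughly the
following.  FileSession and its methods are names invented for this
sketch, and the temp-directory path is only there so the example runs
anywhere.

```ruby
require "tmpdir"

# Hypothetical store: one hash per file, guarded by flock.
class FileSession
  def initialize(path)
    @path = path
  end

  # Returns the stored hash, or an empty one if nothing was saved yet.
  def load
    return {} unless File.exist?(@path)
    File.open(@path, "rb") do |f|
      f.flock(File::LOCK_SH)          # shared lock: many readers at once
      data = f.read
      data.empty? ? {} : Marshal.load(data)
    end
  end

  # Writes the hash back; the file is created readable by its owner only.
  def save(hash)
    File.open(@path, File::RDWR | File::CREAT, 0o600) do |f|
      f.flock(File::LOCK_EX)          # exclusive lock while writing
      f.binmode
      f.rewind
      f.write(Marshal.dump(hash))
      f.flush
      f.truncate(f.pos)               # drop leftovers from a longer old value
    end
  end
end

path = File.join(Dir.mktmpdir, "session.bin")
store = FileSession.new(path)
store.save({ "user" => "bryan", "visits" => 3 })
restored = store.load
p restored
```

The file is opened without truncating and only cut to size once the lock
is held, so a reader never sees a half-written file.  Marshal.load will
rebuild whatever the file describes, so the file permissions are doing
all of the protecting here.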

Ruby won't eval code that is tainted.  These cached objects are tainted 
when they are loaded from the disk, so I frequently have to untaint them 
to access them.  Some I check, some I don't (I either can't or for good 
or bad think I can trust them).  This works, but this sits wrong in my 
stomach.

Assuming I marshal the objects to disk with the appropriate permissions 
such that only the apache user can access the file, is that good enough? 
 (I'm ignoring the problem of multiple users having access to the Apache 
environment for now).  It makes me nervous, but anyway, this brings me 
to my second question.  I seem to be having problems untainting the 
objects in the hash.  If I trust that the file I loaded was secure, then 
I should implicitly trust everything in the cache.  Hash.untaint does
not seem to propagate to all entries within the hash.  Is this the
correct behaviour?  Right now I have to explicitly put untaints all
throughout my code, and that again makes me nervous.
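
As far as I can tell the taint flag lives on each object separately, so
untainting the hash leaves its keys and values alone.  One way to keep
the untaints in a single place is a recursive walk like the sketch
below.  each_nested and deep_untaint are made-up helpers, the walk only
descends into hashes and arrays (not into instance variables), and on
Rubies where untaint no longer exists it simply visits the objects and
does nothing.

```ruby
# Yields obj and, recursively, everything reachable through hashes
# and arrays.  The seen table guards against cyclic structures.
def each_nested(obj, seen = {}, &block)
  return if seen[obj.object_id]
  seen[obj.object_id] = true
  yield obj
  case obj
  when Hash
    obj.each do |key, value|
      each_nested(key, seen, &block)
      each_nested(value, seen, &block)
    end
  when Array
    obj.each { |value| each_nested(value, seen, &block) }
  end
end

# Untaints every object found by each_nested.  Frozen objects are
# skipped, and so is everything on Rubies that lack untaint.
def deep_untaint(obj)
  each_nested(obj) do |o|
    o.untaint if o.respond_to?(:untaint) && !o.frozen?
  end
  obj
end

cache = { "user" => ["bryan", { "visits" => 3 }] }
visited = []
each_nested(cache) { |o| visited << o }
p visited.size
deep_untaint(cache)
```

Calling deep_untaint once, right after loading the file, at least makes
the "I trust this file" decision visible in one spot.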

Ok, those are my biggest concerns at the moment.  I'm sure I'll have
more, but I think both of these should give me enough to fuss over to
keep me busy for a while (and I still have way too much documenting
to do ;)

Bryan

Bryan Murphy wrote:

> Ruby Publishing Framework
> Version 0.5.0
>
> Description:
>
>  The Ruby Publishing Framework is an XML based framework for building
>  dynamic applications that can generate content based on a SAX2 like
>  stream of events.
>
> Version 0.5.0:
>
>  This is the initial public release of the framework which currently
>  focuses on building web based applications.
>
> What is it?
>
>  An XML publishing framework is a set of reusable classes for building
>  applications that generate content using XML documents.  By defining
>  classes of components that conform to specific behaviors and providing
>  code for linking those classes together, the framework provides a means
>  to efficiently create XML based applications out of easily reusable and
>  highly modularized components.
>
> How can this benefit me?
>
>  To get to the point: When you build a web based application, you are
>  making an HTML document that pulls data from one or more sources.
>  Since HTML is essentially a dialect of XML, you can build applications
>  that utilize the industry-wide XML expertise and technologies.
>
>  The problem with this approach is that you often end up reinventing the
>  wheel.  Every time you want to create or modify an XML document, you
>  have to write code to process the XML.  This can become tedious as you
>  write what is essentially the same code (with minor differences) again
>  and again.
>
>  To compound the problem, you are frequently using outdated technologies
>  (such as ASP or JSP pages) that cause problems of their own (such as
>  mixing content with logic).  Attempts have been made to rectify these
>  problems, but many of them don't go far enough.
>
>  An XML publishing framework allows you to move beyond the world of
>  outdated web development.  By utilizing standards based technologies
>  and best practices (such as MVC separation of content, logic, and
>  presentation) the framework allows you to quickly and easily build
>  complex XML applications.
>
> Where can I get it and/or learn more?
>
>  http://software.terralab.com/framework/
>
> Are there any similar projects in other languages?
>
>  Apache Cocoon 2 (Java)
>  -> http://xml.apache.org/cocoon/
>
>  AxKit (Perl)
>  -> http://www.axkit.org/
>