--------------070706010408000105090809
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

James Edward Gray II wrote:
> On Apr 4, 2007, at 6:15 AM, Robert Klemme wrote:
> 
>> On 04.04.2007 12:00, Peter Szinek wrote:
>>> Robert Klemme wrote:
>>>> On 04.04.2007 10:53, Peter Szinek wrote:
>>>>> I really just need a fast XML parser which is easy to install, 
>>>>> that's all. scRUBYt! is a high-level framework, aimed also at 
>>>>> non-programmers, so I can not expect that all my potential users 
>>>>> are handy with debian's package policy and the joys of libxml 
>>>>> installing on win32 :)
>>>>
>>>> Maybe then you'll simply have to decide whether ease of use or 
>>>> performance is more important to you.
>>> Should I interpret this as 'decide between REXML and libxml'?
>>> There are really no other alternatives?
>>
>> AFAIK REXML is the only pure Ruby XML parser - and it comes with the 
>> standard distribution.
> 
> Sounds like it is time for FasterXML.  :)

One pointer: REXML comes with quite a fast pull parser, and it should be 
possible to base a lightweight XML document library on that. (The 
documentation says that the API should not be considered stable, but I'm 
sure that could be resolved with the REXML author.)
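
For readers who have not used the pull parser, here is a minimal sketch of 
the basic loop. The helper name count_elements and the sample document are 
made up for illustration; the only REXML calls used are PullParser.new, 
has_next?, pull, and PullEvent#start_element?.

```ruby
require 'rexml/parsers/pullparser'

# Hypothetical helper (not part of REXML): count start tags per element name.
def count_elements(xml)
  counts = Hash.new(0)
  parser = REXML::Parsers::PullParser.new(xml)
  # Pull one event at a time instead of building a full document tree.
  while parser.has_next?
    event = parser.pull
    # For a start_element event, event[0] is the tag name.
    counts[event[0]] += 1 if event.start_element?
  end
  counts
end

counts = count_elements('<a><b/><b>text</b></a>')
```

Because no tree is built, memory use stays small; the caller decides what 
to keep from each event.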

As a proof of concept, see the attached code. We use it in our company 
to load and process XML files generated by our tools and OpenOffice Calc.
I just tested it on a 1MB XML from an .ods file, which it loaded 
successfully in < 2 seconds.

Writing a fast XPath implementation to match this might be quite a 
challenge, though. ;)

Dennis

--------------070706010408000105090809
Content-Type: text/plain;
 name="xmlsimple2.rb"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="xmlsimple2.rb"

require 'rexml/parsers/pullparser'

module XmlSimple
	def self.load(filename)
		parse(File.read(filename))
	end

	def self.parse(string)
		parser = REXML::Parsers::PullParser.new(string)
		return Node.new(['root', {}],  parser)
	end

	class Node
		include Enumerable
	
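		# Undefine inherited methods (keeping __send__, __id__ and the like)
		# so that element names reach method_missing.
		instance_methods(true).each {|m| undef_method(m) unless m =~ /__.*__/}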
		attr_reader :name, :attr, :text, :children
		def initialize(token, parser)
			@name = token[0]
			@text = ''
			@siblings = [self]
			@attr = token[1]
			@nodes = {}
			@children = []
			loop do
				if parser.has_next?
					tok = parser.pull
				else
					# Input exhausted: synthesize the end tag of the artificial root node.
					tok = REXML::Parsers::PullEvent.new([:end_element, 'root'])
				end
				case tok.event_type
				when :start_element
					node = Node.new(tok, parser)
					@children << node
					if @nodes[tok[0]]
						@nodes[tok[0]].push_sibling(node)
					else
						@nodes[tok[0]] = node
					end
				when :end_element
					raise unless tok[0] == @name
					return
				when :text
					@text << tok[0]
					@children << tok[0]
				end
			end
		end
    
		def push_sibling(node)
			@siblings << node
		end
		
		def to_a
			@siblings
		end
    
		def each(&block)
			@siblings.each(&block)
		end
    
		def method_missing(m)
			return @nodes[m.to_s]
		end
  
		def [](m)
			return @nodes[m]
		end
    
		def inspect(indent = '')
			r = indent + @name + ":\n"
			indent += '  '
			r << indent + 'attr: ' + attr.inspect + "\n" unless attr.empty?
			r << indent + 'text: ' + text.inspect + "\n" unless text.empty?
			@nodes.each do |k, v|
				v.each {|n| r << n.inspect(indent)}
			end
			return r
		end
 	end
end

--------------070706010408000105090809--