James Edward Gray II wrote:
> On Apr 4, 2007, at 6:15 AM, Robert Klemme wrote:
>
>> On 04.04.2007 12:00, Peter Szinek wrote:
>>> Robert Klemme wrote:
>>>> On 04.04.2007 10:53, Peter Szinek wrote:
>>>>> I really just need a fast XML parser which is easy to install,
>>>>> that's all. scRUBYt! is a high-level framework, aimed also at
>>>>> non-programmers, so I can not expect that all my potential users
>>>>> are handy with debian's package policy and the joys of libxml
>>>>> installing on win32 :)
>>>>
>>>> Maybe then you'll simply have to decide whether ease of use or
>>>> performance is more important to you.
>>> Should I interpret this as 'decide between REXML and libxml'?
>>> There are really no other alternatives?
>>
>> AFAIK REXML is the only pure Ruby XML parser - and it comes with the
>> standard distribution.
>
> Sounds like it is time for FasterXML. :)

One pointer: REXML comes with quite a fast pullparser, and it should be
possible to base a lightweight XML document lib on that. (The
documentation says that the API should not be considered stable, but I'm
sure that could be resolved with the REXML author.)

As a proof of concept, see the attached code. We use it in our company
to load and process XML files generated by our tools and OpenOffice
Calc. I just tested it on a 1 MB XML file from an .ods file, which it
loaded successfully in under 2 seconds.

Writing a fast XPath implementation to match this might be quite a
challenge, though.
;)

Dennis

# xmlsimple2.rb
require 'rexml/parsers/pullparser'

module XmlSimple
  def self.load(filename)
    parse(File.read(filename))
  end

  def self.parse(string)
    parser = REXML::Parsers::PullParser.new(string)
    # The synthetic 'root' node wraps whatever the document contains.
    return Node.new(['root', {}], parser)
  end

  class Node
    include Enumerable
    # Undefine the inherited public methods (all but __send__, __id__ and
    # the like) so that element names which clash with built-in method
    # names reach method_missing.
    instance_methods(true).each {|m| undef_method(m) unless m =~ /__.*__/}

    attr_reader :name, :attr, :text, :children

    # token is [name, attributes]; a PullEvent answers to [0] and [1] the
    # same way. The constructor consumes events until its own end tag.
    def initialize(token, parser)
      @name = token[0]
      @text = ''
      @siblings = [self]
      @attr = token[1]
      @nodes = {}
      @children = []
      loop do
        if parser.has_next?
          tok = parser.pull
        else
          # Input exhausted: close the synthetic root node.
          tok = REXML::Parsers::PullEvent.new([:end_element, 'root'])
        end
        case tok.event_type
        when :start_element
          node = Node.new(tok, parser)
          @children << node
          if @nodes[tok[0]]
            # Same-named elements are chained onto the first one.
            @nodes[tok[0]].push_sibling(node)
          else
            @nodes[tok[0]] = node
          end
        when :end_element
          raise "unexpected end of <#{tok[0]}> in <#{@name}>" unless tok[0] == @name
          return
        when :text
          @text << tok[0]
          @children << tok[0]
        end
      end
    end

    def push_sibling(node)
      @siblings << node
    end

    def to_a
      @siblings
    end

    def each(&block)
      @siblings.each(&block)
    end

    # node.foo returns the first child element named 'foo' (or nil).
    def method_missing(m)
      return @nodes[m.to_s]
    end

    def [](m)
      return @nodes[m]
    end

    def inspect(indent = '')
      r = indent + @name + ":\n"
      indent += '  '
      r << indent + 'attr: ' + attr.inspect + "\n" unless attr.empty?
      r << indent + 'text: ' + text.inspect + "\n" unless text.empty?
      @nodes.each do |k, v|
        v.each {|n| r << n.inspect(indent)}
      end
      return r
    end
  end
end
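
For readers who have not used the pull parser mentioned above, here is a minimal sketch of the event loop that the attached code builds on. It assumes the rexml library can be required (in recent Ruby versions it is a bundled gem rather than part of the standard library). The helper name count_elements and the sample document are made up for illustration; they are not part of REXML or of the attachment.

```ruby
require 'rexml/parsers/pullparser'

# Hypothetical helper: walk the pull events once and count how often
# each element name occurs. No document tree is built.
def count_elements(xml)
  counts = Hash.new(0)
  parser = REXML::Parsers::PullParser.new(xml)
  while parser.has_next?
    event = parser.pull
    # For a :start_element event, [0] is the tag name and [1] the
    # attribute hash, which is what Node#initialize relies on as well.
    counts[event[0]] += 1 if event.event_type == :start_element
  end
  counts
end

# Made-up sample input, loosely shaped like spreadsheet content.
sample = '<sheet><row><cell>1</cell><cell>2</cell></row>' \
         '<row><cell>3</cell></row></sheet>'
counts = count_elements(sample)
puts counts.map {|name, n| "#{name}: #{n}"}.join(', ')
```

Because the parser hands out one event at a time, the caller decides what to keep, which is probably a large part of why a thin wrapper on top of it can stay fast.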