On Tue, Nov 07, 2006 at 08:03:40AM +0900, David Vallner wrote:
> pdg wrote:
> > Hi All,
> > 
> > As a first exercise with Ruby, I am going through the Pickaxe book and
> > creating a jukebox. I haven't even tried to create an array of songs
> > yet, because I got distracted and wanted to work this out. I am trying
> > to feed it the data from my iTunes XML file. I
> > can get it to work if I delete most of the xml file, but when it's 5-6
> > gig, 
> 
> OMFG. That's a -huge- XML file. Probably all of my MP3s together would
> fit into there with base64-encoded contents :P
> 
> > rexml just seems to die. I have vaguely heard that stream parsing
> > may be the answer, but am totally unaware of how to use it.
> > 
> 
> Well, time to learn. I probably never even saw a computer that could
> handle an XML file that size using straightforward DOM parsing - which
> normally "blows up" the original XML document's size in bytes five times
> and more. And REXML definitely doesn't have performance of any kind
> amongst its qualities. (And for completeness' sake, I never 'clicked'
> with the API either, but I'm a minority there.)
> 
> You want a Ruby binding to a stream or pull parser - to my best
> knowledge, REXML is neither. That means libxml2, expat, or Xerces.
> Compiling Required - I think the one-click installer comes with one of
> these, buggered if I know which.

Ruby comes with a pull parser in the standard lib, as part of REXML:
  http://ruby-doc.org/stdlib/libdoc/rexml/rdoc/classes/REXML/Parsers/PullParser.html

I would give it a try on a document that large.
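
For what it's worth, here is a rough sketch of what the pull loop looks
like. The sample document and the element_counts helper are made up for
illustration, and I haven't run this against a multi-gigabyte file, so
treat it as a starting point rather than a tuned solution:

```ruby
require 'rexml/parsers/pullparser'

# Tiny made-up stand-in for the real library file. For a real file you
# would pass an open File object instead of a String, so the whole
# document does not have to sit in one Ruby string.
SAMPLE = <<XML
<plist><dict>
  <key>Name</key><string>First Song</string>
  <key>Artist</key><string>Some Band</string>
</dict></plist>
XML

# Hypothetical helper: count how often each element name occurs.
def element_counts(source)
  parser = REXML::Parsers::PullParser.new(source)
  counts = Hash.new(0)
  while parser.has_next?
    event = parser.pull
    # For a start_element event, event[0] is the element name.
    counts[event[0]] += 1 if event.start_element?
  end
  counts
end

counts = element_counts(SAMPLE)
counts.each { |name, n| puts "#{name}: #{n}" }
```

The point is that you ask the parser for one event at a time, so no
tree of the whole document is ever built.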

> 
> After that, Google is your friend. Look at the documentation for
> whichever parser you decide to use and go from there - personally, I do
> little to no non-tree XML parsing, so I'm mainly guessing around
> on this. The main difference is that while with REXML, you can
> arbitrarily look around the XML document, with stream and pull parsing,
> you can only process the document in order, and have to keep the state
> of that processing (e.g. which track you're currently "working on") in
> your Ruby code.
> 
> David Vallner
> 
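
To make the "keep the state in your Ruby code" part concrete, here is a
minimal sketch using the same pull parser. The sample XML is a
simplified, made-up imitation of the key/string pairs in an iTunes
library file, and track_names is a hypothetical helper. The state is
just two variables: the element we are inside and the last <key> text
we saw.

```ruby
require 'rexml/parsers/pullparser'

# Made-up, simplified sample; the real file has more nesting and keys.
SAMPLE_LIBRARY = <<XML
<plist><dict>
  <dict>
    <key>Name</key><string>First Song</string>
    <key>Artist</key><string>Some Band</string>
  </dict>
  <dict>
    <key>Name</key><string>Second Song</string>
    <key>Artist</key><string>Other Band</string>
  </dict>
</dict></plist>
XML

# Hypothetical helper: collect the <string> that follows each
# <key>Name</key>.
def track_names(source)
  parser   = REXML::Parsers::PullParser.new(source)
  names    = []
  element  = nil  # name of the element whose text we are reading
  last_key = nil  # text of the most recent <key>
  while parser.has_next?
    event = parser.pull
    case event.event_type
    when :start_element
      element = event[0]
    when :text
      # event[0] is the raw text; entities such as &amp; are not
      # expanded here.
      case element
      when 'key'    then last_key = event[0]
      when 'string' then names << event[0] if last_key == 'Name'
      end
    when :end_element
      element = nil  # whitespace between elements is now ignored
    end
  end
  names
end

names = track_names(SAMPLE_LIBRARY)
puts names
```

Since you can't look back at the <key> once the parser has moved on,
you have to remember it yourself, which is all that last_key does.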

-- 
Aaron Patterson
http://tenderlovemaking.com/