David Vallner wrote:
> pdg wrote:
>   
>> Hi All,
>>
>> As a first exercise with Ruby, I am going through the Pickaxe book and
>> creating a jukebox. I haven't even tried to create an array of songs
>> yet, because I got distracted and wanted to work this out. I am trying
>> to feed the data from my iTunes XML file into it. I can get it to work
>> if I delete most of the XML file, but when it's 5-6
>> gig, 
>>     
>
> OMFG. That's a -huge- XML file. Probably all of my MP3s together would
> fit into there with base64-encoded contents :P
>
>   
>> REXML just seems to die. I have vaguely heard that stream parsing
>> may be the answer, but I have no idea how to use it.
>>
>>     
>
> Well, time to learn. I've probably never even seen a computer that could
> handle an XML file that size using straightforward DOM parsing, which
> normally "blows up" the original XML document's size in bytes five times
> or more. And REXML definitely doesn't count performance of any kind
> amongst its qualities. (And for completeness' sake, I never 'clicked'
> with the API either, but I'm in the minority there.)
>
> You want a Ruby binding to a stream or pull parser; to the best of my
> knowledge, REXML is neither. That means libxml2, expat, or Xerces.
> Compiling is required, though I think the one-click installer comes with
> one of these; buggered if I know which.
>
> After that, Google is your friend. Look at the documentation for
> whichever parser you decide to use and go from there. Personally, I do
> little or no non-tree XML parsing, so I'm mainly guessing here. The
> main difference is that with REXML's tree (DOM) API you can
> arbitrarily look around the XML document, while with stream and pull
> parsing you can only process the document in order, and have to keep
> the state of that processing (e.g. which track you're currently
> "working on") in your Ruby code.
>
> David Vallner
>
>   
Actually, I recently had to rewrite an XML parser in stream (SAX) style,
and REXML made the task VERY easy.
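For illustration, a stream-style listener for an iTunes library might look roughly like the sketch below. It uses REXML's `REXML::StreamListener` mixin together with `REXML::Document.parse_stream`, and it assumes the usual iTunes plist layout (plist > dict > a dict following `<key>Tracks</key>` > one dict per track, each a flat run of `<key>`/value pairs). `TrackListener` and its `parse` helper are hypothetical names; the point is that the parsing state (open elements, the last `<key>`, the track being built) lives in your own Ruby object.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Sketch of a REXML stream (SAX-style) listener for an iTunes-style plist.
# Assumed layout: plist > dict > dict (after <key>Tracks</key>) > one dict
# per track, holding <key>Name</key><string>...</string> style pairs.
# TrackListener is a hypothetical name, not part of REXML.
class TrackListener
  include REXML::StreamListener

  attr_reader :tracks

  def initialize(&on_track)
    @on_track  = on_track   # optional: called with each finished track Hash
    @tracks    = []         # filled only when no block is given
    @path      = []         # names of the currently open elements
    @text      = String.new # text seen since the last tag event
    @last_key  = nil        # text of the most recent <key> element
    @in_tracks = false      # true while inside the "Tracks" dict
    @track     = nil        # Hash for the track currently being read
  end

  # Convenience wrapper: +source+ may be a String or an IO such as a File.
  def self.parse(source, &on_track)
    listener = new(&on_track)
    REXML::Document.parse_stream(source, listener)
    listener.tracks
  end

  def tag_start(name, _attrs)
    @path.push(name)
    @text = String.new
    return unless name == 'dict'

    if @path.length == 3 && @last_key == 'Tracks'
      @in_tracks = true # entering the dict that holds every track
    elsif @in_tracks && @path.length == 4
      @track = {}       # entering a single track's dict
    end
  end

  def text(data)
    @text << data
  end

  def tag_end(name)
    depth = @path.length # depth of the element that is closing
    @path.pop

    if name == 'key'
      @last_key = @text.strip
    elsif @track && depth == 5
      # A value element inside a track dict; <true/> and <false/> have no text.
      @track[@last_key] = %w[true false].include?(name) ? name : @text.strip
    elsif name == 'dict' && @track && depth == 4
      finish_track
    elsif name == 'dict' && @in_tracks && depth == 3
      @in_tracks = false
    end
    @text = String.new
  end

  private

  def finish_track
    if @on_track
      @on_track.call(@track)
    else
      @tracks << @track
    end
    @track = nil
  end
end
```

To read the real library, hand `parse` an open file, e.g. `File.open(path) { |io| TrackListener.parse(io) { |t| puts t['Name'] } }`. Passing a block handles each track as soon as its dict closes, so no document tree is built and the tracks don't pile up in an array.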

Yes, it's not the fastest thing there is, but it was "fast enough".

Definitely try writing it with REXML before taking the route of anything 
heavier.
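REXML also ships a pull parser, `REXML::Parsers::PullParser`, where your own loop asks for the next event instead of REXML calling you back. Below is a minimal sketch that collects only the track names, assuming the usual iTunes plist layout (plist > dict > a dict following `<key>Tracks</key>` > one dict per track of `<key>`/value pairs); `track_names` is a hypothetical helper, not part of REXML.

```ruby
require 'rexml/parsers/pullparser'

# Sketch: collect every track's "Name" from an iTunes-style plist using
# REXML's pull parser. Assumed layout: plist > dict > dict (after
# <key>Tracks</key>) > one dict per track of <key>/<value> pairs.
# track_names is a hypothetical helper, not part of REXML.
def track_names(source)
  parser    = REXML::Parsers::PullParser.new(source) # String or IO
  names     = []
  depth     = 0          # number of currently open elements
  in_tracks = false      # true while inside the "Tracks" dict
  last_key  = nil        # text of the most recent <key> element
  buffer    = String.new # text of the element being read

  while parser.has_next?
    event = parser.pull # our loop asks for the next event
    if event.start_element?
      depth += 1
      buffer = String.new
      in_tracks = true if event[0] == 'dict' && depth == 3 && last_key == 'Tracks'
    elsif event.text?
      buffer << event[1] # event[1] is the text with entities expanded
    elsif event.end_element?
      if event[0] == 'key'
        last_key = buffer.strip
      elsif in_tracks && depth == 5 && last_key == 'Name'
        names << buffer.strip # the value element following <key>Name</key>
      elsif event[0] == 'dict' && depth == 3
        in_tracks = false
      end
      depth -= 1
    end
  end
  names
end
```

The control flow is the main difference from a listener: the state is ordinary local variables in one method, which some people find easier to follow.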

jd