BA wrote:
> Yes, I want to extract the PDAT element, however, I want to use the B110 
> tag to find this element.  The XML *is* predictable, however, there are 
> variations in the placement of the elements (there could be several 
> different address fields and/or many paragraphs that need to be 
> parsed/searched).  The files are *extremely* large (some could be as 
> large as 1-2GB).  I would prefer to do all of the processing in Ruby if 
> this is possible (want to use the OO functionality for the text 
> processing I want to do) and would like to also incorporate regex if 
> possible (started doing this by parsing the file line by line, however, 
> ran into malformed XML where I decided that I needed to use the database 
> functionality of XML.  Not sure if DOM would work.  Could not get XPath 
> to work.  The listener was, quite frankly, a SWAG.  Thanks.

OK, I got the picture.

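I would suggest the pull parser. Unlike DOM, it never builds a tree of the
whole document; it hands you one event at a time, which is what you want
for 1-2GB files. Open a file stream and keep pulling events. When you get
a start_element event, check the element name.
If it is B110, keep pulling events until you reach a PDAT start tag
(or the closing B110 tag, if that B110 has no PDAT).
Then pull until you get a text event.
Grab the text and store it, or whatever.
Go back to the main loop and look for the next B110 element.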


Something like this:

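#!/usr/bin/env ruby
require 'rexml/parsers/pullparser'

include REXML::Parsers

# Collected PDAT text, in document order.
$text = []

# Called just after a <PDAT> start tag: gather its text and consume
# events up to and including </PDAT>. Assumes PDAT holds only text.
def pdat( parser )
  parts = []
  while parser.has_next?
    pull_event = parser.pull
    break if pull_event.end_element?
    parts.push( pull_event[1] ) if pull_event.text?  # [1] is the entity-expanded text
  end
  $text.push( parts.join )
end

# Called just after a <B110> start tag: look for PDAT elements only
# until the matching </B110>, so we never wander into other sections.
def b110( parser )
  depth = 1
  while depth > 0 and parser.has_next?
    pull_event = parser.pull
    if pull_event.start_element?
      if pull_event[0] == 'PDAT'
        pdat( parser )   # pdat consumes </PDAT>, so depth is unchanged
      else
        depth += 1
      end
    elsif pull_event.end_element?
      depth -= 1
    end
  end
end

# Main loop: scan the whole document for B110 start tags.
def get_text( parser )
  while parser.has_next?
    pull_event = parser.pull
    b110( parser ) if pull_event.start_element? and
                      pull_event[0] == 'B110'
  end
end

File.open( "pdat.xml", "r" ) { |f|
  get_text( PullParser.new( f ) )
}

puts $text.join( "\n" )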
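Before pointing anything at a 1-2GB file, it helps to check the logic on a
few lines of XML. Here is a self-contained sketch of the same idea folded
into a single loop. The method name b110_pdats and the sample markup are
made up for illustration; the only library calls are REXML's PullParser
and PullEvent (has_next?, pull, start_element?, end_element?, text?), plus
StringIO standing in for the real file.

```ruby
require 'rexml/parsers/pullparser'
require 'stringio'

# Hypothetical helper: return the text of every PDAT element nested
# (at any depth) inside a B110 element. `io` may be a File or StringIO.
def b110_pdats( io )
  parser  = REXML::Parsers::PullParser.new( io )
  results = []
  depth   = 0      # > 0 while we are inside a B110 element
  in_pdat = false  # true between <PDAT> and </PDAT> inside a B110
  parts   = []     # text chunks of the current PDAT

  while parser.has_next?
    event = parser.pull
    if event.start_element?
      if depth > 0
        depth += 1
        if event[0] == 'PDAT'
          in_pdat = true
          parts   = []
        end
      elsif event[0] == 'B110'
        depth = 1        # entering a B110; PDATs from here on count
      end
    elsif event.end_element?
      if in_pdat and event[0] == 'PDAT'
        results << parts.join
        in_pdat = false
      end
      depth -= 1 if depth > 0
    elsif event.text? and in_pdat
      parts << event[1]  # [1] is the entity-expanded text
    end
  end
  results
end

xml = <<~XML
  <PATDOC>
    <B110><DNUM><PDAT>06123456</PDAT></DNUM></B110>
    <B140><DATE><PDAT>20000926</PDAT></DATE></B140>
  </PATDOC>
XML
p b110_pdats( StringIO.new( xml ) )
```

The depth counter is what keeps a PDAT under some other tag (B140 in the
sample) from being picked up, no matter how the elements are arranged.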




James

-- 

http://www.ruby-doc.org - The Ruby Documentation Site
http://www.rubyxml.com  - News, Articles, and Listings for Ruby & XML
http://www.rubystuff.com - The Ruby Store for Ruby Stuff
http://www.jamesbritt.com  - Playing with Better Toys