On Aug 16, 2005, at 4:46 PM, Adam Sanderson wrote:

> I was wondering if anyone would be interested in, or knows of a
> generic parsing library.

I've recently been putting together my own tool for this, and I just
finished using it on a real-world (paid) project.  It's small, really
just a chainsaw tool for data mining, but it seems to be a good
start.  I haven't documented it yet, but here are a couple of
examples from my unit tests:

     def test_complex
         path  = File.join(File.dirname(__FILE__), "ross_report.txt")
         test  = self

         input(path) do
             @state = :skip
             start_skipping_at("\f")
             stop_skipping_at(/\A-[- ]+-\Z/)
             skip(/\A\s*\Z/)
             skip(/--\Z/)

             find_in_skipped(/((?:Period|Week)\s+\d.+?)\s*\Z/) do |period|
                 test.assert_equal("Period  02/2002", period)
             end

             stop_at("*** Selection Criteria ***")

             read do |line|
                 test.assert_match(/\A\s+(?:Sales|Cust|SA)|\A[-\w]+\s+/, line)
             end
         end

         path  = File.join(File.dirname(__FILE__), "car_ads.txt")

         data = input(path, "") do
             @state = :skip
             stop_skipping_at("Save Ad")
             skip(/\A\s*\Z/)

             pre { @price = @miles = nil }
             read(/\$([\d,]+\d)/) { |price| @price = price.delete(",").to_i }
             read(/([\d,]*\d)\s*m/) { |miles| @miles = miles.delete(",").to_i }

             read do |ad|
                 if @price and @price < 20_000 and @miles and @miles < 40_000
                     (@ads ||= Array.new) << ad.strip
                 end
             end
         end

         assert_equal([<<END_AD.strip], data.ads)
2003 Chrysler Town & Country LX
      $16,990, green, 21,488 mi, air, pw, power locks, ps, power mirrors,
dual air bags, keyless entry, intermittent wipers, rear defroster, alloy,
pb, abs, cruise, am/fm stereo, CD, cassette, tinted glass
VIN:2C4GP44363R153238, Stock No:C153238, CALL DAN PERKINS AT 1-800-432-6326
END_AD
     end

__END__

The first half of that is parsing the report from Ruby Quiz #17  
(http://www.rubyquiz.com/quiz17.html).  The second half is parsing a  
listing of car ads (very unstructured data) looking for cars below a  
certain price and mileage.
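
For comparison, here is a rough plain-Ruby sketch of what the car ad
filter boils down to, without the library.  It reuses the two regexes
from the test above; the method name, constants, and sample ads are
made up for illustration, and splitting records on blank lines is only
an assumption about what the "" argument to input() means.  It
approximates the idea and is not how the tool itself works.

```ruby
# Plain-Ruby sketch of the car-ad filter; no parsing library involved.
# The two regexes come from the unit test above. Everything else
# (method name, constants, sample ads) is hypothetical.
PRICE_PATTERN = /\$([\d,]+\d)/
MILES_PATTERN = /([\d,]*\d)\s*m/

def cheap_low_mileage_ads(text, max_price = 20_000, max_miles = 40_000)
  # Assumption: one ad per "paragraph", separated by blank lines.
  ads = text.split(/\n{2,}/).map { |ad| ad.strip }

  ads.select do |ad|
    price = ad[PRICE_PATTERN, 1]   # first capture group, or nil
    miles = ad[MILES_PATTERN, 1]
    next false unless price && miles

    price.delete(",").to_i < max_price && miles.delete(",").to_i < max_miles
  end
end

# Invented sample data, not taken from car_ads.txt.
SAMPLE_ADS = <<END_ADS
2003 Example Van LX
$16,990, green, 21,488 mi, air, pw

2001 Example Coupe
$24,500, red, 12,000 mi, air

1999 Example Sedan
$4,995, blue, 98,300 mi
END_ADS

puts cheap_low_mileage_ads(SAMPLE_ADS)
```

With the sample data only the first ad passes both limits.  The sketch
skips everything the test does with stop_skipping_at and skip, so it
would not cope with the page furniture around real listings.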

If people think this looks promising, I'll be happy to make it
available.

James Edward Gray II