* Jamis Buck <jgb3 / email.byu.edu> [0926 16:26]:
> Dick Davies wrote:
> >long shot but what the hell - don't suppose any of you
> >good people are sitting on a parser for Mozilla/Firefox bookmarks.html
> >files, by any chance?
> 
> Funny you should ask. :) I've had this for a while, and I can't even 
> remember why I wrote it. It's pretty hacked together, and it's not a 
> true "parser" (I just search for certain patterns in the bookmark file) 
> and it is hardcoded (currently) for my own (obsolete) Phoenix bookmarks 
> file, but it should be fairly straightforward to modify for your own 
> purposes.
> 
> Hope this is at least close to what you are looking for... :)

Thanks a lot, it was handy for getting a feel for the format. I gave up on a
real parser too (I'd prefer not to require extra libs) and wrote a cut-down
homegrown version in the end (I only need the URL, folder info and
description myself):

-----------------------------------------------------------------
rasputin@lb:lib$ cat mozbooks.rb 
#!/usr/bin/env ruby

# quick and dirty bookmarks.html parser - thanks to Jamis Buck for the 'folder state machine' idea

class MozBooks

        # pull urls, descriptions and folder hierarchy info from mozilla/firefox bookmarks.html
        def self.parse(bm)
                folders = []
                bm.each_line{ |l|
                        folders.pop if l =~ /<\/dl><p>/i        # we just left a folder
                        folders << $1 if l =~ /\s*<dt><h3[^>]+>(.*)<\/h3>/i # we just entered a folder
                        puts "url = #{$1}, desc = #{$2}, folder = #{folders.join('/')}" if l =~ /a href="([^"]*)"[^>]+>([^<]+)</i
                }
        end
end

# parse prints one line per bookmark; its return value is not useful, so don't keep it
MozBooks.parse($stdin)
-----------------------------------------------------------------

That seems to work (enough info for my purposes anyway; I can feed this lot into del.icio.us). Thanks!

rasputin@lb:booty$ cat ~/bookmarks.html | ruby lib/mozbooks.rb |grep -i ruby|head
url = http://raa.ruby-lang.org/, desc = RAA - Ruby Application Archive, folder = toolbar/search
url = http://www.rubygarden.org/ruby?UsingRubyFastCGI, desc = Ruby: UsingRubyFastCGI, folder = toolbar/proj/FastCGI
url = http://dev.faeriemud.org/changes-1.8.0.html, desc = New Features in Ruby 1.8.0, folder = toolbar/ruby/1.8
url = http://www.rubygarden.org/ruby?RIOnePointEight, desc = Ruby: RIOnePointEight, folder = toolbar/ruby/1.8
url = ftp://ftp.ruby-lang.org/pub/ruby/1.8/changes.1.8.0, desc = ftp://ftp.ruby-lang.org/pub/ruby/1.8/changes.1.8.0, folder = toolbar/ruby/1.8
url = http://www.rubyist.net/~matz/slides/rc2003/mgp00003.html, desc = MagicPoint presentation foils, folder = toolbar/ruby/1.8
url = http://whytheluckystiff.net/articles/2003/08/04/rubyOneEightOh, desc = whyTHEluckySTIFF ;,. What's Shiny and New in Ruby 1.8.0? .,;, folder = toolbar/ruby/1.8
url = http://images-jp.amazon.com/images/P/4894714531.09.LZZZZZZZ.jpg, desc = 4894714531.09.LZZZZZZZ.jpg (JPEG Image, 375x475 pixels), folder = toolbar/ruby/community
url = http://www2a.biglobe.ne.jp/~seki/ruby/, desc = I like Ruby., folder = toolbar/ruby/community
url = http://www.excite.co.jp/world/url/body/?wb_url=http%3A%2F%2Fwww.rubyist.net%2F%7Ematz&submit=%96%7C%96%F3&wb_lp=JAEN&wb_dis=2&wb_co=excitejapan, desc = Matz' Blog, folder = toolbar/ruby/community
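
For feeding the entries into something else rather than grepping the printed lines, the same folder-stack trick can collect the results instead. This is only a rough sketch, not the script above: the names `Bookmark` and `MozBooksSketch`, the slightly loosened patterns and the sample HTML (including the example.org link) are all made up for illustration, and it still assumes one tag per line, as in the files Mozilla/Firefox write.

```ruby
# Hypothetical variant of the script above: returns the entries instead of printing them.
Bookmark = Struct.new(:url, :desc, :folder)

module MozBooksSketch
  FOLDER_OPEN  = /<dt><h3[^>]*>(.*?)<\/h3>/i                # a folder heading
  FOLDER_CLOSE = /<\/dl>/i                                  # end of a folder's list
  LINK         = /<a\s[^>]*href="([^"]*)"[^>]*>([^<]*)</i   # url and link text

  # Accepts anything that responds to each_line (an IO or a String).
  def self.parse(io)
    folders = []   # stack of the folder names we are currently inside
    entries = []
    io.each_line do |line|
      folders.pop if line =~ FOLDER_CLOSE       # we just left a folder
      folders << $1 if line =~ FOLDER_OPEN      # we just entered a folder
      if line =~ LINK
        entries << Bookmark.new($1, $2, folders.join('/'))
      end
    end
    entries
  end
end

# Tiny made-up sample in the same shape as a bookmarks.html file.
sample = <<~HTML
  <DL><p>
      <DT><H3 ADD_DATE="0">ruby</H3>
      <DL><p>
          <DT><A HREF="http://raa.ruby-lang.org/" ADD_DATE="0">RAA</A>
      </DL><p>
      <DT><A HREF="http://example.org/">top level</A>
  </DL><p>
HTML

MozBooksSketch.parse(sample).each do |b|
  puts "url = #{b.url}, desc = #{b.desc}, folder = #{b.folder}"
end
```

Passing `$stdin` or an open `File` in place of `sample` should work the same way, since only `each_line` is used. Bookmarks outside any folder come back with an empty folder string.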

-- 
It's always darkest just before it gets pitch black.
Rasputin :: Jack of All Trades - Master of Nuns