On Thu, Jan 31, 2008 at 1:20 AM, Sunil Khedar <sunil / truesparrow.com> wrote:

> I am working on an RSS parser script. Here I have to parse thousands and
> thousands of RSS feeds every hour.
>
> I am looking for an optimized parser which can parse all these
> feeds. Please suggest the RSS parsers you have come across.

Sounds like a case of premature optimization to me. If you intend to do anything like stick the data parsed from the feeds into a database or search index, I think you'll quickly find that will become the bottleneck, rather than the feed processing itself.

My company went through something similar, with a performance-obsessed former C++ programmer looking for the fastest feed parsing solution available. After benchmarking several of the available solutions, he settled on building his own highly procedural feed processor around libxml-ruby. Soon after, however, he discovered that updating the database and search index was a far bigger bottleneck, one he spent the next several months addressing. Feed parsing speed went completely by the wayside.

If you intend to do any sort of indexing of the feeds at all, you should really focus on building a maintainable feed reader, as opposed to a fast one. The database and/or search index are going to be your bottleneck anyway, so don't let the desire for speed trump things like correctness and code clarity. Feed processing scales horizontally: put the feeds on a queue and run multiple feed reader processes against it. Databases and search indexes generally don't scale quite as well.

Given that, I would suggest looking at existing solutions like feedtools and feedzirra before trying to write your own. If you do write your own, go with Nokogiri. It has a nice, clear, easy-to-use API and is relatively fast.

--
Tony Arcieri
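
To illustrate the "queue plus multiple feed readers" idea from the message above, here is a minimal sketch. It is an assumption-laden simplification, not the setup described in the message: it uses threads inside one process and Ruby's built-in Queue rather than separate reader processes fed by an external queue, and the names process_feeds and PARSE_FEED are hypothetical. The parse step is a stand-in; a real reader would fetch the URL and parse it with a library such as Nokogiri.

```ruby
require "thread" # Queue; a no-op on modern Rubies where it is built in

# Hypothetical stand-in for the real fetch-and-parse step. It returns a small
# hash so the sketch runs without network access or extra gems.
PARSE_FEED = lambda do |url|
  { url: url, title: "Feed at #{url}" }
end

# Hypothetical helper: start `worker_count` threads that pull URLs off a
# shared queue until it is drained, and collect whatever the parser returns.
def process_feeds(urls, worker_count: 4, parser: PARSE_FEED)
  jobs = Queue.new
  urls.each { |url| jobs << url }
  jobs.close # once closed and empty, pop returns nil and the workers stop

  results = Queue.new

  workers = Array.new(worker_count) do
    Thread.new do
      while (url = jobs.pop)
        begin
          results << parser.call(url)
        rescue StandardError => e
          # One bad feed should not take the worker down with it.
          results << { url: url, error: e.message }
        end
      end
    end
  end

  workers.each(&:join)
  Array.new(results.size) { results.pop } # completion order, not input order
end

feeds = ["http://example.com/a.xml", "http://example.com/b.xml"]
process_feeds(feeds, worker_count: 2).each { |result| puts result.inspect }
```

Adding capacity here means raising worker_count, or, closer to what the message describes, running more reader processes against a shared external queue. The database or search index writes that follow are the part that does not scale this easily.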