On Mar 28, 6:11 pm, Adam Akhtar <adamtempor... / gmail.com> wrote:
> Hi, I'm starting to use Hpricot and I'm having problems extracting
> descriptions of various concert events from a page. Here is a sample of
> the HTML:
>
> <p>
> <a name="concerts"/>
> <span class="heading">Concerts</span>
> <br/>
> <span class="subheading">POPULAR</span>
> <br/>
> <br/>
> <span class="textbold">Middle Field! Vol.4</span >
> <br/>
> Featuring electric-pop band The Stealth, Mac and Masaru, and others. Mar
> 28, 7pm, ,500 (adv)/ ¥3,000 (door). Shibuya O-Nest. Tel: 03-3498-9999.
> <br/>
> <br/>
> <span class="textbold">Philip Woo featuring Brenda Vaughn</span>
> <br/>
> Japanese pianist and soul singer performing with Andy Wulf and Kaori
> Kobayashi. Mar 28 & 29, 7 & 9:30pm, ¥3,150. Cotton Club, Marunouchi.
> Tel: 03-3215-1555.
> <br/>
> ..
> ..
> ..
> etc
>
> I can get the artist band names fine using
> names = doc.search("//span[@class='textbold']")
>
> but I can't get the descriptions. In fact the descriptions aren't
> individually wrapped up in any tags but rather just clumped together
> under the paragraph tag with line breaks <br/>
>
> So I thought I'd just try
> descriptions =
> doc.search("/html/body/div/table/tbody/tr[4]/td/table/tbody/tr/td[2]/table/tbody/tr/td/span/p")
> but when I try to puts descriptions, nothing is printed to the screen.
>
> How would I go about getting this info? Any tips or ideas?
>
> Thanks
> --
> Posted via http://www.ruby-forum.com/.

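Once you have the 'name' node, you can call next_node on it to step
through the nodes that follow it in the document, skipping the <br/>
elements and whitespace until you reach the description text.
This method should work for your example: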

def print_names_and_descriptions(hpricot_doc)
  names = hpricot_doc.search("//span[@class='textbold']")

  names.each do |name|
    node = name.next_node
    node = node.next_node until node.text? && node.inner_text =~ /\w+/

    puts name.inner_text
    puts node.to_s.strip
    puts
  end
end
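
One caveat: if a name is ever not followed by any text, I'd expect the
loop above to run off the end of the siblings and fail on nil. Here is
a rough sketch of the same "walk forward to the next non-blank text
node" idea that returns nil instead. It deliberately does not use
Hpricot: Node, SAMPLE_SIBLINGS and names_and_descriptions are made-up
names standing in for the children of the <p> in your sample, so treat
it as an illustration of the logic rather than a drop-in replacement.

```ruby
# Hypothetical stand-in for parsed nodes. This is NOT Hpricot's API.
Node = Struct.new(:kind, :content)

# For each :span (a name), find the first later sibling that is a text
# node containing at least one word character. Yields nil if none exists.
def names_and_descriptions(siblings)
  pairs = []
  siblings.each_with_index do |node, i|
    next unless node.kind == :span
    text = siblings.drop(i + 1).find { |n| n.kind == :text && n.content =~ /\w/ }
    pairs << [node.content, text ? text.content.strip : nil]
  end
  pairs
end

# Flat sibling list modelled on the sample HTML (descriptions shortened).
SAMPLE_SIBLINGS = [
  Node.new(:span, "Middle Field! Vol.4"),
  Node.new(:text, "\n"),
  Node.new(:br, nil),
  Node.new(:text, "\nFeaturing electric-pop band The Stealth, Mac and Masaru, and others.\n"),
  Node.new(:br, nil),
  Node.new(:br, nil),
  Node.new(:span, "Philip Woo featuring Brenda Vaughn"),
  Node.new(:br, nil),
  Node.new(:text, "\nJapanese pianist and soul singer performing with Andy Wulf and Kaori Kobayashi.\n"),
  Node.new(:br, nil)
]

names_and_descriptions(SAMPLE_SIBLINGS).each do |name, description|
  puts name
  puts description
  puts
end
```

Like the original method, this does not stop at the next name, so a
name with no description of its own would pick up the following
event's text. Whether that matters depends on how regular the page is.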