On Mar 28, 6:11 pm, Adam Akhtar <adamtempor... / gmail.com> wrote:
> Hi, I'm starting to use Hpricot and I'm having problems extracting
> descriptions of various concert events from a page. Here is a sample of
> the HTML:
>
> <p>
> <a name="concerts"/>
> <span class="heading">Concerts</span>
> <br/>
> <span class="subheading">POPULAR</span>
> <br/>
> <br/>
> <span class="textbold">Middle Field! Vol.4</span >
> <br/>
> Featuring electric-pop band The Stealth, Mac and Masaru, and others. Mar
> 28, 7pm, ¥2,500 (adv)/ ¥3,000 (door). Shibuya O-Nest. Tel: 03-3498-9999.
> <br/>
> <br/>
> <span class="textbold">Philip Woo featuring Brenda Vaughn</span>
> <br/>
> Japanese pianist and soul singer performing with Andy Wulf and Kaori
> Kobayashi. Mar 28 & 29, 7 & 9:30pm, ¥3,150. Cotton Club, Marunouchi.
> Tel: 03-3215-1555.
> <br/>
> ..
> ..
> ..
> etc
>
> I can get the artist band names fine using
> names = doc.search("//span[@class='textbold']")
>
> but I can't get the descriptions. In fact the descriptions aren't
> individually wrapped in any tags but rather just clumped together
> under the paragraph tag, separated by line breaks (<br/>).
>
> So I thought I'd just try
> descriptions =
> doc.search("/html/body/div/table/tbody/tr[4]/td/table/tbody/tr/td[2]/table/tbody/tr/td/span/p")
> but when I try to puts descriptions, nothing is printed to the screen.
>
> How would I go about getting this info? Any tips or ideas?
>
> Thanks
> --
> Posted via http://www.ruby-forum.com/.

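Once you have the 'name' node, you can call next_node on it repeatedly to
walk through the nodes that follow it in the document. The description is
the first text node after the name that contains actual word characters
(the nodes in between are <br/> elements and whitespace).
This method should work for your example: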

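# Prints each name followed by the first non-blank text node after it.
def print_names_and_descriptions(hpricot_doc)
  names = hpricot_doc.search("//span[@class='textbold']")

  names.each do |name|
    # Skip <br/> elements and whitespace-only text; stop if we run out of nodes.
    node = name.next_node
    node = node.next_node until node.nil? || (node.text? && node.inner_text =~ /\w+/)

    puts name.inner_text
    puts node.to_s.strip
    puts
  end
end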
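
If you want to try the same sibling-walking idea without Hpricot installed,
here is a rough sketch using REXML, which ships with Ruby. This is a stand-in,
not the Hpricot API: it only works here because the sample fragment happens to
be well-formed XML once the bare & characters are escaped as &amp;, which I
have done by hand in SAMPLE. The names SAMPLE and names_and_descriptions are
made up for this illustration, and next_sibling_node plays the role that
next_node plays in Hpricot.

```ruby
require "rexml/document"

# The fragment from the original post, with "&" escaped so REXML accepts it.
SAMPLE = <<~XML
  <p>
  <a name="concerts"/>
  <span class="heading">Concerts</span>
  <br/>
  <span class="subheading">POPULAR</span>
  <br/>
  <br/>
  <span class="textbold">Middle Field! Vol.4</span>
  <br/>
  Featuring electric-pop band The Stealth, Mac and Masaru, and others. Mar
  28, 7pm, ¥2,500 (adv)/ ¥3,000 (door). Shibuya O-Nest. Tel: 03-3498-9999.
  <br/>
  <br/>
  <span class="textbold">Philip Woo featuring Brenda Vaughn</span>
  <br/>
  Japanese pianist and soul singer performing with Andy Wulf and Kaori
  Kobayashi. Mar 28 &amp; 29, 7 &amp; 9:30pm, ¥3,150. Cotton Club, Marunouchi.
  Tel: 03-3215-1555.
  <br/>
  </p>
XML

# Returns an array of [name, description] pairs (hypothetical helper).
def names_and_descriptions(markup)
  doc = REXML::Document.new(markup)

  REXML::XPath.match(doc, "//span[@class='textbold']").map do |span|
    # Walk forward through the siblings, skipping <br/> elements and
    # whitespace-only text, until a text node with word characters appears.
    node = span.next_sibling_node
    until node.nil? || (node.is_a?(REXML::Text) && node.value =~ /\w/)
      node = node.next_sibling_node
    end

    # Collapse the line breaks inside the description into single spaces.
    description = node ? node.value.gsub(/\s+/, " ").strip : ""
    [span.text, description]
  end
end

names_and_descriptions(SAMPLE).each do |name, description|
  puts name
  puts description
  puts
end
```

Returning pairs instead of printing inside the loop makes it easier to reuse
the result, for example to build a hash keyed by name.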