Hello everyone. It's me: Derek, again! Sorry for writing a novel here,
but I'd really appreciate some help.

I'm still working on the same program -- a way to show valid course
combinations for my school schedule, using an HTML file that contains
all the courses for the semester.

I have a rough draft copy of it working, but I'd like to see an example
of a more elegant coding style than my own.

Here's a (simplified) example of the data I'm working with:

<tr>
  <td>Intro to Programming</td> #title
  <td>MW</td> #days
  <td>9:00am-10:30am</td> #time
  <td>Dr. Smith</td> # professor
</tr>
<tr>
  <td>Intro to Knitting</td>
  <td>TR</td>
  <td>9:00am-10:30am</td>
  <td>Dr. Mittens</td>
</tr>

Earlier, someone on the forum showed me a very elegant way to collect
this information (I use Nokogiri). It was:

doc = Nokogiri::HTML(open(url))

raw_course_list = doc.css("tr").collect { |row|
  row.css("td").collect { |column|
    column.text.strip
  }
}

This would give me an array of arrays in the format
[[courseA,data,data], [courseB, data, data]].

E.g., in this case it would yield:
[["Intro to Programming", "MW", "9:00am-10:30am", "Dr. Smith"], ["Intro
to Knitting", "TR", "9:00am-10:30am", "Dr. Mittens"]]

This works perfectly, except in 3 main cases.

*** Problem 1: The <tr> does not contain course information. (It's some
irrelevant part of the HTML.) In this case, I did the following:
raw_course_list.reject! { |i| i.size != 4 }
which filters out non-courses. Note: no row without course data has the
same size as one with course data (in the non-simplified version, the
size is actually much larger).

So, already I think it's ugly coding! It first loads ALL <tr> contents
into arrays, then rejects them after creation.
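One possible direction, offered only as a rough sketch: filter on the
cell count first and strip the text afterwards, so that only course rows
get cleaned up and nothing is mutated in place. The nested arrays below
are a hypothetical stand-in for the cell text of each <tr>, and
course_rows is a made-up helper name, not anything from Nokogiri.

```ruby
# Keeps only rows with the expected number of cells, then strips each cell.
# expected_columns is 4 for the simplified data shown above.
def course_rows(rows, expected_columns = 4)
  rows
    .select { |cells| cells.size == expected_columns }
    .map { |cells| cells.map(&:strip) }
end

# Hypothetical stand-in for the raw cell text of each <tr>; the one-cell
# and empty rows play the part of the irrelevant HTML.
rows = [
  ["  Fall Schedule  "],
  [" Intro to Programming ", "MW", "9:00am-10:30am", "Dr. Smith"],
  [],
  ["Intro to Knitting", "TR", "9:00am-10:30am", "Dr. Mittens "]
]

raw_course_list = course_rows(rows)
```

This still walks every row once, but select returns a new array instead
of editing raw_course_list after the fact with reject!.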

*** Problem 2: In a few cases, some courses do not have specified days
and times yet. In those cases, the course days column reads "TBA" (to
be announced), and there is no column for time. Thus, the array for such
a course is one element shorter than in the normal case.

E.g.:

<tr>
  <td>Algebra</td>
  <td>TBA</td> # notice there is now only 1 <td> for day/time
  <td>Dr. Calculator</td>
</tr>

Thus, I make Ruby go over the elements of raw_course_list ANOTHER time.
This code is put right before problem 1's fix:

raw_course_list.each { |i|
  if i.size == 3
    i.insert(2, "")
  end
}

So again, if an array has a size of 3, I figure it's a valid course,
just with no time assigned, so I create a blank element between the day
and professor, just to satisfy the Course class, which each inner array
will ultimately become. Example call:
Course.new(title, day, time, professor)
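A minimal sketch of the same padding done without mutating the rows,
assuming Course takes exactly the four arguments shown in the call above.
The Struct here is only a hypothetical stand-in for the real Course
class, and normalize is a made-up helper name.

```ruby
# Hypothetical stand-in for the real Course class.
Course = Struct.new(:title, :day, :time, :professor)

# Returns a 4-element row; a 3-element (TBA) row gets an empty time slot
# between the day and the professor. Other rows are returned unchanged.
def normalize(row)
  return row unless row.size == 3
  title, day, professor = row
  [title, day, "", professor]
end

rows = [
  ["Intro to Programming", "MW", "9:00am-10:30am", "Dr. Smith"],
  ["Algebra", "TBA", "Dr. Calculator"]
]

courses = rows.map { |row| Course.new(*normalize(row)) }
```

Destructuring the 3-element row by name makes it explicit which slot is
being filled in, instead of relying on insert(2, "").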

*** Problem 3: Some rows of the HTML are actually a continued
description of the course in the row above. For example, a course that
has a lab might look like this:

<tr>
  <td>Chemistry /w Lab</td>
  <td>TR</td>
  <td>9:00am-10:30am</td>
  <td>Dr. Chemicals</td>
</tr>
<tr>
  <td></td> # Empty, since the above row provides the course name
  <td>R</td> # Day of the lab
  <td>11:00am-12:30pm</td> # Time of the lab
  <td>Dr. Chemicals</td>  # Lab professor
</tr>

The good news is it's the same length as a normal class. So for this, I
add a bit more code to problem 2's code (the each block), changing the
each method to .each_with_index:

raw_course_list.each_with_index { |i, index|
  if i.size == 3
    i.insert(2, "")
  end

  # NEW CODE FOR LABS (still working out the kinks, but hopefully I
  # won't need this)
  # lab will always have a size of 4 and an empty first element so:
  if i[0].empty?
    # add all the data from the lab to the previous course:
    raw_course_list[index-1].push(i.each { |element| element })

    # then remove lab from raw_course_list
    raw_course_list.pop(index)

    # index has to go back one to avoid skipping an element (since we
    # popped one)
    index -= 1
  end
}
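Some of the kinks in the block above come from Ruby itself: pop(n)
removes the last n elements rather than the element at position n,
each with a block returns the array it was called on (so push adds the
whole lab row as one nested element), and reassigning index inside the
block has no effect on the iteration. A rough sketch of one way around
all three, building a new list in a single pass instead of deleting
while iterating; merge_labs is a hypothetical helper name, and dropping
the lab's blank title cell is an assumption about the desired result:

```ruby
# Folds each lab row (blank first cell) into the course row before it.
# Returns new arrays; the input rows are not modified. A lab row with no
# preceding course is kept as-is.
def merge_labs(rows)
  rows.each_with_object([]) do |row, merged|
    if row.first.to_s.empty? && !merged.empty?
      # append the lab's day, time and professor to the previous course
      merged.last.concat(row.drop(1))
    else
      merged << row.dup
    end
  end
end

rows = [
  ["Chemistry /w Lab", "TR", "9:00am-10:30am", "Dr. Chemicals"],
  ["", "R", "11:00am-12:30pm", "Dr. Chemicals"],
  ["Intro to Knitting", "TR", "9:00am-10:30am", "Dr. Mittens"]
]

merged = merge_labs(rows)
```

Because nothing is removed from the array being walked, no index
bookkeeping is needed at all.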

==================================

So there you have it. Can anyone think of a way where I can improve the
quality and elegance of this code?
-- 
Posted via http://www.ruby-forum.com/.