Howdy, Gavin. Thanks for taking the time to answer!

Tobias responded to a lot of your comments already; I just wanted to add a 
little more:

Gavin Sinclair wrote:
>> So one might start out with
>>
>>     class LogLine
>>         attr_reader :month, :day, :hour, :minute, :second,
>>                 :host, :program, :pid, :message
>>         [...]
>>     end
> 
> I'd use a Time object instead of :month, :day, etc.

We're thinking along the same lines; part of that [...] is a method that 
returns a Time object. But I did it this way for a couple of reasons. One 
is that because time in the standard syslog file is ambiguous (you'll note 
there's no year or tz field), it's important to offer the uninterpreted 
data. Anotherissue is one of speed: I'd rather defer creation of the Time 
object until somebody actually needs it.

> I wouldn't try to make subclasses of LogLine straight away, unless you
> already know what you want to do with them.

Happily, I do. I need to process my mail logs to see which DNS-based 
blacklists would cut spam without yielding false positives. This struck me 
as a fine project to learn Ruby with. But once I have a some decent 
classes, they will undoubtedly be reused for other log analysis.

> Passing the string (minus the date) is all I can think you can do.  You
> can't slice the string in a way that's meaningful to all subclasses.

Well, that's not entirely true. All LogLine subclasses agree on the basic 
fields in the LogLine class, so I want to slice those up first. Other 
subclasses may be able to ferret out additional data from the message, so 
each subclass may do further slicing.

>> Done naively, this would mean that each subclass would parse the line,
>> but  that would be wasteful. So obviously, the superclass should do the
>> basic  parsing once and pass the details down. Passing as an Array is
>> brittle and  not very OO. Passing as a Hash is better, but still not
>> right.
> 
> BTW There's nothing non-OO about arrays in Ruby.  They're not like Java!

Ah, I should have explained more what I meant.

In OO analysis and design, you find groups of data that travel together and 
call them objects. If I break a log line up into an array of fields

    ["Aug", "24",  "12", "31", "01", "myhost", "sendmail", "12345", "foo"]

and pass that around, that's not very OO because there is an implicit 
structure that isn't expressed in my representation. Arrays are just a row 
of things, and I know more about the log line than that. So I could use a 
hash to represent the structure and the data together:

    { "month"=>"Aug", "day"=>"24", "hour"=>"12", [...] }

Which is better, right? With the array if I added a field in the middle (say 
syslog starts logging microseconds) then everything breaks. But with the 
hash, it's more flexible.

But then I've got to say to myself, "Gosh, that hash looks a lot like an 
object." And if I turn it into a full-fledged object, I can add behavior, 
like a call to get a Time object.

So that's what I meant when I said that passing arrays like this around 
isn't such good OO design.

> Basically, I think you should put all the logic of what subclass best
> represents the line into LogLine, like: [...]
> 
>     case line
>       when SENDMAIL_REGEX
>         return SendMailLogLine.new(...)
>       when NAMED_REGEX
>         return NamedLogLine.new(...)
>       else
>         return BasicLogLine.new(line)
>     end
>   end
> end
 
That's a good start, but you will run into trouble with that over time. 

The case statement approach works for a few LogLine types, but is brittle. 
If you only have a couple of subclasses, then keeping the superclass in 
sync with the subclasses is easy. But with 20, it's a big pain, exactly the 
kind of pain that OO is supposed to save us from.

And if you want other people to be able to write dynamically loadable log 
parsing modules that extend your current functionality, the case statement 
approach breaks down entirely.

It's been my experience that often when I write a complicated if or case 
statement, there's a set of subclasses waiting to be born. Or if they 
already exist, then there's an opportunity to move the code into the 
subclasses.

> It may be a bit awkward for the client of LogLine to know what to do with
> the return value - it'll probably have to type-check it - but that's
> because of the subclassing, not because of my approach.

I don't think awkward is quite the right word; one of the points of object 
orientation is that you have typing built in to the language.

In my case, some parts of the code are interested only in certain types of 
LogLines; they will indeed check for type. But many parts of the code will 
work with any LogLine at all. So a ReportGenerator object would accept a 
collection of LogLine objects, but the SendmailReportGenerator would only 
choose to process things of the SendmailLogLine type. 


Thanks again for taking the time to lend me a hand!

William