Howdy, Gavin. Thanks for taking the time to answer! Tobias responded to a lot of your comments already; I just wanted to add a little more: Gavin Sinclair wrote: >> So one might start out with >> >> class LogLine >> attr_reader :month, :day, :hour, :minute, :second, >> :host, :program, :pid, :message >> [...] >> end > > I'd use a Time object instead of :month, :day, etc. We're thinking along the same lines; part of that [...] is a method that returns a Time object. But I did it this way for a couple of reasons. One is that because time in the standard syslog file is ambiguous (you'll note there's no year or tz field), it's important to offer the uninterpreted data. Anotherissue is one of speed: I'd rather defer creation of the Time object until somebody actually needs it. > I wouldn't try to make subclasses of LogLine straight away, unless you > already know what you want to do with them. Happily, I do. I need to process my mail logs to see which DNS-based blacklists would cut spam without yielding false positives. This struck me as a fine project to learn Ruby with. But once I have a some decent classes, they will undoubtedly be reused for other log analysis. > Passing the string (minus the date) is all I can think you can do. You > can't slice the string in a way that's meaningful to all subclasses. Well, that's not entirely true. All LogLine subclasses agree on the basic fields in the LogLine class, so I want to slice those up first. Other subclasses may be able to ferret out additional data from the message, so each subclass may do further slicing. >> Done naively, this would mean that each subclass would parse the line, >> but that would be wasteful. So obviously, the superclass should do the >> basic parsing once and pass the details down. Passing as an Array is >> brittle and not very OO. Passing as a Hash is better, but still not >> right. > > BTW There's nothing non-OO about arrays in Ruby. They're not like Java! Ah, I should have explained more what I meant. In OO analysis and design, you find groups of data that travel together and call them objects. If I break a log line up into an array of fields ["Aug", "24", "12", "31", "01", "myhost", "sendmail", "12345", "foo"] and pass that around, that's not very OO because there is an implicit structure that isn't expressed in my representation. Arrays are just a row of things, and I know more about the log line than that. So I could use a hash to represent the structure and the data together: { "month"=>"Aug", "day"=>"24", "hour"=>"12", [...] } Which is better, right? With the array if I added a field in the middle (say syslog starts logging microseconds) then everything breaks. But with the hash, it's more flexible. But then I've got to say to myself, "Gosh, that hash looks a lot like an object." And if I turn it into a full-fledged object, I can add behavior, like a call to get a Time object. So that's what I meant when I said that passing arrays like this around isn't such good OO design. > Basically, I think you should put all the logic of what subclass best > represents the line into LogLine, like: [...] > > case line > when SENDMAIL_REGEX > return SendMailLogLine.new(...) > when NAMED_REGEX > return NamedLogLine.new(...) > else > return BasicLogLine.new(line) > end > end > end That's a good start, but you will run into trouble with that over time. The case statement approach works for a few LogLine types, but is brittle. If you only have a couple of subclasses, then keeping the superclass in sync with the subclasses is easy. But with 20, it's a big pain, exactly the kind of pain that OO is supposed to save us from. And if you want other people to be able to write dynamically loadable log parsing modules that extend your current functionality, the case statement approach breaks down entirely. It's been my experience that often when I write a complicated if or case statement, there's a set of subclasses waiting to be born. Or if they already exist, then there's an opportunity to move the code into the subclasses. > It may be a bit awkward for the client of LogLine to know what to do with > the return value - it'll probably have to type-check it - but that's > because of the subclassing, not because of my approach. I don't think awkward is quite the right word; one of the points of object orientation is that you have typing built in to the language. In my case, some parts of the code are interested only in certain types of LogLines; they will indeed check for type. But many parts of the code will work with any LogLine at all. So a ReportGenerator object would accept a collection of LogLine objects, but the SendmailReportGenerator would only choose to process things of the SendmailLogLine type. Thanks again for taking the time to lend me a hand! William