On 12 Oct 2006, at 11:44, Hugh Sasse wrote:

> So, my question is this: Given that since I started working in
> computing there have been major strides in software development,
> such as Object Oriented programming becoming mainstream, development
> of concepts like refactoring, development of practices such as the
> Agile methodologies, not to mention developments in networking and
> databases, what are the parallel developments in debugging large
> systems?  By large, I mean sufficiently large to cause problems in
> the mental modelling of the dynamic nature of the process, and
> involving considerable quantities of other people's code.

Debugging large systems is indeed hard -- there is a lot to hold in
your head at once, and too many moving parts.  It's much easier to
debug small things.

Alongside all the other strides you mention, testing has improved no
end, as I am sure you are aware.  Tools like autotest [1] rerun your
tests automatically whenever you change a file, so you notice sooner
rather than later when the system doesn't behave the way you (via
your tests) expect it to.

It's daunting when confronted with a large pile of someone else's  
code, especially if that code doesn't have tests, but you have to  
start somewhere.  You can write tests against the third-party code  
which over time become your own personal (executable) documentation  
of its API.
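
A minimal sketch of such tests, assuming Minitest as the framework
(Test::Unit would do just as well).  Ruby's standard URI library stands
in for the third-party code purely for illustration, and the class and
test names are made up:

```ruby
require "minitest/autorun"
require "uri"

# Learning tests: each one records a fact about the library's behaviour
# that the calling code relies on. URI is only a stand-in for the
# third-party code; the class and test names are hypothetical.
class UriLearningTest < Minitest::Test
  def test_parse_splits_a_url_into_components
    uri = URI.parse("http://example.com:8080/a/b?x=1")
    assert_equal "example.com", uri.host
    assert_equal 8080, uri.port
    assert_equal "/a/b", uri.path
    assert_equal "x=1", uri.query
  end

  def test_port_defaults_to_80_for_http
    assert_equal 80, URI.parse("http://example.com/").port
  end

  def test_parse_rejects_a_url_containing_a_space
    assert_raises(URI::InvalidURIError) { URI.parse("http://example.com/a b") }
  end
end
```

If an upgrade of the library changes any of these behaviours, the
corresponding test fails and tells you which assumption no longer holds.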

For your specific problem, perhaps you could write a test for the
operation you are trying to do.  Start with one that passes.  Now add
more tests until one fails -- which should be easy, as it sounds like
you can reliably make the system fail.  Now you can iteratively write
intermediate tests between the one that passes and the one that
fails, until you isolate the problem to a very small change
somewhere.  Think of it as a binary search of the problem space,
driven by tests.  With luck you will converge on the problematic
needle in the haystack of code.
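
The search can be sketched in a few lines of Ruby.  Everything here is
an assumption for illustration: fragile_operation is a made-up stand-in
for the failing operation, and the "problem space" is reduced to a
single integer (say, an input size).  It also assumes that every input
below the first failure passes and every input above it fails; if
failures are intermittent, a binary search will mislead you.

```ruby
# Hypothetical stand-in for the operation under investigation: it works
# for small inputs and breaks once the input reaches some unknown size.
def fragile_operation(size)
  raise ArgumentError, "input too large" if size >= 37
  size * 2
end

# A "test" here is simply: does the operation complete without raising?
def passes?(size)
  fragile_operation(size)
  true
rescue StandardError
  false
end

# Given a known-good and a known-bad input, return the smallest failing
# input. Range#bsearch in find-minimum mode returns the smallest value
# for which the block is true.
def first_failing_size(good, bad)
  raise ArgumentError, "#{good} was expected to pass" unless passes?(good)
  raise ArgumentError, "#{bad} was expected to fail" if passes?(bad)
  (good..bad).bsearch { |size| !passes?(size) }
end

puts "first failing size: #{first_failing_size(1, 1000)}"
```

With the stand-in above this prints "first failing size: 37", after
running the operation roughly ten times rather than a thousand.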

And then when you have isolated and fixed the problem, you have a  
nice set of tests which ensure it won't reappear later on -- a  
benefit which manually inspecting log files doesn't confer.

You probably know all this already, so apologies ;-)

Good luck,
Andy Stewart


[1] http://www.zenspider.com/ZSS/Products/ZenTest/