On Mon, 24 Apr 2006, Benedikt Heinen wrote:

> I'm currently looking for a data storage backend for a little pet project of 
> mine. The catch is - two systems are running the software and need to be able 
> to access data (r/w), and it needs to continue to function if either of the two 
> goes down (i.e. just putting a DB on one of the two isn't going to help).

The most common error in designing fault-tolerant systems is not the
technology choice, but failing to carefully identify the threats, risks
and costs.

That is, which threats do you want to tolerate?

Madman with a chainsaw? Power failure? Hardware failure? Human error?
That last one is the hardest: how do you protect against an idiot with
root? Or worse, a smart, malicious one with root?

What are the probabilities, i.e. how likely is each threat to occur?
What is the likelihood of two threats striking at once? For accidental
failures it is low; for a malicious attack it is high.
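To put rough numbers on the double-threat question, here is a minimal
Python sketch. The probabilities are invented purely for illustration,
and the independence assumption only fits accidental failures; an
attacker, a shared power feed or a single careless admin takes out both
machines together.

```python
# Illustrative numbers only; none of these come from a real system.
p_a = 0.01  # assumed chance that machine A is down at a given moment
p_b = 0.01  # assumed chance that machine B is down at a given moment

# Independent, accidental failures: both down at once is the product.
p_both_independent = p_a * p_b

# Fully correlated threat (same attacker, same power feed, same admin):
# when one machine goes down the other goes with it, so the second
# machine buys no extra protection.
p_both_correlated = min(p_a, p_b)

print(f"independent: {p_both_independent:.6f}")
print(f"correlated:  {p_both_correlated:.6f}")
```

With these made-up inputs the second machine helps a hundredfold
against accidents and not at all against the correlated threat, which
is why the threat list has to come before the technology choice.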

What are the costs of downtime? How hot a standby do you need? If you
are controlling something that will go disastrously wrong within
milliseconds, you need a very hot standby. If it is just a web site, an
outage may cost you 2 browsing customers out of ten thousand, of whom
only 100 would actually buy something. In that case a hot standby just
adds cost, complexity and extra risk of failure, for far more money
than it would ever save.
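The web site arithmetic can be written down as a back-of-the-envelope
calculation. The customer counts are the ones from the paragraph above;
the order value, the number of outages per year and the standby cost
are hypothetical placeholders that you would replace with your own
figures.

```python
def expected_outage_loss(customers_lost, browsers, buyers, order_value):
    """Expected revenue lost in one outage."""
    conversion_rate = buyers / browsers  # fraction of browsers who buy
    return customers_lost * conversion_rate * order_value


# 2 lost out of 10,000 browsers, of whom 100 buy (figures from the text).
# order_value is an assumed average sale, in whatever currency you use.
loss_per_outage = expected_outage_loss(2, 10_000, 100, order_value=50.0)

outages_per_year = 4             # assumption
standby_cost_per_year = 2_000.0  # assumption: hardware, licences, admin time

annual_loss = loss_per_outage * outages_per_year
standby_pays_off = annual_loss > standby_cost_per_year

print(f"expected loss per year: {annual_loss:.2f}")
print(f"hot standby pays off:   {standby_pays_off}")
```

If the expected annual loss comes out far below the standby cost, as it
does with these placeholder values, the money is better spent making
restarts fast and reliable.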

For example, a journalling file system comes back up quickly after a
crash, so a plain reboot may be all the recovery you need. Ask Google
about "crash-only software".
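As a rough illustration of the crash-only idea, here is a small Python
sketch of a key-value store backed by an append-only journal. The class
and method names are made up for this example and it is not taken from
any particular product. The point is that there is no clean-shutdown
path at all: every start replays the journal, so recovering from a
crash and starting normally are the same code.

```python
import json
import os


class CrashOnlyStore:
    """Tiny key-value store whose only startup path is crash recovery."""

    def __init__(self, path):
        self.path = path
        self.data = {}
        self._replay()
        self._log = open(self.path, "ab")

    def _replay(self):
        """Rebuild state from the journal and drop a torn final record."""
        good_bytes = 0
        if os.path.exists(self.path):
            with open(self.path, "rb") as journal:
                for raw in journal:
                    if not raw.endswith(b"\n"):
                        break  # record was cut off mid-write by a crash
                    try:
                        key, value = json.loads(raw)
                    except (ValueError, TypeError):
                        break  # corrupt record: stop at the last good one
                    self.data[key] = value
                    good_bytes += len(raw)
            # Truncate so that new records never follow a partial one.
            with open(self.path, "r+b") as journal:
                journal.truncate(good_bytes)

    def put(self, key, value):
        """Journal the change durably, then apply it in memory."""
        record = json.dumps([key, value]).encode("utf-8") + b"\n"
        self._log.write(record)
        self._log.flush()
        os.fsync(self._log.fileno())  # ask the OS to push it to disk
        self.data[key] = value
```

Stopping this store is simply killing the process; the next
`CrashOnlyStore(path)` picks up from the last record that was written
completely. A real system would also need journal compaction and some
thought about what the underlying disk does on power loss.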




John Carter                             Phone : (64)(3) 358 6639
Tait Electronics                        Fax   : (64)(3) 359 4632
PO Box 1645 Christchurch                Email : john.carter / tait.co.nz
New Zealand

Carter's Clarification of Murphy's Law.

"Things only ever go right so that they may go more spectacularly wrong later."

From this principle, all of life and physics may be deduced.