A design method that ensures continued system operation in the event of individual failures by providing redundant elements.
At the component level, the design includes redundant chips and circuits and the capability to bypass faults automatically. At the computer-system level, any elements that are likely to fail, such as processors and large disk drives, are replicated.
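The automatic bypass described above can be illustrated with a minimal sketch. It assumes each redundant element is modeled as a callable that either returns a result or raises an exception. The names `call_with_failover`, `AllReplicasFailed`, `failed_disk`, and `healthy_disk` are hypothetical and chosen only for illustration; real fault-tolerant systems perform this switching in hardware or in the operating system.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class AllReplicasFailed(Exception):
    """Raised when every redundant element has failed (hypothetical name)."""


def call_with_failover(replicas: Sequence[Callable[[], T]]) -> T:
    """Try each redundant element in order; return the first successful result."""
    errors = []
    for replica in replicas:
        try:
            return replica()
        except Exception as exc:
            # A failed element is bypassed rather than stopping the system.
            errors.append(exc)
    raise AllReplicasFailed(f"{len(errors)} of {len(replicas)} replicas failed")


def failed_disk() -> str:
    # Stand-in for a faulty primary element (assumption for this sketch).
    raise IOError("primary disk unavailable")


def healthy_disk() -> str:
    # Stand-in for the replicated element that takes over.
    return "data from mirror"


result = call_with_failover([failed_disk, healthy_disk])
print(result)  # prints "data from mirror"
```

The operation fails only when every replica fails, which is the point of replicating the elements most likely to fail.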
Fault-tolerant operation often also requires a backup power source, such as a UPS (uninterruptible power supply), to keep the system running if the main power fails. In some cases, the entire computer system is duplicated at a remote location to protect against vandalism, acts of war, or natural disaster.
See also clustering; data protection; disk duplexing; disk mirroring; redundant array of inexpensive disks; System Fault Tolerance.