Code360 powered by Coding Ninjas X Naukri.com
Table of contents
1. Introduction
2. Software Fault Tolerance
3. Software Fault Tolerance Techniques
3.1. Recovery Block Technique
3.2. N-Version Software Technique
4. FAQs
5. Key Takeaways
Last Updated: Mar 27, 2024

Software Fault Tolerance

Author Rajat Agrawal

Introduction

In today's computation-based world, services must be highly dependable. With the nearest competitor only a mouse click away, an unplanned service interruption results in lost revenue and, in some situations, contractual penalties.

Fault Tolerance has evolved to strengthen the dependability of computing systems, as many vital applications in modern society depend on computers.

Software Fault Tolerance (SFT) has become an important consideration because producing error-free software is so difficult. Design faults account for the vast majority of software faults, and the complexity of the problem domain is the primary cause of these design faults.

Several Fault Tolerance approaches are being investigated to minimize the impact of software faults.

Let’s learn about Software Fault Tolerance in depth.

Software Fault Tolerance

Software fault tolerance refers to the ability of software to detect and recover from a fault that is occurring, or has already occurred, in either the software or the hardware of the system it runs on, so that it continues to provide service according to its specification.

Software fault tolerance is required for the next generation of highly available and reliable computing systems, from embedded to data warehouse systems.

To understand software fault tolerance completely, it is necessary to understand the nature of the problem it is designed to solve.

Most software faults are design faults. Software manufacturing, that is, the replication of software, is considered flawless. Because its faults stem almost entirely from design, software differs from practically every other kind of system in which fault tolerance is desirable.

Fault-tolerant software ensures system reliability by using protective redundancy at the software level.

Let’s discuss the Techniques for obtaining Fault-tolerant software.
 


Software Fault Tolerance Techniques

There are two main techniques for obtaining fault-tolerant software:

  1. Recovery Block

  2. N-Version Software

Both the above techniques are based on design diversity.

Design diversity refers to a system's components being developed using different designs but providing the same service. The basic premise of design diversity is that components built in different ways will fail in different ways.

As a result, if one of the redundant versions fails, at least one of the others is likely to provide an acceptable output.

Let’s understand these techniques one by one.

Recovery Block Technique

The Recovery Block technique is a simple technique developed by Randell in the early 1970s.

In this technique, alternate software versions are organized like the dynamic redundancy (standby) approach in hardware. 

The recovery block relies on an adjudicator (often called an acceptance test), which checks the results produced by several implementations of the same algorithm. A system built with recovery blocks is divided into fault-recoverable blocks.

These fault-tolerant blocks are used to build the overall system. Each block contains a primary alternate, at least one secondary alternate, exception-handling code, and an adjudicator. The adjudicator is the component that decides whether the result of the alternate just tried is acceptable.

The adjudicator should be kept simple to ensure execution speed and correctness. On entry to a block, the primary alternate is executed first. (A block may have N alternates to try.) If the adjudicator judges that the primary alternate failed, the system state is rolled back to what it was on entry to the block, and the second alternate is tried.

If the adjudicator rejects the results of all the remaining alternates as well, the exception handler is called, indicating that the software could not accomplish the required operation.
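The control flow described above can be sketched in a few lines of Python. This is only an illustrative sketch, not a reference implementation: the names recovery_block, RecoveryBlockError, primary_sort, secondary_sort, and acceptable_sort are hypothetical, the primary alternate is deliberately faulty, and the "state" is simply a list that is deep-copied to emulate a checkpoint.

```python
import copy


class RecoveryBlockError(Exception):
    """Raised when every alternate has been rejected (the exception-handler case)."""


def recovery_block(state, alternates, acceptance_test):
    """Try each alternate in order until one passes the acceptance test."""
    checkpoint = copy.deepcopy(state)  # recovery point taken on entry to the block
    for alternate in alternates:
        # Rollback: every alternate starts from a fresh copy of the checkpoint.
        working_state = copy.deepcopy(checkpoint)
        try:
            result = alternate(working_state)
        except Exception:
            continue  # a crash is treated as a failed alternate
        if acceptance_test(checkpoint, result):
            return result
    raise RecoveryBlockError("all alternates were rejected")


def primary_sort(items):
    # Hypothetical primary alternate with a design fault: set() drops duplicates.
    return sorted(set(items))


def secondary_sort(items):
    # Hypothetical secondary alternate: a plain insertion sort, designed differently.
    result = []
    for x in items:
        i = len(result)
        while i > 0 and result[i - 1] > x:
            i -= 1
        result.insert(i, x)
    return result


def acceptable_sort(original, result):
    # Deliberately simple adjudicator: output is ordered and no element was lost.
    ordered = all(a <= b for a, b in zip(result, result[1:]))
    return ordered and len(result) == len(original)


if __name__ == "__main__":
    print(recovery_block([3, 1, 3, 2], [primary_sort, secondary_sort], acceptable_sort))
```

For the input [3, 1, 3, 2] the primary alternate loses a duplicate, the acceptance test rejects its result, and the secondary alternate's result is returned instead. The acceptance test is intentionally cheap rather than exhaustive, in line with the advice to keep the adjudicator simple.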

The recovery block technique places heavy demands on the specification, which must be precise enough that several functionally equivalent alternates can be developed from it. The same issue arises in the N-version software technique.

N-Version Software Technique

The N-version software technique seeks to replicate the N-way redundant hardware idea of classical hardware fault tolerance. Every module in an N-version software system is implemented in up to N different ways. Each version performs the same task, but preferably in a different way. Each version then submits its result to a voter or decider, which determines the correct result and returns it as the module's outcome.
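The voting step can be illustrated with a small Python sketch. It rests on several assumptions that do not come from the technique itself: the names n_version_execute, NoMajorityError, isqrt_newton, isqrt_binary, and isqrt_faulty are hypothetical, the versions are called one after another rather than in parallel, outputs must be hashable and compared for exact equality, and the decider is a strict majority voter.

```python
from collections import Counter


class NoMajorityError(Exception):
    """Raised when no output is shared by a strict majority of the versions."""


def n_version_execute(versions, *args):
    """Run every version on the same input and return the majority output."""
    outputs = []
    for version in versions:
        try:
            outputs.append(version(*args))
        except Exception:
            pass  # a crashed version simply casts no vote
    if outputs:
        value, votes = Counter(outputs).most_common(1)[0]
        if votes * 2 > len(versions):  # strict majority of all N versions
            return value
    raise NoMajorityError("the versions did not reach a majority")


def isqrt_newton(n):
    # Version 1: integer square root by Newton's iteration.
    if n < 2:
        return n
    x = n
    while x * x > n:
        x = (x + n // x) // 2
    return x


def isqrt_binary(n):
    # Version 2: integer square root by binary search.
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid * mid <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


def isqrt_faulty(n):
    # Version 3: contains a deliberate design fault for the demonstration.
    return n // 2


if __name__ == "__main__":
    print(n_version_execute([isqrt_newton, isqrt_binary, isqrt_faulty], 16))
```

With the input 16, two versions return 4 and the faulty version returns 8, so the voter masks the fault and returns 4. If a majority of versions shared the same fault, the voter would return the wrong value, which is why design diversity matters so much for this technique.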

By relying on the design diversity principle, this approach should overcome the design flaws that affect most software. An essential distinction of N-version software is that the system could comprise several types of hardware as well as many software versions.

N-version software can only be successful and tolerate faults if the required design diversity is met. As with recovery blocks, the importance of a suitable specification in N-version software cannot be overstated. The method requires a specification precise enough that the various versions are fully interoperable, so that a software decider can choose equally between them, but not so restrictive that programmers are unable to create diverse designs. It is challenging to encourage design diversity while maintaining version compatibility in the specification; nonetheless, most modern software fault tolerance approaches rely on this delicate balance.

The N-version approach allows various faults to occur while the system successfully masks them. However, it remains critical to identify and correct these faults before they lead to errors. Faults in N-version software are classified as follows: if only one version of an N-version system has a defect, the fault is categorized as a simplex fault. If there are defects in M versions of an N-version system, the fault is called an M-plex fault.
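The classification above can be sketched as a small Python helper. This is a sketch under a strong assumption: the hypothetical function classify_fault treats the most common output as the correct one and counts every version that disagrees with it as defective. A real system might identify defective versions in other ways, and a fault shared by most versions would be misclassified here.

```python
from collections import Counter


def classify_fault(outputs):
    """Label a set of version outputs as fault-free, simplex, or M-plex."""
    # Assumption: the most common output is taken to be the correct one.
    majority_value, _ = Counter(outputs).most_common(1)[0]
    faulty = sum(1 for out in outputs if out != majority_value)
    if faulty == 0:
        return "no fault observed"
    if faulty == 1:
        return "simplex fault"
    return f"{faulty}-plex fault"


if __name__ == "__main__":
    print(classify_fault([4, 4, 8]))        # one version disagrees
    print(classify_fault([4, 4, 4, 8, 9]))  # two versions disagree
```

In the first call a single version disagrees with the majority, so the helper reports a simplex fault; in the second call two versions disagree, so it reports a 2-plex fault.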

FAQs

  1. What is Software Fault Tolerance?
    Software fault tolerance refers to the ability of software to detect and recover from a fault that is occurring, or has already occurred, in either the software or the hardware of the system it runs on, so that it continues to provide service according to its specification.
     
  2. What are the two Software Fault Tolerance Techniques?
    The two software fault tolerance techniques are Recovery Block and N-Version Software.
     
  3. List two major differences between Recovery Block and N-Version Software Techniques.
    The two major differences between RB and NVS are:
    RB uses an acceptance test to check the result of each alternate, while NVS uses a voter that compares the results of all versions instead of an acceptance test.
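    NVS is generally better suited to time-critical systems, because it masks a fault through voting without rolling back, while RB may have to roll back and try its alternates one after another.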

Key Takeaways

In this article, we have discussed Software Fault Tolerance and two fault tolerance techniques, Recovery Block and N-Version Software.

We hope that this blog has helped you enhance your knowledge of Software Engineering. If you would like to learn more, check out our articles on Software Engineering and Software Reliability Models.


Happy Coding!
