Table of contents
1. Introduction
2. The ‘CAP’ in the CAP
2.1. Consistency in CAP
2.2. Availability in CAP
2.3. Partition Tolerance
3. Frequently asked questions
4. Key takeaways
Last Updated: Mar 27, 2024

CAP theorem

Author Alisha chhabra

Introduction 

Many modern web applications run on the cloud. Most of them are built with cloud technologies such as distributed caches and distributed data stores. In simple words, a distributed system is a group of independent computers that appears to its users as a single cohesive system.

Cloud solutions are popular among engineers because they offer several advantages over traditional methods:

  1. Scalability: Rather than scaling vertically (adding more storage or computation power to existing machines), distributed systems let you scale horizontally by adding more servers.
  2. High fault tolerance: Even if one server or node in the system fails, the remaining servers keep your application and data available.
  3. High performance: Compared to a centralised system, a distributed system can deliver more performance at a lower cost.
  4. Low latency: Since your data is replicated in several locations around the globe, users can fetch it quickly from a nearby server rather than from the other side of the world.

However, distributed systems are not immune to failure. Along with their benefits, they have drawbacks that can cause the system to crash or misbehave. To design a solution, we must understand the situations in which a distributed system fails or responds incorrectly.

The CAP theorem describes what a distributed data store can and cannot guarantee when things go wrong, in particular when the network between its nodes fails.

Before we look at what the CAP theorem entails, let's first look at what the letters in CAP stand for:

The ‘CAP’ in the CAP 

In CAP, C stands for Consistency, A for Availability, and P for Partition Tolerance. 

Let us understand each attribute with an example:

Assume you've opened an online business where you keep track of your customers' important dates. 

You've noticed that many people miss critical deadlines, which holds back their progress. So you start a business that records customers' significant dates and sends each customer a reminder a day before the deadline. Your company name is “Always there for you.”

You receive only a few calls at first, but as your business grows, the number of calls you get every day keeps increasing.

Since you are the only person in your company, you now have to handle all of these calls alone. So you decide to recruit a group of people to share the work with.

This is precisely how distributed systems work in a real-life scenario. 

Let's say you've recruited five people. Your work becomes easier and more efficient, since each of you keeps your own record of the day's calls with the customer's name and details. To answer calls, every member of your team uses a shared phone number, and each incoming call is routed to one of you.

One day you get a call from a customer who wants to update his information. You say, "Please wait a moment while I check your name on my sheet." You can't find any information on him, so you tell him, "No, sir, there is no record of you in our files." Because you and your team answer calls on the same phone number, a coworker probably took the customer's original call, and the record is on their sheet, not yours. As a result, the customer has a bad experience and may not contact your company again.

What steps can you take to improve the customer experience? You discuss it with the team and come up with a solution: whenever someone answers the phone, they must enter the information into every teammate's sheet. That is, every update must be copied to all records at the same time. That is precisely what is meant by data consistency: the data is the same regardless of which copy the client accesses.

Consistency in CAP

According to CAP, a system is said to be consistent if all nodes see the same data at the same time. Take Google Sheets as an example. Assume that many users have read and write access to a sheet. If someone makes a change, it must be reflected on your end right away. Similarly, any change you make must be reflected on their end. In CAP, this is referred to as consistency. The data is replicated synchronously across the systems.

In other words, any read that happens after the latest write must return the value of that write, no matter which node serves it.
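
To make the read-after-write rule concrete, here is a minimal sketch of synchronous replication. The names `Replica` and `SyncReplicatedStore`, and the sample customer and date, are made up for illustration; real data stores replicate over a network and have to handle failures that this sketch ignores.

```python
class Replica:
    """One teammate's sheet: a plain in-memory key-value store."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class SyncReplicatedStore:
    """Hypothetical store that copies every write to all replicas before returning."""

    def __init__(self, replicas):
        self.replicas = replicas

    def write(self, key, value):
        # The write only returns after every replica has applied it.
        for replica in self.replicas:
            replica.data[key] = value

    def read(self, key, replica_index):
        # Any replica may serve the read; all of them hold the latest value.
        return self.replicas[replica_index].data.get(key)


team = [Replica(f"teammate-{i}") for i in range(6)]
store = SyncReplicatedStore(team)
store.write("John", "renew passport on 2024-05-01")

# Ask every replica for John's record and collect the distinct answers.
answers = {store.read("John", i) for i in range(len(team))}
print(answers)
```

Because the write touches every replica before it returns, the set of answers contains exactly one value, whichever teammate picks up the call.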

Your firm is running well, and customers have no complaints. Then one day two of your teammates get sick and can't work, leaving you and the remaining teammates to handle all of the calls. Assume your organization receives 3000 calls each day, spread across 6 teammates, including you, so each member answers approximately 500 calls. With two people out, the burden on the rest of you grows, and because of time constraints you may not be able to answer every call. In other words, your organization's throughput of 3000 calls per day drops to 2000.
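
The arithmetic behind those numbers can be checked with a few lines of Python. The variable names are illustrative, and the sketch assumes each teammate can handle no more than their usual 500 calls a day.

```python
calls_per_day = 3000
team_size = 6
absent = 2

# Each teammate's normal share of the daily calls.
calls_per_member = calls_per_day // team_size

# Capacity left when the absent teammates' calls cannot be absorbed.
remaining_capacity = calls_per_member * (team_size - absent)

# Calls that go unanswered on that day.
unanswered = calls_per_day - remaining_capacity

print(calls_per_member, remaining_capacity, unanswered)
```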

This scenario makes the service unavailable to some customers. In CAP, A stands for Availability, which means every request receives a non-error response. To maintain availability, you plan to hire more people so that throughput isn't affected if someone falls sick or is absent for any other reason.

Availability in CAP 

Availability means that any client requesting data will receive a response, even if one or more nodes are unavailable. Another way to put it is that every operational node in the distributed system, without exception, returns a valid response to any request.

We can provide a highly available service by answering every query with the current value on whichever server receives it. If we do this, there is no guarantee that the value is the most recent one written to the database. A recent write may still be in transit somewhere.

Hence, a highly available system may respond with inconsistent (stale) data.

This is the idea behind availability in CAP: responding to the request is treated as more important than returning consistent data, i.e., the latest written value.
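
As a rough illustration of a write that is "still in transit", the sketch below models one primary and one follower with a queue between them. All names (`Node`, `in_transit`, `deliver_pending`) are hypothetical, and the queue stands in for a real network.

```python
class Node:
    """A server holding a single value."""

    def __init__(self):
        self.value = None


primary = Node()
follower = Node()
in_transit = []  # writes accepted by the primary but not yet delivered


def write(value):
    # The primary acknowledges immediately; replication happens later.
    primary.value = value
    in_transit.append(value)


def deliver_pending():
    # Simulates the network finally delivering the queued writes, in order.
    while in_transit:
        follower.value = in_transit.pop(0)


def read_from_follower():
    # The follower always answers, even if its value is out of date.
    return follower.value


write("v1")
deliver_pending()
write("v2")                    # still in transit
stale = read_from_follower()   # the follower has not seen "v2" yet
deliver_pending()
fresh = read_from_follower()
print(stale, fresh)
```

The follower never refuses a request, which keeps it available, but the first read returns the older value.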

Now that you understand availability in CAP, what's left?

Let's return to the previous example and consider a scenario where everything is running smoothly, but something happens that is out of your hands: a network failure between two nodes.

Suppose you use some communication channel to share updates with your teammates and keep the records consistent, but one day the connection between you and one of your teammates breaks.

Since there is no way to transfer the data, the system must become either inconsistent or unavailable. Such a break in communication is known as a network partition, and a system's ability to keep working despite it is called partition tolerance.

Partition Tolerance 

According to CAP, a system is partition tolerant if it continues to operate even when messages between its nodes are dropped or delayed.

Normally, your data store provides all three properties. However, according to the CAP theorem, when a distributed database experiences a network failure, it can provide either consistency or availability, but not both.

It's a tradeoff. As long as the network is healthy, all three can be delivered. In the event of a network partition, however, a choice must be made.

Partition tolerance is taken as a given in the theorem. The assumption is that the system runs on a distributed data store and is therefore designed to cope with network partitions. Because network faults are unavoidable, partition tolerance is required to provide any form of dependable service.

Which of the two to choose depends on the use case.

Most banking applications, for example, prioritise consistency over availability, since they don't want their users' balances to be wrong. Consider a situation where you deposit x amount of money at an ATM and then check whether your mobile application shows the updated balance. Owing to a network problem, the update has not reached the server behind the application. If the application showed you the old balance, you might believe you had lost your money and start blaming the bank.

This degrades the customer experience, something no bank can afford. That is why banks choose consistency over availability with partition tolerance, i.e., CP in CAP.

So, what happens if you prioritise consistency over availability?

Consider the earlier case in which the network between you and one of your teammates went down:

In such a case, how would you approach Consistency over Availability with Partition Tolerance?

You tell the teammate with the network problem to take the rest of the day off: "We cannot afford to give out inconsistent data, so you must stay inactive until the network stabilises." This keeps the data consistent, but it reduces the system's availability.
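
A CP-style node can be sketched as one that refuses requests while it is cut off. The class `CPNode`, the `Unavailable` exception, and the sample customers are assumptions made for this illustration, not part of any real database API.

```python
class Unavailable(Exception):
    """Raised when a node refuses a request to protect consistency."""


class CPNode:
    """Hypothetical node that prefers consistency over availability."""

    def __init__(self):
        self.records = {}
        self.connected = True  # set to False while the node is partitioned

    def handle_write(self, customer, date):
        if not self.connected:
            # The write cannot be copied to the other nodes, so reject it.
            raise Unavailable("cannot replicate the write; try again later")
        self.records[customer] = date
        return "ok"


node = CPNode()
first = node.handle_write("Asha", "2024-06-01")

node.connected = False  # the network partition starts
try:
    node.handle_write("Ravi", "2024-07-15")
    second = "ok"
except Unavailable:
    second = "rejected"

print(first, second)
```

The rejected call is the cost of this choice: no inconsistent record is ever created, but the customer gets an error instead of an answer.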

But what if data availability is more important than data consistency?

In that case, you allow the partitioned teammate to remain active and answer calls. Once the network disruption is resolved, that teammate sends the data collected in the meantime to everyone else.

This choice is common in e-commerce and social applications, where data availability is more important than consistency.
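
The AP choice can be sketched the same way: the node keeps accepting writes during the partition and forwards them when the connection returns. `APNode`, `pending`, and `heal` are hypothetical names used only in this example.

```python
class APNode:
    """Hypothetical node that prefers availability over consistency."""

    def __init__(self):
        self.records = {}
        self.connected = True
        self.pending = []  # writes accepted locally but not yet replicated

    def handle_write(self, customer, date, peer):
        # Always accept the write locally so the caller gets an answer.
        self.records[customer] = date
        if self.connected:
            peer.records[customer] = date
        else:
            self.pending.append((customer, date))
        return "ok"

    def heal(self, peer):
        # The partition is over: forward everything collected meanwhile.
        self.connected = True
        for customer, date in self.pending:
            peer.records[customer] = date
        self.pending.clear()


you = APNode()
teammate = APNode()

teammate.connected = False  # the partition starts
reply = teammate.handle_write("Ravi", "2024-07-15", peer=you)
seen_during_partition = "Ravi" in you.records

teammate.heal(peer=you)
seen_after_heal = "Ravi" in you.records
print(reply, seen_during_partition, seen_after_heal)
```

During the partition the two sheets disagree, which is exactly the inconsistency an AP system accepts in exchange for answering every call.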

Consider an e-commerce platform such as Amazon. Suppose a buyer added an item to his cart two days ago and is now adding three more items, but the system is experiencing a network partition, so the update isn't reflected everywhere. Assume the buyer also removes one item; because of the partition, that removal is not replicated to every node either.

Once the partition heals, the diverged copies of the cart are merged with a set union, so the user sees every item that was ever in the cart, even one he removed. That is what the CAP theorem means by availability over consistency with partition tolerance, i.e., AP in CAP.
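
Here is a small sketch of that union merge using Python sets. The item names and the two-replica setup are invented for illustration; a production cart would need more bookkeeping than this.

```python
# Both replicas start with the item added two days ago.
replica_a = {"book"}
replica_b = {"book"}

# During the partition, replica_a receives the three new items...
replica_a |= {"mug", "pen", "socks"}

# ...while replica_b receives the removal of the original item.
replica_b.discard("book")

# After the partition heals, the carts are merged with a set union.
merged = replica_a | replica_b
print(sorted(merged))
```

The removed item reappears in the merged cart because a plain union has no way to tell "never added" apart from "added and then removed".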

To summarize the above points:

  • High consistency comes at the expense of decreased availability.
  • High availability comes at the expense of inconsistency.

That's not everything; there is a lot more to explore in the CAP theorem, but these three attributes are the essential starting point.

Let’s now look at some FAQs based on the discussion so far.

Frequently asked questions 

  • What is meant by high availability?

As the name indicates, availability in system design refers to whether users can access an application. If a user is unable to access a web application, the application is unavailable.

The term "high availability" refers to the application's ability to keep running without interruption.

  • Why is the C in CAP not the same as the C in ACID?

Although the letter C stands for consistency in both ACID and CAP, the two have distinct meanings. In CAP, consistency means "all nodes see the same data at the same time," whereas in ACID, consistency means "every transaction takes the database from one consistent state to another."

  • Why is it not possible to satisfy all three properties of the CAP theorem?

In real-world systems, it is impossible to guarantee all three CAP properties at the same time, because some things, such as network failures, are beyond our control. A network failure creates a partition, and while the partition lasts, one must sacrifice either availability or consistency.

  • What is eventual consistency in the CAP theorem?

Eventual consistency is a consistency model often used by systems that keep multiple copies of data on various machines to achieve high availability and scalability. Replicas may briefly hold different values, but any change made to a data item on one machine is propagated to the other replicas, so that all copies eventually converge to the same value.
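
The convergence idea can be sketched with three replicas that exchange their state. This example assumes a "highest version wins" rule for resolving conflicts, which is only one possible strategy; the `merge` function and the sample data are hypothetical.

```python
def merge(local, remote):
    """Copy into local every entry for which remote has a higher version."""
    for key, (version, value) in remote.items():
        if key not in local or local[key][0] < version:
            local[key] = (version, value)


# Each replica maps a key to a (version, value) pair.
replica_1 = {"reminder": (1, "call at 9am")}
replica_2 = {"reminder": (2, "call at 10am")}  # has seen a newer write
replica_3 = {}                                 # has seen nothing yet
replicas = [replica_1, replica_2, replica_3]

before = [r.get("reminder") for r in replicas]

# Full exchange: every replica pulls state from every other replica.
for target in replicas:
    for source in replicas:
        if source is not target:
            merge(target, source)

after = [r.get("reminder") for r in replicas]
print(before)
print(after)
```

Before the exchange the three replicas give three different answers; afterwards they all hold the version 2 value.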

Key takeaways 

Here we come to an end, but the discussion is not over yet. There is much more excellent content waiting for you.

Understanding how system architecture is designed and handled is crucial for an engineer, since you’ll soon be working with it regardless of which role you choose.

Knowing the system is essential. Many high-traffic companies (e.g., Amazon, Google, Facebook) weigh the CAP trade-offs when deciding their application architecture.

Thanks for reading, but don’t stop here; follow up with more articles to ease the workload of engineering. Once you've grasped a concept, it's much easier to apply it in real life or at work. So make sure you go over each point in the article for better understanding, and reread it as many times as you need.

Continue to soar, Ninja!🐾
