Introduction
The students are asked to choose Mathematics or Biology as their first optional subject in a class. Some students choose Mathematics, some choose Biology, and some choose both. How can we find out the number of students(n) in that class?
Let n1 = the number of students taking Mathematics,
n2 = the number of students taking Biology
and n3 = the number of students taking both
If asked how many students have taken either Mathematics or Biology, we add the number of students taking both the subjects, i.e., n = n1 + n2. By doing so, we have counted the students who have taken both subjects twice. So we need to subtract them once. Hence the solution is n = n1 + n2 -n3. This is nothing but the Inclusion-Exclusion principle of set theory.
Inclusion-Exclusion Principle
In case of two sets
In many problems, we must include contributions of more than one term in our answer. This results in the inclusion of the same term more than once; hence we use the inclusion-exclusion principle. Clearly, in set theory, the union of two sets A and B can be represented as :
|A ∪ B| = |A| + |B| − |A ∩ B| …….(i)
The Venn Diagram above provides a transparent understanding and proves eq(i). The event A ∩ B has been double-counted while adding the elements of A and B. Thus, this double-counting is corrected by subtracting the number of elements in A ∩ B. The co-related equation in probability theory is given by-
P(A ∪ B) = P(A) + P(B) − P(A ∩ B) .......(ii)
In case of three sets:
The cardinality of three finite sets, A, B, and C, i.e., |A ∪ B ∪ C|, we once again apply our provisional(incorrect) guess that it may be |𝐴|+|𝐵|+|𝐶|. That way, we tend to count some elements once and others more than once. In the Venn Diagram above, the cardinalities of |𝐴|, |𝐵| and |𝐶|(dark blue areas) are counted once, those of |𝐴 ∩ 𝐵|, |𝐴 ∩ 𝐶| and |𝐵 ∩ 𝐶|(light-blue areas) twice, and that of |𝐴 ∩ 𝐵 ∩ 𝐶|(centre region) is counted thrice. To fix this problem, we subtract the cardinalities of all the intersections of two sets, which ensures that the light blue areas are counted once. However, the formula is still wrong for the centre elements, counted three times and then un-counted three times for a net-zero time. Therefore, the cardinality of the centre needs to be added.
So, the final formula is:
|A∪B∪C| = |A| + |B| + |C| − |A∩B| − |A∩C| − |B∩C| + |A∩B∩C|...(iii)
The correlated equation in probability theory is given by:
P(A∪B∪C) = P(A)+P(B)+P(C)−P(A∩B)−P(A∩C)−P(B∩C)+P(A∩B∩C)...(iv)
The inclusion-exclusion principle is the generalisation of equations (ii) and (iv) to an arbitrary number of sets.