Inclusion, Exclusion

In this modern day and age, it simply cannot be doubted that it is of the very utmost importance that we find the size of the union of some sets. One might try just adding the sizes of all the sets, but that's not correct, because then one would be double-counting the elements that appear in more than one set. But it's a good start. One might then think that one could begin by adding the sizes of the sets, but then subtract the sizes of the intersections of each pair of sets, in order to correct for the double-counting. But this is also incorrect, because then what about the elements that appear in three sets and had thus initially been triple-counted?—after subtracting the pairwise intersections, these elements haven't been included in the count at all! So one realizes that one must then add the sizes of the triplewise intersections ...

And in one of the dark recesses of the human mind, untouched by outside light or sound, silent and unyielding to the invidious scrutiny of introspection and cognitive science—a conjecture is formed—

The size of the union of n sets is given by the alternating (starting from positive) sum of the sums of the sizes of all j-way intersections amongst the sets from j:=1 to n!

(N.b., the exclamation mark indicates an excited tone, not "factorial".)

This conjecture turns out to be entirely correct, as demonstrated in the following

Theorem (Inclusion–Exclusion Principle).

$\left|\bigcup_{i=1}^{n}A_{i}\right|=\sum_{j=1}^{n}(-1)^{j+1}\left(\sum_{S\subset\mathcal{P}(\{A_{i}\});|S|=j}\left|\bigcap_{A_{s}\in S}A_{s}\right|\right)$

Proof. By induction.

(Basis.) |A₁ ∪ A₂| = |A₁| + |A₂| – |A₁ ∩ A₂|

(Induction.) We want to show that given that we can express a union of n sets using the proposed method, then we can do the same for a union of n+1 sets. From the basis, we can write:

$\left|\left(\bigcup_{i=1}^{n}A_{i}\right)\cup A_{n+1}\right|=\left|\bigcup_{i=1}^{n}A_{i}\right| + |A_{n+1}|-\left| \left(\bigcup_{i=1}^{n}A_{i}\right) \cap A_{n+1} \right|.$

Our inductive hypothesis says that the first term on the right side can be written as our proposed sum of appropriately signed sizes of intersections. Also, we can distribute the intersection-with-A_n+1 over the union in the last term on the right side and then use the inductive hypothesis again to likewise rewrite that term as a sum of appropriately signed sizes of intersections. But then notice that we have all the terms we need, and that the signs are correct as well. So this is what I've been trying to tell you this whole time.

An Algorithmic Lucidity

a blog

Leave a Reply