
Double counting

Quick description

If you have two ways of calculating the same quantity, then you may well end up with two different expressions. If these two expressions were not obviously equal in advance, then your calculations provide a proof that they are. Moreover, this proof is often elegant and conceptual. Double counting can also be used to prove inequalities, via a simple result about bipartite graphs.

Example 1: sums of binomial coefficients

Prove that \binom n0 + \binom n1 + \binom n2 + \dots + \binom nn = 2^n.

The left-hand side of this well-known identity counts all the subsets of size 0 from a set of size n, then all the subsets of size 1, then all the subsets of size 2, and so on. By the time it has finished, it has counted all the subsets of a set of size n. But an entirely different argument shows that the number of subsets of a set X of size n is 2^n: for each element x\in X there are two possibilities: x belongs to the subset or x does not belong to the subset. Therefore, in determining a subset of X we have n independent binary choices, so 2^n possibilities. (If this argument sounds a bit woolly, then you can prove it by induction. Let X be the set \{1,2,\dots,n\}. Then every subset of X is obtained by taking a subset of \{1,2,\dots,n-1\} and either adding n to it or leaving it alone. Therefore, the number of subsets of X is twice the number of subsets of \{1,2,\dots,n-1\}. And the number of subsets of \{1\} is 2.)
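
For readers who like to see such identities checked mechanically, here is a minimal Python sketch (the choice n = 6 is arbitrary) that computes both sides of the double count: the sum of the binomial coefficients and a direct enumeration of all subsets.

    from itertools import combinations
    from math import comb

    n = 6  # an arbitrary small example

    # Left-hand side: count the subsets size by size.
    lhs = sum(comb(n, k) for k in range(n + 1))

    # Count the subsets directly by enumerating them all.
    total_subsets = sum(1 for k in range(n + 1)
                        for _ in combinations(range(n), k))

    assert lhs == total_subsets == 2 ** n
    print(lhs, 2 ** n)  # both are 64 when n = 6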

Example 2: counting the edges of an icosahedron

Prove that the number of edges of an icosahedron is 30.

The idea of this example is to deduce the answer very simply from just the fact that an icosahedron has 20 triangular faces. The rough idea of the proof is that each face has three edges, while each edge is an edge of two faces, so the ratio of faces to edges ought to be 2 to 3. To be sure that this argument is correct, we count pairs (E,F), where E is an edge and F is a face that has E as one of its edges. Since the icosahedron has 20 triangular faces, the number of such pairs is 60. If m is the number of edges, then the number of such pairs is also 2m, since each edge is part of two faces. Therefore, m=30, as claimed.

General discussion

Many combinatorial applications of the double-counting principle can be regarded as consequences of the following very simple result about bipartite graphs: if G is a bipartite graph with vertex sets X and Y, then the sum of the degrees of the vertices in X is equal to the sum of the degrees of the vertices in Y. The proof is a very simple double count, since the two sums are different ways of counting the edges in G (one grouping each edge according to its endpoint in X and the other according to its endpoint in Y).

For instance, for the icosahedron we could define a bipartite graph where one vertex set consisted of the edges of the icosahedron and the other of the faces, joining an edge to a face if the edge belonged to the face. The edges of this bipartite graph are precisely the edge-face pairs considered above (so this is not a different proof, but a different way of looking at the same proof).

The following equivalent reformulation of the bipartite-graphs principle can often be used very directly: if d_X is the average degree of the vertices in X and d_Y is the average degree of the vertices in Y, then d_X/d_Y=|Y|/|X|.
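
Both the degree-sum identity and this reformulation are easy to sanity-check computationally. The following Python sketch builds a random bipartite graph (the vertex-set sizes and the edge probability are arbitrary choices made for illustration) and verifies that the two degree sums count the edges and that d_X/d_Y=|Y|/|X|.

    import random

    X = range(8)   # arbitrary vertex sets for illustration
    Y = range(5)

    # A bipartite graph, stored as a set of edges (x, y).
    edges = {(x, y) for x in X for y in Y if random.random() < 0.4}

    deg_X = {x: sum(1 for (a, b) in edges if a == x) for x in X}
    deg_Y = {y: sum(1 for (a, b) in edges if b == y) for y in Y}

    # Both degree sums count the edges of the bipartite graph.
    assert sum(deg_X.values()) == sum(deg_Y.values()) == len(edges)

    # The reformulation in terms of average degrees.
    d_X = sum(deg_X.values()) / len(X)
    d_Y = sum(deg_Y.values()) / len(Y)
    if edges:
        assert abs(d_X / d_Y - len(Y) / len(X)) < 1e-9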

Many applications of the principle use the following consequence. Let G be a bipartite graph with vertex sets X and Y. Suppose that every vertex in X has degree d_X and every vertex in Y has degree d_Y. Let U be any subset of X, and let V be the set of all vertices in Y that are joined to at least one vertex in U. Then |V|\geq |U|d_X/d_Y. To prove this, let H be the subgraph of G with vertex sets U and V, with the edges inherited from G. Then every vertex in U has degree d_X (since all its neighbours lie in V) and every vertex in V has degree at most d_Y. Counting the edges of H in two ways, we find that |U|d_X\leq |V|d_Y, which rearranges to the claimed inequality.

It often happens that we are in the following situation. We have a set X whose size we are trying to bound above, but we do not have an obvious bipartite graph to which to apply an inequality such as the one in the last paragraph. In that case, we must construct one. It is often easier to think about the task in an equivalent way: given a bipartite graph with vertex sets X and Y, each vertex y in Y determines a neighbourhood N_y (defined to be the set of all vertices in X that are joined to y), and therefore the bipartite graph itself determines a collection of subsets of X, possibly with repeats, so strictly speaking it determines a multiset of subsets of X. Conversely, given a collection of subsets of X, we can use it to define a bipartite graph: introduce a vertex for each set in the collection, let Y be the set of these vertices, and join the vertex corresponding to a set B to each element of B.

This thought leads to another principle. Let \mathcal{B} be a collection of subsets of X. If each element of X is contained in at least d_X sets in \mathcal{B} and each set in \mathcal{B} contains at most d elements of X, then |X|\leq |\mathcal{B}|d/d_X. (To prove this directly, count pairs (x,B), where x\in X, B\in\mathcal{B}, and x\in B: there are at least |X|d_X of them and at most |\mathcal{B}|d.)
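
As a quick illustration, here is a Python sketch of this last principle on a toy example (the ground set and the collection \mathcal{B} are arbitrary choices): it counts the pairs (x,B) with x\in B in two ways and checks the resulting bound.

    # A toy ground set X and a collection B of subsets of X (chosen arbitrarily).
    X = set(range(1, 10))
    B = [{1, 2, 3}, {3, 4, 5}, {5, 6, 7}, {7, 8, 9}, {9, 1, 2},
         {2, 4, 6}, {6, 8, 1}, {3, 5, 8}, {4, 7, 9}]

    pairs = sum(len(b) for b in B)  # the number of pairs (x, B) with x in B

    d_X = min(sum(1 for b in B if x in b) for x in X)  # every x lies in >= d_X sets
    d = max(len(b) for b in B)                         # every set has <= d elements

    # Counting the pairs in two ways gives |X| d_X <= pairs <= |B| d.
    assert len(X) * d_X <= pairs <= len(B) * d
    assert len(X) <= len(B) * d / d_X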

Example 3: the sizes of upper shadows

Let \mathcal{A} be a collection of subsets of \{1,2,\dots,n\}, with every set in \mathcal{A} of size r. Let s>r and define the s-upper shadow \partial_s\mathcal{A} to be the collection of all sets of size s that contain at least one set in \mathcal{A}. Then the size of \partial_s\mathcal{A} is at least |\mathcal{A}|\binom ns/\binom nr.

To prove this assertion, let us define a bipartite graph with vertex sets \mathcal{A} and \partial_s\mathcal{A}, joining A to B if and only if A\subset B. Then each A\in\mathcal{A} has degree \binom{n-r}{s-r}, so in particular this is the average degree. And each B\in\partial_s\mathcal{A} has degree at most \binom sr, so this is an upper bound for the average degree of vertices in \partial_s\mathcal{A}. By the bipartite-graphs principle, it follows that the size of \partial_s\mathcal{A} is at least |\mathcal{A}| \binom{n-r}{s-r}/\binom sr. (Here we have applied the first reformulation of the principle, but we could have applied the second one even more directly.)
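
For small parameters this bound can be confirmed by brute force. The Python sketch below (with the arbitrary choices n=8, r=3, s=5 and a random family \mathcal{A} of ten 3-sets) builds the upper shadow explicitly and checks it against the bound just proved.

    from itertools import combinations
    from math import comb
    from random import sample

    n, r, s = 8, 3, 5  # arbitrary small parameters with r < s <= n

    # A randomly chosen family A of r-subsets of {1,...,n}.
    all_r_sets = [frozenset(c) for c in combinations(range(1, n + 1), r)]
    A = sample(all_r_sets, 10)

    # The s-upper shadow: all s-sets containing at least one member of A.
    upper_shadow = {frozenset(c)
                    for c in combinations(range(1, n + 1), s)
                    if any(a <= frozenset(c) for a in A)}

    # The bound proved above, in the form |A| * C(n-r, s-r) / C(s, r).
    assert len(upper_shadow) >= len(A) * comb(n - r, s - r) / comb(s, r)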

One can check by direct calculation that \binom{n-r}{s-r}/\binom sr=\binom ns/\binom nr. Here is a double-counting argument instead, which shows that \binom{n-r}{s-r}\binom nr=\binom ns\binom sr. This time we are not presented with an obvious bipartite graph, so let us look at the right-hand side and see what it might be counting. The first term counts subsets of size s from a set of size n, and the second counts subsets of size r from a set of size s. So we could regard the product as the number of ways of choosing a set U of size s from the set \{1,2,\dots,n\} and then choosing a subset V of U of size r.

Now another way of choosing such a pair (U,V) is to choose V first. We can do this in \binom nr ways, and then there will be \binom {n-r}{s-r} ways of choosing a set U of size s that contains V, since we need to choose s-r more elements from the n-r elements we still have left.
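
The identity behind this argument is also easy to verify mechanically; here is a short Python check over a small range of parameters (the range itself is arbitrary).

    from math import comb

    # Both sides count pairs (U, V) with |U| = s, |V| = r and V a subset of U.
    for n in range(13):
        for r in range(n + 1):
            for s in range(r, n + 1):
                assert comb(n - r, s - r) * comb(n, r) == comb(n, s) * comb(s, r)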

Example 4: the Erdős-Ko-Rado theorem

Let \mathcal{A} be a collection of subsets of \{1,2,\dots,n\} such that every A\in\mathcal{A} has size r and such that no two sets in \mathcal{A} are disjoint. Suppose also that n\geq 2r. Then |\mathcal{A}|\leq \binom {n-1}{r-1}.

Note that the collection of all sets of size r that contain the element 1 has size \binom {n-1}{r-1}, so this result is best possible. Note also that \binom {n-1}{r-1}=\frac rn \binom nr, so what we are trying to show is that the proportion of sets of size r that we can choose if no two are disjoint is at most r/n. Thirdly, note that if r>n/2 then any two sets of size r intersect, so the condition n\geq 2r is genuinely needed: without it we could take \mathcal{A} to be all of [n]^{(r)}. For convenience, let us use the standard notation [n] for the set \{1,2,\dots,n\} and [n]^{(r)} for the collection of subsets of [n] of size r. Also, we shall call a set system in which any two sets have non-empty intersection an intersecting system.

To solve this problem let us try to find a collection \mathbf{B} of subsets of [n]^{(r)} with the following three properties. First, if \mathcal{B} is any element of \mathbf{B} (note that we have three levels of sets here, so an element of \mathbf{B} is a subset of [n]^{(r)} and therefore a collection of subsets of [n]), then the proportion of sets in \mathcal{B} that belong to \mathcal{A} is at most r/n; second, every element of [n]^{(r)} belongs to the same number of sets \mathcal{B}\in\mathbf{B}; and third, every element of \mathbf{B} contains the same number of sets. If we can do that, then we can use the above results, but the nicest way to finish the argument is to reason probabilistically as follows. Let p be the proportion of elements of [n]^{(r)} that belong to \mathcal{A}. Then if we pick a random element of [n]^{(r)}, it has a probability p of belonging to \mathcal{A}. But another way of choosing a random element of [n]^{(r)} is to choose a random element \mathcal{B} of \mathbf{B} and then a random set A\in\mathcal{B}. This gives the uniform distribution on [n]^{(r)}, since every element of [n]^{(r)} has an equal probability of belonging to the randomly chosen set \mathcal{B}, and it then has a probability |\mathcal{B}|^{-1} of being chosen from \mathcal{B}, and all the sets \mathcal{B} have the same size. But if no set in \mathbf{B} has a proportion greater than r/n of elements that belong to \mathcal{A}, then the probability of choosing an element of \mathcal{A} by the second method is at most r/n (since this is true whatever set \mathcal{B} you begin by choosing). It follows that p\leq r/n, which is what we wanted to prove.

It remains to choose \mathbf{B}. Here we use another trick: exploiting symmetry. Suppose we can find just a single collection \mathcal{B} with the property that no intersecting set-system \mathcal{A}\subset [n]^{(r)} can contain more than r|\mathcal{B}|/n of the sets in \mathcal{B}. Then the same will be true if we permute the set [n] and take the corresponding transformation of \mathcal{B}. (To be precise, if \pi is a permutation of [n] we would take the collection of all sets \pi(B) such that B\in\mathcal{B}.) If we define \mathbf{B} to be the set of all such transformations of \mathcal{B}, then the symmetry of the situation means that every element of [n]^{(r)} is contained in the same number of sets in \mathbf{B}. Thus, the problem is solved if we can find just one collection \mathcal{B}\subset [n]^{(r)} that cannot contain a proportion greater than r/n of sets that all intersect.

We are closing in on our target. How can a proportion of r/n arise? If n is a multiple of r, then it is quite easy to see how to do it. For instance, if n=5r then we could take five disjoint sets of size r, and obviously at most one of these sets can belong to an intersecting system. But if r is coprime to n, as it may well be, then we will need the number of sets in \mathcal{B} to be a multiple of n. Are there any natural examples of subsets of [n]^{(r)} that have a number of elements that is a multiple of n?

It's not quite clear what "natural" means here, so let's jump straight to the answer and then see if we can justify it somehow. The answer is to think of the set \{1,2,\dots,n\} as the group of integers mod n and to take all sets of the form \{i,i+1,\dots,i+r-1\}, where addition is interpreted mod n.

To see that this works, let us suppose we have an intersecting family of these sets. Without loss of generality one of the sets in the family is \{1,2,\dots,r\}. Then all other sets in the family must be intervals that begin or end at one of the numbers from 1 to r. Let us write i_+ for the interval of length r that begins at i and j_- for the interval of length r that ends at j. Then if j_- belongs to the family, we know that (j+1)_+ does not. Thus, we can have at most one out of each pair \{j_-,(j+1)_+\}. There are r-1 such pairs and every interval of length r that intersects \{1,2,\dots,r\} is contained in one of them, apart from the interval \{1,2,\dots,r\} itself. Therefore, the family can have size at most r, and the proof is complete.
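
This key step is also easy to confirm exhaustively for small parameters. The Python sketch below (with the arbitrary choices n=9, r=4) lists the n cyclic intervals of length r and checks, over all subfamilies, that an intersecting subfamily never has more than r members.

    from itertools import combinations

    n, r = 9, 4  # arbitrary small parameters with n >= 2r

    # The n cyclic intervals of length r in the integers mod n.
    intervals = [frozenset((i + j) % n for j in range(r)) for i in range(n)]

    def intersecting(family):
        return all(a & b for a, b in combinations(family, 2))

    largest = max(len(fam)
                  for k in range(1, n + 1)
                  for fam in combinations(intervals, k)
                  if intersecting(fam))

    assert largest == r
    print(largest)  # 4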

Why was the idea of choosing a cyclic collection of intervals a natural one? One answer is that it is a fairly natural generalization of the partitioning that worked when n was a multiple of r. Another is that if we are trying to find a nice subset of [n]^{(r)} then one potential source of examples is orbits of group actions. If we want our subset to have size n, then an obvious group to try is the cyclic group of order n, since we can easily define an action by cyclically permuting the ground set. And finally, if we are arranging the elements of [n] in a cycle, then the most natural set of size r to consider is an interval. So even if it isn't instantly obvious that this example is going to work, it is one of the first examples one is likely to try.

In case that long discussion leaves the impression that the proof is rather long, here is the proof minus the accompanying justification. Let \mathcal{A}\subset [n]^{(r)} be an intersecting family. Let \mathcal{B} be the family of intervals of length r mod n. Let \pi be a random permutation and let \mathcal{B}_\pi be the set of all \pi(B) such that B\in\mathcal{B}. Then the size of \mathcal{A}\cap\mathcal{B}_\pi cannot be more than r, by symmetry combined with the argument above that proves it when \pi is the identity. On the other hand, the expected size of \mathcal{A}\cap\mathcal{B}_\pi is |\mathcal{A}|n/\binom nr. It follows that |\mathcal{A}|\leq \frac rn\binom nr=\binom{n-1}{r-1}.
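
For small n and r one can carry out this averaging exactly rather than probabilistically. The sketch below takes n=7, r=3 and, purely for illustration, lets \mathcal{A} be the family of all 3-sets containing the element 1; it averages |\mathcal{A}\cap\mathcal{B}_\pi| over all permutations \pi and checks both the upper bound r and the claimed expected size.

    from itertools import combinations, permutations
    from math import comb, factorial
    from fractions import Fraction

    n, r = 7, 3
    ground = list(range(1, n + 1))

    # An intersecting family A: all r-sets containing the element 1.
    A = {frozenset(c) for c in combinations(ground, r) if 1 in c}

    # B: the n cyclic intervals of length r, as subsets of {1,...,n}.
    B = [frozenset((i + j) % n + 1 for j in range(r)) for i in range(n)]

    sizes = []
    for pi in permutations(ground):
        relabel = dict(zip(ground, pi))
        B_pi = [frozenset(relabel[x] for x in b) for b in B]
        sizes.append(sum(1 for b in B_pi if b in A))

    assert max(sizes) <= r  # no intersecting family contains more than r intervals
    average = Fraction(sum(sizes), factorial(n))
    assert average == Fraction(len(A) * n, comb(n, r))  # the expected size
    print(max(sizes), average)  # 3 and 3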

This theorem is called the Erdős-Ko-Rado theorem, and the proof by double counting is due to Gyula Katona.

Example 5: an identity concerning Euler's totient function

Euler's totient function \phi is the function that takes a positive integer m to the number of positive integers that are at most m and coprime to m. (For m\geq 2 this is the same as the number of positive integers less than m and coprime to m; the convention matters only in that it gives \phi(1)=1.) For example, the integers less than 12 and coprime to 12 are 1,5,7 and 11, so \phi(12)=4.

An important identity in elementary number theory is that n=\sum_{d|n}\phi(d), where "\sum_{d|n}" means that we are summing over all factors d of n. For example, the factors of 10 are 1,2,5 and 10, and \phi(1)+\phi(2)+\phi(5)+\phi(10)=1+1+4+4=10.

To prove this identity, we shall count the set \{1,2,\dots,n\} in two different ways. The first way is the utterly obvious way: it has size n by definition of "has size n". The definition being used here is that a set X has size n if and only if there is a one-to-one correspondence between X and the set \{1,2,\dots,n\}, which there obviously is if X actually equals \{1,2,\dots,n\}.

For the second way, we shall partition the set \{1,2,\dots,n\} into subsets and add up the sizes of those subsets. To do this, we would like to define a subset for each factor d of n. Is there a natural way of doing this that will give rise to disjoint sets?

The answer to any question like this will be yes if we can find a function from \{1,2,\dots,n\} to the set of all factors of n. Then for each d we take the set of all elements that map to d. So is there a natural function of this kind?

Yes, and it's not hard to think of because there aren't any obvious alternatives: we define f(m) to be (m,n), the highest common factor of m and n. This certainly maps m to a factor of n, and it also depends on m in a natural way. So let's see if it works.

Given a factor d of n, which integers m map to d? That is, which integers m have the property that (m,n)=d? Well, they obviously have to be multiples of d. But they also mustn't be multiples of any larger factor of n. Since n=d(n/d), a multiple dj will have the property (dj,n)=d if and only if (j,n/d)=1, that is, if and only if j has no factors in common with n/d. Therefore, the number of m such that f(m)=d is equal to \phi(n/d).

We have now shown that n=\sum_{d|n}\phi(n/d), which is not quite what we were trying to show. However, the map d\mapsto n/d is a bijection from the set of factors of n to itself, so it is in fact equivalent to what we wanted to show. (Another way of arguing would be to replace the map f by the map g(m)=n/(m,n). Then the inverse image of d would have size \phi(d).)
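
The proof translates directly into a short computation: partition \{1,2,\dots,n\} according to the value of (m,n) and compare the block sizes with the values of \phi. Here is a Python sketch (the choice n=60 is arbitrary).

    from math import gcd

    def phi(m):
        # Euler's totient: the number of 1 <= k <= m with gcd(k, m) = 1.
        return sum(1 for k in range(1, m + 1) if gcd(k, m) == 1)

    n = 60  # an arbitrary example
    divisors = [d for d in range(1, n + 1) if n % d == 0]

    # Partition {1,...,n} according to the value of gcd(m, n).
    block_size = {d: sum(1 for m in range(1, n + 1) if gcd(m, n) == d)
                  for d in divisors}

    for d in divisors:
        assert block_size[d] == phi(n // d)  # the key step in the proof

    assert n == sum(phi(d) for d in divisors)  # the identity itself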

Example 6: evaluating zeta(2)

Euler's famous theorem that \frac 1{1^2}+\frac 1{2^2}+\frac 1{3^2}+\dots=\frac{\pi^2}6 can also be proved by calculating the same quantity in two different ways. In this case the problem is no longer combinatorial, so the phrase "double count" is less appropriate, but the general philosophy is the same. This example requires some Fourier analysis.

The quantity we shall calculate is the square of the L_2-norm of the function f defined on the interval [-\pi,\pi] by taking f(x)=-1 if x<0 and f(x)=1 if x\geq 0. Here we shall define the square of the L_2-norm to be \frac 1{2\pi}\int_{-\pi}^\pi |f(x)|^2dx. The calculation is extremely easy, since |f(x)|^2 is identically 1, so we obtain the answer 1.

So far, this is not very interesting, but it becomes interesting if we exploit Parseval's identity, which tells us that the L_2-norm of a function is equal to the \ell_2-norm of its sequence of Fourier coefficients. Thus, a second way of calculating the square of the L_2-norm of f is to work out its Fourier coefficients and add up the squares of their absolute values. Therefore, 1 must be equal to \sum_{n=-\infty}^\infty|\hat{f}(n)|^2, where \hat{f}(n) is defined to be \frac 1{2\pi}\int_{-\pi}^\pi f(x)e^{-inx}dx.

Now \int_0^\pi e^{-inx}dx=-\frac 1{in}[e^{-inx}]_0^\pi=-(e^{-in\pi}-1)/in. If n is even, then e^{-in\pi}=1, so we get 0. If n is odd, then e^{-in\pi}=-1, and then we get 2/in. After a very similar calculation for the part of the integral between -\pi and 0 (where f(x)=-1), and after taking account of the factor \frac 1{2\pi}, we find that \hat{f}(n)=\frac 2{in\pi} if n is odd and 0 if n is even. It follows that the sum of the squares of the absolute values of the Fourier coefficients is \sum_{m=-\infty}^\infty\frac{4}{\pi^2(2m+1)^2}. Since the square of an odd number is equal to the square of minus that odd number, we can write this as \sum_{m=1}^\infty\frac 8{\pi^2(2m-1)^2}. Therefore, \sum_{m=1}^\infty\frac 1{(2m-1)^2}=\frac{\pi^2}8.

This is already a pretty interesting identity. To obtain Euler's theorem, we just note that the left-hand side is equal to \sum_{n=1}^\infty\frac 1{n^2}-\sum_{n=1}^\infty\frac 1{(2n)^2}, which is three quarters of \sum_{n=1}^\infty\frac 1{n^2}. So \sum_{n=1}^\infty\frac 1{n^2}=\frac 43\frac{\pi^2}8, which equals \frac{\pi^2}6.
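
Although there is nothing to double count here, the two calculations are easy to confirm numerically. The Python sketch below truncates the relevant series at an arbitrarily chosen point N and checks that the Parseval sum is close to 1 and that the two partial sums approach \pi^2/8 and \pi^2/6.

    from math import pi

    N = 10 ** 5  # truncation point, chosen arbitrarily

    # Parseval: the squares of the Fourier coefficients, summed over odd n.
    parseval_sum = sum(4 / (pi ** 2 * n ** 2)
                       for n in range(-N, N + 1) if n % 2)

    odd_sum = sum(1 / (2 * m - 1) ** 2 for m in range(1, N + 1))
    zeta_2 = sum(1 / m ** 2 for m in range(1, N + 1))

    print(parseval_sum)           # close to 1
    print(odd_sum, pi ** 2 / 8)   # both close to 1.2337...
    print(zeta_2, pi ** 2 / 6)    # both close to 1.6449...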

Another point of view

Giving a bipartite graph with vertex sets X and Y is the same as giving a relation R between X and Y: we declare x\in X and y \in Y to be related if there is an edge joining x and y. Equivalently, if we identify the relation R with its graph (i.e. if we think of R as being the subset of X \times Y consisting of those ordered pairs (x,y) whose elements are related by R), then R is precisely the set of edges in the bipartite graph.

Now there are natural maps X \times Y \rightarrow X and X\times Y \rightarrow Y, given by projecting onto the first and second factor respectively. Restricting these maps to R, we obtain projections p: R \rightarrow X and q: R \rightarrow Y.

If x \in X, then the degree of x (in the bipartite graph view-point) is just the order of the preimage p^{-1}(x), and similarly the degree of y is simply the order of q^{-1}(y). The sum over x \in X of the orders of all the preimages p^{-1}(x) is just the order of R, as is the sum of the orders of all the preimages q^{-1}(y), as y ranges over the points of Y. Thus we get the basic equality

 \sum_{x \in X} |p^{-1}(x)| = \sum_{y \in Y} |q^{-1}(y)|.

(Both sides are equal to |R|.) This gives another point of view on the basic principle of double counting discussed above.

This point of view, using the graph of the relation R and the maps p and q, can be applied to extend the idea of double counting to other contexts. For example, in algebraic geometry it gives rise to the technique of dimension counting via incidence varieties.

Example 7: fixed points and orbits of group actions

Here is an example from the theory of group actions. Suppose that the finite group G acts on the finite set X. There is a rather natural relation to consider on G \times X, related to the study of fixed points for the action. Namely, we define g \in G and x \in X to be related if g fixes x. More symbolically, we define a relation R via

 R = \{ (g,x) \in G \times X \, | \, g \cdot x = x\}.

Now as above we have the projections p: R \rightarrow G and q: R \rightarrow X. Let us see what we can deduce from the double counting formula applied in this context.

We begin by interpreting p^{-1}(g), for g \in G. This is just the set of fixed points of g acting on X, which we denote by X^g.

What about q^{-1}(x), for x \in X? This is precisely the stabilizer of x, i.e. the subgroup of G consisting of those elements which fix x, which we denote by G_x.

Thus double counting gives the identity

 \sum_{g \in G} | X^g| = \sum_{x \in X} | G_x|.

We can reinterpret this formula by recalling that if x and y lie in the same orbit of G, then G_x and G_y are conjugate. Indeed, if y = g\cdot x, then a simple calculation shows that G_y = g G_x g^{-1}. Thus we can regroup the second sum into a sum over a set of orbit representatives x_1,\ldots,x_n, and so rewrite it as

 \sum_{i=1}^n \sum_{y \in O(x_i)} |G_y| = \sum_{i = 1}^n |O(x_i)||G_{x_i}|  = \sum_{i = 1}^n |G| = n |G|.

(Here O(x_i) denotes the orbit of x_i, and we have used the orbit-stabilizer formula |O(x_i)||G_{x_i}|=|G|.) Altogether, we conclude that

 \sum_{g \in G} |X^g| = n|G|,

where n denotes the number of orbits of G acting on X. Dividing both sides by |G|, we find that

 \dfrac{1}{|G|}\sum_{g \in G} |X^g| = n,

or, in words:

The average number of fixed points of an element of G acting on X is equal to the number of orbits of G acting on X.

This useful relation between fixed points and orbits is known as Burnside's lemma.
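
As with the earlier examples, this is easy to verify computationally for a small action. The Python sketch below (a standard illustrative choice, not taken from the text) lets the cyclic group of rotations act on the 2-colourings of the vertices of an n-gon and checks that the average number of fixed colourings equals the number of orbits.

    from itertools import product

    n = 6  # an arbitrary small example
    # X: all 2-colourings of the vertices 0,...,n-1 of an n-gon.
    X = list(product((0, 1), repeat=n))

    def rotate(colouring, g):
        # The rotation by g sends vertex i to vertex i + g (mod n).
        return tuple(colouring[(i - g) % n] for i in range(n))

    # Left-hand side: the average number of fixed points over the group.
    fixed_counts = [sum(1 for x in X if rotate(x, g) == x) for g in range(n)]
    average_fixed = sum(fixed_counts) / n

    # Right-hand side: the number of orbits, computed directly.
    orbits = {frozenset(rotate(x, g) for g in range(n)) for x in X}

    assert average_fixed == len(orbits)
    print(average_fixed, len(orbits))  # 14.0 and 14 when n = 6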