Dimension arguments in combinatorics

Quick description

If you want to find the maximum of a combinatorial quantity, and if the maximum seems to be very far from unique, then try to relate the quantity to the dimension of a certain vector space.


Prerequisites

Basic linear algebra over finite fields. The notions of set, subset, element.

Example 1 (with general discussion)

Let n be an even integer. How many subsets of the set \{1,2,\dots,n\} can you pick if they all have to have odd size but the intersection of any two of them has to have even size?

Let us try to attack this problem directly, and to begin with let us just try to find the biggest example we can in the greediest way we can. And to save space let us use notation such as 134 for the set with elements 1, 3 and 4 rather than proper notation such as \{1,3,4\}. The obvious set to begin with is 1. Since the next set has to have even-sized intersection with 1 and odd size, the obvious choice is 2. Continuing, the greedy algorithm gives us the n sets 1, 2, ..., n, and we then find that we cannot choose any more sets (since any other set of odd size will have an odd intersection with any singleton formed from one of its elements).

Can we do better than this? Let's experiment by starting with a different set and then continuing greedily. If we start with 123, then the next set has to have odd size and even-sized intersection with 123, so the two most obvious choices are 124 and 4. If we go for 124, then the obvious continuation is 125, 126, ..., 12n. Can we choose any more sets? Yes we can: the set 134567...n, which a moment's thought will reveal to have odd size and an even intersection with all the other sets. And then we notice that we can also choose the set 234567...n, at which point it is not hard to prove that we cannot choose any more. (Proof: any set that contains 12 cannot contain any of 3,4,5,...,n, so must be 12, which has odd intersection with 134567...n. Any set that contains 13 must contain all of 4,5,6,...,n because if it doesn't contain m then it has odd intersection with 12m. And any set that contains 23 must contain all of 4,5,6,...,n for a similar reason.) Once again, we have ended up with n sets.
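These experiments are easy to mechanise. Here is a quick Python sanity check (the helper name valid_family is ours, not part of the problem) that verifies both greedy families above for n=8: every set has odd size and every pairwise intersection has even size.

```python
from itertools import combinations

def valid_family(sets):
    """All sets have odd size; all pairwise intersections have even size."""
    return (all(len(s) % 2 == 1 for s in sets)
            and all(len(a & b) % 2 == 0 for a, b in combinations(sets, 2)))

n = 8  # any even n; this value is just for illustration
singletons = [{i} for i in range(1, n + 1)]

# The second greedy family: 123, 124, ..., 12n, then 1345...n and 2345...n.
second = [{1, 2, 3}] + [{1, 2, k} for k in range(4, n + 1)]
second += [{1} | set(range(3, n + 1)), {2} | set(range(3, n + 1))]

print(valid_family(singletons), len(singletons))  # True 8
print(valid_family(second), len(second))          # True 8
```

Starting from other sets and continuing greedily, the same check makes it easy to confirm that one always seems to get stuck at exactly n sets.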

If you continue experimenting in this way, you will find the same thing every time: if you have chosen fewer than n sets then you can choose more, but once you get to n sets then you get stuck. But there seems to be a great deal of freedom about how you choose the sets. This contrasts with many problems in extremal combinatorics, where the best possible examples are often unique, or unique up to some obvious symmetry. (A good example of the latter: how many subsets of \{1,2,\dots,n\} can you choose if none of your subsets is contained in any other? The unique best possible collection of sets is all sets of size n/2.)

Are there any other circumstances where you have a collection of objects, and a condition on pairs (x,y) of those objects, such that (i) whenever you have chosen objects x_1,x_2,\dots,x_m with m<n there are many different ways of choosing y such that the condition holds for each pair (x_i,y), and (ii) it is impossible to choose more than n of them if any two have to satisfy the condition? Yes there are: a common one is when the objects are vectors in \mathbb{R}^n and the condition on a pair (x,y) is that x and y should be orthogonal. Since orthogonality implies independence, we cannot choose more than n non-zero vectors that are all orthogonal to each other.

Can we relate these two situations? To do so we would need to associate vectors with our subsets of \{1,2,\dots,n\} and interpret the even-intersections condition as a kind of orthogonality. The obvious way to associate a vector with a subset A of \{1,2,\dots,n\} is to take its characteristic function, that is, the function f that takes the value 1 for every m that belongs to A and 0 for every m that does not belong to A. Now the size of the intersection of two sets is equal to the standard scalar product of their characteristic functions (a fact that has many uses). Therefore, if two sets have even intersection, then their scalar product is an even integer.
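The fact that the scalar product of two characteristic functions equals the size of the intersection is easy to see in code (a small sketch; the helper char_vec is ours):

```python
def char_vec(A, n):
    """Characteristic vector (as a 0/1 list) of a subset A of {1,...,n}."""
    return [1 if i in A else 0 for i in range(1, n + 1)]

A, B, n = {1, 3, 4}, {3, 4, 5, 6}, 6
# Each coordinate of the product is 1 exactly when the element lies in both sets.
dot = sum(a * b for a, b in zip(char_vec(A, n), char_vec(B, n)))
print(dot, len(A & B))  # 2 2
```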

But we wanted orthogonality. That is, we wanted a scalar product of zero. Can we think of an even integer as being "zero" somehow? Yes we can, if we look at it mod 2. So we are led to interpret the characteristic functions of the sets as elements of the set \{0,1\}^n, and to think of that set as an n-dimensional vector space over the field \mathbb{F}_2.

Of course, we must check that orthogonality still implies independence in this context. The usual proof is this: if \sum_i c_iv_i=0, then the scalar product of the left-hand side with any v_j must be 0; but the orthogonality of the v_i implies that this scalar product is c_j\langle v_j,v_j\rangle; since v_j is a non-zero vector, its scalar product with itself is non-zero, so c_j must be zero; since j was arbitrary we are done. In the mod-2 context the only part of that proof that fails is the assertion that if v_j is a non-zero vector, then \langle v_j,v_j\rangle is non-zero: it is equal to the number of 1s in v_j, reduced mod 2. But if v_j is the characteristic function of a set of odd size, then this will be 1, so the proof of linear independence works in this case.

A quick summary of the above proof: you can choose at most n sets of odd size with even intersections because the characteristic functions of such sets must be linearly independent in the vector space \mathbb{F}_2^n.
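This linear independence can be checked mechanically. The sketch below (helper names ours) encodes characteristic vectors as integer bitmasks and computes their rank over \mathbb{F}_2 by Gaussian elimination; for the second greedy family above with n=6, the rank comes out as n, confirming independence.

```python
def f2_rank(vectors):
    """Rank over F_2 of 0/1 vectors encoded as integer bitmasks."""
    basis = {}  # highest set bit -> basis vector with that pivot
    for v in vectors:
        while v:
            pivot = v.bit_length() - 1
            if pivot not in basis:
                basis[pivot] = v
                break
            v ^= basis[pivot]  # eliminate the pivot bit and keep reducing
    return len(basis)

def mask(A, n):
    """Characteristic vector of A as a bitmask (element i -> bit i-1)."""
    return sum(1 << (i - 1) for i in A)

n = 6
family = ([{1, 2, 3}] + [{1, 2, k} for k in range(4, n + 1)]
          + [{1} | set(range(3, n + 1)), {2} | set(range(3, n + 1))])
vecs = [mask(A, n) for A in family]
print(f2_rank(vecs))  # 6: all six characteristic vectors are independent
```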

Further general discussion

How can we recognise extremal problems where linear algebra is likely to be useful? One answer was touched on above. It is that such problems are often characterized by the presence of a wide variety of best possible examples. The reason this suggests linear algebra as a possible tool is that the problem, "How many linearly independent vectors can you find in \mathbb{F}^n?" has a very similar quality: you can't find more than n, but there are many ways of finding n. Of course, if your problem has that quality, there is no guarantee that linear algebra can be used, and even when it can, one often needs much more ingenuity to recast it in terms of vectors in a vector space than was needed above.

Example 2

An obvious follow-up problem to the first is this. How many sets can you pick if their sizes have to be even and their intersections also even?

Note that the approach above now fails, since if A has even size and if v is its characteristic function, then \langle v,v\rangle=0. Furthermore, a moment's thought reveals that much bigger collections of sets are possible. For instance, take the collection of all possible unions of the sets 12, 34, 56, ..., (n-1)n. There are 2^{n/2} such unions, they all have even size, and the intersection of any two of them is even. That collection of sets turns out to be merely the simplest and most natural of a wide variety of examples of collections of size 2^{n/2} with the desired property. Combining that observation with the fact that we can express the even-intersections property as a condition about scalar products mod 2, it seems likely that linear algebra will help us.
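Here is a short Python sketch verifying that this collection of unions has the claimed properties for n=6 (the construction itself is from the text; only the helper names are ours):

```python
from itertools import combinations

n = 6
blocks = [{2 * i + 1, 2 * i + 2} for i in range(n // 2)]  # 12, 34, 56

# All unions of subfamilies of the blocks: 2^(n/2) sets, including the empty union.
family = []
for r in range(len(blocks) + 1):
    for chosen in combinations(blocks, r):
        family.append(set().union(*chosen))

print(len(family))  # 2**(n//2) = 8
print(all(len(A) % 2 == 0 for A in family))                          # True
print(all(len(A & B) % 2 == 0 for A, B in combinations(family, 2)))  # True
```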

There are two approaches we could try: one would be to find some 2^{n/2}-dimensional space V and find some way of converting any system of m sets that satisfy the property we want into a collection of m linearly independent vectors in V. The other is to stick with our previous way of turning sets into vectors and see where it gets us. Since it is not at all obvious what we would choose as V in the first approach, let us try the second.

Thus, we are trying to choose as many vectors in \mathbb{F}_2^n as we can, with the property that every v that we choose has an even number of 1s, and \langle v,w\rangle=0 for any v and w that we choose. Since the parity of the number of 1s in v is \langle v,v\rangle, we can say this quite neatly: we want a collection Z of vectors with the property that \langle v,w\rangle=0 for any two vectors v,w\in Z (not necessarily distinct).

Imagine that we are choosing a collection of such vectors in a greedy way: we just keep going until we cannot go any further. If we have chosen vectors v_1,v_2,\dots,v_m, then any vector v we add to the collection will have to have the property that \langle v_i,v\rangle=0 for i=1,2,\dots,m. Now the condition \langle v_i,v\rangle=0 is a linear condition on v, so we are placing m linear conditions on all subsequent vectors. It might seem as though they must therefore all live in an (n-m)-dimensional subspace of \mathbb{F}_2^n, but of course that would only be the case if the linear conditions were independent, and there is no reason for them to be. But this does give us a huge clue about how to proceed with the proof: if we are trying to keep going for as long as possible, then we should try to make the conditions we impose as dependent as possible, which comes to the same thing as saying that we should try to cram the vectors v_1,v_2,\dots,v_m into as small a subspace of \mathbb{F}_2^n as we can.

From here the proof more or less writes itself. If we have chosen more than 2^{n/2} vectors, then at least n/2+1 of them must be linearly independent (since an n/2-dimensional subspace of \mathbb{F}_2^n has size 2^{n/2}). These independent vectors impose n/2+1 independent linear conditions, so all vectors in the collection are confined to a subspace of dimension at most n-(n/2+1)=n/2-1. But that is an obvious contradiction, since n/2+1 linearly independent vectors cannot lie in a subspace of dimension n/2-1. So the simple example we started with is indeed best possible.
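For very small n one can also confirm the bound 2^{n/2} by brute force. The sketch below (with n=4; the helper names are ours) tries every subfamily of the even-sized subsets of \{1,2,3,4\} and records the largest one with all pairwise intersections even; the maximum is 2^{n/2}=4, attained for instance by all unions of 12 and 34, including the empty union.

```python
from itertools import combinations

n = 4
universe = list(range(1, n + 1))
evens = []
for r in range(0, n + 1, 2):  # subsets of even size, including the empty set
    evens += [frozenset(c) for c in combinations(universe, r)]

def all_even_intersections(family):
    return all(len(a & b) % 2 == 0 for a, b in combinations(family, 2))

best = 0
for bits in range(1 << len(evens)):  # every subfamily of even-sized sets
    fam = [s for i, s in enumerate(evens) if bits >> i & 1]
    if all_even_intersections(fam):
        best = max(best, len(fam))

print(best)  # 2**(n//2) = 4
```

This is only feasible for tiny n (there are 2^8 subfamilies here), which is exactly why the dimension argument is needed in general.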

Example 3

To give a glimpse of how much more sophisticated the linear-algebra method can be, here is a beautiful proof, due to Peter Frankl and Janos Pach, of the famous Sauer-Shelah lemma. This is a solution to the following problem. Let \mathcal{F} be a collection of subsets of \{1,2,\dots,n\}. Given any subset Y\subset\{1,2,\dots,n\}, define the projection \mathcal{F}(Y) of \mathcal{F} onto Y to be the collection of all sets Y\cap F with F\in\mathcal{F}. How large can \mathcal{F} be if there is no set Y of size k for which \mathcal{F}(Y) is the set of all subsets of Y? (Many authors say that Y is shattered by \mathcal{F} if \mathcal{F}(Y) is the set of all subsets of Y.)

A moment's thought leads to an obvious maximal example (by which I mean an example that cannot be extended rather than an example of maximal size, though it will eventually turn out to be of maximal size too). It is to take \mathcal{F} to be the set of all subsets of \{1,2,\dots,n\} of size less than k.
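As a sanity check, here is a Python sketch (the helper shatters is ours) confirming that for n=6 and k=3 this family has 1+\binom 61+\binom 62=22 sets and shatters no set of size 3:

```python
from itertools import combinations

def shatters(family, Y):
    """Does the projection of family onto Y contain every subset of Y?"""
    projections = {frozenset(F) & frozenset(Y) for F in family}
    return len(projections) == 2 ** len(Y)

n, k = 6, 3
universe = range(1, n + 1)
# All subsets of {1,...,n} of size less than k.
family = [set(c) for r in range(k) for c in combinations(universe, r)]

print(len(family))  # 1 + C(6,1) + C(6,2) = 22
print(any(shatters(family, Y) for Y in combinations(universe, k)))  # False
```

No 3-set Y can be shattered here, since the projection Y itself would require some F\supset Y of size at least 3, and no such F is in the family.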

Imagine now that we have experimented a bit and been unable to find any larger examples. Given the naturalness of that example, we begin to suspect that it may be of maximal size. If we have experimented sufficiently diligently, we will also see that if our hunch is correct, then there are many examples of maximal size. So once again we wonder about using linear algebra. The most obvious approach would be to associate with each set in \mathcal{F} a vector in some space of dimension 1+\binom n 1+\dots+\binom n{k-1} in such a way that the condition we are requiring \mathcal{F} to satisfy implies that all those vectors are linearly independent.

But how do we associate 1+\binom n 1+\dots+\binom n{k-1} coordinates with a subset of \{1,2,\dots,n\}? The least unnatural way would seem to be to relate these coordinates in some way to the sets of size at most k-1. With this thought in mind, the following idea is quite a natural one: let \mathcal{Z} be the set of all subsets of \{1,2,\dots,n\} of size at most k-1, and associate with any subset F of \{1,2,\dots,n\} the function \phi_F:\mathcal{Z}\rightarrow\mathbb{R} that takes a set Z\in\mathcal{Z} to 1 if Z\subset F and 0 otherwise.

You may have wondered why I described this as a function to \mathbb{R} rather than to \{0,1\}. The reason was that it turns out to be best to regard the functions \phi_F as belonging to a vector space over \mathbb{R} rather than over \mathbb{F}_2. (This is often true, so this example is a useful illustration.) Indeed, it turns out that if \mathcal{F} has no full projections onto sets of size k, then the functions \phi_F are linearly independent.

Let us see how to prove this. To do this, we take a linear dependence \sum\lambda_F\phi_F=0 and try to prove that every \lambda_F is zero. The information we have is that for no set Y of size k does \mathcal{F}(Y) contain all subsets of Y. An obvious way to try to use this information is to assume that not all the \lambda_F are zero, to define some set Y of size k in a natural way, and to prove that every subset of Y is equal to F\cap Y for some F\in\mathcal{F}.

Let us consider what the statement \sum\lambda_F\phi_F=0 actually says. If Z is a set of size at most k-1, then \sum\lambda_F\phi_F(Z) can be rewritten as \sum_{F\in\mathcal{F}, Z\subset F}\lambda_F. So we know that this quantity is 0 for every set Z of size at most k-1. So we have here a function that vanishes for every set of size at most k-1, and we are looking for a set of size at least k. This suggests that we should look for a set Y such that \sum_{F\in\mathcal{F}, Y\subset F}\lambda_F does not vanish, since such a set is guaranteed to have size at least k. We could then hope that every subset of this set Y was of the form F\cap Y for some F\in\mathcal{F}.

Must there exist a set Y such that the expression in question does not vanish? Clearly yes: we can just take Y to be a maximal F\in\mathcal{F} such that \lambda_F\ne 0. However, it is too much to hope that every set Y with this property will work (and easy to find examples of sets Y that do not work). If we want to make it as likely as possible that every subset of Y is of the form F\cap Y then obviously we would like Y to be as small as possible. So let us choose a minimal Y such that \sum_{F\in\mathcal{F}, Y\subset F}\lambda_F\ne 0.

Let Z\subset Y. We would like to prove that there exists F\in\mathcal{F} such that F\cap Y=Z. And a natural way of doing that, given what we have so far, would be to prove that \sum_{F\in\mathcal{F}, F\cap Y=Z}\lambda_F\ne 0. And a natural way of doing that, if we want to exploit the minimality of Y, is to try to express this sum as a combination of sums of the form \sigma(W)=\sum_{F\in\mathcal{F}, W\subset F}\lambda_F with W\subset Y.

This can indeed be done. One can check (and this is basically the inclusion-exclusion formula) that \sum_{F\in\mathcal{F}, F\cap Y=Z}\lambda_F=\sum_{Z\subset W\subset Y}(-1)^{|W-Z|}\sigma(W). By the minimality of Y, the sum on the right-hand side is equal to (-1)^{|Y-Z|}\sigma(Y), which is non-zero, and we are done.
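The inclusion-exclusion identity is easy to verify numerically for random coefficients \lambda_F. Here is a sketch over all subsets of a 5-element universe, with an arbitrary choice of Y and Z\subset Y (all names are ours, introduced for illustration):

```python
from itertools import combinations
import random

random.seed(0)
n = 5
universe = list(range(1, n + 1))
all_sets = [frozenset(c) for r in range(n + 1) for c in combinations(universe, r)]
lam = {F: random.randint(-3, 3) for F in all_sets}  # arbitrary coefficients

Y = frozenset({1, 2, 4})
Z = frozenset({1})

def sigma(W):
    """sigma(W) = sum of lam[F] over all F containing W."""
    return sum(lam[F] for F in all_sets if W <= F)

# Left-hand side: sum of lam[F] over all F with F intersect Y equal to Z.
lhs = sum(lam[F] for F in all_sets if F & Y == Z)
# Right-hand side: alternating sum of sigma(W) over Z subset W subset Y.
between = [frozenset(Z | set(c)) for r in range(len(Y - Z) + 1)
           for c in combinations(Y - Z, r)]
rhs = sum((-1) ** len(W - Z) * sigma(W) for W in between)
print(lhs == rhs)  # True
```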

Recommended further reading

This singleton list needs to be extended, but as a start, one could look at Part III of ``Extremal Combinatorics: with Applications in Computer Science'' by Stasys Jukna.

Incomplete: This article is incomplete. The expectation is that, like Wikipedia articles, Tricki articles will be edited many times by people other than the original author. This one is an obvious candidate for further work: it would be good to have more examples and a more coherent account of how to use the method itself. (For instance, is there a more systematic way of converting problems into linear algebra, is there a way of telling which scalars to use, etc.?)


Frankl-Wilson Theorem

Here is a blog discussion of a dimension argument for this theorem. The linear algebra proof of the Erdős-de Bruijn theorem (a great-grandfather of the Frankl-Wilson theorem) is described, among other things, in this post.

Inline comments

The following comments were made inline in the article.

Shatter function lemma

I would suggest using the less eponymous term "Shatter function lemma" (that Matousek uses in his Lectures on Discrete Geometry) in addition to "Sauer-Shelah lemma". I personally have known this lemma for a long time, but always under the more descriptive name "Shatter function lemma".