Tricki
a repository of mathematical know-how

How to use Zorn's lemma

Quick description

If you are building a mathematical object in stages and find that (i) you have not finished even after infinitely many stages, and (ii) there seems to be nothing to stop you continuing to build, then Zorn's lemma may well be able to help you.

Prerequisites

Basic concepts of undergraduate mathematics, such as vector spaces.

Example 1

A function f from \mathbb{R} to \mathbb{R} is called additive if f(x+y)=f(x)+f(y) for every x,y\in\mathbb{R}. Clearly any function of the form f(x)=\lambda x is additive. Are there any other additive functions?

An easy induction shows that if f(1)=\lambda then f(n)=\lambda n for every positive integer n. (This is true with \lambda=f(1), since f(n+1)=f(n)+f(1) for every n.) It is also easy to prove that f(0)=0 (since e.g. f(0)=f(0)+f(0)). And from this it follows that f(n)+f(-n)=0, so f(-n)=-\lambda n, for every positive integer n. Another easy induction shows that f(mx)=mf(x) for every real number x and every positive integer m. Therefore, mf(x/m)=f(x), from which it follows that f(x/m)=f(x)/m for every real number x and every positive integer m. From these observations it follows that if f(1)=\lambda then f(x)=\lambda x for every rational number x.
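The final deduction can be written out as a short derivation. This is only a sketch: the names p (an integer) and q (a positive integer) are introduced here for illustration, and the facts used are the ones established in the paragraph above.

```latex
\begin{align*}
f\!\left(\frac{p}{q}\right)
  &= p\, f\!\left(\frac{1}{q}\right)
     && \text{since } f(mx)=mf(x) \text{ with } m=p,\ x=\tfrac{1}{q} \text{ (case } p>0\text{)} \\
  &= p\,\frac{f(1)}{q}
     && \text{since } f(x/m)=f(x)/m \text{ with } x=1,\ m=q \\
  &= \lambda\,\frac{p}{q}.
\end{align*}
% For p = 0 use f(0) = 0. For p < 0 use f(-x) = -f(x),
% which follows from f(x) + f(-x) = f(0) = 0.
```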

At this point it seems to be hard to deduce anything about other values of f(x). Indeed, there doesn't seem to be much obstacle to defining f(\sqrt 2) to be anything we like. If we set f(\sqrt 2) to be \mu, then an argument similar to the argument of the previous paragraph shows that f(x\sqrt 2)=\mu x for every rational number x, but no number of the form x\sqrt 2 is rational except when x=0, so this is not going to conflict with the choices we have already made. We will of course be forced to set f(x+y\sqrt 2) to be \lambda x+\mu y, but there is no problem in doing that.

Let us be slightly more explicit about why this isn't a problem: it is because if x+y\sqrt 2=x'+y'\sqrt 2 with x,y,x',y' rational, then x=x' and y=y'. (If y\ne y' we would find that \sqrt 2=(x'-x)/(y-y'), which is rational. So y=y', and then x=x' follows.) This will enable us to see more clearly what is going on later.
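The partially defined function can be imitated in a few lines of code. The following is a minimal sketch, not part of the original argument: the class name QSqrt2, the helper make_f and the particular values \lambda=1, \mu=0 are all assumptions made for illustration. A number x+y\sqrt 2 is stored as the pair (x,y) of rationals, which is legitimate precisely because that pair is unique.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class QSqrt2:
    """The real number x + y*sqrt(2), stored as the rational pair (x, y)."""
    x: Fraction
    y: Fraction

    def __add__(self, other: "QSqrt2") -> "QSqrt2":
        # Coordinates add separately; this is valid because the pair (x, y)
        # is uniquely determined by the number x + y*sqrt(2).
        return QSqrt2(self.x + other.x, self.y + other.y)


def make_f(lam: Fraction, mu: Fraction):
    """Return the partial function f(x + y*sqrt(2)) = lam*x + mu*y."""
    def f(t: QSqrt2) -> Fraction:
        return lam * t.x + mu * t.y
    return f


# Hypothetical choice of values: f(1) = 1 and f(sqrt(2)) = 0.
f = make_f(Fraction(1), Fraction(0))

a = QSqrt2(Fraction(1, 2), Fraction(3))
b = QSqrt2(Fraction(-2), Fraction(5, 7))
one = QSqrt2(Fraction(1), Fraction(0))
root2 = QSqrt2(Fraction(0), Fraction(1))

# Additivity on one sample pair (a spot check, not a proof).
additive_on_sample = f(a + b) == f(a) + f(b)
```

With these values f(1)=1 but f(\sqrt 2)=0, whereas a function of the form f(x)=\lambda x with f(1)=1 would have to send \sqrt 2 to \sqrt 2.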

The discussion so far strongly suggests that there should be a function that is additive but not of the form f(x)=\lambda x. We haven't yet defined one, since by no means every real number is of the form x+y\sqrt 2 with x and y rational. But we have produced a partially defined additive non-linear function, and the method we have used is rather flexible. Indeed, if we pick another number, such as \pi, say, where f is not yet defined, then we can extend the definition to all numbers of the form x+y\sqrt 2+z\pi with x,y,z\in\mathbb{Q} by setting f(x+y\sqrt 2+z\pi)=\lambda x+\mu y+\nu z for some arbitrarily chosen \nu.

More generally, we could construct a sequence of numbers t_1,t_2,\dots with t_1=1, with the property that no t_n is a rational linear combination of t_1,t_2,\dots,t_{n-1}. And then we could define f(t_i)=\lambda_i, for arbitrarily chosen \lambda_i, which would tell us that f(t_1x_1+\dots+t_nx_n)=\lambda_1x_1+\dots+\lambda_nx_n for every sequence x_1,\dots,x_n of rational numbers.

The trouble is, even when we have built an infinite sequence in this way, we still haven't defined f for all real numbers, since the set of rational linear combinations of those numbers is countable. However, we can still continue to build our function, since we can pick a new real number s_1 that is not a rational linear combination of the t_i and choose a value for f(s_1). And then we can choose s_2 that is not a rational linear combination of the t_i and s_1, and so on. But again we find that even if we produce an infinite sequence of s_i we have still defined f for only countably many real numbers.

A good way of regarding what we are doing is this: we are considering the real numbers as a vector space over the rationals, and we are trying to build a basis for this vector space, where this means a collection B of real numbers such that every real number is a rational linear combination of numbers in B in precisely one way. Then if we define the values of f however we like for the numbers in B and define the values of f in the obvious way for rational linear combinations of those numbers, we have a function from \mathbb{R} to \mathbb{R} that is linear over the rationals, and hence additive, but not necessarily of the form f(x)=\lambda x.
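Assuming such a basis B exists, the "obvious way" of extending f can be written as a formula. The notation below is introduced only for this sketch: q_b(t) denotes the coefficient of b in the unique expansion of t, and \lambda_b denotes the value chosen for f(b).

```latex
\[
t=\sum_{b\in B} q_b(t)\,b,
\qquad q_b(t)\in\mathbb{Q},\quad q_b(t)=0 \text{ for all but finitely many } b\in B,
\]
\[
f(t)=\sum_{b\in B} q_b(t)\,\lambda_b .
\]
% Additivity: uniqueness of the expansion gives q_b(s+t)=q_b(s)+q_b(t), hence
\[
f(s+t)=\sum_{b\in B}\bigl(q_b(s)+q_b(t)\bigr)\lambda_b=f(s)+f(t).
\]
% Non-linearity: f(x)=\lambda x for all x would force \lambda_b=\lambda b
% for every b in B, so it suffices to choose \lambda_b/b \neq \lambda_{b'}/b'
% for two elements b, b' of B.
```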

Now \mathbb{R}, considered as a vector space over \mathbb{Q}, is certainly infinite-dimensional. In fact, it has uncountable dimension. So does it have a basis? (All we mean by "has uncountable dimension" is "cannot be spanned by countably many vectors," so it is not true by definition that it has a basis.) Inspired by finite-dimensional vector spaces it is tempting to say, "Pick a maximal linearly independent set." Such a set is not just linearly independent but also spans the whole space: if it didn't, we could pick an element outside its linear span and add it to the set, obtaining a larger linearly independent set and contradicting maximality.

So now we seem to be done: we are looking for a basis of \mathbb{R} over \mathbb{Q}, and all we need to do to find a basis of any vector space is take a maximal linearly independent set.

But why should a maximal linearly independent set exist? Isn't that exactly the difficulty we were facing earlier: we could carry on picking more and more rationally independent real numbers but we never seemed to reach the point where we could no longer continue?

Let us now interrupt this example for a more general discussion of Zorn's lemma and how to use it.

General discussion

We are now in a very typical situation where Zorn's lemma can be applied. We would like to build a maximal object, and we feel as though we ought to be able to, because any non-maximal object can easily be extended. The usual statement of Zorn's lemma is as follows. A partially ordered set is a set X together with an ordering \leq of the elements of X that is reflexive, transitive and antisymmetric (this last condition means that if x\leq y and y\leq x then x=y). A typical example is where X is a collection of sets and x\leq y if and only if x\subset y. (Here I use the symbol " \subset " to mean "is a subset of" and not "is a proper subset of".) A chain in a partially ordered set X is a totally ordered subset of X: that is, a subset Y such that if y,z\in Y then either y\leq z or z\leq y. An upper bound for a subset Y of a partially ordered set X is an element u\in X such that y\leq u for every y\in Y. And a maximal element in a partially ordered set X is an element x_0 such that the only element x\in X with x_0\leq x is x_0 itself.

Zorn's lemma states that if X is a partially ordered set such that every chain in X has an upper bound, then X has a maximal element. (Note that a maximal element does not have to be bigger than everything else: it just mustn't be smaller than anything else.)
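For reference, the statement can be compressed into a single displayed formula. This is just a symbolic restatement of the definitions above, for a partially ordered set (X,\leq).

```latex
\[
\Bigl(\forall\, Y\subseteq X \text{ a chain}\ \ \exists\, u\in X\ \ \forall\, y\in Y:\ y\leq u\Bigr)
\;\Longrightarrow\;
\Bigl(\exists\, x_0\in X\ \ \forall\, x\in X:\ x_0\leq x \Rightarrow x=x_0\Bigr).
\]
% The empty set counts as a chain, so the hypothesis already forces X to be non-empty.
```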

In order to see how this rather abstract-looking statement relates to the kind of problem we had earlier, let us imagine that we have a partially ordered set X and are looking for a maximal element. We could try to build one as follows. We start with an element x_1. If it isn't maximal then we find a bigger element x_2. If that isn't maximal we find a bigger element x_3, and so on. That gives us an increasing sequence x_1,x_2,\dots. Now we seem to be stuck, and indeed sometimes we are stuck. For example, if X is the set of natural numbers with their usual ordering then we might have created the sequence 1,2,3,\dots, which would not help us to find a maximal element — not surprisingly, since \mathbb{N} doesn't have a maximal element.

However, \mathbb{N} does not satisfy the hypothesis of Zorn's lemma, because the sequence 1,2,3,\dots is a chain with no upper bound. If X does satisfy this hypothesis then the sequence x_1,x_2,x_3,\dots, which is also a chain, has an upper bound, which we could call x_\omega. If this is not maximal, then we can find a larger element x_{\omega+1}. If that is not maximal, then we can find a yet larger element x_{\omega+2}, and so on.
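In a finite partially ordered set the climbing process described above stops after finitely many steps, and it can be instructive to watch it do so. The sketch below uses a toy poset chosen purely for illustration (it is not from the discussion above): the subsets of \{1,2,3,4\} that contain no two consecutive integers, ordered by inclusion. The function names is_member and climb are likewise hypothetical.

```python
from itertools import combinations


def is_member(s: frozenset) -> bool:
    """s belongs to the toy poset if it contains no two consecutive integers."""
    return all(b - a != 1 for a, b in combinations(sorted(s), 2))


def climb(start: frozenset, universe: range) -> frozenset:
    """Repeatedly replace the current element by a strictly bigger one."""
    current = start
    while True:
        # All members obtained by adding one new point to the current set.
        bigger = [current | {e} for e in universe
                  if e not in current and is_member(current | {e})]
        if not bigger:
            # Every subset of a member is a member, so if no one-point
            # extension works then no strict superset works: current is maximal.
            return current
        current = bigger[0]


universe = range(1, 5)
result = climb(frozenset(), universe)
```

Starting from the empty set, this climb ends at \{1,3\}. That set is maximal, but it is not bigger than everything else: for instance it does not contain the member \{2,4\}.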

Example 1, continued

Note the similarity between the position we are now in and the position we were in when we were trying to create a basis for \mathbb{R} over \mathbb{Q}. Once again, we can continue to create larger and larger objects, but there seems to be no easy way of saying that the process eventually ends. Or rather, there is an easy way: Zorn's lemma tells us that it ends.

Let us see how Zorn's lemma applies in our example. The objects we were looking at were subsets of \mathbb{R} that were linearly independent over \mathbb{Q}. We noted that a maximal linearly independent subset of \mathbb{R} spanned the whole of \mathbb{R}, where by "maximal" we meant that the set was not contained in any larger linearly independent set. Thus, we were looking at the set of all linearly independent subsets of \mathbb{R}, with the partial order \subset.

All we have to do if we want to apply Zorn's lemma is check that every chain has an upper bound. So let us imagine that we have a collection Y of linearly independent subsets of \mathbb{R} and that for any two of those sets one is contained in the other. What could serve as an upper bound? By definition it has to be a set that contains all the sets in Y, so it has to contain their union. We want it to be linearly independent, so the smaller it is, the better. So there is basically only one candidate to try: the union itself. Is the union linearly independent? Well, if t_1,\dots,t_n belong to the union, then each t_i belongs to some linearly independent set L_i\in Y. Because Y is a chain, one of these sets L_i contains all the others. If that is L_j, then the linear independence of L_j implies that no non-trivial linear combination of t_1,\dots,t_n can be zero, which proves that the union of the sets in Y is linearly independent, just as we wanted. Therefore, by Zorn's lemma, there is a maximal linearly independent set. Earlier we observed that such a set was a basis and could be used to create additive functions not of the form f(x)=\lambda x, so our problem is now solved.

Further general remarks

How, one might ask, is Zorn's lemma itself proved? One answer is that it cannot be proved: it is just an axiom. But a slightly more informative answer is that it is equivalent to the axiom of choice and the well-ordering principle. A hint of why this should be can be found in the attempted proof above. There we created an infinite sequence x_1,x_2,x_3,\dots, which we then continued "transfinitely" with the elements x_\omega,x_{\omega+1},x_{\omega+2},\dots. This transfinite process can continue until a maximal element is reached, but to do so one needs to make infinitely many choices (since there isn't a way of defining the next element of the sequence). Thus, the axiom of choice comes into play. If we knew in advance that X could be well-ordered (that is, given a total ordering such that every non-empty subset had a minimal element), then we could build up the sequence by always taking the minimal element that worked. And it's an easy exercise to use Zorn's lemma to prove that every set has a well-ordering.

One should add that the above sketch of how to use the axiom of choice to prove Zorn's lemma may make the deduction look easier than it really is. In order to justify rigorously that the transfinite induction cannot continue for ever, one must prove a result known as Hartogs' lemma, which states that for every set X there is a well-ordered set (indeed, an ordinal) that cannot be injected into X. And Hartogs' lemma is not a triviality.

Example 2

We asserted above that it was easy to deduce from Zorn's lemma that every set has a well-ordering. Let us justify this claim, since it gives another representative application of Zorn's lemma. Suppose, then, that we have a set X and we would like to give it a well-ordering. That is, we would like to define a total ordering on the elements of X such that every non-empty subset Y of X has a minimal element.

Once again, we find ourselves in a situation where there are no obvious constraints to building the object we require, other than the "transfinite length of time" needed to complete the process. We just pick an arbitrary element x_1 to be the minimal element of X itself, then an arbitrary element x_2 to be the minimal element of X\setminus\{x_1\}, and so on. In other words, at each stage we would like to choose an element not yet chosen and declare it to be the next element in our ordering.

To convert that rough idea into a Zorn's-lemma argument, we need to define a partial ordering on the set of "incomplete attempts" at defining a well-ordering on X. An incomplete attempt means a subset Y of X and a well-ordering of that subset. Let us define an attempt to be precisely that: a subset Y of X with a well-ordering of the elements of Y. The partial ordering on the set of all attempts should reflect the idea of one attempt extending another, so the obvious partial ordering to take is as follows: given two attempts Y and Z, we say that Y\leq Z if Y is an initial segment of Z. This means that Y is a subset of Z, that the ordering associated with Y is the same as the ordering associated with Z when you restrict it to Y, and that every element of Y is less than every element of Z\setminus Y in the ordering on Z.
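For finite attempts the initial-segment relation is easy to experiment with. In the sketch below, which is an illustration under stated assumptions rather than part of the proof, an attempt on a finite set is represented as a tuple listing its elements in increasing order, so that Y\leq Z becomes "Y is a prefix of Z". The function name is_initial_segment and the sample data are hypothetical.

```python
def is_initial_segment(y: tuple, z: tuple) -> bool:
    """Y <= Z for finite attempts, each listed in increasing order."""
    # If y is longer than z, the slice is all of z and the comparison fails.
    return z[:len(y)] == y


z = ("a", "b", "c")

# Same ordering on the common part, and everything new comes afterwards.
prefix_ok = is_initial_segment(("a", "b"), z)

# The orderings on {a, b} disagree.
wrong_order = is_initial_segment(("b", "a"), z)

# The orderings agree on {a, c}, but b lies in Z \ Y and is below c.
not_initial = is_initial_segment(("a", "c"), z)

# For a finite chain of attempts, the union is simply the longest attempt.
chain = [("a",), ("a", "b"), ("a", "b", "c")]
upper_bound = max(chain, key=len)
```

The third case shows why the last condition in the definition is needed: being a subset with a compatible ordering is not enough for one attempt to count as an earlier stage of another.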

Does this partially ordered set satisfy the chain condition? Well, if we have a chain of attempts, then we can define an upper bound to be the union U of the sets in that chain, with the following ordering: u\leq u' if there is some attempt Y in the chain such that u and u' both belong to Y and u\leq u' in Y. This ordering is well-defined, because if Z is another attempt that contains u and u', then either Y\leq Z or Z\leq Y (since Y and Z both belong to the chain), and the definition of \leq guarantees that the orderings on Y and Z are consistent.

Is the ordering on U a well-ordering? Yes, since if V is a non-empty subset of U, then there must be an attempt Y in the chain such that V\cap Y is non-empty. But then V\cap Y has a minimal element v in Y. This v must be minimal in V with respect to the ordering on U as well. To see why, suppose that v'\in V and v'\leq v in U. Then there must exist an attempt W in the chain with v'\in W. If v'\notin Y then Y does not contain W, so we must have Y\leq W. But this is a contradiction, since Y must then be an initial segment of W, and we have v'\leq v with v'\in W\setminus Y and v\in Y. Therefore, v'\in Y, so v'\in V\cap Y, and since v is minimal in V\cap Y we must have v'=v.

We have now shown that U is well-ordered, and thereby verified the chain condition. (Note that verifying the chain condition was fairly straightforward: this is true of many applications of Zorn's lemma.) Therefore, by Zorn's lemma, there is a maximal attempt, which we hope will be a well-ordering of the whole of X.

If it is not the whole of X, then it is a well-ordering of some proper subset Y of X. But we can easily extend this attempt: let z be any element of X\setminus Y and define Z to be Y\cup\{z\}, ordering Z by taking the ordering we already have on Y and stipulating in addition that y\leq z for every y\in Y. (Of course, we also stipulate that z\leq z.) This produces a larger attempt, contradicting the maximality of Y. This contradiction completes the proof that (Zorn's lemma implies that) every set can be well-ordered.

Further general remarks

Note that the final step of the last argument, the proof that every maximal attempt must be a well-ordering of the whole of X, corresponds to the informal observation made earlier that we can keep on and on extending a well-ordering, while the verification of the chain condition corresponds to the fact that if we produce an infinite sequence of attempts then we can take their union and carry on. Again, this is typical of Zorn's-lemma arguments.

So how, in general, does one recognise the need for Zorn's lemma and how does one construct an appropriate partially ordered set in order to apply it? The clues are in the two examples above. Typically, one is trying to build a structure of some kind (such as a basis for a vector space, or a well-ordering of a set). The natural way to do it appears to be to build the structure up in stages, but there are too many stages for this to work straightforwardly. However, once one has an idea of what a stage is and what the building-up process is, one can wheel out Zorn's lemma to finish the job. The partially ordered set will consist of all objects that might conceivably be stages in the construction, and one of these objects will be smaller than another if it might conceivably come before the other in the building-up process. If the resulting partial order satisfies the chain condition and if a maximal element must be a structure of the kind one is trying to build, then the proof is complete.

Comments

The following seems like a natural addition to this article: is every finitely additive probability measure on the integers countably additive? One could answer this by using Zorn's lemma to prove the ultrafilter lemma, which shows that the answer is no.

Inline comments

The following comments were made inline in the article.

Problem with argument

If V is any non-empty subset of U, does it follow that it must be an initial segment of some set W in the chain? This does not seem necessary. Does any non-empty subset of U have to contain the minimal element?

There is no assumption here that V is an initial segment, but Y is an initial segment, and if Y\cap V is non-empty then Y must contain the minimal element.

Identified a typo

"This must be minimal in V as well " instead of "This must be minimal in U as well".

The claim here is that the element v is a minimal element of V not just with respect to the ordering on Y but also with respect to the ordering on U.

How about the theorem that every proper ideal is contained in a maximal ideal in a ring with identity? That seems to be in the same spirit.
Or the existence of minimal prime ideals.