If you have not seen a proof that yields a bound of the form $e^{c\sqrt{\log n}}$ for some (possibly negative) constant $c$, then it may seem surprising that this function often arises naturally. This article explains why, and gives an example. More examples would greatly improve the article.
The answer to why such a bound ever occurs is simple, and similar answers apply to many functions. It often happens that one has an idea for how to obtain a bound, but the proof depends on a parameter, such as the size of some set, or the dimension of some space, that is needed in the proof. It is not obvious in advance what value of this parameter will best suit your purposes, so you keep it as a variable, $m$, say. Then at the end of the proof, you find that the quantity that you are trying to maximize can be at least as big as $\min\{f(m),g(m)\}$, where $f$ increases with $m$, $g$ decreases with $m$, and $n$ is the main parameter that specifies the size of the problem. The resulting bound can easily be quite a bit more exotic than the functions $f$ and $g$ that are used to define it. For example, to obtain the bound $e^{c\sqrt{\log n}}$ one can take $f(m)=e^{cm}$ and $g(m)=e^{c\log n/m}$. Taking logs, one sees that these two are equal when $m=\log n/m$, so $m^2=\log n$, so $m=\sqrt{\log n}$.
In a moment we shall see an example of a proof where a bound of this kind arises, but note that we can already guess something important about it. Almost certainly it will involve a construction where one has to balance two things that can go wrong, and one chooses a certain parameter optimally at the end. And we can guess all this before we even know what the statement is that will be proved! Such is the power of metamathematics.
Actually, that was an exaggeration. It is possible to conceive of other ways in which a bound of $e^{c\sqrt{\log n}}$ might arise. For example, suppose that $n$ counted the number of points in some vector space $V$ over $\mathbb{F}_2$ and $m$ counted the number of points in a subspace $W$. If the optimal dimension of $W$ was the square root of the dimension of $V$, then $m$ would be $e^{c\sqrt{\log n}}$ (with $c=\sqrt{\log 2}$). That is, another feature of the function $e^{c\sqrt{\log n}}$ that makes it reasonably natural is that it is a conjugation of one very natural function by another: on a logarithmic scale one is simply taking square roots. For this reason one often thinks of this function as being "half way" between bounded functions and polynomials (because $\sqrt{x}$ is "half way" between constant functions and linear functions). Having said that, it is still not completely obvious why one would ever want to take the square root of a dimension: the answer in some cases turns out to be that one is still optimizing a parameter as above.
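The $\mathbb{F}_2$ picture can be checked directly. The following snippet (my own illustration; the dimension $d=100$ is an arbitrary choice) verifies that a subspace whose dimension is $\sqrt{d}$ has exactly $e^{c\sqrt{\log n}}$ points with $c=\sqrt{\log 2}$:

```python
import math

d = 100                       # dimension of the ambient space V over F_2
n = 2.0 ** d                  # number of points of V
m = 2.0 ** math.sqrt(d)       # number of points of a subspace of dimension sqrt(d)
c = math.sqrt(math.log(2))
# On a logarithmic scale, passing to the subspace took a square root:
print(math.isclose(m, math.exp(c * math.sqrt(math.log(n)))))  # True
```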
Example 1: Behrend's set
Behrend's set is an example of a surprisingly dense subset of $\{1,2,\dots,n\}$ that does not contain any arithmetic progression of length 3. Here is a sketch of how it is constructed.
Observe first that the surface of a sphere in $\mathbb{R}^d$ contains no arithmetic progression of length 3, if by "arithmetic progression" we mean a set of the form $\{x,x+u,x+2u\}$ with $u\ne 0$: the midpoint of two distinct points of a sphere lies strictly inside it.
Next, consider the grid $\{0,1,\dots,m-1\}^d$ (that is, the set of all points $(x_1,\dots,x_d)$ such that each $x_i$ is an integer between $0$ and $m-1$). The number of possible values of $x_1^2+x_2^2+\cdots+x_d^2$ is at most $dm^2$, since this number is always an integer between $0$ and $d(m-1)^2$.
Therefore, by the pigeonhole principle, there must be some $r$ such that $x_1^2+x_2^2+\cdots+x_d^2=r$ for at least $m^{d-2}/d$ points $(x_1,\dots,x_d)$ in the grid $\{0,1,\dots,m-1\}^d$.
By the first observation, the set $X$ of all such points does not contain an arithmetic progression of length 3, since it lies on the sphere of radius $\sqrt{r}$ about the origin.
Now consider the map $\phi\colon(x_1,\dots,x_d)\mapsto\sum_{i=1}^d x_i(2m)^{i-1}$. (This map thinks of a sequence in $\{0,1,\dots,m-1\}^d$ as the digits of a number in base $2m$, written backwards.) This map takes values in the set $\{0,1,\dots,(2m)^d-1\}$. It is easy to check that if $\phi(x)$, $\phi(y)$ and $\phi(z)$ form an arithmetic progression of length 3 in $\{0,1,\dots,(2m)^d-1\}$, then $x$, $y$ and $z$ form one in $\{0,1,\dots,m-1\}^d$: since every digit is less than $m$, no carrying occurs when $\phi(x)$ is added to $\phi(z)$, so $\phi(x)+\phi(z)=2\phi(y)$ forces $x+z=2y$ coordinate by coordinate.
It follows that $\phi(X)$ contains no arithmetic progression of length 3.
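The whole construction is short enough to carry out on a computer. Below is a Python sketch (my own; the parameters $m=5$, $d=4$ are small and unoptimized, and the helper names are invented for illustration) that takes the largest sphere shell of the grid, encodes it in base $2m$, and checks that the resulting set of integers really has no progression of length 3:

```python
from itertools import product

def behrend_set(m, d):
    """Largest sphere shell of {0,...,m-1}^d, encoded in base 2m."""
    shells = {}
    for x in product(range(m), repeat=d):
        shells.setdefault(sum(t * t for t in x), []).append(x)
    best = max(shells.values(), key=len)   # the pigeonhole step
    base = 2 * m                           # digits < m, so sums never carry
    return {sum(t * base**i for i, t in enumerate(x)) for x in best}

def has_3ap(s):
    """True if s contains x < y < z with x + z = 2y."""
    return any((x + z) % 2 == 0 and (x + z) // 2 in s
               for x in s for z in s if x < z)

A = behrend_set(5, 4)
print(len(A), has_3ap(A))   # a reasonably large set, and no 3-AP
```

At this toy scale the pigeonhole bound guarantees a shell of at least $5^4/65 \approx 10$ points, and the check confirms progression-freeness.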
From this argument we see that we can obtain a subset of $\{1,2,\dots,n\}$ of size $m^{d-2}/d$ with no progression of length 3, provided that $(2m)^d$ is at most $n$. Thus, we are faced with a simple optimization problem: maximize $m^{d-2}/d$ subject to the constraint that $(2m)^d$ is at most $n$. To solve this, we reduce the number of variables by one by taking $(2m)^d$ to be equal to $n$. (There is a small problem here, which is that both $m$ and $d$ have to be integers. But that is easy to deal with and doesn't affect the answer by very much.) That is, we shall take $m$ to be $n^{1/d}/2$. Then $m^{d-2}/d=n^{(d-2)/d}/(2^{d-2}d)$, so we are trying to maximize $n/(2^{d-2}d\,n^{2/d})$. Equivalently, we would like to minimize $2^{d-2}d\,n^{2/d}$. Taking logs, we find that we would like to minimize $(d-2)\log 2+\log d+(2\log n)/d$. Since $\log d$ is much smaller than $d\log 2$, and since the sum of two positive real numbers is comparable to the maximum of those two numbers, this minimum occurs roughly when $d\log 2$ and $(2\log n)/d$ are equal, which tells us that $d=c\sqrt{\log n}$ for some constant $c$. (This general trick is explored further in the article To optimize a sum try making the terms roughly equal in size.) And then $2^{d-2}d\,n^{2/d}$ is $e^{(d-2)\log 2+\log d+(2\log n)/d}$, which is of the form $e^{c\sqrt{\log n}}$. And feeding that back in we find that $m^{d-2}/d$ is of the form $ne^{-c\sqrt{\log n}}$. (Here the constant $c$ changes from line to line but is always positive.)
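One can also check the optimization numerically. The sketch below (my own; the cutoff `d_max` and the test value of $n$ are arbitrary) minimizes $(2\log n)/d+d\log 2+\log d$ over integer $d$ and compares the minimizer with $\sqrt{2\log n/\log 2}$, the value at which the two main terms balance:

```python
import math

def best_dimension(n, d_max=500):
    """Integer d minimizing the logarithm of the loss factor 2^d * d * n^(2/d)."""
    cost = lambda d: 2 * math.log(n) / d + d * math.log(2) + math.log(d)
    return min(range(1, d_max + 1), key=cost)

n = 10.0 ** 30
d_star = best_dimension(n)
d_balance = math.sqrt(2 * math.log(n) / math.log(2))
print(d_star, round(d_balance, 1))   # agree up to lower-order effects of log d
```

The lower-order $\log d$ term nudges the true minimizer slightly below the balance point, but both are of order $\sqrt{\log n}$, which is all the argument needs.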
Note that in the above example we had to balance two considerations. On the one hand, the fact that we have to use base $2m$ representations of integers rather than base $m$ representations costs us a factor of $2^d$, and on the other hand the fact that we have to pass from the grid to a $(d-1)$-dimensional subset of the grid costs us a factor of $dm^2$ (which we couldn't expect to improve to anything better than $m$), which equals roughly $d\,n^{2/d}$. So we are in almost exactly the situation described at the beginning of the article.
Another remark that makes the function $e^{c\sqrt{\log n}}$ slightly less mysterious is that it is in a sense the most natural function that is bigger than any power of $\log n$ and smaller than any power of $n$. To see that, ask yourself the following question: what is the most natural function that is bigger than $\log x$ and smaller than any linear function in $x$? Surely the answer is $\sqrt{x}$. Substituting $\log n$ for $x$, this suggests that the most natural function to lie between $\log\log n$ and any function of the form $c\log n$ is $\sqrt{\log n}$. And then, exponentiating, we conclude that the most natural function to lie between any power of $\log n$ and any power of $n$ is $e^{\sqrt{\log n}}$.
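On the logarithmic scale this "in between" behaviour is easy to verify numerically. In the sketch below (my own illustration; the exponent $k=5$, the slope $\epsilon=0.01$, and the test value of $L$ are arbitrary choices), $L$ plays the role of $\log n$, since $n=e^L$ itself would overflow a float:

```python
import math

def between(L, k=5, eps=0.01):
    # k*log(L) < sqrt(L) < eps*L says that, for n = e^L,
    # (log n)^k < e^{sqrt(log n)} < n^eps.
    return k * math.log(L) < math.sqrt(L) < eps * L

print(between(10.0 ** 6))   # True for the (enormous) n = e^{1000000}
```

For small $L$ the inequality fails, which reflects the fact that the domination of $e^{\sqrt{\log n}}$ over powers of $\log n$ only sets in for very large $n$.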