An important general principle in probability is that a sum of bounded independent random variables, each of mean 0, has a very small probability of having a large modulus, where "large" means substantially larger than its standard deviation. For example, if you toss a coin 1000 times, then the probability that the number of heads does not lie between 400 and 600 is tiny. This principle can be proved rigorously – see Example 3 of bounding probabilities by expectations. Also, and this is the main theme of this article, the assumption of independence can be weakened in a way that makes the principle much more widely applicable.
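The coin-toss claim can be checked exactly with binomial tail sums; a quick sketch (the numbers 400, 600 and 1000 are from the text, and "does not lie between" is read strictly):

```python
from math import comb

# Exact probability that the number of heads in 1000 fair coin
# tosses is strictly below 400 or strictly above 600.
n = 1000
tail = sum(comb(n, k) for k in range(0, 400))       # at most 399 heads
tail += sum(comb(n, k) for k in range(601, n + 1))  # at least 601 heads
p = tail / 2**n

print(p)  # of order 10^-10 -- "tiny" indeed
```

Note that the threshold 100 is more than six standard deviations ($\sqrt{1000 \cdot 1/4} \approx 15.8$) away from the mean 500, which is why the probability is so small.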
Prerequisites: basic concepts of probability, graph theory, and metric spaces.
Let $S_n$ be the symmetric group on $n$ elements. There is a natural graph with $S_n$ as its vertex set, where two permutations $\pi$ and $\sigma$ are joined if and only if $\sigma\pi^{-1}$ is a transposition, or equivalently $\sigma=\tau\pi$ for some transposition $\tau$. This graph gives us a natural metric on $S_n$: the distance $d(\pi,\sigma)$ between two permutations $\pi$ and $\sigma$ is the smallest $k$ such that we can find transpositions $\tau_1,\dots,\tau_k$ with $\sigma=\tau_k\cdots\tau_1\pi$.
Now let $f:S_n\to\mathbb{R}$ be a function that is 1-Lipschitz with respect to this distance. This is equivalent to the assumption that $|f(\sigma)-f(\pi)|\leq 1$ whenever there is a transposition $\tau$ such that $\sigma=\tau\pi$. (An obvious example of such a function is the function $f$ where $f(\pi)$ is defined to be the smallest $k$ such that $\pi$ is a product of $k$ transpositions.) Since every permutation can be written as a product of at most $n-1$ transpositions and an $n$-cycle needs exactly $n-1$ transpositions, the diameter of $S_n$ is $n-1$, so $|f(\sigma)-f(\pi)|$ cannot exceed $n-1$. We shall show that $f$ is "highly concentrated", in the sense that for almost all pairs $(\pi,\sigma)$ the difference $|f(\sigma)-f(\pi)|$ is far smaller than $n-1$ – it is rarely much more than $\sqrt{n}$.
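The example function above (the least number of transpositions whose product is $\pi$) is computable via the standard fact that this number equals $n$ minus the number of cycles of $\pi$; the same fact computes the metric itself. A sketch (function names are my own):

```python
def cycle_count(perm):
    """Number of cycles of a permutation given as a tuple/list,
    where perm[i] is the image of i (0-indexed)."""
    n = len(perm)
    seen = [False] * n
    cycles = 0
    for i in range(n):
        if not seen[i]:
            cycles += 1
            j = i
            while not seen[j]:
                seen[j] = True
                j = perm[j]
    return cycles

def transposition_distance(pi, sigma):
    """Distance in the transposition graph: the least k such that
    sigma = t_k ... t_1 pi, which equals n - #cycles(sigma o pi^{-1})."""
    n = len(pi)
    inv_pi = [0] * n
    for i, x in enumerate(pi):
        inv_pi[x] = i
    composed = [sigma[inv_pi[i]] for i in range(n)]  # sigma o pi^{-1}
    return n - cycle_count(composed)
```

For instance, the distance from the identity to a single transposition is 1, and to an $n$-cycle it is $n-1$, matching the diameter claim in the text.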
Before we get on to the proof, we need to state Azuma's inequality, and before we do that we need to say what a martingale is. The latter is simple. Let $X_0,X_1,\dots,X_n$ be a sequence of random variables. They form a martingale if for every $k$ the expectation of $X_k$ given the values of $X_0,\dots,X_{k-1}$ is $X_{k-1}$.
It is perhaps clearer to express this in terms of the martingale difference sequence $Y_1,\dots,Y_n$, where $Y_k=X_k-X_{k-1}$ for each $k$. Then the martingale condition is that the expectation of $Y_k$ given the values of $X_0$ and $Y_1,\dots,Y_{k-1}$ is always $0$. An obvious example is when the $Y_k$ are independent random variables with mean $0$, but as we shall see there is an easy way of generating many more examples than this.
Suppose that each variable $Y_k$ takes values between $-1$ and $1$. Then Azuma's inequality states that the probability that $|X_n-X_0|\geq t\sqrt{n}$ is at most $2e^{-t^2/2}$. More generally, if each $Y_k$ takes values between $-a_k$ and $a_k$, then the probability that $|X_n-X_0|\geq t$ is at most $2\exp\bigl(-t^2\big/2\sum_{k=1}^n a_k^2\bigr)$.
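In the special case where the differences are independent and equal to $\pm 1$ with probability $1/2$ each (a simple random walk), the first form of the bound can be compared against the exact binomial tail; a sketch under that assumption:

```python
from math import comb, exp, sqrt

def exact_walk_tail(n, t):
    """Exact P(|X_n - X_0| >= t*sqrt(n)) when the differences
    Y_k = X_k - X_{k-1} are independent and equal to +1 or -1,
    each with probability 1/2."""
    threshold = t * sqrt(n)
    # With h heads out of n steps, X_n - X_0 = 2h - n.
    total = sum(comb(n, h) for h in range(n + 1)
                if abs(2 * h - n) >= threshold)
    return total / 2**n

def azuma_bound(t):
    """Azuma's bound 2*exp(-t^2/2) for differences bounded by 1."""
    return 2 * exp(-t * t / 2)

for t in (1.0, 2.0, 3.0):
    p, b = exact_walk_tail(100, t), azuma_bound(t)
    assert p <= b  # the true tail never exceeds Azuma's bound
    print(f"t={t}: exact tail {p:.4f} <= bound {b:.4f}")
```

The bound is not tight (it loses roughly a constant factor in the exponent's prefactor), but it decays at the correct Gaussian rate in $t$.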
Now let us return to our example, and see how we can define a martingale in a very natural way.
Example 1, continued
Recall that we have a function $f$ defined on $S_n$, with the property that if $\pi$ and $\sigma$ are permutations such that $\sigma\pi^{-1}$ is a transposition, then $|f(\sigma)-f(\pi)|\leq 1$. In order to define a martingale difference sequence, we choose a random permutation $\pi$ and "reveal information about $\pi$ little by little".
What does this mean? Well, we look at the values of $\pi(1)$, $\pi(2)$, $\pi(3)$, and so on, and at the $k$th stage we define $X_k$ to be the expected value of $f(\pi)$ given the values $\pi(1),\dots,\pi(k)$. In other words, $X_0$ is just $\mathbb{E}f(\pi)$, $X_1$ is the expected value of $f(\pi)$ given that $\pi(1)=u_1$, and in general $X_k$ is the expected value of $f(\pi)$ given that $\pi(i)=u_i$ for every $i\leq k$.
What can we say about $Y_k=X_k-X_{k-1}$? We would like to prove an upper bound for the maximum absolute value that $Y_k$ can take, and we would also like to prove that the expectation of $Y_k$ given the values of $Y_1,\dots,Y_{k-1}$ is $0$.
$Y_k$ is the amount by which the expectation of $f(\pi)$ can change if we add to our existing knowledge of the values of $\pi(1),\dots,\pi(k-1)$ the knowledge that $\pi(k)=u_k$. To get an upper bound for this, let us define $A_u$, for each $u$ not in the set $\{u_1,\dots,u_{k-1}\}$, to be the set of all $\pi$ such that $\pi(i)=u_i$ for every $i<k$ and $\pi(k)=u$. Now let us pick $u$ and $v$ and define a bijection $\phi$ between $A_u$ and $A_v$ as follows. Given a permutation $\pi\in A_u$, let $\phi(\pi)$ agree with $\pi$ except that $\phi(\pi)(k)=v$ and $\phi(\pi)(\pi^{-1}(v))=u$. To put this another way, $\phi(\pi)=\tau\pi$, where $\tau$ is the transposition that exchanges $u$ and $v$.
Now the distance between $\pi$ and $\phi(\pi)$ is at most 1 (and exactly 1 if $u\neq v$). Therefore, $|f(\phi(\pi))-f(\pi)|\leq 1$ for every $\pi\in A_u$. It follows that the average of $f$ over $A_u$ differs from the average of $f$ over $A_v$ by at most 1.
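The pairing argument can be verified by brute force for a small $n$. In the sketch below the helper names are my own, $f$ is taken to be the transposition distance from the identity (the example given earlier), and indices are 0-based, so fixing a prefix of length 1 corresponds to the stage $k=2$ of the reveal:

```python
from itertools import permutations

n = 5

def f(pi):
    """Transposition distance from the identity: n minus #cycles."""
    seen, cycles = [False] * n, 0
    for i in range(n):
        if not seen[i]:
            cycles += 1
            j = i
            while not seen[j]:
                seen[j] = True
                j = pi[j]
    return n - cycles

prefix = (3,)          # the already-revealed values pi(1) = 3
k = len(prefix)        # next position to reveal (0-indexed)

def A(u):
    """All permutations agreeing with the prefix and sending k to u."""
    return [p for p in permutations(range(n))
            if p[:k] == prefix and p[k] == u]

u, v = 0, 1            # two candidate values, neither in the prefix
phi = {}
for p in A(u):
    q = list(p)
    q[k] = v
    q[p.index(v)] = u  # swap the values u and v, i.e. q = tau o p
    phi[p] = tuple(q)

# phi is a bijection from A(u) onto A(v) that moves f by at most 1,
# so the averages of f over A(u) and A(v) differ by at most 1.
assert sorted(phi.values()) == sorted(A(v))
assert all(abs(f(p) - f(phi[p])) <= 1 for p in phi)

avg = lambda S: sum(f(p) for p in S) / len(S)
print(abs(avg(A(u)) - avg(A(v))))  # at most 1
```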
In other words, once we know what $\pi(k)$ is, the difference it can make to our expectation of $f(\pi)$ is at most 1. Also, the average of $f(\pi)$ given that $\pi(i)=u_i$ for every $i<k$ is the average over $u$ of the averages of $f$ over the sets $A_u$ (since all the sets $A_u$ have the same size), so the expectation of $X_k$ given $X_0,\dots,X_{k-1}$ is $X_{k-1}$, as we wanted. (Equivalently, the expectation of $Y_k$ given $Y_1,\dots,Y_{k-1}$ is $0$.)
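Both facts – that the $X_k$ form a martingale and that every difference has modulus at most 1 – can be confirmed exhaustively for a small $n$. A sketch, again with $f$ the transposition distance from the identity and 0-based indexing:

```python
from itertools import permutations
from statistics import mean
from functools import lru_cache

n = 5

def f(pi):
    """Transposition distance from the identity: n minus #cycles."""
    seen, cycles = [False] * n, 0
    for i in range(n):
        if not seen[i]:
            cycles += 1
            j = i
            while not seen[j]:
                seen[j] = True
                j = pi[j]
    return n - cycles

perms = list(permutations(range(n)))

@lru_cache(maxsize=None)
def X(prefix):
    """X_k for k = len(prefix): E[f(pi)] given the first k values of pi."""
    return mean(f(q) for q in perms if q[:len(prefix)] == prefix)

# Every martingale difference Y_k = X_k - X_{k-1} satisfies |Y_k| <= 1 ...
max_step = max(abs(X(pi[:k]) - X(pi[:k - 1]))
               for pi in perms for k in range(1, n + 1))
assert max_step <= 1 + 1e-12

# ... and the martingale condition holds: averaging X(prefix + (u,)) over
# the values u not yet used recovers X(prefix), since the sets A_u all
# have the same size.
for pi in perms:
    for k in range(1, n + 1):
        prefix = pi[:k - 1]
        unused = [u for u in range(n) if u not in prefix]
        cond = mean(X(prefix + (u,)) for u in unused)
        assert abs(cond - X(prefix)) < 1e-9

print("martingale checks pass for n =", n)
```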
We have now checked the facts we wanted to check: we have a martingale, and the random variables in the martingale difference sequence take values in $[-1,1]$. It follows from Azuma's inequality that the probability that $|f(\pi)-\mathbb{E}f(\pi)|\geq t\sqrt{n}$ is at most $2e^{-t^2/2}$. From this we see that even though the diameter of $S_n$ is $n-1$, for the vast majority of pairs $(\pi,\sigma)$ the difference $|f(\pi)-f(\sigma)|$ is at most a small multiple of $\sqrt{n}$.
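The concentration is easy to see numerically: sampling random permutations for a larger $n$ and taking $f$ to be the transposition distance from the identity, the observed spread of $f$ is far smaller than the diameter $n-1$ (the sample sizes and the seed below are arbitrary choices of mine):

```python
import random
from math import sqrt

def f(pi):
    """Transposition distance from the identity: n minus #cycles."""
    n = len(pi)
    seen, cycles = [False] * n, 0
    for i in range(n):
        if not seen[i]:
            cycles += 1
            j = i
            while not seen[j]:
                seen[j] = True
                j = pi[j]
    return n - cycles

random.seed(0)
n, samples = 400, 300
values = []
for _ in range(samples):
    pi = list(range(n))
    random.shuffle(pi)       # uniformly random permutation
    values.append(f(pi))

spread = max(values) - min(values)
print(f"diameter = {n - 1}, observed spread of f = {spread}")
assert spread <= 2 * sqrt(n)  # far smaller than the diameter n - 1
```

(For this particular $f$ the concentration is even stronger than $\sqrt{n}$, since the number of cycles of a random permutation is concentrated around $\log n$; the Azuma bound applies to every 1-Lipschitz $f$, however bad.)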
The proof of Azuma's inequality is a relatively straightforward generalization of the proof given in Example 3 of bounding probabilities by expectations, which deals with the special case where the martingale difference sequence consists of independent random variables that take values in $[-1,1]$.