Tricki

## How to use martingales

### Quick description

An important general principle in probability is that a sum of bounded independent random variables, each of mean 0, has a very small probability of having a large modulus, where "large" means substantially larger than its standard deviation. For example, if you toss a coin 1000 times, then the probability that the number of heads does not lie between 400 and 600 is tiny. This principle can be proved rigorously – see Example 3 of bounding probabilities by expectations. Also, and this is the main theme of this article, the assumption of independence can be weakened in a way that makes the principle much more widely applicable.

### Prerequisites

Basic concepts of probability, graph theory, metric spaces.

 Contributions wanted This article could use additional contributions. More examples wanted.

### Example 1

Let be the symmetric group on elements. There is a natural graph with as its vertex set, where two permutations and are joined if and only if is a transposition, or equivalently for some transposition This graph gives us a natural metric on : the distance between two permutations and is the smallest such that we can find transpositions with

Now let be a function that is 1-Lipschitz with respect to this distance. This is equivalent to the assumption that whenever there is a transposition such that (An obvious example of such a function is the function where is defined to be the smallest such that is a product of transpositions.) Since every permutation can be written as a product of at most transpositions and an -cycle needs exactly transpositions, the diameter of is , so cannot exceed We shall show that is "highly concentrated", in the sense that for almost all pairs is far smaller than – it is rarely much more than

### General discussion

Before we get on to the proof, we need to state Azuma's inequality, and before we do that we need to say what a martingale is. The latter is simple. Let be a sequence of random variables. They are a martingale if for every the expectation of given the values of is

It is perhaps clearer to express this in terms of the martingale difference sequence where for each . Then the martingale condition is that the expectation of given the values of and is always . An obvious example is when the are independent random variables with mean , but as we shall see there is an easy way of generating many more examples than this.

Suppose that each variable takes values between and . Then Azuma's inequality states that the probability that is at most More generally, if takes values between and , then the probability that is at most

Now let us return to our example, and see how we can define a martingale in a very natural way.

### Example 1, continued

Recall that we have a function defined on , with the property that if and are partitions such that is a transposition, then In order to define a martingale difference sequence, we choose a random permutation and "reveal information about little by little".

What does this mean? Well, we look at the values of , , , and so on, and at the th stage we define to be the expected value of given the values In other words, is just is the expected value of given that and in general is the expected value of given that for every

What can we say about ? We would like to prove an upper bound for the maximum absolute value that can take, and we would also like to prove that the expectation of given the values of is .

is the amount by which the expectation of can change if we add to our existing knowledge of the values of the knowledge that To get an upper bound for this, let us define , for each not in the set , to be the set of all such that for every and Now let us pick and and define a bijection between and as follows. Given a permutation , let agree with except that and To put this another way, where is the transposition that exchanges and .

Now the distance between and is at most 1 (and exactly 1 if ). Therefore, for every It follows that differs from by at most 1.

In other words, once we know what is, the difference it can make to our expectation of is at most 1. Also, the average of given that for every is the average over of the averages of over (since all the sets have the same size), so the expectation of given is , as we wanted. (Equivalently, the expectation of given is )

We have now checked the facts we wanted to check: we have a martingale and the random variables in the martingale difference sequence take values in . It follows from Azuma's inequality that the probability that is at most From this we see that even though the diameter of is , for the vast majority of pairs the difference is at most

### General discussion

The proof of Azuma's inequality is a relatively straightforward generalization of the proof given in Example 3 of bounding probabilities by expectations, which deals with the special case where the martingale difference sequence consists of independent random variables that take values in

## Comments

### possible typo?

The article says

"Then the martingale condition is that the expectation of given the values of is always 0."

I see that the martingale condition implies this, but is it true that the martingale condition "is" this? This condition can be satisfied without the sequence being a martingale.

Indeed, suppose A, B are independent random variables with mean zero. Define
. Clearly, , so the above condition holds. On the other this is not a martingale: which may not equal .

### Thanks &ndash; I've corrected

Thanks – I've corrected it. I've tended to deal with martingales, such as the one in this permutations example, where is constant, which is why I wrote what I wrote.

### Trivia

There is a typo in the second paragraph of section Example 1, continued: one should cancel the symbol in the two occurrences of

the expected value of given that

And, reading the last paragraph of this section, I wondered why you picked as an example of extremely unlikely deviation. First, would already yield a rather respectable (i.e., small) bound, namely, something like . Second, there is nothing specific about here, although the formulation seems to indicate otherwise: you might want to add a for example somewhere.

Thank you for this article.