Quick description
If $X$ is a random variable that takes non-negative values, then the probability that $X\geq t$ is at most $\mathbb{E}X/t$. This simple result is called Markov's inequality. Clever uses of Markov's inequality can have surprisingly powerful consequences. The main trick is to choose an increasing function $f$ and apply Markov's inequality to the random variable $f(X)$ instead. If $f$ is rapidly growing but $\mathbb{E}f(X)$ is not too large, then one can prove that the probability that $X\geq t$, which equals the probability that $f(X)\geq f(t)$, is very small.
Prerequisites
A familiarity with basic concepts in probability such as random variables and expectation.
Example 1
Let us begin with a proof of Markov's inequality itself. By the law of total probability we know that

$$\mathbb{E}X=\mathbb{E}[X|X\geq t]\mathbb{P}[X\geq t]+\mathbb{E}[X|X<t]\mathbb{P}[X<t].$$

Since $X\geq 0$ (by hypothesis), this is at least $\mathbb{E}[X|X\geq t]\mathbb{P}[X\geq t]$, which is obviously at least $t\,\mathbb{P}[X\geq t]$, and from this it follows that $\mathbb{P}[X\geq t]\leq\mathbb{E}X/t$, as we wanted.
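The short simulation below is only an illustrative sanity check of Markov's inequality, not part of the argument; the exponential distribution and the sample size are arbitrary choices.

```python
# A sanity check of Markov's inequality (illustrative only: the exponential
# distribution and the sample size are arbitrary choices, not from the text).
import random

random.seed(0)
n_samples = 200_000
samples = [random.expovariate(0.5) for _ in range(n_samples)]  # non-negative, mean 2
mean_x = sum(samples) / n_samples

for t in [1, 2, 4, 8]:
    empirical_tail = sum(x >= t for x in samples) / n_samples
    markov_bound = mean_x / t  # P[X >= t] <= E[X] / t
    print(f"t = {t}: P[X >= t] ~ {empirical_tail:.4f} <= Markov bound {markov_bound:.4f}")
```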
Example 2
It is frequently useful in probability theory (and also in probabilistic combinatorics) to prove that a random variable is close to its mean most of the time. Of course, this may not always be true, which is why we invent concepts such as variance to give us some measure of how concentrated a random variable is about its mean. A precise relationship between the variance of a random variable $X$ and the concentration of $X$ about its mean $\mu$ is given by Chebyshev's inequality, which states that if $t>0$, then

$$\mathbb{P}[|X-\mu|\geq t]\leq\mathrm{Var}X/t^2.$$

To prove this, we first recall the definition of the variance of $X$, which is $\mathbb{E}(X-\mu)^2$. This instantly suggests a function that we can plug into Markov's inequality: instead of looking at the random variable $X$ we shall look at $(X-\mu)^2$. (The function $x\mapsto(x-\mu)^2$ is not increasing, but this is all right because we are actually interested in the random variable $|X-\mu|$.) The random variable $(X-\mu)^2$ is non-negative, so by Markov's inequality we find that

$$\mathbb{P}[(X-\mu)^2\geq t^2]\leq\mathbb{E}(X-\mu)^2/t^2=\mathrm{Var}X/t^2,$$

and we are done, since $\mathbb{P}[|X-\mu|\geq t]=\mathbb{P}[(X-\mu)^2\geq t^2]$.
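Purely as an illustration (the uniform distribution below is an arbitrary choice), one can check numerically that the tail of $|X-\mu|$ sits below $\mathrm{Var}X/t^2$, which is exactly Markov's bound applied to $(X-\mu)^2$ at level $t^2$.

```python
# Illustrative check of Chebyshev's inequality as Markov applied to (X - mu)^2.
# The uniform distribution on [0, 10] is an arbitrary choice of example.
import random

random.seed(0)
n_samples = 200_000
samples = [random.uniform(0, 10) for _ in range(n_samples)]
mu = sum(samples) / n_samples
var = sum((x - mu) ** 2 for x in samples) / n_samples  # Var X = E (X - mu)^2

for t in [2, 3, 4, 5]:
    # P[|X - mu| >= t] = P[(X - mu)^2 >= t^2] <= Var X / t^2
    empirical_tail = sum((x - mu) ** 2 >= t ** 2 for x in samples) / n_samples
    chebyshev_bound = var / t ** 2
    print(f"t = {t}: P[|X - mu| >= t] ~ {empirical_tail:.4f} <= {chebyshev_bound:.4f}")
```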
General discussion
Suppose that we have a random variable $X$ with mean $\mu$ such that $X\geq 0$ and $\mathrm{Var}X=\sigma^2$. Then Markov's inequality implies that $\mathbb{P}[X\geq t]\leq\mu/t$, while Chebyshev's inequality implies that $\mathbb{P}[X\geq t]\leq\sigma^2/(t-\mu)^2$ for $t>\mu$. Since all probabilities are at most 1, Markov's inequality tells us nothing unless $t>\mu$, and Chebyshev's inequality tells us nothing unless $t>\mu+\sigma$. Since $\mu\leq\mu+\sigma$, this tells us that Markov's inequality will usually give us some information for smaller values of $t$ than Chebyshev's inequality. However, if $\mu$ and $\sigma$ are reasonably comparable, then Chebyshev's inequality is much stronger for larger values of $t$. This illustrates a general principle: a good bound on $\mathbb{E}f(X)$ for a quickly growing positive function $f$ gives us a good upper bound for the probability that $X$ is large. In the next example, we shall see a very strong result of this kind.
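The little table produced below is only meant to illustrate this comparison; the values $\mu=\sigma=1$ are arbitrary. It prints the Markov bound $\mu/t$ next to the Chebyshev bound $\sigma^2/(t-\mu)^2$ for a few values of $t$.

```python
# Illustrative comparison of the two tail bounds for a non-negative X with mean mu
# and standard deviation sigma; the values mu = sigma = 1 are arbitrary.
mu, sigma = 1.0, 1.0

print("     t   Markov mu/t   Chebyshev sigma^2/(t-mu)^2")
for t in [1.5, 2.0, 3.0, 5.0, 10.0, 100.0]:
    markov = mu / t
    chebyshev = sigma ** 2 / (t - mu) ** 2   # only informative once t > mu + sigma
    print(f"{t:6.1f}   {markov:11.4f}   {chebyshev:14.6f}")
```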
Example 3
Let $X_1,\dots,X_n$ be independent random variables, each of mean 0 and each taking values in the interval $[-1,1]$. What is the probability that $X_1+\dots+X_n\geq t$?

To get a preliminary feel for this question, let us use the fact that the variances of independent random variables add together. Each $X_i$ obviously has variance at most 1, since it cannot differ from its mean by more than 1. Therefore, the variance of $X_1+\dots+X_n$ is at most $n$, so by Chebyshev's inequality the probability that $X_1+\dots+X_n\geq t$ is at most $n/t^2$. This tells us that once $t$ gets past $\sqrt{n}$, the probability decays fairly rapidly. However, we shall now see that it decays far more rapidly than Chebyshev's inequality implies.
To prove this, we shall estimate the probability that $X_1+\dots+X_n\geq t$ and then, by symmetry, we will also have an estimate for the probability that $X_1+\dots+X_n\leq -t$. To obtain the first estimate we shall think about the expectation of $e^{\lambda(X_1+\dots+X_n)}$. The exponential function has three properties that make it a particularly good choice. The first is that it is non-negative. The second is that it turns sums into products, which is very useful to us because the expectation of a product of independent random variables is equal to the product of the expectations. The third is that it is a rapidly growing function.

Using these properties, we find that

$$\mathbb{E}e^{\lambda(X_1+\dots+X_n)}=\mathbb{E}\prod_{i=1}^{n}e^{\lambda X_i}=\prod_{i=1}^{n}\mathbb{E}e^{\lambda X_i}.$$
We would like to show that this is not too large, and our problem is now reduced to showing that each $\mathbb{E}e^{\lambda X_i}$ is not too large. It is at this point that we exploit the fact that each $X_i$ has mean $0$ and takes values in $[-1,1]$. As long as $\lambda\geq 0$, this implies that

$$\mathbb{E}e^{\lambda X_i}=1+\frac{\lambda^2\mathbb{E}X_i^2}{2!}+\frac{\lambda^3\mathbb{E}X_i^3}{3!}+\dots\leq 1+\frac{\lambda^2}{2!}+\frac{\lambda^3}{3!}+\dots$$

And now a small calculation shows that if $\lambda\leq 1$ then this is at most $e^{\lambda^2}=1+\lambda^2+\frac{\lambda^4}{2!}+\frac{\lambda^6}{3!}+\dots$. Indeed, we can prove this coefficient by coefficient: $\frac{\lambda^2}{2!}+\frac{\lambda^3}{3!}\leq\lambda^2$, $\frac{\lambda^4}{4!}+\frac{\lambda^5}{5!}\leq\frac{\lambda^4}{2!}$, and so on. Therefore,

$$\mathbb{E}e^{\lambda(X_1+\dots+X_n)}\leq e^{\lambda^2 n}.$$
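Here is a small numerical check, not part of the proof, of the key estimate $\mathbb{E}e^{\lambda X_i}\leq e^{\lambda^2}$ for $0\leq\lambda\leq 1$; the three mean-zero distributions on $[-1,1]$ used below are arbitrary examples.

```python
# Numerical check (not part of the proof) of the estimate E e^(lambda X) <= e^(lambda^2)
# for 0 <= lambda <= 1, when X has mean 0 and takes values in [-1, 1].
# The three discrete distributions below are arbitrary mean-zero examples.
import math

distributions = {
    "fair +-1 sign": [(1.0, 0.5), (-1.0, 0.5)],
    "skewed":        [(1.0, 1 / 3), (-0.5, 2 / 3)],
    "three-point":   [(1.0, 0.25), (0.0, 0.5), (-1.0, 0.25)],
}

for name, dist in distributions.items():
    assert abs(sum(p * x for x, p in dist)) < 1e-12   # each example has mean zero
    for lam in [0.1, 0.5, 1.0]:
        mgf = sum(p * math.exp(lam * x) for x, p in dist)
        bound = math.exp(lam ** 2)
        print(f"{name:13s} lambda = {lam:.1f}: E e^(lambda X) = {mgf:.4f} <= {bound:.4f}")
```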
Now we are ready to apply Markov's inequality. The probability that $X_1+\dots+X_n\geq t$ is equal to the probability that $e^{\lambda(X_1+\dots+X_n)}\geq e^{\lambda t}$, which by Markov's inequality is at most $e^{\lambda^2n}/e^{\lambda t}=e^{\lambda^2n-\lambda t}$. We are now free to choose $\lambda$ so as to minimize this estimate, provided that $\lambda\leq 1$. The minimum occurs when $\lambda=t/2n$, when it gives $e^{-t^2/4n}$. This estimate is valid if $t\leq 2n$, and therefore for all $t$, since the probability that $X_1+\dots+X_n>n$ is zero.
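As an illustration of the optimization step (the values $n=100$ and $t=30$ are arbitrary, chosen so that $t\leq 2n$ and hence the optimal $\lambda$ is at most 1), a quick grid search confirms that $e^{\lambda^2n-\lambda t}$ is minimized at about $\lambda=t/2n$, where it equals $e^{-t^2/4n}$.

```python
# Illustrative check of the optimization over lambda (n = 100 and t = 30 are arbitrary,
# chosen so that t <= 2n and hence the optimal lambda = t/2n lies in [0, 1]).
import math

n, t = 100, 30
grid = [k / 1000 for k in range(1001)]   # lambda values in [0, 1]
best_value, best_lam = min((math.exp(lam ** 2 * n - lam * t), lam) for lam in grid)
print(f"grid minimum: {best_value:.6f} at lambda = {best_lam:.3f}")
print(f"lambda = t/2n = {t / (2 * n):.3f} gives exp(-t^2/4n) = {math.exp(-t ** 2 / (4 * n)):.6f}")
```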
After applying the same argument to $-X_1,\dots,-X_n$, we find that the probability that $|X_1+\dots+X_n|\geq t$ is at most $2e^{-t^2/4n}$, so once $t$ gets beyond $2\sqrt{n}$ the probability decays very quickly indeed.

This conclusion is far stronger than that of Chebyshev's inequality, but it should be stressed that we have also used a stronger hypothesis: here we assumed that the random variables were bounded, but we could have obtained an upper bound of $n/t^2$ from the weaker assumption that each $X_i$ had variance at most 1.
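The following Monte Carlo sketch is illustrative only: it takes the $X_i$ to be fair $\pm1$ signs (one permissible choice of bounded mean-zero variables) and compares the empirical tail of $|X_1+\dots+X_n|$ with the Chebyshev bound $n/t^2$ and the exponential bound $2e^{-t^2/4n}$; the parameters are arbitrary.

```python
# Monte Carlo sketch (illustrative only): take the X_i to be independent fair +-1 signs
# and compare the empirical tail of |X_1 + ... + X_n| with the Chebyshev bound n/t^2
# and the exponential bound 2 exp(-t^2/4n). Parameters are arbitrary choices.
import math
import random

random.seed(0)
n, trials = 400, 20_000
sums = [sum(random.choice((-1, 1)) for _ in range(n)) for _ in range(trials)]

for t in [40, 60, 80, 120]:
    empirical = sum(abs(s) >= t for s in sums) / trials
    chebyshev = n / t ** 2
    exponential = 2 * math.exp(-t ** 2 / (4 * n))
    print(f"t = {t:3d}: empirical {empirical:.5f}, Chebyshev {chebyshev:.4f}, "
          f"exponential {exponential:.5f}")
```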
Comments
a suggestion for improvement
Wed, 27/01/2010 - 08:06 — alex

Here is a point which I feel needs to be emphasized more in Example 3.
If we try to make the argument work without leaving $\lambda$ to be optimized later, but by, say, initially setting $\lambda=1$, the argument will fail: we will get a useless bound. The reason: we need to pick a function $f$ which will grow rapidly while $\mathbb{E}f(X_1+\dots+X_n)$ will grow slowly. In this case, $\mathbb{E}e^{X_1+\dots+X_n}$ grows pretty fast with $n$.
What kind of functions $f$ work for this? Well, $f(x)=x^2$ works: out of the $n^2$ terms in $\mathbb{E}(X_1+\dots+X_n)^2$, almost all have expectation $0$. Similarly, $f(x)=x^4$ works fine as well: in the sum $\mathbb{E}(X_1+\dots+X_n)^4$, we will be able to easily cancel many terms because they have expectation $0$.
The central insight behind the proof in Example 3 is that picking $f(x)=e^{\lambda x}$ for really, really small $\lambda$ works. The reason: the average of $e^{\lambda}$ and $e^{-\lambda}$ is really close to $1$. That is: near one, inversion is about the same as subtraction. This insight can be pushed a little further to argue that if $X_i$ has mean $0$, the quantity $\mathbb{E}e^{\lambda X_i}$ should be pretty close to $1$. Thus one might hope that $\mathbb{E}e^{\lambda(X_1+\dots+X_n)}$ grows slowly - after all, it is "approximately" $1^n$.
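To illustrate the commenter's point numerically (for independent fair $\pm1$ signs, an arbitrary choice, for which $\mathbb{E}e^{\lambda(X_1+\dots+X_n)}=\cosh(\lambda)^n$ exactly): with $\lambda=1$ the expectation grows exponentially in $n$, while with $\lambda$ of order $1/\sqrt{n}$ it stays bounded.

```python
# Illustration of the comment above for independent fair +-1 signs (an arbitrary
# choice), where E e^(lambda * S_n) = cosh(lambda)^n exactly. With lambda = 1 the
# expectation explodes with n; with lambda = 1/sqrt(n) it stays bounded.
import math

for n in [100, 1_000, 10_000]:
    log_mgf_at_1 = n * math.log(math.cosh(1.0))      # log E e^(S_n): grows linearly in n
    mgf_at_small = math.cosh(1 / math.sqrt(n)) ** n  # E e^(S_n / sqrt(n)): stays near e^(1/2)
    print(f"n = {n:5d}: log E e^(S_n) = {log_mgf_at_1:8.1f},   "
          f"E e^(S_n/sqrt(n)) = {mgf_at_small:.4f}")
```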