a repository of mathematical know-how

To prove convergence, find a rapidly converging subsequence

Stub iconThis article is a stub.This means that it cannot be considered to contain or lead to any mathematically interesting information.

Quick description

Any subsequence of a convergent sequence also converges to the limit of the original sequence. One can reverse this idea and study first the convergence of a special subsequence of the original sequence. This study will be simpler than studying the original sequence because one is allowed to choose as well behaving a subsequence as one wishes. If the chosen subsequence converges, this can be useful in several ways. First, if there is no apriori candidate for limit, the limit of the subsequence identifies a unique candidate for the limit of the whole sequence 2) if the chosen subsequence approximates the whole sequence well then this can be used to show that the whole sequence converges to the limit of the subsequence.

Example 1

The proof of the strong law of large numbers. Richard Durrett, Probability: Theory and Examples, Second edition, Chapter 1. This theorem concerns a sequence X_1(\cdot), X_2(\cdot),..., X_n(\cdot) of pairwise independent, identically distributed random variables on a probability space (\Omega, {\mathcal F}, P) with {\mathbb E}|X_i| < \infty. The strong law of large numbers states that

 \frac{1}{n} \sum_{i=1}^n X_i(\omega) \rightarrow {\mathbb E}[X_1] (1)

almost surely. That is, there is a measurable subset \Omega' \subset \Omega with P(\Omega') = 1 and (1) holds for every \omega \in \Omega'.

The proof involves several ideas, one of which is passing to a simple convergent subsequence.

The argument begins with a truncation (see also replace your sequence with a simpler one): we replace the random variables X_n with Y_n = X_n 1_{\{|X_n| \le n\}}. (See the truncation article on why truncation makes sense and for thoughts on where to truncate).

We give the details of this replacement as it demonstrates well several interesting ideas. Let T_n = Y_1 + Y_2 + \cdots + Y_n. T_n and S_n can differ only on a subset of  X_n(\omega) \neq Y_n(\omega) for infinitely many n\} or N = \{ |X_n | > n, infinitely often  \} as this type of statements are commonly made in probability theory. (A link to an article on zero one laws, Borel Cantelli lemma).

By Borel Cantelli Lemma we know that P(N)=0 if \sum_{n=1} P( |X_n|>n) < \infty. In the current problem this sum is intimately related to the assumptions of our problem. We see this with an integration using control level sets:

 \sum_{n=1}^\infty P ( |X_n| > n )< \int_0^\infty P(|X_1| > x ) dx = {\mathbb E}[|X_1|] < \infty.

Rest of the steps:

  • Replace Y_n with Y_n^+ (Durrett cites Etemadi as the source of this idea)

  • Choose a subsequence of Y_{k_n} for which simple Borel Cantelli and Chebyshev inequality works (here the expression \sum var(Y_k) / k^2 naturally comes up and one needs to bound this sum; once again control level sets are used). One also uses Fubini's theorem to change the order of a double sum. The choice of the subsequence: i think the key idea here is to choose a partition k_1 < k_2 < k_3 <\cdots < k_n < \cdots of the time index so that the intervals k_{n+1} -k_n are getting large enough with n to make Chebyshev's inequality sufficiently precise to imply the desired convergence. Durrett uses k_n = \lfloor \alpha^n \rfloor and in the final argument \alpha is sent to 1.

  • Use monotonicity (which the first item brought) to show that the chosen subsequence approximates the whole sequence well.