Tricki

The tensor power trick

Quick description

If one wants to prove an inequality for some non-negative quantities , , but can only see how to prove a quasi-inequality that loses a multiplicative constant , then try to replace all objects involved in the problem by "tensor powers" of themselves and apply the quasi-inequality to those powers. If all goes well, one can show that for all , with a constant which is independent of , which implies that as desired by taking roots and then taking limits as .

General discussion

This trick works best when using techniques which are "dimension-independent" (or depend only weakly (e.g. polynomially) on the ambient dimension ). On the other hand, the constant is allowed to depend on other parameters than the dimension . In particular, even if the eventual inequality one wishes to prove is supposed to be uniform over all choices of some auxiliary parameter n, it is possible to establish this estimate by first establishing a non-uniform estimate , so long as this estimate "tensorises" to , where does not grow with .

It is of course essential that the problem "commutes" or otherwise "plays well" with the tensor power operation in order for the trick to be effective. In order for this to occur, one often has to phrase the problem in a sufficiently abstract setting, for instance generalizing a one-dimensional problem to one in arbitrary dimensions.

Prerequisites

Advanced undergraduate analysis (real, complex, and Fourier). The user of the trick also has to be very comfortable with working in high-dimensional spaces such as , or more generally for some mathematical object (e.g. a set, a group, a measure space, etc.) The later examples are more advanced but are only given as sketches.

Example 1: convexity of norms

Let be a measurable function such that and for some . Show that for all .

As a first attempt to solve this problem, observe that is less than when , and is less than when . Thus the inequality holds for all x, regardless of whether is larger or smaller than 1. This gives us the bound , which is off by a factor of 2 from what we need.

But we can eliminate this loss of 2 by the tensor power trick. We pick a large integer , and define the tensor power of f by the formula

From the Fubini–Tonelli theorem we see that

and similarly

If we thus apply our previous arguments with and replaced by and respectively, we conclude that

applying Fubini–Tonelli again we conclude that

Taking roots and then taking limits as we obtain the claim.

Exercise

More generally, show that if , is a measure space, and is measurable, then we have the inequality

whenever the right-hand side is finite, where is such that . (Hint: by multiplying f and by appropriate constants one can normalize .)

Exercise

Use the previous exercise (and a clever choice of f, r, and - there is more than one choice available) to prove Hölder's inequality.

Example 2: the maximum principle

This example is due to Landau. Let be a simple closed curve in the complex plane that bounds a domain D, and let be a function which is complex analytic in the closure of this domain, and which obeys a bound on the boundary . The maximum principle for such functions asserts that one also has on the interior as well. One way to prove this is by using the Cauchy integral formula

for (assuming that is oriented anti-clockwise). Taking absolute values and using the triangle inequality, we obtain the crude bound

where is the length of . This bound is off by a factor of . This loss depends on the point z and the curve , but it is crucial to observe that it does not depend on f. In particular, one can apply it with f replaced by (and A replaced by ) for any positive integer , noting that is of course also complex analytic. We conclude that

and on taking roots and then taking limits as we obtain the maximum principle.

Example 3: new Strichartz estimates from old

This observation is due to Jon Bennet. Technically, it is not an application of the tensor power trick as we will not let the dimension go off to infinity, but it is certainly in a similar spirit.

Strichartz estimates are an important tool in the theory of linear and nonlinear dispersive equations. Here is a typical such estimate: if solves the two-dimensional Schrödinger equation , then one has the inequality

for some absolute constant . (In this case, the best value of is known to equal , a result of Foschi and of Hundertmark-Zharnitsky.) It is possible to use this two-dimensional Strichartz estimate and a version of the tensor power trick to deduce a one-dimensional Strichartz estimate. Specifically, if solves the one-dimensional Schrödinger equation , then observe that the tensor square solves the two-dimensional Schrödinger equation. (This tensor product symmetry of the Schrödinger equation is fundamental in quantum physics; it allows one to model many-particle systems and single-particle systems by essentially the same equation.) Applying the above Strichartz inequality to this product we conclude that

Remark

A similar trick allows us to deduce "interaction" or "many-particle" Morawetz estimates for the Schrödinger equation from their more traditional "single-particle" counterparts; see for instance Chapter 3.5 of Terence Tao's book.

Example 4: the Hausdorff-Young inequality

Let be a finite abelian group, and let be a function. We let be the group of characters of G, and define the Fourier transform by the formula

From the triangle inequality we have

while from Plancherel's theorem we have

By applying the Riesz-Thorin interpolation theorem, we can then conclude the Hausdorff-Young inequality

whenever and . However, it is also possible to deduce (3) from (2) and (1) by a more elementary method based on the tensor power trick. First suppose that is supported on a set and that takes values between (say) and on . Then from (1) and (2) we have

from which we establish a "restricted weak-type" version of Hausdorff-Young, namely

and thus

This inequality is restricted to those whose non-zero magnitudes range inside a dyadic interval , but by the technique of dyadic decomposition one can remove this restriction at the cost of an additional factor of , thus obtaining the estimate

for general . (There are of course an infinite number of dyadic scales in the world, but if one normalizes , then it is not hard to see that any scale above, say, or below is very easy to deal with, leaving only dyadic scales to consider.) The estimate (4) is off by a factor of from the true Hausdorff-Young inequality, but if one replaces G with a Cartesian power and f with its tensor power , and makes the crucial observation that the Fourier transform of a tensor power is the tensor power of the Fourier transform, we see from applying (4) to the tensor power that

Taking roots and sending we obtain the desired inequality. Note that the logarithmic dependence on in the constants turned out not to be a problem, because . Thus the tensor power trick is able to handle a certain amount of dependence on the dimension in the constants, as long as the loss does not grow too rapidly in that dimension.

Exercise

Establish Young's inequality for and and finite abelian groups , where , by the same method.

Exercise

Prove the Riesz-Thorin interpolation theorem by this method, thus avoiding all use of complex analysis. (Note the similarity here with Example 1 and Example 2.)

Example 5: an example from additive combinatorics, due to Imre Ruzsa

An important inequality of Plünnecke asserts, among other things, that for finite non-empty sets , of an additive group , and any positive integer , the iterated sumset , which is defined as the set of all possible sums of not necessarily distinct elements of , obeys the bound

(This inequality, incidentally, is itself proved using a version of the tensor power trick, in conjunction with Hall's marriage theorem, but never mind that here.) This inequality can be amplified to the more general inequality

via the tensor power trick as follows. Applying (5) with , we obtain

The right-hand side looks a bit too big, but we can resolve this problem with a Cartesian product trick (which can be viewed as a cousin of the tensor product trick). If we replace with the larger group and replace each set with the larger set , where is the standard basis for and are arbitrary positive integers (and replacing with ), we obtain

Optimizing this in (basically, by making the close to constant; this is a general rule in optimization, namely that to optimize it makes sense to make and comparable in magnitude – someone should think of some examples and write a Tricki article on this) we obtain the amplified estimate

for some constant ; but then if one replaces with their Cartesian powers , takes roots, and then sends to infinity, one can delete the constant and recover the inequality.

Example 6: the Cotlar-Knapp-Stein lemma

This example is not exactly a use of the tensor power trick, but is very much in the same spirit. Let be bounded linear operators on a Hilbert space; for simplicity of discussion let us say that they are self-adjoint, though one can certainly generalize the discussion here to more general operators. If we assume that the operators are uniformly bounded in the operator norm, say

for all and some fixed , then this is not enough to ensure that the sum is bounded uniformly in ; indeed, the operator norm can be as large as . If however one has the stronger almost orthogonality estimate

for all , then the very useful Cotlar-Knapp-Stein lemma asserts that is now bounded uniformly in , with . To prove this, we first recall that a direct application of the triangle inequality using (6) (which is a consequence of (7)) only gave the inferior bound of . To improve this, first observe from self-adjointness that

Expanding the right-hand side out using the triangle inequality, we obtain

using (7) one soon ends up with a bound of . This is better than before as far as the dependence on is concerned, but one can do better by using higher powers. In particular, if we raise to the power for some M, and repeat the previous arguments, we end up with

Now, one can estimate the operator norm either by

or (using (6)) by

We take the geometric mean of these upper bounds to obtain

Summing this in , then , and so forth down to using (7) repeatedly, one eventually establishes the bound

Sending one eliminates the dependence on and obtains the claim.

Exercise

Show that the hypothesis of self-adjointness can be dropped if one replaces (7) with the two conditions

and

for all .

Example 7: entropy estimates

Suppose that is a random variable taking finitely many values. The Shannon entropy of this random variable is defined by the formula

where runs over all possible values of . For instance, if is uniformly distributed over values, then

If is not uniformly distributed, then the formula is not quite as simple. However, the entropy formula (8) does simplify to the uniform distribution formula (9) after using the tensor power trick. More precisely, let be the random variable formed by taking M independent and identicaly distributed samples of ; thus for instance, if was uniformly distributed on values, then is uniformly distributed on values. For more general , it is not hard to verify the formula

which is of course consistent with the uniformly distributed case thanks to (9).

A key observation is that as , the probability distribution of becomes "asymptotically uniform" in a certain sense. Indeed, the law of large numbers tells us that with very high probability, each possible value of will be attained by of the trials . The number of possible configurations of which are consistent with this distribution can be computed (using Stirling's formula) to be , and each such configuration appears with probability (again by Stirling's formula). Thus, at a heuristic level at least, behaves like a uniform distribution on possible values; note that this is consistent with (9) and (10), and can in fact be taken as a definition of Shannon entropy.

One can use this "microstate" or "statistical mechanics" interpretation of entropy in conjunction with the tensor power trick to give short (heuristic) proofs of various fundamental entropy inequalities, such as the subadditivity inequality

whenever , are discrete random variables which are not necessarily independent. Indeed, since and (heuristically) take only and values respectively, then will (mostly) take on at most values. On the other hand, this random variable is supposed to behave like a uniformly distributed random variable over values. These facts are only compatible if

taking roots and then sending we obtain the claim.

Exercise

Make the above arguments rigorous. (The final proof will be significantly longer than the standard proof of subadditivity based on Jensen's inequality, but it may be clearer conceptually. One may also compare the arguments here with those in Example 1.)

Example 8: the monotonicity of Perelman's reduced volume

One of the key ingredients of Perelman's proof of the Poincaré conjecture is a monotonicity formula for Ricci flows, which establishes that a certain geometric quantity, now known as the Perelman reduced volume, increases as time goes to negative infinity. Perelman gave both a rigorous proof and a heuristic proof of this formula. The heuristic proof is much shorter, and proceeds by first (formally) applying the Bishop-Gromov inequality for Ricci-flat metrics (which shows that another geometric quantity - the Bishop-Gromov reduced volume, increases as the radius goes to infinity) not to the Ricci flow itself, but to a high-dimensional version of this Ricci flow formed by adjoining an -dimensional sphere to the flow in a certain way, and then taking (formal) limits as . This is not precisely the tensor power trick, but is certainly in a similar spirit. For further discussion see "285G Lecture 9: Comparison geometry, the high-dimensional limit, and Perelman's reduced volume."

Example 9: the Riemann hypothesis for function fields

For background, see the Wikipedia entry on the Riemann hypothesis for function fields.

This example is much deeper than the previous ones, and I am not qualified to explain it in its entirety, but I can at least describe the one piece of this argument that uses the tensor power trick. For simplicity let us restrict attention to the Riemann hypothesis for curves over a finite field of prime power order. Using the Riemann-Roch theorem and some other tools from arithmetic geometry, one can show that the number of points in taking values (projectively) in is given by the formula

where g is the genus of the curve, and are complex numbers that depend on but not on . Weil showed that all of these complex numbers had magnitude exactly (the elliptic curve case being done earlier by Hasse, and the case morally going all the way back to Diophantus!), which is the analogue of the Riemann hypotheses for such curves. There is also an analogue of the functional equation for these curves, which asserts that the numbers come in pairs .

As a corollary, we see that stays quite close to :

But because of the formula (11) and the functional equation, one can in fact deduce the Riemann hypothesis from the apparently weaker statement

where the implied constants can depend on the curve , the prime , and the genus , but must be independent of the "dimension" . Indeed, from (13) and (12) one can see that the largest magnitude of the (which can be viewed as a sort of "spectral radius" for an underlying operator) is at most ; combining this with the functional equation, one obtains the Riemann hypothesis.

[This is of course only a small part of the story; the proof of (13) is by far the hardest part of the whole proof, and beyond the scope of this article.]

The blog post "Amplification, arbitrage, and the tensor power trick" discusses the tensor product trick as an example of a more general "amplification trick".

Minor correction to 'The tensor power trick'

Before (1), the equation for the Fourier transform has \chi when it should have \xi.

Thanks!

For consistency I've now decided to use \chi throughout that section.

Examples in complexity theory

1. (Approximate counting of problems in )

Let be the number of satisfying assignments of a Boolean formula . Using the leftover hash lemma, there exists a polynomial time probabilistic algorithm taking oracle for SAT that given a formula outputs with high probability a number such that

i.e. approximates within factor of 2.

By replacing with "tensor power" , the number of assignments rises to . Running the algorithm on this formula,

and putting we get approximation within factor of in time polynomial in and length of . The algorithm is by Stockmeyer.

2. (Independent set cannot be approximated to a constant factor)

For any constant , -approximating maximal independent set in a graph is -hard. (Taking complement, the same argument goes to clique)

Proof idea:

The PCP theorem implies that there is a constant such that MAX-3SAT cannot be approximated within . The usual reduction from 3SAT to independent set takes a formula with clauses and gives a graph with vertices, where each vertex corresponds to one of 7 ways a clause can be satisfied, and edges correspond to conflicts. Therefore the approximation ratio is preserved, and independent set cannot be approximated within some constant factor. Applying "tensor square" allows to lower the constant arbitrarily.