Tricki
a repository of mathematical know-how

The tensor power trick

Quick description

If one wants to prove an inequality X \leq Y for some non-negative quantities X, Y, but can only see how to prove a quasi-inequality X \leq CY that loses a multiplicative constant C, then try to replace all objects involved in the problem by "tensor powers" of themselves and apply the quasi-inequality to those powers. If all goes well, one can show that X^M \leq C Y^M for all M \geq 1, with a constant C independent of M; this implies X \leq Y as desired, by taking M^{th} roots and then taking limits as M \to \infty.
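The limiting step at the heart of the trick can be illustrated with a toy numerical sketch; the quantities X, Y and the constant C below are hypothetical, chosen only to show how C^{1/M} \to 1 deletes the constant:

```python
# Schematic of the trick: suppose we can only prove X <= C*Y with a lossy
# constant C = 10, but the estimate tensorises to X**M <= C * Y**M for all M.
# Taking M-th roots gives X <= C**(1/M) * Y, and C**(1/M) -> 1 as M grows.
C = 10.0
X, Y = 3.0, 3.0   # hypothetical quantities for which X <= Y is sharp

bounds = [C ** (1 / M) * Y for M in (1, 10, 100, 10000)]
assert abs(bounds[0] - C * Y) < 1e-9                 # the original lossy bound, 30.0
assert all(b2 <= b1 for b1, b2 in zip(bounds, bounds[1:]))
assert abs(bounds[-1] - Y) < 0.01                    # the loss has all but vanished
```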

General discussion

This trick works best when using techniques which are "dimension-independent" (or depend only weakly (e.g. polynomially) on the ambient dimension M). On the other hand, the constant C is allowed to depend on other parameters than the dimension M. In particular, even if the eventual inequality X \leq Y one wishes to prove is supposed to be uniform over all choices of some auxiliary parameter n, it is possible to establish this estimate by first establishing a non-uniform estimate X \leq C_n Y, so long as this estimate "tensorises" to X^M \leq C_n Y^M, where n does not grow with M.

It is of course essential that the problem "commutes" or otherwise "plays well" with the tensor power operation in order for the trick to be effective. In order for this to occur, one often has to phrase the problem in a sufficiently abstract setting, for instance generalizing a one-dimensional problem to one in arbitrary dimensions.

Prerequisites

Advanced undergraduate analysis (real, complex, and Fourier). The user of the trick also has to be very comfortable with working in high-dimensional spaces such as {\Bbb R}^M, or more generally G^M for some mathematical object G (e.g. a set, a group, a measure space, etc.). The later examples are more advanced but are only given as sketches.

Example 1: convexity of L^p norms

Let f: {\Bbb R} \to {\Bbb C} be a measurable function such that \int_{\Bbb R} |f(x)|^p\ dx \leq 1 and \int_{\Bbb R} |f(x)|^q\ dx \leq 1 for some 0 < p < q < \infty. Show that \int_{\Bbb R} |f(x)|^r\ dx \leq 1 for all p < r < q.

As a first attempt to solve this problem, observe that |f(x)|^r is less than |f(x)|^q when |f(x)| \geq 1, and is less than |f(x)|^p when |f(x)| \leq 1. Thus the inequality |f(x)|^r \leq |f(x)|^p + |f(x)|^q holds for all x, regardless of whether |f(x)| is larger or smaller than 1. This gives us the bound \int_{\Bbb R} |f(x)|^r\ dx \leq 2, which is off by a factor of 2 from what we need.

But we can eliminate this loss of 2 by the tensor power trick. We pick a large integer M \ge 1, and define the tensor power f^{\otimes M}: {\Bbb R}^M \to {\Bbb C} of f by the formula

 f^{\otimes M}(x_1,\ldots,x_M) = f(x_1) \ldots f(x_M).

From the Fubini–Tonelli theorem we see that

 \int_{{\Bbb R}^M} |f^{\otimes M}|^p\ dx = (\int_{\Bbb R} |f|^p\ dx)^M \leq 1

and similarly

 \int_{{\Bbb R}^M} |f^{\otimes M}|^q\ dx = (\int_{\Bbb R} |f|^q\ dx)^M \leq 1.

If we thus apply our previous arguments with f and {\Bbb R} replaced by f^{\otimes M} and {\Bbb R}^M respectively, we conclude that

 \int_{{\Bbb R}^M} |f^{\otimes M}|^r\ dx \leq 2;

applying Fubini–Tonelli again we conclude that

 (\int_{\Bbb R} |f|^r\ dx)^M \leq 2.

Taking M^{th} roots and then taking limits as M \to \infty we obtain the claim.
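As a sanity check, here is a toy computation on a hypothetical finite measure space standing in for {\Bbb R}, with made-up values of f, verifying both the lossy bound and the sharpened bound obtained by taking M^{th} roots:

```python
# A hypothetical discrete measure space (atoms of measure mu) standing in
# for R, with made-up values of f obeying the two normalizations.
p, r, q = 1.0, 1.5, 2.0
mu = [0.5, 0.25, 0.25]
f_vals = [1.2, 0.8, 0.5]

def integral(exponent):
    # plays the role of int |f|^s dx on this toy measure space
    return sum(m * abs(v) ** exponent for m, v in zip(mu, f_vals))

assert integral(p) <= 1 and integral(q) <= 1
# The pointwise bound |f|^r <= |f|^p + |f|^q gives the lossy estimate <= 2:
assert integral(r) <= integral(p) + integral(q) <= 2
# By Fubini-Tonelli, int |f^{tensor M}|^r = integral(r)**M, so applying the
# lossy estimate to the tensor power and taking M-th roots gives:
for M in (1, 10, 1000):
    assert integral(r) <= 2 ** (1 / M)
# and letting M -> infinity recovers the sharp bound:
assert integral(r) <= 1
```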

Exercise

More generally, show that if 0 < p < r < q < \infty, (X,\mu) is a measure space, and f: X \to {\Bbb C} is measurable, then we have the inequality

 \| f\|_{L^r(X,\mu)} \leq \|f\|_{L^p(X,\mu)}^{1-\theta}  \|f\|_{L^q(X,\mu)}^\theta

whenever the right-hand side is finite, where 0 < \theta < 1 is such that \frac{1}{r} = (1-\theta) \frac{1}{p} + \theta \frac{1}{q}. (Hint: by multiplying f and \mu by appropriate constants one can normalize \|f\|_{L^p(X,\mu)} = \|f\|_{L^q(X,\mu)}=1.)

Exercise

Use the previous exercise (and a clever choice of f, r, \theta and \mu - there is more than one choice available) to prove Hölder's inequality.

Example 2: the maximum principle

This example is due to Landau. Let \gamma be a simple closed curve in the complex plane that bounds a domain D, and let f: \overline{D} \to {\Bbb C} be a function which is complex analytic on the closure of this domain, and which obeys a bound |f(z)| \leq A on the boundary \gamma. The maximum principle for such functions asserts that one also has |f(z)| \leq A on the interior D. One way to prove this is by using the Cauchy integral formula

 f(z) =\frac{1}{2\pi i} \int_\gamma \frac{f(w)}{w-z}\ dw

for z \in D (assuming that \gamma is oriented anti-clockwise). Taking absolute values and using the triangle inequality, we obtain the crude bound

 |f(z)| \leq \frac{1}{2\pi} \frac{|\gamma|}{\hbox{dist}(z,\gamma)} A

where |\gamma| is the length of \gamma. This bound is off by a factor of \frac{1}{2\pi} \frac{|\gamma|}{\hbox{dist}(z,\gamma)}. This loss depends on the point z and the curve \gamma, but it is crucial to observe that it does not depend on f. In particular, one can apply it with f replaced by f^M (and A replaced by A^M) for any positive integer M, noting that f^M is of course also complex analytic. We conclude that

 |f(z)|^M \leq \frac{1}{2\pi} \frac{|\gamma|}{\hbox{dist}(z,\gamma)} A^M

and on taking M^{th} roots and then taking limits as M \to \infty we obtain the maximum principle.
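Numerically, the point is just that the lossy factor disappears under M^{th} roots; here is a minimal sketch with hypothetical values of A, |\gamma| and \hbox{dist}(z,\gamma):

```python
import math

# Hypothetical data: f is bounded by A = 2 on the boundary, gamma is the
# unit circle (length 2*pi), and z lies at distance 0.1 from gamma.
A = 2.0
length_gamma = 2 * math.pi
dist = 0.1
loss = (1 / (2 * math.pi)) * (length_gamma / dist)   # the lossy factor, = 10

# Applying the crude Cauchy integral bound to f**M and taking M-th roots:
bounds = [(loss * A ** M) ** (1 / M) for M in (1, 10, 1000)]
assert abs(bounds[0] - loss * A) < 1e-9      # the original bound, off by 10x
assert all(b2 <= b1 for b1, b2 in zip(bounds, bounds[1:]))
assert abs(bounds[-1] - A) < 0.01            # recovers |f(z)| <= A in the limit
```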

Example 3: new Strichartz estimates from old

This observation is due to Jon Bennett. Technically, it is not an application of the tensor power trick as we will not let the dimension go off to infinity, but it is certainly in a similar spirit.

Strichartz estimates are an important tool in the theory of linear and nonlinear dispersive equations. Here is a typical such estimate: if u: {\Bbb R} \times {\Bbb R}^2 \to {\Bbb C} solves the two-dimensional Schrödinger equation i u_t + \Delta u = 0, then one has the inequality

 \| u \|_{L^4_t L^4_x( {\Bbb R} \times {\Bbb R}^2 )} \leq C \| u(0) \|_{L^2( {\Bbb R}^2 )}

for some absolute constant C. (In this case, the best value of C is known to equal 1/\sqrt{2}, a result of Foschi and of Hundertmark-Zharnitsky.) It is possible to use this two-dimensional Strichartz estimate and a version of the tensor power trick to deduce a one-dimensional Strichartz estimate. Specifically, if u: {\Bbb R} \times {\Bbb R} \to {\Bbb C} solves the one-dimensional Schrödinger equation iu_t + u_{xx} = 0, then observe that the tensor square u^{\otimes 2}(t,x,y) := u(t,x) u(t,y) solves the two-dimensional Schrödinger equation. (This tensor product symmetry of the Schrödinger equation is fundamental in quantum physics; it allows one to model many-particle systems and single-particle systems by essentially the same equation.) Applying the above Strichartz inequality to this product we conclude that

 \| u \|_{L^8_t L^4_x( {\Bbb R} \times {\Bbb R} )} \leq C^{1/2} \| u(0) \|_{L^2( {\Bbb R} )}.

Remark

A similar trick allows us to deduce "interaction" or "many-particle" Morawetz estimates for the Schrödinger equation from their more traditional "single-particle" counterparts; see for instance Chapter 3.5 of Terence Tao's book.

Example 4: the Hausdorff-Young inequality

Let G be a finite abelian group, and let f: G \to {\Bbb C} be a function. We let \hat G be the group of characters \chi: G \to S^1 of G, and define the Fourier transform \hat f: \hat G \to {\Bbb C} by the formula

 \hat f(\chi) = \frac{1}{|G|} \sum_{x \in G} f(x) \overline{\chi(x)}.

From the triangle inequality we have

 \sup_{\chi \in \hat G} |\hat f(\chi)| \leq \frac{1}{|G|} \sum_{x \in G} |f(x)| (1)

while from Plancherel's theorem we have

 (\sum_{\chi \in \hat G} |\hat f(\chi)|^2)^{1/2} = (\frac{1}{|G|} \sum_{x \in G} |f(x)|^2)^{1/2} (2)

By applying the Riesz-Thorin interpolation theorem, we can then conclude the Hausdorff-Young inequality

 (\sum_{\chi \in \hat G} |\hat f(\chi)|^q)^{1/q} \leq (\frac{1}{|G|} \sum_{x \in G} |f(x)|^p)^{1/p} (3)

whenever 1 < p < 2 and \frac{1}{p}+\frac{1}{q}=1. However, it is also possible to deduce (3) from (2) and (1) by a more elementary method based on the tensor power trick. First suppose that f is supported on a set A \subset G and that |f| takes values between (say) 2^m and 2^{m+1} on A. Then from (1) and (2) we have

 \sup_{\chi \in \hat G} |\hat f(\chi)| \leq \frac{|A|}{|G|} 2^{m+1}

and

 (\sum_{\chi \in \hat G} |\hat f(\chi)|^2)^{1/2} \leq (\frac{|A|}{|G|})^{1/2} 2^{m+1},

from which we establish a "restricted weak-type" version of Hausdorff-Young, namely

 (\sum_{\chi \in \hat G} |\hat f(\chi)|^q)^{1/q} \leq (\frac{|A|}{|G|})^{1/p} 2^{m+1},

and thus

  (\sum_{\chi \in \hat G} |\hat f(\chi)|^q)^{1/q} \leq 2 (\frac{1}{|G|} \sum_{x \in G} |f(x)|^p)^{1/p}.

This inequality is restricted to those f whose non-zero magnitudes |f(x)| range inside a dyadic interval [2^m, 2^{m+1}], but by the technique of dyadic decomposition one can remove this restriction at the cost of an additional factor of O(1 + \log |G| ), thus obtaining the estimate

 (\sum_{\chi \in \hat G} |\hat f(\chi)|^q)^{1/q} \leq C (1 + \log |G|) (\frac{1}{|G|} \sum_{x \in G} |f(x)|^p)^{1/p} (4)

for general f. (There are of course an infinite number of dyadic scales [2^m, 2^{m+1}] in the world, but if one normalizes  (\frac{1}{|G|} \sum_{x \in G} |f(x)|^p)^{1/p} = 1, then it is not hard to see that any scale above, say, |G|^{100} or below |G|^{-100} is very easy to deal with, leaving only O( 1 + \log |G| ) dyadic scales to consider.) The estimate (4) is off by a factor of O( 1 + \log |G| ) from the true Hausdorff-Young inequality, but if one replaces G with a Cartesian power G^M and f with its tensor power f^{\otimes M}, and makes the crucial observation that the Fourier transform of a tensor power is the tensor power of the Fourier transform, we see from applying (4) to the tensor power that

 (\sum_{\chi \in \hat G} |\hat f(\chi)|^q)^{M/q} \leq C (1 + \log |G|^M) (\frac{1}{|G|} \sum_{x\in G} |f(x)|^p)^{M/p}.

Taking M^{th} roots and sending M \to \infty we obtain the desired inequality. Note that the logarithmic dependence on |G| in the constants turned out not to be a problem, because M^{1/M} \to 1. Thus the tensor power trick is able to handle a certain amount of dependence on the dimension in the constants, as long as the loss does not grow too rapidly in that dimension.
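One can check the two ingredients used here, the Hausdorff-Young inequality on a finite group and the tensor-power compatibility of the Fourier transform, on a small example; the group G = {\Bbb Z}/4 and the test function below are arbitrary choices:

```python
import cmath, math

# Toy check on the (arbitrary) choice G = Z/4 with an arbitrary test
# function f; the exponents p = 4/3, q = 4 are dual as required in (3).
N = 4
f = [1.0, 2.0, 0.0, -1.0]
p, q = 4 / 3, 4.0

def fourier(f):
    # hat f(chi_k) = (1/|G|) sum_x f(x) conj(chi_k(x)), chi_k(x) = e^{2 pi i k x/N}
    return [sum(f[x] * cmath.exp(-2j * math.pi * k * x / N) for x in range(N)) / N
            for k in range(N)]

fhat = fourier(f)

# Hausdorff-Young (3): l^q norm of hat f <= normalized l^p norm of f.
lhs = sum(abs(c) ** q for c in fhat) ** (1 / q)
rhs = (sum(abs(v) ** p for v in f) / N) ** (1 / p)
assert lhs <= rhs

# The crucial observation: the Fourier transform of a tensor power is the
# tensor power of the Fourier transform (checked here on G^2 = Z/4 x Z/4).
f2hat = [[sum(f[x] * f[y] * cmath.exp(-2j * math.pi * (j * x + k * y) / N)
              for x in range(N) for y in range(N)) / N ** 2
          for k in range(N)] for j in range(N)]
for j in range(N):
    for k in range(N):
        assert abs(f2hat[j][k] - fhat[j] * fhat[k]) < 1e-9
```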

Exercise

Establish Young's inequality \|f*g\|_{l^r(G)} \leq \|f\|_{l^p(G)} \|g\|_{l^q(G)} for 1 \leq p,q,r < \infty and \frac{1}{r}+1=\frac{1}{p}+\frac{1}{q} and finite abelian groups G, where f*g(x) = \sum_{y \in G} f(y) g(x-y), by the same method.

Exercise

Prove the Riesz-Thorin interpolation theorem by this method, thus avoiding all use of complex analysis. (Note the similarity here with Example 1 and Example 2.)

Example 5: an example from additive combinatorics, due to Imre Ruzsa

An important inequality of Plünnecke asserts, among other things, that for finite non-empty sets A, B of an additive group G, and any positive integer k, the iterated sumset kB = B +\ldots + B, which is defined as the set of all possible sums of k not necessarily distinct elements of B, obeys the bound

 |kB| \leq \frac{|A+B|^k}{|A|^{k-1}}. (5)

(This inequality, incidentally, is itself proved using a version of the tensor power trick, in conjunction with Hall's marriage theorem, but never mind that here.) This inequality can be amplified to the more general inequality

 |B_1 + \ldots + B_k| \leq \frac{|A+B_1| \ldots |A+B_k|}{|A|^{k-1}}

via the tensor power trick as follows. Applying (5) with B = B_1 \cup \ldots \cup B_k, and noting that B_1 + \ldots + B_k \subseteq kB while |A+B| \leq |A+B_1| + \ldots + |A+B_k|, we obtain

 |B_1 + \ldots + B_k| \leq \frac{(|A+B_1| + \ldots + |A+B_k|)^k}{|A|^{k-1}}.

The right-hand side looks a bit too big, but we can resolve this problem with a Cartesian product trick (which can be viewed as a cousin of the tensor product trick). If we replace G with the larger group G \times {\Bbb Z}^k and replace each set B_i with the larger set B_i \times \{ e_i, 2e_i, \ldots, N_i e_i \}, where e_1,\ldots,e_k is the standard basis for {\Bbb Z}^k and N_i are arbitrary positive integers (and replacing A with A \times \{0\}), we obtain

 N_1 \ldots N_k |B_1 + \ldots + B_k| \leq \frac{(N_1 |A+B_1| + \ldots + N_k |A+B_k|)^k}{|A|^{k-1}}.

Optimizing this in N_1,\ldots,N_k (basically, by choosing the N_i so that the quantities N_i |A+B_i| are all comparable in magnitude; this is a general rule in optimization, namely that to optimize X+Y it makes sense to make X and Y comparable in magnitude – someone should think of some examples and write a Tricki article on this) we obtain the amplified estimate

 |B_1 + \ldots + B_k| \leq C_k \frac{|A+B_1| \ldots |A+B_k|}{|A|^{k-1}}.

for some constant C_k; but then if one replaces A, B_1, \ldots, B_k with their Cartesian powers A^M, B_1^M, \ldots, B_k^M, takes M^{th} roots, and then sends M to infinity, one can delete the constant C_k and recover the inequality.
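The amplified inequality is easy to test on small sets of integers; the sets below are arbitrary examples:

```python
# Arbitrary small subsets of the group G = Z.
def sumset(X, Y):
    return {x + y for x in X for y in Y}

A = {0, 1, 2}
B1 = {0, 3, 7}
B2 = {0, 1, 10}

# The amplified Plunnecke-Ruzsa inequality with k = 2:
lhs = len(sumset(B1, B2))
rhs = len(sumset(A, B1)) * len(sumset(A, B2)) / len(A)
assert lhs <= rhs
# Note that Cartesian powers tensorise on the nose:
# (B1^M) + (B2^M) = (B1 + B2)^M, which is what lets the constant C_k
# be deleted by taking M-th roots in the argument above.
```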

Example 6: the Cotlar-Knapp-Stein lemma

This example is not exactly a use of the tensor power trick, but is very much in the same spirit. Let T_1, \ldots, T_N: H \to H be bounded linear operators on a Hilbert space H; for simplicity of discussion let us say that they are self-adjoint, though one can certainly generalize the discussion here to more general operators. If we assume that the operators are uniformly bounded in the operator norm, say

 \| T_i \|_{op} \leq A (6)

for all i and some fixed A, then this is not enough to ensure that the sum \sum_{i=1}^N T_i is bounded uniformly in N; indeed, the operator norm can be as large as AN. If however one has the stronger almost orthogonality estimate

 \sum_{j=1}^N \| T_i T_j \|_{op}^{1/2} \leq A (7)

for all i, then the very useful Cotlar-Knapp-Stein lemma asserts that \sum_{i=1}^N T_i is now bounded uniformly in N, with \| \sum_{i=1}^N T_i \|_{op} \leq A. To prove this, we first recall that a direct application of the triangle inequality using (6) (which is a consequence of (7)) only gave the inferior bound of AN. To improve this, first observe from self-adjointness that

 \| \sum_{i=1}^N T_i \|_{op} =  \|(\sum_{i=1}^N T_i)^2 \|_{op}^{1/2}.

Expanding the right-hand side out using the triangle inequality, we obtain

 \| \sum_{i=1}^N T_i \|_{op} \leq (\sum_{i=1}^N \sum_{j=1}^N \| T_i T_j \|_{op})^{1/2};

using (7) one soon ends up with a bound of A N^{1/2}. This is better than before as far as the dependence on N is concerned, but one can do better by using higher powers. In particular, if we raise \sum_{i=1}^N T_i to the 2M^{th} power for some M, and repeat the previous arguments, we end up with

 \| \sum_{i=1}^N T_i \|_{op} \leq(\sum_{1 \leq i_1,\ldots,i_{2M} \leq N} \| T_{i_1} \ldots T_{i_{2M}} \|_{op})^{1/2M}.

Now, one can estimate the operator norm  \| T_{i_1} \ldots T_{i_{2M}} \|_{op} either by

 \| T_{i_1} T_{i_2} \|_{op} \ldots \| T_{i_{2M-1}} T_{i_{2M}} \|_{op},

or (using (6)) by

 A^2 \| T_{i_2} T_{i_3} \|_{op} \ldots \| T_{i_{2M-2}} T_{i_{2M-1}} \|_{op}.

We take the geometric mean of these upper bounds to obtain

 A \| T_{i_1} T_{i_2} \|_{op}^{1/2} \|T_{i_2} T_{i_3} \|_{op}^{1/2} \ldots \| T_{i_{2M-1}} T_{i_{2M}} \|_{op}^{1/2}.

Summing this in i_{2M}, then i_{2M-1}, and so forth down to i_1 using (7) repeatedly, one eventually establishes the bound

 \| \sum_{i=1}^N T_i \|_{op} \leq  (N A^{2M})^{1/2M}.

Sending M \to \infty one eliminates the dependence on N and obtains the claim.
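The endgame of this argument is again the deletion of a polynomial loss, here a factor of N, by taking high-order roots; a toy computation with hypothetical values of A and N (done in log space to avoid floating-point overflow):

```python
import math

# Hypothetical values: almost orthogonality constant A = 3, with N = 1000
# summands.  The crude triangle inequality gives N*A; the almost
# orthogonality argument gives (N * A**(2M))**(1/(2M)), which tends to A.
A, N = 3.0, 1000

# Evaluate (N * A**(2M))**(1/(2M)) via logarithms so A**(2M) never overflows:
bounds = [math.exp((math.log(N) + 2 * M * math.log(A)) / (2 * M))
          for M in (1, 5, 1000)]
assert all(b2 <= b1 for b1, b2 in zip(bounds, bounds[1:]))
assert abs(bounds[-1] - A) < 0.02       # N**(1/2000) is already close to 1
```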

Exercise

Show that the hypothesis of self-adjointness can be dropped if one replaces (7) with the two conditions

 \sum_{j=1}^N \| T_i T_j^* \|_{op}^{1/2} \leq A

and

 \sum_{j=1}^N \| T_i^* T_j \|_{op}^{1/2} \leq A

for all i.

Example 7: entropy estimates

Suppose that X is a random variable taking finitely many values. The Shannon entropy H(X) of this random variable is defined by the formula

 H(X) = - \sum_x {\Bbb P}(X=x) \log {\Bbb P}(X=x), (8)

where x runs over all possible values of X. For instance, if X is uniformly distributed over N values, then

 H(X)=\log N. (9)

If X is not uniformly distributed, then the formula is not quite as simple. However, the entropy formula (8) does simplify to the uniform distribution formula (9) after using the tensor power trick. More precisely, let X^{\otimes M} = (X_1,\ldots,X_M) be the random variable formed by taking M independent and identically distributed samples of X; thus for instance, if X was uniformly distributed on N values, then X^{\otimes M} is uniformly distributed on N^M values. For more general X, it is not hard to verify the formula

 H( X^{\otimes M} ) = M H(X), (10)

which is of course consistent with the uniformly distributed case thanks to (9).

A key observation is that as M \to \infty, the probability distribution of X^{\otimes M} becomes "asymptotically uniform" in a certain sense. Indeed, the law of large numbers tells us that with very high probability, each possible value x of X will be attained by ({\Bbb P}(X=x)+o(1)) M of the M trials X_1,\ldots,X_M. The number of possible configurations of X^{\otimes M} = (X_1,\ldots,X_M) which are consistent with this distribution can be computed (using Stirling's formula) to be e^{M (H(X)+o(1))}, and each such configuration appears with probability e^{-M (H(X)+o(1))} (again by Stirling's formula). Thus, at a heuristic level at least, X^{\otimes M} behaves like a uniform distribution on e^{M (H(X)+o(1))} possible values; note that this is consistent with (9) and (10), and can in fact be taken as a definition of Shannon entropy.

One can use this "microstate" or "statistical mechanics" interpretation of entropy in conjunction with the tensor power trick to give short (heuristic) proofs of various fundamental entropy inequalities, such as the subadditivity inequality

 H(X,Y) \leq H(X) + H(Y)

whenever X, Y are discrete random variables which are not necessarily independent. Indeed, since X^{\otimes M} and Y^{\otimes M} (heuristically) take only e^{M (H(X)+o(1))} and e^{M (H(Y)+o(1))} values respectively, then (X,Y)^{\otimes M} \equiv (X^{\otimes M}, Y^{\otimes M}) will (mostly) take on at most e^{M (H(X)+o(1))} e^{M (H(Y)+o(1))} values. On the other hand, this random variable is supposed to behave like a uniformly distributed random variable over e^{M (H(X,Y)+o(1))} values. These facts are only compatible if

 e^{M (H(X,Y)+o(1))} \leq e^{M(H(X)+o(1))}  e^{M (H(Y)+o(1))};

taking M^{th} roots and then sending M \to \infty we obtain the claim.
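Subadditivity, and the exact tensorisation identity H(X^{\otimes M}) = M H(X) that underpins the heuristic, can both be verified directly on a small joint distribution; the distribution below is an arbitrary example:

```python
import math

# An arbitrary joint distribution of two correlated bits (X, Y).
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def entropy(dist):
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

# Marginal distributions of X and Y:
px, py = {}, {}
for (x, y), pr in joint.items():
    px[x] = px.get(x, 0) + pr
    py[y] = py.get(y, 0) + pr

# Subadditivity H(X,Y) <= H(X) + H(Y):
assert entropy(joint) <= entropy(px) + entropy(py)

# Exact tensorisation with M = 2: the distribution of two i.i.d. copies of X
# is the product of px with itself, and its entropy is exactly 2*H(X).
pair = {(a, b): s * t for a, s in px.items() for b, t in px.items()}
assert abs(entropy(pair) - 2 * entropy(px)) < 1e-12
```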

Exercise

Make the above arguments rigorous. (The final proof will be significantly longer than the standard proof of subadditivity based on Jensen's inequality, but it may be clearer conceptually. One may also compare the arguments here with those in Example 1.)

Example 8: the monotonicity of Perelman's reduced volume

One of the key ingredients of Perelman's proof of the Poincaré conjecture is a monotonicity formula for Ricci flows, which establishes that a certain geometric quantity, now known as the Perelman reduced volume, increases as time goes to negative infinity. Perelman gave both a rigorous proof and a heuristic proof of this formula. The heuristic proof is much shorter, and proceeds by first (formally) applying the Bishop-Gromov inequality for Ricci-flat metrics (which shows that another geometric quantity, the Bishop-Gromov reduced volume, increases as the radius goes to infinity) not to the Ricci flow itself, but to a high-dimensional version of this Ricci flow formed by adjoining an M-dimensional sphere to the flow in a certain way, and then taking (formal) limits as M \to \infty. This is not precisely the tensor power trick, but is certainly in a similar spirit. For further discussion see "285G Lecture 9: Comparison geometry, the high-dimensional limit, and Perelman's reduced volume."

Example 9: the Riemann hypothesis for function fields

For background, see the Wikipedia entry on the Riemann hypothesis for function fields.

This example is much deeper than the previous ones, and I am not qualified to explain it in its entirety, but I can at least describe the one piece of this argument that uses the tensor power trick. For simplicity let us restrict attention to the Riemann hypothesis for curves C over a finite field F_{p^M} of prime power order. Using the Riemann-Roch theorem and some other tools from arithmetic geometry, one can show that the number |C(F_{p^M})| of points in C taking values (projectively) in F_{p^M} is given by the formula

 |C(F_{p^M})| = p^M - \omega_1^M - \ldots - \omega_{2g}^M + 1, (11)

where g is the genus of the curve, and \omega_1,\ldots,\omega_{2g} are complex numbers that depend on p but not on M. Weil showed that all of these complex numbers had magnitude exactly p^{1/2} (the elliptic curve case g=1 being done earlier by Hasse, and the g=0 case morally going all the way back to Diophantus!), which is the analogue of the Riemann hypothesis for such curves. There is also an analogue of the functional equation for these curves, which asserts that the 2g numbers \omega_1,\ldots,\omega_{2g} come in pairs \omega, p/\omega.

As a corollary, we see that |C(F_{p^M})| stays quite close to p^M:

 |C(F_{p^M})| = p^M + O( g p^{M/2} ). (12)

But because of the formula (11) and the functional equation, one can in fact deduce the Riemann hypothesis from the apparently weaker statement

 |C(F_{p^M})| = p^M + O( p^{M/2} ), (13)

where the implied constants can depend on the curve C, the prime p, and the genus g, but must be independent of the "dimension" M. Indeed, from (13) and (11) one can see that the largest magnitude of the \omega_1,\ldots,\omega_{2g} (which can be viewed as a sort of "spectral radius" for an underlying operator) is at most p^{1/2}; combining this with the functional equation, one obtains the Riemann hypothesis.

[This is of course only a small part of the story; the proof of (13) is by far the hardest part of the whole proof, and beyond the scope of this article.]

Further reading

The blog post "Amplification, arbitrage, and the tensor power trick" discusses the tensor power trick as an example of a more general "amplification trick".

Comments

Minor correction to 'The tensor power trick'

Before (1), the equation for the Fourier transform has \chi when it should have \xi.

Thanks!

For consistency I've now decided to use \chi throughout that section.

Examples in complexity theory

1. (Approximate counting of \mathsf{\# P} problems in {\mathsf{BPP}}^{\mathsf{NP}})

Let N(\phi) be the number of satisfying assignments of a Boolean formula \phi. Using the leftover hash lemma, there is a polynomial-time probabilistic algorithm with an oracle for SAT that, given a formula \phi, outputs with high probability a number a such that

\frac{1}{2} N(\phi) \leq a \leq 2N(\phi),

i.e., it approximates N(\phi) to within a factor of 2.

By replacing \phi with the "tensor power" \phi^k = \phi_1 \wedge \dots \wedge \phi_k, where \phi_1,\ldots,\phi_k are copies of \phi on disjoint sets of variables, the number of satisfying assignments rises to N(\phi^k) = N(\phi)^k. Running the algorithm on this formula,

\frac{1}{2} N(\phi)^k \leq a \leq 2N(\phi)^k
\frac{1}{2^{1/k}} N(\phi) \leq a^{1/k} \leq 2^{1/k} N(\phi)

and putting k = O(\frac{1}{\epsilon}) we get an approximation to within a factor of 1+\epsilon in time polynomial in \frac{1}{\epsilon} and the length of \phi. The algorithm is due to Stockmeyer.
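A schematic of this amplification, with a deliberately adversarial stand-in for the approximate counter; the formula, its count N, and the counter itself are all hypothetical:

```python
# Hypothetical stand-in for the BPP^NP approximate counter: adversarially
# off by the full factor of 2 that the guarantee allows.
def approx_counter(true_count):
    return 2 * true_count

N = 1000      # hypothetical number of satisfying assignments of phi
k = 10

# Run the counter on phi^k, whose count is N**k, and take k-th roots:
a = approx_counter(N ** k)
estimate = a ** (1 / k)

# The factor-2 error has been beaten down to 2**(1/k), about 1.07 here:
assert N / 2 <= estimate <= 2 * N
assert abs(estimate - N) / N < 0.08
```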

2. (Independent set cannot be approximated to a constant factor)

For any constant \delta \in (0, 1), \delta-approximating the maximum independent set in a graph is \mathsf{NP}-hard. (By taking complements, the same argument applies to clique.)

Proof idea:

The PCP theorem implies that there is a constant \delta' < 1 such that MAX-3SAT cannot be approximated to within \delta'. The usual reduction from 3SAT to independent set takes a formula with m clauses and produces a graph with 7m vertices, where each vertex corresponds to one of the 7 ways a clause can be satisfied, and edges correspond to conflicting assignments. This reduction preserves the approximation ratio, so independent set cannot be approximated within some constant factor. Passing to the "tensor square" G^2 then allows one to lower the constant arbitrarily.