a repository of mathematical know-how

To make a function nicer without changing it much, convolve it with an approximate delta function

Quick description

Suppose you have a function and want to prove that it can be approximated by a smoother function. A method that often works is convolution. If you convolve a function with the delta function, you don't change it; if you convolve it with a smooth approximation to the delta function, then you may well change it only very slightly and end up with a smooth function. Indeed, a general principle that often applies is that a convolution of two functions inherits the nice properties of each function.


Basic real analysis

Example 1

Weierstrass's approximation theorem states that every continuous function f:[0,1]\rightarrow\R can be uniformly approximated by polynomials. That is, for every \epsilon>0 there exists a polynomial P such that |P(x)-f(x)|<\epsilon for every x\in[0,1].

There are several ways of proving this result (though they are not always as distinct as they at first appear). Let us prove it here by starting with the observation, which we shall not try to state rigorously, that if you convolve f with the delta-function, then you get f.

What does this mean? Well, the convolution of two functions f and g is the function f*g defined by the formula

f*g(x)=\int_{-\infty}^\infty f(x-y)g(y)dy.

The delta-function \delta is not really a function, but whatever it is, it has the property that \int_{-\infty}^\infty h(x)\delta(x)dx=h(0), and we can think of it as taking the value \infty "with mass 1" at 0 and the value 0 everywhere else. Therefore,

f*\delta(x)=\int_{-\infty}^\infty f(x-y)\delta(y)dy=f(x-0)=f(x).

Now let us do another calculation. What is F(x)=\int_{-\infty}^\infty f(x-y)y^ndy? We shall assume that f decays sufficiently rapidly for this integral to be finite. (Indeed, soon we shall assume that f vanishes outside some interval.) Substituting x-y for y we can rewrite the integral as \int_{-\infty}^\infty f(y)(x-y)^ndy. If we now differentiate under the integral sign n+1 times we keep differentiating the (x-y)^n term and end up with 0. Therefore, F is a polynomial of degree at most n.
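As a cross-check on the differentiation argument, one can also expand (x-y)^n by the binomial theorem. The following is a sketch under the same decay assumption on f; the symbol m_j for the moments of f is our own shorthand and does not appear elsewhere in the article.

```latex
\begin{aligned}
F(x) &= \int_{-\infty}^\infty f(y)(x-y)^n\,dy
      = \int_{-\infty}^\infty f(y)\sum_{k=0}^n\binom{n}{k}x^k(-y)^{n-k}\,dy \\
     &= \sum_{k=0}^n\binom{n}{k}(-1)^{n-k}\,m_{n-k}\,x^k,
\qquad\text{where } m_j=\int_{-\infty}^\infty f(y)\,y^j\,dy .
\end{aligned}
```

The coefficient of x^n is m_0=\int f, so F has degree exactly n when f has non-zero integral and smaller degree otherwise.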

We have just shown that convolving by a delta-function has no effect, and convolving by a polynomial of degree n gives a polynomial of degree at most n. This suggests a possible way of approximating f by a polynomial: convolve it by a polynomial approximation of the delta-function. And this, give or take one small technicality, works.

The small technicality is that we would like f to be continuous and defined on all of \R rather than just on [0,1]. We would also like it to have good decay, so all we do is replace it by a continuous function \R\rightarrow\R that equals f on [0,1] and vanishes outside [-1,2]. Abusing notation, let us call this new function f as well. Our aim will be to approximate f uniformly by a polynomial on [0,1] even though both f and the polynomial are defined on \R. (Obviously we can't hope to approximate f uniformly on all of \R, since any non-constant polynomial will tend to \pm\infty.)

So now let us try to approximate the delta-function by a polynomial. What we mean by this is that we would like a polynomial that "looks like the delta-function" for all values that have any chance of being involved when we convolve with f. Since f vanishes outside a bounded interval, this just means that our polynomial should look like the delta-function inside some appropriate bounded interval. The interval [-3,3] will do fine (and is in fact bigger than necessary). For a polynomial P to look like the delta-function on this interval, we would like P to take non-negative values (this is not essential, but it is nice), and for \int_{-3}^3P(x)dx to be almost equal to \int_{-\eta}^\eta P(x)dx for some small \eta, and for both of these integrals to be approximately 1. Thus, we take the delta-function and replace "mass 1 on an interval of width zero and zero everywhere else" by "mass approximately 1 on a very narrow interval and almost zero everywhere else".

It is easy to construct such a polynomial. First we take a polynomial that has a unique maximum at the origin and is non-negative on [-3,3]. The simplest example is 1-(x/3)^2 so let us take that, but the precise choice is not too important. Next, we raise this polynomial to some large power n, obtaining a polynomial P(x)=(1-x^2/9)^n. This function is 1 at the origin, and becomes very small as soon as you get any distance from the origin. (To be more precise about this, we can think of (1-x^2/9)^n as being something like e^{-nx^2/9}, which becomes small when x is a large multiple of n^{-1/2}. The appearance of a Gaussian function here is not a total coincidence ...)

We want our delta-function imitation to integrate to approximately 1 over the interval [-3,3], so let us define D(x) to be \lambda P(x), where \lambda^{-1}=\int_{-3}^3P(x)dx. An easy but slightly tiresome calculation, which we shall omit here, shows that even after we have multiplied by the constant \lambda, which will be quite large, D(x) is very small outside an interval of width that tends to zero with n.
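The omitted calculation can at least be checked numerically. The sketch below is not part of the proof: it uses a midpoint rule (our choice) to estimate how much of the mass of D lies in [-\eta,\eta] and how large D can be outside that interval. The function names are ours.

```python
def kernel_mass(n, a, b, steps=20000):
    """Midpoint-rule estimate of the integral of P(x) = (1 - x^2/9)^n over [a, b]."""
    w = (b - a) / steps
    return w * sum((1.0 - (a + (i + 0.5) * w) ** 2 / 9.0) ** n for i in range(steps))


def concentration(n, eta):
    """Fraction of the mass of D on [-3, 3] that lies inside [-eta, eta]."""
    return kernel_mass(n, -eta, eta) / kernel_mass(n, -3.0, 3.0)


def sup_outside(n, eta):
    """Largest value of D = lambda * P on eta <= |x| <= 3.

    P is decreasing in |x| on [0, 3], so the maximum is attained at |x| = eta.
    """
    lam = 1.0 / kernel_mass(n, -3.0, 3.0)
    return lam * (1.0 - eta ** 2 / 9.0) ** n


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, concentration(n, 0.5), sup_outside(n, 0.5))
```

For fixed \eta the first printed quantity should approach 1 and the second should approach 0 as n grows, which is exactly the behaviour the argument needs.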

What happens when we convolve D with our original function f? Let x\in[0,1]. Then

f*D(x)=\int_{-\infty}^\infty f(y)D(x-y)dy=\int_{-3}^3f(y)D(x-y)dy\approx\int_{x-\eta}^{x+\eta}f(y)D(x-y)dy.

The first equality is the definition of convolution, after substituting x-y for y. The second uses the fact that f vanishes outside [-1,2] and that x\in[0,1]. For any fixed \eta>0, the approximation is valid provided n is large enough, since f, being continuous, is bounded in modulus by some constant C, and we can ensure with our choice of n that D(x-y) is much smaller than C^{-1} whenever \eta<|x-y|\leq 3.

This still leaves us free to choose \eta. We do that as follows. If we want f*D(x) to approximate f(x) to within \epsilon for every x\in[0,1], then we choose \eta such that |x-y|<\eta implies that |f(x)-f(y)| is less than \epsilon/2. This we can do because f is uniformly continuous. This step is using the standard result that a continuous function on a closed bounded interval is uniformly continuous, which is like being continuous except that we can choose the same \eta for every single x rather than having to let \eta depend on x. Since D is non-negative on [-3,3] and integrates to 1 there, replacing f(y) by f(x) changes the last integral above by at most \epsilon/2:

\left|\int_{x-\eta}^{x+\eta}f(y)D(x-y)dy-\int_{x-\eta}^{x+\eta}f(x)D(x-y)dy\right|\leq\frac{\epsilon}{2}\int_{x-\eta}^{x+\eta}D(x-y)dy\leq\frac{\epsilon}{2}.

Finally,

\int_{x-\eta}^{x+\eta}f(x)D(x-y)dy=f(x)\int_{x-\eta}^{x+\eta}D(x-y)dy\approx f(x),

so we are done.
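The whole argument can be tried out numerically. The following is a minimal sketch, not a definitive implementation: it assumes a particular continuous extension of f (linear ramps down to zero on [-1,0] and [1,2]) and replaces the convolution integral by a midpoint-rule sum, both of which are our choices. All function names are hypothetical.

```python
def extend(f):
    """Continuous extension of f from [0, 1] to R that vanishes outside [-1, 2]."""
    def g(y):
        if 0.0 <= y <= 1.0:
            return f(y)
        if -1.0 < y < 0.0:
            return f(0.0) * (1.0 + y)   # ramp from 0 at y = -1 up to f(0) at y = 0
        if 1.0 < y < 2.0:
            return f(1.0) * (2.0 - y)   # ramp from f(1) at y = 1 down to 0 at y = 2
        return 0.0
    return g


def kernel(n, steps=6000):
    """Return D(x) = lambda * (1 - x^2/9)^n, normalised to have integral 1 on [-3, 3]."""
    w = 6.0 / steps
    mass = w * sum((1.0 - (-3.0 + (i + 0.5) * w) ** 2 / 9.0) ** n for i in range(steps))
    lam = 1.0 / mass
    return lambda x: lam * (1.0 - x * x / 9.0) ** n


def approximant(f, n, steps=3000):
    """Return x -> midpoint-rule estimate of (f*D)(x), integrating over y in [-1, 2]."""
    g, D = extend(f), kernel(n)
    w = 3.0 / steps
    nodes = [-1.0 + (i + 0.5) * w for i in range(steps)]
    values = [g(y) for y in nodes]
    return lambda x: w * sum(v * D(x - y) for v, y in zip(values, nodes))


def sup_error(f, n, samples=41):
    """Largest value of |f*D - f| over an evenly spaced grid in [0, 1]."""
    p = approximant(f, n)
    grid = [i / (samples - 1) for i in range(samples)]
    return max(abs(p(x) - f(x)) for x in grid)


if __name__ == "__main__":
    f = lambda x: abs(x - 0.5)          # continuous but not differentiable at 1/2
    for n in (50, 500, 2000):
        print(n, sup_error(f, n))
```

Note that the discretised approximant is itself a finite sum of polynomials in x of degree 2n, so it is a genuine polynomial and not merely close to one. For the test function above the error should shrink roughly like n^{-1/2}, in line with the Gaussian heuristic.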

General discussion

To summarize, the strategy of the proof above was as follows.

  • Observe that the delta-function is an identity for the binary operation of convolution.

  • Observe that convolving with a polynomial gives you a polynomial.

  • Approximate the delta-function by a polynomial D, in some appropriate sense.

  • Then one can expect that convolving f with D ought to give a polynomial that approximates f.

  • Work out the details.

Example 2

Let f be a uniformly continuous function defined on the whole of \R. (As an example, one could take a continuous piecewise linear function that zigzags up and down, always with gradient \pm 1.) We would like to approximate f uniformly by an infinitely differentiable function. How can we do so?

Let us follow a very similar strategy. We shall take an infinitely differentiable approximation to the delta-function and convolve with that. We describe the proof only very briefly, since it is similar to the proof in the first example. First of all, let us suppose that we have managed to find a non-negative function D that is infinitely differentiable, integrates to 1, and vanishes outside a very small interval about 0. As before, differentiation under the integral sign allows us to prove that f*D is also infinitely differentiable, and the proof that it uniformly approximates f is basically the same as the proof in the previous example. So all that is left is to construct D.

To do this we use the well-known function g defined by g(x)=0 when x\leq 0 and g(x)=e^{-1/x^2} when x>0. This function is infinitely differentiable, non-negative, and zero when x\leq 0. We then let h(x)=g(1+x)g(1-x). This function is obviously still infinitely differentiable (since it is a product of infinitely differentiable functions), and vanishes when x\leq -1 or x\geq 1. And just for good measure it is an even function with maximum at 0. The next step in the building process is to adjust the height and width of h to taste, by defining D(x)=Ah(Bx) for constants A and B of our choice. The constant B allows us to choose the interval outside which D vanishes (which will be [-B^{-1},B^{-1}]) and A allows us to ensure that \int_{-\infty}^\infty D(x)dx=1.
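Here is a small numerical sketch of this construction, applied to the zigzag function mentioned at the start of the example. It assumes a midpoint rule for the integrals, and the helper names (integral, make_bump, mollify, zigzag) are ours. Since D(x)=Ah(Bx) has integral (A/B)\int h, the normalising constant is taken to be A=B/\int h.

```python
import math


def g(x):
    """g(x) = exp(-1/x^2) for x > 0 and 0 otherwise."""
    x2 = x * x
    return math.exp(-1.0 / x2) if (x > 0 and x2 > 0) else 0.0


def h(x):
    """Smooth, even, non-negative, and zero outside (-1, 1)."""
    return g(1.0 + x) * g(1.0 - x)


def integral(func, a, b, steps=20000):
    """Midpoint-rule estimate of the integral of func over [a, b]."""
    w = (b - a) / steps
    return w * sum(func(a + (i + 0.5) * w) for i in range(steps))


def make_bump(B):
    """Return D(x) = A*h(B*x), which vanishes outside [-1/B, 1/B] and has integral 1."""
    A = B / integral(h, -1.0, 1.0)
    return lambda x: A * h(B * x)


def mollify(f, D, B, x, steps=2000):
    """Estimate (f*D)(x) = integral of f(x - y) D(y) dy over [-1/B, 1/B]."""
    return integral(lambda y: f(x - y) * D(y), -1.0 / B, 1.0 / B, steps)


def zigzag(x):
    """Piecewise linear function with gradient +1 or -1 everywhere (distance to nearest integer)."""
    return abs(x - math.floor(x + 0.5))


if __name__ == "__main__":
    B = 10.0
    D = make_bump(B)
    for x in (0.0, 0.25, 0.5):
        print(x, zigzag(x), mollify(zigzag, D, B, x))
```

Because the zigzag satisfies |f(x)-f(y)|\leq|x-y|, the smoothed function differs from it by at most B^{-1} everywhere, so increasing B gives a better uniform approximation.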

Example 3

The previous example also implies that the space C^\infty_c(\R) of smooth, compactly supported functions (or "test functions") is dense in L^p(\R) for any 1 \leq p < \infty. Indeed, any function in L^p(\R) can be approximated to arbitrary accuracy in L^p norm by a continuous, compactly supported function (this can be seen for instance by truncating the function to be compactly supported and then applying Lusin's theorem), and by convolution with a smooth approximation to the identity, the latter function can in turn be approximated uniformly (and hence in L^p, thanks to the compact support) by a smooth, compactly supported function.

A variant of this argument also shows that if f \in L^p(\R) and \phi_n is a sequence of approximations to the identity, then f*\phi_n converges to f in L^p (since this is true for the dense subclass of test functions, and one can take limits using Young's inequality).
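The limiting step can be spelled out as follows. This is a sketch under the assumption that each \phi_n is non-negative with integral 1, so that \|\phi_n\|_1=1; the test function g, chosen with \|f-g\|_p<\epsilon, is our notation.

```latex
\begin{aligned}
\|f*\phi_n-f\|_p
 &\leq \|(f-g)*\phi_n\|_p+\|g*\phi_n-g\|_p+\|g-f\|_p \\
 &\leq \|f-g\|_p\,\|\phi_n\|_1+\|g*\phi_n-g\|_p+\|g-f\|_p \\
 &\leq 2\epsilon+\|g*\phi_n-g\|_p .
\end{aligned}
```

The second line is Young's inequality \|u*v\|_p\leq\|u\|_p\|v\|_1. Since g*\phi_n\rightarrow g in L^p for the test function g, the limsup of the left-hand side is at most 2\epsilon, and \epsilon was arbitrary.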

General discussion

The ability to approximate rough functions by smoother ones is often employed in the trick "Create an epsilon of room".

Example 4

Attention: This article is in need of attention. It would be good if someone knew a quick presentation of this example, which is convex-geometry folklore, and could give it.

A fact that is sometimes of use in convex geometry is that if you have a norm on \R^n, then you can approximate it arbitrarily closely by a norm that is infinitely differentiable except at 0. It would be good to have a detailed sketch proof of this fact, which I think would be very tedious to prove directly.

Attention: This article is in need of attention. How's this? Although so far, very, very sketchy.

Let \|\cdot\| be a norm on a finite-dimensional vector space V. Take a non-negative smooth function \delta on the space G=GL(V) of linear isomorphisms of V, not identically zero, whose support is a compact neighborhood of the identity. Then integrating with respect to the Haar measure \mu on G, the function

|x| = \int_{g\in G} \|gx\|\delta(g) d\mu(g)

is smooth away from 0\in V, non-negative, and convex, being an integral with non-negative weights of the convex functions x\mapsto \|gx\|. It is strictly positive for x\neq 0, because every g is invertible, and it scales in the same way as each \|gx\| does, that is, |\lambda x|=|\lambda|\,|x| for every scalar \lambda; hence, it is a smooth norm.

In order to better approximate \|\cdot\| as above, it might be handy to use the exponential map from the Lie algebra \mathfrak{gl}(V), which is again a vector space and naturally supports parameter re-scaling.
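As a very rough illustration of the averaging idea, here is a sketch in the simplest non-trivial case. It makes several simplifying assumptions that are ours and not part of the sketch above: V=\R^2, the norm is the sup norm, and the average is taken only over small rotations (a one-parameter subgroup of GL(V)) rather than over a neighborhood of the identity in the full group. In this special case, writing x in polar coordinates shows that the averaged norm is r times a convolution in the angle with a smooth bump, which is why it is smooth away from 0.

```python
import math


def bump(t):
    """Standard smooth bump: exp(-1/(1 - t^2)) on (-1, 1) and 0 elsewhere."""
    if abs(t) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - t * t))


def sup_norm(x, y):
    """The norm being smoothed in this illustration."""
    return max(abs(x), abs(y))


def smoothed_norm(x, y, width=0.1, steps=400):
    """Weighted average of sup_norm(R_theta (x, y)) over rotation angles |theta| < width."""
    w = 2.0 * width / steps
    total = 0.0
    mass = 0.0
    for i in range(steps):
        theta = -width + (i + 0.5) * w
        weight = bump(theta / width)
        c, s = math.cos(theta), math.sin(theta)
        total += weight * sup_norm(c * x - s * y, s * x + c * y)
        mass += weight
    return total / mass   # dividing by the total weight normalises the average


if __name__ == "__main__":
    for point in ((1.0, 1.0), (1.0, 0.9), (1.0, 0.0)):
        print(point, sup_norm(*point), smoothed_norm(*point))
```

Each term in the average is itself a norm, so the average is a norm, and since a rotation through angle \theta moves a point by at most |\theta| times its Euclidean length, the averaged norm differs from the sup norm by at most the width times the Euclidean length of the point.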


Showing that the Schwartz space is dense in L^p is another good example of this trick, which I actually just covered in my class. I might put it in here later (and also interlink with "Create an epsilon of room").

On Example 4

The C^1 case is also a good example of the use of duality: consider a sublevel set for the norm, and take its Minkowski sum with a smooth and suitably symmetric convex region; its size doesn't matter! The result is an approximating C^1 symmetric convex region, dual to a C^1 norm.
