A common mistake people make when trying to answer a mathematical question is to work from first principles: it is almost always easier to modify something you already know. This article illustrates the point with examples that range from simple arithmetic to the forefront of research.
Example 1: multiplying two-digit numbers together
What is ? The obvious way of answering this question is to use your favourite multiplication technique to work it out. For example, you might say that it is . However, if you are keen on mathematics, then this probably won't be the best way of going about it, since you are likely to know many "nearby" facts and it is a pity to waste that knowledge. For example, if you know the squares of all numbers up to 20, or at least if you know that (which perhaps you know because you like powers of 2), then you can reason as follows: if sixteen sixteens are 256, then I get to seventeen sixteens by adding another 16, which takes me to 272.
This very simple example illustrates an important principle in mathematics: even if getting from A to C is hard, you may know how to get to a point B from which getting to C is easy. This phenomenon shows up strongly in mental arithmetic: if you are asked to multiply two two-digit numbers together in your head, and have a reasonable "web of number facts" at your disposal, then it is almost always better to use tricks like the one in the previous paragaph. (Another approach would have been to multiply 17 by 2 four times, getting the sequence 34, 68, 136, 272.)
More mental-arithmetic examples
What other "number facts" might one know? One that many people are aware of is the amusing fact that . This makes multiplying by pretty easy. For example, if you need to work out you can use the fact that , so . Or if you wanted to work out you could first work out , getting . Then you could add to it and deduce that . And finally you could add a further to get , which is the answer to the question asked.
Another useful principle is that . This means that if you know your squares, then you can multiply together any two numbers that are reasonably close together. For example, . What about ? We could first work out , getting (by the difference-of-squares method) and then add another to get (because it's
A less-known trick is that this principle also works "backwards", allowing us to easily compute any square using the equation . We just need to pick a suitable for which one of and becomes simple. For example, we have
In all these cases, the basic method is to get what one might think of as a "first approximation" to the calculation and then adjust it.
Example 2: convergence of sequences
Suppose that you are asked to prove completely rigorously that tends to as tends to infinity. Some people think that the words "completely rigorously" are an instruction to go back to first principles and produce a proof that starts like this. "Let . We shall show that there exists an such that whenever ." But that is not how to go about it. A much better way is to begin by dividing top and bottom by . That gives us . Once we have written the expression in this way, we can deduce that it converges to just from the fact that tends to zero and from standard theorems about limits. Those theorems are the ones that say that if and , then , , and (if ) . So from the fact that we get that , so , so , so . Similarly, , so , so . And finally, that implies that .
Example 3: convergence of series
How does one show that the series converges? That is, how does one show that as you keep on adding reciprocals of squares, the resulting sum doesn't go off to infinity? Convergence of infinite series is usually one of the first topics covered in a first course in real analysis, and one of the messages one needs to learn is that the best way of proving that a given series converges is usually not to start from scratch, but rather to deduce the convergence from the fact that some other series is known to converge.
In this case, the difficulty is that there doesn't seem to be a nice expression for the function . But perhaps we should think about that. What do we know about ? Well, from its definition we know that Applying the advice that one should hunt for analogies, we might recall that taking the difference is a bit like taking the derivative of . And if we did take the derivative of and obtained , then would have had the form .
Now let us turn that on its head and see what is if for every . The answer is . And turning that on its head, we see that the series can be analysed easily, because
But , so from this it follows that . Since that is true for every , the series converges and its limit is less than 2.
The point here is that we deduced the convergence of the series in an easy way from the convergence of a series that can be shown easily to converge.
You might object that you prefer a different proof that the series converges. For example, another well-known proof is to group the terms of the series as follows:
This is clearly less than what you get if you replace each fraction in a group by the largest fraction in that group. And what you get is
So the original series converges and its limit is less than 2.
However, if you prefer this proof, that doesn't imply that you like starting from scratch. This proof also uses what you know (that the series converges) to help you show something you don't know (that the series converges).
Example 4: the matrix of a rotation
Here is a well-known matrix calculation: to find the matrix of a rotation through about a unit vector in . The direct starting-from-scratch approach would be to work out what happens to the vectors , and under this rotation. However, that is quite a difficult calculation. A much better approach is to exploit the fact that we can write down the answer easily if we are lucky enough that is the vector . Then the rotation is just a rotation of the -plane, so it has matrix . Now all we have to do is come up with the matrix of a rotation that takes to and calculate . Why? Because the corresponding linear map moves to , then rotates everything through about , then moves back to again: the total effect is a rotation through about .
How do we find a rotation that takes to ? Well, the matrix of such a rotation has as its third column. So all we need to do is find two other unit column vectors and that are orthogonal to each other and to , and the matrix formed by , and will be the matrix of a rotation that takes to (unless it has determinant , in which case swap the first two columns). There is still a bit of calculation involved here, but it is pretty straightforward.
Example 5: the set of invertible matrices is open
The set of invertible matrices can be thought of as a subset of , and this subset turns out to be open. In other words, if you take any invertible matrix and change its entries by a small enough amount, then the perturbed matrix will also be invertible.
How should one prove this? The obvious approach is to dive straight in. If is our invertible matrix then we let be a small positive constant (to be chosen later) and try to prove that any matrix such that for every and is also invertible. But how do we find the inverse of ?
What we have just done is work directly with the definitions of "open set" and "invertible". However, if we are prepared to use a little bit of theory, the problem becomes dramatically easier. First of all, what are some fundamental facts about open sets? One is that they are complements of closed sets. Another is that the inverse image of an open set under a continuous function is open. If we use the first of these, we transform our problem into that of showing that the set of non-invertible matrices is closed. We could then use the sequence definition of closed sets and try to prove that a limit of non-invertible matrices is non-invertible. But there doesn't seem to be an obvious way to do that – or at least not yet. If we think about the second property of open sets, then we see that one way of proving that our set is open is to find a continuous function from the set of all matrices to some other space such that there is an open subset of with equal to the set of all invertible matrices.
It's not instantly obvious what such a function might be, so let us turn to the word "invertible". What important facts do we know about invertible matrices? One of them is that a matrix is invertible if and only if its rank is . Another is that it is invertible if and only if it has non-zero determinant.
At this point a lightbulb should go on. We were looking for a continuous function defined on matrices, and the determinant is a function defined on matrices. Is it continuous? Yes, since it is a polynomial function of the entries of the matrix. Can we express the set of invertible matrices as the inverse image of an open set? Yes, since they are precisely the matrices with determinant belonging to the set of non-zero real (or complex if we are talking about complex matrices) numbers, and that set is open.
So the eventual proof is extremely easy: the invertible matrices are the inverse image of the open set under the continuous function "determinant of". How did we find this proof? By not starting from scratch: instead, we spent a short time thinking about well-known facts connected with the concepts that we were trying to relate.
I hinted earlier that one could use the fact that open sets are complements of closed sets, and indeed that gives an alternative, fairly similar approach. We would like to prove that a limit of non-invertible matrices is non-invertible. Again, once we think of using determinants this becomes easy. It is enough to prove that a limit of matrices with zero determinant has zero determinant. This will certainly be the case if the determinant is continuous – which it is.
There are some areas of mathematics where this advice applies in a very extreme way: machinery is developed in order to save people the effort of writing out essentially the same argument again and again. A good example is algebraic topology, and indeed the article If your topological invariant takes integer values, they may well be homology classes in disguise could be summarized as giving the advice not to prove topological theorems from scratch, but to use the well-developed machinery of homology and cohomology instead.