This article is about algorithms that exploit the fact that, under suitable conditions, repeated trials of the same experiment almost always give rise to the same approximate behaviour in the long term. A famous example of such an algorithm is the randomized algorithm of Miller and Rabin for testing whether a positive integer is prime.
A knowledge of basic probability.
Let be a large positive integer. How can we work out whether is prime? A brute-force approach of checking all possible factors up to takes far too long: we're thinking here of numbers with a couple of hundred digits, say, and we do not want our computer to have to do calculations.
The Miller-Rabin test for primality is a remarkable method of dealing with this problem – but with a twist. The twist is that it does not tell us with certainty that a number is prime, but just with an incredibly high probability.
This sounds rather strange: what does it mean to say that a number has a 99% probability of being prime? Surely it is either prime or not prime. However, the statement can be given a precise meaning, as we shall see.
The test is based on the important observation that there are ways of telling that an integer is composite that do not involve finding factors. For example, Fermat's little theorem tells us that if is prime and is not a multiple of , then mod . Therefore, if we are given an integer and can find an integer such that mod and is not a multiple of , then we have a proof that is composite. (It is important here that one can work out powers of in polynomial time: first, by repeated squaring one works out , , , , and so on, and then one takes products of some of these to obtain )
Unfortunately, this simple test does not always work: there are integers , called Carmichael numbers, such that for every that is not a multiple of , mod . (A well-known example is .) However, there is a slight generalization of this Fermat test that does always work. It is described in the Wikipedia article on the Miller-Rabin test. Moreover, it works with probability . That is, if you are given an integer , then at least of the integers between and will have a certain property that guarantees that is composite and that can be tested in polynomial time.
How do we convert that fact into an algorithm? It is very simple: we just repeatedly choose random integers between and and test for the composite-guaranteeing property. If we choose integers and is composite, then the probability that we will show that is composite is at least Turning that round, if we run the algorithm and do not manage to prove that is composite, then there are two ways that that could have happened: either was composite and we were unlucky enough not to detect that (because we chose numbers that were in the bad set of size at most ), or was prime.
How to interpret this depends slightly on how we came up with the positive integer . This can be more important than one might think, but for now let us just assume that was a random integer chosen between and for some large Then the prior probability that was prime is around and a simple calculation using Bayes's theorem shows that the probability that is composite becomes very small once exceeds by a significant amount.
What if we chose in a different way – say, by choosing a random integer between and and taking to be ? (Note that here the running time of the algorithm will be polynomial in , whereas in the previous paragraph it was polynomial in .) Now we know much less about the extent to which we should expect to be prime. Nevertheless, unless we have very good reason to suppose that is composite, then if we choose a large and fail to show that is composite, we can be pretty confident that it is prime.
The Miller-Rabin algorithm works by reducing the problem to that of approximating the cardinality of a set to within some multiple of The set in question is the set of integers that give rise to short proofs that is composite, and what makes the algorithm possible is that this set is either empty or of cardinality at least Therefore, if we can find an algorithm that with high probability approximates its size to within less than then we have a high probability of determining which of these two possibilities holds.
Since it just takes one element to demonstrate that a set is non-empty, we can in this instance say something stronger still: that we will either determine with certainty that the set has size at least or with very high probability that it is empty.
Suppose that you are given a rather wildly oscillating function from to and asked to estimate its integral. Suppose also that is sufficiently continuous that knowing a real number to 30 decimal places allows you to get a good approximation to but sufficiently wild that you often do need to know to that sort of accuracy. How could one go about the task? A brute-force method would be to do calculations and take the average, but calculations is way beyond what is feasible for today's computers, so this does not work.
A simple randomized algorithm will get round this difficulty instantly, with the usual cost that one has to exchange certainty for near-certainty. One simply chooses random numbers and calculates the average The numbers are independent random variables with mean Or rather, since these numbers will in practice be long terminating decimals, the mean is close to this, by our continuity assumption on . Let us suppose that we have chosen the number of digits so that The probability that can be shown to be at most (A proof of this can be found as Example 3 of bounding probabilities by expectations.) Therefore, the probability that is at most Thus, in order to approximate the integral to within with a probability of success, one needs to be proportional to
Because the dependence of on is logarithmic, the uncertainty inherent in this algorithm is not of practical importance: one can arrange for the probability of success to be so small that even if one repeated the calculation a trillion times a second, it would take a trillion years, on average, before an accuracy of was not achieved. Even if an aeroplane's staying up in the air depended on the calculation, this would be an acceptable level of risk (since it is far smaller than the risk that some mechanical fault would bring the aeroplane down).
A computational task of great practical importance is to estimate multiple integrals with a large number of variables. For these the above method is usually inappropriate, because it often happens that "all the activity takes place in a small area". To see what is meant by this, imagine that wished to integrate the function over the set where is the Euclidean norm of . This integral is easy to calculate, since it is just but we are using it to illustrate the fact that one will not be able to duplicate this calculation by simply choosing random points in the cube and taking the average value of the function at those points. The reason is that the integral of can be approximated by its integral over a set of exponentially small measure. (We shall not prove this here.) Therefore, you need to sample exponentially many points in order to get a good approximation to the integral, at least if by "good approximation" you mean "correct to within a small percentage error". (If you are happy to settle for a small additive error then you can solve the problem trivially by just estimating that the integral is . But that is not very interesting.)
This is a serious problem, but there are ways round it. They are far from elementary: to get an idea of what one of the main techniques is, see random sampling using Markov chains.