a repository of mathematical know-how

Elementary randomized algorithms

Quick description

This article is about algorithms that exploit the fact that, under suitable conditions, repeated trials of the same experiment almost always give rise to the same approximate behaviour in the long term. A famous example of such an algorithm is the randomized algorithm of Miller and Rabin for testing whether a positive integer is prime.


Prerequisites

A knowledge of basic probability.

Example 1

Let n be a large positive integer. How can we work out whether n is prime? A brute-force approach of checking all possible factors up to \sqrt{n} takes far too long: we're thinking here of numbers with a couple of hundred digits, say, and we do not want our computer to have to do 10^{100} calculations.

The Miller-Rabin test for primality is a remarkable method of dealing with this problem – but with a twist. The twist is that it does not tell us with certainty that a number is prime, but just with an incredibly high probability.

This sounds rather strange: what does it mean to say that a number has a 99% probability of being prime? Surely it is either prime or not prime. However, the statement can be given a precise meaning, as we shall see.

The test is based on the important observation that there are ways of telling that an integer is composite that do not involve finding factors. For example, Fermat's little theorem tells us that if p is prime and a is not a multiple of p, then a^{p-1}\equiv 1 mod p. Therefore, if we are given an integer n and can find an integer a such that a^{n-1}\not\equiv 1 mod n and a is not a multiple of n, then we have a proof that n is composite. (It is important here that one can work out powers of a in polynomial time: first, by repeated squaring, one works out a, a^2, a^4, a^8, and so on, and then, guided by the binary expansion of n-1, one multiplies together an appropriate subset of these to obtain a^{n-1}.)
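In Python, to pick one language, this Fermat test is only a couple of lines, since the built-in three-argument pow performs exactly the repeated-squaring computation just described. The function name below is our own, not a standard one:

```python
def fermat_witness(a, n):
    """Return True if a proves that n is composite via Fermat's little theorem.

    If n were prime and a not a multiple of n, we would have
    a^(n-1) == 1 (mod n), so any other outcome certifies that n is composite.
    """
    # pow(a, n - 1, n) uses repeated squaring, so it runs in time
    # polynomial in the number of digits of n.
    return a % n != 0 and pow(a, n - 1, n) != 1
```

For instance, fermat_witness(2, 15) returns True, proving that 15 is composite without exhibiting a factor, while for the Carmichael number 561 it returns False even though 561 = 3 x 11 x 17.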

Unfortunately, this simple test does not always work: there are composite integers n, called Carmichael numbers, such that a^{n-1}\equiv 1 mod n for every a that is coprime to n. (A well-known example is 561.) However, there is a slight generalization of this Fermat test that works for every composite n, Carmichael numbers included. It is described in the Wikipedia article on the Miller-Rabin test. Moreover, each random trial succeeds with probability at least 3/4. That is, if you are given a composite integer n, then at least three quarters of the integers a between 1 and n-1 will have a certain property that guarantees that n is composite and that can be tested in polynomial time.

How do we convert that fact into an algorithm? It is very simple: we just repeatedly choose random integers a between 1 and n-1 and test for the composite-guaranteeing property. If we choose k integers and n is composite, then the probability that we will show that n is composite is at least 1-(1/4)^k. Turning that round, if we run the algorithm and do not manage to prove that n is composite, then there are two ways that this could have happened: either n was composite and we were unlucky enough not to detect it (because all k of our chosen numbers a lay in the bad set, which has size at most (n-1)/4), or n was prime.
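The whole algorithm can be sketched in its standard Miller-Rabin form: write n-1 = 2^s d with d odd and examine the sequence a^d, a^{2d}, \dots, a^{n-1} mod n for each random a. The function name and the default number of rounds k = 20 are our own choices:

```python
import random

def is_probable_prime(n, k=20):
    """Return False if n is certainly composite, and True if n is prime
    with probability at least 1 - (1/4)**k."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7):
        if n % p == 0:
            return n == p
    # Write n - 1 = 2**s * d with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(k):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x == 1 or x == n - 1:
            continue  # this a does not witness compositeness
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a witnesses that n is composite
    return True
```

Note the asymmetry: a return value of False is a certainty, while True only means that k independent trials all failed to find a witness.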

How to interpret this depends slightly on how we came up with the positive integer n. This can be more important than one might think, but for now let us just assume that n was a random integer chosen between N and 2N for some large N. Then the prior probability that n was prime is around 1/\log N, and a simple calculation using Bayes's theorem shows that the conditional probability that n is composite, given that all k trials failed to detect compositeness, becomes very small once k exceeds \log\log N by a significant amount.
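The Bayes calculation can be made concrete with a short sketch. Here we take the prior probability of primality to be exactly 1/\log N and bound the probability that a composite n survives one round by 1/4; both are approximations, so the function gives only a rough upper bound, and its name is our own:

```python
from math import log

def posterior_composite(log_N, k):
    """Rough upper bound on P(n composite | k Miller-Rabin rounds all passed),
    assuming n was chosen uniformly between N and 2N, so that the prior
    probability of primality is about 1/log N (prime number theorem)."""
    p_prime = 1.0 / log_N          # prior P(n prime)
    pass_if_composite = 0.25 ** k  # each round passes with prob <= 1/4
    # Bayes: P(composite | pass) <= (1 - p) q / ((1 - p) q + p), q = (1/4)**k
    num = (1 - p_prime) * pass_if_composite
    return num / (num + p_prime)
```

For a 200-digit n a single passed round leaves the posterior probability of compositeness above 0.9, but ten rounds already push it below 10^{-3}, in line with the claim that k need only exceed \log\log N by a modest amount.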

What if we chose n in a different way – say, by choosing a random integer m between M and 2M and taking n to be 2^m-1? (Note that here the running time of the algorithm will be polynomial in M, whereas in the previous paragraph it was polynomial in \log N.) Now we know much less about the extent to which we should expect n to be prime. Nevertheless, unless we have very good reason to suppose that n is composite, then if we choose a large k and fail to show that n is composite, we can be pretty confident that it is prime.

General discussion

The Miller-Rabin algorithm works by reducing the problem to that of approximating the cardinality of a set to within some multiple of n. The set in question is the set of integers a that give rise to short proofs that n is composite, and what makes the algorithm possible is that this set is either empty (when n is prime) or of cardinality at least 3(n-1)/4 (when n is composite). Therefore, if we can find an algorithm that with high probability approximates its size to within less than 3(n-1)/8, then we have a high probability of determining which of these two possibilities holds.

Since it just takes one element to demonstrate that a set is non-empty, we can in this instance say something stronger still: that we will either determine with certainty that the set has size at least 3(n-1)/4 or with very high probability that it is empty.

Example 2

Suppose that you are given a rather wildly oscillating function f from [0,1] to [0,1] and asked to estimate its integral. Suppose also that f is sufficiently continuous that knowing a real number x to 30 decimal places allows you to get a good approximation to f(x), but sufficiently wild that you often do need to know x to that sort of accuracy. How could one go about the task? A brute-force method would be to evaluate f at each of the 10^{30} multiples of 10^{-30} in [0,1] and take the average, but 10^{30} calculations is way beyond what is feasible for today's computers, so this does not work.

A simple randomized algorithm will get round this difficulty instantly, with the usual cost that one has to exchange certainty for near-certainty. One simply chooses k random numbers x_1,\dots,x_k\in[0,1] and calculates the average A=k^{-1}\sum_{i=1}^kf(x_i). The numbers f(x_i) are independent random variables with mean \int_0^1f(x)\,dx. Or rather, since these numbers will in practice be long terminating decimals, the mean M is close to this, by our continuity assumption on f. Let us suppose that we have chosen the number of digits so that |M-\int_0^1f(x)\,dx|\leq\epsilon. The probability that |A-M|\geq\epsilon can be shown to be at most 2\exp(-\epsilon^2k/4). (A proof of this can be found as Example 3 of bounding probabilities by expectations.) Therefore, the probability that |A-\int_0^1f(x)\,dx|>2\epsilon is at most 2\exp(-\epsilon^2k/4). Thus, in order to approximate the integral \int_0^1f(x)\,dx to within 2\epsilon with a probability 1-\delta of success, one needs k to be proportional to \epsilon^{-2}\log(\delta^{-1}).
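The estimator itself is a one-liner. Here is a minimal sketch, with a deliberately oscillatory test function whose integral is known exactly; the function names, the fixed seed, and the choice of test function are all our own:

```python
import math
import random

def monte_carlo_integral(f, k, seed=0):
    """Estimate the integral of f over [0,1] by averaging f
    at k independent uniform random points."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    return sum(f(rng.random()) for _ in range(k)) / k

# Test function: sin^2(1000 x) oscillates rapidly, but its integral over
# [0,1] is 1/2 - sin(2000)/4000, which is within 0.00025 of 1/2.
f = lambda x: math.sin(1000 * x) ** 2
estimate = monte_carlo_integral(f, k=100_000)
```

With k = 100{,}000 samples the standard error is about 0.001, so the estimate lands comfortably within 0.01 of 1/2, matching the k \propto \epsilon^{-2}\log(\delta^{-1}) scaling described above.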

General discussion

Because the dependence of k on \delta is logarithmic, the uncertainty inherent in this algorithm is not of practical importance: one can arrange for the probability of failure to be so small that even if one repeated the calculation a trillion times a second, it would take a trillion years, on average, before an accuracy of 2\epsilon was not achieved. Even if an aeroplane's staying up in the air depended on the calculation, this would be an acceptable level of risk (since it is far smaller than the risk that some mechanical fault would bring the aeroplane down).

A computational task of great practical importance is to estimate multiple integrals with a large number of variables. For these the above method is usually inappropriate, because it often happens that "all the activity takes place in a small area". To see what is meant by this, imagine that one wished to integrate the function f(x)=\exp(-\|x\|^2) over the set [-1,1]^n, where \|x\| is the Euclidean norm of x. This integral is easy to calculate, since it is just (\int_{-1}^1\exp(-t^2)\,dt)^n, but we are using it to illustrate the fact that one will not be able to duplicate this calculation by simply choosing random points in the cube [-1,1]^n and taking the average value of the function at those points. The reason is that the integral of f can be approximated by its integral over a set of exponentially small measure. (We shall not prove this here.) Therefore, you need to sample exponentially many points in order to get a good approximation to the integral, at least if by "good approximation" you mean "correct to within a small percentage error". (If you are happy to settle for a small additive error then you can solve the problem trivially by just estimating that the integral is 0. But that is not very interesting.)
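One can see this effect without doing any sampling at all: because the integrand factors across coordinates, the variance of f at a single uniform sample can be computed exactly via the error function, and the number of samples needed for a given relative error grows exponentially with n. Here is a sketch; the function name and the target relative error of 10% are our own choices:

```python
from math import pi, sqrt, erf

def samples_needed(n, rel_err=0.1):
    """Rough number of uniform samples on [-1,1]^n needed to estimate the
    integral of exp(-||x||^2) to the given relative error by naive
    Monte Carlo, using the exact one-dimensional moments."""
    m1 = sqrt(pi) * erf(1) / 2             # E[exp(-x^2)], x uniform on [-1,1]
    m2 = sqrt(pi / 2) * erf(sqrt(2)) / 2   # E[exp(-2 x^2)], the second moment
    rel_var = (m2 / m1 ** 2) ** n - 1      # relative variance of one sample
    # By the central limit theorem, k samples give relative error of
    # roughly sqrt(rel_var / k), so solve for k.
    return rel_var / rel_err ** 2
```

The ratio m2/m1^2 is about 1.07, so the required number of samples grows like 1.07^n: a handful of samples suffice for n = 1, but by n = 500 the count exceeds 10^{10}.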

This is a serious problem, but there are ways round it. They are far from elementary: to get an idea of what one of the main techniques is, see random sampling using Markov chains.

