Lazy Prime Sieve / 2011-05-08

Last year at the first Clojure Conj, for some reason or another I started pondering ways to improve on the classic lazy-list-of-primes (which I took a quasi stab at in an old post). Most of the simpler examples use division to check primality, which performs terribly compared to addition-based algorithms like the Sieve of Eratosthenes.

However, sieve implementations usually compute huge chunks of primes at once, while I wanted to be able to compute one prime at a time, hopefully using Clojure’s iterate function. I wanted it to perform similarly to the classic sieve in terms of time, and to have space requirements proportional to the square root of the prime being computed (particularly not holding the entire preceding sequence of primes in memory when only a fraction of that is actually needed).

So after thinking about the structure of the sieve algorithm for a bit, I realized that it actually was amenable to computing primes iteratively. In the classic sieve algorithm, given a candidate range, for each relevant small prime you sweep through the range marking off the multiples of that prime. The prime numbers in the range are those that remain after all the small primes have swept through. The key to making this iterative is, given a prime, rather than sweeping through an array, you just keep track of its next multiple. When that multiple comes up as a candidate, you mark it as composite, and then update your “next multiple” value for that prime. Of course you have to juggle several small primes instead of just one, so this might be a good point to bring up the actual code.

We start with the following object:

(def initial
  {:p        3,
   :divisors {9 '(3)},
   :square   9,
   :primes   nil})

The value of :p is the current prime. We first create a lazy seq of objects like this one, and turning that into a seq of primes is simply mapping to the value of :p. The value of :divisors is the data structure partially described in the previous paragraph. It is a map whose keys are upcoming composite numbers, and whose values are the primes relevant to those composites. The first odd composite to be on the lookout for is 9, and the only prime corresponding to it is 3. The last two keys, :square and :primes, are used for adding primes to the :divisors map. The value of :square tells us when to add a new prime, and the value of :primes will tell us what that prime is.

Let’s see how the initial object is iterated. The initial object corresponds to the prime 3, so the next object corresponds to the prime 5:

{:p        5,
 :divisors {9 (3)},
 :square   9,
 :primes   nil}

Not much changed. The only difference is that :p now maps to 5 instead of 3. This will also be the case for 7, so let’s skip that and go right to 11:

{:p        11,
 :divisors {15 (3), 25 (5)},
 :square   25,
 :primes   {:p 5, :divisors {9 (3)}, :square 9, :primes nil}}

There’s been a few changes here. When the 7 object was passed to the iterator function, it incremented 7 to 9 and checked if 9 was present in the :divisors map. Since it was, it knew that 9 was composite. First it updated the :divisors map from {9 (3)} to {15 (3)}, by computing that the next composite for three is 9 + 3 * 2 (we double the prime since we’re skipping even numbers). Then it checked if the current number under consideration (9) was equal to the value of :square. Since it was, it computed the next prime to add to the :divisors hash, which is of course 5. Computing the next prime, however, requires a bit of recursion, since computing primes is the actual task at hand. So the :primes key ends up pointing to a nested version of the exact object we’re looking at, and if you look at the :primes object above you can see it’s identical to the entire object given for p = 5.

This recursion is how we avoid keeping the entire list of primes in memory. Let’s see what the object looks like after computing the 10000th prime:

{:p 104729,
   {104737 (17 61 101),
    104833 (79),
    104993 (283),
    104835 (241),
    104867 (71 211),
    104931 (131),
    105187 (293),
    104741 (7 13),
    104805 (137),
    104775 (127),
    104807 (311),
    104777 (29),
    104809 (163),
    104873 (199),
    104937 (263),
    105001 (197),
    105033 (157 223),
    104747 (19 37 149),
    104843 (59),
    104749 (31 109),
    104781 (53),
    104813 (281),
    104877 (271),
    105101 (227),
    104751 (103 113),
    104753 (89 107),
    104945 (139 151),
    105073 (179),
    105169 (251),
    104755 (41 73),
    105011 (173),
    105043 (167),
    105301 (307),
    104791 (43),
    104855 (67 313),
    104983 (277),
    105111 (229),
    104857 (97),
    104921 (239),
    105113 (257),
    109561 (331),
    104731 (11),
    104763 (47),
    104859 (191),
    105083 (233),
    105179 (269),
    104733 (3),
    104765 (23),
    104829 (83),
    104735 (5),
    104799 (181 193),
    104927 (317)},
 :square 109561,
   {:p 331,
      {343 (7), 333 (3), 351 (13), 335 (5), 357 (17), 341 (11), 361 (19)},
    :square 361,
      {:p 19,
       :divisors {21 (3), 25 (5)},
       :square 25,
       :primes {:p 5, :divisors {9 (3)}, :square 9, :primes nil}}}}

As you can see there are several layers of nesting going on. This nesting does not create much performance overhead, since the nested versions don’t get very far relative to the main sequence, and I think it makes the algorithm simpler.

I did some minor googling to see if anybody else had done something like this, and I did find one that was quite similar. However, it looks like that programmer used explicit nested lazy seqs, while I wanted to keep the nested seqs implicit so the the objects iterated over remained finite.

Anyhow, here’s the full source code. I have a difficult time explaining algorithms like this clearly, so I will consider myself lucky if I’ve successfully communicated to a single person.