The Sieve of Eratosthenes and F#

There is a problem on the Euler project, www.projecteuler.net, which asks to find the sum of all values under a given number. Problems on the Euler project have a range of solutions, where at least one solution has a runtime of under 1 minute. A popular, time efficient algorithm that finds all primes in a given range is the Sieve of Eratosthenes. The basic algorithm is:

  1. Create an array that contains all values from 2 up to the final maximum value
  2. Starting at 2, For each value in the array, mark all items that have indices of the current value that are multiples and NOT equivalent to the current index as 0 (eg. index mod [current value] == 0).
  3. March forward until you hit a non-zero value in the array, then mark all multiples as 0.
  4. Halt condition: stop when you cross the midpoint of the array (anything remaining that is non-zero is a prime since you already all multiples of 2).

In practice, this gives successive iterations of an array from 2 .. 10 as:

  1. [2, 3, 4, 5, 6, 7, 8, 9, 10]
  2. [2, 3, 0, 5, 0, 7, 0, 9, 0]
  3. [2, 3, 0, 5, 0, 7, 0, 0, 0]
  4. At this point, we reach 5, which is greater than the length of the array (9) divided by 2 (4.5). Any non-zero values in the array are prime.

In C#, one possible implementation is:

private static long SieveOfErastosthenes(long maxvalue)
{
    long[] values = new long[maxvalue];
    // Populate the list
    for (long i = 0; i < values.LongLength; ++i)
    {
        values[i] = i;
    }
    values[0] = 0;
    values[1] = 0;
    for (long i = 2; i < values.LongLength; ++i)
    {
        if (values[i] != 0)
        {
            for (long j = i * 2; j < values.LongLength; j += i)
            {
                values[j] = 0;
            }
        }
    }

    long retval = 0;
    foreach (long k in values)
    {
        retval += k;
    }
    return retval;
}

Overall computational complexity is O(n). If you aren’t familiar with big O notation, the overall complexity is bound by the size of the dataset times a constant—in this case 3, since we iterate through the complete dataset 3 times:

  1. Populate the dataset
  2. Mark all non primes as 0
  3. Sum the primes

And yes, this works quite well, even for large values of n. My challenge was to get the sieve to work with performance equal to C#, only in F#. At the start of this, I had my first pass of the C# code running in ~600 ms for a value of 5 million. The challenge I set for myself was to achieve a similar timing in F#. The first few passes left me with runtimes of over 15 minutes (I killed the runs prior to completion). The reason why was simple: I wrote an O(n^3) algorithm. My challenge was to get this faster without using mutable values. My first pass looked like this:

let rec sieveofErastosthenes values filterout maxval =
    if (filterout >= (maxval/2)) then
        values
    else
        let shouldFilter =
            try
                let a = values |> List.find(fun x -> x = filterout)
                a <> 0
            with
            | _ as ex -> false
        if (shouldFilter) then
            sieveofErastosthenes (values 
                |> List.filter(fun x -> (x = filterout) || ((x % filterout) <> 0))) (filterout + 1) maxval
        else
            sieveofErastosthenes values (values |> List.find(fun x -> x > filterout)) maxval

let max = 5000000
let sumofprimes =  sieveofErastosthenes [2 .. max] 2 max |> List.map( fun x -> int64 x) |> List.sum

This code is correct—pass in 10 for the max and the value returned is 17. But the answer just never returned for larger values within reasonable time. Next, I tried using a mutable type: System.Collections.Generic.List<T>. This got the time down to a measureable 4s. That’s still 6.5x longer than the C# version. That next attempt looks like this:

let sieveOfEratosthenes x =
    let retval = new List<int32>()
    retval.AddRange([0 .. x])
    retval.Item(0) <- 0
    retval.Item(1) <- 0
    for i in 2 .. x do
        if retval.Item(i) <> 0 then
            for j in 2 * i .. i .. x do
                retval.Item(j) <- 0
    retval.ToArray()

let max = 5000000
let sumofprimes =  sieveOfEratosthenes max |> Seq.map( fun x -> int64 x) |> Seq.sum
printfn "%A\n" sumofprimes

Notice that this latest iteration executes a lot less code and, frankly, looks pretty close to the C# code. Through some investigation, I found that the call to preload the array consumed 2.6 s. I then switched to a sequence and shrank the execution time to 2300 ms. I was closing in on my goal and learning some stuff about the performance of various F# constructs. Loading using a sequence changed the AddRange call to this:

retval.AddRange([|0 .. x|])

The reason this takes less time is simple. When allocating a List, F# allocates and initializes all values up front. The Seq is created with an iterator and only allocates enough space to store the iterator and not much else. So far, so good. At this point, I did a search on F# implementations of the Sieve of Eratosthenes and found that Chris Smith had one. His implementation keeps track of which numbers AREN’T prime. Performance on his algorithm hits 8600 ms. That wound up being a step in the wrong direction.

Next, I tried, explicitly initializing the list of integers and cut the time down to 1450 ms. The strategy was getting me closer to optimal timing and showed that, for 5 million int32s, I was filling in the list in 46 ms.

let sieveOfEratosthenes x =
    let retval = new List<int32>(x + 1)
    for i in 0 .. x do retval.Add(i)

    retval.Item(0) <- 0
    retval.Item(1) <- 0
    for i in 2 .. x do
        if retval.Item(i) <> 0 then
            for j in 2 * i .. i .. x do
                retval.Item(j) <- 0
    retval |> Seq.filter(fun x -> x <> 0)

This was better, but I’d like to get closer to the C# version. The last thing I tried was moving to Array from System.Collections.Generic.List. This last change cut 250 ms, bringing my times down to 1200 ms. It looks like using lighter weight objects really will speed things up! That version is:

let sieveOfEratosthenes x =
    let retval = Array.zeroCreate(x + 1)
    for i in 0 .. x do retval.[i] <- i
    retval.[0] <- 0
    retval.[1] <- 0
    for i in 2 .. x do
        if (i == 2 ||  i % 2 <> 0) && retval.[i] <> 0 then
            for j in 2 * i .. i .. x do
                retval.[j] <- 0

    retval |> Seq.filter(fun x -> x <> 0)

let max = 5000000
let sumofprimes =  sieveOfEratosthenes max |> Seq.map( fun x -> int64 x) |> Seq.sum


Posted 12-18-2009 9:42 PM by Scott Seely
Filed under: , ,

[Advertisement]

About The CodeBetter.Com Blog Network
CodeBetter.Com FAQ

Our Mission

Advertisers should contact Brendan

Subscribe
Google Reader or Homepage

del.icio.us CodeBetter.com Latest Items
Add to My Yahoo!
Subscribe with Bloglines
Subscribe in NewsGator Online
Subscribe with myFeedster
Add to My AOL
Furl CodeBetter.com Latest Items
Subscribe in Rojo

Member Projects
DimeCasts.Net - Derik Whittaker

Friends of Devlicio.us
Red-Gate Tools For SQL and .NET

NDepend

SlickEdit
 
SmartInspect .NET Logging
NGEDIT: ViEmu and Codekana
LiteAccounting.Com
DevExpress
Fixx
NHibernate Profiler
Unfuddle
Balsamiq Mockups
Scrumy
JetBrains - ReSharper
Umbraco
NServiceBus
RavenDb
Web Sequence Diagrams
Ducksboard<-- NEW Friend!

 



Site Copyright © 2007 CodeBetter.Com
Content Copyright Individual Bloggers

 

Community Server (Commercial Edition)