Project Euler summary

Created 30th November, 2008 09:17 (UTC), last edited 1st December, 2008 04:11 (UTC)

I didn't get any code submitted for this one, but a couple of people described their solutions. The question asks us to generate a sequence of numbers from a starting value and then to find the length of that sequence.

The formula for the next value in the sequence is this:

n -> n/2 (n is even)
n -> 3n + 1 (n is odd)

Odd numbers lead to a larger value and even ones to a smaller one. Whether this always ends up at one from any starting number is still an open question in mathematics (the Collatz conjecture), but as we're looking for the longest sequence starting from numbers up to one million we're not going to be pushing the envelope here.

The simplest algorithm is just to work it out for every starting value, but there's a lot of repeated work in there. Here is the example sequence from the question:

13 40 20 10 5 16 8 4 2 1

This has a length of 10. So how long is the sequence starting from 26? It's exactly one longer, because the next value in its sequence is 13. This means that once you have calculated the sequence length from a value you can re-use it to calculate the sequence length from any number that hits that value.
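Concretely, using the sequence above: chain_length(26) = 1 + chain_length(13) = 1 + 10 = 11.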

Memoization

The obvious thing to do here is to memoize the results. JordiB describes his cache — he only cached results for starting values up to one million. Till described a caching mechanism he tried in Python, using a Python dict (an associative array) to cache results. Till had to give up after about ten minutes of run time, but JordiB got an answer pretty quickly.
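Till's actual code wasn't posted, but the shape of a dict-style cache is easy to sketch in C++ with a std::map holding one entry per value ever visited. This is my sketch rather than Till's code, and it borrows the int_type typedef and next_n helper from the listing below:

#include <map>

// One map entry per value visited: the C++ analogue of caching in a Python dict.
int_type chain_length_map( int_type n ) {
    static std::map< int_type, int_type > cache;
    if ( n == 1 )
        return 1;
    std::map< int_type, int_type >::const_iterator found( cache.find( n ) );
    if ( found != cache.end() )
        return found->second;
    const int_type length = 1 + chain_length_map( next_n( n ) );
    cache[ n ] = length;
    return length;
}

The trouble with this layout is that every cached value costs a separate map node, which is part of what the version below is designed to avoid.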

Here is my memoizing version in C++:

#include <fost/cli>
#include <fost/main.hpp>

#include <iostream>
#include <map>
#include <utility>
#include <vector>

using namespace fostlib;

// 64 bit type: the intermediate chain values overflow 32 bits for some
// starting numbers below one million.
typedef unsigned long long int_type;

int_type next_n( int_type n ) {
    if ( n % 2 )
        return 3 * n + 1;
    else
        return n / 2;
}

int_type chain_length( int_type n ) {
    if ( n == 1 )
        return 1;
    typedef std::vector< std::size_t > vector_type;
    static std::map< int_type, vector_type > cache;
    const int_type divisor = 1000;
    // Each cache line stores the lengths for a contiguous block of
    // one thousand values; lines are created on demand.
    vector_type &cache_line( cache[ n / divisor ] );
    if ( cache_line.empty() )
        cache_line = vector_type( divisor );
    // A zero slot means the length hasn't been calculated yet.
    vector_type::value_type &item = cache_line[ n % divisor ];
    if ( item == 0 )
        item = 1 + chain_length( next_n( n ) );
    return item;
}

FSL_MAIN(
    L"p014",
    L"Project Euler puzzle 14"
)( fostlib::ostream &, fostlib::arguments & ) {
    std::pair< int_type, std::size_t > longest( 0, 0 );
    for ( int_type n = 1; n != 1000000; ++n ) {
        std::size_t length = chain_length( n );
        if ( length > longest.second ) {
            longest = std::make_pair( n, length );
            std::cout << longest.first << " has length " << longest.second << std::endl;
        }
    }
    return 0;
}

The main feature here is the cache. It uses a map, again an associative array, but instead of having a single value at each leaf, the leaves are vectors a thousand elements long. This means that I can tune the size of the vector against the map to give different trade-offs between memory contiguity and expected usage (the values that I want to memoize are likely to get pretty sparse the higher the numbers go).
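For example, with a divisor of 1,000 the length for n = 123,456 lives in slot 456 of the vector stored at map key 123, that is cache[ 123456 / 1000 ][ 123456 % 1000 ].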

I guessed at using a divisor of 1,000 and it turns out to be a pretty good guess. My computer gives an answer in about 1.1 seconds. With 10,000 it takes about 1.9 seconds. Being more extreme, at 10 I get 1.8s and at 100,000 I get… Well, I got through the first 2GB of RAM in a couple of seconds. The next 1GB took another couple of minutes and I had to stop it lest my machine get totally unusable.

This is telling me something, and it's telling me something important. As the numbers get larger the cache hits get increasingly sparse, which means that the cache really isn't doing me much good — I'm spending too much time allocating memory and zeroing it. If I change the divisor to 1 then every cache line is a one-element vector, so I'm getting O(log n) indexing into the cache with a hefty constant factor because I'm still going through an array in there. Even given all of that, it only takes about 2.3s to give me the result.

What happens if you remove the cache completely?

The straightforward solution

OK, so I'm trading memory against computation — and the computation itself is pretty simple: each value depends only on the one before it. There is no external data required unless I try to cache the results. What happens if I simply don't bother?

The code change is simplicity itself:

int_type chain_length( int_type n ) {
    if ( n == 1 )
        return 1;
    else
        return 1 + chain_length( next_n( n ) );
}

Much shorter and easier to understand. This version also only takes 0.9s to run. Ah.

Once I realised that I ought to throw out the algorithmic optimisation, getting this even faster is very straightforward. To start with, the n/2 term tells us that the answer must lie in the top half of the number range: for any n below 500,000, the number 2n is still under one million and its chain is exactly one step longer, so nothing in the bottom half can win. That brings the time down to just under half a second. Finally, as the odd numbers are the ones that increase the value, they're the only ones that can really make long chains, so we only have to look at the odd numbers. The following loop runs in about a quarter of a second.

    for ( int_type n = 500001; n <= 1000000; n += 2 ) {
        std::size_t length = chain_length( n );
        if ( length > longest.second ) {
            longest = std::make_pair( n, length );
            cout << longest.first << " has length " << longest.second << std::endl;
        }
    }

Anti-optimisation

So, strip out the memoization — the very thing we're used to adding to every program we write, wherever we can — and it runs faster without it. The problem here is that our notion of what is expensive and what is cheap — at least my notion — is stuck in old, straightforward hardware designs where trading a bit of computation for memory still made sense. On modern CPUs this isn't true any more. A modern CPU can often do a floating point operation as quickly as an integer one, and an awful lot of computation is effectively free when compared with a memory read — more so again when the read misses the CPU cache and has to go all the way to main memory, costing hundreds of cycles, and even that pales into insignificance when the read triggers a page fault.

There is only one way to know, and that is to profile — try the simple algorithm and if that isn't fast enough then try a more complex one.
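For anyone who wants to reproduce the measurements, here is a self-contained sketch that times the straightforward search using std::clock. It's my harness rather than anything from the listings above; it is iterative rather than recursive, and skips the fost scaffolding so that it compiles on its own:

#include <ctime>
#include <iostream>

int main() {
    const std::clock_t start = std::clock();
    // The straightforward search: odd numbers in the top half of the range.
    unsigned long long longest = 0;
    for ( unsigned long long n = 500001; n <= 1000000; n += 2 ) {
        unsigned long long value = n, length = 1;
        while ( value != 1 ) {
            value = ( value % 2 ) ? 3 * value + 1 : value / 2;
            ++length;
        }
        if ( length > longest )
            longest = length;
    }
    const std::clock_t finish = std::clock();
    std::cout << "Longest chain: " << longest << ", took "
        << double( finish - start ) / CLOCKS_PER_SEC << "s" << std::endl;
    return 0;
}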

So, here is the quick version. This comes up with the answer in less than a quarter of a second on my machine. Less is more, worse is better and premature optimisation is certainly evil!

#include <fost/cli>
#include <fost/main.hpp>

#include <iostream>
#include <utility>

using namespace fostlib;

// 64 bit again, as the intermediate chain values can exceed 32 bits.
typedef unsigned long long int_type;

int_type next_n( int_type n ) {
    if ( n % 2 )
        return 3 * n + 1;
    else
        return n / 2;
}

int_type chain_length( int_type n ) {
    if ( n == 1 )
        return 1;
    else
        return 1 + chain_length( next_n( n ) );
}

FSL_MAIN(
    L"p014",
    L"Project Euler puzzle 14"
)( fostlib::ostream &cout, fostlib::arguments & ) {
    std::pair< int_type, std::size_t > longest( 0, 0 );
    // Only the odd numbers in the top half of the range need checking.
    for ( int_type n = 500001; n <= 1000000; n += 2 ) {
        std::size_t length = chain_length( n );
        if ( length > longest.second ) {
            longest = std::make_pair( n, length );
            cout << longest.first << " has length " << longest.second << std::endl;
        }
    }
    return 0;
}

And just for Till, here is the same thing in Python. Nowhere near as fast of course (it takes just over 20 seconds), but more than quick enough:

def next(n):
    if n%2:
        return 3 * n + 1
    else:
        return n / 2

def chain_length(n):
    if n == 1:
        return 1
    else:
        return 1 + chain_length(next(n))

value, longest = 0, 0
for i in range(250000):
    v = i * 2 + 500001
    length = chain_length(v)
    if length > longest:
        value, longest = v, length
        print value, " has length ", length